Is there a way to log the descriptive stats of a dataset using MLflow?
Is there a way to log the descriptive statistics of a dataset using MLflow? If so, could you please share the details?
Solution 1:[1]
Generally speaking, you can log arbitrary output from your code using the `mlflow.log_artifact()` function. From the docs:
`mlflow.log_artifact(local_path, artifact_path=None)`: Log a local file or directory as an artifact of the currently active run.

Parameters:
- `local_path`: Path to the file to write.
- `artifact_path`: If provided, the directory in `artifact_uri` to write to.
As an example, say you have your statistics (for instance, the output of `df.describe()`) in a pandas DataFrame, `stat_df`:
```python
import mlflow

with mlflow.start_run():
    # Write the statistics DataFrame to a local CSV file
    stat_df.to_csv('dataset_statistics.csv')
    # Log the CSV as an artifact of the active run
    mlflow.log_artifact('dataset_statistics.csv')
```
The file will show up under the Artifacts section of this run in the MLflow Tracking UI. If you explore the docs further, you'll see that you can also log an entire directory and the files in it (`mlflow.log_artifacts()`). In general, MLflow gives you a lot of flexibility: anything you write to your file system, you can track with MLflow. Of course, that doesn't mean you should. :)
Solution 2:[2]
You can also log the statistics as an HTML file, which the MLflow UI's artifact viewer displays as a (somewhat ugly) table.
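```python
import mlflow
import seaborn as sns

with mlflow.start_run():
    # Load the example iris dataset (seaborn fetches it online on first use)
    df_iris = sns.load_dataset("iris")
    # Render the descriptive statistics as an HTML table
    df_iris.describe().to_html("iris.html")
    # Log the HTML file under the "stat_descriptive" artifact directory
    mlflow.log_artifact("iris.html", "stat_descriptive")
```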
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Adrien Pacifico |