AutoML – using autosklearn in Python
I’ve written some previous posts about AutoML and how to use AutoML with Oracle OML4Py (part 1 and part 2) and AutoML UI.
Building upon these, in this post I’ll demonstrate how to use the auto-sklearn Python package to do something similar, using the same data set I used in my previous posts.
To install the package, run the typical pip command:
pip3 install auto-sklearn
I did have some challenges installing this package, and this seems to be common, with different people having slightly different issues. These mainly revolved around having to install/update the swig and pyrfr packages. Once that was done, the autosklearn package installed without any problems.
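For reference, the commands I needed were along the following lines. These will vary depending on your platform, so treat them as illustrative rather than a definitive recipe.

brew install swig            # on macOS; use 'sudo apt-get install swig' on Debian/Ubuntu
pip3 install --upgrade pyrfr
pip3 install auto-sklearn    # then re-run the install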
Let’s do a simple test
import autosklearn
print('autosklearn: %s' % autosklearn.__version__)

autosklearn: 0.12.5
Just like in my previous examples, I’m going to use autosklearn to build a Classification model, as that is what the data set is designed for.
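The code below assumes the data has already been loaded and split into training and test sets, giving X_train, y_train, X_test and y_test. If you are following along with your own copy of the data, a minimal sketch of that step would be something like the following (the file name and target column here are placeholders, not the actual names from my previous posts).

import pandas as pd
from sklearn.model_selection import train_test_split

# load the data set - replace the file name and target column with your own
df = pd.read_csv('my_dataset.csv')
X = df.drop(columns=['TARGET'])
y = df['TARGET']

# hold back 30% of the records as a test/holdout data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)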
from sklearn.metrics import accuracy_score
import autosklearn.classification
# define search
model = autosklearn.classification.AutoSklearnClassifier()
# perform the search
model.fit(X_train, y_train)
The code above is a very basic configuration, and if this is the first time you are going to run this, then DON’T. There are a lot of parameters you can set, one of them being ‘time_left_for_this_task’. The default value for this parameter is 3,600 seconds, which is one hour. Not a good idea! Set this to something much lower, say 3-5 minutes for an initial run. This should be enough time for it to build many different models. I like to set the time for this using a multiplier of 60 (seconds). That way you don’t have to do any calculations! Two other parameters to consider setting/changing are:
- n_jobs: the number of jobs to run in parallel. Set this to -1 to use all processors, or to a specific number, e.g. 4
- metric: the evaluation metric to use for the models. For classification we have: accuracy, balanced_accuracy, f1, f1_macro, f1_micro, f1_samples, f1_weighted, roc_auc, precision, precision_macro, precision_micro, precision_samples, precision_weighted, average_precision, recall, recall_macro, recall_micro, recall_samples, recall_weighted and log_loss. For regression problems: r2, mean_squared_error, mean_absolute_error and median_absolute_error
Using these parameters, let’s run a search.
# define search
model2 = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=2*60,
n_jobs=-1,
metric=autosklearn.metrics.accuracy)
# perform the search
model2.fit(X_train, y_train)
Out[]: AutoSklearnClassifier(metric=accuracy, n_jobs=-1, per_run_time_limit=48,
time_left_for_this_task=120)
After about 2 minutes we can explore the models that were built.
print(model2.show_models())
[(0.520000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'mean', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'no_preprocessing', 'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'gini', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.5, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 1, 'classifier:random_forest:min_samples_split': 2, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.01},
dataset_properties={
'task': 1,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'classification',
'signed': False})),
(0.480000, SimpleClassificationPipeline({'balancing:strategy': 'none', 'classifier:__choice__': 'random_forest', 'data_preprocessing:categorical_transformer:categorical_encoding:__choice__': 'no_encoding', 'data_preprocessing:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer', 'data_preprocessing:numerical_transformer:imputation:strategy': 'most_frequent', 'data_preprocessing:numerical_transformer:rescaling:__choice__': 'standardize', 'feature_preprocessor:__choice__': 'feature_agglomeration', 'classifier:random_forest:bootstrap': 'True', 'classifier:random_forest:criterion': 'entropy', 'classifier:random_forest:max_depth': 'None', 'classifier:random_forest:max_features': 0.48846965177813817, 'classifier:random_forest:max_leaf_nodes': 'None', 'classifier:random_forest:min_impurity_decrease': 0.0, 'classifier:random_forest:min_samples_leaf': 1, 'classifier:random_forest:min_samples_split': 5, 'classifier:random_forest:min_weight_fraction_leaf': 0.0, 'data_preprocessing:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.01087424610670389, 'feature_preprocessor:feature_agglomeration:affinity': 'cosine', 'feature_preprocessor:feature_agglomeration:linkage': 'complete', 'feature_preprocessor:feature_agglomeration:n_clusters': 17, 'feature_preprocessor:feature_agglomeration:pooling_func': 'median'},
dataset_properties={
'task': 1,
'sparse': False,
'multilabel': False,
'multiclass': False,
'target_type': 'classification',
'signed': False})),
]
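The leading number in each tuple (0.52 and 0.48 above) is the weight given to that pipeline in the final ensemble. If you want to work with these programmatically rather than reading the printed output, a minimal sketch (assuming the get_models_with_weights method is available in your version of auto-sklearn) would be:

# iterate over the pipelines in the final ensemble, along with their weights
for weight, pipeline in model2.get_models_with_weights():
    print(weight)
    print(pipeline)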
In this particular case it has evaluated two models and we can display some basic statistics about this process.
# summarize
print(model2.sprint_statistics())
auto-sklearn results:
Dataset name: ecd21bb4-912e-11eb-8af6-acde48001122
Metric: accuracy
Best validation score: 0.895218
Number of target algorithm runs: 12
Number of successful target algorithm runs: 2
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 10
Number of target algorithms that exceeded the memory limit: 0
It only had time to create and evaluate 2 models, returning the best model. We can use this model to evaluate results from the holdout test data set.
# evaluate best model
y_predictions = model2.predict(X_test)
acc = accuracy_score(y_test, y_predictions)
print("Accuracy: %.3f" % acc)
Accuracy: 0.900
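The same predictions can be passed to any of the other scikit-learn metrics if you want more than a single accuracy figure. For example (an extra check, not part of the original walkthrough), a confusion matrix on the holdout data:

from sklearn.metrics import confusion_matrix

# rows are the actual classes, columns are the predicted classes
print(confusion_matrix(y_test, y_predictions))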
Now let’s change the run time to see how many extra models will be evaluated in the available time. The following increases the run time from 2 to 3 minutes. The evaluation metric has also been changed to the f1 score.
# define search
model3 = autosklearn.classification.AutoSklearnClassifier(time_left_for_this_task=3*60,
n_jobs=4,
                                                          metric=autosklearn.metrics.f1)
# perform the search
model3.fit(X_train, y_train)
AutoSklearnClassifier(metric=f1, n_jobs=4, per_run_time_limit=72,
time_left_for_this_task=180)
The statistics tell us it successfully evaluated 7 models, out of 15 target algorithm runs.
# summarize
print(model3.sprint_statistics())
auto-sklearn results:
Dataset name: 752a4fc6-9135-11eb-8af6-acde48001122
Metric: f1
Best validation score: 0.473426
Number of target algorithm runs: 15
Number of successful target algorithm runs: 7
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 8
Number of target algorithms that exceeded the memory limit: 0
The output from the ‘show_models’ function is too long to show here, but you should run it to see the details.
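The call is the same as before, just pointed at the new model:

print(model3.show_models())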
There is a package/library called PipelineProfiler, which is a VERY useful tool for inspecting the various models created and evaluated in the above process. It allows us to see, for each model run, what steps and algorithms were part of it, and by clicking on one we get a flow chart of the pipeline. An example is shown below.
import PipelineProfiler
profiler_data = PipelineProfiler.import_autosklearn(model3)
PipelineProfiler.plot_pipeline_matrix(profiler_data)
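Note that PipelineProfiler produces an interactive visualisation, so it is best run from within a Jupyter notebook, and it needs to be installed first using the usual pip command (the package name on PyPI should be pipelineprofiler).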