The AutoML application you built today is under 200 lines of code, at exactly 171 lines.
1.1. Technical stack
The web application is built in Python using the following libraries:
streamlit – web framework
pandas – handles data frames
numpy – numerical data processing
base64 – encodes data for download
scikit-learn – performs hyperparameter optimization and builds the machine learning model
1.2. User interface
The web application has a simple interface consisting of two panels: (1) the left sidebar panel accepts the input CSV data and the parameter settings, while (2) the main panel displays the output, consisting of a printout of the input dataset's data frame, the model performance metrics, the best parameters from hyperparameter tuning, and a 3D contour plot of the tuned hyperparameters.
1.3. Introduction to AutoML
Let’s take a look at the web app via the two screenshots below so you can get a feel for the app you’ll be building.
1.3.1. An AutoML application using an example data set
The easiest way to try out the web application is to use the provided example dataset: click the Press to use Example Dataset button on the main panel, which loads the Diabetes dataset as the example file.
1.3.2. An AutoML application using uploaded CSV data
Alternatively, you can upload your own CSV dataset by either dragging and dropping the file directly into the upload box (as shown in the screenshot below) or by clicking the Browse files button and selecting the input file to upload.
In both of the above screenshots, once either the example file or an uploaded CSV dataset is provided, the application prints the dataset's data frame, automatically builds multiple machine learning models using the supplied input training parameters to perform hyperparameter optimization, and then prints the model performance metrics. Finally, an interactive 3D contour plot of the tuned hyperparameters is shown at the bottom of the main panel.
You can also test run the application by clicking the following link:
Let’s now dive into the internal workings of the AutoML application. As you can see, the entire application uses only 171 lines of code.
Note that all comments in the code (marked on lines containing the hash symbol #) are used to improve the readability of the code by documenting what each block of code does.
Import the necessary libraries. The st.set_page_config() function allows us to specify the title of the web page via the page_title='The Machine Learning Hyperparameter Optimization App' input argument, as well as set the page layout to full-width mode via the layout='wide' input argument.
Here we use the st.write() function together with Markdown syntax: on line 20 we write the title text of the web page by placing the # tag in front of the title text The Machine Learning Hyperparameter Optimization App. In the following lines, we write a description of the web application.
These blocks of code pertain to the left sidebar panel's input widgets, which accept the user-uploaded CSV data and the model parameters.
- Lines 29–33 – Line 29 prints the title text for the left sidebar panel via the st.sidebar.header() function, where sidebar in the function name dictates that the input widget be placed in the left sidebar. Line 30 accepts the user-uploaded CSV data via the st.sidebar.file_uploader() function. As we can see, there are two input arguments: the first is the text label Upload your input CSV file, while the second, type=["csv"], restricts acceptance to CSV files only. Lines 31–33 print a link to the example dataset in Markdown syntax.
- Line 36 – Prints the title text
- Line 37 – Displays the slider bar via the st.sidebar.slider() function, which allows the user to set the data split ratio simply by adjusting the slider. The first input argument prints the widget label text Data split ratio (% for Training Set), while the next four values represent the minimum value, the maximum value, the default value, and the step size. Finally, the selected value is assigned to the split_size variable.
- Lines 39–47 show the input widgets for the learning parameters, while lines 49–54 show the input widgets for the general parameters. As explained for line 37, these lines of code also use the st.sidebar.slider() function as the input widget for accepting user-defined values for the model parameters. Lines 56–58 combine the user-defined values from the slider inputs into a merged format that then serves as the input to the GridSearchCV() function responsible for hyperparameter tuning.
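As a rough sketch of what the slider-to-grid step on lines 56–58 might look like: the hard-coded tuples below are hypothetical stand-ins for the values the user would pick on the sidebar sliders.

```python
import numpy as np

# Hypothetical slider values standing in for the sidebar widgets
parameter_n_estimators = (10, 50)   # (min, max) from the range slider
parameter_n_estimators_step = 10    # step-size slider
parameter_max_features = (1, 3)     # (min, max) from the range slider

# Combine the user-defined slider values into value ranges for GridSearchCV
n_estimators_range = np.arange(
    parameter_n_estimators[0],
    parameter_n_estimators[1] + parameter_n_estimators_step,
    parameter_n_estimators_step,
)
max_features_range = np.arange(
    parameter_max_features[0], parameter_max_features[1] + 1, 1
)

param_grid = dict(
    max_features=max_features_range, n_estimators=n_estimators_range
)
```

The resulting param_grid dictionary is exactly the shape GridSearchCV() expects for its param_grid argument.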
The subheader text Dataset is added above the input data frame.
This block of code defines a custom function that uses the base64 library to encode the model performance results as a downloadable CSV file.
At a high level, this block of code defines build_model(), a custom function that takes the input data and performs model building and hyperparameter tuning using the user-defined parameters.
- Lines 76–77 – The input data frame is separated into the X (all but the last column, which is the Y variable) and Y (specifically the last column) variables.
- Lines 79–80 – Line 79 informs the user via the st.markdown() function that the model is being built. Then, on line 80, the column name of the Y variable is printed inside a box.
- Line 83 – Data splitting is performed using the X and Y variables as the input data, while the user-defined value for the split ratio is given by the split_size variable, which takes its value from the slider described for line 37.
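The split on line 83 presumably relies on scikit-learn's train_test_split; a sketch with toy data, where split_size is hard-coded in place of the slider value:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the X and Y variables from the uploaded data frame
X = np.arange(20).reshape(10, 2)
Y = np.arange(10)

split_size = 80  # % for Training Set, as chosen on the sidebar slider
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=(100 - split_size) / 100, random_state=42
)
```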
- Lines 87–95 – Instantiates the random forest model via the RandomForestRegressor() function and assigns it to the rf variable. As you can see, all of the model parameters defined in the RandomForestRegressor() function take their values from the user-defined input widgets described above for lines 29–58.
- Lines 97-98 – Performs hyperparameter tuning.
→ Line 97 – The random forest model stored in the rf variable is specified as an input argument to the GridSearchCV() function, which performs the hyperparameter tuning. The value range of the hyperparameters to be explored during tuning is defined by the param_grid variable, which in turn takes its values directly from the user-defined slider values (lines 40–43) preprocessed into the param_grid variable (lines 56–58).
→ Line 98 – The hyperparameter tuning process begins by fitting on the input training data.
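Lines 97–98 can be sketched as follows, using a small synthetic dataset in place of the training split (a sketch, not the article's exact code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem standing in for the training split
rng = np.random.RandomState(42)
X_train = rng.rand(40, 3)
Y_train = X_train @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.randn(40)

# Hyperparameter value ranges, as assembled from the sliders (lines 56-58)
param_grid = dict(max_features=[1, 2, 3], n_estimators=[10, 20])

# Line 97: wrap the rf model in the grid search
grid = GridSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_grid=param_grid,
    cv=3,
)
# Line 98: start the tuning by fitting on the training data
grid.fit(X_train, Y_train)
```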
- Line 100 – Prints the Model Performance subheader text via the st.subheader() function. The following lines then print the model’s performance metrics.
- Line 102 – The best model from the hyperparameter tuning process, stored in the grid variable, is used to make predictions.
- Lines 103–104 – Prints the R2 score, using Y_pred_test as an input argument.
- Lines 106–107 – Prints the MSE score, using Y_pred_test as an input argument.
- Lines 109–110 – Prints the best parameters, rounded to two decimal places. The best parameter values are obtained from the tuned model stored in the grid variable.
- Lines 112–113 – Line 112 prints the subheader via the st.subheader() function. Line 113 prints the model parameters stored in the model.
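The R2 and MSE printouts on lines 103–107 presumably rely on scikit-learn's metrics module; a sketch with hypothetical ground-truth values and predictions:

```python
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical ground-truth values and tuned-model predictions
Y_test = [3.0, -0.5, 2.0, 7.0]
Y_pred_test = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(Y_test, Y_pred_test)
mse = mean_squared_error(Y_test, Y_pred_test)
```

In the app, both values would then be displayed with st.info() or st.write()-style calls rather than printed to the terminal.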
- Lines 116–125 – The model performance results are prepared for plotting.
→ Line 116 – We selectively extract data from grid.cv_results_ to create a data frame containing the combinations of the two hyperparameters and their corresponding performance metric, which in this case is the R2 score. Specifically, the pd.concat() function is used to combine the two hyperparameters (params) and the performance metric.
→ Line 118 – Data reshaping is now performed to prepare the data in a suitable format for creating the contour plot. Specifically, the pandas library is used to group the data frame by the second column (n_estimators), whereby the contents of the first column (max_features) are merged.
- Lines 120–122 – The data is now reshaped into an m × n matrix so that the rows and columns correspond to the two hyperparameters.
- Lines 123–125 – Finally, the reshaped data is assigned to the z variable, which is then used to build the contour plot.
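Lines 116–125 can be sketched as follows; the mean_test_score key and the groupby/pivot steps are assumptions based on GridSearchCV's standard cv_results_ layout:

```python
import pandas as pd

# Stand-in for grid.cv_results_: two hyperparameters plus an R2 score each
cv_results = {
    'params': [
        {'max_features': 1, 'n_estimators': 10},
        {'max_features': 1, 'n_estimators': 20},
        {'max_features': 2, 'n_estimators': 10},
        {'max_features': 2, 'n_estimators': 20},
    ],
    'mean_test_score': [0.70, 0.75, 0.80, 0.85],
}

# Combine hyperparameter combinations with their R2 scores (cf. line 116)
grid_results = pd.concat(
    [pd.DataFrame(cv_results['params']),
     pd.DataFrame(cv_results['mean_test_score'], columns=['R2'])],
    axis=1,
)

# Group by the two hyperparameters, then pivot into an m x n matrix whose
# rows/columns correspond to max_features/n_estimators (cf. lines 118-122)
grid_pivot = grid_results.groupby(['max_features', 'n_estimators']).mean().reset_index()
grid_pivot = grid_pivot.pivot(index='max_features', columns='n_estimators', values='R2')

# cf. lines 123-125: the axes and the z matrix for the contour plot
x = grid_pivot.columns.values  # n_estimators
y = grid_pivot.index.values    # max_features
z = grid_pivot.values          # R2 surface
```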
- Lines 128–146 – These blocks of code create the interactive 3D contour plot.
- Lines 149-152 –
zthe variables are then combined into a
- Line 153 – The model performance results stored in the grid_results variable are made available for download via the filedownload() custom function (lines 69–73).
At a high level, these blocks of code implement the logic of the application, which consists of two code blocks: the if code block (lines 156–159) and the else code block (lines 160–171). Whenever the web application loads, it runs the else code block by default, while the if code block is activated when an input CSV file is uploaded.
For both code blocks, the logic is the same; the only difference is the contents of the df data frame (whether it comes from the uploaded CSV data or from the example data). Next, the contents of the df data frame are displayed via the st.write() function. Finally, the model building process is started via the build_model() custom function.
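The if/else logic can be sketched in plain Python, with the Streamlit calls elided as comments; the use of load_diabetes and the response column name 'Y' are assumptions mirroring the Diabetes example-dataset behavior described earlier:

```python
import pandas as pd
from sklearn.datasets import load_diabetes

def load_dataframe(uploaded_file=None):
    """Return the user's CSV as a data frame, or fall back to the
    Diabetes example dataset (cf. the else block, lines 160-171)."""
    if uploaded_file is not None:
        # if block (lines 156-159): an input CSV file was uploaded
        df = pd.read_csv(uploaded_file)
    else:
        # else block: load the example Diabetes dataset by default
        diabetes = load_diabetes()
        X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
        Y = pd.Series(diabetes.target, name='Y')
        df = pd.concat([X, Y], axis=1)
    # In the app: st.write(df) to display it, then build_model(df)
    return df
```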
Now that we’ve coded the app, let’s move on to launching it.
3.1. Create a conda environment
Let’s start by creating a new conda environment (to ensure code reproducibility).
First, create a new conda environment called automl as follows on the terminal command line:
conda create -n automl python=3.7.9
Second, we activate the automl environment:
conda activate automl
3.2. Install the required libraries
Next, install the required libraries as shown below:
pip install -r requirements.txt
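The article doesn't list the contents of requirements.txt; based on the libraries named in section 1.1 (plus Plotly for the 3D plot, which is an assumption), it would contain something like:

```text
streamlit
pandas
numpy
scikit-learn
plotly
```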
3.3. Download application files
You can either download the web application files hosted in the Data Professor GitHub repo, or use the 171 lines of code above.
Next, extract the contents of the file into the main directory. Now that you’re inside the main directory, you should be able to see the ml-opt-app.py file.
3.4. Launch the web application
Launch the application by typing the following command in the terminal (making sure that the ml-opt-app.py file is in the current working directory):
streamlit run ml-opt-app.py
After a few seconds, the following message should appear in the terminal:

> streamlit run ml-opt-app.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://10.0.0.11:8501
Finally, the browser should open and the application will appear.
You can also test AutoML from the following link:
Now that you have created the AutoML application described in this article, what’s next? You could modify the application to use another machine learning algorithm. Additional features, such as a feature importance plot, could also be added to the application. The possibilities are endless, and customizing the app is fun! Drop a comment about how you have customized the app for your own projects.