Companies are very interested in clearly communicating ML-based predictive analytics to their customers. No matter how accurate a model is, customers want to know how machine learning models make predictions from data. For example, if a subscription-based company wants to identify customers at high risk of canceling, it can use its historical customer data to predict the likelihood of someone leaving.

That's where they want to analyze the factors that drive this event. By understanding these driving factors, they can act, for example through targeted offers or discounts, to prevent a customer from leaving. Using machine learning models in decision making is difficult without understanding the factors that affect an outcome.

A common way for companies to communicate data insights and the results of machine learning models is through analytics dashboards. Tools like Tableau, Alteryx, or even a custom tool built with web frameworks like Django or Flask make it easy to create these dashboards.

In practice, however, creating these types of dashboards is often expensive and time consuming. A good alternative to the more traditional approaches is Streamlit. Streamlit is a Python-based library that allows you to easily create free machine learning web applications. You can easily read in and interact with a saved model through an intuitive, user-friendly interface. It lets you display descriptive text and model outputs, visualize data and model performance, modify model inputs through the UI using sidebars, and more.

All in all, Streamlit is an easy-to-learn framework that allows data science teams to create free predictive analytics web applications in as little as a few hours. The Streamlit Gallery showcases many open source projects that have used it for analytics and machine learning. You can also find the Streamlit documentation here.

Thanks to its ease of use and versatility, you can use Streamlit to communicate a wide variety of data insights. This includes findings from exploratory data analysis (EDA), results of supervised learning models such as classification and regression, and even insights from unsupervised learning models. For our purposes, we will consider a classification task: predicting whether or not a customer will stop making purchases with a company. We will use fictitious Telco churn data for this project.

The work in this post was inspired by the Streamlit tutorial on the Data Professor YouTube channel. The tutorial can be found here. A Medium article version of the guide can also be found here.

Building and saving a classification model

We start by building and saving a simple churn classification model using random forests. To get started, create a folder in the terminal with the following command:

mkdir my_churn_app

Next, change directories to a new folder:

cd my_churn_app

Now use a text editor to create a new Python script called churn-model.py. Here I use the vi text editor:

vi churn-model.py

Now, let's import a few packages. We will work with Pandas, the RandomForestClassifier from Scikit-learn, and Pickle:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import pickle

Now let’s relax the display constraints on Pandas data frame rows and columns, then read and display the data:

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df_churn = pd.read_csv('telco_churn.csv')
print(df_churn.head())

Let's filter our data frame to include only the gender, PaymentMethod, MonthlyCharges, tenure, and Churn columns. The first four of these columns will be the inputs to our classification model, and our output is Churn:

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df_churn = pd.read_csv('telco_churn.csv')
df_churn = df_churn[['gender', 'PaymentMethod', 'MonthlyCharges', 'tenure', 'Churn']].copy()
print(df_churn.head())

Next, save a copy of our data frame to a new variable called df and replace missing values with zero:

df = df_churn.copy()
df.fillna(0, inplace=True)

Next, let's create machine-readable dummy variables for our categorical columns, gender and PaymentMethod. For example, get_dummies turns the gender column with values Male and Female into binary gender_Male and gender_Female indicator columns:

encode = ['gender', 'PaymentMethod']
for col in encode:
    dummy = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, dummy], axis=1)
    del df[col]

Next, let's map the values in the Churn column to binary values. We map a Churn value of Yes to one and No to zero:

import numpy as np
df['Churn'] = np.where(df['Churn']=='Yes', 1, 0)

Now, let's define our input and output:

X = df.drop('Churn', axis=1)
Y = df['Churn']

We then define a RandomForestClassifier instance and fit our model to the data:

clf = RandomForestClassifier()
clf.fit(X, Y)

Finally, we can save our model to a Pickle file:

pickle.dump(clf, open('churn_clf.pkl', 'wb'))

Now run the Python script in the terminal with the following command:

python churn-model.py

This should create a file called churn_clf.pkl in our folder. This is our saved model.
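As an optional sanity check (not part of the original tutorial), you can reload the pickled model at the end of the same script and confirm it produces predictions on the training features:

loaded_clf = pickle.load(open('churn_clf.pkl', 'rb'))  # reload the saved model
print(loaded_clf.predict(X.head()))  # should print an array of 0s and 1s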

Next, install Streamlit on the terminal with the following command:

pip install streamlit

Next, create a new Python script called churn-app.py. We will use this file to run our Streamlit application:

vi churn-app.py

Now, let's import a few more libraries. We will use Streamlit, Pandas, NumPy, Pickle, Base64, Seaborn, and Matplotlib:

import streamlit as st
import pandas as pd
import numpy as np
import pickle
import base64
import seaborn as sns
import matplotlib.pyplot as plt

Displaying text

The first thing we will walk through is how to add text to our application. We do this using the write method of the Streamlit object. Let's create an app title called Churn Prediction App:
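Based on the fuller snippet shown later in this post, a minimal version of that title looks like this:

st.write("""
# Churn Prediction App
""")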

We can run our applications locally with the following command:

streamlit run churn-app.py

We should see this:

From the drop-down menu in the top right corner of our app, we can change the theme from dark to light:

Now our app should look like this:

Finally, add a little more descriptive text to the UI and run the application again:

st.write("""
# Churn Prediction App

Customer churn is defined as the loss of customers after a certain period of time. Companies are interested in targeting customers who are likely to churn. They can target these customers with special deals and promotions to influence them to stay with the company.

This app predicts the probability of a customer churning using Telco Customer data. Here, customer churn means the customer does not make another purchase after a period of time.
""")

Allow users to download data

The next thing we can do is customize our app so that users can download the data that trained the model. This is useful for performing any analyses that the application does not support. To do this, we first read in our data:

df_selected = pd.read_csv("telco_churn.csv")
df_selected_all = df_selected[['gender', 'Partner', 'Dependents', 'PhoneService', 'tenure', 'MonthlyCharges', 'Churn']].copy()

The following function allows users to download the data we read in:

def filedownload(df):
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()  # strings <-> bytes conversions
    href = f'<a href="data:file/csv;base64,{b64}" download="churn_data.csv">Download CSV File</a>'
    return href

Next, let's set the showPyplotGlobalUse deprecation warning option to False and display the download link:

st.set_option('deprecation.showPyplotGlobalUse', False)
st.markdown(filedownload(df_selected_all), unsafe_allow_html=True)

And when we run our application again, we should see the following:

Numerical input sliders and categorical input select boxes

Another useful thing we can do is create input sidebars that let users change the input values and see how that affects the likelihood of churn. To do this, let's define a function called user_input_features:

def user_input_features():    pass

Next, let's create sidebar inputs for the categorical columns gender and PaymentMethod.

For the categorical columns, we call the selectbox method on the sidebar object. The first argument of the selectbox method is the name of the categorical column:

def user_input_features():
    gender = st.sidebar.selectbox('gender', ('Male', 'Female'))
    PaymentMethod = st.sidebar.selectbox('PaymentMethod', ('Bank transfer (automatic)', 'Credit card (automatic)', 'Mailed check', 'Electronic check'))
    data = {'gender': [gender], 'PaymentMethod': [PaymentMethod]}
    features = pd.DataFrame(data)
    return features

Let's call our function and store the return value in a variable called input_df:

input_df = user_input_features()

Let's run our application now. We should see drop-down menus for gender and PaymentMethod:

This technique is powerful because users can select different payment methods and see how much more likely a customer is to churn depending on the method. For example, if bank transfers correspond to a higher churn probability, a company can create targeted messaging for those customers encouraging them to change their payment method. They might also decide to provide some form of financial incentive to change payment methods. The point is that these types of insights can drive decision making at companies, allowing them to better retain customers.

We can also add sliders for MonthlyCharges and tenure:

def user_input_features():
    gender = st.sidebar.selectbox('gender', ('Male', 'Female'))
    PaymentMethod = st.sidebar.selectbox('PaymentMethod', ('Bank transfer (automatic)', 'Credit card (automatic)', 'Mailed check', 'Electronic check'))
    MonthlyCharges = st.sidebar.slider('Monthly Charges', 18.0, 118.0, 18.0)
    tenure = st.sidebar.slider('tenure', 0.0, 72.0, 0.0)
    data = {'gender': [gender], 'PaymentMethod': [PaymentMethod], 'MonthlyCharges': [MonthlyCharges], 'tenure': [tenure]}
    features = pd.DataFrame(data)
    return features

input_df = user_input_features()

The next thing we can do is display the output of our model. To do this, we first need to set default inputs for when the user does not select anything. We can wrap our user input function in an if/else statement that uses the default inputs if the user does not specify any. Here, we also allow the user to upload a CSV file containing input values using the sidebar method file_uploader():

uploaded_file = st.sidebar.file_uploader("Upload your input CSV file", type=["csv"])
if uploaded_file is not None:
    input_df = pd.read_csv(uploaded_file)
else:
    def user_input_features():
        ...  # truncated code from above
        return features
    input_df = user_input_features()

Next, we need to display the output of our model, starting with the default parameters. We read in our data:

churn_raw = pd.read_csv('telco_churn.csv')
churn_raw.fillna(0, inplace=True)
churn = churn_raw.drop(columns=['Churn'])
df = pd.concat([input_df, churn], axis=0)

Encode our features:

encode = ['gender', 'PaymentMethod']
for col in encode:
    dummy = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, dummy], axis=1)
    del df[col]
df = df[:1]  # selects only the first row (the user input data)
df.fillna(0, inplace=True)

Select the features we want to display:

features = ['MonthlyCharges', 'tenure', 'gender_Female', 'gender_Male',
            'PaymentMethod_Bank transfer (automatic)',
            'PaymentMethod_Credit card (automatic)',
            'PaymentMethod_Electronic check', 'PaymentMethod_Mailed check']
df = df[features]

Finally, we display the default inputs using the write method:

# Displays the user input features
st.subheader('User Input features')
print(df.columns)
if uploaded_file is not None:
    st.write(df)
else:
    st.write('Awaiting CSV file to be uploaded. Currently using example input parameters (shown below).')
    st.write(df)

Now we can generate predictions and display them using either the default or the user-provided inputs. First, we need to load the saved model from the Pickle file:

load_clf = pickle.load(open('churn_clf.pkl', 'rb'))

Next, generate binary predictions and prediction probabilities:

prediction = load_clf.predict(df)
prediction_proba = load_clf.predict_proba(df)

And write the result:

churn_labels = np.array(['No', 'Yes'])
st.write(churn_labels[prediction])
st.subheader('Prediction Probability')
st.write(prediction_proba)

We see that new male customers with a monthly charge of $18 who use bank transfer as their payment method have a 97 percent probability of staying with the company. We have now built our application. The next thing we will do is deploy it to a live website using Heroku.
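If you would rather surface the churn probability as a single percentage instead of the raw probability array, a small, hypothetical addition like the following would work (churn_probability is a name introduced here, not part of the original app):

churn_probability = prediction_proba[0][1]  # probability of the 'Yes' (churn) class
st.write(f'Probability of churn: {churn_probability:.0%}')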

Deploying the application

Deploying web applications is another often time consuming and expensive step in the ML pipeline. Heroku makes it quick and easy to deploy web applications. First, we need to add a few extra files to the application folder: a setup.sh file and a Procfile. Streamlit and Heroku use these files to configure the environment before running the application. In the terminal, create a new file named setup.sh in the application folder:

vi setup.sh

Copy and paste the following into the file:

mkdir -p ~/.streamlit/
echo "[server]
port = $PORT
enableCORS = false
headless = true
" > ~/.streamlit/config.toml
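For reference, when this script runs on Heroku, ~/.streamlit/config.toml should end up containing something like the following (the port value below is only an example; Heroku substitutes its own via $PORT):

[server]
port = 33507
enableCORS = false
headless = true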

Save and exit the file. The next file we need to create is the Procfile:

vi Procfile

Copy and paste the following into the file:

web: sh setup.sh && streamlit run churn-app.py

Finally, we need to create a requirements.txt file, where we list the packages our app uses along with their versions:

streamlit==0.76.0
numpy==1.20.2
scikit-learn==0.23.1
matplotlib==3.1.0
seaborn==0.10.0

To check your installed package versions, run the following in the terminal:

pip freeze

We are now ready to deploy our application. Follow these steps to install:

  1. To get started, log in to your GitHub account if you have one. If not, create a GitHub account first.
  2. In the left panel, click the green New button next to where it says Repositories.
  3. Create a name for the repository; {yourname}-churn-app should be fine. For me, it would be sadrach-churn-app.
  4. Click the uploading an existing file link and click Choose Files.
  5. Add all of the files in the codecrew_churn_app-main folder to the repo and click Commit.
  6. Go to Heroku.com and create an account.
  7. Log in to your account.
  8. Click the New button in the upper right corner and click Create new app.
  9. You can name the app anything you want. I named my app as follows: {name}-churn-app, that is, sadrach-churn-app. Click Create app.
  10. Under Deployment method, select GitHub.
  11. Connect to the GitHub repo.
  12. Enter the name of your repo, click Search, and then click Connect.
  13. Scroll down to Manual deploy and click Deploy Branch.
  14. Wait a few minutes and your app should be live!

You can find a live version of the churn app here and the GitHub repository here.

Conclusions

Streamlit is a powerful library that makes deploying machine learning and data applications quick and easy. It allows developers to create intuitive interfaces for machine learning models and data analytics. For machine learning model predictions, this means greater model explainability and transparency, which can help companies make decisions. A well-known problem companies face with many machine learning models is that, regardless of accuracy, there needs to be an intuitive explanation of which factors drive events.

Streamlit offers many possibilities for model explainability and interpretation. Its sidebar objects allow developers to build easy-to-use sliders that let users modify numerical input values, and its selectbox method lets users see how changing categorical values affects event predictions. Its file download method allows users to download the data as a CSV file and analyze the model's predictions later.

Although our application focused on a churn classification model, Streamlit can be used for other types of machine learning models, both supervised and unsupervised. For example, building a similar web application for a regression model, such as one forecasting housing prices, would be relatively straightforward. In addition, you can use Streamlit to develop an interface for an unsupervised learning tool that uses methods such as K-means or hierarchical clustering. Finally, Streamlit is not limited to machine learning: you can use it for any data analysis task, such as data visualization and exploration.

Beyond straightforward user interface development, using Streamlit and Heroku removes much of the effort of deploying web applications. As we saw in this article, we can deploy a live machine learning web application in a few hours, compared to the months a traditional approach can take.

If you are interested in learning the basics of Python programming, data manipulation with Pandas, and machine learning in Python, check out Python for Data Science and Machine Learning: Python Programming, Pandas and Scikit-Learn Guides for Beginners. I hope you found this post useful/interesting.

This post was originally published on the BuiltIn blog. The original article can be found here.
