Gain a deep understanding of logistical and softmax regression by implementing them from scratch in the same style as Scikit-Learn

Cover Image – by Luke Newman

Hey everybody! Here is another article that implements machine learning algorithms from scratch. But wait, this is different!

In this ML From Scratch series, we create a library of machine learning algorithms similar to Scikit-Learn using object-oriented programming. This means you can install the library and play all the available templates in a style you are familiar with.

If you’ve never implemented any learning algorithms from scratch, I’ll give you three good reasons to follow and get started now:

  1. You will learn in detail the specific nuances of each model, which will deeply increase your understanding.
  2. You level your coding skills by documenting the code and creating test cases to ensure that all functions work as expected.
  3. You will develop a better understanding of object-oriented programming to help you improve your machine learning project workflow.

You can view the Github repo visit here.

This project is intended to increase understanding of learning algorithms and is not intended to be used in actual data science work.


Use classes and functions for testing create a virtual environment and install the project.

    $ pip install mlscratch==0.1.0

To download all the source code from the local repository, create a virtual environment and run the following commands on your terminal.

    $ git clone
$ cd mlscratch
$ python install

Logistic regression is a general regression algorithm used in classification. It estimates the probability that an instance belongs to a particular category.

If the estimated probability is greater than or equal to 50%, the model predicts the occurrence to be in the positive category. If the probability is less than 50%, the model predicts a negative class.

Probability assessment

Logistic regression works just like a linear regression model. It calculates the weighted sum of the property group (plus the pre-value term), but instead of printing directly like a linear regression model, it produces logistic of this result.

Equation 1. Estimated probability of logistic regression

Logistics – noticed σ

is a sigmoid function that converts its product to a number between 0 and 1.

Equation 2. Logistic function

Once the model has estimated the probability that an instance falls into a positive category, it simply makes its prediction based on the following rules.

Equation 3. Prediction of a logistic regression model

For implementation, we need to create a sigmoid function that can transform our input into probabilities. Let’s see how this is done.

Module 1.

Here we did a class and gave it one method. Suppose that this particular __call __ method behaves as a function when called. We will be using this feature soon when creating the Logistic Regression class.

Training and publishing

Now that we know all about how logistic regression estimates probabilities and makes predictions, let’s look at how it is trained. The aim of the training is to find the parameters Θ so that the model predicts high probabilities for positives and low probabilities for negatives. The publishing function captures this idealog loss


Equation 4. Logistic regression cost function (log loss) Because we use Gradient Descent in our implementation, we need to know the partial derivatives of the cost function with respect to the model parametersΘ


Equation 5. Partial derivatives of aid loss

This equation is very similar to the partial derivatives of the cost function of linear regression. For each instance, it calculates a prediction error and multiplies it by the instance property value, and then calculates the average of all occurrences.

When logistic regression is implemented by batch gradient descent, we do not want to average all occurrences. We want a gradient vector that contains all the partial derivatives so that we can update the parameters in the opposite direction to what they show.

Let’s go to implementation. Read the instructions to understand what each code does.

Module 2.

That’s where we go! We have a fully functional logistic regression model capable of performing binary classification. Write a test case to make sure it works correctly.

Test method – binary classification of iris

In this test case, we use the famous iris data set and convert it to a binary classification problem, perform a train / test division, perform some preprocessing, and then school our logistic regression model and test over 80% accuracy in a test series.
Module 3.

Figure 1. The test passed 0.92 s

Now, if we ever make changes to our code for any reason, we always have a test case to make sure our changes didn’t break the pattern. Trust me, this will save you a lot of headaches later, especially as our code base grows. The logistic regression model we implemented only supports binary classification, but it can be generalized to support multiple categories. This is calledSoftmax regression

. The idea is simple: The Softmax regression model calculates a score for each instance for each class and then estimates the probability that the instance belongs to each class by applying softmax function


Equation 6. Softmax function (not normalized)

  • In this equation:
  • K is the number of categories.

s (x) is a vector containing the points of each class for example x.

Just like the logistic regression classifier, the Softmax regression classifier predicts the class with the highest estimated probability.

Practical issues: Numerical stability Implementing the Softmax function from scratch is a bit tricky. When you divide exponents that can be very large, you run into a questionnumerical stability

. To avoid this, we use the normalization trick. Note that multiplying the top and bottom of a fraction by a constant C and pushing it into the sum yields the following corresponding expression:

Equation 7. Softmax function (normalized)

We are free to choose the value of C, but the general choice is to set log (C) equal to the maximum negative of Example x. This shifts the values ​​so that the maximum value is zero. Let’s look at this in the code:

Module 4.

Here, the softmax class was added to the same module as our sigmoid class using the __call__ method, so our class behaves as a function when called.

Training and publishing Let us now consider the training of the Softmax regression model and its cost function. The idea is the same as logistic regression. We want a model that predicts high probabilities for the target category and low probabilities for other categories. The publishing function captures this ideacross entropy


Equation 8. Crossed entropy cost function

  • In this equation:
  • y is the target probability that the instance belongs to the target category. In our case, this value is always equal to 1 or 0, depending on whether the instance belongs to a class or not.

In the case of two classes, the cross-entropy corresponds to the log loss. Because we implement Softmax regression with Gradient Descent, we need the gradient vector of this cost function.Θ


Equation 9. Cross entropy gradient vector

When calculating gradients, we need our target class (y) to have the same dimension as the estimated probabilities (p). Our estimated probabilities are in the form (n_samples, n_categories) because for each occurrence we have a probability associated with that instance that belongs to each class.[1, 2, 0]For example, if we have three classes labeled 0, 1, and 2, we need to convert a target vector that contains (

([[0, 1, 0],
[0, 0, 1],
[1, 0, 0]])

) into a table that looks like this: The first column represents class 0, the second column represents class 1, and the third column represents class 2. You may have noticed that we can achieve this by doing One hot coding

target vector (y). Let’s look at this in the code:

Module 5.

Now we have all the help we need to complete our Softmax regression model. The code is well documented, so read the explanations of what each part does.

Module 6.

And that’s it! We have successfully added Softmax regression to the machine learning library. Let’s take a quick look at a test case to make sure it’s working properly.

Test case – Iris multi-category classification

We use iris data, but this time try to classify all three categories with an accuracy of at least 85 percent. First, the data set is loaded, the division is practiced / tested, standardization is performed, and then our Softmax regression model is trained and the test set is predicted.
Module 7.

Figure 2. Test passed in 1.83 seconds

We are ready! Our test case passed. We successfully added Softmax Regression to the machine learning library.

  • This concludes this part of the ML From Scratch series. We now have a machine learning library that includes the most common regression and classification models, including:
  • Linear regression
  • Polynomial regression
  • Ridge
  • Lasso
  • Elastic mesh
  • Logistic regression

Softmax regression

As always, thanks for reading and your feedback is greatly appreciated!


Please enter your comment!
Please enter your name here