## Gain a deep understanding of logistic and softmax regression by implementing them from scratch in the same style as Scikit-Learn

Hey everybody! Here is another article that implements machine learning algorithms from scratch. But wait, this one is different!

In this ML From Scratch series, we build a library of machine learning algorithms, similar to Scikit-Learn, using object-oriented programming. This means you can install the library and use all the available models in a style you are already familiar with.

If you’ve never implemented any learning algorithms from scratch, I’ll give you three good reasons to follow along and get started now:

1. You will learn in detail the specific nuances of each model, which will deeply increase your understanding.
2. You level up your coding skills by documenting your code and creating test cases to ensure that all functions work as expected.
3. You will develop a better understanding of object-oriented programming to help you improve your machine learning project workflow.

You can view the GitHub repo here.

This project is intended to increase understanding of learning algorithms and is not intended to be used in actual data science work.

## Installation

To use the classes and functions for testing, create a virtual environment and install the project:

`$ pip install mlscratch==0.1.0`

To download all of the source code and install it from a local repository, create a virtual environment and run the following commands in your terminal:

`$ git clone https://github.com/lukenew2/mlscratch`

`$ cd mlscratch`

`$ python setup.py install`

## Logistic regression

Logistic regression is a regression algorithm commonly used for classification. It estimates the probability that an instance belongs to a particular class.

If the estimated probability is greater than or equal to 50%, the model predicts that the instance belongs to the positive class; if it is less than 50%, the model predicts the negative class.

## Estimating probabilities

A logistic regression model works much like a linear regression model: it computes a weighted sum of the input features (plus a bias term), but instead of outputting the result directly the way a linear regression model does, it outputs the logistic of the result.

The logistic, noted σ(·), is a sigmoid function that outputs a number between 0 and 1.

Equation 2. Logistic function: σ(t) = 1 / (1 + e⁻ᵗ)

Equation 3. Logistic regression model prediction: ŷ = 0 if p̂ < 0.5, ŷ = 1 if p̂ ≥ 0.5, where p̂ = σ(xᵀθ)

Module 1. activations.py
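The code block for this module did not survive extraction. Based on the description below, a minimal sketch might look like this (the class name `Sigmoid` and its placement are assumptions):

```python
import numpy as np

class Sigmoid:
    """Logistic (sigmoid) activation: sigma(t) = 1 / (1 + exp(-t))."""

    def __call__(self, x):
        # Implementing __call__ lets an instance be used like a function,
        # e.g. sigmoid = Sigmoid(); sigmoid(0.0) -> 0.5
        return 1 / (1 + np.exp(-x))
```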

Here we define a class and give it a single method. Implementing the special `__call__` method makes an instance of the class behave like a function when called. We will use this feature again shortly when creating the LogisticRegression class.

## Training and cost function

Now that we know how logistic regression estimates probabilities and makes predictions, let’s look at how it is trained. The objective of training is to find the parameter vector θ so that the model estimates high probabilities for positive instances and low probabilities for negative instances. The cost function that captures this idea is the log loss.

Equation 4. Logistic regression cost function (log loss): J(θ) = −(1/m) Σᵢ [y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − p̂⁽ⁱ⁾)]

Because we use gradient descent in our implementation, we need to know the partial derivatives of the cost function with respect to the model parameters θ.

Equation 5. Partial derivatives of the log loss: ∂J(θ)/∂θⱼ = (1/m) Σᵢ (σ(θᵀx⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

This equation looks very similar to the partial derivatives of the linear regression cost function: for each instance it computes the prediction error, multiplies it by the instance’s j-th feature value, and then averages over all training instances.

Because we implement logistic regression with batch gradient descent, we don’t compute these partial derivatives one at a time; we want the gradient vector containing all of them, so that we can update the parameters in the direction opposite to the gradient.

Module 2. logistic.py
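The module’s code block was lost in extraction. A minimal sketch of a logistic regression classifier trained with batch gradient descent might look like this (the class name, hyperparameter names, and defaults are assumptions, not the library’s exact API):

```python
import numpy as np

class LogisticRegression:
    """Binary classifier trained with batch gradient descent (sketch)."""

    def __init__(self, learning_rate=0.1, n_iter=1000):
        self.learning_rate = learning_rate
        self.n_iter = n_iter

    def fit(self, X, y):
        # Add a column of ones so the bias term is part of theta.
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        m = X_b.shape[0]
        self.theta_ = np.zeros(X_b.shape[1])
        sigmoid = lambda t: 1 / (1 + np.exp(-t))
        for _ in range(self.n_iter):
            # Gradient of the log loss: (1/m) * X^T (sigma(X theta) - y)
            gradients = X_b.T @ (sigmoid(X_b @ self.theta_) - y) / m
            self.theta_ -= self.learning_rate * gradients
        return self

    def predict_proba(self, X):
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        return 1 / (1 + np.exp(-(X_b @ self.theta_)))

    def predict(self, X):
        # Threshold the estimated probability at 0.5 (Equation 3).
        return (self.predict_proba(X) >= 0.5).astype(int)
```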

There we go! We have a fully functional logistic regression model capable of performing binary classification. Let’s write a test case to make sure it works correctly.

Test case – binary classification on the iris dataset

Figure 1. Test passed in 0.92 s

Now, if we ever change our code for any reason, we always have a test case to make sure our changes didn’t break the model. Trust me, this will save you a lot of headaches later, especially as our code base grows.

## Softmax regression

The logistic regression model we implemented only supports binary classification, but it can be generalized to support multiple classes. This is called Softmax regression. The idea is simple: the Softmax regression model computes a score for each class for a given instance, then estimates the probability that the instance belongs to each class by applying the softmax function to the scores.

Equation 6. Softmax function (not normalized): p̂ₖ = exp(sₖ(x)) / Σⱼ₌₁ᴷ exp(sⱼ(x))

In this equation:

• K is the number of classes.
• s(x) is a vector containing the scores of each class for the instance x.

Just like the logistic regression classifier, the Softmax regression classifier predicts the class with the highest estimated probability.

## Practical issues: numerical stability

Implementing the softmax function from scratch is a bit tricky. When you divide exponentials of scores that can be very large, you run into numerical stability problems: the exponentials can overflow. The standard fix is to subtract the maximum score from every score before exponentiating, which leaves the result mathematically unchanged.

Equation 7. Softmax function (normalized): p̂ₖ = exp(sₖ(x) − max(s(x))) / Σⱼ exp(sⱼ(x) − max(s(x)))

Module 4. activations.py
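This code block was also lost. A minimal sketch of a numerically stable softmax activation, in the same `__call__` style as the sigmoid class (the class name is an assumption):

```python
import numpy as np

class Softmax:
    """Softmax activation using the max-subtraction trick for stability."""

    def __call__(self, x):
        x = np.atleast_2d(x)
        # Subtracting the row-wise max leaves the output mathematically
        # unchanged but prevents overflow in np.exp (Equation 7).
        e = np.exp(x - np.max(x, axis=1, keepdims=True))
        return e / np.sum(e, axis=1, keepdims=True)
```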

Here, the Softmax class is added to the same module as our Sigmoid class, again using the `__call__` method so that the class behaves like a function when called.

## Training and cost function

Let us now consider how the Softmax regression model is trained, and its cost function. The idea is the same as for logistic regression: we want a model that estimates high probabilities for the target class and low probabilities for the other classes. The cost function that captures this idea is the cross entropy.

Equation 8. Cross entropy cost function: J(Θ) = −(1/m) Σᵢ Σₖ yₖ⁽ⁱ⁾ log(p̂ₖ⁽ⁱ⁾)

In this equation:

• yₖ⁽ⁱ⁾ is the target probability that the i-th instance belongs to class k. In our case, this value is always either 1 or 0, depending on whether the instance belongs to the class or not.

Notice that when there are just two classes, the cross entropy is equivalent to the log loss. Because we implement Softmax regression with gradient descent, we need the gradient vector of this cost function with respect to the parameters Θ.

Equation 9. Cross entropy gradient vector for class k: ∇θ⁽ᵏ⁾ J(Θ) = (1/m) Σᵢ (p̂ₖ⁽ⁱ⁾ − yₖ⁽ⁱ⁾) x⁽ⁱ⁾

When computing the gradients, we need our target classes (y) to have the same shape as the estimated probabilities (p). The estimated probabilities have shape (n_samples, n_classes), because for each instance we have a probability that it belongs to each class. For example, if we have three classes labeled 0, 1, and 2, we need to convert a target vector such as `[1, 2, 0]` into an array that looks like this:

`[[0, 1, 0], [0, 0, 1], [1, 0, 0]]`

The first column represents class 0, the second column represents class 1, and the third column represents class 2. You may have noticed that we can achieve this with one-hot encoding.

Module 5. preprocessing.py
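The preprocessing code was lost in extraction. A minimal sketch of a one-hot encoding helper matching the example above (the function name and signature are assumptions):

```python
import numpy as np

def one_hot(y, n_classes=None):
    """Convert a vector of integer class labels into a one-hot matrix
    of shape (n_samples, n_classes)."""
    y = np.asarray(y, dtype=int)
    if n_classes is None:
        n_classes = y.max() + 1
    encoded = np.zeros((y.shape[0], n_classes))
    # Set a single 1 per row, in the column of that row's class label.
    encoded[np.arange(y.shape[0]), y] = 1
    return encoded
```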

Module 6. logistic.py
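The Softmax regression code was likewise lost. Putting the pieces above together, a minimal sketch of a multiclass classifier trained with batch gradient descent might look like this (names and defaults are assumptions, not the library’s exact API):

```python
import numpy as np

class SoftmaxRegression:
    """Multiclass classifier trained with batch gradient descent (sketch)."""

    def __init__(self, learning_rate=0.1, n_iter=1000):
        self.learning_rate = learning_rate
        self.n_iter = n_iter

    def _softmax(self, scores):
        # Stable softmax (Equation 7): subtract the row-wise max.
        e = np.exp(scores - scores.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def fit(self, X, y):
        X_b = np.c_[np.ones((X.shape[0], 1)), X]  # add bias column
        m, n = X_b.shape
        n_classes = int(y.max()) + 1
        # One-hot encode the targets so they match the probability matrix.
        Y = np.zeros((m, n_classes))
        Y[np.arange(m), y.astype(int)] = 1
        self.theta_ = np.zeros((n, n_classes))
        for _ in range(self.n_iter):
            P = self._softmax(X_b @ self.theta_)
            # Cross-entropy gradient (Equation 9): (1/m) * X^T (P - Y)
            self.theta_ -= self.learning_rate * X_b.T @ (P - Y) / m
        return self

    def predict(self, X):
        X_b = np.c_[np.ones((X.shape[0], 1)), X]
        # The class with the highest score also has the highest probability.
        return np.argmax(X_b @ self.theta_, axis=1)
```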

And that’s it! We have successfully added Softmax regression to the machine learning library. Let’s take a quick look at a test case to make sure it’s working properly.

Test case – multiclass classification on the iris dataset

Figure 2. Test passed in 1.83 s

We’re done! Our test case passed, and we have successfully added Softmax regression to the machine learning library.

This concludes this part of the ML From Scratch series. We now have a machine learning library that includes the most common regression and classification models:

• Linear regression
• Polynomial regression
• Ridge regression
• Lasso regression
• Elastic Net
• Logistic regression
• Softmax regression