This guide is the result of my own frustration with finding a holistic treatment of developing, evaluating, and deploying models on AWS. All the guides I found only cover part of the picture and never connect the dots completely. I wanted to write something that helps people understand the whole work involved in building and deploying a model so that UI developers can consume it in their websites and applications.
So, let’s get started!
I have divided this guide into two parts.
1. Model development and evaluation using AWS SageMaker Studio.
2. Model deployment using AWS Lambda and REST APIs.
Prerequisites:
· An AWS account – completing the entire tutorial costs less than $0.50, so don’t worry.
· A working knowledge of Python – most machine learning work today is done in Python.
· Patience – failure is the most important prerequisite for success, so keep trying until it works.
We will set up a project in SageMaker Studio to build our model development pipeline.
1. Log in to your AWS account and select SageMaker from the list of services.
2. Select SageMaker Studio and use Quick start to create a Studio.
When the Studio is ready, open it with the user you just created. It may take a few minutes to create the app, but once everything is set up we can create our project. The thing to understand is that we can create only one Studio, but that Studio can have multiple users, and each user can create multiple projects.
3. Select SageMaker Components and registries from the left navigation pane and select Create project.
By default, SageMaker provides project templates that can be used to build, evaluate, and host models. We will use one such template and customize it to suit our use case.
4. Select the MLOps template for model building, training, and deployment from the list and create a project.
Once the new project is created, you will find two pre-built repositories. The first defines model development and evaluation, and the second packages your model and deploys it to an endpoint for API consumption. In this guide, we will modify the first repository for our own use.
5. Clone the first repository so we can edit the files we need.
Our use case is a customer churn model that predicts whether a customer is likely to stop subscribing to a service in the future. Since the goal of this guide is to learn how to develop and deploy a model in the cloud, I will not go into data exploration and will move directly to pipeline development.
This is the file structure of the freshly cloned repository. Let us go through the files we will work with.
· The pipelines folder contains the files needed to create our model development pipeline; by default, the pipeline is named abalone.
· pipeline.py defines the components of our pipeline. It is currently configured with defaults, but we will change the code for our use case.
· preprocess.py and evaluate.py contain the code that runs in the pre-processing and evaluation steps.
· codebuild-buildspec.yml builds and orchestrates the pipeline.
You can add more steps in pipeline.py along with corresponding processing files. The template also has a tests folder with a test_pipelines.py file that can be used to build a separate test pipeline.
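To make the evaluation step less abstract, here is a minimal, hypothetical sketch of the pattern evaluate.py follows: compute a metric and write it out as a JSON report that later pipeline steps (and the model registry's metrics view) can read. The report keys and the use of accuracy are illustrative assumptions; the real script also loads the trained model and the test split inside the processing container.

```python
import json

def evaluate(y_true, y_pred):
    """Compute accuracy and shape it like an evaluation report.
    A deliberately minimal sketch: the real evaluate.py also loads
    the trained model and test data from /opt/ml/processing paths."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return {
        "binary_classification_metrics": {
            "accuracy": {"value": correct / len(y_true), "standard_deviation": "NaN"}
        }
    }

def write_report(report, path="evaluation.json"):
    # The evaluation step persists the report so a later condition /
    # registration step can read the metric values from it.
    with open(path, "w") as f:
        json.dump(report, f)
```

The key design point is that steps communicate through files, not in-memory objects: each step runs in its own container, so the report on disk is the contract between evaluation and the steps that follow.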
6. Rename the abalone folder to customer_churn and update the codebuild-buildspec.yml file to match the change:
run-pipeline --module-name pipelines.customer_churn.pipeline
7. We need to upload the data to our default AWS S3 bucket so the pipeline can consume it, and we can do this from a notebook. Create a new notebook in the repository from the Studio File tab, select a kernel that contains the basic Python data science packages, then paste the code below into a cell and run it.
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/synthetic/churn.txt ./

import os
import boto3
import sagemaker

prefix = 'sagemaker/DEMO-xgboost-churn'
region = boto3.Session().region_name
default_bucket = sagemaker.session.Session().default_bucket()
role = sagemaker.get_execution_role()

# Upload the raw data to the default bucket so the pipeline can read it.
RawData = boto3.Session().resource('s3')
RawData.Bucket(default_bucket).Object(
    os.path.join(prefix, 'data/RawData.csv')
).upload_file('./churn.txt')

print(os.path.join("s3://", default_bucket, prefix, 'data/RawData.csv'))
Now we need to edit the code inside pipeline.py, evaluate.py, and preprocess.py to meet our needs.
8. Copy the code from the guide link and update pipeline.py, preprocess.py, and evaluate.py, but be sure to go through the code to better understand the details.
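For a feel of what the churn pre-processing involves, here is a hedged sketch of the typical transformations for this dataset, assuming pandas is available. The column names (Phone, Churn?, the "True." label encoding) are assumptions based on the public synthetic churn dataset; check them against the data you downloaded.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """A minimal sketch of the kind of transformations preprocess.py
    applies to the churn data (column names are assumptions based on
    the public synthetic churn dataset)."""
    df = df.drop(columns=["Phone"], errors="ignore")      # identifier, not a feature
    df["Churn?"] = (df["Churn?"] == "True.").astype(int)  # label string to 0/1
    df = pd.get_dummies(df)                               # one-hot encode categoricals
    # XGBoost's CSV input convention expects the label in the first column
    return pd.concat([df["Churn?"], df.drop(columns=["Churn?"])], axis=1)
```

The real preprocess.py also splits the result into train, validation, and test sets and writes each to its own output path for the training and evaluation steps.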
Once we update the code in these three files, we are ready to run our first pipeline. Because we are following the CI/CD model, the run is triggered automatically once we commit and push our code.
9. Select the Git tab in the left navigation bar, add the files you have edited to the staging area, then commit and push the changes.
Now go to the Pipelines tab on the project page and select the pipeline you created to check its executions. You should find one Succeeded execution that ran automatically when the repository was cloned, and another Executing run triggered by the commit you just pushed. Double-click an execution to see the pipeline graph and more details.
Hurrah! Congratulations on completing your first training job.
Unless something went wrong, you should see that your execution Succeeded. But remember, if it were easy, anyone would do it. Failure is the first step to success.
When the pipeline completes, it creates the model and adds it to the model group. Because we set the model approval condition to “Manual”, we need to select the model and approve it manually in order to create an endpoint that can be used for inference.
10. Go to the Model groups tab on the project home page, select the model version that was created for you, and review the Metrics page to see the results of the evaluation step.
11. If you are happy with the metrics, choose the Approve option in the upper right corner to approve the model.
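The same approval can also be done programmatically, which is handy once you automate promotion. The sketch below uses the real boto3 `update_model_package` API; the ARN value and region are placeholders you would replace with your own, and it assumes valid AWS credentials.

```python
def approval_request(model_package_arn: str) -> dict:
    """Build the request that flips a model package to Approved --
    the API equivalent of the Approve button in Studio."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": "Approved",
    }

def approve_model(model_package_arn: str, region: str = "us-east-1") -> None:
    """Approve a model package via the SageMaker API. The region
    default is an assumption; use your project's region."""
    import boto3  # imported lazily so the sketch loads without AWS set up
    sm = boto3.client("sagemaker", region_name=region)
    sm.update_model_package(**approval_request(model_package_arn))
```

Approving through the API is what lets you later replace the manual gate with, say, a metric-threshold check in your own tooling.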
This is where our second repository comes into the picture. After you approve the model, the deployment pipeline defined in the second repository deploys the model and hosts an endpoint, which allows us to make inferences through our API.
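Once that endpoint exists, calling it looks roughly like the sketch below, using boto3's real `invoke_endpoint` API for the SageMaker runtime. The endpoint name, region, and CSV payload format are assumptions; use whatever the deployment pipeline created, and match the feature order to your training data.

```python
def churn_payload(features) -> str:
    """Serialize one feature row as CSV, the input format an
    XGBoost-based model typically expects (feature order must
    match the training data)."""
    return ",".join(str(f) for f in features)

def predict_churn(endpoint_name: str, features, region: str = "us-east-1") -> float:
    """Call the hosted endpoint. The endpoint name and region are
    placeholders for whatever the deployment pipeline created."""
    import boto3  # imported lazily so the sketch loads without AWS set up
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=churn_payload(features),
    )
    # The XGBoost container returns the prediction as plain text
    return float(response["Body"].read())
```

In Part 2 this call sits behind a Lambda function and a REST API, so the UI never talks to the endpoint directly.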
I have tried to keep this guide limited to SageMaker because it is long enough anyway, and Part 2 is still to come. The goal is to give a quick overview of the different parts of SageMaker by implementing a simple project. I suggest that readers not just follow the instructions step by step but also try their own ideas and steps; you will often fail, but you learn a lot, and that is the agenda. I hope you enjoy reading this guide as much as I enjoyed putting it together. Feel free to drop any suggestions or feedback in the comments; I would love to hear them.
A Practical Guide to MLOps in AWS Sagemaker – Part II