By Natassha Selvaraj, Data Scientist

Photo by Jo Szczepanska on Unsplash

As an aspiring data scientist, you must have heard the advice “do data science projects” more than a thousand times.

Data science projects are not only a great learning experience; they also help you stand out from the crowd of data science enthusiasts who want to break into the field.

However, not all data science projects will help your resume stand out. In fact, listing the wrong projects in a portfolio can do more harm than good.

In this article, I will walk through the projects that are must-haves on your resume.

I’ll also give you example datasets to try out for each project, along with associated tutorials to help you complete it.

Skill 1: Data collection

Photo by James Harrison on Unsplash

Data collection and pre-processing is one of the most important skills you can have as a data scientist.

In my data science job, most of my work involves collecting and cleaning data in Python. Once we understand the business requirement, we need to get access to relevant data on the Internet.

This can be done through APIs or web scrapers. Once this is done, the data must be cleaned and stored in data frames in a format that can be fed as input to a machine learning model.

This is the most time-consuming part of a data scientist’s job.

I suggest you demonstrate your skills in data collection and pre-processing by implementing the following projects:

Web Scraping – Food Review Site

Tutorial: Zomato Web Scraping with BeautifulSoup

Language: Python

Scraping reviews from a food delivery website is an interesting and practical project to have on your resume.

Simply build a web scraper that gathers the review information from all the web pages on this site and stores it in a data frame.
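To make the scraping step concrete, here is a minimal sketch. The tutorial above uses BeautifulSoup; this version uses Python's built-in html.parser instead so that it runs without any installs. The HTML snippet, the class names ("review", "rating", "text") and the review text are all hypothetical, and a real site will use different markup.

```python
from html.parser import HTMLParser

# Hypothetical page markup: a real site will use different tags and class names.
SAMPLE_HTML = """
<div class="review"><span class="rating">4.5</span><p class="text">Great biryani, fast delivery.</p></div>
<div class="review"><span class="rating">2.0</span><p class="text">Cold food and a long wait.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collects one {'rating', 'text'} row per <div class="review">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "review":
            # A new review starts: open an empty row for it.
            self.rows.append({"rating": None, "text": ""})
        elif css_class in ("rating", "text") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        if self._field == "rating":
            self.rows[-1]["rating"] = float(data.strip())
        elif self._field == "text":
            self.rows[-1]["text"] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None


def parse_reviews(html):
    """Return a list of review rows parsed from an HTML string."""
    parser = ReviewParser()
    parser.feed(html)
    return parser.rows


reviews = parse_reviews(SAMPLE_HTML)
for row in reviews:
    print(row)
```

The result is a list of dictionaries, one per review. If you have pandas installed, passing that list to pandas.DataFrame(reviews) should give you a data frame with one column per key.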

If you want to take this project a step further, you can use the data collected to build a sentiment analysis model and classify which of these reviews are positive and which are negative.

Next time you’re looking for something to eat, choose a restaurant whose reviews have the best overall sentiment.

Web Scraping – Online Course Site

Tutorial: Build a web scraper with Python in 8 minutes

Language: Python

Want to find the best online course in 2021? It’s hard to browse hundreds of data science courses to find one that is affordable but also highly rated.

You can do this by scraping an online course website and saving all the results in a data frame.

To take this project a step further, you can also create visualizations around variables such as price and rating to find a course that is both affordable and of high quality.
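Before plotting anything, a quick filter can already narrow the list. The sketch below assumes the scraper produced rows with "title", "price" and "rating" fields; those field names, the course rows and the thresholds are all made up for illustration.

```python
# Hypothetical scraped rows; real prices and ratings would come from the scraper.
courses = [
    {"title": "Course A", "price": 19.99, "rating": 4.7},
    {"title": "Course B", "price": 199.00, "rating": 4.8},
    {"title": "Course C", "price": 12.99, "rating": 3.9},
    {"title": "Course D", "price": 24.99, "rating": 4.6},
]


def shortlist(rows, max_price, min_rating):
    """Keep courses within budget and above a rating floor, best rated first."""
    kept = [r for r in rows if r["price"] <= max_price and r["rating"] >= min_rating]
    # Sort by rating (descending), breaking ties by the lower price.
    return sorted(kept, key=lambda r: (-r["rating"], r["price"]))


best = shortlist(courses, max_price=50.0, min_rating=4.5)
for course in best:
    print(course["title"], course["price"], course["rating"])
```

The budget and rating floor are judgment calls, so it is worth trying a few values and seeing how the shortlist changes.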

You can also build a sentiment analysis model and come up with an overall sentiment for each online course. You can then choose to take the course with the highest overall sentiment.

Bonus

Create some projects where you collect data with an API or other external tool. These skills are usually helpful when you start working.

Companies that rely on third-party data often purchase API access, and you will need to perform data collection using these external tools.

An example project you can do: Use the Twitter API to gather data related to a specific hashtag and store it in a data frame.

Skill 2: Exploratory data analysis

Photo by Luke Chesser on Unsplash

After collecting and storing the data, you need to analyze all the variables in your data frame.

You need to observe how each variable is distributed, and understand their relationship to each other. You also need to be able to answer questions using the information available.

You do this work very often as a data scientist, perhaps even more than predictive modeling.

Here are some EDA project ideas:

Identification of risk factors for heart disease

Dataset: Framingham Heart Study

Tutorial: Framingham Heart Study: Decision Trees

Language: Python or R

This dataset consists of predictors such as cholesterol, age, diabetes, and family history that are used to predict the onset of heart disease in a patient.

You can use Python or R to analyze the relationships in this data set and come up with answers to questions such as:

  • Are diabetic patients more likely to have heart disease at an early age?
  • Is there a certain population group at higher risk of heart disease than others?
  • Does exercising often reduce the risk of heart disease?
  • Are smokers more likely to have heart disease than non-smokers?
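Questions like these usually come down to comparing an outcome rate between groups. Here is a minimal sketch of that comparison for the smoker question. The records are synthetic and are not taken from the Framingham data, and the field names "smoker" and "heart_disease" are assumptions.

```python
from collections import defaultdict

# Synthetic toy records for illustration only; not taken from the Framingham data.
patients = [
    {"smoker": True, "heart_disease": True},
    {"smoker": True, "heart_disease": False},
    {"smoker": True, "heart_disease": True},
    {"smoker": False, "heart_disease": False},
    {"smoker": False, "heart_disease": True},
    {"smoker": False, "heart_disease": False},
    {"smoker": False, "heart_disease": False},
]


def rate_by_group(rows, group_key, outcome_key):
    """Return the share of rows with a positive outcome within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        counts[row[group_key]][0] += int(row[outcome_key])
        counts[row[group_key]][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


rates = rate_by_group(patients, "smoker", "heart_disease")
print(rates)
```

With the data in a pandas data frame, the same comparison would be df.groupby("smoker")["heart_disease"].mean(). Keep in mind that a difference in rates shows an association, not a cause.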

Being able to answer these questions with the available data is an important skill for a data scientist.

This project will not only help strengthen your skills as an analyst, but will also demonstrate your ability to derive insights from large datasets.

World Happiness Report

Dataset: World Happiness Report

Tutorial: World Happiness Report EDA

Language: Python

The World Happiness Report monitors six factors to measure global happiness – life expectancy, economy, social support, absence of corruption, freedom and generosity.

You can answer the following questions when you perform an analysis on this data set:

  • Which country is the happiest in the world?
  • What are the main factors influencing the happiness of the people?
  • Has overall happiness increased or decreased?
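One simple way to approach the first two questions is to rank countries by score and then check how strongly each factor correlates with that score. The sketch below uses synthetic rows, and the column names are assumptions rather than the report's actual headers.

```python
from math import sqrt

# Synthetic toy rows; column names are assumptions, not the report's real headers.
rows = [
    {"country": "Country A", "score": 7.5, "social_support": 0.95, "generosity": 0.10},
    {"country": "Country B", "score": 6.2, "social_support": 0.85, "generosity": 0.30},
    {"country": "Country C", "score": 5.1, "social_support": 0.70, "generosity": 0.05},
    {"country": "Country D", "score": 4.0, "social_support": 0.60, "generosity": 0.25},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Which country has the highest happiness score?
happiest = max(rows, key=lambda r: r["score"])["country"]

# How strongly does each factor move together with the score?
scores = [r["score"] for r in rows]
correlations = {
    factor: pearson([r[factor] for r in rows], scores)
    for factor in ("social_support", "generosity")
}

print(happiest)
print(correlations)
```

A high correlation only says that a factor and the score move together in this sample. It is a starting point for further questions, not a final answer about what influences happiness.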

Again, this is a project that will help improve your skills as an analyst. A trait I’ve seen in the most successful data analysts is curiosity.

Data scientists and analysts are always looking for contributing factors.

They are always looking for relationships between variables and are constantly asking questions.

If you are an aspiring data scientist, doing projects like these will help you develop an analytical mindset.

Skill 3: Data visualization

Photo by Lukas Blazek on Unsplash

When you start out as a data scientist, your clients and stakeholders are usually non-technical people.

You need to break down your insights and present your findings to a non-technical audience.

The best way to do this is in the form of visualizations.

Presenting an interactive dashboard will help you communicate your insights much better because the charts are easy to understand at a glance.

Because of this, many companies list data visualization as a must-have skill for data science jobs.

Here are some projects you can showcase in your portfolio to demonstrate your data visualization skills:

Building a Covid-19 dashboard

Dataset: Covid-19 Data Repository at Johns Hopkins University

Tutorial: Building a Covid-19 dashboard with Python and Tableau

Language: Python

You must first preprocess the data set above with Python. You can then create an interactive Covid-19 dashboard with Tableau.
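A typical preprocessing step is reshaping the data. As far as I know, the repository's time-series files are in a wide layout, with one column per date, while dashboard tools are generally easier to use with one row per country and date. The sketch below shows that reshape on a tiny stand-in with made-up counts. It is not the tutorial's own code.

```python
import csv
import io
from datetime import datetime

# Tiny stand-in for the wide time-series layout (one column per date).
# The counts below are made up for illustration.
WIDE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Country A,10.0,20.0,0,1,3
,Country B,30.0,40.0,2,2,5
"""

# Columns that describe the location rather than a date.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(csv_text):
    """Turn one-column-per-date rows into one row per (country, date)."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Date headers look like 1/22/20 (month/day/two-digit year).
            date = datetime.strptime(column, "%m/%d/%y").date()
            long_rows.append({
                "country": row["Country/Region"],
                "date": date.isoformat(),
                "confirmed": int(value),
            })
    return long_rows


long_rows = wide_to_long(WIDE_CSV)
for r in long_rows:
    print(r)
```

In pandas, the same reshape is usually done with DataFrame.melt. The long output can then be written to a CSV file and loaded into the dashboard tool.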

Tableau is one of the most sought-after data visualization tools and is a prerequisite for most entry-level data science jobs.

Building a dashboard with Tableau and presenting it in your portfolio will help you stand out as it demonstrates your skills in using the tool.

Building an IMDb Movie Dataset Dashboard

Dataset: IMDb’s most popular movies

Tutorial: Exploring IMDb Top 250 with Tableau

You can experiment with the IMDb dataset and create an interactive movie dashboard with Tableau.

As I mentioned above, showcasing the Tableau dashboards you’ve built can help your portfolio stand out.

Another great thing about Tableau is that you can upload your visualizations to Tableau Public and share the link with anyone who wants to use your dashboard.

This means that potential employers can interact with your dashboard, which piques their interest. Once they are interested in your project and can actually play around with the end product, you are already one step closer to getting the job.

If you want to start using Tableau, you can visit the tutorial here.

Skill 4: Machine learning

Photo by Kevin Ku on Unsplash

Finally, you will need to present projects that demonstrate your skills in machine learning.

I suggest you do both supervised and unsupervised machine learning projects.

Sentiment analysis of food reviews

Dataset: Amazon Fine Food Reviews dataset

Tutorial: A beginner’s guide to sentiment analysis in Python

Language: Python

Sentiment analysis is a very important part of machine learning. Companies often use it to assess the overall reaction of customers to their products.

Customers usually talk about products on social media and customer feedback forums. This data can be collected and analyzed to get an idea of how different people respond to different marketing strategies.

Based on the sentiment analysis, companies can position their products differently or change their target audience.

I suggest showcasing a sentiment analysis project in your portfolio because almost all companies have a social media presence and need to evaluate customer feedback.
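To show what a sentiment classifier does under the hood, here is a small sketch of a multinomial Naive Bayes model with add-one smoothing, written with the standard library only. This is one common baseline and not necessarily the method used in the tutorial above. The training reviews are invented, and a real project would train on thousands of labelled reviews.

```python
import math
from collections import Counter

# Toy labelled reviews, invented for illustration.
TRAIN = [
    ("great taste and fast delivery", "positive"),
    ("love this product great value", "positive"),
    ("terrible taste and stale", "negative"),
    ("bad packaging and terrible smell", "negative"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("great value", model))
print(predict("stale and terrible", model))
```

With so few training examples, the model only knows a handful of words, so treat this as a way to understand the mechanics rather than as a usable classifier.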

Life expectancy prediction

Dataset: Life expectancy

Tutorial: Life expectancy regression

Language: Python

In this project, you predict a person’s life expectancy based on variables such as education, infant mortality, alcohol consumption, and adult mortality.

The sentiment analysis project listed above is a classification problem, which is why I am adding a regression problem to the list.

It is important to present different projects on your resume so that you can demonstrate your expertise in different areas.
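For a sense of what a regression model looks like, here is a minimal sketch that fits a straight line with ordinary least squares, using a single predictor. The numbers are synthetic and are not from the life expectancy dataset. A real project would use several predictors and a library such as scikit-learn.

```python
# Synthetic toy data: years of schooling vs. life expectancy (made-up numbers).
schooling = [6.0, 8.0, 10.0, 12.0, 14.0]
life_expectancy = [58.0, 63.0, 67.0, 72.0, 75.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(schooling, life_expectancy)

# Predict a continuous value for an unseen input (11 years of schooling).
predicted = intercept + slope * 11.0
print(slope, intercept, predicted)
```

Unlike a classifier, which outputs a label, the model here outputs a number, and that is the key difference between the two problem types.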

Breast cancer analysis

Dataset: Breast Cancer Data Set

Tutorial: Cluster analysis of breast cancer data set

Language: Python

In this project, you use the K-means clustering algorithm to detect breast cancer based on target attributes.

K-means clustering is an unsupervised learning technique.

It’s important to have clustering projects in your portfolio because most real-world data is unlabeled.

Even the massive datasets collected by companies usually do not come with training labels. As a data scientist, you may need to do the labeling yourself using unsupervised learning techniques.
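Here is a minimal sketch of K-means written from scratch, to show how the algorithm groups points without any labels. The 2-D points are synthetic stand-ins for measurements and are not from the breast cancer dataset. In a real project you would more likely use a library implementation such as scikit-learn's KMeans.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: returns (centroids, label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [
        min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points
    ]
    return centroids, labels


# Synthetic 2-D points forming two well-separated groups.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary. You still have to inspect each cluster to decide what it represents, which is how clustering can help you create labels for unlabeled data.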

Conclusion

You need to showcase projects that demonstrate a variety of skills, including data collection, analysis, visualization, and machine learning.

Online courses are not enough to gain skills in all of these areas. However, you can find tutorials for almost any type of project you want to do.

All you need is a basic knowledge of Python, and you can follow these guides.

Once you get all the code working and can follow it properly, you can replicate the solution and work on a variety of different projects.

Remember that it is important to showcase projects in your portfolio if you are a beginner in the field of data science and do not have a bachelor’s or master’s degree in the subject.

Portfolio projects are one of the best ways to show your skills to a potential employer, especially when you are trying to get your first entry-level job in the field.

Learn how I got my first data science internship here.

Sooner or later, the winners are the ones who think they can – Paul Tournier

Bio: Natassha Selvaraj (LinkedIn): I currently hold a degree in computer science with a major in data science. My interest is in the field of machine learning, and I have worked on several projects in this field. I also enjoy the problem solving and programming I do on a daily basis.

Original. Re-posted with permission.
