Let’s say you want to create a project that includes code for analyzing movie reviews. There are three essential steps in creating a good project structure:

Creating a project template (image by the author; icon via @NounProject, CC: Creative Commons)

There is no clear consensus in the community on best practices for organizing machine learning projects. As a result, there are numerous choices, and this ambiguity leads to confusion. Fortunately, there is a workaround, thanks to the folks at DrivenData. They have created a tool called Cookiecutter Data Science, which provides a standardized but flexible project structure for doing and sharing data science work. A few lines of code create a whole set of subdirectories and make it easier to start, structure, and share an analysis. You can read more about the tool on its project homepage. Let’s get to the interesting part and see it in action.


pip install cookiecutter

or

conda config --add channels conda-forge
conda install cookiecutter

Starting a new project

Go to your terminal and run the following command. It automatically fills the directory with the necessary files.

cookiecutter https://github.com/drivendata/cookiecutter-data-science
Using Cookiecutter Data Science (image by the author)

A new project directory is created at the specified path, which in the example above is the desktop.

Directory structure of a newly created project Image by the author
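
To get a feel for the kind of layout the template produces, here is a minimal sketch that builds a simplified skeleton by hand. It is not how Cookiecutter Data Science works internally: the folder list is an abbreviated subset of the v1 template, and the names `SKELETON_DIRS`, `create_skeleton`, and the `movie-reviews` project name are made up for illustration.

```python
import tempfile
from pathlib import Path

# Assumption: an abbreviated subset of the folders in the v1 template,
# listed here only to illustrate the idea of a standard layout.
SKELETON_DIRS = [
    "data/external",
    "data/interim",
    "data/processed",
    "data/raw",
    "models",
    "notebooks",
    "references",
    "reports/figures",
    "src/data",
    "src/features",
    "src/models",
    "src/visualization",
]


def create_skeleton(root: Path) -> Path:
    """Create the simplified folder layout under `root` and return `root`."""
    for relative in SKELETON_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    # An empty README stands in for the files the real template generates.
    (root / "README.md").touch()
    return root


# Build the skeleton in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "movie-reviews")  # hypothetical name
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project))
```

The real template also generates files such as a Makefile and requirements.txt, so treat this only as a picture of the folder hierarchy, not a replacement for the tool.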

Note: Cookiecutter Data Science is about to move to version 2, so the command will change slightly in the future. You will then have to use ccds ... rather than cookiecutter ... in the command above. According to the GitHub repository, the current version of the template will remain available, but you will have to pass -c v1 explicitly to select it. Keep an eye on the documentation when the change occurs.
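
If you script project creation, pinning the template version can be made explicit. The sketch below only assembles the command line and prints it; the helper name `build_command` and its `pin_v1` parameter are hypothetical, while `-c` is cookiecutter's checkout option for selecting a branch, tag, or commit of the template repository.

```python
import shlex
from typing import List

TEMPLATE_URL = "https://github.com/drivendata/cookiecutter-data-science"


def build_command(pin_v1: bool = True) -> List[str]:
    """Return the argument list for generating a project from the template."""
    command = ["cookiecutter", TEMPLATE_URL]
    if pin_v1:
        # -c (--checkout) selects the v1 version of the template explicitly.
        command += ["-c", "v1"]
    return command


# Print the command instead of running it, since running it prompts for input.
print(shlex.join(build_command()))
```

To actually run it, you could pass the list to `subprocess.run(command, check=True)`, assuming cookiecutter is installed and you are ready to answer its prompts in the terminal.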

