Let’s say you want to create a project that includes code for analyzing movie reviews. There are three essential steps in creating a good project structure:
There is no clear consensus in the community on best practices for organizing machine learning projects. As a result, there are numerous options, and this ambiguity leads to confusion. Fortunately, the people at DrivenData offer a solution: a tool called Cookiecutter Data Science, a standardized but flexible project structure for doing and sharing data science work. A few lines of code set up a whole series of subdirectories and make it easier to start, structure, and share an analysis. You can read more about the tool on its project homepage. Let’s get to the interesting part and see it in action.
pip install cookiecutter
or
conda config --add channels conda-forge
conda install cookiecutter
Starting a new project
Go to your terminal and run the following command. It automatically fills the directory with the necessary files.
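For reference, here is a minimal sketch of that command. It assumes the template URL published in the project's README; both the URL and the start_project wrapper name come from outside this article. The call is wrapped in a function so the snippet can be sourced without immediately starting the interactive prompts.

```shell
#!/bin/sh
# Assumed template location, as published in the project's README.
TEMPLATE_URL="https://github.com/drivendata/cookiecutter-data-science"

# Hypothetical wrapper: run it from the directory that should contain
# the new project (for example, the Desktop). Cookiecutter then asks a
# few questions (project name, author, and so on) and writes the files.
start_project() {
    cookiecutter "$TEMPLATE_URL"
}
```

Calling start_project from the terminal then launches the prompts.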
A project directory for the analysis is created at the specified path, which in the example above is the Desktop.
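One way to confirm that the scaffold was created is to check for the template's top-level directories. The sketch below assumes the version 1 layout as documented by the project (data, docs, models, notebooks, references, reports, src). The function name check_layout and the example path are hypothetical.

```shell
#!/bin/sh
# Report any top-level directory of the assumed v1 layout that is missing.
# Returns 0 when all directories exist, 1 otherwise.
check_layout() {
    project_dir="$1"
    missing=0
    for d in data docs models notebooks references reports src; do
        if [ ! -d "$project_dir/$d" ]; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_layout "$HOME/Desktop/my-project" prints nothing and returns 0 when every expected directory is present.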
Note: Cookiecutter Data Science is about to move to version 2, so the command will change slightly in the future. You will then have to use ccds ... rather than cookiecutter ... in the command above. According to the GitHub repository, the current version of the template will remain available, but you will have to pass -c v1 explicitly to select it. Keep an eye on the documentation for when the change occurs.
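The switch described in this note can be sketched as a small helper that prints which command to run. It assumes the repository URL from the project's README and that ccds takes the template argument the same way cookiecutter does, as the note implies. The name scaffold_command is hypothetical.

```shell
#!/bin/sh
# Assumed template location, as published in the project's README.
TEMPLATE_URL="https://github.com/drivendata/cookiecutter-data-science"

# Print the command to use: prefer ccds (version 2) when it is installed,
# otherwise fall back to cookiecutter pinned to the v1 template with -c.
scaffold_command() {
    if command -v ccds >/dev/null 2>&1; then
        echo "ccds $TEMPLATE_URL"
    else
        echo "cookiecutter $TEMPLATE_URL -c v1"
    fi
}

scaffold_command
```

Printing the command rather than running it keeps the sketch safe to try; copy the printed line into the terminal to create the project.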