Template your data science projects with cookiecutter

Cookiecutter is a great way to create a project template for a type of analysis that you know you will need to repeat a number of times, while inputting the necessary data and/or parameters just once.

The example I am going to walk through in this blog post is very trivial, but keep in mind that its purpose is to show how cookiecutter works.

Once that is done, you just need to get creative and adapt it to your needs!

Note: cookiecutter must be part of your environment if you want to use it.

If you use Anaconda, type conda list in your terminal and see if it shows up in the list of installed packages. If it does not, just type pip install cookiecutter.

Let’s pretend I want to create a template of folders (one containing the notebook and one containing the files that I will need to save), and I want the notebook to perform some calculations on a dataframe.

The structure that I want to duplicate every time I run cookiecutter is shown in the snapshot below.

The purpose of the notebook is to create a dataframe with a customizable number of columns and rows.

The dataframe will be populated with integers bounded between two values that can also be changed every time.

The template will also allow me to choose the numpy function that I want to run over the rows (or columns) and store the results in a file that will be saved in the deliverables folder.

Of course, each time I create a folder containing a project like this, I would like to be able to input the title of that folder, as well as the name of the file I am going to save.
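To make the description above concrete, here is a minimal sketch of what the notebook logic could look like once all the values are filled in. The function names (build_dataframe, summarize), the seed, the column names, and the output file name are placeholders I made up for illustration; they are not taken from the actual notebook, and the file is written to a temporary folder rather than a real deliverables folder.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def build_dataframe(n_rows, n_cols, low, high, seed=0):
    """Create a dataframe of random integers between low and high (both included)."""
    rng = np.random.default_rng(seed)
    data = rng.integers(low, high, size=(n_rows, n_cols), endpoint=True)
    columns = [f"col_{i}" for i in range(n_cols)]
    return pd.DataFrame(data, columns=columns)


def summarize(df, func_name, axis):
    """Apply a numpy function, looked up by name, over rows (axis=1) or columns (axis=0)."""
    func = getattr(np, func_name)
    return df.apply(func, axis=axis)


# Values that the template would normally fill in (hypothetical defaults).
df = build_dataframe(n_rows=5, n_cols=3, low=0, high=10)
result = summarize(df, "sum", axis=1)  # one value per row

# Stand-in for the deliverables folder.
out_path = Path(tempfile.mkdtemp()) / "results.csv"
result.to_csv(out_path)
print(result)
```

Every argument passed at the bottom is a candidate for a cookiecutter variable, which is exactly what the template is meant to expose.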

The template contains unusual syntax such as {{cookiecutter.folder_title}}, where folder_title is one of the customizable variables contained in the json file.

Just remember that each time you clone the template, all the variables contained in double curly braces (in the notebook, as well as in the folders’ names) will be replaced with the respective values passed in the json file.
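If the replacement step feels abstract, the toy sketch below mimics it with a regular expression. This is only an illustration of the idea: cookiecutter itself renders templates with the Jinja2 engine, which does much more than this, and the render helper and the sales_analysis value are names I invented for the example.

```python
import re

# Matches placeholders of the form {{cookiecutter.some_name}}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text, context):
    """Replace each {{cookiecutter.<name>}} in text with context[<name>]."""
    return PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), text)


context = {"folder_title": "sales_analysis"}

# The same substitution applies to folder names and to notebook content.
print(render("{{cookiecutter.folder_title}}/notebook", context))
print(render('title = "{{cookiecutter.folder_title}}"', context))
```

The first call shows a folder name being filled in, the second a line of notebook code.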

All this information goes into the cookiecutter.json file, which must be saved at the top level of the template folder, as shown in the snapshot above.
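For readers who prefer to see the layout spelled out, the sketch below builds one plausible version of the template skeleton in a temporary directory. The layout is my assumption based on the description in this post (a project folder named with a placeholder, holding a notebook folder and a deliverables folder, with cookiecutter.json next to it); the template folder name cookiecutter_template is hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical template folder, created in a temporary location.
root = Path(tempfile.mkdtemp()) / "cookiecutter_template"

# The project folder name is itself a placeholder that gets replaced.
project = root / "{{cookiecutter.folder_title}}"
(project / "notebook").mkdir(parents=True)
(project / "deliverables").mkdir()

# cookiecutter.json sits at the top level of the template folder.
(root / "cookiecutter.json").touch()

# Print the resulting tree, relative to the template folder.
for path in sorted(root.rglob("*")):
    print(path.relative_to(root))
```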

Populate cookiecutter.json

The json file is a dictionary containing the default values of all the variables that I want to change every time I create a new copy of this type of project.
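As a rough idea of what such a dictionary might contain, the sketch below builds one in Python and prints it as json. Only folder_title is a variable name that appears in this post; every other key and all the default values are assumptions I chose to match the notebook described earlier.

```python
import json

# Default values for the template variables (all but folder_title are hypothetical).
defaults = {
    "folder_title": "my_project",
    "file_name": "results",
    "n_rows": "10",
    "n_cols": "4",
    "min_value": "0",
    "max_value": "100",
    "numpy_function": "sum",
}

# This text is what would be saved as cookiecutter.json.
text = json.dumps(defaults, indent=4)
print(text)
```

I kept the numbers as strings here because the values end up pasted into the notebook as text anyway.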

Create the template notebook

Write the code that you want to duplicate in your template notebook, and assign the variables by using the notation I mentioned above.

To have a better idea of what is going on, the entire notebook can be found at this link.
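A template cell along these lines might look like the string below. The variable names are hypothetical, and the plain string replacement is just a stand-in for what cookiecutter does when it generates the project; the point is that the cell becomes ordinary Python once the placeholders are filled in.

```python
# What a cell in the template notebook could contain (placeholder names are assumptions).
TEMPLATE_CELL = '''\
n_rows = {{cookiecutter.n_rows}}
n_cols = {{cookiecutter.n_cols}}
min_value = {{cookiecutter.min_value}}
max_value = {{cookiecutter.max_value}}
numpy_function = "{{cookiecutter.numpy_function}}"
output_file = "{{cookiecutter.file_name}}.csv"
'''

# Hypothetical default values, as they could appear in the json file.
defaults = {
    "n_rows": "10",
    "n_cols": "4",
    "min_value": "0",
    "max_value": "100",
    "numpy_function": "sum",
    "file_name": "results",
}

# Stand-in for the substitution performed when the project is generated.
rendered = TEMPLATE_CELL
for name, value in defaults.items():
    rendered = rendered.replace("{{cookiecutter." + name + "}}", value)

print(rendered)

# The rendered cell is valid Python: running it defines the variables.
namespace = {}
exec(rendered, namespace)
```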

Work on your projects without copying and pasting ever again!

Ok, great! Now it’s time to use this, but… how do I change the values of my inputs every time I clone my template?

Easy! From your terminal, move into the folder where you want the project to be cloned and type cookiecutter <absolute_path_of_Cookiecutter_folder>.
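As a side note, the same step can be scripted from Python, which may be handy if you want to generate several projects in a row. The sketch below wraps the cookiecutter function from the cookiecutter.main module; generate_project and its argument names are my own, and the import sits inside the function so the snippet still loads where cookiecutter is not installed.

```python
def generate_project(template_dir, output_dir, overrides):
    """Generate a project from a template folder without interactive prompts."""
    # Imported here so the module can be loaded without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        template_dir,             # path to the folder containing cookiecutter.json
        no_input=True,            # skip the prompts and use the defaults
        extra_context=overrides,  # values that replace selected defaults
        output_dir=output_dir,    # where the new project folder is created
    )
```

A call such as generate_project("<absolute_path_of_Cookiecutter_folder>", ".", {"folder_title": "q3_report"}) would then create the project in the current directory, with q3_report being just an example value.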

Once you run the command, the terminal will ask you to input the values for all the variables included in the json file, one at a time.

If you press enter without inputting anything, the cookiecutter will use the default value from the json file.
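The prompt behaviour described above boils down to a simple rule, sketched here without any real prompting. The resolve_answers helper and the example values are invented for illustration and are not part of cookiecutter.

```python
def resolve_answers(defaults, typed):
    """Keep what the user typed; fall back to the default when the answer is empty."""
    resolved = {}
    for name, default in defaults.items():
        answer = typed.get(name, "").strip()
        resolved[name] = answer if answer else default
    return resolved


defaults = {"folder_title": "my_project", "n_rows": "10"}

# The user types a folder title, then just presses enter for n_rows.
typed = {"folder_title": "sales_analysis", "n_rows": ""}

print(resolve_answers(defaults, typed))
# → {'folder_title': 'sales_analysis', 'n_rows': '10'}
```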

Once this is done, voilà, the copy of the project is created! You will see the deliverables and notebook folders appearing in your current directory with all their content! You can now open the notebook and run it as is!

This is a tough topic to explain, not because of its difficulty, but because it’s much easier done than described.

For this reason I made the example available on my GitHub page, so you can clone it and try it out! After playing with it a bit, you will understand how powerful this is, and hopefully it will make your (analytics) life much easier, depending on your needs!

Feel free to check out:

the GitHub repo for this post.

my other Medium posts.

my LinkedIn profile.

Thank you for reading!
