Docker for Data Science Without the Hassle

The entire environment is saved as a Docker image on our machine, which we can list with `docker image list`. To start this container again, copy the IMAGE ID and run the command:

`docker run -p 12345:8888 <IMAGE ID> jupyter notebook --ip 0.0.0.0`

This starts the container, publishes the container's port 8888 to host port 12345, and runs a Jupyter Notebook accessible at 127.0.0.1:12345. You can once again open the Jupyter Notebook in your browser with all the requirements ready to go. (Thanks to this issue on GitHub for providing this solution. Also, see the Docker docs for more options.)

repo2docker is under continuous development, and there is an active pull request for automatically using a pre-built image if the configuration files in the GitHub repository have not changed. The command above will always work, though, and can also be used for images not created through repo2docker. Once you have learned the basics from repo2docker, try working through some of the Docker guides to see how to make effective use of Docker.

Conclusions

Much as you don't have to worry about the details of backpropagation when building a neural network in Keras, you shouldn't have to master complex commands to practice reproducible data science. Fortunately, the tools of data science continue to get better, making it easier to adopt best practices and, hopefully, encouraging more diverse people to enter the field.

repo2docker is one of these technologies that will make you more efficient as a data scientist. Sure, you can be the grouchy old man: "I spent all that time learning Docker and now the young kids can't even write a Dockerfile," or you can take the high road and celebrate the ever-increasing layers of abstraction. These layers separate you from the tedious details, allowing you to concentrate on the best part of data science: enabling better decisions through data.

As always, I welcome feedback and constructive criticism. I can be reached on Twitter @koehrsen_will.
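The restart workflow above can be sketched as a short shell session. This is a minimal sketch, assuming Docker is installed and repo2docker has already built at least one image on the machine; the `head -n 1` trick for grabbing the most recent image ID is my own convenience, not part of repo2docker.

```shell
# List all images on the machine and capture the ID of the most
# recently created one (docker lists newest first by default):
IMAGE_ID=$(docker image list --quiet | head -n 1)

# Restart the environment: publish container port 8888 (Jupyter's
# default) on host port 12345 and launch the notebook server,
# listening on all interfaces so the host can reach it:
docker run -p 12345:8888 "$IMAGE_ID" jupyter notebook --ip 0.0.0.0
```

After this, the notebook is reachable at 127.0.0.1:12345 in a browser, exactly as described above. Note that `0.0.0.0` is needed because Jupyter binds to localhost inside the container by default, which the published port could not reach.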
