Advanced Jupyter Notebooks: A Tutorial

If we create a file imports.py containing some of our frequently used import statements, we can load it simply by writing a one-line code cell, like the sketch below. Executing this will replace the cell contents with the loaded file.
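The original file contents aren't reproduced here, so as an illustrative stand-in, imports.py might contain:

```python
# imports.py — a hypothetical collection of common imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

The one-line loading cell then uses the %load line magic:

```python
%load imports.py
```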

Now we can run the cell again to import all our modules and we’re ready to go.

The %run magic is similar, except it will execute the code and display any output, including Matplotlib plots.

You can even execute entire notebooks this way, but remember that not all code truly belongs in a notebook.

Let's check out an example of this magic; consider a file containing the following short script, whose output appears beneath the cell when executed via %run.
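The original file isn't reproduced here, so here's a hypothetical stand-in, my_script.py, that both plots and prints:

```python
# my_script.py — a hypothetical example script
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))
plt.title("A simple sine wave")
plt.show()

print("Script finished!")
```

Running `%run my_script.py` in a code cell renders the plot and prints the output directly beneath the cell.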

If you wish to pass arguments to a script, simply list them explicitly after the filename, as in %run my_file.py 0 "Hello, World!", or using variables: %run $filename {arg0} {arg1}. Additionally, use the -p option to run the code through the Python profiler.

Scripted Execution

Although the foremost power of Jupyter Notebooks emanates from their interactive flow, it is also possible to run notebooks in a non-interactive mode.

Executing notebooks from scripts or the command line provides a powerful way to produce automated reports or similar documents.

Jupyter offers a command line tool that can be used, in its simplest form, for file conversion and execution.

As you are probably aware, notebooks can be converted to a number of formats, available from the UI under “File > Download As”, including HTML, PDF, Python script, and even LaTeX.

This functionality is exposed on the command line through an API called nbconvert.

It is also possible to execute notebooks within Python scripts, but this is already well documented and the examples below should be equally applicable.

It’s important to stress, similarly to %run, that while the ability to execute notebooks standalone makes it possible to write all manner of projects entirely within Jupyter notebooks, this is no substitute for breaking up code into standard Python modules and scripts as appropriate.

On the Command Line

It will become clear later how nbconvert empowers developers to create their own automated reporting pipelines, but first let's look at some simple examples.

The basic syntax is:

```
jupyter nbconvert --to <format> notebook.ipynb
```

For example, to create a PDF, simply write:

```
jupyter nbconvert --to pdf notebook.ipynb
```

This will take the currently saved static content of notebook.ipynb and create a new file called notebook.pdf.

One caveat here is that converting to PDF requires that you have pandoc (which comes with Anaconda) and LaTeX (which doesn't) installed.

Installation instructions depend on your operating system.

By default, nbconvert doesn't execute your notebook code cells. But if you also wish to, you can specify the --execute flag:

```
jupyter nbconvert --to pdf --execute notebook.ipynb
```

A common snag arises from the fact that any error encountered running your notebook will halt execution. Fortunately, you can throw in the --allow-errors flag to instruct nbconvert to output the error message into the cell output instead:

```
jupyter nbconvert --to pdf --execute --allow-errors notebook.ipynb
```

Parameterisation with Environment Variables

Scripted execution is particularly useful for notebooks that don’t always produce the same output, such as if you are processing data that change over time, either from files on disk or pulled down via an API.

The resulting documents can easily be emailed to a list of subscribers or uploaded to Amazon S3 for users to download from your website, for example.

In such cases, it’s quite likely you may wish to parameterise your notebooks in order to run them with different initial values.

The simplest way to achieve this is using environment variables, which you define before executing the notebook.

Let’s say we want to generate several reports for different dates; in the first cell of our notebook, we can pull this information from an environment variable, which we will name REPORT_DATE.

The %env line magic makes it easy to assign the value of an environment variable to a Python variable.
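A sketch of that first cell, assuming the REPORT_DATE variable named above:

```python
# First cell of the notebook: pull the report date from the environment
report_date = %env REPORT_DATE
print("Generating report for", report_date)
```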

Then, to run the notebook (on UNIX systems) we can do something like this:

```
REPORT_DATE=2018-01-01 jupyter nbconvert --to html --execute report.ipynb
```

As all environment variables are strings, we will have to parse them to get the data types we want.

For example:

```
A_STRING="Hello, Tim!" AN_INT=42 A_FLOAT=3.14 A_DATE=2017-12-31 jupyter nbconvert --to html --execute example.ipynb
```

And we simply parse like so; parsing dates is definitely less intuitive than other common data types, but as usual there are several options in Python.
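One way to do the parsing, sketched here with os.environ (the original may have used the %env magic instead):

```python
import datetime
import os

# Environment variables always arrive as strings, so cast as needed
a_string = os.environ["A_STRING"]                  # 'Hello, Tim!'
an_int = int(os.environ["AN_INT"])                 # 42
a_float = float(os.environ["A_FLOAT"])             # 3.14
a_date = datetime.datetime.strptime(
    os.environ["A_DATE"], "%Y-%m-%d"
).date()                                           # datetime.date(2017, 12, 31)
```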

On Windows

If you’d like to set your environment variables and run your notebook in a single line on Windows, it isn’t quite as simple:

```
cmd /C "set A_STRING=Hello, Tim!&& set AN_INT=42 && set A_FLOAT=3.14 && set A_DATE=2017-12-31&& jupyter nbconvert --to html --execute example.ipynb"
```

Keen readers will notice the lack of a space after defining A_STRING and A_DATE above. This is because trailing spaces are significant to the Windows set command, so while Python will successfully parse the integer and the float by first stripping whitespace, we have to be more careful with our strings.

Parameterisation with Papermill

Using environment variables is fine for simple use-cases, but for anything more complex there are libraries that will let you pass parameters to your notebooks and execute them.

With over 1000 stars on GitHub, probably the most popular is Papermill, which can be installed with pip install papermill.

Papermill injects a new cell into your notebook that instantiates the parameters you pass in, parsing numeric inputs for you.

This means you can just use the variables without any extra set-up (though dates still need to be parsed).

Optionally, you can create a cell in your notebook that defines your default parameter values by clicking “View > Cell Toolbar > Tags” and adding a “parameters” tag to the cell of your choice.
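Such a defaults cell might look like this sketch, with parameter names matching the papermill call below:

```python
# Cell tagged "parameters" — defaults used when no value is injected
my_string = "default"
my_int = 1
my_float = 1.0
a_date = "2017-01-01"  # dates still arrive as strings and need parsing
```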

Our earlier example that produced an HTML document now becomes:

```
papermill example.ipynb example-parameterised.ipynb -p my_string "Hello, Tim!" -p my_int 3 -p my_float 3.1416 -p a_date 2017-12-31
jupyter nbconvert example-parameterised.ipynb --to html --output example.html
```

We specify each parameter with the -p option and use an intermediary notebook so as not to change the original.

It is perfectly possible to overwrite the original example.ipynb file, but remember that Papermill will inject a parameter cell.

Now our notebook set-up is much simpler:
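A sketch of the simplified first cell, with the parameter names from above already instantiated by Papermill:

```python
import datetime

# my_string, my_int and my_float are injected by Papermill ready to use;
# only the date still needs parsing from its string form
a_date = datetime.datetime.strptime(a_date, "%Y-%m-%d").date()
```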

Our brief glance so far uncovers only the tip of the Papermill iceberg. The library can also execute and collect metrics across notebooks, summarise collections of notebooks, and it provides an API for storing data and Matplotlib plots for access in other scripts or notebooks.

It’s all well documented in the GitHub readme, so there’s no need to reiterate here.

It should now be clear that, using this technique, it is possible to write shell or Python scripts that batch produce multiple documents, and to schedule them via tools like crontab to run automatically.

Powerful stuff!

Styling Notebooks

If you’re looking for a particular look-and-feel in your notebooks, you can create an external CSS file and load it with Python.
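A sketch of the loading cell, assuming the styles live in a file called custom.css next to the notebook:

```python
from IPython.display import HTML

# Read a local CSS file and emit it as a raw <style> tag;
# the HTML object must be the last expression in the cell to render
with open('./custom.css', 'r') as f:
    css = f.read()

HTML(f'<style>{css}</style>')
```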

This works because IPython’s HTML objects are inserted directly into the cell output div as raw HTML.

In fact, this is equivalent to writing the <style> block directly into an HTML cell, and you can demonstrate that it works with another HTML cell of your own.

Using HTML cells would be fine for one or two lines, but it will typically be cleaner to load an external file as we first saw.

If you would rather customise all your notebooks at once, you can write CSS straight into the ~/.jupyter/custom/custom.css file in your Jupyter config directory instead, though this will only work when running or converting notebooks on your own computer.

Indeed, all of the aforementioned techniques will also work in notebooks converted to HTML, but will not work in converted PDFs.

To explore your styling options, remember that as Jupyter is just a web app you can use your browser’s dev tools to inspect it while it’s running or delve into some exported HTML output.

You will quickly find that it is well-structured: all cells are designated with the cell class, text and code cells are likewise demarcated with text_cell and code_cell respectively, inputs and outputs are indicated with input and output, and so on.

There are also various popular pre-designed themes for Jupyter Notebooks distributed on GitHub. The most popular is jupyterthemes, installable via pip install jupyterthemes; setting the "monokai" theme is then as simple as running jt -t monokai.

If you're looking to theme JupyterLab instead, there is a growing list of options popping up on GitHub too.

Hiding Cells

Although it’s bad practice to hide parts of your notebook that would aid other people’s understanding, some of your cells may not be important to the reader.

For example, you might wish to hide a cell that adds CSS styling to your notebook or, if you wanted to hide your default and injected Papermill parameters, you could modify your nbconvert call like so:

```
jupyter nbconvert example-parameterised.ipynb --to html --output example.html --TagRemovePreprocessor.remove_cell_tags="{'parameters', 'injected-parameters'}"
```

In fact, this approach can be applied selectively to any tagged cells in your notebook, making the TagRemovePreprocessor configuration quite powerful.

As an aside, there are also a host of other ways to hide cells in your notebooks.

Working with Databases

Databases are a data scientist’s bread and butter, so smoothing the interface between your databases and notebooks is going to be a real boon.

Catherine Devlin’s IPython SQL magic extension lets you write SQL queries directly into code cells with minimal boilerplate, as well as read the results straight into pandas DataFrames.

First, go ahead and run:

```
pip install ipython-sql
```

With the package installed, we start things off by executing a magic in a code cell to load the ipython-sql extension we just installed into our notebook.
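Per the ipython-sql readme, the load command is:

```python
%load_ext sql
```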

Let's connect to a database! A temporary in-memory database is the most convenient choice for this example, but you’ll probably want to specify details appropriate to your own database.
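Connecting to an in-memory SQLite database is a one-liner, and the cell echoes a confirmation like 'Connected: @None':

```python
%sql sqlite://
```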

Connection strings follow the SQLAlchemy standard:

dialect+driver://username:password@host:port/database

Yours might look more like postgresql://scott:tiger@localhost/mydatabase, where the dialect is postgresql, the username is scott, the password is tiger, the host is localhost and the database name is mydatabase.

Note that if you leave the connection string empty, the extension will try to use the DATABASE_URL environment variable; read more about how to customise this in the Scripted Execution section above.

Next, let’s quickly populate our database with the tips dataset from Seaborn that we used earlier, as sketched below.
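A sketch of the persisting cell; ipython-sql's PERSIST command writes a DataFrame to the connected database, and the cell echoes * sqlite:// 'Persisted tips':

```python
import seaborn as sns

# Load the familiar tips dataset as a DataFrame...
tips = sns.load_dataset("tips")

# ...and write it to the connected database as a table named "tips"
%sql PERSIST tips
```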

We can now execute queries on our database.

Note that we can use the cell magic %%sql for multiline SQL, as in the example below.
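An illustrative query against the persisted table; the cell echoes * sqlite:// and Done. before displaying the result set:

```
%%sql
SELECT *
FROM tips
LIMIT 5;
```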


You can parameterise your queries using locally scoped variables by prefixing them with a colon.
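For example, a sketch using a hypothetical local variable meal_time (the tips dataset's time column holds 'Lunch' or 'Dinner'):

```python
# In a code cell, define an ordinary Python variable...
meal_time = "Dinner"
```

```
%%sql
-- ...then bind it into the query with a colon prefix
SELECT * FROM tips WHERE time = :meal_time LIMIT 3;
```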


The complexity of our queries is not limited by the extension, so we can easily write more expressive queries such as finding all the results with a total bill greater than the mean.
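A sketch of such a query, comparing each row's total_bill against a subquery for the mean:

```
%%sql
SELECT *
FROM tips
WHERE total_bill > (SELECT AVG(total_bill) FROM tips);
```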


And as you can see, converting to a pandas DataFrame is easy too, which makes plotting results from our queries, such as 95% confidence intervals, a piece of cake.
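The conversion itself can be sketched like so, using the DataFrame() method that ipython-sql provides on its result objects:

```python
# Capture the query result in a Python variable via the line magic...
result = %sql SELECT * FROM tips

# ...and convert it to a pandas DataFrame for further analysis or plotting
df = result.DataFrame()
df.head()
```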

The ipython-sql extension also integrates with Matplotlib to let you call .plot(), .pie(), and .bar() straight on your query result, and can dump results direct to a CSV file via .csv(filename='my-file.csv'). Read more on the GitHub readme.

Wrapping Up

From the start of the tutorial for beginners through to here, we’ve covered a wide range of topics and really laid the foundations for what it takes to become a Jupyter master.

These articles aim to serve as a demonstration of the breadth of use-cases for Jupyter Notebooks and how to use them effectively.

Hopefully, you have gained a few insights for your own projects!

There’s still a whole host of other things we can do with Jupyter notebooks that we haven’t covered, such as creating interactive controls and charts, or developing your own extensions, but let’s leave these for another day.

Happy coding!

Originally published on Dataquest.io.
