Machine Learning for Radiology — Where to Begin

The constellation of new terms can be overwhelming: Deep Learning, TensorFlow, Scikit-Learn, Keras, Pandas, Python and Anaconda.

There is a head-spinning amount of new information to get under your belt before you can get started.

Personally, I also want to be able to use machine learning (ML) capabilities in some of my iOS apps using Apple’s CoreML framework.

This means another set of complexities to navigate before you can actually get down to work.

Once we have our tools configured properly, the job will be easier.

The dominant language in machine learning is Python.

There is an entire ecosystem that you need to get familiar with before you can start working on the many great tutorials out there.

There are myriad resources online, as well as books, to help you get started (a job for another post).

Let’s see what we need to do to take our first steps.

Photo by Malte Wingen on Unsplash

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” - Abraham Lincoln (probably never said this)

Let’s talk

The most common development language for ML is Python.

If you don’t know Python, many of the resources for ML beginners start off with quick Python intros.

This post is not intended to teach Python, but to demonstrate one developer’s path to getting started with the vast ML tool chain.

The first thing you need to do is download Python and the necessary Python tools for machine learning.

Hello, Unix

We need to use the command line interface to install and manage our Python tools.

Enough bash to be dangerous:

On Linux or a Mac, we use the Terminal.

On a Mac, you can find the program at Finder > Applications > Utilities > Terminal.

In Windows, we use the Command Prompt.

Click the Windows icon > Windows System > Command Prompt, or click the Windows icon and type cmd.

Before the cursor you see a string of text in the form:

machinename:directory username$

You type after the dollar sign.

Here are some common commands (shown in their macOS/Linux form):

List files in the current directory: ls
Show hidden files as well: ls -a
Navigate to a new directory: cd <directory_path>
Go to the home directory: cd ~ or just cd
Navigate up one level: cd ..
Go to the last folder you were in: cd -
Show the current working directory: pwd
Clear the window: clear
Open a file in a text editor, e.g.: atom <filename>
Cancel a running application (e.g. ping): Ctrl-C

The Up Arrow retypes the last command. You can travel back through previous commands by pressing the Up Arrow repeatedly.

Python

Python is an interpreted language, so it is read line by line, rather than a compiled language, where you have to bake the cake before you can use it.
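Because Python runs line by line, the navigation commands above can also be tried from Python itself. This is only a sketch using the standard os and tempfile modules; the scratch folder and the file name notes.txt are made up for illustration.

```python
import os
import tempfile

# Remember where we started (the shell equivalent is `pwd`).
start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as scratch_dir:
    # Move into a throwaway folder (like `cd <directory_path>`).
    os.chdir(scratch_dir)

    # Create an empty, hypothetical file so there is something to list.
    with open("notes.txt", "w"):
        pass

    # List the current folder. os.listdir also returns hidden files,
    # much like `ls -a`, but without the "." and ".." entries.
    names = sorted(os.listdir("."))
    print(names)

    # Go back to where we started before the scratch folder is deleted.
    os.chdir(start_dir)
```

Save it as a .py file or paste it into the interpreter; each statement is executed as soon as it is read.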

There are two separate versions of Python currently available, Python 2.7 and Python 3. Python 2.7 will be reaching end of life January 1, 2020, and Python 3.x is not backwards-compatible.

So why would you want to use an older version? Unfortunately some of the frameworks only support 2.7, and many tutorials in books and online were written specifically for that version. How do we deal with this?

Fortunately you can have both flavors of Python on your computer and run different virtual environments in different folders on your hard drive. You can do most of your ML work in, say, Python 3.7, and keep version 2.7 in a separate folder for any project that requires a library that only works on 2.7.
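If you are not sure which flavor a given environment is running, you can ask Python itself. The helper names below (describe_interpreter, is_python3) are my own, not part of any library; this is a minimal sketch that also shows one of the classic differences between the two versions, integer division.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return a short label such as 'Python 3.7' for a version tuple."""
    return "Python {}.{}".format(version_info[0], version_info[1])


def is_python3(version_info=sys.version_info):
    """True when the major version is 3 or later."""
    return version_info[0] >= 3


print(describe_interpreter())
print("Running Python 3? {}".format(is_python3()))

# One well-known incompatibility: "/" on two integers.
print(7 / 2)   # 3.5 on Python 3; Python 2 truncates this to 3
print(7 // 2)  # floor division gives 3 on both versions
```

Tutorials written for 2.7 often trip over exactly this kind of difference, which is why keeping a separate 2.7 environment around can save time.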

There are several ways to manage the different Python virtual environments, including virtualenv, Python Environment Wrapper (pew), venv and pyvenv.

The easiest is to use Conda, which is installed with Python when you use Anaconda.

https://imgs.xkcd.com/comics/python_environment.png

Anaconda

Anaconda is an open-source platform that is perhaps the easiest way to get started with Python machine learning on Linux, Mac OS X and Windows.

It helps you manage the programming environments, and includes common Python packages used in data science.

You can download the distribution for your platform at https://www.anaconda.com/distribution/ .

Once you install the appropriate version of Python for your system, you will want to set up some environments.

Conda

Conda is the Python package manager and environment management system used by Anaconda.

Conda installs most, but not all, of the packages you need.

The rest can be installed through the command line using pip (more about that later).

To see the packages in your current environment:

conda list

To see which version of Conda you are running:

conda --version

(If it is below 4.1.0, you can update Conda with conda update conda.)

You can create an environment with the Anaconda Navigator by choosing Environments from the left menu and then clicking the Create button.

[Figure: Creating a new environment “new_env” using Anaconda Navigator]

You can also create the environment from the command line.

For example, here we create an environment named “py27” using Python 2.7:

conda create -n py27 python=2.7

To activate an environment:

conda activate <your_env_name>

To deactivate the current environment:

conda deactivate

To remove an environment:

conda remove --name <your_env_name> --all

To list all of your Conda environments:

conda info --envs

[Figure: List of Conda managed Python environments]

The environment with the asterisk is the current active environment.
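You can also confirm from inside Python which environment you ended up in. The sketch below assumes that Conda exports the CONDA_DEFAULT_ENV variable when an environment is activated; the function name environment_summary is my own.

```python
import os
import sys


def environment_summary(environ=os.environ):
    """Collect a few facts about the interpreter that is running this code."""
    return {
        # Assumption: conda sets CONDA_DEFAULT_ENV on `conda activate`.
        "conda_env": environ.get("CONDA_DEFAULT_ENV",
                                 "(no conda environment active)"),
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


for key, value in sorted(environment_summary().items()):
    print("{:>15}: {}".format(key, value))
```

Run it once after conda activate py27 and once after conda deactivate; the interpreter path and version should change along with the environment.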

Two of the major machine learning packages, TensorFlow and Keras, should be installed using pip.

For Apple’s machine learning frameworks, you would also install Turi Create.

The Scientific Stack

There is a set of Python packages, referred to as the scientific stack, that are useful across multiple disciplines. These include:

NumPy (http://www.numpy.org/): library for efficient handling of arrays and matrices
SciPy (https://www.scipy.org/): collection of packages with math and science capabilities
matplotlib (https://matplotlib.org/): the standard 2D plotting library in Python
pandas (https://pandas.pydata.org/): library of matrix-like data structures, labeled indices, time functions, etc.
Scikit-learn (https://scikit-learn.org/stable/): library of machine learning algorithms
Jupyter (https://jupyter.org/): an interactive Python shell in a web-based notebook
Seaborn (https://seaborn.pydata.org/index.html): statistical data visualizations
Bokeh (https://bokeh.pydata.org/en/latest/): interactive data visualizations
PyTables (https://www.pytables.org/): a Python wrapper for the HDF5 library

You can install these packages and their dependencies using Anaconda.

In your newly created environment, search for the package you want.

[Figure: Search Anaconda Navigator for the packages you want to install]

Then select them from the list by checking the box and clicking Apply.

[Figure: Adding new packages and dependencies via Anaconda Navigator]

As I mentioned earlier, you use pip to install TensorFlow and Keras (and Turi Create for Apple’s CoreML).
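After installing, it is handy to check which of these packages the active environment can actually see. The following is a sketch, not an official tool: it uses the standard importlib module, and the list of import names is my own selection (note that scikit-learn is imported as sklearn and PyTables as tables).

```python
import importlib.util

# Import names, which sometimes differ from the install names.
STACK = ["numpy", "scipy", "matplotlib", "pandas", "sklearn",
         "seaborn", "bokeh", "tables", "tensorflow", "keras"]


def is_installed(module_name):
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(module_names=STACK):
    """Map each module name to True (found) or False (missing)."""
    return {name: is_installed(name) for name in module_names}


for name, found in report().items():
    print("{:<12} {}".format(name, "installed" if found else "missing"))
```

Anything reported as missing can then be added through Anaconda Navigator or pip, depending on the package.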

pip

pip is Python’s standard package manager https://pypi.org/project/pip/

To upgrade pip:

pip install --upgrade pip

To install a package with pip:

pip install <package_name>

Editing Python Files

You interact with Python in Terminal on a Mac or Command Prompt in Windows.

To write your code, most people use a code editor such as Atom https://atom.io/ or Sublime Text https://www.sublimetext.com/ .

There are whole religious wars over code editors, but life is too short for that.

My favorite (and free) text editor is Atom https://atom.io/ , from the GitHub folks.

A cool feature of Atom is that you can extend the app with features such as an integrated Terminal window.

Once Atom is installed, you can add this feature by going to Settings / Install Packages and searching for platformio-ide-terminal.

Running Python Files

At the command prompt ($ or >) type:

python <filename.py>

To exit Python, use exit() or Ctrl-D (Ctrl-Z in Windows).

To see which Python version you are currently using, type:

python --version or python -V

To see where the Python installation you are using is located, type:

which python (where python in Windows)

Environment Files

An environment file is a file in your project’s root directory that lists all the included packages and their version numbers specific to your project’s environment.

This allows you to share projects with others and to reuse the environment in other projects.
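To get a feel for what such a file records, you can list the packages and versions visible to the running interpreter from Python. This is only a sketch using the standard importlib.metadata module (Python 3.8 or later); the function name pinned_packages is my own, and this is not how Conda or pip generate their files.

```python
from importlib import metadata


def pinned_packages():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pins.add("{}=={}".format(name, version))
    return sorted(pins, key=str.lower)


for line in pinned_packages():
    print(line)
```

The output changes from one environment to the next, which is exactly the information an environment file is meant to capture.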

You create the file using:

conda env export --file environment.yaml

You recreate the Conda environment and its packages using:

conda env create -n <conda-env> -f environment.yaml

In some projects or tutorials you will see requirements.txt, which is used by pip as the package manager, instead of the environment.yaml used by Conda.

These are created by freezing the environment:

pip freeze > requirements.txt

To reconstitute, use:

pip install -r requirements.txt

Jupyter Notebooks

Jupyter Notebook https://jupyter.org/ is an open-source, web browser based application.

This allows you to run your Python code directly in a more user friendly environment and see the results step by step.

It is great for teaching, as you can add text and images in between your code cells in Markdown cells.

The application is extensible, so you can add many other useful features.
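Under the hood, a notebook is a JSON file that stores its cells in order. As a rough sketch (the file name hello_radiology.ipynb and the cell contents are made up, and the field names follow my understanding of version 4 of the notebook format), a two-cell notebook could be written with nothing but the standard library:

```python
import json
import os
import tempfile

# A hypothetical two-cell notebook: one Markdown cell, one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Hello, radiology\n",
                       "Notes can sit between code cells."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello from a code cell')"],
        },
    ],
}

# Hypothetical location: a file in the system's temporary folder.
notebook_path = os.path.join(tempfile.gettempdir(), "hello_radiology.ipynb")
with open(notebook_path, "w") as handle:
    json.dump(notebook, handle, indent=1)

print("Wrote {}".format(notebook_path))
```

In practice you would create notebooks from the Jupyter interface rather than by hand; the point is only that text cells and code cells live side by side in one file.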

You can also install Jupyter Notebook with the Anaconda Navigator.

[Figure: Install Jupyter Notebook with Anaconda Navigator]

Type the following at the prompt to open the Jupyter Notebook app in your browser:

jupyter notebook

[Figure: Create a new notebook with Python 3]

To launch a specific notebook:

jupyter notebook <notebook-name>

By the way, it is not recommended to run multiple instances of the Jupyter Notebook App simultaneously.

[Figure: Jupyter Notebook helps keep you organized!]

To run a cell, click on the Run button in the Jupyter toolbar or type Shift + Enter.

To shut down a notebook, close the Terminal window or type:

jupyter notebook stop

or press Ctrl+C

OK, so now what?

Now that our axe is sharpened, how can you get started on actual radiology informatics?

Here are a few links to get you started:

Machine Learning for Medical Imaging https://pubs.rsna.org/doi/10.1148/rg.2017160130
Deep Learning: A Primer for Radiologists https://pubs.rsna.org/doi/10.1148/rg.2017170077

For a deeper dive, here are two entire journal issues devoted to the subject:

JACR March 2018 Volume 15 Number 3PB Special Issue Data Science: Big Data, Machine Learning and Artificial Intelligence
JDI June 2018 Volume 31 Number 3 Special Focus Issue on Open Source Software

If you are still awake at this point, here are some useful GitHub references:

https://github.com/ImagingInformatics/machine-learning
https://github.com/slowvak/MachineLearningForMedicalImages

Let’s Get Started!

A really terrific introduction is in the above-mentioned Journal of Digital Imaging, June 2018:

Hello World Deep Learning in Medical Imaging. JDI (2018) 31: 283–289. Lakhani, Paras, Gray, Daniel L., Pett, Carl R., Nagy, Paul, Shih, George

Instead of creating a prototypical Cat v. Dog classifier, you create a chest v. abdomen x-ray classifier (CXR v. KUB)!

This is a great place to start your AI journey.

All of the above is a lot to unpack, but I hope this introduction will help get you started.

I am far from an expert, and wrote this initially as a memory aid for myself.

The more practitioners who have a basic understanding of the process, the better.

Best of luck!