How to Visualize a Decision Tree from a Random Forest in Python using Scikit-Learn

A helpful utility for understanding your model

Will Koehrsen · Aug 18, 2018

Here’s the complete code: just copy and paste into a Jupyter Notebook or Python script, replace with your data, and run. (The code to visualize a decision tree and save it as a png is on GitHub here.)
The final result is a complete decision tree as an image.
Decision Tree for Iris Dataset

Explanation of code

1. Create a model, train, and extract a tree: we could use a single decision tree, but since I often employ the random forest for modeling, it’s used in this example. (The trees will be slightly different from one another!)

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=10)

# Train
model.fit(iris.data, iris.target)

# Extract single tree
estimator = model.estimators_[5]
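The snippet above assumes the iris data is already loaded; a self-contained version of step 1 might look like this (a minimal sketch using scikit-learn’s bundled iris dataset):

```python
# Load the data, fit a small forest, and pull out one fitted tree
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# A forest of 10 trees, each trained on a bootstrap sample of the data
model = RandomForestClassifier(n_estimators=10)
model.fit(iris.data, iris.target)

# estimators_ holds the individual fitted DecisionTreeClassifier objects
estimator = model.estimators_[5]
print(len(model.estimators_))  # 10
```

Each element of estimators_ is an ordinary DecisionTreeClassifier, so any index from 0 to n_estimators - 1 works here.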
2. Export the tree as a .dot file: this makes use of the export_graphviz function in Scikit-Learn. There are many parameters here that control the look and the information displayed. Take a look at the documentation for specifics.

from sklearn.tree import export_graphviz

# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
                feature_names=iris.feature_names,
                class_names=iris.target_names,
                rounded=True, proportion=False,
                precision=2, filled=True)
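Before writing a file, it can also help to look at the dot source itself: passing out_file=None makes export_graphviz return the source as a string, which is a quick way to see what the parameters change (a sketch reusing the names from the steps above):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_graphviz

iris = load_iris()
model = RandomForestClassifier(n_estimators=10).fit(iris.data, iris.target)
estimator = model.estimators_[5]

# out_file=None returns the dot source instead of writing tree.dot
dot_data = export_graphviz(estimator, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           rounded=True, proportion=False,
                           precision=2, filled=True)

print(dot_data[:60])  # starts with the graph declaration
```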
3. Convert the dot file to a png using a system command: running system commands in Python can be handy for carrying out simple tasks. This requires an installation of Graphviz, which includes the dot utility. For the complete options for conversion, take a look at the documentation.

# Convert to png
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png'])
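If the call fails, it does so silently. A slightly more defensive variation (my own sketch, not from the original post) checks for the dot binary and the input file first, and uses subprocess.run with check=True so failures raise an error:

```python
import os
import shutil
import subprocess

dot_path = shutil.which('dot')  # None if Graphviz is not on the PATH

if dot_path is None:
    print('Graphviz "dot" not found -- install Graphviz first')
elif not os.path.exists('tree.dot'):
    print('tree.dot not found -- run the export step first')
else:
    # check=True raises CalledProcessError if dot exits with an error
    subprocess.run(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png'],
                   check=True)
```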
4. Visualize: the best visualizations appear in the Jupyter Notebook. (Equivalently, you can use matplotlib to show the image.)

# Display in jupyter notebook
from IPython.display import Image
Image(filename='tree.png')
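Outside a notebook, the matplotlib route mentioned above might look like this (a sketch that assumes tree.png was produced in step 3):

```python
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

if os.path.exists('tree.png'):
    img = mpimg.imread('tree.png')  # load the rendered png as an array
    plt.figure(figsize=(12, 8))
    plt.imshow(img)
    plt.axis('off')                 # the image is the whole point
    plt.show()
else:
    print('tree.png not found -- run the conversion step first')
```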
Considerations

With a random forest, every tree will be built differently. I use these images to display the reasoning behind a decision tree (and subsequently a random forest) rather than for specific details.
It’s helpful to limit the maximum depth of your trees when you have a lot of features. Otherwise, you end up with massive trees, which look impressive but cannot be interpreted at all! Here’s a full example with 50 features.
Full decision tree from a real problem (see here).
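One way to keep the trees readable is to cap their depth when fitting; a sketch with max_depth=3 (my choice of cap, for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# max_depth caps every tree in the forest at 3 levels of splits
model = RandomForestClassifier(n_estimators=10, max_depth=3)
model.fit(iris.data, iris.target)

# get_depth() reports the actual depth of each fitted tree
depths = [est.get_depth() for est in model.estimators_]
print(max(depths))  # at most 3
```

A tree this shallow still shows the first few, most important splits, which is usually what you want from the visualization anyway.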
Conclusions

Machine learning still suffers from a black box problem, and one image is not going to solve the issue! Nonetheless, looking at an individual decision tree shows us that this model (and a random forest) is not an unexplainable method, but a sequence of logical questions and answers — much as we would form when making predictions.
Feel free to use and adapt this code for your data.
As always, I welcome feedback, constructive criticism, and hearing about your data science projects.
I can be reached on Twitter @koehrsen_will.