Last Updated on November 13, 2019

Developing machine learning models in Python often requires the use of NumPy arrays.
NumPy arrays are efficient data structures for working with data in Python. Machine learning models, like those in the scikit-learn library, and deep learning models, like those in the Keras library, expect input data as NumPy arrays and return predictions as NumPy arrays.
As such, it is common to need to save NumPy arrays to file.
For example, you may prepare your data with transforms like scaling and need to save it to file for later use.
You may also use a model to make predictions and need to save the predictions to file for later use.
In this tutorial, you will discover how to save your NumPy arrays to file.
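After completing this tutorial, you will know how to save NumPy arrays to CSV files, to binary .npy files, and to compressed .npz files.

Let's get started.

How to Save a NumPy Array to File for Machine Learning
Photo by Chris Combe, some rights reserved.

This tutorial is divided into three parts: saving a NumPy array to a CSV file, saving it to a binary .npy file, and saving it to a compressed .npz file.

The most common file format for storing numerical data in files is the comma-separated values format, or CSV for short.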
It is most likely that your training data and input data to your models are stored in CSV files.
It can be convenient to save data to CSV files, such as the predictions from a model.
You can save your NumPy arrays to CSV files using the savetxt() function.
This function takes a filename and array as arguments and saves the array into CSV format.
You must also specify the delimiter; this is the character used to separate each variable in the file, most commonly a comma.
This can be set via the “delimiter” argument.
The example below demonstrates how to save a single NumPy array to CSV format.
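A minimal sketch of such an example follows. The array contents (the integers 0 to 9) are an assumption chosen for illustration; only savetxt(), the filename and the "delimiter" argument come from the text.

```python
# save a NumPy array to a CSV file
from numpy import asarray, savetxt

# define a 2D array with one row and 10 columns (illustrative values, assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

# write the array to data.csv, separating the values with commas
savetxt('data.csv', data, delimiter=',')
```

By default, savetxt() writes each value in scientific notation with 18 decimal places (its "fmt" argument defaults to '%.18e'), which is why even integer values appear as floating point numbers in the file.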
Running the example will define a NumPy array and save it to the file 'data.csv'.
The array has a single row of data with 10 columns.
We would expect this data to be saved to a CSV file as a single row of data.
After running the example, we can inspect the contents of 'data.csv'.
The data is saved as a single row, and the floating point numbers in the array are written with full precision.
We can load this data later as a NumPy array using the loadtxt() function, specifying the filename and the same comma delimiter.
The complete example is listed below.
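A minimal sketch of loading the file is shown here. So that it runs on its own, it first writes 'data.csv' using the same assumed array of the values 0 to 9; the variable names are illustrative.

```python
# load a NumPy array from a CSV file
from numpy import asarray, savetxt, loadtxt

# recreate data.csv so this sketch is self-contained (array values assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
savetxt('data.csv', data, delimiter=',')

# read the file back, using the same comma delimiter
loaded = loadtxt('data.csv', delimiter=',')

# print the loaded array
print(loaded)
```

Note that loadtxt() squeezes a single-row file into a one-dimensional array by default; passing ndmin=2 keeps the two-dimensional shape if that matters for later code.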
Running the example loads the data from the CSV file and prints the contents, matching our single row with 10 columns defined in the previous example.
Sometimes we have a lot of data in NumPy arrays that we wish to save efficiently, but which we only need to use in another Python program.
Therefore, we can save the NumPy arrays into a native binary format that is efficient to both save and load.
This is common for input data that has been prepared, such as transformed data, that will need to be used as the basis for testing a range of machine learning models in the future or running many experiments.
The .npy file format is appropriate for this use case and is referred to simply as "NumPy format".
This can be achieved using the save() NumPy function and specifying the filename and the array that is to be saved.
The example below defines our two-dimensional NumPy array and saves it to a .npy file.
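A minimal sketch of this step, again assuming an illustrative array of the values 0 to 9 with one row and 10 columns:

```python
# save a NumPy array to a binary .npy file
from numpy import asarray, save

# define a 2D array with one row and 10 columns (illustrative values, assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

# write the array to data.npy in NumPy's native binary format
save('data.npy', data)
```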
After running the example, you will see a new file in the directory with the name 'data.npy'.
You cannot inspect the contents of this file directly with your text editor because it is in binary format.
You can load this file as a NumPy array later using the load() function.
The complete example is listed below.
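A minimal sketch of loading the .npy file follows. It saves the assumed array first so that it can be run on its own; the array values and variable names are illustrative.

```python
# load a NumPy array from a binary .npy file
from numpy import asarray, save, load

# recreate data.npy so this sketch is self-contained (array values assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
save('data.npy', data)

# read the array back from the file
loaded = load('data.npy')

# print the loaded array
print(loaded)
```

Unlike the CSV round trip, the .npy format stores the shape and data type alongside the values, so the loaded array keeps its two-dimensional shape and its original dtype.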
Running the example will load the file and print the contents, confirming both that it was loaded correctly and that the content matches what we expect in the same two-dimensional format.
Sometimes, we prepare data for modeling that needs to be reused across multiple experiments, but the data is large.
This might be pre-processed NumPy arrays like a corpus of text (integers) or a collection of rescaled image data (pixels).
In these cases, it is desirable not only to save the data to file, but to save it in a compressed format.
Depending on the data, this can reduce gigabytes of data to hundreds of megabytes and makes it easier to transfer to other servers or cloud computing instances for long algorithm runs.
The .npz file format is appropriate for this case and supports a compressed version of the native NumPy file format.
The savez_compressed() NumPy function allows multiple NumPy arrays to be saved to a single compressed .npz file.
We can use this function to save our single NumPy array to a compressed file.
The complete example is listed below.
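A minimal sketch of this step, assuming the same illustrative array of the values 0 to 9:

```python
# save a NumPy array to a compressed .npz file
from numpy import asarray, savez_compressed

# define a 2D array with one row and 10 columns (illustrative values, assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

# write the array to data.npz; it is stored under the default name 'arr_0'
savez_compressed('data.npz', data)
```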
Running the example defines the array and saves it into a file in compressed NumPy format with the name 'data.npz'.
As with the .npy format, we cannot inspect the contents of the saved file with a text editor because the file format is binary.
We can load this file later using the same load() function from the previous section.
In this case, the savez_compressed() function supports saving multiple arrays to a single file.
Therefore, the load() function may load multiple arrays.
The loaded arrays are returned from the load() function in a dict-like object (an NpzFile) with the names 'arr_0' for the first array, 'arr_1' for the second, and so on.
The complete example of loading our single array is listed below.
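A minimal sketch of loading the compressed file follows. It saves the assumed array first so that it runs on its own; the array values and the variable names dict_data and loaded are illustrative.

```python
# load a NumPy array from a compressed .npz file
from numpy import asarray, savez_compressed, load

# recreate data.npz so this sketch is self-contained (array values assumed)
data = asarray([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
savez_compressed('data.npz', data)

# load the dict-like collection of arrays stored in the file
dict_data = load('data.npz')

# extract the first (and only) array by its default name
loaded = dict_data['arr_0']

# print the loaded array
print(loaded)
```

If the arrays are passed to savez_compressed() as keyword arguments, for example savez_compressed('data.npz', train=data), they are stored under those names instead and can be retrieved with dict_data['train'].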
Running the example loads the compressed NumPy file that contains a dictionary of arrays, extracts the first array that we saved (we only saved one), and prints the contents, confirming that the values and the shape of the array match what we saved in the first place.
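In this tutorial, you discovered how to save your NumPy arrays to file.
Specifically, you learned how to save arrays to CSV files with savetxt(), to binary .npy files with save(), and to compressed .npz files with savez_compressed(), and how to load each of them back.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.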