Downloading Datasets into Google Drive via Google Colab

Step-by-step guide to use Google Drive in Google Colab for data science projects

Kevin Luk, Feb 15

Google Colab and Google Drive back you up in deep learning (Photo credit: Elizabeth Tsung)

If you are working on an old MacBook Pro like me (Late 2013, with 120GB HD), limited storage will be your greatest hurdle in working on data science projects.

For those who are also working on data science projects with large datasets, saving the dataset and training the model in the cloud will definitely ease your mind.

In this tutorial, I will share with you my experience in:

1. Mounting Google Drive to Google Colab
2. Downloading the dataset directly to Google Drive via Google Colab
   a. with the use of the Kaggle API;
   b. from a competition website that requires a username and password when requesting the download

BONUS: One click to enable FREE GPU support in Google Colab to train with TensorFlow.

Mount the Google Drive to Google Colab

Step 1

First, open a Google Colab notebook and run the following in a cell:

    from google.colab import drive
    drive.mount('/content/gdrive')

The cell will return a link, and you need to open that link to retrieve the authorization code.

Then you are good to go!

Step 2

If you are able to access Google Drive, your Google Drive files should all be under:

    /content/gdrive/My Drive/

while your current directory will be /content/

Click on the arrow on the left and you will find the directory structure.

For convenience, save the code snippet from Step 1 and paste it into any new Google Colab notebook to mount your Google Drive easily.
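A reusable version of the mounting step could look like the sketch below. The helper name mount_drive and the fallback message are my own illustration, not part of the Colab API. The guard around the import is only there so the cell does not crash if the notebook is opened outside Colab, and the returned path assumes the "My Drive" folder name described in Step 2.

```python
MOUNT_POINT = "/content/gdrive"  # same mount point as in Step 1


def mount_drive(mount_point=MOUNT_POINT):
    """Mount Google Drive when running inside Colab and return the Drive root.

    `mount_drive` is a hypothetical helper name used for illustration.
    """
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import drive
    except ImportError:
        print("Not running in Google Colab; skipping the mount step.")
    else:
        drive.mount(mount_point)  # asks for authorization on first use
    # Assumes Drive files appear under "My Drive", as described in Step 2
    return f"{mount_point}/My Drive"


drive_root = mount_drive()
print(drive_root)
```

Keeping the Drive root in a variable such as drive_root makes it easier to build dataset paths later without retyping the folder name, which contains a space.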

Download the dataset directly to Google Drive via Google Colab

In this section I will share with you my experience in downloading datasets from Kaggle and other competitions.

Downloading Kaggle datasets via Kaggle API

Step 1: Get the API key from your account

Visit www.kaggle.com ⇨ login ⇨ My Account ⇨ Create New API Token

The "kaggle.json" file will be downloaded automatically.

Step 2: Upload the kaggle.json file

Use these code snippets in Google Colab for the task:

    from google.colab import files
    files.upload()  # this will prompt you to upload the json

The commands below install the Kaggle package, create the necessary folder path, and copy the key into it.

    !pip install -q kaggle
    !mkdir -p ~/.kaggle
    !cp kaggle.json ~/.kaggle/
    !ls ~/.kaggle
    !chmod 600 /root/.kaggle/kaggle.json  # set permission

Step 3: Download the required dataset

Simply download the required dataset with the syntax:

    !kaggle competitions download -c 'name_of_competition' -p "target_colab_dir"

For example (note the quotes around the target path, which are needed because "My Drive" contains a space):

    !kaggle competitions download -c histopathologic-cancer-detection -p "/content/gdrive/My Drive/kaggle/cancer"

Bonus: the Kaggle API can also search for datasets from the command line, for example with !kaggle datasets list -s followed by a search term.

Step 4: Unzip

For a dataset with multiple zip files like the example, I tend to change directory to the designated folder and unzip them one by one.

    !unzip "data_name.zip"

For the example dataset:

    # change dir
    %cd "/content/gdrive/My Drive/kaggle/cancer"
    !mkdir train  # create a directory named train/
    !mkdir test  # create a directory named test/
    !unzip train.zip -d train/  # unzip data in train/
    !unzip test.zip -d test/  # unzip data in test/
    !unzip sample_submission.csv.zip
    !unzip train_labels.csv.zip

See the Kaggle API docs to read more.
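If you prefer to do the extraction from Step 4 in Python rather than with one !unzip call per file, the standard-library zipfile module can loop over the archives. This is only a sketch under my own assumptions: extract_archives is a hypothetical helper name, and the rule "train.zip and test.zip go into their own subfolders, everything else is extracted in place" simply mirrors the commands above. The demo at the bottom runs on a throwaway folder with placeholder files, not on the real dataset.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archives(folder, into_subfolders=("train", "test")):
    """Extract every .zip file found directly inside `folder`.

    Archives whose name (without .zip) is listed in `into_subfolders`
    are extracted into a subfolder of that name; all other archives
    are extracted straight into `folder`.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        stem = archive.stem  # "train.zip" -> "train"
        target = folder / stem if stem in into_subfolders else folder
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append((archive.name, str(target)))
    return extracted


# Demo on a temporary folder with placeholder archives
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "train.zip", "w") as zf:
        zf.writestr("img_0.txt", "placeholder")
    with zipfile.ZipFile(Path(tmp) / "train_labels.csv.zip", "w") as zf:
        zf.writestr("train_labels.csv", "id,label\n")

    results = extract_archives(tmp)
    train_file_exists = (Path(tmp) / "train" / "img_0.txt").exists()
    labels_exists = (Path(tmp) / "train_labels.csv").exists()

print(results)
```

On the real dataset the call would be something like extract_archives("/content/gdrive/My Drive/kaggle/cancer"), assuming the zip files were downloaded to that folder in Step 3.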

Downloading datasets from a competition website that requires a username and password

For competitions like ICIAR2018, you will need to provide a username and password when downloading the dataset.

To do this in Google Colab, first change your current directory to the folder where you wish to save the dataset.

Then use the wget command instead of curl.

    !wget --user=your_username --password=your_password http://cdn1.i3s.up.pt/digitalpathology/ICIAR2018_BACH_Challenge.zip

After downloading, you can unzip the file using the same approach as above.
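The same kind of download can be sketched in pure Python with the standard library, which may be handy if you want the download inside a function. This assumes the server uses HTTP Basic authentication, which I have not verified for the competition site; the helper names filename_from_url and download_with_basic_auth are hypothetical, and your_username / your_password are placeholders as in the wget command.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path component of a URL, e.g. 'data.zip'."""
    return os.path.basename(urlparse(url).path)


def download_with_basic_auth(url, username, password, dest_dir="."):
    """Download `url` into `dest_dir`, answering an HTTP Basic auth challenge.

    Assumption: the server replies with a Basic authentication challenge.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm at this URL"
    password_mgr.add_password(None, url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr)
    )
    dest_path = os.path.join(dest_dir, filename_from_url(url))
    with opener.open(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # stream to disk in chunks
    return dest_path


url = "http://cdn1.i3s.up.pt/digitalpathology/ICIAR2018_BACH_Challenge.zip"
print(filename_from_url(url))

# Usage (not executed here because it needs real credentials):
# download_with_basic_auth(url, "your_username", "your_password",
#                          dest_dir="/content/gdrive/My Drive")
```

For a one-off download the wget command is shorter; the Python version is mainly useful when the download is one step in a larger script.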

BONUS: One click to enable FREE GPU in Google Colab to train with TensorFlow

After you have mounted your Google Drive to Google Colab and downloaded the required dataset, let's enable the GPU in your Colab notebook and train your model.

From the menu bar: Runtime ⇨ Change runtime type

Hardware accelerator: None ⇨ GPU

Hope you find this tutorial useful, and happy cloud computing!
