How to Upload Image Dataset in Google Colab

Tips and tricks to improve your Google Colab experience

Photo by Ehimetalor Akhere Unuabona on Unsplash

Colab (short for Colaboratory) is a free platform from Google that allows users to code in Python. Colab is essentially the Google version of a Jupyter Notebook. Some of the advantages of Colab over Jupyter include zero configuration, free access to GPUs & CPUs, and seamless sharing of code.

More and more people are using Colab to take advantage of high-end computing resources without being restricted by their price. Loading data is the first step in any data science project, and loading data into Colab often requires some extra setup or coding. In this article, you'll learn the 7 common ways to load external data into Google Colab. This article is structured as follows:

  1. Uploading file through Files explorer
  2. Uploading file using files module
  3. Reading a file from Github
  4. Cloning a Github Repository
  5. Downloading files using Linux wget command
  6. Accessing Google Drive by mounting it locally
  7. Loading Kaggle Datasets

1. Uploading file through Files explorer

You can use the upload option at the top of the Files explorer to upload any file(s) from your local machine to Google Colab.

Here is what you need to do:

Step 1: Click the Files icon to open the "Files explorer" pane

Click Files icon (Image by author)

Step 2: Click the upload icon and select the file(s) you wish to upload from the "File Upload" dialog window.

(Image by author)

Step 3: Once the upload is complete, you can read the file as you would normally. For example, pd.read_csv('Salary_Data.csv')

(Image by author)

2. Uploading file using Colab files module

Instead of clicking through the GUI, you can also use Python code to upload files. Import the files module from google.colab, then call upload() to launch a "File Upload" dialog and select the file(s) you wish to upload.

    from google.colab import files
    uploaded = files.upload()

File Upload dialog

Once the upload is complete, your file(s) should appear in "Files explorer" and you can read the file as you would normally.

(Image by author)
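The value returned by files.upload() is a dictionary keyed by file name, with each file's raw bytes as the value, so the data can also be read straight from memory. The following is a minimal sketch, assuming a CSV upload. The helper name rows_from_upload and the stand-in dictionary are hypothetical and only mimic what Colab would return:

```python
import csv
import io


def rows_from_upload(uploaded, filename):
    """Parse one uploaded CSV (bytes held in memory) into a list of dict rows."""
    text = uploaded[filename].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# In Colab this dictionary would come from:
#     from google.colab import files
#     uploaded = files.upload()
# The stand-in below mimics that return value so the sketch runs anywhere.
uploaded = {"Salary_Data.csv": b"YearsExperience,Salary\n1.1,39343\n1.3,46205\n"}

# Report what was uploaded, then parse the CSV without touching the disk.
for name, content in uploaded.items():
    print(f"{name}: {len(content)} bytes")

rows = rows_from_upload(uploaded, "Salary_Data.csv")
print(rows[0]["Salary"])
```

With Pandas, the same bytes can be wrapped in io.BytesIO and passed to pd.read_csv().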

3. Reading file from Github

One of the easiest ways to read data is through Github. Click on the dataset in the Github repository, then click the "Raw" button.

(Image by author)

Copy the raw data link and pass it to a function that can take a URL. For instance, pass a raw CSV URL to Pandas read_csv():

    import pandas as pd
    df = pd.read_csv('https://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/001-pandas-pipe-function/data/train.csv')
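The "Raw" link follows a predictable pattern: the github.com page URL with the /blob/ segment removed and the host changed to raw.githubusercontent.com. The sketch below builds that link in code. The helper name github_raw_url and the user/repo URL are hypothetical placeholders:

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # Path layout: /<user>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<user>/<repo>/blob/<branch>/<file>")
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Hypothetical repository page URL, used only for illustration.
page = "https://github.com/user/repo/blob/main/data/train.csv"
raw = github_raw_url(page)
print(raw)
```

The resulting string can be passed to pd.read_csv() in the same way as a link copied from the "Raw" button.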

4. Cloning a Github repository

You can also clone a Github repository into your Colab environment in the same way as you would on your local machine, using git clone.

    !git clone https://github.com/BindiChen/machine-learning.git

Once the repository is cloned, you should be able to see its contents in "Files explorer" and you can simply read the file as you would normally.

git clone and read the file in Colab (Image by author)
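By default, git clone creates a folder named after the repository in the current working directory (/content in Colab). The following is a minimal sketch for locating data files inside that folder. find_data_files is a hypothetical helper, and the demo builds a throwaway directory in place of a real clone so that it runs anywhere:

```python
import tempfile
from pathlib import Path


def find_data_files(repo_dir, pattern="*.csv"):
    """Return all files under repo_dir that match pattern, sorted by path."""
    return sorted(p for p in Path(repo_dir).rglob(pattern) if p.is_file())


# Stand-in for a cloned repository. In Colab you would pass the clone's
# folder instead, e.g. find_data_files("/content/<repo-name>").
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data-analysis" / "data"
    data_dir.mkdir(parents=True)
    (data_dir / "train.csv").write_text("a,b\n1,2\n")
    (data_dir / "README.md").write_text("notes")

    # Only the CSV is returned; the README does not match the pattern.
    found = [p.relative_to(tmp).as_posix() for p in find_data_files(tmp)]
    print(found)
```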

5. Downloading files from the web using Linux wget command

Since Google Colab lets you do everything you can do in a locally hosted Jupyter Notebook, you can also run Linux shell commands such as ls, dir, pwd, and cd by prefixing them with !.

Among the available Linux commands, wget allows you to download files using the HTTP, HTTPS, and FTP protocols.

In its simplest form, when used without any option, wget will download the resource specified in the URL to the current directory, for example:

wget in Colab (Image by author)

Rename file

Sometimes, you may want to save the downloaded file under a different name. To do that, simply pass the -O option followed by the new name:

    !wget https://example.com/cats_and_dogs_filtered.zip \
    -O new_cats_and_dogs_filtered.zip

Save file to a specific location

By default, wget will save files in the current working directory. To save the file to a specific location, use the -P option:

    !wget https://example.com/cats_and_dogs_filtered.zip \
    -P /tmp/

Invalid HTTPS SSL certificate

If you want to download a file over HTTPS from a host that has an invalid SSL certificate, you can pass the --no-check-certificate option:

    !wget https://example.com/cats_and_dogs_filtered.zip \
    --no-check-certificate

Multiple files at once

If you want to download multiple files at once, use the -i option followed by the path to a file containing a list of the URLs to be downloaded. Each URL needs to be on a separate line.

    !wget -i dataset-urls.txt

The following is an example of dataset-urls.txt:

    http://example-1.com/dataset.zip
    https://example-2.com/train.csv
    http://example-3.com/test.csv
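If you prefer to stay in Python, a rough counterpart to wget -i can be put together with the standard library. This is a sketch rather than a replacement for wget: read_url_list and download_all are hypothetical helper names, there is no retry or progress reporting, and the demo uses local file:// URLs so that it runs without network access:

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def read_url_list(list_file):
    """Return the non-empty lines of a URL list file, one URL per line."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def download_all(urls, dest_dir):
    """Download each URL into dest_dir, naming files after the URL's last path segment."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / Path(urlsplit(url).path).name
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved


# Demo: two local files stand in for remote datasets.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "train.csv").write_text("x,y\n1,2\n")
    (src / "test.csv").write_text("x,y\n3,4\n")

    # Write the URL list, one URL per line, as wget -i expects.
    list_file = Path(tmp) / "dataset-urls.txt"
    list_file.write_text("\n".join(p.as_uri() for p in sorted(src.iterdir())) + "\n")

    saved = download_all(read_url_list(list_file), Path(tmp) / "downloads")
    saved_names = [p.name for p in saved]
    print(saved_names)
```

The dest_dir argument plays the role of wget's -P option. In Colab you would pass real http or https URLs instead of the file:// stand-ins.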

6. Accessing Google Drive by mounting it locally

You can use the drive module from google.colab to mount your Google Drive to Colab.

    from google.colab import drive
    drive.mount('/content/drive')

When you execute the above statement, you will be provided with an authentication link and a text box to enter your authorization code.

Click the authentication link and follow the steps to generate your authorization code. Copy the code displayed and paste it into the text box. Once it is mounted, you should get a message like:

    Mounted at /content/drive

After that, you should be able to explore the contents via "Files explorer" and read the data as you would normally.

Finally, to unmount your Google Drive:

    drive.flush_and_unmount()
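While the Drive is mounted, its files are reachable through ordinary paths under the mount point. The sketch below assumes the root of your Drive appears as a MyDrive folder inside /content/drive, which is how Colab lays it out at the time of writing. The helper drive_file and the datasets/train.csv path are hypothetical:

```python
from pathlib import Path


def drive_file(relative_path, mount_point="/content/drive"):
    """Build the Colab path of a file stored under the root of a mounted Google Drive."""
    # Assumption: the Drive root is exposed as "<mount_point>/MyDrive".
    return Path(mount_point) / "MyDrive" / relative_path


csv_path = drive_file("datasets/train.csv")
print(csv_path.as_posix())

# Checking before reading gives a clearer error when Drive is not mounted
# or the file sits in a different folder.
if csv_path.exists():
    print("found", csv_path.stat().st_size, "bytes")
else:
    print("not found; is Drive mounted at /content/drive?")
```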

7. Loading Kaggle datasets

It is possible to download any dataset seamlessly from Kaggle into your Google Colab. Here is what you need to do:

Step 1: Download your Kaggle API Token: Go to Account and scroll down to the API section.

Generate Kaggle API token (Image by author)

By clicking "Create New API Token", a kaggle.json file will be generated and downloaded to your local machine.

Step 2: Upload kaggle.json to your Colab project: for instance, you can import the files module from google.colab, and call upload() to launch a File Upload dialog and select kaggle.json from your local machine.

Upload kaggle.json (Image by author)

Step 3: Update the KAGGLE_CONFIG_DIR path to the current working directory. You can run !pwd to get the current working directory and assign the value to os.environ['KAGGLE_CONFIG_DIR']:

Configure KAGGLE_CONFIG_DIR (Image by author)
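Step 3 can be sketched in code as follows. configure_kaggle is a hypothetical helper. It also restricts kaggle.json to its owner, which is an addition on top of the step described above, since the Kaggle command line tool warns when the token is readable by other users. The demo runs in a temporary folder with a placeholder token:

```python
import os
import stat
import tempfile
from pathlib import Path


def configure_kaggle(config_dir):
    """Point the Kaggle CLI at config_dir and restrict kaggle.json to its owner."""
    token = Path(config_dir) / "kaggle.json"
    if not token.is_file():
        raise FileNotFoundError(f"kaggle.json not found in {config_dir}")
    # Owner read/write only (the equivalent of chmod 600).
    token.chmod(stat.S_IRUSR | stat.S_IWUSR)
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)
    return token


# Demo with a placeholder token. In Colab you would call
# configure_kaggle(os.getcwd()) after uploading the real kaggle.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "kaggle.json").write_text('{"username": "demo", "key": "placeholder"}')
    configure_kaggle(tmp)
    configured_dir = os.environ["KAGGLE_CONFIG_DIR"]
    print(configured_dir == tmp)
```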

Step 4: Finally, you should be able to run the following Kaggle API commands to download datasets:

    !kaggle competitions download -c titanic
    !kaggle datasets download -d alexanderbader/forbes-billionaires-2021-30

Download Kaggle Dataset (Image by author)

Note that for a competition dataset, the Kaggle API command should be available under the Data tab

Retrieve Kaggle API from competition dataset (Image by author)

For a general dataset, the Kaggle API command can be accessed as follows:

Retrieve Kaggle API from a general dataset (Image by author)
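The download commands above usually leave a zip archive in the current directory, which has to be extracted before the files can be read. The following is a minimal sketch using the standard library. The archive name titanic.zip and its contents are assumptions, and the demo builds a small stand-in archive because the real download needs Kaggle credentials:

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Demo: create a stand-in archive, then extract it into its own folder.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "titanic.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("train.csv", "PassengerId,Survived\n1,0\n")
        archive.writestr("test.csv", "PassengerId\n892\n")

    members = extract_archive(zip_path, Path(tmp) / "titanic")
    extracted = sorted(p.name for p in (Path(tmp) / "titanic").iterdir())
    print(members)
```

Once extracted, the CSV files can be read as you would normally, for example with pd.read_csv().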

Conclusion

Google Colab is a great tool for individuals who want to take advantage of the capabilities of high-end computing resources (like GPUs, TPUs) without being restricted by their price.

In this article, we have gone through most of the ways you can improve your Google Colab experience by loading external data into Google Colab. I hope this article will help you to save time in learning Colab and Data Analysis.

Thanks for reading. Stay tuned if you are interested in the practical aspect of machine learning.

You may be interested in some of my Pandas articles:

  • 10 tricks for Converting numbers and strings to datetime in Pandas
  • Using Pandas method chaining to improve code readability
  • How to do a Custom Sort on Pandas DataFrame
  • All the Pandas shift() you should know for data analysis
  • When to use Pandas transform() function
  • Pandas concat() tricks you should know
  • Difference between apply() and transform() in Pandas
  • All the Pandas merge() you should know
  • Working with datetime in Pandas DataFrame
  • Pandas read_csv() tricks you should know
  • 4 tricks you should know to parse date columns with Pandas read_csv()

More tutorials can be found on my Github

Source: https://towardsdatascience.com/7-ways-to-load-external-data-into-google-colab-7ba73e7d5fc7
