How to Upload Image Dataset in Google Colab
7 ways to load external data into Google Colab
Tips and tricks to improve your Google Colab experience
Colab (short for Colaboratory) is a free platform from Google that allows users to code in Python. Colab is essentially the Google version of a Jupyter Notebook. Some of the advantages of Colab over Jupyter include zero configuration, free access to GPUs & CPUs, and seamless sharing of code.
More and more people are using Colab to take advantage of high-end computing resources without being restricted by their price. Loading data is the first step in any data science project. Often, loading data into Colab requires some extra setup or coding. In this article, you'll learn the 7 common ways to load external data into Google Colab. This article is structured as follows:
- Uploading file through Files explorer
- Uploading file using Colab files module
- Reading a file from Github
- Cloning a Github Repository
- Downloading files using Linux wget command
- Accessing Google Drive by mounting it locally
- Loading Kaggle Datasets
1. Uploading file through Files explorer
You can use the upload option at the top of the Files explorer to upload any file(s) from your local machine to Google Colab.
Here is what you need to do:
Step 1: Click the Files icon to open the "Files explorer" pane.
Step 2: Click the upload icon and select the file(s) you wish to upload from the "File Upload" dialog window.
Step 3: Once the upload is complete, you can read the file as you would normally. For example, pd.read_csv('Salary_Data.csv')
2. Uploading file using Colab files module
Instead of clicking through the GUI, you can also use Python code to upload files. Import the files module from google.colab, then call upload() to launch a "File Upload" dialog and select the file(s) you wish to upload.
from google.colab import files
uploaded = files.upload()
Once the upload is complete, your file(s) should appear in "Files explorer" and you can read the file as you would normally.
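In Colab, files.upload() returns a dictionary that maps each uploaded filename to its contents as bytes, so the data can also be parsed straight from memory. The sketch below assumes that shape and uses a hand-made stand-in dictionary so it also runs outside Colab. rows_from_upload is a hypothetical helper name, not part of google.colab, and the sample rows are made up.

```python
import csv
import io


def rows_from_upload(uploaded, name):
    """Parse one uploaded CSV (held as bytes) into a list of dict rows."""
    text = uploaded[name].decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for the value returned by files.upload(): {filename: bytes}.
# In Colab you would use: uploaded = files.upload()
uploaded = {
    "Salary_Data.csv": b"YearsExperience,Salary\n1.1,39343\n1.3,46205\n",
}

rows = rows_from_upload(uploaded, "Salary_Data.csv")
print(len(rows), "rows; first salary:", rows[0]["Salary"])
```

With Pandas, the same bytes can be wrapped in io.BytesIO and passed to pd.read_csv() instead.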
3. Reading file from Github
One of the easiest ways to read data is through Github. Click on the dataset in the Github repository, then click the "Raw" button.
Copy the raw data link and pass it to a function that can take a URL. For instance, pass a raw CSV URL to Pandas read_csv():
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/001-pandad-pipe-function/data/train.csv')
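Clicking "Raw" simply takes you to the same file on raw.githubusercontent.com, so the raw link can also be derived from the regular file page URL. The following is a minimal sketch of that conversion. to_raw_url is a hypothetical helper, and the repository and file in the example are placeholders rather than a real dataset.

```python
from urllib.parse import urlsplit


def to_raw_url(blob_url):
    """Turn a github.com '/blob/' file page URL into its raw form."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # The path looks like /<user>/<repo>/blob/<branch>/<path to file>
    user, repo, marker, rest = parts.path.lstrip("/").split("/", 3)
    if marker != "blob":
        raise ValueError("expected a '/blob/' file URL")
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


# Placeholder repository and file, for illustration only.
page = "https://github.com/some-user/some-repo/blob/main/data/train.csv"
raw = to_raw_url(page)
print(raw)
```

The resulting string can then be passed to pd.read_csv() in the same way as a link copied from the "Raw" button.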
4. Cloning a Github repository
You can also clone a Github repository into your Colab environment in the same way as you would on your local machine, using git clone.
!git clone https://github.com/BindiChen/machine-learning.git
Once the repository is cloned, you should be able to see its contents in "Files explorer" and you can simply read the files as you would normally.
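A cloned repository can contain many folders, so it may help to list the data files it holds before reading one. Here is a small sketch using pathlib. find_data_files is a hypothetical helper, and a temporary directory stands in for the cloned repository so the example is self-contained.

```python
import tempfile
from pathlib import Path


def find_data_files(repo_dir, pattern="*.csv"):
    """Return sorted paths (relative to repo_dir) of files matching pattern."""
    root = Path(repo_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file()
    )


# A temporary folder stands in for the directory created by git clone.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "data" / "train.csv").write_text("a,b\n1,2\n")
    (Path(tmp) / "README.md").write_text("demo")
    found = find_data_files(tmp)

print(found)  # ['data/train.csv']
```

In Colab you would pass the name of the cloned folder instead of the temporary directory.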
5. Downloading files from the web using Linux wget command
Since Google Colab lets you do everything you can in a locally hosted Jupyter Notebook, you can also use Linux shell commands like ls, dir, pwd, cd, etc. by prefixing them with !.
Among the available Linux commands, wget allows you to download files using the HTTP, HTTPS, and FTP protocols.
In its simplest form, when used without any option, wget will download the resource specified in the URL to the current directory, for example:
!wget https://example.com/cats_and_dogs_filtered.zip
Rename file
Sometimes, you may want to save the downloaded file under a different name. To do that, simply pass the -O option followed by the new name:
!wget https://example.com/cats_and_dogs_filtered.zip \
-O new_cats_and_dogs_filtered.zip
Save file to a specific location
By default, wget will save files in the current working directory. To save the file to a specific location, use the -P option:
!wget https://example.com/cats_and_dogs_filtered.zip \
-P /tmp/
Invalid HTTPS SSL certificate
If you want to download a file over HTTPS from a host that has an invalid SSL certificate, you can pass the --no-check-certificate option:
!wget https://example.com/cats_and_dogs_filtered.zip \
--no-check-certificate
Multiple files at once
If you want to download multiple files at once, use the -i option followed by the path to a file containing a list of the URLs to be downloaded. Each URL needs to be on a separate line.
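If the URLs already live in a Python list, the list file for -i can be generated from a notebook cell. The sketch below writes one URL per line and skips blanks and duplicates. write_url_list is a hypothetical helper, the hosts are placeholders, and the file is written to a temporary folder only to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path


def write_url_list(urls, list_path="dataset-urls.txt"):
    """Write one URL per line, skipping blanks and duplicates; return the path."""
    unique = []
    for url in urls:
        url = url.strip()
        if url and url not in unique:
            unique.append(url)
    path = Path(list_path)
    path.write_text("\n".join(unique) + "\n")
    return path


# Placeholder URLs, for illustration only.
urls = [
    "http://example-1.com/dataset.zip",
    "https://example-2.com/train.csv",
    "https://example-2.com/train.csv",  # duplicate, written once
    "",                                 # blank, skipped
]
list_file = write_url_list(urls, os.path.join(tempfile.mkdtemp(), "dataset-urls.txt"))
lines = list_file.read_text().splitlines()
print(lines)
```

In Colab, calling write_url_list(urls) with the default path places the file in the current working directory, where wget -i can find it.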
!wget -i dataset-urls.txt
The following is an example of dataset-urls.txt:
http://example-1.com/dataset.zip
https://example-2.com/train.csv
http://example-3.com/test.csv
6. Accessing Google Drive by mounting it locally
You can use the drive module from google.colab to mount your Google Drive in Colab.
from google.colab import drive
drive.mount('/content/drive')
When you execute the statement above, you will be given an authentication link and a text box to enter your authorization code.
Click the authentication link and follow the steps to generate your authorization code. Copy the code displayed and paste it into the text box. Once the drive is mounted, you should get a message like:
Mounted at /content/drive
After that, you should be able to explore the contents via "Files explorer" and read the data as you would normally.
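Once the drive is mounted, files are reached through ordinary paths under the mount point. The sketch below builds such a path and checks that it exists before reading. It assumes your own files appear under a "MyDrive" folder inside the mount point (older mounts used "My Drive"). drive_path and the datasets/train.csv location are hypothetical.

```python
from pathlib import Path

# Mount point passed to drive.mount() in the step above.
MOUNT_POINT = Path("/content/drive")


def drive_path(relative, mount_point=MOUNT_POINT, root_folder="MyDrive"):
    """Build the local path of a file stored in the mounted Google Drive.

    Assumption: personal files live under <mount_point>/MyDrive.
    """
    return mount_point / root_folder / relative


# Hypothetical file location inside your Drive.
csv_path = drive_path("datasets/train.csv")

if csv_path.exists():
    print("Ready to read:", csv_path)
else:
    print("Not found (is the drive mounted?):", csv_path)
```

The resulting path can be passed to pd.read_csv() like any local file.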
Finally, to unmount your Google Drive:
drive.flush_and_unmount()
7. Loading Kaggle datasets
It is possible to download any dataset seamlessly from Kaggle into your Google Colab. Here is what you need to do:
Step 1: Download your Kaggle API Token: Go to Account and scroll down to the API section.
By clicking "Create New API Token", a kaggle.json file will be generated and downloaded to your local machine.
Step 2: Upload kaggle.json to your Colab project: for instance, you can import the files module from google.colab and call upload() to launch a File Upload dialog and select kaggle.json from your local machine.
Step 3: Update the KAGGLE_CONFIG_DIR path to the current working directory. You can run !pwd to get the current working directory and assign the value to os.environ['KAGGLE_CONFIG_DIR'].
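Step 3 can be sketched in Python as follows. Instead of copying the output of !pwd by hand, this version reads the working directory with os.getcwd(). It assumes kaggle.json was uploaded to the current working directory, and the existence check is only a convenience, not something the Kaggle API requires.

```python
import os

# Assumption: kaggle.json was uploaded to the current working directory.
config_dir = os.getcwd()
os.environ["KAGGLE_CONFIG_DIR"] = config_dir

# Convenience check that the token file is where the Kaggle API will look.
token_path = os.path.join(config_dir, "kaggle.json")
token_found = os.path.isfile(token_path)
print("KAGGLE_CONFIG_DIR =", config_dir)
print("kaggle.json found:", token_found)
```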
Step 4: Finally, you should be able to run the following Kaggle API commands to download datasets:
!kaggle competitions download -c titanic
!kaggle datasets download -d alexanderbader/forbes-billionaires-2021-30
Note: for a competition dataset, the Kaggle API command should be available under the Data tab.
For a general dataset, the Kaggle API command can be copied from the dataset's page.
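Kaggle downloads usually arrive as .zip archives, so a typical next step is to extract them before reading the files. Below is a minimal sketch using the standard zipfile module. extract_archive is a hypothetical helper, and a tiny archive is created on the fly as a stand-in for a real download such as titanic.zip.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Build a small stand-in archive so the example is self-contained.
work = Path(tempfile.mkdtemp())
demo_zip = work / "titanic.zip"
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("train.csv", "PassengerId,Survived\n1,0\n")
    archive.writestr("test.csv", "PassengerId\n892\n")

members = extract_archive(demo_zip, work / "titanic")
print(members)  # ['test.csv', 'train.csv']
```

After extraction, the CSV files can be read from the destination folder as you would normally.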
Conclusion
Google Colab is a great tool for individuals who want to take advantage of high-end computing resources (like GPUs and TPUs) without being restricted by their price.
In this article, we have gone through most of the ways you can improve your Google Colab experience by loading external data into Google Colab. I hope this article will help you save time in learning Colab and Data Analysis.
Thanks for reading. Stay tuned if you are interested in the practical aspect of machine learning.
You may be interested in some of my Pandas articles:
- 10 tricks for Converting numbers and strings to datetime in Pandas
- Using Pandas method chaining to improve code readability
- How to do a Custom Sort on Pandas DataFrame
- All the Pandas shift() you should know for data analysis
- When to use Pandas transform() function
- Pandas concat() tricks you should know
- Difference between apply() and transform() in Pandas
- All the Pandas merge() you should know
- Working with datetime in Pandas DataFrame
- Pandas read_csv() tricks you should know
- 4 tricks you should know to parse date columns with Pandas read_csv()
More tutorials can be found on my Github
Source: https://towardsdatascience.com/7-ways-to-load-external-data-into-google-colab-7ba73e7d5fc7