Scraping Photos and Descriptions from an Instagram account using Python

Darshan Majithiya
3 min read · Jan 31, 2019

Recently, I had to learn web scraping for one of my ongoing college projects. To put what I learned into practice, I built this project and wrote my first Medium blog post about it.

What can this project do?

It can scrape photos from any Instagram account (of course, only if you follow that account or it’s a public account) and write the description of each photo to an Excel sheet.

Dependencies

You need Python 3.x on your system to run this program. There are also a handful of libraries you need to install beforehand. You can easily install them using pip, the Python package manager.

  • requests 2.x
  • xlsxwriter 1.x
  • BeautifulSoup 4.x
  • lxml 4.x (for parsing the data, although you can use any other parser as well)
  • Selenium 3.x

You will also need a browser driver so that Selenium can control the browser. I’ll be using ChromeDriver for Linux.

You can download ChromeDriver from the following page. Select the latest release, then download the package for your operating system (Linux, Mac, or Windows): https://sites.google.com/a/chromium.org/chromedriver/downloads

If you prefer another browser or run a different OS, use the driver built for that browser and OS. You can change the path to the driver in instaScraper.py by changing the value of self.driver.
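
To give a rough idea of what pointing Selenium at a driver binary can look like, here is a minimal sketch assuming Selenium 3.x as listed above. The CHROMEDRIVER_PATH value and the make_driver helper are my own illustrative names and are not taken from instaScraper.py.

```python
import os

# Hypothetical location of the downloaded driver binary; adjust it for your machine.
CHROMEDRIVER_PATH = os.path.join(os.path.expanduser("~"), "drivers", "chromedriver")


def make_driver(driver_path=CHROMEDRIVER_PATH):
    """Start Chrome through the driver binary found at driver_path."""
    if not os.path.isfile(driver_path):
        raise FileNotFoundError("No browser driver found at " + driver_path)
    # Imported here so the path check above works even without Selenium installed.
    from selenium import webdriver
    # Selenium 3.x accepts the driver location through executable_path.
    return webdriver.Chrome(executable_path=driver_path)
```

Later Selenium releases pass the driver location differently, so treat the executable_path form as specific to the 3.x line.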

Execution

You can clone the project code by running the following in the terminal, or simply download it.

git clone https://github.com/darshan-majithiya/Scraping-Photos-and-Descriptions-from-an-Instagram-account.git

To execute the program, run the main.py file in the terminal:

python main.py

Implementation Structure

  • First, you’ll need to enter your account Username and Password.
  • Then enter the target username. (Remember, you must either follow that account or it must be a public account.)
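
The two prompts above could be sketched roughly as follows. This is only an illustration: read_inputs and its parameters are hypothetical names, and the real main.py may collect the values differently.

```python
from getpass import getpass


def read_inputs(ask=input, ask_secret=getpass):
    """Collect the login credentials and the target account name."""
    username = ask("Your Instagram username: ").strip()
    # getpass keeps the password from being echoed to the terminal.
    password = ask_secret("Your Instagram password: ")
    # Accept "@name" as well as "name" for the target account.
    target = ask("Target username: ").strip().lstrip("@")
    if not (username and password and target):
        raise ValueError("Username, password and target username are all required.")
    return username, password, target
```

Passing the prompt functions in as parameters makes the helper easy to try out without typing anything at a real terminal.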

Execution of code

As you enter the credentials and target username, the program will execute as:

  • Opens the Instagram homepage and logs in to your account.
  • Creates the data/target_username/descriptions and data/target_username/images folders if they don’t already exist.
  • Closes the “Turn on Notifications” dialog box if it appears.
  • Redirects to the target user’s profile.
  • Gets the number of posts.
  • Loads all the posts and fetches the URL of each image.
  • Writes the description of each image to the Excel sheet.
  • Downloads the target account’s images.
  • Ends the execution.
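
The folder, Excel, and download steps above can be sketched as below. It is a minimal sketch under my own assumptions, not the actual code from the repository: the helper names, the image_<n>.jpg file naming, and the descriptions.xlsx file name are all made up for illustration.

```python
import os


def make_output_dirs(target_username, base_dir="data"):
    """Create data/<target>/descriptions and data/<target>/images if missing."""
    descriptions_dir = os.path.join(base_dir, target_username, "descriptions")
    images_dir = os.path.join(base_dir, target_username, "images")
    for path in (descriptions_dir, images_dir):
        os.makedirs(path, exist_ok=True)  # no error when the folder already exists
    return descriptions_dir, images_dir


def fetch_bytes(url):
    """Download one URL with requests and return the raw response body."""
    import requests  # imported lazily so the other helpers work without it
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def download_images(image_urls, images_dir, fetch=fetch_bytes):
    """Save each image as image_<n>.jpg (assumed naming) and return the paths."""
    saved = []
    for index, url in enumerate(image_urls):
        path = os.path.join(images_dir, "image_{}.jpg".format(index))
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved


def write_descriptions(descriptions, descriptions_dir, filename="descriptions.xlsx"):
    """Write one row per photo: image name in column A, description in column B."""
    import xlsxwriter  # lazy import, same reason as in fetch_bytes
    path = os.path.join(descriptions_dir, filename)
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    worksheet.write(0, 0, "Image name")
    worksheet.write(0, 1, "Description")
    for row, text in enumerate(descriptions, start=1):
        worksheet.write(row, 0, "image_{}.jpg".format(row - 1))
        worksheet.write(row, 1, text)
    workbook.close()  # the file is only written to disk on close
    return path
```

The fetch parameter lets you swap the network call for a stub while experimenting, so you can check the file layout without hitting Instagram at all.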

Now our job is complete! You can simply run the code, output the images & descriptions, and pray Instagram doesn’t kick your butt off. :D

Restrictions

  • Instagram’s page markup tends to change from time to time, so you might need to update some of the XPath expressions when you run this code.
  • Turn off two-factor authentication in your Instagram account.
  • This program won’t download videos.
  • A faster internet connection reduces the program’s execution time. If your connection is slow, you can raise the wait time from 10 seconds to whatever suits you in every line of instaScraper.py that matches the code below:
    WebDriverWait(self.driver,10)
  • You must either follow the target account or it must be a public account.
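
One way to make the first and fourth restrictions less painful is to keep the XPath expressions and the wait time in a single place. The sketch below assumes that approach. It is not how instaScraper.py is organized, and the XPath strings are placeholders that you would have to replace after inspecting the live page.

```python
# All values below are placeholders: Instagram's markup changes often,
# so the real expressions have to be read from the live page.
WAIT_SECONDS = 10  # raise this on a slow connection

XPATHS = {
    "notification_dismiss": "//button[text()='Not Now']",
    "post_count": "//header//li[1]//span",
}


def wait_for(driver, name, timeout=WAIT_SECONDS):
    """Block until the element registered under `name` is present, then return it."""
    xpath = XPATHS[name]  # raises KeyError for an unregistered name
    # Lazy imports keep the settings above usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath)
    return WebDriverWait(driver, timeout).until(EC.presence_of_element_located(locator))
```

With this layout, a markup change means editing one dictionary entry, and a slow connection means changing one number.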

If you enjoyed this article, feel free to clap many times (you know you want to!) and share with a friend. You can also leave a comment to ask a question or tell me how to improve. :)

Darshan Majithiya is a final year IT engineering student. I’m passionate about Data Science and Web Development. I believe Data and memorable User Experience are the two most important pillars of any intelligent product. Connect with me on LinkedIn or say hi on Twitter.
