Final Project Update

by zengjilie

07 Dec 2022

Todo List Check

  • Find a website to collect data from
  • Using BeautifulSoup to parse HTML raw data
  • Store data in csv files
  • Command-line user interface
  • Provide answers to each question
  • Print beautifier
  • Add wait time to mimic data fetching

Reflections

This was a very challenging but also fun task for me. I’m a huge YouTube obsessive, so I chose YouTube facts as the topic. I spent a lot of time looking for the right data for my project. Unfortunately, the only website that met my needs, “socialblade.com”, does not offer its data as a free resource, so I decided to use a Python scraper to extract the data from the website.

Did I achieve my goals?

Yes, especially the HTML parsing part. Using regular expressions was one of the most difficult parts of this project.
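To give a flavor of the regular-expression work involved, here is a minimal sketch that turns abbreviated counts such as "1.5M" into integers. The input format, the pattern, and the parse_count helper are all assumptions made for illustration; they are not taken from the project's actual code.

```python
import re

# Hypothetical pattern: digits (with optional commas and decimals),
# followed by an optional K/M/B suffix.
COUNT_RE = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s*([KMB])?\s*$", re.IGNORECASE)
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text):
    """Convert a string like '1.5M' or '12,345' to an int, or None if it does not match."""
    match = COUNT_RE.match(text)
    if match is None:
        return None
    number = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").upper()
    # round() absorbs small floating-point errors such as 2.3 * 1000.
    return round(number * MULTIPLIERS[suffix])


print(parse_count("1.5M"))
print(parse_count("12,345"))
```

Returning None for unmatched text keeps malformed cells from crashing the scraper, at the cost of having to check for None later.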

Setbacks

I shouldn’t have used a traditional for loop.

all_lines = ...
for i in range(len(all_lines)):
  all_lines[i] = '|' + all_lines[i][:div_index] + '| ' + all_lines[i][div_index:] + '|\n'

Instead, I could have used a list comprehension, which does the same thing in one line and is a cleaner approach:

all_lines = ['|' + i[:div_index] + '| ' + i[div_index:] + '|\n' for i in all_lines]
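As a quick check that the two approaches are equivalent, here is a small runnable sketch. The sample rows and the value of div_index are made up for illustration, and the f-string variant is just one more way to write the same expression.

```python
# Hypothetical sample data; the real project reads these lines from scraped output.
div_index = 5
all_lines = ["Rank Channel", "1    T-Series"]

# Traditional index-based loop, working on a copy so the input stays intact.
loop_result = list(all_lines)
for i in range(len(loop_result)):
    loop_result[i] = '|' + loop_result[i][:div_index] + '| ' + loop_result[i][div_index:] + '|\n'

# List comprehension: same result without index bookkeeping.
comp_result = ['|' + line[:div_index] + '| ' + line[div_index:] + '|\n' for line in all_lines]

# Same comprehension written with an f-string.
fstring_result = [f"|{line[:div_index]}| {line[div_index:]}|\n" for line in all_lines]

print("".join(comp_result), end="")
```

All three lists hold the same strings; the comprehension also avoids mutating the list while indexing into it.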

What have I learned?

  • Beautiful Soup
  • Pandas
  • Regular expressions
  • f-strings

What could be done better?

I could add some data visualization to make the interface more engaging.

Demo
