Final Project Update

by zengjilie

01 Dec 2022

Todo List Check

  • Find a website to collect data from
  • Use BeautifulSoup to parse the raw HTML data
  • Store data in csv files
  • Command-line user interface
  • Provide answers to each question
  • Print beautifier
  • Add wait time to mimic data fetching

What I Learned

  • A new Python package called pandas
  • Merging two different CSV files using pd.merge
  • Applying a lambda function to all the values of a specific column
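
As a rough illustration of the last two points, here is a minimal sketch of merging two tables and then applying a lambda to one column. The data frames, the column names (channel_id, channel_name, subs, subs_int) and the values are all made up for this example; in the project they would come from the CSV files via pd.read_csv.

```python
import pandas as pd

# Made-up sample data standing in for two CSV files (all names are assumptions).
channels = pd.DataFrame({
    "channel_id": [1, 2, 3],
    "channel_name": ["alpha", "beta", "gamma"],
})
stats = pd.DataFrame({
    "channel_id": [1, 2, 4],
    "subs": ["1,200", "350", "99"],
})

# Inner merge keeps only the rows whose channel_id appears in both tables.
merged = pd.merge(channels, stats, on="channel_id", how="inner")

# Apply a lambda to every value of one column: strip commas and convert to int.
merged["subs_int"] = merged["subs"].apply(lambda s: int(s.replace(",", "")))

print(merged.to_string(index=False))
```

An inner merge drops channel 3 and channel 4 here because each appears in only one table; how="left" or how="outer" would keep them, with missing values filled in as NaN.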

Setbacks

  • I ran into a lot of bugs while working with pandas. For example, when I was using sort_values:
# works
tmp = data.sort_values(by=['subs_new'], ascending=False)['channel_name'][0:10].to_string(index=False)
# doesn't work: 'channel_name','subs' is read as one tuple key, so pandas raises a KeyError
tmp = data.sort_values(by=['subs_new'], ascending=False)['channel_name','subs'][0:10].to_string(index=False)
print(tmp)

It turns out I can't use ['channel_name', 'subs'] directly like this. The right way is to select the columns with double brackets, which passes a list of column names:

data['subs_new'] = data['subs'].apply(lambda x: format_numbers(x))
tmp = data.sort_values(by=['subs_new'], ascending=False)
print_beautifier(tmp[['channel_name', 'subs']][0:10].to_string(index=False))
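
To make the difference concrete, here is a small self-contained sketch. The sample data is invented, and this format_numbers is only a hypothetical stand-in for the project's helper (assumed here to turn strings like '1.5M' into numbers); print_beautifier is left out and plain print is used instead.

```python
import pandas as pd


def format_numbers(text):
    """Hypothetical helper: convert strings such as '1.5M' or '900K' to floats."""
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    text = str(text).strip().upper()
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


# Invented sample data for illustration only.
data = pd.DataFrame({
    "channel_name": ["alpha", "beta", "gamma"],
    "subs": ["900K", "1.5M", "20K"],
})
data["subs_new"] = data["subs"].apply(format_numbers)
ranked = data.sort_values(by=["subs_new"], ascending=False)

# Single brackets with a comma build the tuple ('channel_name', 'subs'),
# which pandas looks up as ONE column label, so the lookup fails.
try:
    ranked["channel_name", "subs"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# Double brackets pass a list of labels and return a DataFrame with both columns.
top = ranked[["channel_name", "subs"]][0:10]
print(top.to_string(index=False))
```

Sorting on the numeric subs_new column while displaying the original subs strings keeps the output readable without changing the ranking.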

Demo
