Todo List Check
- Find a website to collect data from
- Use BeautifulSoup to parse the raw HTML data
- Store data in CSV files
- Command-line user interface
- Provide answers to each question
- Print beautifier
- Add wait time to mimic data fetching
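
The parsing and storage steps in this list can be sketched roughly as follows. This is only an illustration, not the project's actual code: the HTML snippet, the `channel`, `name`, and `subs` class names, and the helper functions `parse_channels` and `rows_to_csv` are all made up for the example, and the CSV is written to an in-memory buffer instead of a real file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical HTML; the real site and its markup are not shown in these notes.
HTML = """
<table>
  <tr class="channel"><td class="name">Alpha</td><td class="subs">1.2M</td></tr>
  <tr class="channel"><td class="name">Beta</td><td class="subs">850K</td></tr>
</table>
"""


def parse_channels(html):
    """Return a list of (channel_name, subs) tuples found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tr.channel"):
        name = tr.select_one("td.name").get_text(strip=True)
        subs = tr.select_one("td.subs").get_text(strip=True)
        rows.append((name, subs))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["channel_name", "subs"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_channels(HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Swapping the `io.StringIO` buffer for `open("channels.csv", "w", newline="")` would write the same rows to disk (the file name is also just an example).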
What I learned
- A new Python package called pandas
- Merging two different CSV files using pd.merge
- Applying a lambda function to all the values of a specific column
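
A small sketch of the last two points, merging two CSV sources and applying a function to one column. The CSV contents and the `channel_id` key are invented for the example, and this `format_numbers` is only a guess at what such a helper might do (turning strings like `1.2M` into integers); the project's real implementation is not shown here.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the two scraped files.
CHANNELS_CSV = "channel_id,channel_name\n1,Alpha\n2,Beta\n3,Gamma\n"
STATS_CSV = "channel_id,subs\n1,1.2M\n2,850K\n3,40K\n"


def format_numbers(text):
    """Assumed helper: convert '1.2M' or '850K' into an integer."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(text)


channels = pd.read_csv(io.StringIO(CHANNELS_CSV))
stats = pd.read_csv(io.StringIO(STATS_CSV))

# Join the two tables on their shared key column.
data = pd.merge(channels, stats, on="channel_id")

# Apply the helper to every value of the 'subs' column.
data["subs_new"] = data["subs"].apply(format_numbers)
print(data)
```

Passing the function directly to `apply` is equivalent to wrapping it in `lambda x: format_numbers(x)`.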
Setbacks
- I ran into a lot of bugs while working with pandas. For example, when I was using sort_values:
# works
tmp = data.sort_values(by=['subs_new'], ascending=False)['channel_name'][0:10].to_string(index=False)
# doesn't work: raises a KeyError
tmp = data.sort_values(by=['subs_new'], ascending=False)['channel_name','subs'][0:10].to_string(index=False)
print(tmp)
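It turns out I can't use ['channel_name', 'subs'] inside single brackets like this: pandas treats the pair as one tuple key and looks for a single column with that label. The right way is to pass a list of column names inside double brackets: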
data['subs_new'] = data['subs'].apply(lambda x: format_numbers(x))
tmp = data.sort_values(by=['subs_new'], ascending=False)
print_beautifier(tmp[['channel_name', 'subs']][0:10].to_string(index=False))
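
The difference between the two forms can be reproduced on a tiny DataFrame. The data below is made up, and the sketch only illustrates the indexing rule, not the project's own code.

```python
import pandas as pd

data = pd.DataFrame({
    "channel_name": ["Alpha", "Beta", "Gamma"],
    "subs": ["40K", "1.2M", "850K"],
    "subs_new": [40_000, 1_200_000, 850_000],
})
ranked = data.sort_values(by=["subs_new"], ascending=False)

# Single brackets with two names: pandas receives the tuple
# ('channel_name', 'subs') and looks for ONE column with that label.
try:
    ranked["channel_name", "subs"]
    single_bracket_failed = False
except KeyError:
    single_bracket_failed = True

# Double brackets: the inner list selects several columns at once.
top = ranked[["channel_name", "subs"]][0:2]

print(single_bracket_failed)
print(top.to_string(index=False))
```

Selecting a single column works with one pair of brackets because the key is then a plain string, which is why the first sort_values line ran without errors.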