202205.csv contains only 859188 records instead of 1M #5

Open
yohhaan opened this issue Apr 2, 2024 · 0 comments

yohhaan commented Apr 2, 2024

Hello,

Thank you for maintaining this repository and the cached versions of crux-top-list.

202205.csv contains only 859,188 records instead of the usual 1M. Can the corresponding list be regenerated and updated here, or is the data also missing from Google's BigQuery dataset?
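One way to answer the second question is to count origins per rank bucket directly in BigQuery. This is a minimal sketch that assumes the monthly CrUX table is `chrome-ux-report.all.YYYYMM` and exposes the bucket as `experimental.popularity.rank`. Those table and field names are my assumption, not something this repository confirms. Running the query also needs the `google-cloud-bigquery` package and GCP credentials. The helper names `build_rank_count_query` and `run_rank_count` are hypothetical.

```python
# Hypothetical helpers (names are illustrative, not part of this repo).
# Assumption: the monthly CrUX table is `chrome-ux-report.all.YYYYMM` and
# exposes the rank bucket as `experimental.popularity.rank`.


def build_rank_count_query(yyyymm: str) -> str:
    """Return SQL that counts distinct origins per rank bucket for one month."""
    if len(yyyymm) != 6 or not yyyymm.isdigit():
        raise ValueError(f"expected a YYYYMM string, got {yyyymm!r}")
    # DISTINCT because the table may hold several rows per origin
    # (e.g. one per form factor / connection type).
    return (
        "SELECT experimental.popularity.rank AS rank_bucket,\n"
        "       COUNT(DISTINCT origin) AS origins\n"
        f"FROM `chrome-ux-report.all.{yyyymm}`\n"
        "GROUP BY rank_bucket\n"
        "ORDER BY rank_bucket"
    )


def run_rank_count(yyyymm: str) -> dict:
    """Run the query and return {rank_bucket: distinct_origin_count}."""
    # Imported lazily: requires google-cloud-bigquery and
    # application-default GCP credentials with a billing project.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(build_rank_count_query(yyyymm)).result()
    return {row["rank_bucket"]: row["origins"] for row in rows}


if __name__ == "__main__":
    print(build_rank_count_query("202205"))
```

For comparison, the CSV's own bucket counts below add up to exactly its row count (904 + 7,806 + 76,566 + 773,912 = 859,188). If the upstream counts for 202205 are similarly short, the gap is in the source data rather than in the cached file.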

```python
>>> import pandas as pd
>>> df = pd.read_csv("202205.csv")
>>> df
                                       origin     rank
0                          http://iporntv.net     1000
1       https://eldenring.wiki.fextralife.com     1000
2                 https://m.lightinthebox.com     1000
3                          https://ssc.nic.in     1000
4                  https://ja.m.wikipedia.org     1000
...                                       ...      ...
859183    https://www.vulcaodaborracha.com.br  1000000
859184                     https://www.vub.be  1000000
859185     https://www.virginianaturalgas.com  1000000
859186         https://www.virtualregatta.com  1000000
859187                https://zamosc.lento.pl  1000000

[859188 rows x 2 columns]
>>> df.groupby("rank").nunique()
         origin
rank           
1000        904
10000      7806
100000    76566
1000000  773912
```
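To check whether other months are affected as well, a small audit script can scan every monthly file. This is a sketch that assumes the files sit in one directory, are named `YYYYMM.csv`, and use the `origin,rank` header shown above. `audit_month`, `find_short_months` and `EXPECTED_ROWS` are hypothetical names. The script uses `value_counts()` on `rank`, so it counts rows per bucket rather than unique origins as `nunique()` does. It also reports duplicated origins separately.

```python
from pathlib import Path

import pandas as pd

# Assumption: the nominal size of each monthly list, per this issue.
EXPECTED_ROWS = 1_000_000


def audit_month(path: Path) -> dict:
    """Summarise one YYYYMM.csv file with `origin,rank` columns."""
    df = pd.read_csv(path)
    return {
        "file": path.name,
        "rows": len(df),
        # Rows whose origin already appeared earlier in the same file.
        "duplicate_origins": int(df["origin"].duplicated().sum()),
        # Row count per rank bucket (1000, 10000, ...), sorted by bucket.
        "per_bucket": df["rank"].value_counts().sort_index().to_dict(),
    }


def find_short_months(directory: str, expected: int = EXPECTED_ROWS) -> list:
    """Return audit reports for every YYYYMM.csv whose row count != expected."""
    pattern = "[0-9]" * 6 + ".csv"  # matches e.g. 202205.csv, skips other CSVs
    reports = [audit_month(p) for p in sorted(Path(directory).glob(pattern))]
    return [r for r in reports if r["rows"] != expected]


if __name__ == "__main__":
    for report in find_short_months("."):
        print(report["file"], report["rows"], report["per_bucket"])
```

Run from the directory holding the CSVs, it prints one line per month whose size is off. A single run therefore shows whether 202205 is the only affected month.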

Thanks!
