Jupyter Notebook and Python code to query Twitter data using the Twitter API v2.
Submit "simple" queries to the following Twitter API v2 endpoints:
- Tweet Counts Recent - See: Tweet Counts API Reference
- Search Tweets Recent - See: Search Tweets API Reference
The query results are returned in a pandas DataFrame, hence the assumption of "simple" queries that return flat, non-nested "tabular" results.
The code handles making iterative API requests by following the "next_token" keys returned by Twitter. Some limited HTTP request error handling is also built in to allow the code to fail silently if the HTTP status code is not 200. Finally, a log file is generated automatically so the API request iterations can be tracked for audit purposes.
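At its core, the pagination loop looks roughly like the sketch below. This is an illustration only (the function and variable names are assumptions, not the exact code in the notebooks), but it shows the pattern of following "next_token" keys and stopping on a non-200 status code:

```python
import os
import time
import requests
import pandas as pd

def paginate(url, params, max_pages=10):
    """Illustrative pagination loop: follow "next_token" keys until exhausted."""
    headers = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}
    records = []
    next_token = None
    for _ in range(max_pages):
        if next_token:
            params["next_token"] = next_token
        response = requests.get(url, headers=headers, params=params)
        if response.status_code != 200:
            # Fail silently: record the status in the log and stop iterating
            print(f"Request failed: {response.status_code} {response.text}")
            break
        payload = response.json()
        records.extend(payload.get("data", []))
        next_token = payload.get("meta", {}).get("next_token")
        if not next_token:
            break
        time.sleep(1)  # small pause between paginated requests
    return pd.DataFrame(records)
```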
You need to apply for a Twitter Developer account with "Elevated" access.
See: How to get access to the Twitter API
With elevated access you will be allowed to query "recent" Twitter data for the last 7 days.
Then using your (new) Twitter Developer account you need to generate a Twitter API "Bearer Token" for authentication.
Once you have your Twitter Bearer Token, copy and paste it into the attached "twitter_bearer_token.py" file as follows:

```python
import os

def set_environment_variable():
    os.environ["BEARER_TOKEN"] = "<Paste your Bearer Token here inside the quotation marks>"
    return
```
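The notebooks can then read the token back from the environment to build the authorisation header for each request. A minimal sketch (the exact wiring inside the notebooks may differ):

```python
import os
import twitter_bearer_token  # the file edited above

# Put the token into the environment, then read it back
twitter_bearer_token.set_environment_variable()
bearer_token = os.environ["BEARER_TOKEN"]

# The Twitter API v2 expects the token in an OAuth 2.0 Bearer header
headers = {"Authorization": f"Bearer {bearer_token}"}
```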
The underlying Python code used for accessing the Twitter endpoints is sourced from twitterdev on GitHub here:
In particular:
Recent Tweet Counts Python code
The source Twitter API v2 sample code is used under this license. Note that the Twitter sample source code has been modified for use here.
Two Jupyter Notebooks are provided as the main entry points to run the code:
- twitter_count_recent_main.ipynb
- twitter_search_recent_main.ipynb
These can be run directly from Jupyter. Alternatively, the main() function within each notebook may be called from an external application.
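If you want to call main() from an external application, one approach is to export the notebook to a plain Python module first. The sketch below is hypothetical (the actual main() signature and return value are defined inside each notebook):

```python
# Hypothetical sketch: export the notebook to a module first, e.g.
#   jupyter nbconvert --to script twitter_search_recent_main.ipynb
# then import the generated module and call its main() function.
import twitter_search_recent_main

# Assumes main() returns the results as a pandas DataFrame
df = twitter_search_recent_main.main()
print(df.head())
```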
For more information about the available Twitter API Query Parameters see the following:
Building queries for Tweet Counts
Building queries for Search Tweets
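As an illustration, a typical set of query parameters for the two endpoints might look like this (adjust the query operators and fields to your own needs):

```python
# Illustrative parameters for the recent search endpoint.
# See the "Building queries" guides above for the full operator syntax.
search_url = "https://api.twitter.com/2/tweets/search/recent"
search_params = {
    "query": "(python OR pandas) -is:retweet lang:en",  # query operators
    "max_results": 100,                                  # 10-100 per request
    "tweet.fields": "created_at,public_metrics,lang",    # extra tweet fields
}

# Illustrative parameters for the recent tweet counts endpoint.
counts_url = "https://api.twitter.com/2/tweets/counts/recent"
counts_params = {
    "query": "(python OR pandas) -is:retweet lang:en",
    "granularity": "day",  # minute, hour, or day buckets
}
```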
Note that the Twitter API v2 rate limits are currently set as follows:
- Tweet Counts Recent: 300 requests per 15 minute window
- Search Tweets Recent: 450 requests per 15 minute window
See: Twitter API v2 Rate Limits
These rate limits mainly affect the searching of tweets. The maximum number of tweets (the "max_results" query parameter) that can be returned per request is 100, so in a 15 minute window you can return at most 450 x 100 = 45,000 tweets. You will then need to wait for the next 15 minute window before trying again.
Also in the case of Search Tweets there is a tweet cap of 2 million tweets per month (for "elevated" access).
See: Twitter API v2 Tweet Caps
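If you do hit a rate limit, the API responds with HTTP status 429 and an "x-rate-limit-reset" header giving the time the window resets. The repo code simply fails silently in this case, but a simple retry wrapper (not part of the repo, shown here as a sketch) could look like this:

```python
import time
import requests

def get_with_rate_limit(url, headers, params):
    """Retry on HTTP 429 by waiting until the rate-limit window resets."""
    while True:
        response = requests.get(url, headers=headers, params=params)
        if response.status_code != 429:
            return response
        # x-rate-limit-reset is a Unix timestamp for the end of the window
        reset_at = int(response.headers.get("x-rate-limit-reset", time.time() + 900))
        wait_seconds = max(reset_at - int(time.time()), 0) + 1
        print(f"Rate limited; sleeping {wait_seconds} seconds")
        time.sleep(wait_seconds)
```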
Each time the main() code is run (either using twitter_count_recent_main.ipynb or twitter_search_recent_main.ipynb) a new log file will be automatically generated.
This is achieved by making two separate calls to a function in the attached cedarlog.py module. The first call (at the beginning of the main program) generates a log file name, then redirects the Python standard output to this log file. The second call (at the end of the main program) closes the log file, then reads the saved contents back to the default standard output.
Note: you will first need to manually create a folder to store the log files. The folder should be relative to the location of the Python files (e.g. /log_files).
For more information please see: Cedarlog Repo
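Conceptually, the logging pattern is simply a redirection of Python's standard output to a timestamped file and back again. The sketch below illustrates the idea only; the actual implementation lives in the cedarlog.py module:

```python
import sys
from datetime import datetime

# Illustration only: assumes the log_files folder already exists
log_path = f"log_files/run_{datetime.now():%Y%m%d_%H%M%S}.log"

# First call: open the log file and redirect standard output to it
log_file = open(log_path, "w")
original_stdout = sys.stdout
sys.stdout = log_file

print("API request iterations are written here for audit purposes")

# Second call: restore stdout, then echo the saved log back to the console
sys.stdout = original_stdout
log_file.close()
print(open(log_path).read())
```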
Happy to respond to any comments and answer any questions. Please feel free to drop me a line.
Copyright 2022 Cedarwood Insights Limited.
Licensed under the Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0