This repository has been archived by the owner on Feb 23, 2022. It is now read-only.

xdevplatform/Gnip-Analysis-Tools


The gnip_analysis_tools Package

This package provides class definitions for configuring the scripts in the Gnip-Analysis-Pipeline package. The examples below assume that you work from a working directory (we'll call this "TEST"), and that both gnip_analysis_pipeline and gnip_analysis_tools are installed as packages. Each package can be installed from its cloned repo location with:

[REPO] $ pip install -e .

Enrichments

According to the Gnip-Analysis-Pipeline docs, we configure enrichments by defining the enrichment_class_list variable in a configuration file.

The enrichments directory in this package contains files that define a base enrichment class along with some other helpful enrichment classes, including a simple example. To use the test enrichment from your working directory, you would create an enrichments configuration file (called my_enrichments.py):

from gnip_analysis_tools.enrichments import test_enrichment

enrichment_class_list = [test_enrichment.TestEnrichment]

We can then enrich the Tweets in my_tweets.json as follows:

[TEST] $ cat my_tweets.json | tweet_enricher.py -c my_enrichments.py > my_enriched_tweets.json

To configure an NLP enrichment with NLTK, we provide nltk_enrichment.py, which can be configured like:

from gnip_analysis_tools.enrichments import nltk_enrichment

enrichment_class_list = nltk_enrichment.nltk_enrichments_list

Notice that this module has conveniently defined the list of enrichment classes.

A custom enrichment class can be defined locally:

from gnip_analysis_tools.enrichments import enrichment_base

class MyEnrichment(enrichment_base.BaseEnrichment):
    def enrichment_value(self, tweet):
        # return the value to attach to each Tweet
        return "my_test_enrichment_value"

enrichment_class_list = [MyEnrichment]
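
As a slightly less trivial illustration, the sketch below computes a value from the Tweet itself rather than returning a constant. It is not taken from the package: the BaseEnrichment class here is a stand-in so the snippet runs without gnip_analysis_tools installed, and its enrich method (including where it stores the value) is an assumption, as are the BodyWordCount name and the "body" payload field. In a real configuration file you would inherit from the package's base class instead.

```python
class BaseEnrichment:
    """Stand-in for the package's base class (assumed behavior, not the real API)."""

    def enrich(self, tweet):
        # Assumption: store the value under the enrichment's class name.
        tweet.setdefault("enrichments", {})[type(self).__name__] = self.enrichment_value(tweet)
        return tweet


class BodyWordCount(BaseEnrichment):
    """Hypothetical enrichment: count whitespace-separated tokens in the Tweet text."""

    def enrichment_value(self, tweet):
        # "body" is assumed to hold the Tweet text; a missing field counts as zero words.
        return len(tweet.get("body", "").split())


enrichment_class_list = [BodyWordCount]

# Try the enrichment on a hand-made Tweet dictionary.
sample_tweet = {"body": "hello from the gnip pipeline"}
enriched = BodyWordCount().enrich(sample_tweet)
```

Keeping the per-Tweet logic inside enrichment_value means the class stays easy to test in isolation, as done here with a plain dictionary.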

Measurements

According to the Gnip-Analysis-Pipeline docs, we configure measurements by defining the measurement_class_list variable in a configuration file.

The measurements directory in this package contains files that define a variety of base/helper classes for constructing measurement classes. To use the test measurements from your working directory, you would create a measurements configuration file (called my_measurements.py):

from gnip_analysis_tools.measurements.test_measurements import TweetCounter,ReTweetCounter

measurement_class_list = [TweetCounter,ReTweetCounter]

We can then build time series from the Tweets in my_enriched_tweets.json as follows:

[TEST] $ cat my_enriched_tweets.json | tweet_time_series_builder.py -c my_measurements.py > time_series.csv

(Note that none of the enrichments we added in the previous section are required to build the specified time series.)

To construct a time series for each observed hashtag, we can define a class locally that inherits key functionality from classes in measurement_base.py:

from gnip_analysis_tools.measurements.measurement_base import Counters

class HashtagCounters(Counters):
    def update(self, tweet):
        for item in tweet['twitter_entities']['hashtags']:
            # put a # in front of the term,
            # since they've been removed in the payload
            self.counters['#' + item['text']] += 1

measurement_class_list = [HashtagCounters]

See measurement_base.py for a full description of how to create custom measurement classes.
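
To make the hashtag-counting logic concrete without installing the package, here is a minimal, self-contained sketch. The Counters class below is a stand-in that assumes the real base class exposes a counters mapping defaulting to zero; the sample Tweets are hand-made and only include the fields the update method reads.

```python
from collections import defaultdict


class Counters:
    """Stand-in for measurement_base.Counters (assumed: a dict of integer counters)."""

    def __init__(self):
        self.counters = defaultdict(int)


class HashtagCounters(Counters):
    def update(self, tweet):
        # Tolerate Tweets without entities instead of raising KeyError.
        for item in tweet.get("twitter_entities", {}).get("hashtags", []):
            # The payload stores the hashtag text without the leading '#'.
            self.counters["#" + item["text"]] += 1


# Hand-made Tweets for illustration only.
sample_tweets = [
    {"twitter_entities": {"hashtags": [{"text": "python"}, {"text": "data"}]}},
    {"twitter_entities": {"hashtags": [{"text": "python"}]}},
    {},  # a Tweet with no entities at all
]

measurement = HashtagCounters()
for tweet in sample_tweets:
    measurement.update(tweet)

hashtag_counts = dict(measurement.counters)
```

Using .get with defaults is a design choice for this sketch; the original snippet indexes the payload directly, which is fine when every Tweet is known to carry twitter_entities.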
