User manual: The ECA language
An ECA rule has the form (the clauses RULE and CONDITION are optional):
RULE: <name>
EVENT: <event>
CONDITION: <condition>
ACTION: <actions>
This specifies that whenever event <event> occurs and <condition> is met, then the <actions> are executed. For example, the rule
EVENT: new_tweet
ACTION: ntweets = ntweets + 1
counts tweets: whenever event new_tweet occurs, the variable ntweets is increased by one.
Two other events come in handy: 'initialize' and 'finalize'. Rules attached to 'initialize' are triggered when the rule engine starts, i.e., before any new_tweet event. Analogously, rules attached to 'finalize' are triggered at the end, after all new_tweet events have occurred.
By the way, variables need to be declared, so you would need to precede the above rule with
DECLARE ntweets = 0
which, besides declaring the variable, also initializes it to zero.
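To make the event, condition, action cycle concrete, here is a minimal Python sketch of how such rules could be dispatched. It is not the ECA engine's implementation: the helper names `rule` and `fire` and the `state` dictionary are assumptions made up for this illustration.

```python
# Hypothetical mini-engine for illustration only; not the real ECA engine.
state = {"ntweets": 0}            # DECLARE ntweets = 0
rules = []                        # each entry: (event name, condition, action)

def rule(event, action, condition=lambda ev: True):
    """Register a rule; the condition is optional, as in ECA."""
    rules.append((event, condition, action))

def fire(event, payload=None):
    """Run the action of every rule on `event` whose condition holds."""
    for name, condition, action in rules:
        if name == event and condition(payload):
            action(payload)

def count_tweet(ev):
    state["ntweets"] = state["ntweets"] + 1

# EVENT: new_tweet  ACTION: ntweets = ntweets + 1
rule("new_tweet", count_tweet)

# Rules on the start-up and shut-down events just record a message here.
log = []
rule("initialize", lambda ev: log.append("started"))
rule("finalize", lambda ev: log.append("total: %d" % state["ntweets"]))

fire("initialize")
for text in ["first tweet", "second tweet", "third tweet"]:
    fire("new_tweet", {"text": text})
fire("finalize")

print(state["ntweets"])  # 3
print(log)               # ['started', 'total: 3']
```

The start-up rule runs before any tweet is seen and the shut-down rule runs after the last one, which is why the final log entry already reports the complete count.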
The ECA rule language also supports constants and a variety of data types (almost everything Python offers, using a syntax as similar to Python as possible). For example,
CONSTANT update_frequency = 100
CONSTANT admiration_words = ['wow', 'outstanding', 'cool', 'perfect']
CONSTANT admiration_weight = { wow:2, outstanding:3, cool:1, perfect:3 }
The second example illustrates a list of strings. With admiration_words[1] you retrieve the second word (the first element of a list has position 0). The third example illustrates a 'dictionary'. This is a more advanced form of list where you do not access the elements by position, but with a 'key': admiration_weight['cool'] evaluates to 1 and admiration_weight[admiration_words[1]] determines the weight of the second word 'outstanding', i.e., 3.
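For comparison, the same lookups can be tried in plain Python. Note one difference: Python requires the dictionary keys to be quoted strings, whereas the ECA example above writes them without quotes.

```python
# Plain-Python counterpart of the three constants above.
update_frequency = 100
admiration_words = ['wow', 'outstanding', 'cool', 'perfect']
admiration_weight = {'wow': 2, 'outstanding': 3, 'cool': 1, 'perfect': 3}

second_word = admiration_words[1]       # positions start at 0
print(second_word)                      # outstanding
print(admiration_weight['cool'])        # 1
print(admiration_weight[second_word])   # 3
```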
An event is actually an object that contains information about the event. For example, the "new_tweet" event object has attributes like 'text' with the text of the tweet (see Reference manual: Events). You can use these attributes in the condition and actions. For example, the rule
CONSTANT longtweetlength = 50
DECLARE nlongtweets = 0
EVENT: new_tweet
CONDITION: len(new_tweet.text) >= longtweetlength
ACTION: nlongtweets = nlongtweets + 1
counts the number of long tweets. Note that you can simply add these lines to the rule we saw earlier. It is not a problem to have two rules on the same event. They are independent of each other. Every rule on a certain event will be executed in some predetermined order; usually this order does not even matter. The rules above will simultaneously count the total number of tweets as well as the number of long tweets.
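The independence of the two rules can be illustrated with a small Python sketch. The function names and the sample tweets are invented for this illustration; the sketch simply applies both rules to every tweet, once in each order, and compares the results.

```python
LONG_TWEET_LENGTH = 50  # CONSTANT longtweetlength = 50

def count_all(tweet, counters):
    # Rule without a condition: always fires.
    counters["ntweets"] += 1

def count_long(tweet, counters):
    # CONDITION: len(new_tweet.text) >= longtweetlength
    if len(tweet["text"]) >= LONG_TWEET_LENGTH:
        counters["nlongtweets"] += 1

def run(rules, tweets):
    """Apply every rule to every tweet, in the given rule order."""
    counters = {"ntweets": 0, "nlongtweets": 0}
    for tweet in tweets:
        for r in rules:
            r(tweet, counters)
    return counters

# Invented sample tweets: one short, two at or above the threshold.
tweets = [{"text": "short"}, {"text": "x" * 50}, {"text": "y" * 80}]

forward = run([count_all, count_long], tweets)
backward = run([count_long, count_all], tweets)
print(forward)              # {'ntweets': 3, 'nlongtweets': 2}
print(forward == backward)  # True
```

Because the two rules update different variables, the order in which they run has no effect on the outcome.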
The example of counting long tweets also illustrates how to use functions. The ECA language supports many functions on numbers, strings, lists, etc. In fact, many of the functions offered by Python are supported in ECA as well (see Functions and Modules). If you need one that is not supported, you can easily add it yourself (see Writing your own module).
You can also create your own events with NEWEVENT. Such events trigger their associated rules, which in turn may create new events, and so on. In this way, you can break up complex functionality into smaller chunks.
For example, a word cloud is nothing more than a visualization of the importance of words. The more important a word, the larger its font. In essence, the data behind a word cloud is just a dictionary that stores the importance of each word. If, for simplicity, we take the importance of a word to be the number of times it occurs, then determining the data behind a word cloud boils down to counting words.
We can accomplish this with a rather complex rule such as:
DECLARE word_importance = {}
EVENT: new_tweet
ACTION: FORALL w IN getwords(new_tweet.text):
            IF w IN word_importance THEN word_importance[w] = word_importance[w] + 1
            ELSE word_importance[w] = 1
The action clause is rather complex. It determines the individual words of the tweet with getwords(new_tweet.text). It loops over these words with FORALL w IN …: …. And for each word, it tests whether or not the word is already in the word_importance dictionary. If so, then it increases the count of the word by one. If not, then it initializes the count of the word with one.
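The loop and the test described above correspond to the following Python sketch. The `getwords` defined here is only a stand-in that splits on whitespace; how the ECA function of the same name tokenizes text is not covered here, so treat that as an assumption.

```python
def getwords(text):
    """Stand-in for ECA's getwords: split on whitespace (assumption)."""
    return text.split()

word_importance = {}   # DECLARE word_importance = {}

def on_new_tweet(text):
    for w in getwords(text):                           # FORALL w IN ...
        if w in word_importance:                       # IF w IN word_importance
            word_importance[w] = word_importance[w] + 1
        else:                                          # ELSE
            word_importance[w] = 1

on_new_tweet("run run fast")
on_new_tweet("fast race")
print(word_importance)  # {'run': 2, 'fast': 2, 'race': 1}
```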
The rule is not only complex, but also not very reusable. Any other functionality involving individual words needs to repeat the FORALL. A much more elegant solution is to generate an event of your own per word:
DECLARE word_importance = {}

EVENT: new_tweet
ACTION: FORALL w IN getwords(new_tweet.text): NEWEVENT new_word {word:w}

EVENT: new_word
CONDITION: new_word.word IN word_importance
ACTION: word_importance[new_word.word] += 1

EVENT: new_word
CONDITION: NOT(new_word.word IN word_importance)
ACTION: word_importance[new_word.word] = 1
The rules in this specification are each simpler. Moreover, any other functionality involving words can easily be attached to the user-defined event new_word, for example, the rule
DECLARE nlongwords = 0
EVENT: new_word
CONDITION: len(new_word.word) > 10
ACTION: nlongwords = nlongwords + 1
counts long words reusing the rule that splits a tweet into individual words.
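The chaining of events can be sketched in Python with a simple queue. This is an illustration under assumptions, not the engine's actual mechanism: the helpers `rule`, `new_event`, and `run` are invented, words are split on whitespace as a stand-in for getwords, and rules are tried in registration order with each condition checked just before its action.

```python
from collections import deque

word_importance = {}
counters = {"nlongwords": 0}
queue = deque()   # events waiting to be dispatched
rules = {}        # event name -> list of (condition, action)

def rule(event, action, condition=lambda ev: True):
    rules.setdefault(event, []).append((condition, action))

def new_event(name, **attrs):
    """Counterpart of NEWEVENT: queue an event with its attributes."""
    queue.append((name, attrs))

def run():
    """Dispatch queued events until none are left."""
    while queue:
        name, attrs = queue.popleft()
        for condition, action in rules.get(name, []):
            if condition(attrs):
                action(attrs)

def split_tweet(ev):
    for w in ev["text"].split():          # stand-in for getwords
        new_event("new_word", word=w)

def increment(ev):
    word_importance[ev["word"]] += 1

def initialise(ev):
    word_importance[ev["word"]] = 1

def count_long(ev):
    counters["nlongwords"] += 1

rule("new_tweet", split_tweet)
# In this sketch the order of the next two registrations matters: swapping
# them would count a first occurrence twice.
rule("new_word", increment, lambda ev: ev["word"] in word_importance)
rule("new_word", initialise, lambda ev: ev["word"] not in word_importance)
rule("new_word", count_long, lambda ev: len(ev["word"]) > 10)

new_event("new_tweet", text="Batavierenrace race race")
run()
print(word_importance)         # {'Batavierenrace': 1, 'race': 2}
print(counters["nlongwords"])  # 1
```

The rule that counts long words was added without touching the rule that splits the tweet, which is the reuse the user-defined event is meant to provide.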
This is exactly how you are intended to proceed when developing a TweetBoard: start simple. With a few lines, you can have a working prototype of a dashboard with a simple visualization. You can then iteratively add more functionality and visualizations, or improve the existing ones. Each such addition or improvement typically involves just adding a few rules or replacing a few rules with others. In this way, the dashboard gradually grows and improves, with a working prototype after each iteration.
The ECA language supports a function called csv_read_kv_dict, which reads key-value pairs from a CSV file into a dictionary. It takes three parameters: the name of the file, the column for the keys, and the column for the values.
Many tools, such as LightSide, can export the resulting classification to a CSV file. The file "modules/tweet_sentiment.csv" is an example of such a file. It contains lines like:
ID,label,predicted_label,text
303937526471725056,-1,-1,Batavierenrace door #Ulft: Eind april klinkt jaarlijks het startschot voor de grootste estafetteloop v... http://t.co/hijGD3pk #Enschede
303959452221046784,-1,-1,Organiseer jij een NSK in 2013? Meld je dan aan voor de NSK bijeenkomst van komende donderdag! @NSAFZeuS @Batavierenrace Graag RT!
The first line is a header row explaining that the first column is an ID (the tweet id), the second column is the label (class/sentiment of the tweet), the third column is the predicted label (predicted sentiment of the tweet), and the fourth column is the tweet itself.
You can import this file into variable "sentiment" with the following line.
DECLARE sentiment = csv_read_kv_dict('modules/tweet_sentiment.csv',0,1)
If you would like to use these sentiments in an ECA rule, you could write something like:
DECLARE n_negative_tweets = 0
EVENT: new_tweet
CONDITION: sentiment[new_tweet.id] == -1
ACTION: n_negative_tweets += 1
This counts all negative tweets.
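To show what reading key-value pairs from a CSV file amounts to, here is a Python sketch built on the standard csv module. It is not the source of csv_read_kv_dict: the function `read_kv_dict` is hypothetical, it takes an open file-like object instead of a file name so that the example is self-contained, it keeps all values as strings, and the sample rows are invented.

```python
import csv
import io

def read_kv_dict(source, key_column, value_column):
    """Hypothetical sketch: map one CSV column to another, skipping the header.

    Keys and values stay strings here; the real csv_read_kv_dict may differ.
    """
    reader = csv.reader(source)
    next(reader, None)  # skip the header row
    return {row[key_column]: row[value_column] for row in reader}

# Invented sample data in the same column layout as described above.
SAMPLE = """ID,label,predicted_label,text
1001,-1,-1,"slow, boring race"
1002,1,1,great atmosphere
1003,-1,1,too crowded
"""

sentiment = read_kv_dict(io.StringIO(SAMPLE), 0, 1)
print(sentiment)  # {'1001': '-1', '1002': '1', '1003': '-1'}

# Count negative tweets; .get() avoids an error for ids missing from the file.
incoming_ids = ["1001", "1002", "1003", "1004"]
n_negative_tweets = sum(1 for tid in incoming_ids if sentiment.get(tid) == "-1")
print(n_negative_tweets)  # 2
```

Using a CSV parser rather than splitting each line on commas matters for rows like the first one, where the quoted tweet text itself contains a comma.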
This functionality is not limited to sentiments. You can train a classifier for any other set of classes, for example enthusiastic versus not enthusiastic, or aggressive versus not aggressive. You can also import any other CSV file that contains key-value pairs.