User manual: The ECA language

mvankeulen edited this page Oct 2, 2013 · 5 revisions

Rules

An ECA rule has the form (the clauses RULE and CONDITION are optional):

RULE: <name>
EVENT: <event>
CONDITION: <condition>
ACTION: <actions>

This specifies that whenever event <event> occurs and <condition> is met, then the <actions> are executed. For example, the rule

EVENT: new_tweet
ACTION: ntweets = ntweets + 1

counts tweets: whenever event new_tweet occurs, the variable ntweets is increased by one.

There are two other handy events: 'initialize' and 'finalize'. Any rules attached to the first event are triggered when the rule engine starts, i.e., before any of the new_tweet events. Analogously, any rules attached to 'finalize' are triggered at the end, when all new_tweet events have occurred.
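To make the event, condition, action cycle concrete, here is a minimal Python sketch of how such rules could be dispatched. It is not how the ECA engine is implemented: the names fire, run_engine and the rule table are hypothetical, and the tweets are invented sample data.

```python
# Illustrative sketch only: fire, run_engine and the rule table are
# hypothetical names, not part of the ECA implementation.
state = {"ntweets": 0}   # plays the role of DECLARE ntweets = 0
log = []

def no_condition(event):
    # A rule without a CONDITION clause always fires.
    return True

def report_start(event):
    log.append("started")

def count_tweet(event):
    state["ntweets"] = state["ntweets"] + 1

def report_total(event):
    log.append("total: %d" % state["ntweets"])

# Each rule is (event name, condition, action).
rules = [
    ("initialize", no_condition, report_start),
    ("new_tweet", no_condition, count_tweet),
    ("finalize", no_condition, report_total),
]

def fire(name, event=None):
    """Run every rule attached to the named event whose condition holds."""
    for rule_event, condition, action in rules:
        if rule_event == name and condition(event):
            action(event)

def run_engine(tweets):
    fire("initialize")            # before any new_tweet event
    for tweet in tweets:
        fire("new_tweet", tweet)
    fire("finalize")              # after all new_tweet events

run_engine([{"text": "hello"}, {"text": "world"}, {"text": "again"}])
print(state["ntweets"], log)
```

The point of the sketch is the ordering: the 'initialize' rule runs once up front, the counting rule runs once per tweet, and the 'finalize' rule sees the final count.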

Variables

Variables need to be declared before they are used, so the rule above must be preceded by

DECLARE ntweets = 0

which, besides declaring the variable, also initializes it to zero.

Constants and complex data types

The ECA rule language also supports constants and a variety of data types (almost everything Python offers, using a syntax as similar to Python as possible). For example,

CONSTANT update_frequency = 100
CONSTANT admiration_words = ['wow', 'outstanding', 'cool', 'perfect']
CONSTANT admiration_weight = { wow:2, outstanding:3, cool:1, perfect:3 }

The second example illustrates a list of strings. With admiration_words[1] you retrieve the second word (the first element of a list has position 0). The third example illustrates a 'dictionary'. This is a more advanced form of list where you do not access the elements by position, but with a 'key': admiration_weight['cool'] evaluates to 1 and admiration_weight[admiration_words[1]] determines the weight of the second word 'outstanding', i.e., 3.
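Because the ECA syntax follows Python here, the same lookups can be tried in plain Python. The sketch below is Python, not ECA: in Python the dictionary keys have to be written as quoted strings.

```python
admiration_words = ['wow', 'outstanding', 'cool', 'perfect']
# Python requires quoted string keys in a dictionary literal.
admiration_weight = {'wow': 2, 'outstanding': 3, 'cool': 1, 'perfect': 3}

second_word = admiration_words[1]                          # positions start at 0
weight_of_cool = admiration_weight['cool']                 # lookup by key
weight_of_second = admiration_weight[admiration_words[1]]  # nested lookup

print(second_word, weight_of_cool, weight_of_second)
```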

Event object

An event is actually an object that contains information about the event. For example, the "new_tweet" event object has attributes like 'text' with the text of the tweet (see Reference manual: Events). You can use these attributes in the condition and actions. For example, the rule

CONSTANT longtweetlength = 50
DECLARE nlongtweets = 0

EVENT: new_tweet
CONDITION: len(new_tweet.text) >= longtweetlength
ACTION: nlongtweets = nlongtweets + 1

counts the number of long tweets. Note that you can simply add these lines to the rule we saw earlier. It is not a problem to have two rules on the same event. They are independent of each other. Every rule on a certain event will be executed in some predetermined order; usually this order does not even matter. The rules above will simultaneously count the total number of tweets as well as the number of long tweets.
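As a rough Python analogy (not the ECA engine itself), two rules on the same event behave like two independent functions that are both called for every event. The function names and the sample tweets below are made up for illustration; SimpleNamespace merely stands in for an event object with a 'text' attribute.

```python
from types import SimpleNamespace

LONG_TWEET_LENGTH = 50                      # CONSTANT longtweetlength = 50
counters = {"ntweets": 0, "nlongtweets": 0}

def count_tweet(new_tweet):
    # Rule without a condition: always counts.
    counters["ntweets"] += 1

def count_long_tweet(new_tweet):
    if len(new_tweet.text) >= LONG_TWEET_LENGTH:   # CONDITION
        counters["nlongtweets"] += 1               # ACTION

# Both rules are attached to the same event and do not know about each other.
rules_on_new_tweet = [count_tweet, count_long_tweet]

tweets = [
    SimpleNamespace(text="short tweet"),
    SimpleNamespace(text="x" * 50),
    SimpleNamespace(text="y" * 80),
]
for tweet in tweets:
    for rule in rules_on_new_tweet:
        rule(tweet)

print(counters)
```

Swapping the order of the two functions in the list gives the same counts, which is why the execution order usually does not matter.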

Functions

The example of counting long tweets also illustrates how to use functions. The ECA language supports many functions on numbers, strings, lists, etc. In fact, many of the functions offered by Python are supported in ECA as well (see Functions and Modules). If you need one that is not supported, you can easily add it yourself (see Writing your own module).

Creating new events

You can also create your own events with NEWEVENT. Such events trigger their associated rules, which in turn may create new events, and so on. In this way, you can break up complex functionality into smaller chunks.

For example, a word cloud is nothing more than a visualization of the importance of words. The more important a word, the larger its font. In essence, the data behind it is just a dictionary that stores the importance of each word. If, for simplicity, we take the importance of a word to be the number of times it occurs, then determining the data behind a word cloud boils down to counting words.

We can accomplish this with a rather complex rule such as:

DECLARE word_importance = {}

EVENT: new_tweet
ACTION: FORALL w IN getwords(new_tweet.text):
   IF w IN word_importance
   THEN word_importance[w] = word_importance[w] + 1
   ELSE word_importance[w] = 1

The action clause is rather complex. It determines the individual words of the tweet with getwords(new_tweet.text). It loops over these words with FORALL w IN …: …. For each word, it tests whether the word is already in the word_importance dictionary. If so, it increases the count of the word by one. If not, it initializes the count of the word to one.
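For readers who know Python, the same counting logic can be written as follows. This is a sketch under one assumption: get_words is a hypothetical stand-in that splits on whitespace, and the ECA getwords function may split differently. The sample texts are invented.

```python
def get_words(text):
    # Assumption: words are separated by whitespace; ECA's getwords may differ.
    return text.split()

word_importance = {}

def on_new_tweet(text):
    for w in get_words(text):
        if w in word_importance:
            word_importance[w] = word_importance[w] + 1   # seen before
        else:
            word_importance[w] = 1                        # first occurrence

on_new_tweet("cool race cool weather")
on_new_tweet("cool finish")
print(word_importance)
```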

The rule is not only complex, but also not very reusable. Any other functionality involving individual words needs to repeat the FORALL. A much more elegant solution is to generate an event of your own per word:

DECLARE word_importance = {}

EVENT: new_tweet
ACTION: FORALL w IN getwords(new_tweet.text): NEWEVENT new_word {word:w}

EVENT: new_word
CONDITION: new_word.word IN word_importance
ACTION: word_importance[new_word.word] += 1

EVENT: new_word
CONDITION: NOT(new_word.word IN word_importance)
ACTION: word_importance[new_word.word] = 1

The rules in this specification are each simpler. Moreover, any other functionality involving words can easily be attached to the user-defined event new_word, for example, the rule

DECLARE nlongwords = 0

EVENT: new_word
CONDITION: len(new_word.word) > 10
ACTION: nlongwords = nlongwords + 1

counts long words, reusing the rule that splits a tweet into individual words.
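The chaining of events can be pictured with a small Python sketch. It assumes a simple first-in, first-out queue of pending events; whether the ECA engine processes new events immediately or queues them is not stated here. All names (new_event, run, the rule functions) are hypothetical, and the two word-counting rules are merged into one function for brevity.

```python
from collections import deque

word_importance = {}
counters = {"nlongwords": 0}
queue = deque()          # pending events (an assumption of this sketch)

def new_event(name, **attributes):
    # Plays the role of NEWEVENT: schedule an event with its attributes.
    queue.append((name, attributes))

def split_tweet(event):
    # Rule on new_tweet: raise one new_word event per word.
    for w in event["text"].split():
        new_event("new_word", word=w)

def count_word(event):
    # Combines the two new_word rules (word already present or not).
    word = event["word"]
    word_importance[word] = word_importance.get(word, 0) + 1

def count_long_word(event):
    if len(event["word"]) > 10:
        counters["nlongwords"] += 1

rules = {
    "new_tweet": [split_tweet],
    "new_word": [count_word, count_long_word],
}

def run():
    # Process events until no rule has raised a new one.
    while queue:
        name, event = queue.popleft()
        for rule in rules.get(name, []):
            rule(event)

new_event("new_tweet", text="Batavierenrace is cool cool")
run()
print(word_importance, counters)
```

Adding another piece of word-level functionality only requires appending one more function to the "new_word" list; the splitting rule stays untouched.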

A few notes on method

This is exactly how you are intended to proceed when developing a TweetBoard: start simple. With a few lines, you can have a working prototype of a dashboard with a simple visualization. You can then iteratively add more functionality and visualizations or improve the existing ones. Each such addition or improvement typically involves just adding a few rules or replacing a few rules with others. In this way, the dashboard gradually grows and improves, with a working prototype after each iteration.

Using the output of a classifier

The ECA language supports a function called csv_read_kv_dict which reads key-value pairs from a CSV-file into a dictionary. It takes three parameters: the name of the file, the column for the keys, and the column for the values.

Many tools, such as LightSide, can export a resulting classification in a CSV-file. The file "modules/tweet_sentiment.csv" is an example of such a file. It contains lines like:

ID,label,predicted_label,text
303937526471725056,-1,-1,Batavierenrace door #Ulft: Eind april klinkt jaarlijks het startschot voor de grootste estafetteloop v... http://t.co/hijGD3pk #Enschede
303959452221046784,-1,-1,Organiseer jij een  NSK in 2013? Meld je dan aan voor de NSK bijeenkomst van komende donderdag! @NSAFZeuS @Batavierenrace Graag RT!

The first line is a header row explaining that the first column is an ID (the tweet id), the second column is the label (class/sentiment of the tweet), the third column is the predicted label (predicted sentiment of the tweet), and the fourth column is the tweet itself.

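You can import this file into the variable "sentiment" with the following line. Columns are counted from 0, so 0 selects the ID column as key and 1 selects the label column as value.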

DECLARE sentiment = csv_read_kv_dict('modules/tweet_sentiment.csv',0,1)

If you would like to use these sentiments in an ECA rule, you could write something like:

DECLARE n_negative_tweets = 0

EVENT: new_tweet
CONDITION: sentiment[new_tweet.id] == -1
ACTION: n_negative_tweets += 1

This counts all negative tweets.
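To show what reading key-value pairs from a CSV file amounts to, here is a Python sketch using the standard csv module. It is not the implementation of csv_read_kv_dict: the function name read_kv_dict, the skip_header parameter, and the choice to keep keys and values as strings are all assumptions of this sketch. The sample rows reuse the IDs from the file above with invented labels and texts.

```python
import csv
import io

def read_kv_dict(lines, key_col, value_col, skip_header=True):
    """Build a dictionary from two columns of CSV data.

    Assumptions of this sketch: the first row is a header and is skipped,
    and keys and values are kept as strings (no type conversion).
    """
    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)
    return {row[key_col]: row[value_col] for row in reader if row}

# In-memory stand-in for a file such as modules/tweet_sentiment.csv.
sample = io.StringIO(
    "ID,label,predicted_label,text\n"
    "303937526471725056,-1,-1,first example tweet\n"
    "303959452221046784,1,-1,second example tweet\n"
)

sentiment = read_kv_dict(sample, 0, 1)
n_negative = sum(1 for label in sentiment.values() if label == "-1")
print(sentiment, n_negative)
```

Passing 2 instead of 1 as the value column would pick the predicted label rather than the given label.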

This functionality is not limited to sentiments. You can train a classifier for any other set of classes, for example, enthusiastic and not enthusiastic, or aggressive and not aggressive. You can also import any other CSV file that contains key-value pairs.