
Hello Twitter! Teknek drinking from the hose

edwardcapriolo edited this page Jan 6, 2014 · 6 revisions

This example comes from the class io.teknek.twitter.EndToEndTest.

TwitterStreamFeed is layered on top of Twitter's hbc client. The credentials come from Twitter, so you cannot run this example without your own.
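The credentials reach the feed as a property map: the test below passes it the result of a helper, getCredentialsOrDie(), whose body is not shown on this page. The following is a minimal sketch of such a helper. It assumes the values live in environment variables and that the feed accepts a Map<String, Object>. The four key names mirror the OAuth 1.0a values hbc authenticates with (consumer key, consumer secret, access token, token secret), but the exact property names TwitterStreamFeed reads are an assumption. This variant takes the lookup map as a parameter so it can be exercised without real credentials.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test's getCredentialsOrDie() helper.
final class TwitterCredentials {

  // ASSUMED property names; check TwitterStreamFeed for the keys it actually reads.
  static final String[] KEYS = {"consumerKey", "consumerSecret", "token", "tokenSecret"};

  // Copies each key from the source map (for example System.getenv()) into a
  // property map, failing fast with a clear message if any value is missing or empty.
  static Map<String, Object> getCredentialsOrDie(Map<String, String> source) {
    Map<String, Object> props = new HashMap<>();
    for (String key : KEYS) {
      String value = source.get(key);
      if (value == null || value.isEmpty()) {
        throw new IllegalStateException("Missing Twitter credential: " + key);
      }
      props.put(key, value);
    }
    return props;
  }

  public static void main(String[] args) {
    // Reads the four values from environment variables with the same names.
    Map<String, Object> props = getCredentialsOrDie(System.getenv());
    System.out.println("Loaded " + props.size() + " credential properties");
  }
}
```

Failing at startup with the name of the missing key is more helpful than letting the stream connect and then get rejected by Twitter.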

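@Test
public void hangAround() throws JsonGenerationException, JsonMappingException, IOException {
  // p (the Plan) and td (the daemon the plan runs on) are fields of EndToEndTest.
  // The feed streams statuses from Twitter using the credentials from getCredentialsOrDie().
  p = new Plan().withFeedDesc(
        new FeedDesc().withFeedClass(TwitterStreamFeed.class.getName()).withProperties(getCredentialsOrDie()))
              // The root operator turns each status into a {statusAsText=...} tuple,
              // which is then passed on to BeLoudOperator.
              .withRootOperator(new OperatorDesc(new EmitStatusAsTextOperator())
                .withNextOperator(new OperatorDesc(new BeLoudOperator())));
  p.setName("yell");
  p.setMaxWorkers(1);
  td.applyPlan(p);
  // Let the stream run for ten seconds before the test returns.
  try {
    Thread.sleep(10000);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
  }
}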

If we let this example run, we get a stream of random tweets:

{statusAsText=RT @Pesadohein: "Pode ser pra SEMPRE?" - FERRO TUDOOOOOOOO}
{statusAsText=RT @kabasimsek: abi sen dershaneye mi gittin hira mağarasına mı? @kemalgulen}
{statusAsText=RT @Maeecol: Nos vamos a Elche o no te gusta la aventura??  @Mariyeyes}
{statusAsText=RT @Zackkkk_: My circle so small I can talk to Myself.}
{statusAsText=@soykarolay Ire al de aqui de la casa toda las tardes.}
{statusAsText=@JulianRohatgi I would rather smash my face into rusty nails than study and take this math exam.}

That is nifty. Let's say we only want to extract URLs from the stream. Operators can be supplied a map of properties. In this case we made an operator called EmitFieldsMatchingPattern, a general-purpose operator that takes a source field and a regular expression pattern, and emits each token in that field that matches the pattern (here, each URL) as a separate tuple.
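At its core, an operator like this is ordinary java.util.regex matching. A minimal sketch follows, assuming a tuple can be modeled as a Map<String, Object> and that each match is emitted as its own {out=match} tuple (the field name out is taken from the log output later on this page). FieldPatternExtractor is a hypothetical class: it illustrates the idea behind EmitFieldsMatchingPattern without reproducing Teknek's Operator API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of pattern-based field extraction over plain maps.
final class FieldPatternExtractor {
  private final String sourceField;
  private final Pattern pattern;

  FieldPatternExtractor(String sourceField, String regex) {
    this.sourceField = sourceField;
    this.pattern = Pattern.compile(regex); // compile once, reuse for every tuple
  }

  // Returns one {out=match} map per regex match found in the source field.
  List<Map<String, Object>> process(Map<String, Object> tuple) {
    List<Map<String, Object>> emitted = new ArrayList<>();
    Object value = tuple.get(sourceField);
    if (value == null) {
      return emitted; // field absent: nothing to scan
    }
    Matcher m = pattern.matcher(value.toString());
    while (m.find()) {
      Map<String, Object> out = new HashMap<>();
      out.put("out", m.group());
      emitted.add(out);
    }
    return emitted;
  }

  public static void main(String[] args) {
    String urlRegex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
    FieldPatternExtractor extractor = new FieldPatternExtractor("statusAsText", urlRegex);
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("statusAsText", "two links http://example.com/a and https://example.org/b?x=1.");
    for (Map<String, Object> out : extractor.process(tuple)) {
      System.out.println(out);
    }
  }
}
```

Running main prints {out=http://example.com/a} and then {out=https://example.org/b?x=1}. The trailing period of the sentence is not captured because the regex's final character class leaves out '.'.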

public void hangAround() throws JsonGenerationException, JsonMappingException, IOException {
  // Tell EmitFieldsMatchingPattern which tuple field to scan and which URL regex to apply.
  Map<String,Object> params = MapBuilder.makeMap(EmitFieldsMatchingPattern.SOURCE_FIELD, "statusAsText",
        EmitFieldsMatchingPattern.REGEX, "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");
  p = new Plan().withFeedDesc(
        new FeedDesc().withFeedClass(TwitterStreamFeed.class.getName()).withProperties(getCredentialsOrDie()))
              .withRootOperator(new OperatorDesc(new EmitStatusAsTextOperator())
                // New in this version: EmitFieldsMatchingPattern, configured with the params above
                .withNextOperator(new OperatorDesc(new EmitFieldsMatchingPattern()).withParameters(params))
                  .withNextOperator(new OperatorDesc(new BeLoudOperator())));
  // ... set the name and max workers, apply the plan, and sleep as in the first example
}

Now the output contains only URLs:

DEBUG 22:37:34,757 No children operators for this operator. Tuple not being passed on {out=http://t.co/9q7ajc0acs}
DEBUG 22:37:34,759 No children operators for this operator. Tuple not being passed on {out=http://t.co/XWsPCJgeDf}
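These DEBUG lines are logged by an operator that emitted a tuple with nothing attached downstream to receive it, so the tuple stops there. The following is a toy model of that fan-out. It assumes each operator hands every tuple it produces to all of its children and reports when it has none. OperatorNode and addChild are hypothetical names chosen for illustration, not Teknek classes or methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy operator tree (hypothetical): each node turns one tuple into zero or more
// tuples and passes every result to all of its children.
final class OperatorNode {
  private final String name;
  private final Function<Map<String, Object>, List<Map<String, Object>>> fn;
  private final List<OperatorNode> children = new ArrayList<>();
  // Tuples that reached this node when it had no children to receive them.
  final List<Map<String, Object>> dropped = new ArrayList<>();

  OperatorNode(String name, Function<Map<String, Object>, List<Map<String, Object>>> fn) {
    this.name = name;
    this.fn = fn;
  }

  void addChild(OperatorNode child) {
    children.add(child);
  }

  void handle(Map<String, Object> tuple) {
    for (Map<String, Object> result : fn.apply(tuple)) {
      if (children.isEmpty()) {
        // Leaf: record and report the tuple, similar in spirit to the DEBUG message.
        dropped.add(result);
        System.out.println(name + ": no children, tuple not passed on " + result);
      } else {
        for (OperatorNode child : children) {
          child.handle(result);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Root passes the status tuple through unchanged.
    OperatorNode root = new OperatorNode("status", t -> Collections.singletonList(t));
    // Child emits one {out=word} tuple per whitespace-separated word starting with "http".
    OperatorNode urls = new OperatorNode("urls", t -> {
      List<Map<String, Object>> out = new ArrayList<>();
      for (String word : String.valueOf(t.get("statusAsText")).split("\\s+")) {
        if (word.startsWith("http")) {
          Map<String, Object> m = new HashMap<>();
          m.put("out", word);
          out.add(m);
        }
      }
      return out;
    });
    root.addChild(urls);

    Map<String, Object> status = new HashMap<>();
    status.put("statusAsText", "look at http://t.co/abc now");
    root.handle(status);
  }
}
```

In this model, attaching another node under urls would receive each {out=...} tuple instead of letting it stop at the leaf.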

Next, you could build an operator that just increments Cassandra counters.
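The following is a minimal sketch of that idea. It assumes URL tuples shaped like the {out=...} tuples above. To avoid guessing at a Cassandra driver API, the counts live in an in-memory map. The CQL constant shows the kind of counter update a real version might issue once per tuple against a hypothetical url_hits table with a counter column. UrlCounter is a hypothetical class, not part of Teknek.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// In-memory stand-in for a counter-incrementing operator (hypothetical).
final class UrlCounter {
  // Hypothetical schema: CREATE TABLE url_hits (url text PRIMARY KEY, hits counter);
  // Not executed here; shown as the per-tuple update a Cassandra-backed version would run.
  static final String CQL_INCREMENT = "UPDATE url_hits SET hits = hits + 1 WHERE url = ?";

  private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

  // Increments the counter for the URL carried in the tuple's "out" field.
  void handle(Map<String, Object> tuple) {
    Object url = tuple.get("out");
    if (url != null) {
      counts.computeIfAbsent(url.toString(), k -> new LongAdder()).increment();
    }
  }

  // Current count for a URL, or 0 if it has never been seen.
  long count(String url) {
    LongAdder adder = counts.get(url);
    return adder == null ? 0L : adder.sum();
  }

  public static void main(String[] args) {
    UrlCounter counter = new UrlCounter();
    Map<String, Object> tuple = new HashMap<>();
    tuple.put("out", "http://t.co/9q7ajc0acs");
    counter.handle(tuple);
    counter.handle(tuple);
    System.out.println(counter.count("http://t.co/9q7ajc0acs"));
  }
}
```

Cassandra counter columns only support increments and decrements, which fits an operator that does nothing but tick a count for each tuple it sees.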