Titan is a free, open source graph database capable of processing extremely large graphs. It supports a variety of indexing and storage backends, which makes it easier to extend than some popular NoSQL graph databases.
This Docker image instantiates a Titan graph database that can integrate with an ElasticSearch container (indexing) and a Cassandra container (storage).
The default distribution of Titan runs on a single node, so I thought it would be helpful to have a modular way to hook Titan up to its dependencies at runtime.
Enter Docker. Now it is possible to run Titan and its dependencies in separate Docker containers.
This container is using Titan 1.0.0. Please refer to its page for more information.
TinkerPop is a vendor-independent API specification for manipulating and accessing graph databases. This image uses TinkerPop 3.0.1.
The minimum system requirements for this stack are 1 GB of memory and 2 cores.
docker run -d --name es1 elasticsearch
docker run -d --name cas1 elubow/cassandra
docker run -d -P --name titan1 --link es1:elasticsearch --link cas1:cassandra elubow/titan-gremlin
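Because -P publishes the container's exposed ports on random host ports, you need to look up where Titan ended up. The following is a minimal sketch, assuming the image exposes 8182 and the container is named titan1 as above; host_port is a hypothetical helper, not part of the image.

```shell
# Extract the host port from a line of `docker port` output such as "0.0.0.0:32768".
# host_port is a hypothetical helper name used for illustration.
host_port() {
    # Strip everything up to and including the last colon.
    printf '%s\n' "${1##*:}"
}

# Only query Docker when the CLI is available and titan1 publishes 8182 (assumption).
if command -v docker >/dev/null 2>&1 && mapping=$(docker port titan1 8182 2>/dev/null); then
    # `docker port` may print one line per address family; take the first.
    first=$(printf '%s\n' "$mapping" | head -n 1)
    echo "Titan is published on host port $(host_port "$first")"
fi
```

Use the reported port in place of 8182 when connecting from the host.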
I run a 3 node Cassandra cluster with some ports published locally, like so:
docker run -d --name cas1 -p 7000:7000 -p 7001:7001 -p 7199:7199 -p 9160:9160 -p 9042:9042 elubow/cassandra
docker run -d --name cas2 --link cas1:cassandra elubow/cassandra start $(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cas1)
docker run -d --name cas3 --link cas1:cassandra elubow/cassandra start $(docker inspect --format '{{ .NetworkSettings.IPAddress }}' cas1)
docker run -d --name es1 --link cas1:cassandra -p 9200:9200 elasticsearch
docker run -d --name titan1 --link es1:elasticsearch --link cas1:cassandra -p 8182:8182 -p 8184:8184 elubow/titan-gremlin
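Before pointing Titan at the cluster, it can help to confirm that all three Cassandra nodes have joined. This is a rough sketch, assuming nodetool is on the PATH inside the elubow/cassandra image (I have not confirmed that here); count_up_nodes is a hypothetical helper.

```shell
# Count nodes reported as Up/Normal ("UN") in `nodetool status` output read from stdin.
# count_up_nodes is a hypothetical helper name used for illustration.
count_up_nodes() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^UN' || true
}

# Only query the cluster when Docker is available and cas1 answers.
# Assumption: nodetool is installed and on the PATH inside the container.
if command -v docker >/dev/null 2>&1 && status=$(docker exec cas1 nodetool status 2>/dev/null); then
    echo "Cassandra nodes up: $(printf '%s\n' "$status" | count_up_nodes)"
fi
```

With the three containers above, the count should reach 3 once cas2 and cas3 have finished joining.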
If you want to connect from a Gremlin client, download Titan.
Then create a properties file like the following, where storage.hostname
is the hostname or IP address of the Docker host.
storage.backend=cassandrathrift
storage.hostname=192.168.99.100
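If the Docker host address changes often, the file can be generated instead of edited by hand. A minimal sketch, where write_titan_properties is a hypothetical helper and the output path and address are placeholders to adjust for your setup:

```shell
# Write a minimal Titan properties file for the Cassandra Thrift backend.
# write_titan_properties is a hypothetical helper name used for illustration.
# $1: output file path, $2: hostname or IP address of the Docker host
write_titan_properties() {
    cat > "$1" <<EOF
storage.backend=cassandrathrift
storage.hostname=$2
EOF
}

# Example: generate the file in the current directory (placeholder values).
write_titan_properties local-gremlin.properties 192.168.99.100
```

Pass the resulting path to TitanFactory.open in the Gremlin console.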
Then start the Gremlin console with bin/gremlin.sh
and run the following commands inside it:
gremlin> graph = TitanFactory.open('/Users/elubow/tmp/local-gremlin.properties')
==>standardtitangraph[cassandrathrift:[192.168.99.100]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardtitangraph[cassandrathrift:[192.168.99.100]], standard]
gremlin> g.V()
==>v[4168]
NOTE: This will not use the elasticsearch backend.
- 8182: HTTP port for the REST API
- 8184: JMX port (you probably won't need this)
To test out the REST API (over Boot2docker):
curl "http://192.168.99.100:8182?gremlin=100-1"
curl "http://192.168.99.100:8182?gremlin=g.addV('Name','Eric')"
curl "http://192.168.99.100:8182?gremlin=g.V()"
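Gremlin scripts with quotes, spaces, or other special characters are easier to send if curl does the URL encoding. A small sketch using curl's -G and --data-urlencode options; gremlin_query is a hypothetical wrapper, and it assumes the endpoint accepts the same gremlin query parameter used above.

```shell
# Send a Gremlin script to the Titan HTTP endpoint as a URL-encoded GET parameter.
# gremlin_query is a hypothetical wrapper name used for illustration.
# $1: base URL of the endpoint, $2: Gremlin script
gremlin_query() {
    # -G turns the --data-urlencode payload into a query string on a GET request.
    curl -sS -G --data-urlencode "gremlin=$2" "$1"
}
```

For example, gremlin_query http://192.168.99.100:8182 "g.addV('Name','Eric')" sends the same request as the second curl command above, with the quotes and parentheses encoded.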
I've tested this container with the following containers:
- elubow/cassandra: This is the Cassandra storage backend for Titan. It scales well for large datasets. The image also pins Cassandra to 2.1, since that version is compatible with Titan.
- elasticsearch: This is the ElasticSearch Indexing backend for Titan. It provides search capabilities for Titan graph datasets.
In the near future, I'd like to add support for:
- Scaling/Clustering Cassandra and ElasticSearch backends.
- External volumes for persistent data.
- Security between Titan and its backends.
- Example application stack integrating with Titan.