
problems with treedb integration

unenglishable edited this page Dec 17, 2014 · 4 revisions

The Problem with Treedb Integration

Hi, welcome to the writeup for treedb integration problems! This is to help keep track of the current changes necessary to complete treedb integration with core. Also included are details as to why these changes are important.

We are currently working to stabilize the core API with postgres as a backend and will return to treedb as a candidate for optimization. We can then use the stable API to build backend(s) that conform to that standard. In the meantime, we will be considering redesign options on treedb that will make it work more intuitively with core.

Updates

Treedb does not support updates. It does not support them well. It does not support them at all. (it does not support them in a box; it does not support them with a fox)

In what way does it not support them? That's a terrific question.

tl;dr: treedb needs to support updates more intuitively. A little consideration towards redesign might shed light on a good solution.

Storage by id

This problem has been addressed with a short-term workaround, allowing id in the options argument to store(). This seems to work well, until you consider that more data needs to be updated than the stored object itself.

Updating parent-child relationships

When moving a child to a different parent, it is necessary to delete the old parent relationship of the same type, replacing it with the new parent relationship. This is necessary for the tree format of the database, and is probably not too difficult or costly to do. However, there are more complications when considering the update of indexes when relationships change.

The current scheme for parent-child relationships is to query by key in the roots database for parents and with children() for children. The key scheme is: [parentType, parentKey, childType, childKey].
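As a rough illustration of the move described above, here is a minimal in-memory sketch in Python. It is not treedb's API: the `roots` set, the helper names, and the `thread`/`post` types are all assumptions made for the example. Only the key layout `[parentType, parentKey, childType, childKey]` comes from the scheme above.

```python
# In-memory stand-in for the roots database (assumption: a plain set of keys).
# Each key follows the scheme from the text:
#   (parentType, parentKey, childType, childKey)
roots = set()


def add_relation(parent_type, parent_key, child_type, child_key):
    """Record one parent-child relationship as a key."""
    roots.add((parent_type, parent_key, child_type, child_key))


def parents_of(child_type, child_key, parent_type):
    """Return the parent keys of the given type for one child."""
    return [k[1] for k in sorted(roots)
            if k[0] == parent_type and k[2] == child_type and k[3] == child_key]


def move_child(parent_type, new_parent_key, child_type, child_key):
    """Replace the child's existing parent of this type with a new one."""
    # Delete every old relationship of the same parent type...
    stale = {k for k in roots
             if k[0] == parent_type and k[2] == child_type and k[3] == child_key}
    roots.difference_update(stale)
    # ...then write the new relationship key.
    add_relation(parent_type, new_parent_key, child_type, child_key)


# Hypothetical data: post "p1" starts under thread "t1" and moves to "t2".
add_relation("thread", "t1", "post", "p1")
move_child("thread", "t2", "post", "p1")
```

Note that the sketch only touches the relationship keys. Any index that encodes the old parent would still need its own update, which is the complication raised above.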

Updating indexes

Indexes need to handle updates in two ways, neither of which is handled at the moment.

The difficulty of implementing index updates stems from how indexing works in treedb. Put simply, objects are sorted by key and returned in that order (or in reverse order, when the option is set). This type of sorting is perfect for fields that will never change (like created_at), but as you will see, it is a poor fit for values that do change. Since the key is the only important part, it follows that updates to an index are done by changing the key.

For normal object storage, an update can simply be an overwrite of the existing data on a particular key. However, since we are storing data in the key, we need to reconstruct and delete the old key before writing a new key to the database. It is not as simple as overwriting the old key with a new one.

This may not be a problem in itself, but it does warrant investigation into a better technique for implementing indexes.
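To make the delete-then-write step concrete, here is a small Python sketch of a key-sorted index. It is a toy model, not treedb code: the sorted list stands in for an ordered key-value store, and the `(value, object_id)` key shape and function names are assumptions for illustration.

```python
import bisect

# Sorted list of index keys; stands in for an ordered key-value store.
# Each key is (indexed_value, object_id), so iteration order is sort order.
index = []


def index_insert(value, object_id):
    """Add a key while keeping the list sorted."""
    bisect.insort(index, (value, object_id))


def index_update(old_value, new_value, object_id):
    """Move an object in the index when its indexed field changes."""
    old_key = (old_value, object_id)
    # Reconstruct the old key and delete it first...
    pos = bisect.bisect_left(index, old_key)
    if pos < len(index) and index[pos] == old_key:
        del index[pos]
    # ...then write the new key. Overwriting in place is not possible,
    # because the key itself carries the sorted value.
    index_insert(new_value, object_id)


def index_scan(reverse=False):
    """Return object ids in key order, or in reverse order when asked."""
    keys = reversed(index) if reverse else index
    return [object_id for _, object_id in keys]


index_insert(100, "a")
index_insert(200, "b")
index_update(100, 300, "a")  # "a" changed, so it must now sort after "b"
```

The caller has to know the old value in order to rebuild the old key. In practice that means reading the stored object before every update, which is part of why a field like created_at is so much easier to index than one that changes.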

Updating metadata

Metadata is decentralized and aggregative. It is decentralized in the sense that each object has its own metadata store, and aggregative in the sense that this data is aggregated and propagated up to parent types.

At the moment, it is not clear what exactly should be stored in metadata. Fields such as created_at and updated_at may not actually be "metadata". For posts, which use a version system, all versions of a post are simply children post-versions of the parent post. In this case, updated_at for a post is an integral part of the post-version, as it indicates when the post-version was created. In effect, this is the post-version's created_at.

Anyhow, updates to metadata are different from the instantiation of metadata. Currently, only the instantiation type is handled. This creates a new store in metadata for a particular key and updates whatever metadata it needs to in each parent (recursively, or not).
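The instantiation case might look something like the Python sketch below. Everything concrete in it is hypothetical: the `parents` map, the `post_count` field, and the function names are invented for the example, since what belongs in metadata is still an open question.

```python
# Hypothetical layout: a child -> parent map and one metadata dict per key.
parents = {"post1": "thread1", "thread1": "board1"}
metadata = {"board1": {"post_count": 0}, "thread1": {"post_count": 0}}


def ancestors(key):
    """Yield the parent, grandparent, and so on for a key."""
    while key in parents:
        key = parents[key]
        yield key


def instantiate(key, recursive=True):
    """Create the metadata store for key and bump counts in its parents."""
    metadata[key] = {}
    for depth, ancestor in enumerate(ancestors(key)):
        # Without recursion, only the immediate parent is touched.
        if depth > 0 and not recursive:
            break
        meta = metadata.setdefault(ancestor, {})
        meta["post_count"] = meta.get("post_count", 0) + 1


instantiate("post1")
```

An update would not fit this shape. Instead of adding one new entry, it would have to work out what changed and push that difference up through the same ancestors, and if the object also moved, subtract from the old parents first.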
