-
Notifications
You must be signed in to change notification settings - Fork 2
Posting List Concepts
For simplicity and flexibility, we distinct two different types of inverted lists:
- document list,
- payload list.
A document list contains document IDs (implementation of the underlying structure may vary and is beyond the scope of this document). Typically, that means we can perform operations such as next
or nextgeq
(details below).
A payload list contains any data related to the documents in a particular document list. Examples are: document frequency list, BM25 score list, and query likelihood list. As opposed to a document list, a payload list does not support nexteq
but must provide an align
method that moves the pointer to where the document list points.
For many reasons (which might be written down another time), no inheritance is used. Instead, we rely on templates for compile-time polymorphism, and type erasure (if needed) for run-time polymorphism.
A document list view is an object containing information needed to traverse a document list. For example, such view might decode a list block boundaries to later decode one block at a time when needed. (However, it might as well decompress all data right away, this is an implementation detail.)
// Example class definition.
class document_list_view {
public:
using value_type = /* implementation detail */;
using iterator_type = /* implementation detail */;
document_list_view();
document_list_view(const document_list_view&);
document_list_view(document_list_view&&);
document_list_view& operator=(const document_list_view&);
document_list_view& operator=(document_list_view&&);
document_list_view(memory_view data);
iterator_type begin() const;
iterator_type end() const;
iterator_type lookup(value_type doc) const;
};
value_type
The type of document IDs.
iterator_type
The type list iterator. See Document List Iterator.
document_list_view();
Constructs an empty list.
document_list_view(const document_list_view&);
document_list_view(document_list_view&&);
document_list_view& operator=(const document_list_view&);
document_list_view& operator=(document_list_view&&);
Copy and move constructors and assignment operators.
document_list_view(memory_view data);
Reads and stores list's metadata. Some implementations might read a few pointers and resolve the rest during iteration; other might read all available information about the blocks. This implementation specific.
iterator_type begin() const;
iterator_type end() const;
Creates an iterator ("opens" a posting list) pointing to the beginning and the end of the list. See Document List Iterator for more information about iterators.
iterator_type lookup(value_type doc) const;
Returns an iterator pointing to either doc
or the next greater ID. If no such ID exists, it will return end()
.
A document iterator takes care of traversing, skipping, decoding blocks, etc. Since it is read-only, this is a const
iterator. However, it is not thread-safe; two threads can use one posting list but not one iterator.
// Example class definition.
class document_list_iterator {
public:
using value_type = /* implementation detail */;
using reference = /* implementation detail */;
document_list_iterator();
document_list_iterator(const document_list_view&);
document_list_iterator(document_list_view&&);
document_list_iterator& operator=(const document_list_view&);
document_list_iterator& operator=(document_list_view&&);
/* Implementation-dependent constructors */
bool operator!=(const document_list_iterator&);
reference operator*();
document_list_iterator& operator++();
document_list_iterator operator++(int);
document_list_iterator& moveto(const value_type& doc);
document_list_iterator nextgeq(const value_type& doc);
};
value_type
The type of document IDs in the list.
reference
The type of reference, by default const value_type&
.
document_list_iterator();
Default constructor, points to the beginning of the list.
document_list_iterator(const document_list_view&);
document_list_iterator(document_list_view&&);
document_list_iterator& operator=(const document_list_view&);
document_list_iterator& operator=(document_list_view&&);
Copy and move constructors and assignment operators. Note that typically an iterator owns come decompressed data, it is unwise to keep copying it too often. Rather, use one iterator copy for one iteration; use move semantics if needed.
bool operator!=(const document_list_iterator&);
Compares two iterators on the same list. Warning: for efficiency reasons, testing association to the same list is inadvisable, therefore proper care should be exercised to compare only iterators belonging to a single list.
reference operator*();
Dereferences the iterator.
document_list_iterator& operator++();
document_list_iterator operator++(int);
Moves the iterator to the next position. Warning: if there is no specific reason to copy the iterator, use prefix operator ++it
rather it++
to avoid unnecessary copying.
document_list_iterator& moveto(const value_type& doc);
Moves the iterator to either doc
or the next greater ID. It will point to the end of the list (after the last element) in case such document ID does not exist in the list.
document_list_iterator nextgeq(const value_type& doc);
Similar to moveto
, but returns a copy. Use only for a good reason when a copy is necessary.
We can also distinguish a forward document iterator, which does not define nextgeq
or moveto
. Such iterator can be read only sequentially. E.g., an exhaustive TAAT processing algorithm only requires a forward iterator without the ability to look up documents.
A payload list is similar to a document list with some important differences.
First, it can potentially store an arbitrary type of values. Nevertheless, we expect it to be a primitive type, such as int
, long
, or double
, with emphasis on storing integers, such as frequencies or quantized scores.
Second, although a payload list can in theory exist on its own, it is intended to be used along with a respective document list. Therefore, the lookup functionality is replaced with operation to efficiently align a payload iterator with its document list iterator.
// Example class definition.
class payload_list_view {
public:
using value_type = /* implementation detail */;
using iterator_type = /* implementation detail */;
payload_list_view();
payload_list_view(const payload_list_view&);
payload_list_view(payload_list_view&&);
payload_list_view& operator=(const payload_list_view&);
payload_list_view& operator=(payload_list_view&&);
payload_list_view(memory_view data);
iterator_type begin() const;
iterator_type end() const;
template<class DocumentIterator> iterator_type at(DocumentIterator doc_iter) const;
};
value_type
The type of payloads.
iterator_type
The type list iterator. See Payload List Iterator.
payload_list_view();
Constructs an empty list.
payload_list_view(const payload_list_view&);
payload_list_view(payload_list_view&&);
payload_list_view& operator=(const payload_list_view&);
payload_list_view& operator=(payload_list_view&&);
Copy and move constructors and assignment operators.
payload_list_view(memory_view data);
Reads and stores list's metadata. Some implementations might read a few pointers and resolve the rest during iteration; other might read all available information about the blocks. This implementation specific.
iterator_type begin() const;
iterator_type end() const;
Creates an iterator ("opens" a posting list) pointing to the beginning and the end of the list. See Payload List Iterator for more information about iterators.
template<class DocumentIterator> iterator_type at(DocumentIterator doc_iter) const;
Returns an iterator pointing to the payload corresponding to doc_iter
.
// Example class definition.
class payload_list_iterator {
public:
using value_type = /* implementation detail */;
using reference = /* implementation detail */;
payload_list_iterator();
payload_list_iterator(const payload_list_iterator&);
payload_list_iterator(payload_list_iterator&&);
payload_list_iterator& operator=(const payload_list_iterator&);
payload_list_iterator& operator=(payload_list_iterator&&);
/* Implementation-dependent constructors */
bool operator!=(const payload_list_iterator&);
reference operator*();
payload_list_iterator& operator++();
payload_list_iterator operator++(int);
payload_list_iterator& align(const value_type& doc);
document_list_iterator nextgeq(const value_type& doc);
template<class DocumentIterator> document_list_iterator align(DocumentIterator dit);
};
value_type
The type of payload in the list.
reference
The type of reference, by default const value_type&
.
payload_list_iterator();
Default constructor, points to the beginning of the list.
payload_list_iterator(const payload_list_iterator&);
payload_list_iterator(payload_list_iterator&&);
payload_list_iterator& operator=(const payload_list_iterator&);
payload_list_iterator& operator=(payload_list_iterator&&);
Copy and move constructors and assignment operators. Note that typically an iterator owns come decompressed data, it is unwise to keep copying it too often. Rather, use one iterator copy for one iteration; use move semantics if needed.
bool operator!=(const payload_list_iterator&);
Compares two iterators on the same list. Warning: for efficiency reasons, testing association to the same list is inadvisable, therefore proper care should be exercised to compare only iterators belonging to a single list.
reference operator*();
Dereferences the iterator.
payload_list_iterator& operator++();
payload_list_iterator operator++(int);
Moves the iterator to the next position. Warning: if there is no specific reason to copy the iterator, use prefix operator ++it
rather it++
to avoid unnecessary copying.
template<class DocumentIterator> document_list_iterator align(DocumentIterator dit);
Aligns the iterator to a corresponding document iterator.