Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRDT datatype implementation #18046

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions vlib/datatypes/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,5 @@ println(stack)
- [x] Min heap (priority queue)
- [x] Set
- [x] Quadtree
- [x] CRDT Conflict-free replicated data type
- [ ] ...
160 changes: 160 additions & 0 deletions vlib/datatypes/crdt/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@
# A V implementation of CRDTs

The following state-based counters and sets have currently been implemented.

## Counters

### G-Counter

A grow-only counter (G-Counter) can only increase in one direction.
The increment operation increases the value of current replica by 1.
The merge operation combines values from distinct replicas by taking the maximum of each replica.

```v
import datatypes.crdt { new_gcounter }

mut gcounter := new_gcounter()

// We can increase the counter monotonically by 1.
gcounter.increment()

// Twice.
gcounter.increment()

// Or we can pass in an arbitrary delta to apply as an increment.
gcounter.increment_value(2)

// Should print '4' as the result.
println(gcounter.value())

mut another_counter := new_gcounter()

// We can merge counter between each other
another_counter.merge(gcounter)
gcounter.merge(another_counter)
assert another_counter.value() == gcounter.value()
```

### PN-Counter

A positive-negative counter (PN-Counter) is a CRDT that can both increase or
decrease and converge correctly in the light of commutative
operations. Both `.increment()` and `.decrement()` operations are allowed and thus
negative values are possible as a result.

```v
import datatypes.crdt { new_pncounter }

mut pncounter := new_pncounter()

// We can increase the counter by 1.
pncounter.increment()

// Or more.
pncounter.increment_value(100)

// And similarly decrease its value by 1.
pncounter.decrement()

// Or more.
pncounter.decrement_value(100)

// End result should equal '0' here.
assert pncounter.value() == 0
```

## Sets

### G-Set

A grow-only (G-Set) set to which element/s can be added to. Removing element
from the set is not possible.

```v
import datatypes.crdt { new_gset }

mut obj := 'dummy-object'
mut gset := new_gset[string]()

gset.add(obj)

// Should always print 'true' as `obj` exists in the g-set.
println(gset.lookup(obj))
```

### 2P-Set

Two-phase set (2P-Set) allows both additions and removals to the set.
Internally it comprises of two G-Sets, one to keep track of additions
and the other for removals.

```v
import datatypes.crdt { new_two_phase_set }

obj := 'dummy-object'

mut twophaseset := new_two_phase_set[string]()

twophaseset.add(obj)

// Remove the object that we just added to the set, emptying it.
twophaseset.remove(obj)

// Should return 'false' as the obj doesn't exist within the set.
assert twophaseset.lookup(obj) == false
```

### LWW-Element-Set

Last-write-wins element set (LWW-Element-Set) keeps track of element additions
and removals but with respect to the timestamp that is attached to each
element. Timestamps should be unique and have ordering properties.

```v
import datatypes.crdt { new_lwweset }

obj := 'dummy-object'
mut lwweset := new_lwweset[string]()

// Here, we remove the object first before we add it in. For a
// 2P-set the object would be deemed absent from the set. But for
// a LWW-set the object should be present because `.add()` follows
// `.remove()`.
lwweset.remove(obj)
lwweset.add(obj)

// This should print 'true' because of the above.
assert lwweset.lookup(obj) == true
```

### OR-Set

An OR-Set (Observed-Removed-Set) allows deletion and addition of
elements similar to LWW-e-Set, but does not surface only the most recent one.
Additions are uniquely tracked via tags and an element is considered member of the set.
If the deleted set consists of all the tags present within additions.

```v
import datatypes.crdt { new_orset }

// object 1 == object 2
obj1 := 'dummy-object'
obj2 := 'dummy-object'

mut orset := new_orset[string]()

orset.add(obj1)
orset.add(obj2)

// Removing any one of the above two objects should remove both
// because they contain the same value.
orset.remove(obj1)

// Should return 'false'.
assert orset.lookup(obj2) == false
```

## ToDo

- [x] Add `compare` and `merge` methods to G-Set (`gset.v`)
- [ ] Add `compare` and `merge` methods to 2P-Set (`two_phase_set.v`)
1 change: 1 addition & 0 deletions vlib/datatypes/crdt/crdt.v
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
module crdt
63 changes: 63 additions & 0 deletions vlib/datatypes/crdt/gcounter.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
module crdt

import rand

// GCounter represent a G-counter in CRDT, which is
// a state-based grow-only counter that only supports
// increments.
pub struct GCounter {
// identity provides a unique identity to each replica.
identity string
// counter maps identity of each replica to their
// entry values i.e. the counter value they individually
// have.
mut:
counter map[string]int
}

// new_gcounter returns a GCounter by pre-assigning a unique
// identity to it.
pub fn new_gcounter() GCounter {
return GCounter{
identity: rand.ulid().str()
counter: map[string]int{}
}
}

// increment increments the GCounter by the value of 1 everytime it
// is called.
pub fn (mut g GCounter) increment() {
g.increment_value(1)
}

// inc_val allows passing in an arbitrary delta to increment the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// inc_val allows passing in an arbitrary delta to increment the
// increment_value allows passing in an arbitrary delta to increment the

// current value of counter by. Only positive values are accepted.
// If a negative value is provided the implementation will panic.
pub fn (mut g GCounter) increment_value(value int) {
if value < 0 {
panic('Cannot decrement a gcounter')
}
g.counter[g.identity] += value
}

// count returns the total count of this counter across all the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// count returns the total count of this counter across all the
// value returns the total count of this counter across all the

// present replicas.
pub fn (mut g GCounter) value() int {
mut total := 0
for key in g.counter.keys() {
total += g.counter[key]
}
return total
}

// Merge combines the counter values across multiple replicas.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Merge combines the counter values across multiple replicas.
// merge combines the counter values across multiple replicas.

// The property of idempotency is preserved here across
// multiple merges as when no state is changed across any replicas,
// the result should be exactly the same everytime.
pub fn (mut g GCounter) merge(c &GCounter) {
for key, value in c.counter {
if key !in g.counter || g.counter[key] < value {
g.counter[key] = value
}
}
}
16 changes: 16 additions & 0 deletions vlib/datatypes/crdt/gcounter_test.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
module crdt
Copy link
Member

@spytheman spytheman Apr 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you want to test private functions, it is a little better, to not use module crdt for the test (i.e. just make a test that uses import datatypes.crdt, like a normal user program will have to) - the reason is that in this way, the test will also serve as an example, that is directly copy/paste-able.


fn test_increment() {
mut counter := new_gcounter()
counter.increment()
assert counter.value() == 1
}

fn test_merge() {
mut first_counter := new_gcounter()
first_counter.increment()
assert first_counter.value() == 1
mut second_counter := new_gcounter()
second_counter.merge(first_counter)
assert second_counter.value() == 1
}
53 changes: 53 additions & 0 deletions vlib/datatypes/crdt/gset.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
module crdt

// Gset is a grow-only set.
struct GSet[T] {
mut:
main_set map[T]T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
main_set map[T]T
main_set map[T]bool

For most types, like strings or pointers etc, sizeof(bool) < sizeof(T), and the map here is only used for its keys afaik.

}

// new_gset returns an instance of GSet.
pub fn new_gset[T]() GSet[T] {
return GSet[T]{
main_set: map[T]T{}
}
}

// add lets you add an element to grow-only set.
pub fn (mut g GSet[T]) add(elem T) {
g.main_set[elem] = T{}
}

// lookup returns true if an element exists within the
// set or false otherwise.
pub fn (mut g GSet[T]) lookup(elem T) bool {
return elem in g.main_set
}

// len returns the no. of elements present within GSet.
pub fn (mut g GSet[T]) len() int {
return g.main_set.len
}

// elements returns all the elements present in the set.
pub fn (mut g GSet[T]) elements() []T {
mut elements := []T{}
for _, element in g.main_set {
elements << element
}
return elements
Comment on lines +34 to +38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mut elements := []T{}
for _, element in g.main_set {
elements << element
}
return elements
return g.main_set.keys()

}

// compare returns true if both of of sets are same, false otherwise.
pub fn (mut g GSet[T]) compare(c GSet[T]) bool {
return g == c
}

// merge function to merge the GSet object's payload with the argument's payload.
pub fn (mut g GSet[T]) merge(c GSet[T]) {
for key, _ in c.main_set {
if key !in g.main_set {
g.main_set[key] = c.main_set[key]
}
}
}
41 changes: 41 additions & 0 deletions vlib/datatypes/crdt/gset_test.v
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
module crdt

fn test_lookup() {
mut gset := new_gset[string]()
element := 'some-test-element'
assert gset.lookup(element) == false
}

fn test_add() {
mut gset := new_gset[string]()
element := 'some-test-element'
assert gset.lookup(element) == false
gset.add(element)
assert gset.lookup(element)
}

fn test_compare() {
mut first_gset := new_gset[string]()
element := 'some-test-element'
assert first_gset.lookup(element) == false
first_gset.add(element)
assert first_gset.lookup(element) == true
mut second_gset := new_gset[string]()
assert first_gset.compare(second_gset) == false
second_gset.add(element)
assert second_gset.compare(first_gset)
assert first_gset.compare(second_gset)
}

fn test_merge() {
mut first_gset := new_gset[string]()
element := 'some-test-element'
assert first_gset.lookup(element) == false
first_gset.add(element)
assert first_gset.lookup(element) == true
mut second_gset := new_gset[string]()
assert first_gset.compare(second_gset) == false
second_gset.merge(first_gset)
assert second_gset.compare(first_gset)
assert first_gset.compare(second_gset)
}
Loading