Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of ContinuationLocal #1

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 45 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,46 @@
# continuation-local-storage-jvm
A common JVM implementation of continuation-local storage

This is a common implementation of continuation-local storage for the JVM.
It is based on Twitter's [`com.twitter.util.Local`](https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala), but this is a Java port
without any external dependencies (notably; no dependency on Scala).

The ContinuationLocal can be used for any data that should be persisted in
a continuation scope, but is especially useful for various monitoring and tracing
implementations.

## Usage example

```java
import com.github.distributedtracing.ContinuationLocal;

public final class MyProgram {
private static final ContinuationLocal<Integer> userId
= new ContinuationLocal<Integer>();

public static void main(String[] args) {
userId.set(123);
printUserId();
}

private static void printUserId() {
System.out.println(userId.get());
}
}
```

## Design differences between com.twitter.util.Local and ContinuationLocal:

- The value of `com.twitter.util.Local<T>` is of type `scala.Option<T>`,
where `Some(value)` holds the value, while `None` indicates that the
value is absent. In `com.github.distributedtracing.ContinuationLocal<T>`,
the value is not an Option, instead being a nullable of type `T`.
- In `com.twitter.util.Local` the inner storage mechanism is a `ThreadLocal`,
while in `ContinuationLocal`, it is a `InheritableThreadLocal`. This means
that a value that would not be passed on in `com.twitter.util.Local` will be
in a `ContinuationLocal`, in case a new thread is started in the same
context as the `ContinuationLocal` was set.
- `com.twitter.util.Local`'s `update` method is not included. The similar
method `set` should instead be used. `update` in `c.t.u.Local` uses `Option`
types, while `set` in `ContinuationLocal` deals with nullable values directly.
- `com.twitter.util.Local`'s (static) `letClear` function has been renamed to
`letAllClear` to avoid a naming collision with the non-static method `letClear`.
92 changes: 92 additions & 0 deletions pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>7</version>
</parent>

<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<groupId>com.github.distributedtracing</groupId>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.github probably isn't the right group id, as we won't be able to publish against that. @rtyler correct me if I am wrong on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of packages out there released from github: http://search.maven.org/#search%7Cga%7C1%7Ccom.github

But maybe we should aquire a domain like "distributedtracing.org" and use "org.distributedtracing" as a groupId? Then we could point that domain to a gh-pages here in the github repo. (And the specification won't be tied to github)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I suppose I could have googled that :) I don't know what's best, but I suspect that not having github in package names or group ids could lead to later flexibility (thinking apache etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I suggest that we buy that domain, and change the groupId.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.github.distributedtracing is certainly legitimate and can be used for publishing. So can "distributedtracing" or anything in between.

Group IDs are only conventionally supposed to be reverse DNS to the authors domain, and distributedtracing.github.com is a resolvable name and will point to the github pages repo for this organization when it is set up

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think the gh-pages will be distributedtracing.github.io so it would be better to start with a maven group io.github.distributedtracing if that was the rationale. We should be able to control static content in the domain/subdomain that we claim, so I think ".com" is wrong.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dsyer both are equivalent, foo.github.io and foo.github.com are the same.

In the case of a custom CNAME, like we use for @lookout, lookout.github.io, lookout.github.com and hackers.lookout.com all end up at the same place

<artifactId>continuation-local-storage</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>continuation-local-storage</name>
<url>https://github.com/DistributedTracing/continuation-local-storage-jvm</url>

<description>
This is a Java port of Twitter's com.twitter.util.Local. The purpose of
ContinuationLocal is to be able to pass state implicitly between continuations;
much like a ThreadLocal, but one that also works in thread pools and event loops.
</description>

<licenses>
<license>
<name>The Apache Software License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>

<developers>
<developer>
<id>eirslett</id>
<name>Eirik Sletteberg</name>
<email>[email protected]</email>
</developer>
</developers>

<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>animal-sniffer-maven-plugin</artifactId>
<version>1.14</version>
<executions>
<execution>
<phase>test</phase>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
<configuration>
<signature>
<groupId>org.codehaus.mojo.signature</groupId>
<artifactId>java16</artifactId>
<version>1.1</version>
</signature>
</configuration>
</plugin>
</plugins>

<pluginManagement>
<plugins>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add animalsniffer?

      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>animal-sniffer-maven-plugin</artifactId>
        <version>1.14</version>
        <executions>
          <execution>
            <phase>test</phase>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <signature>
            <groupId>org.codehaus.mojo.signature</groupId>
            <artifactId>java16</artifactId>
            <version>1.1</version>
          </signature>
        </configuration>
      </plugin>

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
170 changes: 170 additions & 0 deletions src/main/java/com/github/distributedtracing/ContinuationLocal.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
package com.github.distributedtracing;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ultimately what we want? Might as well state your intent and go for

org.continuation

from the start

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to not having it in the com.github namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do +1 it myself.
org.continuation is a good namespace, since it's not directly tied to distributed tracing. One could use a ContinuationLocal for other things as well, like passing around business logic state, even though I would certainly not recommend that.


public final class ContinuationLocal<T> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some things in the readme could be moved here I think.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. It would be good to add some javadoc here.

private static InheritableThreadLocal<Object[]> localCtx =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is passive inheritance desired? Is it better to be explicit about passing context so that the mechanism by which context carries over thread boundaries is not dependent on how the thread was created. Or to put it another way, everyone should use executors or wrap runnables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to ThreadLocal vs InheritableThreadLocal? (The only difference being that InheritableThreadLocal passes state from one thread to another thread if the second thread was created from the first thread)

It's a design decision whether or not passing state between threads should be explicit or implicit. The ThreadLocal in Java does it implicitly, so I guess that's the "standard" Java way. Another point of view is that we want to be able to trace requests without having to make lots of changes to userland code. Even more, to instruct developers to manually wrap all their threads and thread pools. In a multi-million-line code base, it's a pain to search-replace every place you need to wrap runnables and so on. (Unfortunately, that's what we have to do anyways, unless the Java standard library adopts something like ContinuationLocal and wraps ExecutorServices natively).

The underlying mechanism for passing state is the save() and restore() stuff, and the ExecutorService wrapper included in this library is simply a default implementation for thread pools. Likewise, InheritableThreadLocal could be thought of as a default implementation for passing state into new threads, I suppose?
Rxjava would have to add save()/restore() hooks to their library in order to achive the same thing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just questioning whether implicit passing via InheritableThreadLocal is a fools errand for the reasons you mention. I consider wrapping Executors to be explicit passing btw and something most modern application do as a matter of routine.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem InheritableThreadLocal leads to unpredictability. For instance, ThreadPoolExecutor.execute() does create threads, which would lead to the threads in the pool each having a "random" value in the ContinuationLocal. It would seem better to obviously break so that the user discovers the missing wrap().

There are also issues with object retention, in that the ThreadPoolExecutor, even when properly wrapped, would be retaining references to the ContinuationLocal information from "arbitrary" threads, some of which may be long-dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariusae was it a conscious decision to use ThreadLocal insted of InheritableThreadLocal, in the first place?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eirslett yes. we wanted all context saving and restoring to be explicit.

new InheritableThreadLocal<Object[]>();
private static volatile int size = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to use AtomicInteger so you don't have to hold the class monitor which risks contention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point! Will change that.


/**
* Return a snapshot of the current ContinuationLocal state.
*/
public static Object[] save() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely to violate the mutability expectations of users when state is moved between threads. In particular the fact that some inner code can directly replace the bound value at any time for any ContinuationLocal regardless of visibility rules. I'd strongly consider encapsulating this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, also a trade-off here; Twitter solved this by using a type alias:
twitter/util@288c790
I'm not sure if their construct is 100 % safe either. But the idea was to reduce allocations by not using a wrapper.
I do agree with you that the safety an encapsulating class will provide justifies the allocation overhead. Any other people with opinions?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems dropping some type safety and returning Object would be superior over Object[]. That 1) allows the concrete implementation to morph and 2) discourages modifying the object. It is true that users could cast to the Object[], but that seems less of a concern. restore() would require "saved was returned from save()" and do a cast internally, throwing an IllegalArgumentException if it isn't the expected type.

That isn't to say that I think returning an Object is better than a special type needing an extra allocation, just that it seems less problematic than Object[].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good solution!

return localCtx.get();
}

/**
* Restore the ContinuationLocal state to a given set of values.
*/
public static void restore(Object[] saved) {
localCtx.set(saved);
}

private static synchronized int add() {
size += 1;
return size - 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return size++?

}

private static void set(int i, Object v) {
assert i < size;
Object[] ctx = localCtx.get();

if(ctx == null) {
ctx = new Object[size];
} else {
Object[] oldCtx = ctx;
ctx = new Object[size];
System.arraycopy(oldCtx, 0, ctx, 0, oldCtx.length);
}

ctx[i] = v;
localCtx.set(ctx);
}

private static Object get(int i) {
Object[] ctx = localCtx.get();
if(ctx == null || ctx.length <= i) {
return null;
}

return ctx[i];
}

private static void clear(int i) {
set(i, null);
}

/**
* Clear all locals in the current context.
*/
private static void clearAll() {
localCtx.set(null);
}

/**
* Execute a block with the given ContinuationLocals,
* restoring current values upon completion.
*/
public static <U> U let(Object[] ctx, Producer<U> f) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the usual meaning of let.

Object[] saved = save();
restore(ctx);
U result;
try {
result = f.apply();
} finally {
restore(saved);
}
return result;
}

/**
* Execute a block with all ContinuationLocals clear, restoring
* current values upon completion.
*/
public static <U> U letAllClear(Producer<U> f) {
return ContinuationLocal.let(null, f);
}

/**
* Convert a Producer&lt;R&gt; into another producer of the same
* type whose ContinuationLocal context is saved when calling `closed`
* and restored upon invocation.
*/
public static <R> Producer<R> closed(final Producer<R> fn) {
final Object[] closure = ContinuationLocal.save();
return new Producer<R>() {
public R apply() {
Object[] save = ContinuationLocal.save();
ContinuationLocal.restore(closure);
R result;
try {
result = fn.apply();
} finally {
ContinuationLocal.restore(save);
}
return result;
}
};
}

private final int me = add();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While integer keys are efficient the data density tradeoff might be undesirable if folks think that they can create and discard new continuation local instances all the time.

If the performance characteristics of indexed lookup are critical then perhaps place an upper bound on the array size and fallback to a spare map implementation once that size is exceeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's right that developers shouldn't make new ContinuationLocals everywhere, because they have an overhead, and aren't really garbage collected either. But I'm not sure the implementation details should document how ContinuationLocal is supposed to be used, maybe that should be a warning in the JavaDoc instead?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ContinuationLocal instances appear to be garbage collectable (am I missing something?) I agree you want to avoid too much coupling between implementation details and interface but there is a clear risk here. Guidance in doc is probably sufficient for most cases but defense in depth is also worth considering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objects that are referred to by a ContinuationLocal is stored inside the context (the current/temporary implementation of context is Object[]). So the ContinuationLocal itself is garbage collected, but not the value it's bound to.
Anyways, I suppose the "correct" use of a ContinuationLocal is to have it in a global, static variable? So it will have the same lifecycle as the application itself. Are there any cases where one would need additional scoping?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a data point: we're operating really rather huge codebases with com.twitter.util.Local, and we've never had any issues with this approach.


/**
* Update the ContinuationLocal with the given value.
*/
public void set(T value) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is continuous mutability the goal or an anti-pattern here? We explicitly chose not to allow it in

grpc/grpc-java#562

because we believe the aggregate set of 'context' passed to a unit of work should be stable and that it was an exceptional event to need to mutate it. If folks need mutability for some reason they can have the stored state be mutable.

If your goal is to have the 'feel' of ThreadLocal then it will have to accept the risks that come with that choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point! Mutability can be achieved by using a mutable inner value.
I'm not sure what's the right answer here... as you say it's a balance between the ThreadLocal "feel" vs. benefits of immutability. Does anybody else have opinions?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

com.twitter.util.Local are not mutable. That is, they are bindings and not variables. As @louiscryan points out, you can however bind a ref cell. I think this is the most sensible design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariusae wouldn't this set() method on the Local mean it's mutable in the same sense that a ThreadLocal is mutable?
https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala#L117

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really hard to compare, because ThreadLocals don't propagate across their delimitation's boundaries (starting a new thread) whereas Locals do.

Thus, if I change a Local with set(), it doesn't affect any context where those Locals have been suspended.

Even so, set is a low-level API, and user code should instead use let which emphasizes the binding nature of the context. (This should be made more clear in its documentation.) https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala#L128

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the implication of this, is that if you do want to share values that mutate across contexts, you do have to use inner refs.

Finagle's context API provides a clearer API for all of this. (It does more--but it might inform how I'd write com.twitter.util.Local were I to do it again.) https://github.com/twitter/finagle/blob/develop/finagle-core/src/main/scala/com/twitter/finagle/context/Context.scala

set(me, value);
}

/**
* Get the ContinuationLocal's value.
*/
public T get() {
return (T)get(me);
}

/**
* Alias for get()
*/
public T apply() {
return get();
}

/**
* Execute a block with a specific ContinuationLocal value,
* restoring the current state upon completion.
*/
public <U> U let(T value, Producer<U> f) {
T saved = get();
set(value);
U result;
try {
result = f.apply();
} finally {
set(saved);
}
return result;
}

/**
* Execute a block with the ContinuationLocal cleared, restoring
* the current state upon completion.
*/
public <U> U letClear(Producer<U> f) {
T saved = get();
clear();
U result;
try {
result = f.apply();
} finally {
set(saved);
}
return result;
}

/**
* Clear the ContinuationLocal's value. Other ContinuationLocal's are not modified.
*/
public void clear() {
ContinuationLocal.clear(me);
}
}
Loading