Deprecate TArray and TRef. (#152)
* deprecate `TArray` and `TRef`.

* remove TArray and TRef

* docs update

* Update Project.toml

Co-authored-by: Hong Ge <[email protected]>
KDr2 and yebai authored Jun 22, 2022
1 parent 9579c93 commit 4322ecd
Showing 10 changed files with 135 additions and 471 deletions.
2 changes: 1 addition & 1 deletion Project.toml
@@ -3,7 +3,7 @@ uuid = "6f1fad26-d15e-5dc8-ae53-837a1d7b8c9f"
license = "MIT"
desc = "Tape based task copying in Turing"
repo = "https://github.com/TuringLang/Libtask.jl.git"
version = "0.7.5"
version = "0.8"

[deps]
FunctionWrappers = "069b7b12-0de2-55c6-9aab-29f3d0a68a2e"
33 changes: 15 additions & 18 deletions README.md
@@ -32,7 +32,7 @@ a = copy(ttask)
@show consume(ttask) # 3
```

Heap allocated objects are shallow copied:
Array and Ref objects are deep copied:

```julia
using Libtask
@@ -50,46 +50,43 @@ ttask = TapedTask(f)
@show consume(ttask) # 0
@show consume(ttask) # 1

a = copy(t)
a = copy(ttask)
@show consume(a) # 2
@show consume(a) # 3

@show consume(ttask) # 4
@show consume(ttask) # 5
@show consume(ttask) # 2
@show consume(ttask) # 3
```

In contrast to standard arrays, which are only shallow copied during
task copying, `TArray`, an array data structure provided by Libtask,
is deep copied during the copying process of a task:
Other heap-allocated objects (e.g., `Dict`) are shallow copied:

```julia
using Libtask

function f()
t = TArray(Int, 1)
t[1] = 0
for _ in 1:10
t = Dict(1=>10, 2=>20)
while true
produce(t[1])
t[1] = 1 + t[1]
end
end

ttask = TapedTask(f)

@show consume(ttask) # 0
@show consume(ttask) # 1
@show consume(ttask) # 10
@show consume(ttask) # 11

a = copy(ttask)
@show consume(a) # 2
@show consume(a) # 3
@show consume(a) # 12
@show consume(a) # 13

@show consume(ttask) # 2
@show consume(ttask) # 3
@show consume(ttask) # 14
@show consume(ttask) # 15
```
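
If a task needs its own copy of a `Dict`, the `deepcopy_types` keyword that this commit adds to the `TapedTask` constructor looks like the intended hook. The following is a minimal sketch, assuming Libtask 0.8 as changed in this commit; listing `Dict` next to the defaults `Array` and `Ref` is an illustrative choice, not something the README prescribes.

```julia
using Libtask

function f()
    t = Dict(1=>10, 2=>20)
    while true
        produce(t[1])
        t[1] = 1 + t[1]
    end
end

# Assumption: `deepcopy_types` is the keyword introduced in this commit.
# Adding `Dict` asks Libtask to deep copy Dict bindings when a task is copied.
ttask = TapedTask(f; deepcopy_types=[Array, Ref, Dict])

@show consume(ttask)
@show consume(ttask)

a = copy(ttask)  # `a` should now hold its own copy of `t`
@show consume(a)
@show consume(ttask)
```

With `Dict` deep copied, `a` and `ttask` would be expected to advance independently after the copy, in the same way as the `Array` example above.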

Notes:

- The [Turing](https://github.com/TuringLang/Turing.jl) probabilistic programming
language uses this task copying feature in
an efficient implementation of the [particle
filtering](https://en.wikipedia.org/wiki/Particle_filter) sampling
6 changes: 2 additions & 4 deletions src/Libtask.jl
@@ -4,12 +4,10 @@ using FunctionWrappers: FunctionWrapper
using LRUCache

export TapedTask, consume, produce
export TArray, tzeros, tfill, TRef

export TArray, tzeros, tfill, TRef # legacy types back compat

include("tapedfunction.jl")
include("tapedtask.jl")

include("tarray.jl")
include("tref.jl")

end
104 changes: 62 additions & 42 deletions src/tapedfunction.jl
@@ -1,37 +1,37 @@
#=
`TapedFunction` converts a Julia function to a friendly tape for user-specified interpreters.
With this tape-like abstraction for functions, we gain some control over how the function is
executed, like capturing continuations, caching variables, injecting additional control flows
(i.e. produce/consume) between instructions on the tape, etc.
Under the hood, we first use Julia's compiler API to get the IR code of the original function.
We use the unoptimised typed code in a non-strict SSA form. Then we convert each IR instruction
to a Julia data structure (an object of a subtype of AbstractInstruction). All the operands
(i.e., the variables) these instructions use are stored in a data structure called `Bindings`.
This conversion/binding process is performed at compile-time / tape-recording time and is only
done once for each function.
In a nutshell, there are two types of instructions (or primitives) on a tape:
- Ordinary function call
- Control-flow instruction: GotoInstruction and CondGotoInstruction, ReturnInstruction
Once the tape is recorded, we can run the tape just like calling the original function.
We first plug in the arguments, run each instruction on the tape, and stop after encountering
a ReturnInstruction. We also provide a mechanism to add a callback after each instruction.
This API allowed us to implement the `produce/consume` mechanism in TapedTask. And exploiting
these features, we implemented a fork mechanism for TapedTask.
Some potentially sharp edges of this implementation:
1. GlobalRef is evaluated at the tape-recording time (compile-time). Most times,
the value/object associated with a GlobalRef does not change at run time.
So this works well. But, if you do something like `module A v=1 end; make tapedfunction; A.eval(:(v=2)); run tf;`,
the assignment won't take effect.
2. QuoteNode is also evaluated at the tape-recording time (compile-time). Primarily
the result of evaluating a QuoteNode is a Symbol, which works well most of the time.
3. Each Instruction execution contains one unnecessary allocation at the moment.
So writing a function with vectorised computation will be more performant,
for example, using broadcasting instead of a loop.
=#

@@ -58,8 +58,9 @@ mutable struct TapedFunction{F, TapeType}
binding_values::Bindings
arg_binding_slots::Vector{Int} # arg indices in binding_values
retval_binding_slot::Int # 0 indicates the function has not returned
deepcopy_types::Vector{Any}

function TapedFunction{F, T}(f::F, args...; cache=false) where {F, T}
function TapedFunction{F, T}(f::F, args...; cache=false, deepcopy_types=[]) where {F, T}
args_type = _accurate_typeof.(args)
cache_key = (f, args_type...)

@@ -72,17 +73,17 @@ mutable struct TapedFunction{F, TapeType}
ir = _infer(f, args_type)
binding_values, slots, tape = translate!(RawTape(), ir)

tf = new{F, T}(f, length(args), ir, tape, 1, binding_values, slots, 0)
tf = new{F, T}(f, length(args), ir, tape, 1, binding_values, slots, 0, deepcopy_types)
TRCache[cache_key] = tf # set cache
return tf
end

TapedFunction(f, args...; cache=false) =
TapedFunction{typeof(f), RawTape}(f, args...; cache=cache)
TapedFunction(f, args...; cache=false, deepcopy_types=[]) =
TapedFunction{typeof(f), RawTape}(f, args...; cache=cache, deepcopy_types=deepcopy_types)

function TapedFunction{F, T0}(tf::TapedFunction{F, T1}) where {F, T0, T1}
new{F, T0}(tf.func, tf.arity, tf.ir, tf.tape,
tf.counter, tf.binding_values, tf.arg_binding_slots, 0)
tf.counter, tf.binding_values, tf.arg_binding_slots, 0, tf.deepcopy_types)
end

TapedFunction(tf::TapedFunction{F, T}) where {F, T} = TapedFunction{F, T}(tf)
@@ -444,31 +445,50 @@ end
## copy Bindings, TapedFunction

"""
tape_copy(x)
tape_shallowcopy(x)
tape_deepcopy(x)
Function `tape_shallowcopy` and `tape_deepcopy` are used to copy data
while copying a TapedFunction. A value in the bindings of a
TapedFunction is either `tape_shallowcopy`ed or `tape_deepcopy`ed. For
TapedFunction, all types are shallow copied by default, and you can
specify some types to be deep copied by giving the `deepcopy_types`
keyword argument while constructing a TapedFunction.
The default behaviour of `tape_shallowcopy` is, we return its argument
untouched, like `identity` does, i.e., `tape_shallowcopy(x) = x`. The default
behaviour of `tape_deepcopy` is, we call `deepcopy` on its argument
and return the result, `tape_deepcopy(x) = deepcopy(x)`. If one wants
some kinds of data to be copied (shallowly or deeply) in a different
way, one can overload these functions.
Function `tape_copy` is used to copy data while copying a
TapedFunction. The default behaviour is to share the data
between tasks, i.e., `tape_copy(x) = x`. If one wants some kinds of
data to be copied, or deeply copied, one can overload this function.
"""
function tape_copy end
tape_copy(x) = x
function tape_shallowcopy end
function tape_deepcopy end

tape_shallowcopy(x) = x
tape_deepcopy(x) = deepcopy(x)
# Core.Box is used as closure captured variable container, so we should tape_copy its contents
tape_copy(x::Core.Box) = Core.Box(tape_copy(x.contents))
# ?? should we deepcopy Array and Dict by default?
# tape_copy(x::Array) = deepcopy(x)
# tape_copy(x::Dict) = deepcopy(x)
tape_shallowcopy(x::Core.Box) = Core.Box(tape_shallowcopy(x.contents))
tape_deepcopy(x::Core.Box) = Core.Box(tape_deepcopy(x.contents))

function _tape_copy(v, deepcopy_types)
if any(t -> isa(v, t), deepcopy_types)
tape_deepcopy(v)
else
tape_shallowcopy(v)
end
end

function copy_bindings(old::Bindings)
function copy_bindings(old::Bindings, deepcopy_types)
newb = copy(old)
for k in 1:length(old)
isassigned(old, k) && (newb[k] = tape_copy(old[k]))
newb[k] = _tape_copy(old[k], deepcopy_types)
end
return newb
end

function Base.copy(tf::TapedFunction)
new_tf = TapedFunction(tf)
new_tf.binding_values = copy_bindings(tf.binding_values)
new_tf.binding_values = copy_bindings(tf.binding_values, tf.deepcopy_types)
return new_tf
end
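
The type-based choice between sharing and deep copying in `_tape_copy` and `copy_bindings` above can be tried outside Libtask. The following is a minimal standalone sketch in plain Julia: every `demo_` name is a hypothetical stand-in, a plain `Vector{Any}` is assumed in place of `Bindings`, and the `isassigned` guard is a choice made for this sketch.

```julia
# Hypothetical stand-ins for tape_shallowcopy / tape_deepcopy; not Libtask's API.
demo_shallowcopy(x) = x
demo_deepcopy(x) = deepcopy(x)

# Deep copy `v` if it is an instance of any listed type, otherwise share it.
demo_copy(v, deepcopy_types) =
    any(t -> isa(v, t), deepcopy_types) ? demo_deepcopy(v) : demo_shallowcopy(v)

function demo_copy_bindings(old::Vector{Any}, deepcopy_types)
    newb = copy(old)
    for k in eachindex(old)
        # Skip unassigned slots (a guard added for this sketch).
        isassigned(old, k) && (newb[k] = demo_copy(old[k], deepcopy_types))
    end
    return newb
end

old = Any[[0], Ref(1), Dict(1 => 10)]
copied = demo_copy_bindings(old, [Array, Ref])

copied[1][1] = 99  # mutates only the deep-copied Array
copied[3][1] = 99  # the Dict is shared, so `old[3]` sees this change

@assert old[1] == [0]          # original Array untouched
@assert copied[2] !== old[2]   # Ref was deep copied into a new object
@assert old[3][1] == 99        # shared Dict reflects the mutation
```

Because the check uses `isa`, listing an abstract or parametric type such as `Ref` or `Array` covers all of its concrete instances, which is why a `Base.RefValue{Int}` and a `Vector{Int}` are both deep copied here.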
22 changes: 19 additions & 3 deletions src/tapedtask.jl
@@ -65,8 +65,8 @@ end

# NOTE: evaluating model without a trace, see
# https://github.com/TuringLang/Turing.jl/pull/1757#diff-8d16dd13c316055e55f300cd24294bb2f73f46cbcb5a481f8936ff56939da7ceR329
function TapedTask(f, args...)
tf = TapedFunction(f, args...; cache=true)
function TapedTask(f, args...; deepcopy_types=[Array, Ref]) # deepcopy Array and Ref by default.
tf = TapedFunction(f, args...; cache=true, deepcopy_types=deepcopy_types)
TapedTask(tf, args...)
end

@@ -178,7 +178,7 @@ function Base.copy(t::TapedTask; args=())
end
else
# the task is not started yet, but no args is given
tape_copy.(t.args)
map(a -> _tape_copy(a, t.tf.deepcopy_types), t.args)
end
end
new_t = TapedTask(tf, task_args...)
@@ -187,3 +187,19 @@ function Base.copy(t::TapedTask; args=())
new_t.task.storage[:tapedtask] = new_t
return new_t
end

# TArray and TRef back-compat
function TArray(args...)
Base.depwarn("`TArray` is deprecated, please use `Array` instead.", :TArray)
Array(args...)
end
function TArray(T::Type, dim)
Base.depwarn("`TArray` is deprecated, please use `Array` instead.", :TArray)
Array{T}(undef, dim)
end
tzeros, tfill = zeros, fill

function TRef(x)
Base.depwarn("`TRef` is deprecated, please use `Ref` instead.", :TRef)
Ref(x)
end
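
The back-compat shims above follow a common Julia deprecation pattern: keep the old name callable, emit a warning through `Base.depwarn`, and forward to the replacement. The following is a minimal standalone sketch of that pattern; `LegacyRef` is a hypothetical name used only for illustration and is not part of Libtask.

```julia
# Hypothetical legacy constructor, used only for this sketch.
function LegacyRef(x)
    # The second argument should be the name of the deprecated function itself;
    # Julia uses it to attribute the warning to the right call site.
    Base.depwarn("`LegacyRef` is deprecated, please use `Ref` instead.", :LegacyRef)
    return Ref(x)
end

r = LegacyRef(1)  # the warning is only shown when deprecation warnings are enabled
r[] += 1

# The replacement suggested by the shims for `TArray(Int, 1)`:
v = Array{Int}(undef, 1)
v[1] = 0
```

Deprecation warnings are off by default in recent Julia versions, so callers typically only see them when running with `--depwarn=yes` or under the package test runner.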

2 comments on commit 4322ecd

@yebai yebai commented on 4322ecd Jun 22, 2022

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/62851

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.8.0 -m "<description of version>" 4322ecd7c55b19d25e91adb504266cc25ffb3dbb
git push origin v0.8.0
