MessageSemantics
The ownership and default semantics of protobuf messages have some subtle corner cases. The two key considerations to reconcile are:
- we want to be able to read deeply nested fields (e.g. foo.bar.baz) without having to first test for message presence at every level (e.g. if (foo.has_bar() && foo.bar().has_baz())).
- when serializing a message, we don't want to serialize empty submessages just because we read a default value out of that submessage.
The semantics for scalar fields (numbers, bools, strings) are simple: if you just read a field's default value but never set it, the value is considered unset and will not be serialized.
// C++ example:
MyMessage msg;
int32_t x = msg.myfield(); // Returns default.
msg.has_myfield(); // Returns false; will not be serialized.
msg.set_myfield(5);
msg.has_myfield(); // Returns true; will be serialized.
msg.clear_myfield();
msg.has_myfield(); // Returns false; will not be serialized.
The semantics for a dynamic language like Python are almost identical:
# Python example:
msg = MyMessage()
x = msg.myfield
msg.HasField("myfield") # Returns False; will not be serialized.
msg.myfield = 5
msg.HasField("myfield") # Returns True; will be serialized.
msg.ClearField("myfield")
msg.HasField("myfield") # Returns False; will not be serialized.
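The scalar rule above can be modeled in a few lines of plain Python. This is only a sketch: the class `ScalarMessage`, its `DEFAULTS` table, and the method names are hypothetical stand-ins, not the protobuf or upb API. The point it illustrates is that reads fall back to a default without recording anything, so only explicit writes affect presence and serialization.

```python
class ScalarMessage:
    """Toy model of scalar field presence; not the real protobuf API."""

    # Hypothetical schema: field name -> default value.
    DEFAULTS = {"myfield": 0}

    def __init__(self):
        # Holds only the fields that were explicitly set.
        self._values = {}

    def get(self, name):
        # Reading falls back to the default and never records presence.
        return self._values.get(name, self.DEFAULTS[name])

    def set(self, name, value):
        if name not in self.DEFAULTS:
            raise KeyError(name)
        self._values[name] = value

    def has(self, name):
        return name in self._values

    def clear(self, name):
        self._values.pop(name, None)

    def fields_to_serialize(self):
        # Only explicitly set fields would be written to the wire.
        return dict(self._values)
```

Keeping set values in a separate dictionary means presence is simply "is there an entry", which matches the has/set/clear sequence shown in the examples above.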
Submessage fields are more complicated because we want to be able to inspect deep messages without causing any implicitly-created submessages to be serialized. There is also the issue of submessage ownership; languages without garbage collection like C++ often create an ownership model where submessages are owned by the parent message:
// C++ example:
MyMessage msg;
msg.bar().baz(); // Returns default value; msg.bar() is const.
msg.has_bar(); // Returns false; msg.bar will not be serialized.
msg.mutable_bar()->set_baz(5);
msg.has_bar(); // Returns true; msg.bar will be serialized.
// C++ has direct ownership of submessages, so you can't assign
// submessage instances.
msg.set_bar(MyBarMessage()); // XXX does not exist
This ownership model doesn't fit dynamic languages well. The mutable_ accessor pattern in C++ isn't a good match for dynamic language conventions, where "const" containers are generally not used.
x = foo.bar.baz
foo.HasField("bar") # Returns false; we only inspected it, so it won't be serialized.
# Python users expect to be able to say this:
foo.bar.baz = 5
foo.HasField("bar") # Returns true because we set a field of the submessage.
# It would be non-idiomatic and annoying if the design was like C++.
# This is *not* how the Python bindings actually work.
foo.bar.baz = 5 # Would raise an error (hypothetically), since foo.bar is immutable.
foo.mutable_bar.baz = 5
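One way to get the idiomatic behavior is for a read to hand back a detached child that only attaches itself to its parent on the first write. The sketch below is a toy model under that assumption; the `Message` base class, its `SCALARS` and `SUBMESSAGES` tables, and `_mark_present` are hypothetical names, and this is not a description of how any real binding is implemented.

```python
class Message:
    """Toy message with lazily attached submessages; not a real protobuf API."""

    SCALARS = {}      # Hypothetical schema: field name -> default value.
    SUBMESSAGES = {}  # Hypothetical schema: field name -> message class.

    def __init__(self, parent=None, field_name=None):
        # Use object.__setattr__ so bookkeeping bypasses our field checks.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_field_name", field_name)
        object.__setattr__(self, "_set", {})       # Fields that are present.
        object.__setattr__(self, "_detached", {})  # Children created by reads.

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for schema fields.
        cls = type(self)
        if name in cls.SCALARS:
            return self._set.get(name, cls.SCALARS[name])
        if name in cls.SUBMESSAGES:
            if name in self._set:
                return self._set[name]
            # A read creates (and caches) a child without marking it present.
            if name not in self._detached:
                self._detached[name] = cls.SUBMESSAGES[name](self, name)
            return self._detached[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in type(self).SCALARS:
            raise AttributeError(f"cannot assign to field {name!r}")
        self._set[name] = value
        self._mark_present()

    def _mark_present(self):
        # A write attaches this message to its parent, and so on upward.
        parent = self._parent
        if parent is not None and self._field_name not in parent._set:
            parent._set[self._field_name] = self
            parent._detached.pop(self._field_name, None)
            parent._mark_present()

    def HasField(self, name):
        return name in self._set


class Bar(Message):
    SCALARS = {"baz": 0}


class Foo(Message):
    SUBMESSAGES = {"bar": Bar}


foo = Foo()
x = foo.bar.baz               # Read only; foo.bar stays detached.
was_present = foo.HasField("bar")
foo.bar.baz = 5               # The write attaches foo.bar to foo.
is_present = foo.HasField("bar")
```

Here `was_present` is False and `is_present` is True. Note that this sketch rejects direct submessage assignment (`foo.bar = Bar()`), so it sidesteps the reparenting question entirely.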
One other thing that dynamic language users expect is that they can "reparent" messages at will.
bar = Bar()
msg = MyMessage()
msg.bar = bar # Should we allow this?
Should we allow this kind of reparenting or not? There are pros and cons. The pros are convenience and efficiency, as well as composability:
# If I'm composing a message, reparenting lets me compose the sub-parts in a more
# functional style.
msg.bar = MakeBar()
# If I can't reparent, the above looks more like:
FillInBar(msg.bar)
# If I've obtained a Bar from some other data source, I can make it part of
# another message without having to copy.
msg.bar = ParseBar()
On the other hand, allowing reparenting opens some cans of worms:
# If I can reparent, I can create cycles, which must be detected as an error
# at serialization time (which would have a potentially significant cost).
# It could be useful to create such cycles in some cases, but since they
# aren't serializable it might be better to disallow them.
msg.msg = msg
x = msg.foo.bar # Read only, won't serialize msg.foo.
foo = msg.foo
foo.bar = 5 # Writes through foo, so msg.foo will now be serialized. Is this unexpected?
x = msg2.foo.bar # Read only, won't serialize msg2.foo.
msg3.foo = msg2.foo # Should msg3.foo be serialized, since it was explicitly assigned?
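To make the cost of cycle detection concrete, here is a minimal sketch of a check that could run before serialization. `Msg` and `check_acyclic` are hypothetical names for illustration, and the `children` dictionary stands in for whatever submessage storage a real implementation would use.

```python
class Msg:
    """Toy message that allows reparenting; hypothetical, not a real API."""

    def __init__(self):
        self.children = {}  # Field name -> Msg.


def check_acyclic(root):
    """Raise ValueError if any message can reach itself through its children."""
    on_path = set()  # ids of the messages on the current depth-first path.

    def visit(msg):
        if id(msg) in on_path:
            raise ValueError("message graph contains a cycle")
        on_path.add(id(msg))
        for child in msg.children.values():
            visit(child)
        on_path.remove(id(msg))

    visit(root)
```

The check has to walk the whole message tree on every serialization, which is the potentially significant cost mentioned above. A submessage shared by two parents is not a cycle, so this sketch accepts it and simply visits it once per parent.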
Another issue: if an implicitly-created submessage has a field set that is later cleared, should the submessage still be serialized?
msg = MyMessage()
msg.foo.bar = 5 # The write will cause foo to be serialized.
msg.foo.ClearField("bar") # Now should foo be serialized?
msg.foo.Clear() # How about now?
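The two obvious answers to this question can be stated as two presence rules. The sketch below is illustrative only: `Sub`, `present_sticky`, and `present_derived` are hypothetical names, and neither rule is claimed here to be what any particular implementation does.

```python
class Sub:
    """Toy submessage used to compare two candidate presence rules."""

    def __init__(self):
        self.fields = {}
        self.touched = False  # Becomes True on the first write and stays True.

    def set(self, name, value):
        self.fields[name] = value
        self.touched = True

    def clear_field(self, name):
        self.fields.pop(name, None)


def present_sticky(sub):
    # Rule A: once written to, the submessage stays present.
    return sub.touched


def present_derived(sub):
    # Rule B: the submessage is present only while it has a set field.
    return bool(sub.fields)
```

Under rule A a submessage that was written and then cleared is still serialized as an empty message; under rule B it disappears again, at the price of presence depending on the contents of every nested field.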