-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data storage classes #494
Comments
This reminds me of some of the discussions in ros2 about serialization and native data types. Is there any understanding we can draw from that experience? |
Let's start by removing the usage of SDF types inside components in favor of Ignition Messages. This should help with the scripting effort too, so we can postpone adding scripting interfaces to SDFormat classes. |
I don't think that's what we want to do. Generally, I think we should avoid making protobufs part of any of our public APIs if it can be avoided. I would like to see a common |
That's because of gazebosim/gz-msgs#113 (also pointed out above), right? Yeah, that's a good point.
My main concern with this is that adding another type feels counterproductive when the goal is to reduce the number of types we have. I'm more inclined to figuring out a way in which we can make protobuf work for us instead putting a lot of effort into designing yet-another-collection-of-types. |
I think based on the recommended guidance from the Protobuf C++ Tutorial, we're going to end up with another collection of objects one way or another.
This makes me think that the ign-msgs types themselves should be some wrapper around the protobuf types, rather than the protobuf types themselves. |
Another issue is that |
This shouldn't be an issue from Fortress anymore, because we're not trying to allocate components contiguously in memory anymore since #856. |
It would be nice to revisit the topic of storing components in contiguous memory, that was one of the big motivations for moving to ECS. If using messages would block us from achieving that goal, then I'd suggest against their use in components. Based on my reading of this thread, the reason to use protobuf in components is serialization performance. The reason to use SDF is to avoid ABI issues, follow guidance from the protobuf developers, and leave open the door for improving our cache performance. It seems that SDF is the safer option. Open questions I have are:
|
If it's being serialized to XML and back, I would say that the performance is not great.
As we still intend to support both use cases, it's hard to predict which portion of users will be impacted. People who are doing analysis or simulation on a single computer probably won't incur a ton of serialization, but any interaction with a remote or cloud-based simulation instance would have issues. |
Unfortunately, most of the SDF DOM objects also make use of the PIMPL pattern which likely will block any sort of cache performance improvement. The best strategy would be to come up with another very basic (read: c-struct) representation of the data that could be either serialized to SDF DOM objects or protobuf messages. This would allow the ECS to operate on the most efficient in-memory version of the data. |
I know I brought this up, but as of #856, we don't store components in contiguous storage, so this may not be relevant anymore. |
I know it sounds backwards not to use contiguous storage anymore, but as detailed on #856, that had been hurting performance, rather than helping it.
Most SDF classes were originally designed to be "read-only" and part of a tree. As documented in most of them: Typical usage of the SDF DOM is through the Root object. Some of this may be changing, and there's an ongoing effort to document this here. But ultimately, it doesn't look like SDF classes were designed to be used as independent components that store plain data efficiently.
I believe the guidance is that we shouldn't be packaging the generated C++ code. It may be possible to solve this while still using them in components, see gazebosim/gz-msgs#113. |
I think you could argue that we were never really using contiguous storage correctly because many of our components were using the PIMPL pattern as well. The speedups in contiguous storage would come from using POD types as components.
I agree with this as well. It would be interesting to experiment with POD components at some point, but I don't think it's a high priority at the moment.
This is my interpretation as well. |
Some thoughts on using simple POD / structs:
|
➡️ ProposalI've allowed myself to dream a little and think about what would be the ultimate solution to this situation if we had no baggage or backwards compatibility to maintain. The goals are:
It's all SDFOur API across the stack has always been grounded on the SDF spec. We have model, link, collision, visual, etc, entities and components because these are SDF concepts. Our messages mimic SDF types, our sensors, physics and rendering support parameters exposed through SDF. Change only one placeThe canonical SDF description is the XML spec. So here's an idea:
|
If we use gz-msgs as components, wouldn't that mean the generated C++ code is now part of our public API? If the goal is to move away from shipping generated protobuf code, I don't think we would be going in the right direction if we use them as components. Consider a user application that generates it's own |
I agree, it seems like a step in the wrong direction. Fundamentally, protobuf is a serialization format, it's generally going to be sub-optimal to use for anything other than serialization. I know there is hesitance to add another set of structures, but I think about it the way that ROS 2 works. You have a set of structures that the user interacts with (the ROS idl) structs, and you have a set of structures that are used for serialization/deserialization (in some RMW implementations). These two things are generated from the same set of rosidl files, as well as all of the code to migrate between the two. I think it would make the most sense to have a POD representation and a protobuf generation and conversion code, all generated from the SDF spec. The POD objects could be stored as components, but when you went to publish with This is also probably closer to |
Maybe this can be worked around by enforcing a
Yeah good point, this may be a deal breaker.
The problem with POD types is that they can't be extended without breaking ABI. On the flip side, they're better to store in contiguous memory in the ECM. I lean in favor of prioritizing extensibility without breaking ABI rather than in-memory optimization, because the former is something we're actively dealing with all the time, while the latter has a potential theoretical benefit.
I like the idea of having another type generated from the SDF spec though, I'd just like it to be PIMPL'ed. The result may look very similar to our current SDF DOM classes. |
If we design this so that components can only be created by factory (i.e., private constructors), maybe we can use a custom memory allocator to create the |
I think this is hard without vendoring |
Yeah that's a good point, but I think the supported combination can be on a platform-basis, since users won't be sharing binaries across those platforms. We can say that we only support the upstream protobuf version on all platforms. That would be whatever comes with Ubuntu. On macOS and Windows, whenever the version changes, we recompile all our bottles / conda binaries to use the latest version, like we already do. What we wouldn't support are custom protobuf versions coming from a PPA, built from source, or anything like that. Which we already don't anyway. |
Breaking ABI is only a problem if the the POD structs are part of the public API. POD types could fit as part of a collection of interfaces to the same data. SDFormat could be the interface to read and write data to and from files, |
Another thing to keep in mind for this discussion is that we have started supporting custom data in SDFormat files through data in custom xml namespaces. A map of |
One of the challenges with Ignition's modular architecture is to create data structures and convert between them across all libraries.
Types everywhere
For example, Ignition Math is used by pretty much all libraries, from SDFormat to Rendering and Physics, to store math data types like vectors. But that's not the only place where those types exist. For most Ignition Math types, we also have equivalent Ignition Msgs types, so that the data can be transmitted over the wire.
Sure, that's not a terrible situation, and each class has a different use case. But let's look at something more complex, like a 3D mesh.
We start from sdf::Mesh which stores some basic metadata like file path and scale. That's pretty much the same as what ignition::msgs::MeshGeom holds. But the actual 3D data, like vertices and normals, is held by ignition::common::Mesh, which is used by both Physics and Rendering. But by itself, it doesn't include all the data that those libraries need. Ignition Rendering has a wrapper that adds extra information: ignition::rendering::MeshDescriptor. And Ignition Physics needs more information to be passed alongside
common::Mesh
to its ignition::physics::mesh::AttachMeshFeature.While implementing heightmaps, I created
sdf::Heightmap
(gazebosim/sdformat#388) to hold metadata, which is similar to ignition::msgs::HeightmapGeom. Made use of common::HeightmapData to store the actual height data, but that doesn't hold any material information. So for Ignition Rendering, I created theignition::rendering::HeightmapDescriptor
class (gazebosim/gz-rendering#180), of which 80% is a direct copy fromsdf::Heightmap
, with an extra field forcommon::HeightmapData
. The physics implementation will probably need to combine the SDF and Common types too.With so many equivalent types, we also need lots of conversion functions between them:
Implications
All these different types affect:
Moreover, there's just too much boilerplate involved. The classes on SDF and Rendering for example use the PIMPL pattern to be extensible, so they need setters and getters for each property, besides the rule of 5 (until we have something like gazebosim/gz-cmake#123). This is all created manually, which causes inconsistencies. All this just to hold some data.
Moving forward
I'm opening this issue to gather some ideas. I think that the situation we have with the math types is pretty good. There are 2 classes to store data, one for operations (
ignition::math
) and one for transmission (ignition::msgs
). How could we achieve an equivalent situation for the other types?The text was updated successfully, but these errors were encountered: