"Short Alias" for type encoders #193
I don't know if this would be a good idea to bake into the current API. We have done this on other projects like RYA, which uses an unencoded integer to represent types. On another of our projects we needed to use XML schema identifiers for data types (much larger aliases). What we did in each case was to have a utility that created a wrapper for the type encoders, overriding the value returned for the alias, and then to create a TypeRegistry in which we wrapped all the encoders.

    private static <T, U> TypeEncoder<T, U> changeAlias(final TypeEncoder<T, U> encoder, final String alias) {
        return new TypeEncoder<T, U>() {
            @Override
            public String getAlias() {
                return alias;
            }

            @Override
            public Class<T> resolves() {
                return encoder.resolves();
            }

            @Override
            public U encode(T value) {
                return encoder.encode(value);
            }

            @Override
            public T decode(U value) {
                return encoder.decode(value);
            }
        };
    }

In this case you can just make the alias
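A minimal, self-contained sketch of the wrapper approach described above. The nested `TypeEncoder` interface is a simplified stand-in declared locally (an assumption: the real mango interface may have more methods than the four shown in the comment), and `IntegerEncoder` and the alias `"i"` are hypothetical names used only for illustration.

```java
class ChangeAliasDemo {
    // Simplified stand-in for the library's TypeEncoder interface (an assumption,
    // reduced to the four methods used in the comment above).
    interface TypeEncoder<T, U> {
        String getAlias();
        Class<T> resolves();
        U encode(T value);
        T decode(U value);
    }

    // Hypothetical encoder with a readable default alias.
    static class IntegerEncoder implements TypeEncoder<Integer, String> {
        @Override public String getAlias() { return "integer"; }
        @Override public Class<Integer> resolves() { return Integer.class; }
        @Override public String encode(Integer value) { return Integer.toString(value); }
        @Override public Integer decode(String value) { return Integer.valueOf(value); }
    }

    // Delegates everything to the wrapped encoder except the alias.
    static <T, U> TypeEncoder<T, U> changeAlias(final TypeEncoder<T, U> encoder, final String alias) {
        return new TypeEncoder<T, U>() {
            @Override public String getAlias() { return alias; }
            @Override public Class<T> resolves() { return encoder.resolves(); }
            @Override public U encode(T value) { return encoder.encode(value); }
            @Override public T decode(U value) { return encoder.decode(value); }
        };
    }

    public static void main(String[] args) {
        TypeEncoder<Integer, String> shortInt = changeAlias(new IntegerEncoder(), "i");
        // The alias changes; encoding and decoding still go through IntegerEncoder.
        System.out.println(shortInt.getAlias() + " " + shortInt.encode(42));
    }
}
```

The wrapped encoder keeps its encoding untouched, so data written with the original encoder stays readable; only the persisted type name differs.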
That's quite a lot of work just to provide a minimized type name... I've
I think we should add wrapped encoders for this. The reason we generally need to use these is so that we can persist the type name somewhere, namely in Accumulo in the case of RYA and Accumulo Recipes.
The reason I would suggest doing this is that there are a lot of reasons to want a different alias: accuracy (XML schema types), conforming to naming standards (Elastic Search schema defs), or better compression (Rya, Recipes Shard Tables). Obviously, we can't meet them all. The current mechanism is there because it is accurate, not very large, and readable for when that is important. Like you, I have found several instances where the default is not the best solution for a specific problem, but the API provides the means to customize its behavior without complicating it.

It really isn't that much work. The reason I use that utility method is that my type registry definition looks like this for one of our impls:

    public static final TypeRegistry<String> MY_TYPES = new TypeRegistry<String>(
        changeAlias(booleanEncoder(), BOOLEAN_ALIAS),
        changeAlias(byteEncoder(), BYTE_ALIAS),
        changeAlias(doubleEncoder(), DOUBLE_ALIAS),
        changeAlias(floatEncoder(), FLOAT_ALIAS),
        changeAlias(integerEncoder(), INTEGER_ALIAS),
        changeAlias(longEncoder(), LONG_ALIAS),
        changeAlias(stringEncoder(), STRING_ALIAS),
        changeAlias(uriEncoder(), URI_ALIAS)
    );

Mostly the question becomes how many variations of aliases we support. Currently we indirectly support 2: the current alias, and the actual Class, which also has to be unique. Theoretically, you could just use the class as the alias, but the getAlias method is there to allow anyone to customize that behavior to their specific needs.
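The two keys mentioned above (alias and Class, both unique) can be sketched as follows. This is an illustration only: `SimpleRegistry` is a hypothetical stand-in, not the real `TypeRegistry`, the nested `TypeEncoder` interface is a simplified local copy, and the `encoder` helper and the aliases `"i"` and `"l"` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class AliasRegistryDemo {
    // Simplified stand-in for the library's TypeEncoder interface (an assumption).
    interface TypeEncoder<T, U> {
        String getAlias();
        Class<T> resolves();
        U encode(T value);
        T decode(U value);
    }

    // Hypothetical registry with only the two lookups discussed in the thread.
    static class SimpleRegistry<U> {
        private final Map<String, TypeEncoder<?, U>> byAlias = new HashMap<>();
        private final Map<Class<?>, TypeEncoder<?, U>> byClass = new HashMap<>();

        @SafeVarargs
        SimpleRegistry(TypeEncoder<?, U>... encoders) {
            for (TypeEncoder<?, U> e : encoders) {
                // Both the alias and the resolved class must be unique.
                if (byAlias.putIfAbsent(e.getAlias(), e) != null) {
                    throw new IllegalArgumentException("Duplicate alias: " + e.getAlias());
                }
                if (byClass.putIfAbsent(e.resolves(), e) != null) {
                    throw new IllegalArgumentException("Duplicate class: " + e.resolves());
                }
            }
        }

        // Looks up the alias by the runtime class of the value.
        String getAlias(Object obj) {
            TypeEncoder<?, U> e = byClass.get(obj.getClass());
            if (e == null) throw new IllegalArgumentException("No encoder for " + obj.getClass());
            return e.getAlias();
        }

        // Picks the encoder by alias and decodes the stored value.
        Object decode(String alias, U value) {
            TypeEncoder<?, U> e = byAlias.get(alias);
            if (e == null) throw new IllegalArgumentException("Unknown alias: " + alias);
            return e.decode(value);
        }
    }

    // Hypothetical helper that builds a String-backed encoder with a chosen alias.
    static <T> TypeEncoder<T, String> encoder(String alias, Class<T> type, Function<String, T> parser) {
        return new TypeEncoder<T, String>() {
            @Override public String getAlias() { return alias; }
            @Override public Class<T> resolves() { return type; }
            @Override public String encode(T value) { return String.valueOf(value); }
            @Override public T decode(String value) { return parser.apply(value); }
        };
    }

    public static void main(String[] args) {
        SimpleRegistry<String> registry = new SimpleRegistry<String>(
                encoder("i", Integer.class, s -> Integer.valueOf(s)),
                encoder("l", Long.class, s -> Long.valueOf(s)));
        System.out.println(registry.getAlias(7L));
        System.out.println(registry.decode("i", "42"));
    }
}
```

Because the alias is just another key, swapping in a different alias scheme only changes what is registered, not the lookup code.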
The main reason I don't want this to be a separate change in a separate I would have to disagree with you that the current types are "accurate, not You've mentioned 2 different cases so far that would warrant wanting to Let's support XML aliases and "minimal" aliases.
The reason I don't want to support every single one of these use cases is that they are limited, and we are working in an infinite space of what people would want to make their aliases. Some are standards based, some are for space reasons, some are because they match a UI. I am contending that there is already a mechanism to support this for a given code base. I understand you want to have the latest that mango supports, but it makes things harder to maintain, especially for other users creating their own encoders.

One of the more annoying things with aliases is that each has to be unique in a type registry. That is ok, and we guarantee that all the aliases and Classes represented by the encoders are unique. Now if I want to add my own encoder, we would require an implementer to specify a class, alias, minimal alias (maybe as an integer), and XML alias, all of which would need to be unique. They would also need to verify that these are unique across all current and future mango updates to the default simple and lexicoder registries, or you don't get the benefit of just getting the latest and greatest.

I do understand wanting to update to the latest and greatest mango type encoders, but it would complicate both the TypeEncoder interface and the TypeRegistry API to support all of these use cases. For instance, currently there is only one decode method based on alias and one method that looks up the alias for a class:

    public String getAlias(Object obj) {...}
    public Object decode(String alias, U value) {...}

Supporting other alias mechanisms means we would need to expose these methods:

    public String getAlias(Object obj) {...}
    public String getMinimalAlias(Object obj) {...}
    public String getXMLSchema(Object obj) {...}
    public Object decodeFromAlias(String alias, U value) {...}
    public Object decodeFromMinimalAlias(String alias, U value) {...}
    public Object decodeFromXMLSchema(String alias, U value) {...}

Also, every single implementer of a custom type encoder would have to define the corresponding methods on the type encoder interface. I would assume that for XMLSchema most people will just make something up anyway, since it makes no sense for their types (i.e. none is defined for ipv6). This just makes the API more complicated to use. I doubt most people would even know why they would need a minimal alias rather than just the simple alias.

All I am getting at is that there is already a relatively easy means to change the aliases. Yes, that means you might not get the automatic goodness of using the simple type registries out of the box, but if you are worried about space, you should look at not only the alias size but also the encoding size (see #137), which would require writing a whole new set of encoder implementations anyway.
BTW, per my last sentence in the previous comment: I am for potentially writing a more compact set of encoders which would have more compact aliases and encodings, but not for modifying the existing API.
The alias for the type encoders takes up quite a large footprint when persisted to disk. Each time I need to encode an integer, I basically need to write out "integer" which is 7 bytes. I propose we create a "short alias" for each encoder that returns a single byte.
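A rough sketch of the arithmetic behind this proposal, assuming the alias is written as UTF-8 text once per encoded value (an assumption about the storage layout, which the issue does not spell out). The one-character alias `"i"` and the value count are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class AliasFootprintDemo {
    // Bytes spent on the alias alone when it is written once per encoded value.
    static long aliasBytes(String alias, long valueCount) {
        return (long) alias.getBytes(StandardCharsets.UTF_8).length * valueCount;
    }

    public static void main(String[] args) {
        long values = 1_000_000L; // hypothetical number of persisted integers
        long full = aliasBytes("integer", values);   // 7 bytes per value
        long compact = aliasBytes("i", values);      // 1 byte per value
        System.out.println("alias bytes saved: " + (full - compact));
    }
}
```

This counts only the alias; the size of the encoded value itself is a separate concern.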