"fast" vs. "safe" #21
Comments
@sirthias can provide more detailed feedback, but essentially, when you are constructing JSON (i.e. you are parsing JSON), Array provides much better performance than any of the immutable data structures. @sirthias provided benchmarks for this in the Gitter channel. The tl;dr is that there was no data structure I could find that would make most people even remotely happy. The other solution here, of course, is to not use a concrete API
I believe the intention is that something like a JSON parser would use the fast variant, and then people would have the choice of either using the fast, unsafe variant directly or converting it to safe. There seem to be two use cases, which is what the fast/safe split is trying to represent: one is parsing JSON and representing it as an AST, and the other is querying/passing JSON around
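To make the split described above concrete, here is a minimal sketch of a parser-facing "fast" AST that can be converted to a "safe" one. All names (`FastSafeSketch`, `fast`, `safe`, the node classes, and the shape of `toSafe`) are hypothetical and chosen for illustration only; this is not the API proposed in this thread.

```scala
// Hypothetical sketch: not the actual AST discussed in this issue.
object FastSafeSketch {

  object fast {
    sealed trait JValue { def toSafe: safe.JValue }

    // Numbers stay as unvalidated strings so the parser does no extra work.
    final case class JNumber(value: String) extends JValue {
      // BigDecimal(String) throws NumberFormatException on input such as "Infinity".
      def toSafe: safe.JValue = safe.JNumber(BigDecimal(value))
    }

    final case class JString(value: String) extends JValue {
      def toSafe: safe.JValue = safe.JString(value)
    }

    // A mutable Array is cheap for a parser to fill, but callers can modify it.
    final case class JArray(value: Array[JValue]) extends JValue {
      def toSafe: safe.JValue =
        safe.JArray(value.iterator.map(_.toSafe).toVector)
    }
  }

  object safe {
    sealed trait JValue
    final case class JNumber(value: BigDecimal) extends JValue
    final case class JString(value: String) extends JValue
    // Immutable Vector: structural equality and no mutation after construction.
    final case class JArray(value: Vector[JValue]) extends JValue
  }
}
```

Under these assumptions, the conversion is where validation and copying costs are paid, which is the trade-off the two variants are meant to expose.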
This is why From a historical perspective, trying to make a single API didn't make anyone happy, although I can definitely see merit in picking one (probably |
Maybe we should enumerate the additional safety the
Did I forget something? IMO, we can hide all of this unsafety if we make the AST abstract and just don't mention the optimizations. That way the optimized representation of the data can be private to the implementation. In the normal user API and in the interaction with other libraries we don't lose any of the safety, while intra-library usage can still use the optimal representation to avoid costs where possible. (To make things faster between libraries as well, if that's a goal at all, we could still provide marker interfaces that would surface the underlying fast representation if we can agree on one.) |
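A rough sketch of what such an abstract AST could look like, assuming a hypothetical `AbstractAstSketch` object whose `JArray` exposes only a read-only `IndexedSeq` while keeping an `Array` private. The names `unsafeWrap` and `parseDemo` are made up for this example; the latter stands in for parser code living inside the library.

```scala
// Hypothetical sketch: the Array never appears in the public signatures.
object AbstractAstSketch {
  sealed trait JValue
  final case class JString(value: String) extends JValue

  // Public view: indexed, read-only access.
  sealed abstract class JArray extends JValue {
    def values: IndexedSeq[JValue]
  }

  object JArray {
    // Safe construction for users: copies the elements into a private array.
    def apply(values: JValue*): JArray = new ArrayBacked(values.toArray)

    // Library-internal: wraps without copying. The caller must not
    // mutate `underlying` after handing it over.
    private[AbstractAstSketch] def unsafeWrap(underlying: Array[JValue]): JArray =
      new ArrayBacked(underlying)

    private final class ArrayBacked(underlying: Array[JValue]) extends JArray {
      // A thin read-only view over the array; no elements are copied.
      val values: IndexedSeq[JValue] = new IndexedSeq[JValue] {
        def length: Int = underlying.length
        def apply(i: Int): JValue = underlying(i)
      }
    }
  }

  // Stand-in for an in-library parser that builds an array and wraps it.
  def parseDemo(): JArray = JArray.unsafeWrap(Array[JValue](JString("a")))
}
```

Code outside `AbstractAstSketch` can only go through `JArray.apply` and `values`, so the array-backed representation stays an implementation detail.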
One solution could be to use a super type of |
It's also meant to be "safe" with regard to performance, that is, having lookups for basic use cases that are no worse than eC (effectively constant) lookup time; this is given by using
That is true, would have to redo the design for this
The current fast one I think is one that @non and @sirthias agreed on, they cared about the performance aspects of the AST
The problem with leaving it to the parser is that users don't have the guarantee that they are given a safe |
To make it more precise: the following AST can be used as
|
I think we need to be very careful about our choices about being strictly safe versus being sensibly safe enough. We already accept that any reference may be |
@OlivierBlanvillain I would go with |
I'll try to come up with a concrete suggestion tomorrow. |
I think null is an exception that we have to deal with because of Java. If we had the choice, I think only a very small minority of people would argue for null in its current form, so I am not sure it's a completely valid comparison. From the people who are arguing for a safe version, the impression I got is that they wanted a JSON version of String: something that is immutable and sensibly correct. From my rudimentary knowledge of parsers, a parser that is purely focused on speed creating something slightly wrong isn't a once-in-a-blue-moon kind of thing
There could be silent overflows, or things like Infinity appearing as a String in a JNumber. There are also the issues of duplicate keys, which are being discussed at length elsewhere. At least personally, I think the current level of safe is sensible. It's still not completely safe: BigDecimal will still swallow really large numbers, and Scala collections have a memory limit (so you can't store a really massive JSON). @rossabaker, one of the current owners of json4s/http4s, was arguing heavily for Vector, and so was @bryce-anderson. IndexedSeq defaults to Vector iirc anyways? |
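The kinds of silent problems mentioned above can be reproduced with the standard library alone. This is only an illustrative sketch; the object and value names (`ParserPitfalls`, `overflowed`, and so on) are made up for this example and are not taken from any parser.

```scala
object ParserPitfalls {
  // Parsing into a Double overflows silently to Infinity instead of failing.
  val overflowed: Double = "1e400".toDouble

  // Integer arithmetic wraps around silently as well.
  val wrapped: Int = Int.MaxValue + 1

  // BigDecimal accepts the very large value without complaint...
  val large: BigDecimal = BigDecimal("1e400")

  // ...but rejects non-numeric text such as "Infinity".
  val infinityRejected: Boolean = scala.util.Try(BigDecimal("Infinity")).isFailure

  // An immutable Map silently keeps only the last of two duplicate keys.
  val deduped: Map[String, Int] = Map("a" -> 1, "a" -> 2)
}
```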
I don't recall one way or the other as to my past preference for |
Just letting you guys know, the default implementation of an https://github.com/scala/scala/blob/v2.11.7/src/library/scala/collection/IndexedSeq.scala#L25-L29
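A quick way to check what `IndexedSeq(...)` builds on a given Scala version is sketched below. To my knowledge the factory produces a `Vector` on Scala 2.11 through 2.13, but treat that as something to verify locally; the object name `IndexedSeqDefault` is made up for this example.

```scala
object IndexedSeqDefault {
  // Built through the IndexedSeq companion, with no concrete type named.
  val xs: IndexedSeq[Int] = IndexedSeq(1, 2, 3)

  // Inspect the runtime class that the default builder produced.
  val isVector: Boolean = xs.isInstanceOf[Vector[_]]
}
```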
I think it was mainly @rossabaker . Apologies if I was putting words into your mouth, wasn't intentional! |
@mdedetrich, no worries, I very well could have said it, and I will say I do have a slight preference for Ultimately the decision here comes down to some degree of personal preference/experience and specific use cases. On the topic of the thread, I also find it difficult to swallow two ASTs, and if a choice needs to be made, obviously the safe version is the most useful. I suspect someone who needs a "fast" AST might have their own specific needs anyway, and even small deviations from their requirements will cause them to roll their own. _Note: this is just speculation on my part._ Btw @mdedetrich, I appreciate the effort you're putting into this. I noticed the SLIP. It's an important discussion to have whether the SLIP is accepted or not: it will help set a precedent for how future work in the standard library will be approached. |
I'm with you on this. Regarding the choice of a type for Note that while your fast design might be the fastest possible (without getting into manual memory management like they do in Netty/Spark), the safe one is definitely not the safest; the following fails at runtime:
While this typo can be detected at compile time with shapeless:
Scala has its limitations, but the language and its standard library were designed to be a compromise between speed and safety. It could be safer, it could be faster, but I don't think a JSON AST should have to deal with these details by exposing two APIs. There is a way to do it in a reasonably safe way with basically no performance lost, it's a |
Does the distinction really pull its weight?
Arguments against:
toSafe
?