Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support user-defined mapping for Inf and NaN via keyword arg #294

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

hhaensel
Copy link

This is a keyword-argument approach to implement #292 as an alternative to #293.

@essenciary
Copy link

essenciary commented Nov 27, 2024

@quinnj aware that you're super busy and don't want to add any pressure. Just let us know if there's anything we can do to help you expedite this PR (it's really small) to get it out of your work queue. Thanks!

@hhaensel
Copy link
Author

hhaensel commented Nov 27, 2024

If you review this, please consider my question from #292 (comment)

@hhaensel
Copy link
Author

@quinnj We're about releasing a new version of the GenieFramework and a solution to the Inf issue would be super welcome.
Could you let us know, whether there's a chance of having this PR implemented?

@LilithHafner
Copy link
Contributor

The API seems reasonable. Performance of writing NaN and Inf under allow_inf is a bout 2x slower with this PR than before (probably due to the changed writing implementation). IDK how important perf is in this case. Performance in other cases is unaffected. I can't comment on the correctness without knowing much about the codebase though I do know that it should probably have tests.

julia> @b rand([Inf, NaN, -Inf], 100) JSON3.write(_, allow_inf=true)
244.709 ns (3 allocs: 2.047 KiB) # release
496.429 ns (3 allocs: 2.047 KiB) # pr

julia> @b rand(100) JSON3.write(_, allow_inf=true)
2.488 μs (7 allocs: 6.594 KiB) # release
2.486 μs (7 allocs: 6.594 KiB) # pr

@hhaensel
Copy link
Author

In my tests I have a 388ns (#main) vs. 501ns (#hh-infinity2), which I would consider acceptable.
If you're also ok with that, I'd add some tests.

…_inf_mapping` and `quoted_inf_mapping` plus docstrings and tests
@hhaensel
Copy link
Author

hhaensel commented Jan 15, 2025

Fixed the performance issue for default mappings and added underscore_inf_mapping and quoted_inf_mapping plus respective docstrings and tests.

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.quoted_inf_mapping) |> println
["Infinity","-Infinity","NaN"]

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.underscore_inf_mapping) |> println
["__inf__","__neginf__","__nan__"]

EDIT: updated code example

@hhaensel
Copy link
Author

We could also provide a

hacky_inf_mapping(x) = x == Inf ? "1e1000" : x == -Inf ? "-1e1000" : "\"__nan__\""

which would correctly generate Infinity and -Infinity correctly for typical browser implementations of deserialisation.
Any version to generate NaN would be welcome as well as a different name...

@LilithHafner
Copy link
Contributor

Adding quoted_inf_mapping and underscore_inf_mapping feels like standard proliferation, other than that this seems reasonable. It's plausible the custom mapping case could be faster but I don't think that's blocking as long as it doesn't hurt perf in the allow_inf=true case.

@hhaensel
Copy link
Author

  • I tried various things to speed up custom mappings, but both NTuples and Vectors were slower by more than a factor of 2.
  • What do you propose concerning quoted_inf_mapping and underscore_mapping? Should we simply provide a demo_inf_mapping or should we simply place an example in the docs? And if so, what example would we chose?

@LilithHafner
Copy link
Contributor

Definitely in the docs; and at that point it doesn't really matter much which.

@hhaensel
Copy link
Author

All done

src/write.jl Outdated Show resolved Hide resolved
@@ -46,6 +46,11 @@ end
@test JSON3.read("Inf"; allow_inf=true) === Inf
@test JSON3.read("Infinity"; allow_inf=true) === Inf
@test JSON3.read("-Infinity"; allow_inf=true) === -Inf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that it is not possible to support reading with a custom mapping defined with a function that maps Float64 to String.

However, it is possible to support reading and writing with a @NamedTuple{positive_inf::AbstractString, negative_inf::AbstractString, nan::AbstractString} API.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that not being able to read the JSON back in when a custom mapping is used is a bummer, but also it's something that is/will be better solved in JSONBase.jl, where it's easier to override reading things.

Actually, we do have the RawJson construct if you really needed to parse something back in; it's a bit of a heavy-handed escape hatch here, but technically would work.

Copy link
Contributor

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@quinnj, I think this PR is ready for final review.

The biggest drawbacks are

  • Unable to parse back the emitted JSON
  • Sub-optimal performance when emitting many NaNs or Infs using a custom mapping (does not effect allow_inf=true usage)

But I think it is likely still worth merging despite them because of the value this feature provides.

@quinnj
Copy link
Owner

quinnj commented Jan 29, 2025

Sorry to be so slow on the review here; I've indeed been busy and am woefully behind on github notifications. Always feel free to give me a ping on slack, which is the only platform I stay up to date on these days. I'm going to start looking at this now.

(also thanks to @LilithHafner for jumping in to help review here)

src/write.jl Show resolved Hide resolved
src/write.jl Outdated Show resolved Hide resolved
src/write.jl Outdated Show resolved Hide resolved
@quinnj
Copy link
Owner

quinnj commented Jan 29, 2025

If anyone has time to look at the CI failures, I'd appreciate it. I'm a bit busy, but could probably look in the next few days.

@LilithHafner
Copy link
Contributor

CI failures look to be the same as the failures on main.

@hhaensel
Copy link
Author

Found a way to read inf_mappings, so please wait with merging

@hhaensel
Copy link
Author

I added support for inf_mapping in case that the mapping used string values and not other types, e.g. like 1e1000.

julia> inf_mapping(x) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\"";

julia> JSON3.read("""["__nan__", {"a": "__inf__"},["__neginf__"]]"""; inf_mapping)
3-element JSON3.Array{Any, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
 NaN
    {
   "a": Inf
}
    [-Inf]

If this meets your expectation, I'll add docs and tests.

@hhaensel
Copy link
Author

hhaensel commented Jan 31, 2025

One more thought:
We could also define the inf_mapping such that there is no need to add the extra quotes. If people really want to supply non-string values, they could supply JSONText, e.g. JSONText("1e1000").
In that case a typical inf_mapping would look like

inf_mapping(x) = x == Inf ? "__inf__" : x == -Inf ? "__neginf__" : "__nan__"

EDIT: Just realised that there is no JSONText in JSON3, so probably not a good idea.

src/read.jl Outdated
Comment on lines 176 to 180
float = if val == codeunits(inf_mapping(Inf))[2:end-1]
Inf
elseif val == codeunits(inf_mapping(-Inf))[2:end-1]
-Inf
elseif val == codeunits(inf_mapping(NaN))[2:end-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This usage makes me think that inf_mapping should be a Tuple or NamedTuple rather than a function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so, too. But the function version was much faster.

Copy link
Author

@hhaensel hhaensel Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried arrays, tuples and functions, at least concerning writing. I didn't check read performance.

Copy link
Author

@hhaensel hhaensel Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked the RawType approach but I couldn't find out how to change the type to Float. The current approach looks more natural to me and has less code.
It is somewhat of a restriction that I only support the case of string mappings, but I think it is very untypical that people want to cover other values than Infinity and NaN if they have a process that allows to send non-standard JSON.
EDIT: it's easy to include the quotes just by expanding the view and leaving out the [2:end-1] so my previous comment is no longer valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between function and tuple that is giving you that performance difference is methods are specialized on a function but not on a tuple's value. You could get similar performance with a tuple by lifting it to the type domain with Val. I do see the advantage in terms of runtime performance of having the serialization format of inf and nan be passed into write/read at the type level

@hhaensel
Copy link
Author

How do we proceed? Change to a tuple or named tuple? If so, could you paste a piece of code here? I'm not sure whether I understood how you did it.

@LilithHafner
Copy link
Contributor

I don't have a piece of code to paste. I'm also not that invested in the API here; Happy to hear @quinnj's thoughts.

@hhaensel
Copy link
Author

hhaensel commented Jan 31, 2025

I retried a tuple solution, which is only slightly inferior performancewise than the functional mapping. With

fn_mapping(x::Real) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\""
tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")

x = rand([Inf, NaN, -Inf], 1000)
y = JSON3.write.(x, inf_mapping=fn_mapping)
jy = join(y, "\", \"")

I obtain

Operation fn_mapping tuple_mapping allow_inf = true
JSON3.write(x, …) 3.688 μs 4.114 μs 3.375 μs
JSON3.read.(y, …) 5.908 ms 6.035 ms 5.927 ms
JSON3.read(jy, …) 177.292 ns 307.983 ns 165.138 ns

My tuple implementation is available as branch hh-infinity-tuple.

EDIT: updated table with values for allow_inf = true

@hhaensel
Copy link
Author

If performance should be the criterion, we could define a macro @inf_mapping which takes three strings and defines an anonymous function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants