support user-defined mapping for Inf and NaN via keyword arg #294

hhaensel · 2024-11-17T21:55:10Z

This is a keyword-argument approach to implement #292 as an alternative to #293.

essenciary · 2024-11-27T08:21:14Z

@quinnj aware that you're super busy and don't want to add any pressure. Just let us know if there's anything we can do to help you expedite this PR (it's really small) to get it out of your work queue. Thanks!

hhaensel · 2024-11-27T17:51:47Z

If you review this, please consider my question from #292 (comment)

hhaensel · 2024-12-19T08:49:41Z

@quinnj We're about releasing a new version of the GenieFramework and a solution to the Inf issue would be super welcome.
Could you let us know, whether there's a chance of having this PR implemented?

LilithHafner · 2025-01-14T17:59:47Z

The API seems reasonable. Performance of writing NaN and Inf under allow_inf is a bout 2x slower with this PR than before (probably due to the changed writing implementation). IDK how important perf is in this case. Performance in other cases is unaffected. I can't comment on the correctness without knowing much about the codebase though I do know that it should probably have tests.

julia> @b rand([Inf, NaN, -Inf], 100) JSON3.write(_, allow_inf=true)
244.709 ns (3 allocs: 2.047 KiB) # release
496.429 ns (3 allocs: 2.047 KiB) # pr

julia> @b rand(100) JSON3.write(_, allow_inf=true)
2.488 μs (7 allocs: 6.594 KiB) # release
2.486 μs (7 allocs: 6.594 KiB) # pr

hhaensel · 2025-01-15T00:17:07Z

In my tests I have a 388ns (#main) vs. 501ns (#hh-infinity2), which I would consider acceptable.
If you're also ok with that, I'd add some tests.

…_inf_mapping` and `quoted_inf_mapping` plus docstrings and tests

hhaensel · 2025-01-15T09:47:27Z

Fixed the performance issue for default mappings and added underscore_inf_mapping and quoted_inf_mapping plus respective docstrings and tests.

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.quoted_inf_mapping) |> println
["Infinity","-Infinity","NaN"]

julia> JSON3.write([Inf, -Inf, NaN], inf_mapping = JSON3.underscore_inf_mapping) |> println
["__inf__","__neginf__","__nan__"]

EDIT: updated code example

hhaensel · 2025-01-15T10:19:26Z

We could also provide a

hacky_inf_mapping(x) = x == Inf ? "1e1000" : x == -Inf ? "-1e1000" : "\"__nan__\""

which would correctly generate Infinity and -Infinity correctly for typical browser implementations of deserialisation.
Any version to generate NaN would be welcome as well as a different name...

LilithHafner · 2025-01-15T20:00:01Z

Adding quoted_inf_mapping and underscore_inf_mapping feels like standard proliferation, other than that this seems reasonable. It's plausible the custom mapping case could be faster but I don't think that's blocking as long as it doesn't hurt perf in the allow_inf=true case.

hhaensel · 2025-01-15T20:18:56Z

I tried various things to speed up custom mappings, but both NTuples and Vectors were slower by more than a factor of 2.
What do you propose concerning quoted_inf_mapping and underscore_mapping? Should we simply provide a demo_inf_mapping or should we simply place an example in the docs? And if so, what example would we chose?

LilithHafner · 2025-01-15T20:48:35Z

Definitely in the docs; and at that point it doesn't really matter much which.

hhaensel · 2025-01-15T23:44:34Z

All done

src/write.jl

LilithHafner · 2025-01-16T13:04:58Z

test/json.jl

@@ -46,6 +46,11 @@ end
    @test JSON3.read("Inf"; allow_inf=true) === Inf
    @test JSON3.read("Infinity"; allow_inf=true) === Inf
    @test JSON3.read("-Infinity"; allow_inf=true) === -Inf


Noting that it is not possible to support reading with a custom mapping defined with a function that maps Float64 to String.

However, it is possible to support reading and writing with a @NamedTuple{positive_inf::AbstractString, negative_inf::AbstractString, nan::AbstractString} API.

Yeah, I agree that not being able to read the JSON back in when a custom mapping is used is a bummer, but also it's something that is/will be better solved in JSONBase.jl, where it's easier to override reading things.

Actually, we do have the RawJson construct if you really needed to parse something back in; it's a bit of a heavy-handed escape hatch here, but technically would work.

LilithHafner

@quinnj, I think this PR is ready for final review.

The biggest drawbacks are

Unable to parse back the emitted JSON
Sub-optimal performance when emitting many NaNs or Infs using a custom mapping (does not effect allow_inf=true usage)

But I think it is likely still worth merging despite them because of the value this feature provides.

quinnj · 2025-01-29T04:13:32Z

Sorry to be so slow on the review here; I've indeed been busy and am woefully behind on github notifications. Always feel free to give me a ping on slack, which is the only platform I stay up to date on these days. I'm going to start looking at this now.

(also thanks to @LilithHafner for jumping in to help review here)

src/write.jl

quinnj · 2025-01-29T17:28:18Z

If anyone has time to look at the CI failures, I'd appreciate it. I'm a bit busy, but could probably look in the next few days.

LilithHafner · 2025-01-30T14:11:12Z

CI failures look to be the same as the failures on main.

hhaensel · 2025-01-31T11:09:38Z

Found a way to read inf_mappings, so please wait with merging

hhaensel · 2025-01-31T11:45:54Z

I added support for inf_mapping in case that the mapping used string values and not other types, e.g. like 1e1000.

julia> inf_mapping(x) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\"";

julia> JSON3.read("""["__nan__", {"a": "__inf__"},["__neginf__"]]"""; inf_mapping)
3-element JSON3.Array{Any, Base.CodeUnits{UInt8, String}, Vector{UInt64}}:
 NaN
    {
   "a": Inf
}
    [-Inf]

If this meets your expectation, I'll add docs and tests.

hhaensel · 2025-01-31T11:52:36Z

One more thought:
We could also define the inf_mapping such that there is no need to add the extra quotes. If people really want to supply non-string values, they could supply JSONText, e.g. JSONText("1e1000").
In that case a typical inf_mapping would look like

inf_mapping(x) = x == Inf ? "__inf__" : x == -Inf ? "__neginf__" : "__nan__"

EDIT: Just realised that there is no JSONText in JSON3, so probably not a good idea.

LilithHafner · 2025-01-31T12:57:28Z

src/read.jl

+        float = if val == codeunits(inf_mapping(Inf))[2:end-1]
+            Inf
+        elseif val == codeunits(inf_mapping(-Inf))[2:end-1]
+            -Inf
+        elseif val == codeunits(inf_mapping(NaN))[2:end-1]


This usage makes me think that inf_mapping should be a Tuple or NamedTuple rather than a function.

I thought so, too. But the function version was much faster.

I tried arrays, tuples and functions, at least concerning writing. I didn't check read performance.

I also checked the RawType approach but I couldn't find out how to change the type to Float. The current approach looks more natural to me and has less code.
It is somewhat of a restriction that I only support the case of string mappings, but I think it is very untypical that people want to cover other values than Infinity and NaN if they have a process that allows to send non-standard JSON.
EDIT: it's easy to include the quotes just by expanding the view and leaving out the [2:end-1] so my previous comment is no longer valid.

The difference between function and tuple that is giving you that performance difference is methods are specialized on a function but not on a tuple's value. You could get similar performance with a tuple by lifting it to the type domain with Val. I do see the advantage in terms of runtime performance of having the serialization format of inf and nan be passed into write/read at the type level

hhaensel · 2025-01-31T16:56:52Z

How do we proceed? Change to a tuple or named tuple? If so, could you paste a piece of code here? I'm not sure whether I understood how you did it.

LilithHafner · 2025-01-31T17:01:49Z

I don't have a piece of code to paste. I'm also not that invested in the API here; Happy to hear @quinnj's thoughts.

hhaensel · 2025-01-31T22:07:29Z

I retried a tuple solution, which is only slightly inferior performancewise than the functional mapping. With

fn_mapping(x::Real) = x == Inf ? "\"__inf__\"" : x == -Inf ? "\"__neginf__\"" : "\"__nan__\""
tuple_mapping = ("\"__inf__\"", "\"__neginf__\"", "\"__nan__\"")

x = rand([Inf, NaN, -Inf], 1000)
y = JSON3.write.(x, inf_mapping=fn_mapping)
jy = join(y, "\", \"")

I obtain

Operation	fn_mapping	tuple_mapping	allow_inf = true
`JSON3.write(x, …)`	3.688 μs	4.114 μs	3.375 μs
`JSON3.read.(y, …)`	5.908 ms	6.035 ms	5.927 ms
`JSON3.read(jy, …)`	177.292 ns	307.983 ns	165.138 ns

My tuple implementation is available as branch hh-infinity-tuple.

EDIT: updated table with values for allow_inf = true

hhaensel · 2025-01-31T22:10:48Z

If performance should be the criterion, we could define a macro @inf_mapping which takes three strings and defines an anonymous function.

support user-defined mapping for Inf and NaN via keyword arg

d68937d

hhaensel mentioned this pull request Nov 27, 2024

support user functions for Inf and NaN #293

Closed

Merge branch 'quinnj:main' into hh-infinity2

7d887c6

allow_inf: fix lower performance for default_mapping, add `underscore…

1ce1b28

…_inf_mapping` and `quoted_inf_mapping` plus docstrings and tests

remove custom inf_mappings and add kw inf_mapping to docstring

f656813

LilithHafner reviewed Jan 16, 2025

View reviewed changes

modify docstring for inf_mapping

62f8efb

LilithHafner approved these changes Jan 16, 2025

View reviewed changes

quinnj reviewed Jan 29, 2025

View reviewed changes

src/write.jl Show resolved Hide resolved

quinnj reviewed Jan 29, 2025

View reviewed changes

src/write.jl Outdated Show resolved Hide resolved

quinnj reviewed Jan 29, 2025

View reviewed changes

src/write.jl Outdated Show resolved Hide resolved

revise docstring of write, add brackets for clarity

0dc0277

quinnj approved these changes Jan 29, 2025

View reviewed changes

support read() with inf_mapping

4fc6b43

LilithHafner reviewed Jan 31, 2025

View reviewed changes

include quotes in test when reading inf and nan strings

7f08318

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support user-defined mapping for Inf and NaN via keyword arg #294

support user-defined mapping for Inf and NaN via keyword arg #294

hhaensel commented Nov 17, 2024

essenciary commented Nov 27, 2024 •

edited

Loading

hhaensel commented Nov 27, 2024 •

edited

Loading

hhaensel commented Dec 19, 2024

LilithHafner commented Jan 14, 2025

hhaensel commented Jan 15, 2025

hhaensel commented Jan 15, 2025 •

edited

Loading

hhaensel commented Jan 15, 2025

LilithHafner commented Jan 15, 2025

hhaensel commented Jan 15, 2025

LilithHafner commented Jan 15, 2025

hhaensel commented Jan 15, 2025

LilithHafner Jan 16, 2025

quinnj Jan 29, 2025

LilithHafner left a comment

quinnj commented Jan 29, 2025

quinnj commented Jan 29, 2025

LilithHafner commented Jan 30, 2025

hhaensel commented Jan 31, 2025

hhaensel commented Jan 31, 2025

hhaensel commented Jan 31, 2025 •

edited

Loading

LilithHafner Jan 31, 2025

hhaensel Jan 31, 2025

hhaensel Jan 31, 2025 •

edited

Loading

hhaensel Jan 31, 2025 •

edited

Loading

LilithHafner Jan 31, 2025

hhaensel commented Jan 31, 2025

LilithHafner commented Jan 31, 2025

hhaensel commented Jan 31, 2025 •

edited

Loading

hhaensel commented Jan 31, 2025

support user-defined mapping for Inf and NaN via keyword arg #294

Are you sure you want to change the base?

support user-defined mapping for Inf and NaN via keyword arg #294

Conversation

hhaensel commented Nov 17, 2024

essenciary commented Nov 27, 2024 • edited Loading

hhaensel commented Nov 27, 2024 • edited Loading

hhaensel commented Dec 19, 2024

LilithHafner commented Jan 14, 2025

hhaensel commented Jan 15, 2025

hhaensel commented Jan 15, 2025 • edited Loading

hhaensel commented Jan 15, 2025

LilithHafner commented Jan 15, 2025

hhaensel commented Jan 15, 2025

LilithHafner commented Jan 15, 2025

hhaensel commented Jan 15, 2025

LilithHafner Jan 16, 2025

Choose a reason for hiding this comment

quinnj Jan 29, 2025

Choose a reason for hiding this comment

LilithHafner left a comment

Choose a reason for hiding this comment

quinnj commented Jan 29, 2025

quinnj commented Jan 29, 2025

LilithHafner commented Jan 30, 2025

hhaensel commented Jan 31, 2025

hhaensel commented Jan 31, 2025

hhaensel commented Jan 31, 2025 • edited Loading

LilithHafner Jan 31, 2025

Choose a reason for hiding this comment

hhaensel Jan 31, 2025

Choose a reason for hiding this comment

hhaensel Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

hhaensel Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

LilithHafner Jan 31, 2025

Choose a reason for hiding this comment

hhaensel commented Jan 31, 2025

LilithHafner commented Jan 31, 2025

hhaensel commented Jan 31, 2025 • edited Loading

hhaensel commented Jan 31, 2025

essenciary commented Nov 27, 2024 •

edited

Loading

hhaensel commented Nov 27, 2024 •

edited

Loading

hhaensel commented Jan 15, 2025 •

edited

Loading

hhaensel commented Jan 31, 2025 •

edited

Loading

hhaensel Jan 31, 2025 •

edited

Loading

hhaensel Jan 31, 2025 •

edited

Loading

hhaensel commented Jan 31, 2025 •

edited

Loading