Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Draft) Base64 read & write Buffer #819

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

MartinDevi
Copy link
Contributor

Write a Base64-encoded string to a buffer, and read the contents of a buffer as a Base64-encoded string, to avoid having the extra copies required when using byte strings.

A few questions remain:

  • Dedicated exception for invalid Base64?
  • Any way to avoid having this much duplication with the existing Base64 code? I don't see how, or maybe some small bits that wouldn't really change much.
  • Where to put the new methods? Keep them as extensions or add them as members? Move them to a different file?
  • Add similar extensions on BufferedSource and BufferedSink? Seems like that would be a good idea.

Also, I migrated the Base64 maps from byte arrays to strings because it seems wasteful to encode to a byte array then decode to a string. They could also be a char array, although I'm not sure that would make much of a difference.

Closes #613

@swankjesse
Copy link
Collaborator

Do you really need this? My thoughts have always been that if base64 is causing a performance problem, the solution is to avoid base64.

@swankjesse
Copy link
Collaborator

(I’m also extremely sensitive to keeping Okio small.)

@MartinDevi
Copy link
Contributor Author

Do you really need this? My thoughts have always been that if base64 is causing a performance problem, the solution is to avoid base64.

The premise is that there's a choice in the format in which data is being provided, which isn't always the case 😄

It's also still the most convenient format to use when embedding raw data within JSON. On mobile, the size of data downloaded from network requests can become non negligible for the app's memory, so I suspect that avoiding those extra copies when decoding can offer a small but nice performance boost while reading content, mostly by limiting the number of allocations and therefore the work of garbage collection.

@swankjesse
Copy link
Collaborator

I like the feature; I just am reluctant to put it into the library because it leans into an awkward design. With UnsafeCursor can you get the same performance in an external library? Or perhaps just use this?
https://github.com/erickok/okio-extensions

@MartinDevi
Copy link
Contributor Author

I can try to see what's doable using UnsafeCursor, but I'll still run into the issue of the fact that Okio's Base64 capabilities are internal, so they'd have to be copied (or find some other API which would work). The okio-extensions project doesn't really do any kind of streamed implementation (this limitation is explicit in Base64Sink documentation), and actually internally it does the copy of the bytes into a ByteString then copies them into UTF-8, so really it's just syntactic sugar with very dangerous limitations to be used properly on streams, and no or little performance improvement.

@swankjesse
Copy link
Collaborator

Agreed! Though we’re not saving much code on Base64 by keeping it in Okio; the streaming and non-streaming implementations are different.

@MartinDevi
Copy link
Contributor Author

That's true, it basically just saves the space of the dictionaries with encoding characters, which isn't saying much.

Another solution would potentially be to have ByteString's Base64 implementation rely on this new one for Buffer. It means a bit more copying and allocations when doing Base64 conversions for ByteString, compared to what exists right now, but then again ByteString isn't supposed to be meant for high-performance tool, as I understand it.

Minus the fact that if it's a large Base64 then a SegmentedByteString would be returned (which should be transparent anyways), this change should yield the same behavior.

internal inline fun ByteString.commonBase64(): String =
  Buffer().write(this).readBase64()

internal inline fun ByteString.commonBase64Url(): String =
  Buffer().write(this).readBase64Url()

internal inline fun String.commonDecodeBase64(): ByteString? {
  val buffer = Buffer()
  try {
    buffer.writeBase64(this)
  } catch (e: IllegalArgumentException) { // TODO: Dedicated Base64 exception?
    return null
  }
  return buffer.readByteString()
}

This way only a single Base64 implementation remains, for Buffer.

@swankjesse
Copy link
Collaborator

Good point! I’m still reluctant to add streaming base64 to Okio, just in interest of making sure most of the library is of value to most of its users.

`ByteString` relies on Buffer implementation for Base64 capabilities.

`BufferedSource` and `BufferedSink` API for Base64.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Base64 decoding/encoding source/sink
2 participants