It would be nice to read a slice of a file, based on the start & end char indices.
ChatGPT attempt, seems to work:
Yes, there is a more efficient way to handle large files by reading only the necessary parts of the file instead of slurping the entire content. You can accomplish this using Java's java.nio.file API. Here is a way to do this using interop:
(defn lazy-substring-of-file [filename start end]
  ;; start and end are byte offsets into the file, end exclusive
  (let [path    (java.nio.file.Paths/get filename (into-array String []))
        options (into-array java.nio.file.OpenOption [java.nio.file.StandardOpenOption/READ])
        bb      (java.nio.ByteBuffer/allocate (- end start))]
    ;; with-open closes the channel even if seeking or reading throws
    (with-open [fc (java.nio.file.Files/newByteChannel path options)]
      (.position fc start)
      (.read fc bb))
    (String. (.array bb) "UTF-8")))
In this function:
java.nio.file.Paths/get is used to get a java.nio.file.Path object from the filename.
java.nio.file.Files/newByteChannel is used to create a new java.nio.channels.SeekableByteChannel to the file.
java.nio.ByteBuffer/allocate is used to create a ByteBuffer of the right size.
.position is used to set the read position of the byte channel.
.read is used to read up to (- end start) bytes from the file into the ByteBuffer. A single call may return fewer bytes than requested.
(String. (.array bb) "UTF-8") is used to decode the bytes in the ByteBuffer into a new string.
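This function avoids reading the whole file into memory by only reading the necessary bytes. It works best when the slice (end minus start) is small compared to the size of the file. Note that start and end are byte offsets, not character indices; the two coincide only when the text is entirely single-byte (for example, ASCII).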
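Since a single .read call is allowed to return fewer bytes than the buffer can hold, a more defensive variant loops until the buffer is full or the end of the file is reached. The following is only a sketch under that assumption; the name read-byte-range is hypothetical and not part of any library, and it returns raw bytes so the caller decides how to decode them.

```clojure
(import '(java.nio ByteBuffer)
        '(java.nio.channels FileChannel)
        '(java.nio.file Paths OpenOption StandardOpenOption))

(defn read-byte-range
  "Hypothetical helper: returns the bytes in [start, end) of the file as a
  byte array. The result is shorter if the file ends before `end`."
  [filename start end]
  (let [path (Paths/get filename (into-array String []))
        opts (into-array OpenOption [StandardOpenOption/READ])
        bb   (ByteBuffer/allocate (int (- end start)))]
    (with-open [^FileChannel fc (FileChannel/open path opts)]
      (.position fc (long start))
      ;; .read returns -1 at end of file; keep reading until the buffer
      ;; is full or there is nothing left to read.
      (loop []
        (when (and (.hasRemaining bb)
                   (not= -1 (.read fc bb)))
          (recur))))
    ;; Copy only the bytes that were actually read.
    (java.util.Arrays/copyOf (.array bb) (.position bb))))
```

Usage might look like (String. (read-byte-range "notes.txt" 6 11) "UTF-8"), where notes.txt is a placeholder filename. Because the offsets are in bytes, a slice that starts or ends in the middle of a multi-byte UTF-8 character will not decode cleanly.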
Sure, I'm imagining a program that has to read many large files at predictable locations. For instance, media & archive headers. AFAIK, using slurp or fs/read-all-bytes could incur a high performance cost per operation, due to loading each entire file into memory. For a developer, it would be nice to have a readymade cross-platform function, and not have to delve into the host API.
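To make the header use case concrete, here is a rough sketch that reads only the first few bytes of a file and checks them against a known signature. The helper names read-header and zip-file? are hypothetical, and the check assumes the common case of a non-empty ZIP archive, which begins with the local file header signature 50 4B 03 04.

```clojure
(import '(java.io RandomAccessFile))

(defn read-header
  "Hypothetical helper: returns the first n bytes of the file as a vector of
  unsigned values (0-255). Throws java.io.EOFException if the file is
  shorter than n bytes."
  [filename n]
  (with-open [raf (RandomAccessFile. ^String filename "r")]
    (let [buf (byte-array n)]
      ;; readFully blocks until exactly n bytes have been read
      (.readFully raf buf)
      ;; JVM bytes are signed, so mask each one to get 0-255
      (mapv #(bit-and % 0xff) buf))))

(defn zip-file?
  "Hypothetical check: true if the file starts with the ZIP local file
  header signature. Only the first four bytes are read."
  [filename]
  (= [0x50 0x4B 0x03 0x04] (read-header filename 4)))
```

Only four bytes are read per file regardless of its size, which is the property the use case above is after. Reading from a nonzero offset would add a (.seek raf offset) call before readFully.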
That said, I've only done this sort of thing in C, so apologies if it's inaccurate or out of scope.
I'll keep this issue open to see if more people are interested. "lazily" might not be an accurate description: you want to read some specific segment from a file, without reading all of the file into memory, right?
kimo-k changed the title from "Read part of a file lazily" to "Efficiently read part of a file by seek offsets" on Jul 9, 2023