Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: sqlite cache store #3657

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

flakey5
Copy link
Member

@flakey5 flakey5 commented Oct 1, 2024

Note: this is a draft since #3562 isn't landed. Until it is landed, this will be based off of that pr's branch. For the actual diff see flakey5/undici@flakey5/3231...flakey5:undici:flakey5/20240910/sqlite-cache-store

This relates to...

Adding client side http caching (#3562)

Rationale

Adds a sqlite cache store option given the --experimental-sqlite flag is provided

Changes

Features

  • Cache store using sqlite as a backend

Bug Fixes

n/a

Breaking Changes and Deprecations

n/a

Status

cc @mcollina @ronag

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from c33c78e to 6c8ea8f Compare October 1, 2024 06:20
@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 6c8ea8f to a015bf2 Compare October 15, 2024 02:16
@flakey5 flakey5 marked this pull request as ready for review October 15, 2024 02:17
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some docs?

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch 2 times, most recently from 592ffb3 to d951470 Compare October 16, 2024 07:21
lib/cache/sqlite-cache-store.js Outdated Show resolved Hide resolved
lib/cache/sqlite-cache-store.js Outdated Show resolved Hide resolved
lib/cache/sqlite-cache-store.js Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 19b0eaf to 93a306c Compare November 2, 2024 08:20
@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch 2 times, most recently from 34b34d7 to 60a9679 Compare November 14, 2024 04:44
@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 60a9679 to 8e0fa89 Compare November 14, 2024 05:53
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

lib/cache/sqlite-cache-store.js Outdated Show resolved Hide resolved
index.js Show resolved Hide resolved
test/cache-interceptor/cache-stores.js Outdated Show resolved Hide resolved
Copy link
Member

@ronag ronag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Unnecessarily complicated by locking.
  • How does it "prune" i.e. free space?
  • Why errorCallback?

@ronag
Copy link
Member

ronag commented Nov 18, 2024

IMHO I would think you could do this quite similar to the "simplified" memory store.

index.js Outdated Show resolved Hide resolved
@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch 2 times, most recently from bf40ed4 to 70fc6d3 Compare November 19, 2024 08:03
@flakey5
Copy link
Member Author

flakey5 commented Nov 19, 2024

Why errorCallback?

Goes back to the discussion we were having in the original caching pr for how to handle errors, we decided to handle them silently so the request still goes through.

Here it's just being used for handling json parse errors, since those shouldn't ever happen I would be in favor of just throwing outright though

How does it "prune" i.e. free space?

It prunes the entire database of expired values when we come across one when looking up a key, we can make this more stricter though (i.e. run the purge query on every write), wdyt?

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 70fc6d3 to dd7ba9a Compare November 19, 2024 08:18
@ronag
Copy link
Member

ronag commented Nov 19, 2024

It prunes the entire database of expired values when we come across one when looking up a key, we can make this more stricter though (i.e. run the purge query on every write), wdyt?

When the database is full (we are missing maxSize) we should remove entries whether they are expired or not. I think we should have something like, while beyond maxCount or maxSize drop 10% of the oldest entries.

@flakey5
Copy link
Member Author

flakey5 commented Nov 19, 2024

I omitted maxCount here since it would break if there were multiple processes using a db for cache

If we do want it, I think we might be able to get away with it being stored somewhere in the db and being modified only with a procedure.

@ronag
Copy link
Member

ronag commented Nov 19, 2024

If we do want it, I think we might be able to get away with it being stored somewhere in the db and being modified only with a procedure.

I believe sqlite has meta commands for that, @mcollina ?

Comment on lines 255 to 315
const existingValue = findValue(key, true)
if (existingValue) {
// Updating an existing response, let's delete it
updateValueQuery.run(
JSON.stringify(stringifyBufferArray(body)),
value.deleteAt,
value.statusCode,
value.statusMessage,
JSON.stringify(stringifyBufferArray(value.rawHeaders)),
value.cachedAt,
value.staleAt,
value.deleteAt,
existingValue.id
)
} else {
// New response, let's insert it
insertValueQuery.run(
url,
key.method,
JSON.stringify(stringifyBufferArray(body)),
value.deleteAt,
value.statusCode,
value.statusMessage,
JSON.stringify(stringifyBufferArray(value.rawHeaders)),
value.vary ? JSON.stringify(value.vary) : null,
value.cachedAt,
value.staleAt,
value.deleteAt
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just always just overwrite? Does it update and insert need to be separate ops?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so but I'm not sure how messy it would be compared to just having two queries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does update fail if the value doesn't exist? Doesn't sql have an insert or update command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does update fail if the value doesn't exist

It fails in the sense that it noops.

Doesn't sql have an insert or update command?

I'm not sure how well insert or update would work here based on how we're checking if there's a already cached value though. The only reliable and fast way we have to compare two cached values is with their ids. Otherwise, we need to compare their url, method, and vary headers to see if they were the same. The url and method comparisons are simple and can be done in sql, but comparing the vary headers is more difficult since it's stringified json. We can't compare it in sql since we don't know if the properties are in the same order, meaning we need to parse then compare (like what #findValues does).

I was experimenting with hashing the vary headers (sort properties alphabetically -> stringify -> crypto.createHash), however, it was a lot more complicated than just having two separate queries. It also would've only benefited writes, for reads we would still need to fetch each response and compare the vary headers like we already do

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch 2 times, most recently from 89ed094 to afa5d70 Compare November 19, 2024 20:48
types/cache-interceptor.d.ts Outdated Show resolved Hide resolved
docs/docs/api/CacheStore.md Show resolved Hide resolved
* @returns {import('../../types/cache-interceptor.d.ts').default.GetResult | undefined}
*/
get (key) {
if (typeof key !== 'object') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we also support arrays?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of cache keys? Currently no

lib/cache/sqlite-cache-store.js Outdated Show resolved Hide resolved
lib/cache/sqlite-cache-store.js Show resolved Hide resolved
Copy link
Member

@ronag ronag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make the database primary key a hash of url + method + vary headers?

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from afa5d70 to 005f51b Compare November 21, 2024 03:19
@flakey5
Copy link
Member Author

flakey5 commented Nov 21, 2024

Why not make the database primary key a hash of url + method + vary headers?

That would still lead to the same state that I talked about here #3657 (comment)

I was experimenting with hashing the vary headers (sort properties alphabetically -> stringify -> crypto.createHash), however, it was a lot more complicated than just having two separate queries. It also would've only benefited writes, for reads we would still need to fetch each response and compare the vary headers like we already do

@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 005f51b to f178cd1 Compare November 21, 2024 06:20
@mcollina
Copy link
Member

CI is very red :(

@mcollina
Copy link
Member

If we do want it, I think we might be able to get away with it being stored somewhere in the db and being modified only with a procedure.

I believe sqlite has meta commands for that, @mcollina ?

I think this would do it:

PRAGMA page_count;
PRAGMA page_size;

Then do page_count * page_size.

@mcollina
Copy link
Member

Why not make the database primary key a hash of url + method + vary headers?

@ronag I think this is possible, but I would likely investigate it after this lands. I think we would need to bring in https://www.npmjs.com/package/safe-stable-stringify to be able to have "stable" JSONs.

@ronag
Copy link
Member

ronag commented Nov 21, 2024

Why not make the database primary key a hash of url + method + vary headers?

@ronag I think this is possible, but I would likely investigate it after this lands. I think we would need to bring in https://www.npmjs.com/package/safe-stable-stringify to be able to have "stable" JSONs.

What's wrong with e.g:

function makeKey({ method, origin, path, vary }) {
  const hasher = crypto.createHash('sha256')
  hasher.update(origin.toLowerCase())
  hasher.update(method.toLowerCase())
  hasher.update(path ?? '')
  hasher.update(vary?.map(x => x.toLowerCase()).sort().join(',') ?? '')
  return hasher.digest('hex')
}

@mcollina
Copy link
Member

That would be very slow unfortunately :(. Unless we really need it.

docs/docs/api/CacheStore.md Outdated Show resolved Hide resolved
Co-authored-by: Robert Nagy <[email protected]>
Co-authored-by: Isak Törnros <[email protected]>
Co-authored-by: Matteo Collina <[email protected]>

Signed-off-by: flakey5 <[email protected]>
@flakey5 flakey5 force-pushed the flakey5/20240910/sqlite-cache-store branch from 27f1d63 to c40b97a Compare November 21, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants