Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use a non-cryptographic hash to generate unique statement names #1759

Closed
wants to merge 2 commits into from

Conversation

drakkan
Copy link
Contributor

@drakkan drakkan commented Oct 8, 2023

you can benchmark like this

func hashSha256(sql string) string {
	digest := sha256.Sum256([]byte(sql))
	return "stmtcache_" + hex.EncodeToString(digest[0:24])
}

func hashFNV1(sql string) string {
	h := fnv.New64a()
	h.Write([]byte(sql))
	return "stmtcache_" + strconv.FormatUint(h.Sum64(), 10)
}

func hashXXHash(sql string) string {
	h := xxhash.New()
	h.Write([]byte(sql))
	return "stmtcache_" + strconv.FormatUint(h.Sum64(), 10)
}

func BenchmarkSha256(b *testing.B) {
	sql := `SELECT u.id,u.username,u.password,u.public_keys,u.home_dir,u.uid,u.gid,u.max_sessions,u.quota_size,u.quota_files,u.permissions,u.used_quota_size,u.used_quota_files,u.last_quota_update,u.upload_bandwidth,u.download_bandwidth,u.expiration_date,u.last_login,u.status,u.filters,u.filesystem,u.additional_info,u.description,u.email,u.created_at,u.updated_at,u.upload_data_transfer,u.download_data_transfer,u.total_data_transfer,u.used_upload_data_transfer,u.used_download_data_transfer,u.deleted_at,u.first_download,u.first_upload,r.name,u.last_password_change FROM users u LEFT JOIN roles r on r.id = u.role_id WHERE u.deleted_at = 0 ORDER BY u.username ASC LIMIT $1 OFFSET $2`
	for i := 0; i < b.N; i++ {
		hashSha256(sql)
	}
}

func BenchmarkFNV1(b *testing.B) {
	sql := `SELECT u.id,u.username,u.password,u.public_keys,u.home_dir,u.uid,u.gid,u.max_sessions,u.quota_size,u.quota_files,u.permissions,u.used_quota_size,u.used_quota_files,u.last_quota_update,u.upload_bandwidth,u.download_bandwidth,u.expiration_date,u.last_login,u.status,u.filters,u.filesystem,u.additional_info,u.description,u.email,u.created_at,u.updated_at,u.upload_data_transfer,u.download_data_transfer,u.total_data_transfer,u.used_upload_data_transfer,u.used_download_data_transfer,u.deleted_at,u.first_download,u.first_upload,r.name,u.last_password_change FROM users u LEFT JOIN roles r on r.id = u.role_id WHERE u.deleted_at = 0 ORDER BY u.username ASC LIMIT $1 OFFSET $2`
	for i := 0; i < b.N; i++ {
		hashFNV1(sql)
	}
}

func BenchmarkXXhash(b *testing.B) {
	sql := `SELECT u.id,u.username,u.password,u.public_keys,u.home_dir,u.uid,u.gid,u.max_sessions,u.quota_size,u.quota_files,u.permissions,u.used_quota_size,u.used_quota_files,u.last_quota_update,u.upload_bandwidth,u.download_bandwidth,u.expiration_date,u.last_login,u.status,u.filters,u.filesystem,u.additional_info,u.description,u.email,u.created_at,u.updated_at,u.upload_data_transfer,u.download_data_transfer,u.total_data_transfer,u.used_upload_data_transfer,u.used_download_data_transfer,u.deleted_at,u.first_download,u.first_upload,r.name,u.last_password_change FROM users u LEFT JOIN roles r on r.id = u.role_id WHERE u.deleted_at = 0 ORDER BY u.username ASC LIMIT $1 OFFSET $2`
	for i := 0; i < b.N; i++ {
		hashXXHash(sql)
	}
}

xxhash is faster but requires an external dependency. FNV-1 should be enough.

Results on my laptop:

BenchmarkFNV1-12    	  807478	      1304 ns/op	     760 B/op	       3 allocs/op
BenchmarkSha256-12    	  365073	      2831 ns/op	     864 B/op	       4 allocs/op

This is just a suggestion, even sha256 should be ok for general use. Thank you

@jackc
Copy link
Owner

jackc commented Oct 11, 2023

The unique statement name generation should only happen once per SQL text per connection. Given that it is not a per query cost, I think the additional collision resistance is worth the extra time.

@jackc jackc closed this Oct 11, 2023
@drakkan
Copy link
Contributor Author

drakkan commented Oct 11, 2023

No problem, it was just a suggestion 😄

I'm not sure why sha256 is truncated

hex.EncodeToString(digest[0:24])

this should be

hex.EncodeToString(digest[0:32])

or

hex.EncodeToString(digest[:])

am I missing something? Thank you

@jackc
Copy link
Owner

jackc commented Oct 11, 2023

The reason it is truncated is to stay under the identifier length limit of 63 bytes. https://www.postgresql.org/docs/current/limits.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants