-
Notifications
You must be signed in to change notification settings - Fork 73
Use safe explain tool #227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
| schema: z.string().optional(), | ||
| queryId: z.string().describe('The query ID from pg_stat_statements') | ||
| schema: z.string(), | ||
| queryId: z.string().describe('The query ID from pg_stat_statements (use the queryid field from getSlowQueries)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema parameter is defined as required in the Zod schema but has a default value in the destructuring, creating inconsistent behavior.
View Details
📝 Patch Details
diff --git a/apps/dbagent/src/lib/ai/tools/db.ts b/apps/dbagent/src/lib/ai/tools/db.ts
index 35fd3e9..9e155c4 100644
--- a/apps/dbagent/src/lib/ai/tools/db.ts
+++ b/apps/dbagent/src/lib/ai/tools/db.ts
@@ -72,7 +72,7 @@ The query needs to be complete, it cannot contain $1, $2, etc. If you need to, r
It's very important that $1, $2, etc. are not passed to this tool. Use the tool describeTable to get the types of the columns.
If you know the schema, pass it in as well.`,
parameters: z.object({
- schema: z.string(),
+ schema: z.string().optional(),
query: z.string()
}),
execute: async ({ schema = 'public', query }) => {
@@ -98,7 +98,7 @@ If you know the schema, pass it in as well.`,
This prevents SQL injection by not accepting raw SQL queries. Returns the explain plan as received from PostgreSQL.
Use the queryid field from the getSlowQueries tool output as the queryId parameter.`,
parameters: z.object({
- schema: z.string(),
+ schema: z.string().optional(),
queryId: z.string().describe('The query ID from pg_stat_statements (use the queryid field from getSlowQueries)')
}),
execute: async ({ schema = 'public', queryId }) => {
@@ -116,7 +116,7 @@ Use the queryid field from the getSlowQueries tool output as the queryId paramet
return tool({
description: `Describe a table. If you know the schema, pass it as a parameter. If you don't, use public.`,
parameters: z.object({
- schema: z.string(),
+ schema: z.string().optional(),
table: z.string()
}),
execute: async ({ schema = 'public', table }) => {
Analysis
Zod schema validation failure when required parameters have default values
What fails: safeExplainQuery(), unsafeExplainQuery(), and describeTable() tools define schema parameter as required (z.string()) but use default values in destructuring ({ schema = 'public', ... }), causing Zod validation to fail when AI models don't provide the parameter
How to reproduce:
// Test with the original schema definition
const schema = z.object({ schema: z.string(), queryId: z.string() });
schema.parse({ queryId: '123' }); // Fails with "Required" errorResult: Zod validation throws error: [{"code": "invalid_type", "expected": "string", "received": "undefined", "path": ["schema"], "message": "Required"}]
Expected: Should accept missing schema parameter and use default value 'public' as intended by the destructuring pattern
Fix: Changed schema: z.string() to schema: z.string().optional() in all three affected tool parameter schemas to match the intended behavior with default values
| Step 4: | ||
| Use the tool unsafeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool. | ||
| Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The playbook instructions tell users to "replace the query parameters ($1, $2, etc) with actual values" when using safeExplainQuery, but this tool fetches queries by queryId from pg_stat_statements, so users cannot modify the query.
View Details
📝 Patch Details
diff --git a/apps/dbagent/src/lib/tools/playbooks.ts b/apps/dbagent/src/lib/tools/playbooks.ts
index cf4fc47..57b6818 100644
--- a/apps/dbagent/src/lib/tools/playbooks.ts
+++ b/apps/dbagent/src/lib/tools/playbooks.ts
@@ -24,9 +24,10 @@ Use the tool findTableSchema to find the schema of the table involved in the slo
Use the tool describeTable to describe the table you found.
Step 4:
-Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
-Also, it's very important to replace the query parameters ($1, $2, etc) with the actual values. Generate your own values, but
- take into account the data types of the columns.
+Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool
+and the queryid from the getSlowQueries output. Note that safeExplainQuery automatically fetches the query
+from pg_stat_statements using the queryid, so you cannot modify the query text. If the query contains
+placeholders ($1, $2, etc), you may need to use unsafeExplainQuery instead with manually replaced values.
Step 5:
If the previous step indicates that an index is missing, tell the user the exact DDL to create the index.
Analysis
Incorrect safeExplainQuery instructions in slow queries playbook
What fails: SLOW_QUERIES_PLAYBOOK Step 4 instructs users to "replace the query parameters ($1, $2, etc) with actual values" when using safeExplainQuery, but this tool fetches queries by queryId from pg_stat_statements automatically, making parameter replacement impossible.
How to reproduce:
- Call getSlowQueries() to get queries with placeholders like "$1, $2"
- Follow playbook Step 4 instructions to use safeExplainQuery with the queryId
- safeExplainQuery fetches the query from
pg_stat_statements WHERE queryid = $1and detects placeholders - Function returns error: "The query seems to contain placeholders ($1, $2, etc). Replace them with actual values and try again."
Result: Users receive contradictory error message telling them to replace parameters they cannot modify, since safeExplainQuery automatically retrieves the query text from the database.
Expected: Instructions should clarify that safeExplainQuery uses queryId to automatically fetch queries from pg_stat_statements, and suggest using unsafeExplainQuery when parameter replacement is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct, we should remove that instruction to the model.
| await client.query("SET LOCAL lock_timeout = '200ms'"); | ||
| await client.query(`SET search_path TO ${schema}`); | ||
| const explainQuery = `EXPLAIN ${query}`; | ||
| const explainQuery = hasPlaceholders ? `EXPLAIN (GENERIC_PLAN true) ${query}` : `EXPLAIN ${query}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| Step 4: | ||
| Use the tool unsafeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool. | ||
| Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct, we should remove that instruction to the model.
Related discussion #223 We already improved it in #224, this PR makes it more secure by 1. Renaming `explainQuery` to `unsafeExplainQuery` while the single query check and transaction wrapping is pretty secure already, there might be edge cases where a query escapes the designated boundaries. Therefore, we rename the tool to `unsafeExplainQuery` 2. Create a new tool called `safeExplainQuery`, this operates on `queryId` and fetches the actual SQL from `pg_stat_statements` table itself thereby eliminating the code path that can lead to any SQL injection. This is done in the following stacked PR #226 3. Use the new `safeExplainQuery` tool instead of `unsafeExplainQuery`, to make it work, we had to additionally return `queryId` from `getSlowQueries` tool in addition to the slow SQL query. This is done in the following stacked PR #227
Related discussion xataio/agent#223 We already improved it in xataio/agent#224, this PR makes it more secure by 1. Renaming `explainQuery` to `unsafeExplainQuery` while the single query check and transaction wrapping is pretty secure already, there might be edge cases where a query escapes the designated boundaries. Therefore, we rename the tool to `unsafeExplainQuery` 2. Create a new tool called `safeExplainQuery`, this operates on `queryId` and fetches the actual SQL from `pg_stat_statements` table itself thereby eliminating the code path that can lead to any SQL injection. This is done in the following stacked PR xataio/agent#226 3. Use the new `safeExplainQuery` tool instead of `unsafeExplainQuery`, to make it work, we had to additionally return `queryId` from `getSlowQueries` tool in addition to the slow SQL query. This is done in the following stacked PR xataio/agent#227
See #225