Skip to content

Conversation

@divyenduz
Copy link
Contributor

@divyenduz divyenduz commented Sep 25, 2025

See #225

@vercel
Copy link
Contributor

vercel bot commented Sep 25, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Updated (UTC)
agent Ready Ready Preview Sep 26, 2025 6:30am

schema: z.string().optional(),
queryId: z.string().describe('The query ID from pg_stat_statements')
schema: z.string(),
queryId: z.string().describe('The query ID from pg_stat_statements (use the queryid field from getSlowQueries)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema parameter is defined as required in the Zod schema but has a default value in the destructuring, creating inconsistent behavior.

View Details
📝 Patch Details
diff --git a/apps/dbagent/src/lib/ai/tools/db.ts b/apps/dbagent/src/lib/ai/tools/db.ts
index 35fd3e9..9e155c4 100644
--- a/apps/dbagent/src/lib/ai/tools/db.ts
+++ b/apps/dbagent/src/lib/ai/tools/db.ts
@@ -72,7 +72,7 @@ The query needs to be complete, it cannot contain $1, $2, etc. If you need to, r
 It's very important that $1, $2, etc. are not passed to this tool. Use the tool describeTable to get the types of the columns.
 If you know the schema, pass it in as well.`,
       parameters: z.object({
-        schema: z.string(),
+        schema: z.string().optional(),
         query: z.string()
       }),
       execute: async ({ schema = 'public', query }) => {
@@ -98,7 +98,7 @@ If you know the schema, pass it in as well.`,
 This prevents SQL injection by not accepting raw SQL queries. Returns the explain plan as received from PostgreSQL.
 Use the queryid field from the getSlowQueries tool output as the queryId parameter.`,
       parameters: z.object({
-        schema: z.string(),
+        schema: z.string().optional(),
         queryId: z.string().describe('The query ID from pg_stat_statements (use the queryid field from getSlowQueries)')
       }),
       execute: async ({ schema = 'public', queryId }) => {
@@ -116,7 +116,7 @@ Use the queryid field from the getSlowQueries tool output as the queryId paramet
     return tool({
       description: `Describe a table. If you know the schema, pass it as a parameter. If you don't, use public.`,
       parameters: z.object({
-        schema: z.string(),
+        schema: z.string().optional(),
         table: z.string()
       }),
       execute: async ({ schema = 'public', table }) => {

Analysis

Zod schema validation failure when required parameters have default values

What fails: safeExplainQuery(), unsafeExplainQuery(), and describeTable() tools define schema parameter as required (z.string()) but use default values in destructuring ({ schema = 'public', ... }), causing Zod validation to fail when AI models don't provide the parameter

How to reproduce:

// Test with the original schema definition
const schema = z.object({ schema: z.string(), queryId: z.string() });
schema.parse({ queryId: '123' }); // Fails with "Required" error

Result: Zod validation throws error: [{"code": "invalid_type", "expected": "string", "received": "undefined", "path": ["schema"], "message": "Required"}]

Expected: Should accept missing schema parameter and use default value 'public' as intended by the destructuring pattern

Fix: Changed schema: z.string() to schema: z.string().optional() in all three affected tool parameter schemas to match the intended behavior with default values

Step 4:
Use the tool unsafeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The playbook instructions tell users to "replace the query parameters ($1, $2, etc) with actual values" when using safeExplainQuery, but this tool fetches queries by queryId from pg_stat_statements, so users cannot modify the query.

View Details
📝 Patch Details
diff --git a/apps/dbagent/src/lib/tools/playbooks.ts b/apps/dbagent/src/lib/tools/playbooks.ts
index cf4fc47..57b6818 100644
--- a/apps/dbagent/src/lib/tools/playbooks.ts
+++ b/apps/dbagent/src/lib/tools/playbooks.ts
@@ -24,9 +24,10 @@ Use the tool findTableSchema to find the schema of the table involved in the slo
 Use the tool describeTable to describe the table you found.
 
 Step 4:
-Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool. 
-Also, it's very important to replace the query parameters ($1, $2, etc) with the actual values. Generate your own values, but
-  take into account the data types of the columns.
+Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool
+and the queryid from the getSlowQueries output. Note that safeExplainQuery automatically fetches the query
+from pg_stat_statements using the queryid, so you cannot modify the query text. If the query contains
+placeholders ($1, $2, etc), you may need to use unsafeExplainQuery instead with manually replaced values.
 
 Step 5:
 If the previous step indicates that an index is missing, tell the user the exact DDL to create the index.

Analysis

Incorrect safeExplainQuery instructions in slow queries playbook

What fails: SLOW_QUERIES_PLAYBOOK Step 4 instructs users to "replace the query parameters ($1, $2, etc) with actual values" when using safeExplainQuery, but this tool fetches queries by queryId from pg_stat_statements automatically, making parameter replacement impossible.

How to reproduce:

  1. Call getSlowQueries() to get queries with placeholders like "$1, $2"
  2. Follow playbook Step 4 instructions to use safeExplainQuery with the queryId
  3. safeExplainQuery fetches the query from pg_stat_statements WHERE queryid = $1 and detects placeholders
  4. Function returns error: "The query seems to contain placeholders ($1, $2, etc). Replace them with actual values and try again."

Result: Users receive contradictory error message telling them to replace parameters they cannot modify, since safeExplainQuery automatically retrieves the query text from the database.

Expected: Instructions should clarify that safeExplainQuery uses queryId to automatically fetch queries from pg_stat_statements, and suggest using unsafeExplainQuery when parameter replacement is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, we should remove that instruction to the model.

await client.query("SET LOCAL lock_timeout = '200ms'");
await client.query(`SET search_path TO ${schema}`);
const explainQuery = `EXPLAIN ${query}`;
const explainQuery = hasPlaceholders ? `EXPLAIN (GENERIC_PLAN true) ${query}` : `EXPLAIN ${query}`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Step 4:
Use the tool unsafeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
Use the tool safeExplainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct, we should remove that instruction to the model.

@divyenduz divyenduz merged commit e4d4887 into safe_explain Sep 26, 2025
6 checks passed
@divyenduz divyenduz deleted the use_safe_explain branch September 26, 2025 17:45
github-merge-queue bot pushed a commit that referenced this pull request Sep 26, 2025
Related discussion #223

We already improved it in #224, this
PR makes it more secure by

1. Renaming `explainQuery` to `unsafeExplainQuery` while the single
query check and transaction wrapping is pretty secure already, there
might be edge cases where a query escapes the designated boundaries.
Therefore, we rename the tool to `unsafeExplainQuery`
2. Create a new tool called `safeExplainQuery`, this operates on
`queryId` and fetches the actual SQL from `pg_stat_statements` table
itself thereby eliminating the code path that can lead to any SQL
injection. This is done in the following stacked PR
#226
3. Use the new `safeExplainQuery` tool instead of `unsafeExplainQuery`,
to make it work, we had to additionally return `queryId` from
`getSlowQueries` tool in addition to the slow SQL query. This is done in
the following stacked PR #227
SolveEve pushed a commit to SolveEve/agent that referenced this pull request Nov 15, 2025
Related discussion xataio/agent#223

We already improved it in xataio/agent#224, this
PR makes it more secure by

1. Renaming `explainQuery` to `unsafeExplainQuery` while the single
query check and transaction wrapping is pretty secure already, there
might be edge cases where a query escapes the designated boundaries.
Therefore, we rename the tool to `unsafeExplainQuery`
2. Create a new tool called `safeExplainQuery`, this operates on
`queryId` and fetches the actual SQL from `pg_stat_statements` table
itself thereby eliminating the code path that can lead to any SQL
injection. This is done in the following stacked PR
xataio/agent#226
3. Use the new `safeExplainQuery` tool instead of `unsafeExplainQuery`,
to make it work, we had to additionally return `queryId` from
`getSlowQueries` tool in addition to the slow SQL query. This is done in
the following stacked PR xataio/agent#227
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants