- [ ] Add an approval workflow to the agent before it runs potentially dangerous statements (one possible shape is sketched below)
- [ ] Allow configuring the tools that can be defined per monitoring schedule
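
As a rough illustration of the first item, here is a minimal sketch of what such an approval gate could look like. All names here (`runWithApproval`, `requestApproval`, `isPotentiallyDangerous`) are hypothetical, not part of the agent's actual API, and the danger check is deliberately naive:

```
// Hypothetical sketch of an approval workflow; none of these names
// come from the agent's codebase.
type ApprovalDecision = 'approved' | 'rejected';

// Naive check: treat statements that modify data or schema as dangerous.
function isPotentiallyDangerous(sql: string): boolean {
  return /\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE)\b/i.test(sql);
}

async function runWithApproval(
  sql: string,
  requestApproval: (sql: string) => Promise<ApprovalDecision>,
  execute: (sql: string) => Promise<void>
): Promise<void> {
  // Only dangerous statements are routed through a human approver.
  if (isPotentiallyDangerous(sql)) {
    const decision = await requestApproval(sql);
    if (decision !== 'approved') {
      throw new Error(`Statement rejected by approver: ${sql}`);
    }
  }
  await execute(sql);
}
```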

## Appendix: Playbooks

```
export interface Playbook {
  name: string;
  description: string;
  content: string;
  isBuiltIn: boolean;
}

// Prompt for finding and troubleshooting slow queries.
const SLOW_QUERIES_PLAYBOOK = `
Follow these steps to find and troubleshoot slow queries:

Step 1:
Use the tool getSlowQueries to find the slow queries.

Step 2:
Pick a query to investigate. This doesn't have to be the slowest query.
Prefer a SELECT query, avoid UPDATE, DELETE, INSERT.
Avoid introspection queries, like the ones involving pg_catalog or information_schema. THIS IS VERY IMPORTANT.
Avoid queries on the kine table.
Include the query in your summary, but format it on multiple lines, so that no line is longer than 80 characters.

Step 3:
Use the tool findTableSchema to find the schema of the table involved in the slow query you picked.
Use the tool describeTable to describe the table you found.

Step 4:
Use the tool explainQuery to explain the slow queries. Make sure to pass the schema you found to the tool.
Also, it's very important to replace the query parameters ($1, $2, etc) with the actual values. Generate your own values, but
take into account the data types of the columns.

Step 5:
If the previous step indicates that an index is missing, tell the user the exact DDL to create the index.

At the end:
After you are finished, make a summary of your findings: the slow query summary (don't include the actual query unless it's short),
the reason why it is slow, and the DDL to create the index if you found one. Also say what sort of improvement the user can expect
from the index.
`;

// Prompt for a general health check of the database.
const GENERAL_MONITORING_PLAYBOOK = `
Objective:
To assess and ensure the optimal performance of the PostgreSQL database by reviewing key metrics, logs, and slow queries.

Step 1:
Check CPU Utilization:

Retrieve and analyze the CPU utilization metrics.
Ensure CPU usage is within acceptable limits (e.g., below 60%).

Step 2:
Review Other Key Metrics:

Freeable Memory: Ensure sufficient memory is available (e.g., above 20 GB).
Database Connections: Monitor for spikes; ensure connections are within expected limits.
Read/Write IOPS: Check for any unusual spikes or bottlenecks.
Disk Queue Depth: Ensure it remains at 0 to avoid I/O bottlenecks.

Step 3:
Analyze Logs:

Retrieve recent logs and look for warnings or errors.

Step 4:
Evaluate Slow Queries:

Retrieve and review slow queries.
Identify known queries and ensure they are optimized or deemed acceptable.

Step 5:
Document Findings:

Record any issues found and actions taken.
Note any recurring patterns or areas for improvement.
`;

// Prompt for recommending performance and vacuum settings.
const TUNING_PLAYBOOK = `
Objective: Recommend performance and vacuum settings for the database.

Step 1:
Use the getTablesAndInstanceInfo tool to gather what you know about the database and the cluster/instance type.

Step 2:
Think about what CPU/memory that AWS instance class has.

Step 3:
Given the information you collected above, think about the ideal settings for the following parameters:
- max_connections
- shared_buffers
- effective_cache_size
- maintenance_work_mem
- checkpoint_completion_target
- wal_buffers
- default_statistics_target
- random_page_cost
- effective_io_concurrency
- work_mem
- huge_pages
- min_wal_size
- max_wal_size
- max_worker_processes
- max_parallel_workers_per_gather
- max_parallel_workers
- max_parallel_maintenance_workers

Step 4:
Now compare with the values you read via the tool getPerformanceAndVacuumSettings and see if there's anything you'd change.

Report your findings in a structured way, with the settings you'd change and the reason for the change. Highlight the most important changes first.
`;

// Prompt for investigating high CPU usage while it is happening.
const INVESTIGATE_HIGH_CPU_USAGE_PLAYBOOK = `
Objective:
To investigate and resolve high CPU usage in the PostgreSQL database.

Step 1:
Use the tool getCurrentActiveQueries to get the currently active queries. Consider the state and the duration of the queries,
to see if there is any particular query that is causing the high CPU usage. If there is, report that to the user.

Step 2:
Check if there are any queries that are blocked waiting on locks. Use the tool getQueriesWaitingOnLocks to get the queries that are blocked waiting on locks.
If there are, report that to the user.

Step 3:
Check IOPS and disk queue depth. Use the tool getInstanceMetric to get the IOPS and disk queue depth.
If there are any unusual spikes or bottlenecks, report that to the user.

Step 4:
Get the vacuum stats for the top tables in the database. Use the tool getVacuumStats to get the vacuum stats.
If there are any tables with a high number of dead tuples, report that to the user.

Step 5:
Check the slow queries. Use the tool getSlowQueries to get the slow queries.
If there are any slow queries, report that to the user.

Step 6:
Check the logs. Use the tool getLogs to get the logs.
If there are any unusual logs, report that to the user.

Step 7:
Based on all the information you have gathered, make a summary of your findings and report them to the user.
Be very specific about the queries you found and the reason why they are slow.
`;

// Prompt for investigating a high connection count.
const INVESTIGATE_HIGH_CONNECTION_COUNT_PLAYBOOK = `
Objective:
To investigate and resolve a high connection count in the PostgreSQL database.

Step 1:
Use the tool getConnectionsStats to get the connections stats. If the
percentage of connections utilization is very low, you can stop here. Proceed with the next step only if the
percentage is at least 20%.

Step 2:
Get the metric for the number of connections. Check if the trend is upwards and consider
how much time there is until the max is reached. If it looks like the max will be reached in the
next hour, this should be treated as alert level.

Step 3:
If the percentage of connections utilization is high, get the instance info
(with the tool getTablesAndInstanceInfo) and think about the stats you have gathered so far.
Is the max_connections appropriate for the instance type? Are there many idle connections?

Step 4:
Call the tool getConnectionsGroups to get an overview of the open connections.
Try to figure out where the bulk of the connections are coming from.
Are there many "idle in transaction" connections? Think about the wait_event as well.

Step 5:
If there are many idle connections, get the oldest idle connections with the tool getOldestIdleConnections.

Step 6:
Based on all the information you have gathered, make a summary of your findings and report them to the user.
Provide actionable advice to the user. If, for example, you recommend killing old idle connections,
provide the query to do so. However, use judgement in selecting only the connections that are least likely to
impact users (for example, because they are very old).
If you recommend changing the max_connections parameter, provide the new value.
`;

// Prompt for investigating low freeable memory.
const INVESTIGATE_LOW_MEMORY_PLAYBOOK = `
Objective:
To investigate and resolve low freeable memory in the PostgreSQL database.

Step 1:
Get the freeable memory metric using the tool getInstanceMetric.

Step 2:
Get the instance details and compare the freeable memory with the amount of memory available.

Step 3:
Check the logs for any indications of memory pressure or out of memory errors. If there are,
make sure to report that to the user. This would also mean that the situation is critical.

Step 4:
Check active queries. Use the tool getConnectionsGroups to get the currently active queries.
If a user or application stands out for doing a lot of work, record that to indicate to the user.

Step 5:
Check the work_mem setting and shared_buffers setting. Think about whether it would make sense to reduce these
in order to free up memory.

Step 6:
If there is no clear root cause for the memory usage, suggest that the user scale up the Postgres instance.
Recommend a particular instance class.
`;

export function getPlaybook(name: string): string {
  switch (name) {
    case 'investigateSlowQueries':
      return SLOW_QUERIES_PLAYBOOK;
    case 'generalMonitoring':
      return GENERAL_MONITORING_PLAYBOOK;
    case 'tuneSettings':
      return TUNING_PLAYBOOK;
    case 'investigateHighCpuUsage':
      return INVESTIGATE_HIGH_CPU_USAGE_PLAYBOOK;
    case 'investigateHighConnectionCount':
      return INVESTIGATE_HIGH_CONNECTION_COUNT_PLAYBOOK;
    case 'investigateLowMemory':
      return INVESTIGATE_LOW_MEMORY_PLAYBOOK;
    default:
      return `Error: Playbook ${name} not found`;
  }
}

export function listPlaybooks(): string[] {
  return [
    'generalMonitoring',
    'investigateSlowQueries',
    'investigateHighCpuUsage',
    'investigateLowMemory',
    'investigateHighConnectionCount',
    'tuneSettings'
  ];
}

export function getBuiltInPlaybooks(): Playbook[] {
  return [
    {
      name: 'generalMonitoring',
      description: 'General monitoring of the database, checking logs, slow queries, main metrics, etc.',
      content: GENERAL_MONITORING_PLAYBOOK,
      isBuiltIn: true
    },
    {
      name: 'investigateSlowQueries',
      description: 'Investigate slow queries using pg_stat_statements and EXPLAIN calls.',
      content: SLOW_QUERIES_PLAYBOOK,
      isBuiltIn: true
    },
    {
      name: 'investigateHighCpuUsage',
      description: 'Investigate high CPU usage. This playbook should be executed while the CPU usage is elevated.',
      content: INVESTIGATE_HIGH_CPU_USAGE_PLAYBOOK,
      isBuiltIn: true
    },
    {
      name: 'investigateLowMemory',
      description: 'Investigate low freeable memory. This playbook should be executed while the freeable memory is low.',
      content: INVESTIGATE_LOW_MEMORY_PLAYBOOK,
      isBuiltIn: true
    },
    {
      name: 'investigateHighConnectionCount',
      description:
        'Investigate high connection count. This playbook should be executed while the connection count is elevated.',
      content: INVESTIGATE_HIGH_CONNECTION_COUNT_PLAYBOOK,
      isBuiltIn: true
    },
    {
      name: 'tuneSettings',
      description: 'Tune configuration settings for the database, based on the instance type and the database schema.',
      content: TUNING_PLAYBOOK,
      isBuiltIn: true
    }
  ];
}

export function getPlaybookDetails(name: string): Playbook | undefined {
  return getBuiltInPlaybooks().find((playbook) => playbook.name === name);
}
```
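
By way of illustration, here is a minimal sketch of how a caller might wire these exports together. The import path './playbooks' is an assumption; everything else uses only the functions defined in the listing above:

```
// Hypothetical usage sketch; the module path './playbooks' is an assumption.
import { getPlaybook, listPlaybooks, getPlaybookDetails } from './playbooks';

// Enumerate the built-in playbooks and print their descriptions.
for (const name of listPlaybooks()) {
  const details = getPlaybookDetails(name);
  console.log(`${name}: ${details?.description ?? 'no description'}`);
}

// Fetch the prompt text for one playbook, e.g. to feed it to the agent.
const prompt = getPlaybook('investigateSlowQueries');
console.log(prompt);
```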

Although the Agent is primarily an open-source project that you host yourself, we are also developing a cloud version. The advantage of the cloud version is that certain integrations are easier to install.

Will domestic database ecosystem products follow suit? Looking forward to the latest news from Chinese database ecosystem products such as Ninedata, Apecloud, dbdoctor, and dsmart.
