Commit 1f72d05

docs: add readme and evaluation metrics for chart recommendation dataset

1 parent: 96b7ba3

13 files changed: +193 −58 lines

README.md (+3)

@@ -133,6 +133,9 @@ $ pnpm dev
 $ pnpm build
 ```
 
+## 🤖 Chart Recommendation Dataset
+The chart recommendation dataset is designed to evaluate or fine-tune large language models on their ability to recommend chart types based on given data. The dataset currently encompasses 16 types of charts, with 1-3 different data scenarios per chart type, and more than 15 chart data instances for each scenario. The dataset is continuously updated, and we welcome contributions of chart data collected from your own use cases. For more detailed information about the dataset, please visit [evaluations/recommend](https://github.com/antvis/GPT-Vis/tree/main/evaluations/recommend/README.md).
+
 ## License
 
 [MIT](./LICENSE)

README.zh-CN.md (+3)

@@ -120,6 +120,9 @@ set_gpt_vis(content)
 
 Learn more 👉 [streamlit-gpt-vis](https://github.com/antvis/GPT-Vis/bindings/streamlit-gpt-vis)
 
+## 🤖 Chart Recommendation Dataset
+The chart recommendation dataset is used to evaluate or fine-tune large language models on the "given data, recommend a chart type" task. The dataset currently covers 16 chart types, with 1-3 different data scenarios per chart type and 15+ chart data entries per scenario. The data is continuously updated, and contributions of chart data collected from your own use cases are welcome. For details on the dataset, see [evaluations/recommend](https://github.com/antvis/GPT-Vis/tree/main/evaluations/recommend/README.md).
+
 ## 💻 Local Development
 
 ```bash

evaluations/datastes/recommend/README.en.md (+3 −3)

@@ -25,17 +25,17 @@ In each data entry, source represents the user input. source.data contains the o
       "y": ["Population"] // Field for y-axis
     }
   }
-]
+]s
 }
 ```
 
 ### Model Fine-Tuning Dataset
 The gpt_vis_train.jsonl file is a fine-tuning training dataset generated from the above original chart data. The generation strategy is as follows: randomly select half of the cases for each chart type (the remaining data is used for evaluation). Since the number of original data entries varies for each chart type, to avoid imbalanced chart quantities affecting recommendation results, some chart data entries are repeated a certain number of times to ensure there are 60 entries for each chart type in the training set.
 
 ### Evaluation Result File
-The evalResult.json file contains the results of our model evaluation after fine-tuning. In this file, every source entry is the original input, target is the expected output, and generation is the model's output. Comparing these entries allows the evaluation of recommendation accuracy.
+The `metrics.json` file contains the results of our model evaluation after fine-tuning. In this file, every source entry is the original input, target is the expected output, and generation is the model's output. Comparing these entries allows the evaluation of recommendation accuracy.
 
 ## Model's Performance on Chart Recommendation Task
-Using the above datasets, we achieved a chart type accuracy of 85% and an encode accuracy of 70% with fine-tuning based on the `qwen2.5-14b-instruct`.
+Using the above datasets, we achieved a chart type accuracy of 89% and an encode accuracy of 82% with fine-tuning based on the `qwen2.5-14b-instruct`.
 
 It is important to note that the model recommendations can satisfy the requirement of "providing data and returning chart and configuration" in most scenarios. However, the model's output is not entirely controlled, which may result in invalid output or charts that cannot be successfully rendered. We recommend combining these with the recommendation modules in [@antv/ava](https://ava.antv.antgroup.com/api/advice/advisor). In scenarios where the model performance is suboptimal or where traditional rules fulfill the recommendation requirements, rule-based recommendation pipelines can be used as a fallback.
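The train/eval split and oversampling strategy described in the fine-tuning section above can be sketched as follows. This is a hypothetical illustration, not the repository's actual generation script; the names `oversampleTo` and `splitAndBalance` are invented for the example, and the shuffle step a real pipeline would use before sampling is omitted for brevity:

```javascript
// Hypothetical sketch of the training-set generation strategy: take half of
// each chart type's cases for training (the rest is held out for evaluation),
// then cycle through the training cases until each type has exactly
// `targetCount` entries, so chart types are balanced in the training set.
const oversampleTo = (cases, targetCount) => {
  const out = [];
  for (let i = 0; i < targetCount; i++) {
    out.push(cases[i % cases.length]); // repeat originals round-robin
  }
  return out;
};

const splitAndBalance = (casesByType, targetCount = 60) => {
  const train = {};
  const evalSet = {};
  for (const [type, cases] of Object.entries(casesByType)) {
    const half = Math.ceil(cases.length / 2);
    train[type] = oversampleTo(cases.slice(0, half), targetCount);
    evalSet[type] = cases.slice(half);
  }
  return { train, evalSet };
};

// e.g. 30 original line-chart cases -> 15 sampled for training, repeated to 60
const demo = splitAndBalance({ line: Array.from({ length: 30 }, (_, i) => ({ id: i })) });
console.log(demo.train.line.length); // 60
console.log(demo.evalSet.line.length); // 15
```

Repeating entries rather than truncating keeps every sampled case in the training set while still equalizing per-type counts.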

evaluations/datastes/recommend/README.md (+2 −2)

@@ -34,9 +34,9 @@
 The `gpt_vis_train.jsonl` file is the fine-tuning training dataset we generated from the original chart data above. The generation strategy is as follows: randomly sample half of the cases for each chart type (the rest is used for evaluation); since the number of original entries differs per chart type, some chart data is repeated a number of times so that each chart type has 60 entries in the training set, avoiding imbalance that would skew the recommendation results.
 
 ### Evaluation Result File
-The `evalResult.json` file contains the results of evaluating the model after fine-tuning. In each entry, `source` is the original input, `target` is the expected output, and `generation` is the model output; comparing them allows the recommendation accuracy to be evaluated.
+The `metrics.json` file contains the results of evaluating the model after fine-tuning. In each entry, `source` is the original input, `target` is the expected output, `generation` is the model output, `correctness` indicates whether the chart type was recommended correctly, and `encodeScore` is the score for the recommended chart configuration. See `eval/eval-recommend.js` for how the metrics are computed.
 
 ## Notes on the Model's Chart Recommendation Performance
-Using the datasets above, fine-tuning based on `qwen2.5-14b-instruct` achieves 85% chart type accuracy and 70% `encode` accuracy.
+Using the datasets above, fine-tuning based on `qwen2.5-14b-instruct` achieves 89% chart type accuracy and 82% `encode` accuracy.
 
 Note that the model's recommendations satisfy the "given data, return chart and configuration" requirement in most scenarios, but the model output is not fully controllable: results may be invalid, or the produced charts may fail to render. We recommend combining this with the recommendation modules in [@antv/ava](https://ava.antv.antgroup.com/api/advice/advisor); in scenarios where the model underperforms or traditional rules already meet the recommendation need, the rule-based recommendation pipeline can serve as a fallback.

evaluations/datastes/recommend/evalResult.json → evaluations/datastes/recommend/eval.json (renamed, +41 −32)

@@ -646,7 +646,7 @@
   },
   "target": [
     {
-      "type": "mutiple",
+      "type": "multiple",
       "encode": {
         "x": [
           "有效期至"

@@ -3046,38 +3046,38 @@
       "Year": 1916,
       "Deaths": 300
     }
-    ],
-    "target": [
-      {
-        "type": "heatmap",
-        "encode": {
-          "x": [
-            "Entity"
-          ],
-          "y": [
-            "Year"
-          ],
-          "size": [
-            "Deaths"
-          ]
-        }
-      },
-      {
-        "type": "scatter",
-        "encode": {
-          "x": [
-            "Entity"
-          ],
-          "y": [
-            "Year"
-          ],
-          "size": [
-            "Deaths"
-          ]
-        }
-      }
     ]
   },
+  "target": [
+    {
+      "type": "heatmap",
+      "encode": {
+        "x": [
+          "Entity"
+        ],
+        "y": [
+          "Year"
+        ],
+        "size": [
+          "Deaths"
+        ]
+      }
+    },
+    {
+      "type": "scatter",
+      "encode": {
+        "x": [
+          "Entity"
+        ],
+        "y": [
+          "Year"
+        ],
+        "size": [
+          "Deaths"
+        ]
+      }
+    }
+  ],
   "generation": [
     {
       "type": "line",

@@ -9912,6 +9912,15 @@
     }
   ]
 },
+"target": [
+  {
+    "type": "scatter",
+    "encode": {
+      "x": ["排队时间"],
+      "y": ["满意度"]
+    }
+  }
+],
 "generation": [
   {
     "type": "scatter",

@@ -10220,4 +10229,4 @@
     }
   ]
 }
-]
+]

evaluations/datastes/recommend/heatmap/01_two_dim_one_measure.json (+18 −18)

@@ -126,26 +126,26 @@
       "Year": 1916,
       "Deaths": 300
     }
-    ],
-    "target": [
-      {
-        "type": "heatmap",
-        "encode": {
-          "x": ["Entity"],
-          "y": ["Year"],
-          "size": ["Deaths"]
-        }
-      },
-      {
-        "type": "scatter",
-        "encode": {
-          "x": ["Entity"],
-          "y": ["Year"],
-          "size": ["Deaths"]
-        }
-      }
     ]
   },
+  "target": [
+    {
+      "type": "heatmap",
+      "encode": {
+        "x": ["Entity"],
+        "y": ["Year"],
+        "size": ["Deaths"]
+      }
+    },
+    {
+      "type": "scatter",
+      "encode": {
+        "x": ["Entity"],
+        "y": ["Year"],
+        "size": ["Deaths"]
+      }
+    }
+  ]
 },
 {
   "source": {

evaluations/datastes/recommend/metrics.json (+1)

Large diffs are not rendered by default.

evaluations/datastes/recommend/multiple-axes/01_base.json (+1 −1)

@@ -375,7 +375,7 @@
   },
   "target": [
     {
-      "type": "mutiple",
+      "type": "multiple",
      "encode": {
        "x": [
          "有效期至"

evaluations/datastes/recommend/scatter/01_two_measure_correlate.json (+1 −1)

@@ -18,7 +18,7 @@
     {"排队时间":15, "满意度": 1}
   ]
 },
-"targe": [
+"target": [
   {
     "type": "scatter",
     "encode": {

evaluations/package.json (+1)

@@ -4,6 +4,7 @@
   "scripts": {
     "eval:data": "node ./scripts/eval/eval-data.js",
     "eval:metrics": "node ./scripts/eval/eval-metrics.js",
+    "eval:chart-recommend": "node ./scripts/eval/eval-recommend.js",
     "prompt": "node ./scripts/prompt/generate-prompts.js"
   },
   "devDependencies": {
evaluations/scripts/eval/eval-recommend.js (new file, +70)

import { evaluateChartEncodes } from '../helpers/evaluate-chart-encode.js';
import { readDataset, writeDataset } from '../helpers/read-dataset.js';
import _ from 'lodash';

export const evaluateChartRecommend = async () => {
  const evalDatasetPath = `datastes/recommend/eval.json`;
  const testDataset = await readDataset(evalDatasetPath);
  console.log('datasets count: ', testDataset.length);
  console.log('Beginning eval datasets...');
  const misMap = new Map();
  const scoredData = testDataset
    .map((data) => {
      const target = data.target?.[0] ?? data.target;
      const gen = data.generation?.[0] ?? data.generation;
      if (!gen || !target) {
        // Missing data entry
        misMap.set('missing', (misMap.get('missing') ?? 0) + 1);
        return;
      }
      const chartTypeScore = gen.type === target.type ? 1 : 0;
      const encodeScore = evaluateChartEncodes(gen.encode, target.encode);
      if (!chartTypeScore) {
        // Track which chart types get misclassified as what
        const key = `${target.type}_to_${gen.type}`;
        misMap.set(key, (misMap.get(key) ?? 0) + 1);
      }
      return {
        ...data,
        correctness: chartTypeScore,
        encodeScore,
      };
    })
    .filter((data) => data);
  // Save the per-entry evaluation result
  await writeDataset(`datastes/recommend/metrics.json`, scoredData);
  // Output aggregate metrics
  let chartTypeScore = 0;
  let encodeScore = 0;
  let chartTypeScoreMap = {};
  scoredData.forEach((data) => {
    chartTypeScore += data.correctness;
    encodeScore += data.encodeScore;
    const target = data.target?.[0] ?? data.target;
    const chartType = target?.type;
    chartTypeScoreMap[chartType] = {
      chartTypeScore: (chartTypeScoreMap[chartType]?.chartTypeScore ?? 0) + data.correctness,
      encodeScore: (chartTypeScoreMap[chartType]?.encodeScore ?? 0) + data.encodeScore,
      count: (chartTypeScoreMap[chartType]?.count ?? 0) + 1,
    };
  });
  chartTypeScore /= testDataset.length;
  encodeScore /= testDataset.length;
  chartTypeScoreMap = _.mapValues(chartTypeScoreMap, (score) => ({
    chartTypeScore: score.chartTypeScore / score.count,
    encodeScore: score.encodeScore / score.count,
    count: score.count,
  }));
  console.log('scoredData.length', scoredData.length, 'datasets count: ', testDataset.length);
  console.log('chart type recommend accuracy:', chartTypeScore);
  console.log('chart encode score:', encodeScore);
  console.log('misclassified:', misMap);
  console.log('chartTypeScoreMap', chartTypeScoreMap);
};

evaluateChartRecommend().catch((error) => {
  console.error('Error evaluating chart recommendation:', error);
});
evaluations/scripts/helpers/evaluate-chart-encode.js (new file, +49)

import _ from 'lodash';

const jaccardSimilarity = (a, b) => {
  const base = _.union(a, b).length;
  if (base === 0) return 0;
  return _.intersection(a, b).length / base;
};

// Drop empty strings and empty arrays from an encode object
const removeEmpty = (v) => {
  const res = {};
  _.each(v, (value, key) => {
    if (typeof value === 'string') {
      if (value) {
        res[key] = value;
      }
    } else if (Array.isArray(value)) {
      if (value.length) {
        res[key] = value;
      }
    }
  });
  return res;
};

/** Score the similarity between the recommended chart encode and the expected chart encode */
export const evaluateChartEncodes = (gen, ref) => {
  try {
    const v = [];
    const a = removeEmpty(gen);
    const b = removeEmpty(ref);
    const allKeys = _.union(Object.keys(a), Object.keys(b));
    _.each(allKeys, (key) => {
      const ao = a[key];
      const bo = b[key];
      if (!ao && !bo) {
        return;
      }
      if (Array.isArray(ao) && Array.isArray(bo)) {
        v.push(jaccardSimilarity(ao, bo));
      } else {
        v.push(ao === bo ? 1 : 0);
      }
    });
    // With no comparable keys, fall back to strict equality (1 or 0)
    return v.length ? _.sum(v) / v.length : gen === ref ? 1 : 0;
  } catch (e) {
    console.log('error', e);
    return 0;
  }
};
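As a worked example of the encode scoring above, the snippet below is a standalone sketch that mirrors `evaluateChartEncodes` without lodash (it is not the module itself, and it omits the empty-value filtering for brevity): each encode channel is compared with Jaccard similarity, and the per-channel scores are averaged over the union of channel keys.

```javascript
// Minimal standalone mirror of the encode scoring: per-key Jaccard
// similarity for array channels, averaged over the union of keys.
const jaccard = (a, b) => {
  const union = new Set([...a, ...b]);
  if (union.size === 0) return 0;
  const inter = new Set(a.filter((v) => b.includes(v)));
  return inter.size / union.size;
};

const encodeScore = (gen, ref) => {
  const keys = new Set([...Object.keys(gen), ...Object.keys(ref)]);
  const scores = [...keys].map((k) =>
    Array.isArray(gen[k]) && Array.isArray(ref[k]) ? jaccard(gen[k], ref[k]) : 0
  );
  return scores.length ? scores.reduce((s, x) => s + x, 0) / scores.length : 0;
};

// Generation matches on x and y but uses `size` where `color` was expected:
const gen = { x: ['Entity'], y: ['Year'], size: ['Deaths'] };
const ref = { x: ['Entity'], y: ['Year'], color: ['Deaths'] };
console.log(encodeScore(gen, ref)); // 0.5  (x -> 1, y -> 1, size -> 0, color -> 0)
```

Averaging over the union of keys penalizes both missing and spurious channels, so a generation that adds an extra channel scores lower even when every expected channel matches.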

evaluations/scripts/helpers/read-dataset.js (−1)

@@ -6,7 +6,6 @@ export const readDataset = async (filePath) => {
   const absolutefilePath = resolve(__dirProject, filePath);
 
   const content = await readFile(absolutefilePath, 'utf-8');
-
   const data = JSON.parse(content);
 
   return data;

0 commit comments