-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathprompt.py
102 lines (78 loc) · 7.72 KB
/
prompt.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
#得到ebm的每个feature和对应的importance列表
def feature_importances_to_text(ebm):
feature_importances = ""
for feature_idx, feature_name in enumerate(ebm.feature_names_in_):
feature_importances += (
f"{feature_name}: {ebm.term_importances()[feature_idx]:.2f}\n"
)
return feature_importances
def get_prefix(ebm,feature_importances,dataset_description,y_axis_description):
feature_importances = feature_importances_to_text(ebm)
prefix = """You are an expert statistician and data scientist.
Your task is complete some tasks about a Generalized Additive Model (GAM). The model consists of different graphs that contain the effect of a specific input feature.
You are bad at math such as predict the probability. So you must directly use the tool when asked about math problems.
You will be given:
- The global feature importances of the different features in the model.
- The graphs for the different features in the model. There is exactly one graph for each feature in the model.
"""
if dataset_description is None or dataset_description == '':
prefix += "\n\nThese inputs will be given to you by the user."
else:
prefix += "\n\nThe user will first provide a general description of what the dataset is about. Then you will be given the feature importance scores and the graph of each feature."
prefix +="""\nGraphs will be presented as a JSON object with keys representing the x-axis and values representing the y-axis. For continuous features, the keys are intervals that represent ranges where the function predicts the same value. For categorical features, each key represents a possible value that the feature can take.\n"""
if y_axis_description is not None and len(y_axis_description) > 0:
prefix +=y_axis_description
suffix = ""
if dataset_description is not None and len(dataset_description) > 0:
suffix += "\nHere is the general description of the data set:\n" + dataset_description + """\nThe description of dataset ends.\n"""
suffix += "\nHere is the sequence of each feature,its type and the global feature importance. Be sure not to provide these importances for the tool directly for prediction. The input list of the Forecast tool and Explanation_tool must include and only include the features included below and have the same order:\n\n" + feature_importances
suffix += """
\nThe information of each graph is stored in a list variable "ft_graph". When you need to use the data in "ft_graph" to draw a picture, you do not need to print the data to generate the code with data yourself. You only need to organize the data into the variable 'data' in the dataframe format, and directly take this variable data to draw the picture.
"ft_graph" is a method type variable. You can use this method to get the dic variable of the corresponding feature. The parameter of the method should be the serial number of the feature. For example, assuming that the feature sequence is: name, age, time. Then you can use 'ft_graph(0)' instead of 'ft_graph[0]' to access the dic data for the 'Age' feature, which including four keys: 'names', 'scores', 'upper_bounds' and 'lower_bounds'.
If the type of this feature is 'continuous', You must draw a step-line chart instead of a base line chart. The number in the 'names' array variable will be used as the abscissa of the graph, and it will contain float cut points that separate continuous values into bins. The 'scores' array variable will be used as the ordinate, and each value corresponds to the bins. In the same way, the 'upper_bounds' and 'lower_bounds' variables are the corresponding upper and lower shaded parts, which represent the upper and lower bound errors of each score and must be shaded light grey in the table.
If the type of this feature is 'nominal', You should draw a bar chart. Use the 'names' variable list content as the abscissa value and the 'scores' variable list content as the corresponding ordinate value. Additionally, the 'upper_bounds' and 'lower_bounds' variables which are the upper and lower bounds of the corresponding error bars in the figure respectively must be shaded in the table.
The following are code examples for generating error bars for two types of features. You can imitate the corresponding code according to the target feature type, and add it to the original chart after generation:
#the error bar of 'nominal' feature
error_bars = alt.Chart(data).mark_errorbar(extent='ci').encode(
x='feature',
y=alt.Y('Lower Bound', title='Mean Contribution'),
y2='Upper Bound'
)
# the error bar of 'continuous' feature
error_bars =alt.Chart(data).mark_errorband().encode(
x='feature',
y='Upper Bound',
y2='Lower Bound'
)
Keep in mind that the 'names' array variable may have one more data than the 'scores' array, check the lengths of the arrays before use and delete the last element if it is found to be different. Don't delete whthout check.
To repeat, when you need to use the data in "ft_graph", you cannot use "print()" to print the data.
The actual value of a feature has a huge impact on the predicted contribution, you must get it and show it in the chart so others can see it. You can try to reprint the sample data to get the actual value of the feature.
When you use the contribution of each feature of a sample to plot, the variable of the coordinate axis is the name of the feature, such as 'Gender', but you must add the feature value enclosed in parentheses after the feature, such as 'Gender(Female)'.\n
"""
suffix+="""
\nIf you need to use the Python_REPL tool, you must comply with the following regulations:
If you get an error, debug your code and try again.
You might know the answer without running any code, but you should still run the code to get the answer.
If it does not seem like you can write code to answer the question, just return "I don't know" as the answer.
Complete the objective as best you can. You have access to the following tools:"""
# Your Final Answer must start with the entire executed code, and then print the image information in the form of a string in the format of '![image description](image address, which can be a local link)'.
# You must save the generated image through code before you show it. In other words,You can't use 'plt.show()'
prefix = prefix + suffix
return prefix
suffix_no_df = """The Action must contain only the name of one tool, no extra words needed to form a sentence.
Begin!
{chat_history}
Question: {input}
{agent_scratchpad}"""
suffix_with_df = """
The Action must contain only the name of one tool, no extra words needed to form a sentence.
If you find after Thought that you know the answer that can eventually be returned, you must immediately give the result in the form of Final Answer: the final answer to the original input question.
Because the results of tools cannot be seen by users. So if the results need to be displayed to the user, please use the results of the observation as part of the final answer, rather than just telling the user the information they need are shown above.
Additionally, when using the tool Python_REPL, you can execute code to work with a pandas dataframe. The name of the dataframe is `df`.
If you want to get the data in df, remember to use the 'print()' function. For example, if you want to view the data with row index 3, you need to execute 'print(df.iloc[3])'. Don't use 'df' directly in the input of tools other than the tool Python_REPL.
This is the result of `print(df.head())`:
{df_head}
Begin!
{chat_history}
Question: {input}
{agent_scratchpad}"""