Personal/xi/nl2sql refactor+eval #351
base: feature
Conversation
# Remove the default log handler
logger.remove()
# Add a new log handler with a minimum level of INFO that writes to the given file
log_file_path = "/Users/chuyu/Documents/rag_doc/text2sql_evaluation/spider/spider_eval.log"  # log file path
Don't use a local path; use something like /tmp instead.
Updated: log_file_path = "/tmp/log/spider/spider_eval.log"
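As an illustration of the suggestion above, the log path could be derived from the system temp directory instead of being hard-coded. This is only a sketch: `build_log_path` is a hypothetical helper that does not appear in the PR, and the loguru calls are shown as comments because the PR's actual handler setup is not visible here.

```python
import tempfile
from pathlib import Path
from typing import Optional


def build_log_path(dataset: str, root: Optional[str] = None) -> Path:
    """Return <root>/log/<dataset>/<dataset>_eval.log and create its parent directory.

    `root` defaults to the system temp directory (an assumption for this sketch).
    """
    base = Path(root) if root is not None else Path(tempfile.gettempdir())
    log_path = base / "log" / dataset / f"{dataset}_eval.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    return log_path


log_file_path = build_log_path("spider")
# With loguru, the file sink could then be registered roughly like this:
#   logger.remove()
#   logger.add(log_file_path, level="INFO")
```

Creating the parent directory up front avoids depending on the logging library to create missing directories.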
else:
    embed_model_bge = None

llm = DashScope(
I'd suggest not using DashScope and DashScopeEmbedding directly here. Prefer PaiEmbedding and PaiLLM where possible; that keeps the code easier to maintain.
Updated.
# Remove the default log handler
logger.remove()
# Add a new log handler with a minimum level of INFO that writes to the given file
log_file_path = "/Users/chuyu/Documents/rag_doc/text2sql_evaluation/spider/bird_eval.log"  # log file path
This path needs to be fixed.
Updated: log_file_path = "/tmp/log/bird/bird_eval.log"
@@ -327,7 +327,7 @@ def _add_nodes_to_index(
         seen_node_ids = set(self.index_struct.nodes_dict.values())
         for node in nodes:
             if not self.vector_store.stores_text and node.node_id in seen_node_ids:
-                logger.info(
+                logger.debug(
Why was this log changed to debug?
During batch evaluation, especially on BIRD, this produced far too much log output.
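The effect of demoting a per-node message to debug can be sketched with the standard library `logging` module. This is an illustration only: the logger name, messages, and node IDs below are made up, and the PR's own logger setup may differ.

```python
import io
import logging

# Capture output in memory so the effect of the level threshold is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

logger = logging.getLogger("eval_sketch")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the root logger out of this sketch

for node_id in ["a", "b", "c"]:
    # Per-node messages are below the INFO threshold, so they are dropped.
    logger.debug("Skipping duplicate node %s", node_id)

# A single summary line still gets through.
logger.info("Indexed %d nodes", 3)

captured = stream.getvalue()
```

With the threshold at INFO, only the summary line reaches the handler, which is what keeps batch runs over many nodes from flooding the log.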
)


DEFAULT_SQL_REVISION_PROMPT = PromptTemplate(
Why does this prompt mix Chinese and English? Wouldn't it be better to write it all in English going forward?
Now unified to English.
tables = [item["table_name"] for item in db_description_dict["table_info"]]
for table_name in tables:
    all_table_descriptions.append(
        _get_table_desc(table_name, db_description_dict["table_info"])
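Is this written correctly? There is no need to collect all the table names first and then look up the matching info for each one; iterating over all_table_descriptions directly is simpler.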
Updated to iterate directly.
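A direct iteration of the kind discussed above might look like the following sketch. It assumes each entry of `table_info` is a dict carrying everything needed for its description. `describe_table` is a hypothetical stand-in for the PR's `_get_table_desc`, and the `column_info` key is likewise an assumption, not something shown in the diff.

```python
from typing import Any, Dict, List


def describe_table(table: Dict[str, Any]) -> str:
    """Hypothetical stand-in for _get_table_desc: format one table entry."""
    columns = ", ".join(col["name"] for col in table.get("column_info", []))
    return f"Table {table['table_name']}: {columns}"


def collect_table_descriptions(db_description_dict: Dict[str, Any]) -> List[str]:
    # Walk the table entries once; no separate pass to gather names and
    # no per-name lookup back into the same list.
    return [describe_table(table) for table in db_description_dict["table_info"]]


example = {
    "table_info": [
        {"table_name": "singer", "column_info": [{"name": "id"}, {"name": "name"}]},
        {"table_name": "concert", "column_info": [{"name": "id"}]},
    ]
}
descriptions = collect_table_descriptions(example)
```

Besides being shorter, this avoids rescanning `table_info` once per table name.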
)
from pai_rag.integrations.data_analysis.data_analysis_config import SqlAnalysisConfig

cls_cache = {}
The module resolve mechanism should all live in one dedicated place; it must not be duplicated in every file.
Updated.
)


class BirdEvaluator(SQLEvaluator):
This file could go in the evaluations directory. Files shouldn't get too long, so bird and spider can be written separately.
Moved and split.
    assert False, "Error col: {}".format(tok)


def parse_col_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
Many of the files here are hard to tell apart. What is the relationship between parse and process_sql?
Split into eval_spider and eval_bird.
from itertools import chain


threadLock = threading.Lock()
What is this lock for?
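This evaluation code was copied directly from the Spider repo. I checked, and the lock does indeed appear to be unused; it may well be leftover code. Let's keep it consistent with the repo for now.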
717731c to 837d634
Refactor the code logic
Add batch evaluation on public datasets
Fix several bugs