Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Personal/xi/nl2sql refactor+eval #351

Open
wants to merge 16 commits into
base: feature
Choose a base branch
from
Open

Conversation

aero-xi
Copy link
Collaborator

@aero-xi aero-xi commented Jan 23, 2025

重构代码逻辑
增加公开数据集上的批量评测功能
修复若干bugs

@aero-xi aero-xi requested a review from moria97 January 23, 2025 10:37
# 移除默认的日志处理器
logger.remove()
# 添加一个新的日志处理器,指定最低日志级别为 INFO,并输出到指定文件
log_file_path = "/Users/chuyu/Documents/rag_doc/text2sql_evaluation/spider/spider_eval.log" # 指定日志文件路径
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不要用本地路径,可以用/tmp之类的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已更新,log_file_path = "/tmp/log/spider/spider_eval.log"

else:
embed_model_bge = None

llm = DashScope(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议这里不要直接用DashScope和DashScopeEmbeddin,尽量用PaiEMbedding和PaiLLM,这样代码好维护

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已更新

# 移除默认的日志处理器
logger.remove()
# 添加一个新的日志处理器,指定最低日志级别为 INFO,并输出到指定文件
log_file_path = "/Users/chuyu/Documents/rag_doc/text2sql_evaluation/spider/bird_eval.log" # 指定日志文件路径
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

路径需要修复一下

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已更新,log_file_path = "/tmp/log/bird/bird_eval.log"

@@ -327,7 +327,7 @@ def _add_nodes_to_index(
seen_node_ids = set(self.index_struct.nodes_dict.values())
for node in nodes:
if not self.vector_store.stores_text and node.node_id in seen_node_ids:
logger.info(
logger.debug(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这条日志为啥改成了debug

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

批量测评,尤其在bird上,log显示太多了

)


DEFAULT_SQL_REVISION_PROMPT = PromptTemplate(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里prompt为啥中英文混杂。。。是不是后面统一写成英文好些?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已统一成英文

tables = [item["table_name"] for item in db_description_dict["table_info"]]
for table_name in tables:
all_table_descriptions.append(
_get_table_desc(table_name, db_description_dict["table_info"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是不是写的有问题,不需要先得到所有的table名字,再找到对应的info,直接便利all_table_descriptions更简单

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已更新,直接遍历

)
from pai_rag.integrations.data_analysis.data_analysis_config import SqlAnalysisConfig

cls_cache = {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

模块resolve机制全部写在单独的位置,不能每个文件都有。

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已更新

)


class BirdEvaluator(SQLEvaluator):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个文件可以放在evaluations目录,文件不要太长,可以把brid和spider分开写

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已重新放置并拆分

assert False, "Error col: {}".format(tok)


def parse_col_unit(toks, start_idx, tables_with_alias, schema, default_tables=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的很多文件感觉分不清,parse和process_sql的关系是?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已拆分为eval_spider和eval_bird

from itertools import chain


threadLock = threading.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个锁是干啥的

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里直接搬的spider repo上的评测代码,看了下好像确实没有用到,不排除是遗留代码,暂且先和repo中的保持一致吧

@aero-xi aero-xi force-pushed the personal/xi/nl2sql_op2 branch from 717731c to 837d634 Compare January 26, 2025 06:02
Copy link

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
9393 4692 50% 40% 🟢

New Files

File Coverage Status
src/pai_rag/integrations/data_analysis/text2sql/db_connector.py 38% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_info_collector.py 14% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_info_index.py 51% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_info_node.py 29% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_info_retriever.py 35% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_info_selector.py 48% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_loader.py 20% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_query.py 25% 🟢
src/pai_rag/integrations/data_analysis/text2sql/db_retriever_filter.py 25% 🟢
src/pai_rag/integrations/data_analysis/text2sql/query_processor.py 63% 🟢
src/pai_rag/integrations/data_analysis/text2sql/sql_generator.py 23% 🟢
src/pai_rag/integrations/data_analysis/text2sql/utils/constants.py 100% 🟢
src/pai_rag/integrations/data_analysis/text2sql/utils/info_utils.py 14% 🟢
src/pai_rag/integrations/data_analysis/text2sql/utils/prompts.py 100% 🟢
src/pai_rag/integrations/data_analysis/text2sql/utils/sql_utils.py 38% 🟢
TOTAL 42% 🟢

Modified Files

File Coverage Status
src/pai_rag/core/rag_module.py 83% 🟢
src/pai_rag/integrations/data_analysis/data_analysis_tool.py 45% 🟢
src/pai_rag/integrations/index/pai/multimodal/multimodal_index.py 53% 🟢
src/pai_rag/integrations/vector_stores/faiss/my_faiss.py 87% 🟢
TOTAL 67% 🟢

updated for commit: 171fedc by action🐍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants