[AI] SISC - 42 [FEAT] Add ETL collection of daily stock prices, financial statements, and macro data #127
The head ref may contain hidden characters: "SISC2-42-AI-\uAE40\uC131\uC77C-\uC790\uC0B0\uAC00\uCE58-\uBCC0\uD654-\uCF54\uB4DC-\uC791\uC131"
Conversation
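Overview
A new daily data-collection module is introduced that uses yfinance and PostgreSQL to collect and update price, financial-statement, and macroeconomic indicator data. The PR also refactors the asset reporting utility and updates the training pipeline configuration.
Changes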
Sequence diagram
sequenceDiagram
participant Client
participant run_all as run_all()
participant Price as run_price_pipeline()
participant Financials as run_financials_pipeline()
participant Macro as run_macro_pipeline()
participant YF as yfinance
participant DB as PostgreSQL
Client->>run_all: start
run_all->>run_all: build config (db_name, tickers, series_map)
run_all->>Price: run price pipeline
Price->>DB: query last date
Price->>YF: request OHLCV data
YF-->>Price: return time-series data
Price->>DB: UPSERT into price_data table
run_all->>Financials: run financials pipeline
Financials->>YF: request IS/BS/CF data
YF-->>Financials: return financial data
Financials->>DB: UPSERT into financials table
run_all->>Macro: run macro pipeline
Macro->>DB: query last date
Macro->>YF: request time-series data
YF-->>Macro: return data
Macro->>DB: UPSERT into macro_data table
run_all-->>Client: done
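The flow in the diagram can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `next_start_date`, the `default_start` argument, and the callable-based `run_all` signature are assumptions made for the example. Only the idea of "look up the last stored date, then fetch and upsert what is missing" comes from the diagram.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Optional


def next_start_date(last_date: Optional[date], default_start: date) -> date:
    """Return the first day that still needs to be fetched.

    Assumption: an empty table (last_date is None) means a full backfill
    from default_start; otherwise collection resumes the day after the
    last stored row.
    """
    if last_date is None:
        return default_start
    return last_date + timedelta(days=1)


def run_all(pipelines: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Run each pipeline in insertion order and collect its row count."""
    results: Dict[str, int] = {}
    for name, pipeline in pipelines.items():
        results[name] = pipeline()
    return results
```

In this sketch each pipeline is a zero-argument callable that returns the number of rows it upserted, so the price, financials, and macro steps could be swapped or tested in isolation.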
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Areas requiring additional review:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
AI/libs/utils/daily_inquest.py (1 hunks)
AI/libs/utils/save_reports_to_db.py (2 hunks)
AI/requirements.txt (1 hunks)
AI/transformer/training/train_transformer.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
AI/libs/utils/daily_inquest.py (1)
AI/libs/utils/get_db_conn.py (2)
get_db_conn (105-118)
get_engine (121-129)
AI/libs/utils/save_reports_to_db.py (1)
AI/libs/utils/get_db_conn.py (2)
get_db_conn (105-118)
get_engine (121-129)
🪛 Ruff (0.14.5)
AI/libs/utils/daily_inquest.py
33-33: Possible SQL injection vector through string-based query construction
(S608)
AI/libs/utils/save_reports_to_db.py
30-30: Do not catch blind exception: Exception
(BLE001)
34-39: Possible SQL injection vector through string-based query construction
(S608)
46-50: Possible SQL injection vector through string-based query construction
(S608)
    engine = get_engine(db_name)
    with engine.connect() as conn:
        res = conn.execute(text(f"SELECT MAX({date_col}) FROM {table};")).scalar()
    if res is None:
        return None
    # Assumes res is a date/datetime value
    return res
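Formatting table/column names directly into the query creates a SQL injection risk
get_last_date_in_table uses text(f"SELECT MAX({date_col}) FROM {table};"). If a config file or a caller passes a bad table or date_col value, arbitrary SQL can be executed. Before building the query, add validation that allows only letters, digits, and underscores, or wrap the identifiers safely using something like sqlalchemy.sql.Identifier.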
🧰 Tools
🪛 Ruff (0.14.5)
33-33: Possible SQL injection vector through string-based query construction
(S608)
🤖 Prompt for AI Agents
In AI/libs/utils/daily_inquest.py around lines 31 to 37, the function builds SQL
by interpolating table and column names directly which permits SQL injection;
validate both `table` and `date_col` to only contain ASCII letters, digits and
underscores (reject or raise on anything else) or alternatively use SQLAlchemy
identifier escaping (wrap names as safe Identifier/quoted_name via SQLAlchemy
utilities) and then construct the SELECT using SQLAlchemy text or expression
with those safe identifiers; ensure you import the required SQLAlchemy
identifier/quoting helpers or add a simple regex whitelist check before
composing the query and keep the rest of the logic identical.
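One way to apply the whitelist option is sketched below. It is an illustration under stated assumptions, not the project's implementation: `validate_identifier` and `build_max_date_query` are hypothetical helper names, and the sketch takes the regex route rather than any SQLAlchemy quoting helper.

```python
import re

# Assumption: only plain unquoted identifiers are allowed (ASCII letters,
# digits, underscore, not starting with a digit). Schema-qualified names
# such as "public.price_data" are rejected by this pattern.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return name unchanged if it is a safe SQL identifier, else raise."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_max_date_query(table: str, date_col: str) -> str:
    """Build the MAX(date) query from validated identifiers only."""
    safe_col = validate_identifier(date_col)
    safe_table = validate_identifier(table)
    return f"SELECT MAX({safe_col}) FROM {safe_table}"
```

The returned string could then be passed to `text(...)` as in the original function, with the rest of the logic unchanged.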
def _to_decimal(x) -> Decimal:
    if isinstance(x, Decimal):
        return x
    try:
        return Decimal(str(x))
    except Exception:
        return Decimal(0)
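Swallowing cash parsing failures as 0 distorts the balance
When parsing fails at line 27, Decimal(0) is returned. If the calculation continues in that state, a row whose value is actually empty or invalid is treated as 0 won. If the downstream trade-execution logic then works from 0 won, an incorrect asset balance is recorded immediately. On failure, raise an exception, or at least return None, so that the calling logic can stop safely.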
🧰 Tools
🪛 Ruff (0.14.5)
30-30: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In AI/libs/utils/save_reports_to_db.py around lines 25 to 31, _to_decimal
currently swallows parsing errors and returns Decimal(0), which masks
bad/missing cash values; change it to raise a clear exception (e.g., ValueError
with the original error message) instead of returning Decimal(0) so upstream
logic fails fast and does not treat bad input as zero; update the function
signature/type hints accordingly and ensure callers either handle or propagate
the exception (or explicitly validate for None if you prefer returning None
instead).
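A fail-fast variant might look like the sketch below. The name `to_decimal_strict` is hypothetical, and rejecting NaN/Infinity is an extra assumption that the review comment does not ask for; it is included because `Decimal("nan")` parses without error.

```python
from decimal import Decimal, InvalidOperation


def to_decimal_strict(x) -> Decimal:
    """Convert x to a finite Decimal or raise ValueError."""
    if isinstance(x, Decimal):
        value = x
    else:
        try:
            value = Decimal(str(x))
        except InvalidOperation as exc:
            # Keep the original error attached so the cause stays visible.
            raise ValueError(f"cannot parse cash value: {x!r}") from exc
    # Assumption: NaN and Infinity are also treated as invalid cash values.
    if not value.is_finite():
        raise ValueError(f"non-finite cash value: {x!r}")
    return value
```

Callers that previously relied on the silent `Decimal(0)` fallback would need to catch `ValueError` or let it propagate.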
    sql = text(f"""
        SELECT {ASSETS_CASH_COLUMN}
        FROM public.{ASSETS_TABLE}
        WHERE {ASSETS_ID_COLUMN} = :rid
        FOR UPDATE
    """)
    row = conn.execute(sql, {"rid": ASSETS_ROW_ID}).fetchone()
    if not row:
        return None
    return _to_decimal(row[0])


def build_insert_params(rows: Iterable[ReportRow], created_at: datetime) -> List[dict]:
    """
    Note:
    - Converts rows to SQLAlchemy's named-parameter (dict) form.
    """
def _update_cash(conn, new_cash: Decimal) -> None:
    sql = text(f"""
        UPDATE public.{ASSETS_TABLE}
        SET {ASSETS_CASH_COLUMN} = :cash
        WHERE {ASSETS_ID_COLUMN} = :rid
    """)
    conn.execute(sql, {"cash": str(new_cash), "rid": ASSETS_ROW_ID})
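Formatting environment variable values directly into the query creates a SQL injection vulnerability
Lines 35-51 insert ASSETS_TABLE, ASSETS_ID_COLUMN, and ASSETS_CASH_COLUMN with f-strings. If these identifiers are tainted through .env or configuration values, the entire FOR UPDATE/UPDATE statement can be manipulated. At a minimum, validate them against a whitelist that allows only letters, digits, and underscores, or wrap them safely with the sqlalchemy.sql.identifier family of helpers.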
🧰 Tools
🪛 Ruff (0.14.5)
34-39: Possible SQL injection vector through string-based query construction
(S608)
46-50: Possible SQL injection vector through string-based query construction
(S608)
🤖 Prompt for AI Agents
In AI/libs/utils/save_reports_to_db.py around lines 34 to 52, the SQL statements
interpolate ASSETS_TABLE, ASSETS_ID_COLUMN and ASSETS_CASH_COLUMN directly into
f-strings which creates an SQL injection risk if those env/config values are
tampered with; replace the direct f-string interpolation with a safe approach:
validate each identifier against a strict whitelist/regex allowing only letters,
digits and underscores (and error/raise if validation fails) or use SQLAlchemy's
identifier-quoting utilities (e.g. sqlalchemy.sql.schema.Table/Column or
quoted_name/identifier functions) to construct the SELECT ... FOR UPDATE and
UPDATE statements safely, while still binding values (rid, cash) as parameters.
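For identifiers that come from the environment, validation can happen once at load time so that a tainted value fails before any statement is built. The sketch below assumes the constants are read from environment variables of the same names; `load_identifier`, the variable names, and the default values are all hypothetical.

```python
import os
import re

_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def load_identifier(env_var: str, default: str) -> str:
    """Read an identifier from the environment and reject anything unsafe."""
    value = os.environ.get(env_var, default)
    if not _IDENTIFIER_RE.fullmatch(value):
        raise ValueError(f"{env_var} is not a safe SQL identifier: {value!r}")
    return value


# Hypothetical variable names and defaults, for illustration only.
ASSETS_TABLE = load_identifier("ASSETS_TABLE", "assets")
ASSETS_ID_COLUMN = load_identifier("ASSETS_ID_COLUMN", "id")
ASSETS_CASH_COLUMN = load_identifier("ASSETS_CASH_COLUMN", "cash")

# Identifiers are validated once above; values (:rid, :cash) remain
# bound parameters and are never formatted into the string.
SELECT_CASH_SQL = (
    f"SELECT {ASSETS_CASH_COLUMN} FROM public.{ASSETS_TABLE} "
    f"WHERE {ASSETS_ID_COLUMN} = :rid FOR UPDATE"
)
UPDATE_CASH_SQL = (
    f"UPDATE public.{ASSETS_TABLE} SET {ASSETS_CASH_COLUMN} = :cash "
    f"WHERE {ASSETS_ID_COLUMN} = :rid"
)
```

With this layout a misconfigured `.env` raises at import time instead of producing a manipulated statement at run time.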
pathlib
aa
No newline at end of file
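Unnecessary and risky dependencies should be removed from this PR
pathlib is part of the Python standard library, so adding it to requirements installs the PyPI backport, which can conflict with the built-in module. In addition, no code in this PR uses the aa package; pinning a third-party package with no clear purpose only increases the supply-chain and build-failure risk. Remove both entries or provide a justification for their actual use.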
-pathlib
-aa
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pathlib
aa
🤖 Prompt for AI Agents
AI/requirements.txt around lines 14-15: remove the two entries "pathlib" and
"aa" from requirements.txt because pathlib is part of the Python stdlib
(installing a PyPI backport can conflict) and "aa" is unused and increases
supply-chain risk; if either is actually required, replace the removal by adding
a brief code-reference comment and pin a specific safe version (e.g., aa==x.y.z)
and update any import/usage locations to justify keeping it.
The code needs to be rewritten, so I'll write it separately and add it.
Add ETL collection of daily stock prices, financial statements, and macro data from Yahoo
Summary by CodeRabbit
New Features
Refactor
Chores