[AI] SISC - 42 [FEAT] Add ETL ingestion of daily prices, financial statements, and macro data #127

KIMSE0NG1L · 2025-11-20T09:17:48Z

Adds ETL ingestion of daily prices, financial statements, and macro data from Yahoo.

Summary by CodeRabbit

New Features
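- Added automatic collection and storage of daily price data, financial information, and macroeconomic indicators
- Strengthened asset management in the report-saving process
Refactor
- Changed the save path for model training weight files to a dynamic, date-based path
Chores
- Updated project dependencies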


coderabbitai · 2025-11-20T09:17:58Z

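Overview

Introduces a new daily data ingestion module that uses yfinance and PostgreSQL to collect and update price, financial statement, and macroeconomic indicator data. It also refactors the asset reporting utility and updates the training pipeline configuration.

Changes

Group / File	Change summary
New daily data ingestion module `AI/data_ingestion/daily_ingest.py`	Adds a timezone utility (today_kst), a DB helper (get_last_date_in_table), fetch-and-upsert pipelines for price, financial, and macro data, and a combined orchestration function (run_all). 14 new public functions in total
Asset reporting database utility `AI/libs/utils/save_reports_to_db.py`	Adds 4 configuration constants driven by ASSETS_* environment variables; adds decimal normalization and asset cash fetch/update helpers; refactors the save_reports_to_db workflow so that it is aware of asset state
Dependencies `AI/requirements.txt`	Adds the new dependency "aa"
Training pipeline configuration `AI/transformer/training/train_transformer.py`	Names the model output path dynamically from the current date (YYYYMMDD); changes the model_out path to `AI/transformer/weights/{today_str}.weights.h5`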
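
As an illustration of the date-based weights path in the last row, here is a minimal sketch. It assumes the date is taken in KST, as the `today_kst` helper name suggests for the ingestion module; the PR summary does not say which timezone `train_transformer.py` uses. The function names `today_str_kst` and `build_model_out` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath
from typing import Optional

# KST is a fixed UTC+9 offset with no daylight saving time.
KST = timezone(timedelta(hours=9))


def today_str_kst(now: Optional[datetime] = None) -> str:
    """Return the KST calendar date as YYYYMMDD.

    `now` should be timezone-aware; it is injectable so the function can be tested.
    """
    current = now if now is not None else datetime.now(tz=KST)
    return current.astimezone(KST).strftime("%Y%m%d")


def build_model_out(now: Optional[datetime] = None,
                    weights_dir: str = "AI/transformer/weights") -> str:
    """Build the dated weights path, e.g. AI/transformer/weights/20251120.weights.h5."""
    return str(PurePosixPath(weights_dir) / f"{today_str_kst(now)}.weights.h5")
```

Converting to KST before formatting matters near midnight: a run at 16:00 UTC already belongs to the next calendar day in Seoul.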

Sequence diagram

sequenceDiagram
    participant Client
    participant run_all as run_all()
    participant Price as run_price_pipeline()
    participant Financials as run_financials_pipeline()
    participant Macro as run_macro_pipeline()
    participant YF as yfinance
    participant DB as PostgreSQL

    Client->>run_all: Start
    run_all->>run_all: Build configuration (db_name, tickers, series_map)
    
    run_all->>Price: Run price pipeline
    Price->>DB: Query last date
    Price->>YF: Request OHLCV data
    YF-->>Price: Return time series data
    Price->>DB: UPSERT into price_data table
    
    run_all->>Financials: Run financial statements pipeline
    Financials->>YF: Request IS/BS/CF data
    YF-->>Financials: Return financial data
    Financials->>DB: UPSERT into financials table
    
    run_all->>Macro: Run macroeconomic pipeline
    Macro->>DB: Query last date
    Macro->>YF: Request time series data
    YF-->>Macro: Return data
    Macro->>DB: UPSERT into macro_data table
    
    run_all-->>Client: Done
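
The "query last date, fetch, UPSERT" pattern in the diagram can be sketched as follows. This is not the PR's code: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL (both accept `INSERT ... ON CONFLICT ... DO UPDATE`), and the `price_data` columns, the primary key, and all function names are assumptions made for illustration.

```python
import sqlite3
from datetime import date, timedelta
from typing import Optional


def get_last_date(conn: sqlite3.Connection) -> Optional[date]:
    """Return the latest stored date, or None when the table is empty."""
    row = conn.execute("SELECT MAX(date) FROM price_data").fetchone()
    return date.fromisoformat(row[0]) if row[0] else None


def next_start_date(last: Optional[date], default_start: date) -> date:
    """Resume the day after the last stored row, or start from a default."""
    return default_start if last is None else last + timedelta(days=1)


def upsert_prices(conn: sqlite3.Connection, rows) -> None:
    """Insert rows; on a (ticker, date) collision overwrite the stored close."""
    conn.executemany(
        """
        INSERT INTO price_data (ticker, date, close) VALUES (?, ?, ?)
        ON CONFLICT (ticker, date) DO UPDATE SET close = excluded.close
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_data ("
    "ticker TEXT, date TEXT, close REAL, PRIMARY KEY (ticker, date))"
)
upsert_prices(conn, [("AAPL", "2025-11-18", 100.0), ("AAPL", "2025-11-19", 101.0)])
# Re-running with an overlapping date updates the row instead of duplicating it.
upsert_prices(conn, [("AAPL", "2025-11-19", 102.5)])
```

Because the upsert is keyed on `(ticker, date)`, the pipeline can be re-run for an overlapping window without creating duplicates, which is why the key design is worth the reviewer's attention.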

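Code review difficulty

🎯 3 (Moderate) | ⏱️ ~25 min

Areas needing additional review:

- Data normalization logic and yfinance integration in the new daily_ingest.py module (especially how fetch_financials_from_yf collects and merges IS/BS/CF data)
- Asset cash locking and update logic in save_reports_to_db.py (concurrency safety)
- Key design and duplicate handling for the financial statements UPSERT
- Macroeconomic series mapping and metadata handling

Suggested reviewers

Kosw6

Poem

🐰 Wow! The data is flowing
From prices to financials to macro, on and on
Waking up every day on KST time
A wonderful journey into PostgreSQL
The pipeline orchestra, so graceful~!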

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly reflects the main content of the changes. It accurately describes the core change: adding ETL ingestion of daily prices, financial statements, and macro data.


coderabbitai

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84d2797 and 21bd4b6.

📒 Files selected for processing (4)

AI/libs/utils/daily_inquest.py (1 hunks)
AI/libs/utils/save_reports_to_db.py (2 hunks)
AI/requirements.txt (1 hunks)
AI/transformer/training/train_transformer.py (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

AI/libs/utils/daily_inquest.py (1)

AI/libs/utils/get_db_conn.py (2)

get_db_conn (105-118)

get_engine (121-129)

AI/libs/utils/save_reports_to_db.py (1)

AI/libs/utils/get_db_conn.py (2)

get_db_conn (105-118)

get_engine (121-129)

🪛 Ruff (0.14.5)

AI/libs/utils/daily_inquest.py

33-33: Possible SQL injection vector through string-based query construction

(S608)

AI/libs/utils/save_reports_to_db.py

30-30: Do not catch blind exception: Exception

(BLE001)

34-39: Possible SQL injection vector through string-based query construction

(S608)

46-50: Possible SQL injection vector through string-based query construction

(S608)

coderabbitai · 2025-11-20T09:26:14Z

AI/libs/utils/daily_inquest.py

+    engine = get_engine(db_name)
+    with engine.connect() as conn:
+        res = conn.execute(text(f"SELECT MAX({date_col}) FROM {table};")).scalar()
+    if res is None:
+        return None
+    # Assume res is a date/datetime value
+    return res


⚠️ Potential issue | 🟠 Major

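Formatting table/column names directly into the query creates an SQL injection risk
get_last_date_in_table uses text(f"SELECT MAX({date_col}) FROM {table};"). If a config file or caller passes a bad table or date_col value, arbitrary SQL can be executed. Before the call, add validation that allows only letters, digits, and underscores, or wrap the identifiers safely using something like sqlalchemy.sql.Identifier.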

🧰 Tools

🪛 Ruff (0.14.5)

33-33: Possible SQL injection vector through string-based query construction

(S608)

🤖 Prompt for AI Agents

In AI/libs/utils/daily_inquest.py around lines 31 to 37, the function builds SQL by interpolating table and column names directly which permits SQL injection; validate both `table` and `date_col` to only contain ASCII letters, digits and underscores (reject or raise on anything else) or alternatively use SQLAlchemy identifier escaping (wrap names as safe Identifier/quoted_name via SQLAlchemy utilities) and then construct the SELECT using SQLAlchemy text or expression with those safe identifiers; ensure you import the required SQLAlchemy identifier/quoting helpers or add a simple regex whitelist check before composing the query and keep the rest of the logic identical.
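
The whitelist option suggested above can be sketched with the standard library alone. The helper names `validate_identifier` and `build_last_date_query` are hypothetical, and the pattern here is slightly stricter than the comment asks for: it also rejects a leading digit.

```python
import re

# Letters, digits, and underscores only; must not start with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return `name` unchanged if it is a plain SQL identifier, else raise."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def build_last_date_query(table: str, date_col: str) -> str:
    """Compose the MAX(date) query only from validated identifiers."""
    return (
        f"SELECT MAX({validate_identifier(date_col)}) "
        f"FROM {validate_identifier(table)}"
    )
```

The resulting string could then be passed to `text(...)` as before. Validation fails closed, so a value such as `price_data; DROP TABLE users` raises before any SQL is built.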

coderabbitai · 2025-11-20T09:26:15Z

AI/libs/utils/save_reports_to_db.py

+def _to_decimal(x) -> Decimal:
+    if isinstance(x, Decimal):
+        return x
+    try:
+        return Decimal(str(x))
+    except Exception:
+        return Decimal(0)


⚠️ Potential issue | 🟠 Major

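Swallowing a cash parsing failure as 0 distorts the balance
On line 27, a failed parse returns Decimal(0). If computation continues in that state, rows whose value is actually empty or invalid are treated as 0 won. Once the trade execution logic works from that 0, an incorrect asset balance is recorded immediately. On failure, raise an exception or at least return None so that the calling logic can stop safely.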

🧰 Tools

🪛 Ruff (0.14.5)

30-30: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

In AI/libs/utils/save_reports_to_db.py around lines 25 to 31, _to_decimal currently swallows parsing errors and returns Decimal(0), which masks bad/missing cash values; change it to raise a clear exception (e.g., ValueError with the original error message) instead of returning Decimal(0) so upstream logic fails fast and does not treat bad input as zero; update the function signature/type hints accordingly and ensure callers either handle or propagate the exception (or explicitly validate for None if you prefer returning None instead).
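
A fail-fast variant along the lines requested might look like the sketch below. The name `to_decimal_strict` is hypothetical, and rejecting NaN and Infinity goes beyond what the review asks for; it is included here as an assumption about what a cash balance should allow.

```python
from decimal import Decimal, InvalidOperation


def to_decimal_strict(x) -> Decimal:
    """Convert x to Decimal, raising ValueError instead of defaulting to 0."""
    if isinstance(x, Decimal):
        value = x
    else:
        try:
            # str() first so floats such as 1.1 do not carry binary noise.
            value = Decimal(str(x))
        except InvalidOperation as exc:
            raise ValueError(f"cannot parse cash value: {x!r}") from exc
    if not value.is_finite():
        # "NaN" and "Infinity" parse successfully but are not usable balances.
        raise ValueError(f"cash value is not finite: {x!r}")
    return value
```

Catching only `InvalidOperation` also addresses the BLE001 warning, since unrelated errors are no longer hidden by a blanket `except Exception`.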

coderabbitai · 2025-11-20T09:26:15Z

AI/libs/utils/save_reports_to_db.py

+    sql = text(f"""
+        SELECT {ASSETS_CASH_COLUMN}
+        FROM public.{ASSETS_TABLE}
+        WHERE {ASSETS_ID_COLUMN} = :rid
+        FOR UPDATE
+    """)
+    row = conn.execute(sql, {"rid": ASSETS_ROW_ID}).fetchone()
+    if not row:
+        return None
+    return _to_decimal(row[0])

-def build_insert_params(rows: Iterable[ReportRow], created_at: datetime) -> List[dict]:
-    """
-    Comment (originally in Korean):
-    - Converts rows into SQLAlchemy named-parameter form (dict).
-    """
+def _update_cash(conn, new_cash: Decimal) -> None:
+    sql = text(f"""
+        UPDATE public.{ASSETS_TABLE}
+        SET {ASSETS_CASH_COLUMN} = :cash
+        WHERE {ASSETS_ID_COLUMN} = :rid
+    """)
+    conn.execute(sql, {"cash": str(new_cash), "rid": ASSETS_ROW_ID})
+


⚠️ Potential issue | 🔴 Critical

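Formatting environment variable values directly into the query creates an SQL injection vulnerability
Lines 35-51 insert ASSETS_TABLE, ASSETS_ID_COLUMN, and ASSETS_CASH_COLUMN via f-strings. If these identifiers are tainted through .env or configuration values, the entire FOR UPDATE/UPDATE statement can be manipulated. At a minimum, apply whitelist validation that allows only letters, digits, and underscores, or wrap them safely with the sqlalchemy.sql.identifier family.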

🧰 Tools

🪛 Ruff (0.14.5)

34-39: Possible SQL injection vector through string-based query construction

(S608)

46-50: Possible SQL injection vector through string-based query construction

(S608)

🤖 Prompt for AI Agents

In AI/libs/utils/save_reports_to_db.py around lines 34 to 52, the SQL statements interpolate ASSETS_TABLE, ASSETS_ID_COLUMN and ASSETS_CASH_COLUMN directly into f-strings which creates an SQL injection risk if those env/config values are tampered with; replace the direct f-string interpolation with a safe approach: validate each identifier against a strict whitelist/regex allowing only letters, digits and underscores (and error/raise if validation fails) or use SQLAlchemy's identifier-quoting utilities (e.g. sqlalchemy.sql.schema.Table/Column or quoted_name/identifier functions) to construct the SELECT ... FOR UPDATE and UPDATE statements safely, while still binding values (rid, cash) as parameters.
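
The other option mentioned, quoting identifiers rather than whitelisting them, can be sketched as below. This is a hand-rolled helper using the standard SQL rule (wrap in double quotes, double any embedded quote), not a SQLAlchemy API; `quote_identifier` and `build_update_cash_sql` are hypothetical names. The demo uses `sqlite3` as a stand-in for PostgreSQL, so `FOR UPDATE` is left out because SQLite does not support it.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Quote an identifier per standard SQL: "name", with embedded " doubled."""
    if not name or "\x00" in name:
        raise ValueError(f"invalid SQL identifier: {name!r}")
    return '"' + name.replace('"', '""') + '"'


def build_update_cash_sql(table: str, cash_col: str, id_col: str) -> str:
    """Identifiers are quoted; the values stay as bound parameters."""
    return (
        f"UPDATE {quote_identifier(table)} "
        f"SET {quote_identifier(cash_col)} = :cash "
        f"WHERE {quote_identifier(id_col)} = :rid"
    )


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "assets" ("id" INTEGER PRIMARY KEY, "cash" TEXT)')
conn.execute('INSERT INTO "assets" ("id", "cash") VALUES (1, \'1000.00\')')
conn.execute(
    build_update_cash_sql("assets", "cash", "id"),
    {"cash": "900.25", "rid": 1},
)
conn.commit()
```

One caveat when porting this to PostgreSQL: quoted identifiers are case-sensitive there, while unquoted ones are folded to lowercase, so the configured names must match the stored names exactly.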

coderabbitai · 2025-11-20T09:26:15Z

AI/requirements.txt

+pathlib
+aa


⚠️ Potential issue | 🟠 Major

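Unnecessary and risky dependencies should be removed from this PR
pathlib is part of the Python standard library, so adding it to requirements installs the PyPI backport, which can conflict with the built-in module. In addition, none of the code in this PR uses the aa package, and pinning a third-party package with no clear purpose only increases supply-chain and build-failure risk. Remove both entries or provide justification for their use.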

-pathlib
-aa


🤖 Prompt for AI Agents

AI/requirements.txt around lines 14-15: remove the two entries "pathlib" and "aa" from requirements.txt because pathlib is part of the Python stdlib (installing a PyPI backport can conflict) and "aa" is unused and increases supply-chain risk; if either is actually required, replace the removal by adding a brief code-reference comment and pin a specific safe version (e.g., aa==x.y.z) and update any import/usage locations to justify keeping it.

twq110 · 2025-11-24T07:06:15Z

The code needs to be rewritten, so I'll write it separately and submit it again.

twq110 and others added 4 commits November 6, 2025 17:21

[AI] SISC2-42 [FEAT] Write code to save asset values to the DB

b3252d2

[AI] SISC-42 [docs] Test

b406001

[AI] SISC2 - 42 [FIX] Write asset value change code

a7de312

[AI] SISC - 42 [FEAT] Add ETL ingestion of daily prices, financial statements, and macro data

21bd4b6

KIMSE0NG1L requested a review from Kosw6 as a code owner November 20, 2025 09:17

coderabbitai bot reviewed Nov 20, 2025

twq110 closed this Nov 24, 2025
