[AI] SISC2-39 [FIX] Final pipeline verification #80

twq110 · 2025-11-05T07:59:37Z

Initial weight training complete (initial.weights.h5 under transformer/weights)
Fixed the DB connection code
Changed how XAI reports are saved

Summary by CodeRabbit

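Refactoring
- Restructured the data flow around the DB to improve stability and integrity and to optimize large-scale data processing
- Tidied up some logging/output and standardized time zone and timestamp handling
New Features
- Added generation of XAI (explainable AI) reports and saving them to the database
- The training pipeline now loads tickers automatically and uses the DB as the default data source
Chores
- Development environment cleanup: added a virtual-environment ignore rule and fixed a config file encoding issue

Now you only need to pass the DB name to get_db_conn() to get the matching db key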

coderabbitai · 2025-11-05T07:59:57Z

Walkthrough

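Restructures the pipeline around the DB, adds DB engine/connection utilities, introduces an XAI report persistence module, and switches the transformer input, logging, and training data collection to the DB.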

Changes

Cohort / File(s)	Change summary
Repository and config files `.gitignore`, `AI/configs/config.json`	Added `/.venv/` to `.gitignore`; removed a hidden BOM from `config.json`
DB connection utilities and queries `AI/libs/utils/get_db_conn.py`, `AI/libs/utils/fetch_ohlcv.py`	Changed `get_db_conn(config)` → `get_db_conn(db_name: str)` and added `get_engine(db_name)`; config caching and SQLAlchemy URL construction; the `fetch_ohlcv` signature now takes `db_name: str` and uses SQLAlchemy engine-based queries with data normalization
Report persistence module `AI/libs/core/save_reports_to_db.py`	New module: `ReportRow` type, `save_reports_to_db(rows, db_name)` and helper functions (schema guarantee, parameter building, utcnow, etc.); implements batch insert into the xai_reports table
Core pipeline restructuring `AI/libs/core/pipeline.py`	Added the MARKET_DB_NAME / REPORT_DB_NAME / REQUIRED_LOG_COLS constants; changed the signatures/return types of `run_signal_transformer` and `run_xai_report`; redesigned as a Finder→Transformer→XAI→save_reports_to_db flow; added the public `run_pipeline()` function
Transformer entry point and modules `AI/transformer/main.py`, `AI/transformer/modules/inference.py`, `AI/transformer/modules/models.py`, `AI/transformer/modules/__init__.py`	Changed `run_transformer(config)` → `run_transformer(weights_path)`; the weights path is resolved relative to PROJECT_ROOT (the path is assigned regardless of whether the file exists); logging in `inference.py` replaced with `print`; added a `type: ignore` comment and removed a return annotation in `models.py`; fixed the module header comment
Training data collection restructuring `AI/transformer/training/train_transformer.py`	Removed Yahoo-based collection in favor of DB-based collection; added `load_all_tickers_from_db()` and `_fetch_db_ohlcv_for_tickers()`; chunked queries and unified UTC/Asia-Seoul time zone handling; corrected the return type annotation (`Dict[str, Any]`)
Other meta/header changes `AI/transformer/modules/__init__.py`	File header comment changed (traces of encoding/BOM issues); no functional change

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Pipeline as run_pipeline()
    participant Finder
    participant Transformer
    participant XAI
    participant SaveReports as save_reports_to_db()
    participant DB

    Client->>Pipeline: call

    rect rgb(220,235,255)
    Pipeline->>Finder: fetch tickers (MARKET_DB_NAME)
    Finder->>DB: query
    DB-->>Finder: data
    Finder-->>Pipeline: ticker list
    end

    rect rgb(220,255,220)
    Pipeline->>Transformer: run_signal_transformer(tickers, db_name)
    Transformer->>DB: fetch OHLCV
    DB-->>Transformer: OHLCV
    Transformer-->>Pipeline: logs DataFrame
    end

    rect rgb(255,245,220)
    Pipeline->>XAI: run_xai_report(logs)
    XAI->>XAI: build per-row payload, call API
    XAI-->>Pipeline: [(ticker,signal,price,date,report), ...]
    end

    rect rgb(235,230,255)
    Pipeline->>SaveReports: save_reports_to_db(rows, REPORT_DB_NAME)
    SaveReports->>DB: INSERT (batch)
    DB-->>SaveReports: result
    SaveReports-->>Pipeline: saved
    end

    Pipeline-->>Client: returns Optional[List[str]]

Estimated code review effort

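🎯 4 (Complex) | ⏱️ ~45 minutes

Items that need extra review:
- Overall control flow in AI/libs/core/pipeline.py and signature consistency with external callers
- Config loading and caching, URL encoding, and exception handling in AI/libs/utils/get_db_conn.py
- Schema guarantee (ALTER TABLE) and batch transaction logic in AI/libs/core/save_reports_to_db.py
- Time zone conversion and chunked query consistency in the training module
- weights_path handling in the transformer entry point (how a nonexistent path is treated) and the impact of the logging change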

Possibly related PRs

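[AI] SISC2-37 [FEAT] Implement saving of transform training files #71: strongly related, since it includes similar signature and flow changes in the same module group (pipeline, DB utilities, fetch_ohlcv, transformer/training)
[AI] SIC2-4 [FEAT] Implement run_xai #55: functionally related to the XAI report generation/integration code (run_xai_report / generate_report)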

Suggested reviewers

Kosw6

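Poem

🐰 A path cut through the root, heading for the DB
Signals gather and reports pile up
Even if the weights lose their way, the path remains
O rows dancing on the table
Let's celebrate with a carrot 🥕✨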

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title comprehensively summarizes the main changes in the pull request. The key changes (DB connection code fix, XAI report saving change, completed weight training) are reflected in 'Final pipeline verification'.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9de0e2 and 0e8f353.

📒 Files selected for processing (1)

AI/transformer/modules/__init__.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

AI/transformer/modules/__init__.py

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

AI/configs/config.json (1)
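1-19: Critical security issue: database credentials are hard-coded.

This file exposes database passwords in plain text (two values beginning with npg_). This is a serious security vulnerability.

The following actions are needed immediately:

Add this file to .gitignore

Remove the sensitive information from the Git history (using git-filter-repo or BFG Repo-Cleaner)

Rotate the exposed database passwords immediately

Refactor to use environment variables or a secret management service such as AWS Secrets Manager

Change it as follows:

1) Add to .gitignore:
+# Database settings (sensitive)
+AI/configs/config.json
2) Rename config.json to config.json.example and remove the real values: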
{
  "db": {
    "host": "your-db-host",
    "user": "your-user",
    "password": "YOUR_PASSWORD_HERE",
    "dbname": "your-dbname",
    "port": 5432
  },
  "report_DB": {
    "host": "your-report-db-host",
    "user": "your-user",
    "password": "YOUR_PASSWORD_HERE",
    "dbname": "your-dbname",
    "port": 5432
  }
}
3) Switch to environment variables (recommended):
import os
config = {
    "db": {
        "host": os.getenv("DB_HOST"),
        "user": os.getenv("DB_USER"),
        "password": os.getenv("DB_PASSWORD"),
        "dbname": os.getenv("DB_NAME"),
        "port": int(os.getenv("DB_PORT", "5432"))
    }
}
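
As a complement to the environment-variable suggestion above, here is a minimal sketch that fails fast when a variable is missing instead of silently passing `None` to the driver. The helper name `load_db_config`, the `_ENV_SUFFIXES` mapping, and the `DB_` prefix convention are assumptions for illustration, not code from this PR.

```python
import os

# Hypothetical mapping from config keys to environment variable suffixes.
_ENV_SUFFIXES = {"host": "HOST", "user": "USER", "password": "PASSWORD", "dbname": "NAME"}


def load_db_config(prefix: str = "DB") -> dict:
    """Build a DB config dict from <prefix>_HOST, <prefix>_USER, etc."""
    env_names = {key: f"{prefix}_{suffix}" for key, suffix in _ENV_SUFFIXES.items()}
    # Collect every missing or empty variable so the error lists them all at once.
    missing = [name for name in env_names.values() if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    config = {key: os.environ[name] for key, name in env_names.items()}
    # The port is optional and defaults to the PostgreSQL default.
    config["port"] = int(os.getenv(f"{prefix}_PORT", "5432"))
    return config
```

A second block such as `report_DB` could then be loaded with a different prefix, for example `load_db_config("REPORT_DB")`, if the deployment defines those variables.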
.gitignore (1)
1-30: Critical omission: config.json is not in .gitignore.

AI/configs/config.json contains database credentials, but the file has not been added to .gitignore. As a result, sensitive information is exposed in version control.

Add the following entries to .gitignore:
 # ===== Python (AI / Data) =====
 __pycache__/
 .ipynb_checkpoints/
 *.pyc
 /venv/
 /env
 /.vs
 /.venv/
+
+# Database config files (sensitive)
+AI/configs/config.json
+AI/configs/*.json
+!AI/configs/config.json.example

🧹 Nitpick comments (4)

AI/transformer/modules/inference.py (1)
39-44: Logging was switched to print statements.

Using print statements instead of structured logging makes log collection, filtering, and analysis harder in production. Python's standard logging module is recommended.

Consider refactoring as follows:
+import logging
+
+logger = logging.getLogger(__name__)
+
 def _load_or_build_model(seq_len: int, n_features: int, weights_path: Optional[str]) -> Model:
     """Weight loading only. If no weights path is given, warn and fall back to random initialization (lower inference quality)."""
     model = build_transformer_classifier(seq_len, n_features)
     if weights_path:
         try:
             model.load_weights(weights_path)
-            print(f"[INFER] Weights loaded: {weights_path}")
+            logger.info(f"Weights loaded: {weights_path}")
         except Exception as e:
-            print(f"[INFER][WARN] Failed to load weights → random initialization: {e}")
+            logger.warning(f"Failed to load weights → random initialization: {e}")
     else:
-        print("[INFER][WARN] weights_path not specified → proceeding with random initialization")
+        logger.warning("weights_path not specified → proceeding with random initialization")
     return model
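
For the `logger.info` calls suggested above to produce output, the application has to configure logging once at startup. The sketch below shows one way to do that with the standard library; the function name `configure_logging`, the logger name, and the format string are assumptions, not part of this PR.

```python
import logging


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger once, at the application entry point."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
        force=True,  # replace handlers installed by earlier basicConfig calls
    )


# Module-level logger, as in the review suggestion; the name is illustrative.
logger = logging.getLogger("transformer.inference")

if __name__ == "__main__":
    configure_logging()
    # Lazy %-style arguments avoid building the message when the level is disabled.
    logger.info("Weights loaded: %s", "transformer/weights/initial.weights.h5")
    logger.warning("weights_path not specified, using random initialization")
```

Library modules would only call `logging.getLogger(__name__)`; calling `configure_logging()` stays the responsibility of the script that runs the pipeline.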
AI/transformer/modules/models.py (2)
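3-3: Type-check suppression was added.

The # type: ignore[reportMissingImports] comment silences the TensorFlow import warning. This works as a temporary workaround, but installing proper type stubs is the better long-term fix.

Consider installing TensorFlow type stubs: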
pip install tensorflow-stubs
6-6: The return type annotation was removed.

The return type (-> tf.Tensor) of the positional_encoding function was removed, which slightly reduces type safety.

Consider restoring the return type to preserve type safety:
-def positional_encoding(maxlen: int, d_model: int):
+def positional_encoding(maxlen: int, d_model: int) -> tf.Tensor:
AI/libs/utils/fetch_ohlcv.py (1)
14-14: The interval parameter is currently unused.

The interval parameter is not used yet, but as documented it is kept for future extension. To avoid the static analysis warning, consider:
 def fetch_ohlcv(
     ticker: str,
     start: str,
     end: str,
-    interval: str = "1d",
+    interval: str = "1d",  # noqa: ARG001 - planned for future use
     db_name: str = "db",
 ) -> pd.DataFrame:
Or, more explicitly:
def fetch_ohlcv(
    ticker: str,
    start: str,
    end: str,
    interval: str = "1d",  # TODO: add support for multiple intervals
    db_name: str = "db",
) -> pd.DataFrame:

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcace4f and 3c62645.

⛔ Files ignored due to path filters (2)

AI/transformer/scaler/scaler.pkl is excluded by !**/*.pkl
transformer/scaler/scaler.pkl is excluded by !**/*.pkl

📒 Files selected for processing (11)

.gitignore (1 hunks)
AI/configs/config.json (1 hunks)
AI/libs/core/pipeline.py (2 hunks)
AI/libs/utils/fetch_ohlcv.py (1 hunks)
AI/libs/utils/get_db_conn.py (1 hunks)
AI/libs/utils/save_reports_to_db.py (1 hunks)
AI/transformer/main.py (2 hunks)
AI/transformer/modules/__init__.py (1 hunks)
AI/transformer/modules/inference.py (7 hunks)
AI/transformer/modules/models.py (1 hunks)
AI/transformer/training/train_transformer.py (12 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

AI/libs/utils/save_reports_to_db.py (1)

AI/libs/utils/get_db_conn.py (2)

get_db_conn (105-118)

get_engine (121-129)

AI/transformer/training/train_transformer.py (3)

AI/transformer/modules/features.py (1)

build_features (90-131)

AI/transformer/modules/models.py (1)

build_transformer_classifier (16-44)

AI/libs/utils/get_db_conn.py (1)

get_db_conn (105-118)

AI/libs/utils/fetch_ohlcv.py (1)

AI/libs/utils/get_db_conn.py (2)

get_db_conn (105-118)

get_engine (121-129)

AI/libs/core/pipeline.py (4)

AI/libs/utils/save_reports_to_db.py (1)

save_reports_to_db (65-99)

AI/libs/utils/fetch_ohlcv.py (1)

fetch_ohlcv (10-68)

AI/transformer/main.py (1)

run_transformer (12-84)

AI/xai/run_xai.py (1)

run_xai (29-33)

🪛 Ruff (0.14.3)

AI/libs/utils/save_reports_to_db.py

37-37: Multiple statements on one line (colon)

(E701)

38-38: Multiple statements on one line (colon)

(E701)

39-39: Multiple statements on one line (colon)

(E701)

40-40: Multiple statements on one line (colon)

(E701)

41-41: Multiple statements on one line (colon)

(E701)

AI/transformer/training/train_transformer.py

85-86: try-except-pass detected, consider logging the exception

(S110)

85-85: Do not catch blind exception: Exception

(BLE001)

89-89: Avoid specifying long messages outside the exception class

(TRY003)

104-104: Avoid specifying long messages outside the exception class

(TRY003)

166-166: Unused function argument: feats

(ARG001)

254-254: Avoid specifying long messages outside the exception class

(TRY003)

266-266: Avoid specifying long messages outside the exception class

(TRY003)

436-451: Possible SQL injection vector through string-based query construction

(S608)

452-452: Consider [start_date, end_date, *chunk] instead of concatenation

Replace with [start_date, end_date, *chunk]

(RUF005)

484-485: try-except-pass detected, consider logging the exception

(S110)

484-484: Do not catch blind exception: Exception

(BLE001)

514-514: Avoid specifying long messages outside the exception class

(TRY003)

AI/libs/utils/get_db_conn.py

46-46: Avoid specifying long messages outside the exception class

(TRY003)

52-52: Avoid specifying long messages outside the exception class

(TRY003)

56-56: Avoid specifying long messages outside the exception class

(TRY003)

68-68: Avoid specifying long messages outside the exception class

(TRY003)

73-73: Avoid specifying long messages outside the exception class

(TRY003)

77-77: Avoid specifying long messages outside the exception class

(TRY003)

AI/libs/utils/fetch_ohlcv.py

14-14: Unused function argument: interval

(ARG001)

AI/transformer/modules/inference.py

40-40: Do not catch blind exception: Exception

(BLE001)

AI/libs/core/pipeline.py

73-73: Do not catch blind exception: Exception

(BLE001)

112-112: Do not catch blind exception: Exception

(BLE001)

118-118: Do not catch blind exception: Exception

(BLE001)

164-164: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (5)

.gitignore (1)

30-30: The Python virtual environment exclusion was added.

Adding the .venv/ directory is correct.

AI/transformer/modules/inference.py (1)

40-41: Catching a generic Exception is appropriate in this context.

Loading weights can fail with several kinds of exceptions, so catching a generic Exception together with a fallback is a reasonable approach.

AI/libs/utils/get_db_conn.py (1)

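1-129: Well-structured refactoring of the DB connection utility.

This refactoring follows several good practices:

Centralized configuration management (config.json)

Config caching to avoid repeated loading

pool_pre_ping=True on the SQLAlchemy engine (better operational stability)

URL encoding so that passwords with special characters are handled safely

Validation of required keys with clear error messages

Static analysis warns about the long error messages (TRY003), but these detailed messages are useful for debugging, so the current implementation is appropriate.

Note: this module depends on sensitive information in config.json, so that file should be excluded from version control (see the config.json review).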
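
The URL-encoding and key-validation points can be illustrated with a small sketch. It is not the code in `get_db_conn.py`: the helper name `build_db_url` and the `postgresql+psycopg2` scheme are assumptions, and only the config keys (`host`, `user`, `password`, `dbname`, `port`) come from the review above.

```python
from urllib.parse import quote_plus

REQUIRED_KEYS = ("host", "user", "password", "dbname")


def build_db_url(cfg: dict) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL from one config block."""
    missing = [key for key in REQUIRED_KEYS if not cfg.get(key)]
    if missing:
        raise KeyError(f"missing DB config keys: {missing}")
    # Percent-encode credentials so characters such as '@' or '/' do not break the URL.
    user = quote_plus(str(cfg["user"]))
    password = quote_plus(str(cfg["password"]))
    port = int(cfg.get("port", 5432))
    return f"postgresql+psycopg2://{user}:{password}@{cfg['host']}:{port}/{cfg['dbname']}"
```

The resulting string could then be passed to `sqlalchemy.create_engine(url, pool_pre_ping=True)`, which matches the pre-ping setting mentioned in the review.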

AI/transformer/main.py (1)

19-19: The API signature now takes weights_path.

Using an explicit weights_path parameter instead of a config dictionary is a clearer API design.

AI/libs/utils/fetch_ohlcv.py (1)

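1-68: Nicely refactored onto the SQLAlchemy engine.

This change includes several improvements:

Connection pooling and better stability from using a SQLAlchemy engine

Named parameters (:ticker, :start, :end) to prevent SQL injection

Consistent DataFrame schema normalization

Explicit handling of empty results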
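
To show why named parameters protect against injection, here is a self-contained sketch using the standard-library `sqlite3` driver rather than the PostgreSQL/SQLAlchemy stack used in this PR. The table layout is reduced to three columns and `fetch_prices` is a hypothetical helper; the `:ticker`, `:start`, and `:end` placeholders mirror the ones named in the review.

```python
import sqlite3


def fetch_prices(conn: sqlite3.Connection, ticker: str, start: str, end: str) -> list:
    """Query prices with named placeholders; values are bound, never interpolated."""
    query = """
        SELECT ticker, date, close
        FROM price_data
        WHERE ticker = :ticker AND date BETWEEN :start AND :end
        ORDER BY date
    """
    return conn.execute(query, {"ticker": ticker, "start": start, "end": end}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE price_data (ticker TEXT, date TEXT, close REAL)")
    conn.execute("INSERT INTO price_data VALUES ('AAPL', '2024-10-30', 2.0)")
    # A malicious value is compared as a plain string and matches no rows.
    print(fetch_prices(conn, "AAPL' OR '1'='1", "2024-01-01", "2024-12-31"))
    print(fetch_prices(conn, "AAPL", "2024-10-30", "2024-10-30"))
```

Because the driver binds the values separately from the SQL text, the quote characters in the first call never become part of the statement.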

coderabbitai · 2025-11-05T08:04:14Z

AI/libs/core/pipeline.py

+    #end_date = _utcnow() # use data up to today's close in Korean time; uncomment when running on the server
+    end_date = datetime.strptime("2024-10-30", "%Y-%m-%d") # temporary fixed date
    start_date = end_date - timedelta(days=600)


⚠️ Potential issue | 🔴 Critical

The fixed end_date drops all data after 2024-10-30.

Because Line 55 pins end_date to 2024-10-30, a run today (2025-11-05) still queries only prices that are more than a year old, and no up-to-date signals or reports are generated. Please restore _utcnow(), or at least compute the date from the current day.

-    #end_date = _utcnow() # use data up to today's close in Korean time; uncomment when running on the server
-    end_date = datetime.strptime("2024-10-30", "%Y-%m-%d") # temporary fixed date
+    end_date = _utcnow().replace(hour=0, minute=0, second=0, microsecond=0)

🤖 Prompt for AI Agents

In AI/libs/core/pipeline.py around lines 54 to 56, end_date is hard-coded to 2024-10-30 which excludes all post-2024-10-30 data; restore dynamic calculation by replacing the fixed date with the current UTC date (e.g., call _utcnow() or datetime.utcnow()) so end_date reflects "today" and then compute start_date as end_date - timedelta(days=600); ensure you use the same timezone/UTC helper used elsewhere (_utcnow()) to keep consistency.

This will be changed later.
Please file an issue for it.

🧩 Analysis chain

@twq110, understood! I've noted that you plan to fix this later. I'll file an issue to track it.

🏁 Script executed:

#!/bin/bash
# Create the issue with the GitHub CLI
gh issue create \
  --repo SISC-IT/sisc-web \
  --title "Remove hard-coded end_date in pipeline" \
  --body "## Description
In the run_signal_transformer function of AI/libs/core/pipeline.py, end_date is fixed to 2024-10-30, so the latest data is not queried.

## Location
- File: AI/libs/core/pipeline.py
- Lines: 54-56

## Current code
\`\`\`python
#end_date = _utcnow() # use data up to today's close in Korean time; uncomment when running on the server
end_date = datetime.strptime(\"2024-10-30\", \"%Y-%m-%d\") # temporary fixed date
\`\`\`

## Required work
Change it to be computed dynamically from the current UTC date.

## References
- PR: https://github.com/SISC-IT/sisc-web/pull/80
- Review comment: https://github.com/SISC-IT/sisc-web/pull/80#discussion_r2493423198
- Requested by: @twq110" \
  --assignee twq110

Length of output: 45

@twq110, the issue has been filed successfully! 🎫

Issue link: #82

I summarized the current situation and the required work in the issue and assigned it to you. Feel free to fix it whenever is convenient!
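
One possible shape for the dynamic date window discussed in this thread is sketched below. It is only an assumption about what the follow-up could look like: `compute_date_window` is a hypothetical helper, and truncating to midnight in Korean time (rather than UTC) is a choice based on the code comment about the Korean close, not something confirmed by the PR.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Asia/Seoul is UTC+9 with no daylight saving time, so a fixed offset is sufficient here.
KST = timezone(timedelta(hours=9), "KST")


def compute_date_window(
    now: Optional[datetime] = None, lookback_days: int = 600
) -> Tuple[datetime, datetime]:
    """Return (start_date, end_date), with end_date at today's midnight in KST."""
    now = now or datetime.now(timezone.utc)
    end_date = now.astimezone(KST).replace(hour=0, minute=0, second=0, microsecond=0)
    start_date = end_date - timedelta(days=lookback_days)
    return start_date, end_date
```

Accepting `now` as a parameter keeps the function testable with a fixed timestamp, which avoids the need for a hard-coded date inside the pipeline itself.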

🧬 Code Graph Analysis Results

<AI/xai/run_xai.py> (lines 29-33)

def run_xai(decision: Dict[str, Any], api_key: str) -> str:
    evidence = decision.pop("evidence", [])
    report = generate_report_from_yf(decision, evidence, api_key)
    return report

<AI/libs/utils/get_db_conn.py> (lines 105-118)

def get_db_conn(db_name: str):
    """
    - Creates a raw psycopg2 connection (for opening cursors directly)
    - To avoid the pandas warning, prefer get_engine() for read_sql
    """
    cfg = _get_db_config(db_name)
    return psycopg2.connect(
        host=cfg["host"],
        user=cfg["user"],
        password=cfg["password"],
        dbname=cfg["dbname"],
        port=int(cfg.get("port", 5432)),
        sslmode=cfg.get("sslmode", None),  # applied automatically when needed
    )

<AI/libs/utils/save_reports_to_db.py> (lines 65-99)

def save_reports_to_db(rows: List[ReportRow], db_name: str) -> int:
    """
    Notes:
    - INSERT safely via SQLAlchemy.
    - Removes the pandas warning, automates connection management, and uses pre-ping to avoid dead connections.
    """
    if not rows:
        print("[INFO] No reports to save.")
        return 0

    engine = get_engine(db_name)
    ensure_table_schema(engine)

    created_at = utcnow()
    params = build_insert_params(rows, created_at)
    if not params:
        print("[WARN] No valid parameters to save; skipping INSERT.")
        return 0

    insert_sql = text("""
        INSERT INTO public.xai_reports (ticker, signal, price, date, report, created_at)
        VALUES (:ticker, :signal, :price, :date, :report, :created_at)
    """)

    inserted = 0
    # Chunking is recommended for large batches
    CHUNK = 1000
    with engine.begin() as conn:
        for i in range(0, len(params), CHUNK):
            batch = params[i:i+CHUNK]
            conn.execute(insert_sql, batch)
            inserted += len(batch)

    print(f"--- {inserted} XAI reports were saved to the database. ---")
    return inserted

<AI/finder/main.py> (lines 17-72)

def run_finder():
    """
    Orchestrates the whole process and returns the final top 3 tickers to invest in
    """
    # --- Step 1: prepare dependency objects and data ---
    try:
        llm = get_ollama_client()  # ✅ includes health check and model verification
    except Exception as e:
        print(str(e))
        return []

    csv_path = os.path.join(project_root, "data", "stability_score_2025.csv")
    try:
        stability_df = pd.read_csv(csv_path)
    except FileNotFoundError:
        print(f"Error: could not find the file {csv_path}.")
        return []

    # --- Step 2: collect and summarize weekly news data ---
    try:
        weekly_news_df = news_processing.get_weekly_news_summary(days=5, llm_client=llm)
    except requests.exceptions.ConnectionError as e:
        print(f"[LLM connection error] Failed to connect to the LLM server during news summarization: {e}")
        return []
    except requests.exceptions.Timeout as e:
        print(f"[LLM timeout] Response delayed during news summarization: {e}")
        return []
    except Exception as e:
        print(f"[Unexpected error] News summarization step: {e}")
        return []

    if weekly_news_df is None or getattr(weekly_news_df, "empty", False):
        print("No news data to analyze; ending the process.")
        return []

    # --- Step 3: select the top 3 tickers based on news and financial data ---
    try:
        top_3_tickers = ticker_selector.select_top_stocks(
            news_summary_df=weekly_news_df,
            stability_df=stability_df,
            llm_client=llm
        )
    except requests.exceptions.ConnectionError as e:
        print(f"[LLM connection error] Failed to connect to the LLM server during ticker selection: {e}")
        return []
    except requests.exceptions.Timeout as e:
        print(f"[LLM timeout] Response delayed during ticker selection: {e}")
        return []
    except Exception as e:
        print(f"[Unexpected error] Ticker selection step: {e}")
        return []

    print("\n🎉 [Finder module final result] Top 3 recommended tickers 🎉")
    print(top_3_tickers)
    return top_3_tickers

<AI/libs/utils/fetch_ohlcv.py> (lines 10-68)

def fetch_ohlcv(
    ticker: str,
    start: str,
    end: str,
    interval: str = "1d",
    db_name: str = "db",
) -> pd.DataFrame:
    """
    Load OHLCV data for a given ticker and date range from the DB (using a SQLAlchemy engine)

    Args:
        ticker (str): ticker symbol (e.g. "AAPL")
        start (str): start date 'YYYY-MM-DD' (inclusive)
        end (str): end date 'YYYY-MM-DD' (inclusive)
        interval (str): data interval ('1d', etc.) - ignored if the table only provides daily bars
        db_name (str): name of the config block that get_engine() looks up (e.g. "db", "report_DB")

    Returns:
        pd.DataFrame: columns = [ticker, date, open, high, low, close, adjusted_close, volume]
                      (the date column is converted to pandas datetime)
    """
    # 1) Get the SQLAlchemy engine (based on configs/config.json)
    engine = get_engine(db_name)

    # 2) Query: uses named parameters (:ticker, etc.) -> safe and readable
    #    - If interval branching is needed, the query can be extended per table/partition layout
    query = text("""
        SELECT ticker, date, open, high, low, close, adjusted_close, volume
        FROM public.price_data
        WHERE ticker = :ticker
          AND date BETWEEN :start AND :end
        ORDER BY date;
    """)

    # 3) Read from the DB (the with statement cleans up the connection)
    with engine.connect() as conn:
        df = pd.read_sql(
            query,
            con=conn,  # always pass con=conn as a keyword argument
            params={"ticker": ticker, "start": start, "end": end},  # a dict, not a tuple
        )

    # 4) Post-processing: column order and a consistent date type
    if df is None or df.empty:
        # For an empty DataFrame, return a consistent column schema
        return pd.DataFrame(columns=["ticker", "date", "open", "high", "low", "close", "adjusted_close", "volume"])

    # Convert the date column to datetime (use pd.to_datetime(..., utc=True) to align to UTC)
    if "date" in df.columns:
        df["date"] = pd.to_datetime(df["date"])

    # Optional: fix the column order (for consistency)
    desired_cols = ["ticker", "date", "open", "high", "low", "close", "adjusted_close", "volume"]
    # Keep only the columns that are present
    cols_present = [c for c in desired_cols if c in df.columns]
    df = df.loc[:, cols_present]

    return df

<AI/transformer/main.py> (lines 12-84)

def run_transformer(
    *,
    finder_df: pd.DataFrame,
    seq_len: int,
    pred_h: int,
    raw_data: pd.DataFrame,
    run_date: Optional[str] = None,
    weights_path: Optional[str] = None,
    interval: str = "1d",
) -> Dict[str, pd.DataFrame]:
    """
    Parameters
    ----------
    finder_df : pd.DataFrame
        Contains a ['ticker'] column. The list of inference target tickers selected by the Finder step.
    seq_len : int
        Model input sequence length (e.g. 64).
    pred_h : int
        Prediction horizon (e.g. 5). Used for labeling/policy criteria (logs, weight calculation support);
        it is not directly involved in computing the inference probabilities.
    raw_data : pd.DataFrame
        OHLCV time series. Required columns:
        ['ticker','open','high','low','close','volume', ('ts_local' or 'date')]
    run_date : Optional[str]
        'YYYY-MM-DD' format. If given, only data up to (and including) that date is used for inference.
        If omitted, data up to today's close in Asia/Seoul time is used.
    config : Optional[dict]
        config["transformer"]["model_path"] must contain the path to the trained weights.
        e.g. {"transformer": {"model_path": "artifacts/transformer_cls.h5"}}
        (If more inference options are added later, extend this dict with flags/parameters.)
    interval : str
        Candle interval label (for logging). e.g. '1d', '1h'.

    Returns
    -------
    Dict[str, pd.DataFrame]
        In the form {"logs": DataFrame}. Columns:
        ["ticker","date","action","price","weight",
         "feature1","feature2","feature3","prob1","prob2","prob3"]

    Notes
    -----
    - The purpose of this wrapper is to keep the name/signature stable.
      Even if the internal inference engine changes, it can be swapped without changing external callers.
    """
    # 1) Resolve weights_path
    PROJECT_ROOT = Path(__file__).resolve().parents[1]
    weights_dir = PROJECT_ROOT / "transformer" / "weights"
    candidate = weights_dir / "initial.weights.h5"
    weights_path = str(candidate)
    if candidate.exists():
        print(f"[TRANSFORMER] weights_path set: {weights_path}")

    if not weights_path:
        print("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
        print("  config example: {'transformer': {'weights_path': 'weights/initial.weights.h5'}}")

    # 2) Run the actual inference (delegated to the module)
    return run_inference(
        finder_df=finder_df,
        raw_data=raw_data,
        seq_len=seq_len,
        pred_h=pred_h,
        weights_path=weights_path,  # ★ pass the trained weights path
        run_date=run_date,
        interval=interval,
    )

coderabbitai · 2025-11-05T08:04:14Z

AI/libs/utils/save_reports_to_db.py

+    with engine.begin() as conn:
+        cols = conn.execute(text("""
+            SELECT column_name FROM information_schema.columns
+            WHERE table_schema='public' AND table_name='xai_reports';
+        """)).fetchall()
+        existing = {r[0] for r in cols}
+        need = {"ticker", "signal", "price", "date", "report", "created_at"}
+        missing = need - existing
+        if missing:
+            parts = []
+            if "ticker" in missing: parts.append("ADD COLUMN IF NOT EXISTS ticker varchar(20) NOT NULL")
+            if "signal" in missing: parts.append("ADD COLUMN IF NOT EXISTS signal varchar(10) NOT NULL")
+            if "price" in missing:  parts.append("ADD COLUMN IF NOT EXISTS price numeric(10,2) NOT NULL")
+            if "date" in missing:   parts.append("ADD COLUMN IF NOT EXISTS date date NOT NULL")
+            if "report" in missing: parts.append("ADD COLUMN IF NOT EXISTS report text")
+            if "created_at" in missing:
+                parts.append("ADD COLUMN IF NOT EXISTS created_at timestamptz NOT NULL DEFAULT now()")
+            conn.execute(text(f"ALTER TABLE public.xai_reports {', '.join(parts)};"))
+


⚠️ Potential issue | 🔴 Critical

ALTER TABLE fails immediately if the table does not exist.

In a fresh environment where the public.xai_reports table does not exist at all, the ALTER TABLE on Line 44 raises an UndefinedTable exception and the pipeline stops right away. The information schema query simply returns an empty result, so with the current code the table is never created. Please run CREATE TABLE IF NOT EXISTS first to create the base schema, and only then add any missing columns.

     with engine.begin() as conn:
+        conn.execute(text("""
+            CREATE TABLE IF NOT EXISTS public.xai_reports (
+                ticker varchar(20) NOT NULL,
+                signal varchar(10) NOT NULL,
+                price numeric(10,2) NOT NULL,
+                date date NOT NULL,
+                report text,
+                created_at timestamptz NOT NULL DEFAULT now()
+            );
+        """))
         cols = conn.execute(text("""
             SELECT column_name FROM information_schema.columns
             WHERE table_schema='public' AND table_name='xai_reports';
         """)).fetchall()

🧰 Tools

🪛 Ruff (0.14.3)

37-37: Multiple statements on one line (colon)

(E701)

38-38: Multiple statements on one line (colon)

(E701)

39-39: Multiple statements on one line (colon)

(E701)

40-40: Multiple statements on one line (colon)

(E701)

41-41: Multiple statements on one line (colon)

(E701)

🤖 Prompt for AI Agents

In AI/libs/utils/save_reports_to_db.py around lines 27 to 45, the ALTER TABLE will fail if the public.xai_reports table does not exist; before querying/altering columns run a CREATE TABLE IF NOT EXISTS public.xai_reports (...) with the minimal required columns and defaults (ticker, signal, price, date, report, created_at) to ensure the table exists, then re-query information_schema and only run ALTER TABLE to add any missing columns; keep these statements inside the same engine.begin() context to preserve atomicity and avoid raising UndefinedTable.
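
The create-then-alter order described above can be sketched as a pure function that only builds the SQL statements, which makes the ordering easy to test without a database. The helper name `build_schema_statements` and the `COLUMN_DDL` mapping are assumptions for illustration; the column definitions are copied from the diff in this comment.

```python
from typing import Dict, Iterable, List

# Column definitions taken from the review diff for public.xai_reports.
COLUMN_DDL: Dict[str, str] = {
    "ticker": "varchar(20) NOT NULL",
    "signal": "varchar(10) NOT NULL",
    "price": "numeric(10,2) NOT NULL",
    "date": "date NOT NULL",
    "report": "text",
    "created_at": "timestamptz NOT NULL DEFAULT now()",
}


def build_schema_statements(existing: Iterable[str]) -> List[str]:
    """Return the statements to run in order: CREATE first, then ALTER if needed."""
    columns = ", ".join(f"{name} {ddl}" for name, ddl in COLUMN_DDL.items())
    statements = [f"CREATE TABLE IF NOT EXISTS public.xai_reports ({columns});"]
    present = set(existing)
    missing = [name for name in COLUMN_DDL if name not in present]
    if missing:
        parts = ", ".join(f"ADD COLUMN IF NOT EXISTS {name} {COLUMN_DDL[name]}" for name in missing)
        statements.append(f"ALTER TABLE public.xai_reports {parts};")
    return statements
```

One caveat worth checking before relying on the ALTER path: in PostgreSQL, adding a NOT NULL column without a default to a table that already contains rows fails, so legacy tables with data may need a default or a backfill step.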

coderabbitai · 2025-11-05T08:04:14Z

AI/transformer/main.py

+    weights_path = str(candidate)
+    if candidate.exists():
+
+            print(f"[TRANSFORMER] weights_path set: {weights_path}")

     if not weights_path:
-        _log("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
-        _log("  config example: {'transformer': {'weights_path': 'weights/inital.weights.h5'}}")
+        print("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
+        print("  config example: {'transformer': {'weights_path': 'weights/initial.weights.h5'}}")


⚠️ Potential issue | 🟠 Major

Logic error: weights_path is always set, so the warning never runs.

Line 65 sets weights_path = str(candidate) unconditionally, so the if not weights_path check on Line 70 is always false. As a result, the warning messages on Lines 71-72 are never printed.

Handling for the case where the file does not exist is also missing.

Change it as follows:

     PROJECT_ROOT = Path(__file__).resolve().parents[1]
     weights_dir = PROJECT_ROOT / "transformer" / "weights"
     candidate = weights_dir / "initial.weights.h5"
-    weights_path = str(candidate)
     if candidate.exists():
-
-            print(f"[TRANSFORMER] weights_path set: {weights_path}")
-
-    if not weights_path:
-        print("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
-        print("  config example: {'transformer': {'weights_path': 'weights/initial.weights.h5'}}")
+        weights_path = str(candidate)
+        print(f"[TRANSFORMER] weights_path set: {weights_path}")
+    else:
+        weights_path = None
+        print(f"[TRANSFORMER][WARN] weights file does not exist: {candidate}")
+        print("[TRANSFORMER][WARN] inference may run with random initialization (lower quality).")
+        print("  Place the weights file at: transformer/weights/initial.weights.h5")

🤖 Prompt for AI Agents

In AI/transformer/main.py around lines 65 to 72, the code sets weights_path = str(candidate) unconditionally so the subsequent if not weights_path check never triggers and there is no handling when the file doesn't exist; change the logic to first check whether candidate is truthy and candidate.exists(), only then set weights_path = str(candidate) and print the “[TRANSFORMER] weights_path 설정됨” message; otherwise set weights_path to None (or empty string) and print the warning lines about missing weights and example config; ensure you also handle the case candidate is None or not a Path-like object before calling exists().

Can you change this yourself? Please fix it.

@twq110, sorry, but I can't commit code directly. Here is the corrected code instead. Please apply it:

    # 1) Resolve weights_path
    PROJECT_ROOT = Path(__file__).resolve().parents[1]
    weights_dir = PROJECT_ROOT / "transformer" / "weights"
    candidate = weights_dir / "initial.weights.h5"
    if candidate.exists():
        weights_path = str(candidate)
        print(f"[TRANSFORMER] weights_path set: {weights_path}")
    else:
        weights_path = None
        print(f"[TRANSFORMER][WARN] weights file does not exist: {candidate}")
        print("[TRANSFORMER][WARN] inference may run with random initialization (lower quality).")
        print("  Place the weights file at: transformer/weights/initial.weights.h5")

Changes:

Check candidate.exists() first

Set weights_path only when the file exists

When the file is missing, set weights_path = None and print a warning

Fix the stray indentation

With this change, an appropriate warning is shown when the weights file is missing, and run_inference can handle the None value.
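
The same check can also be factored into a small helper that is easy to unit test. This is a sketch under assumptions: `resolve_weights_path` is a hypothetical name, and the directory layout (`transformer/weights/initial.weights.h5` under the project root) is taken from the snippet above.

```python
from pathlib import Path
from typing import Optional


def resolve_weights_path(project_root: Path, filename: str = "initial.weights.h5") -> Optional[str]:
    """Return the weights path as a string if the file exists, otherwise None."""
    candidate = project_root / "transformer" / "weights" / filename
    if candidate.is_file():
        return str(candidate)
    # Warn explicitly so a silent fallback to random initialization is visible in the output.
    print(f"[TRANSFORMER][WARN] weights file not found: {candidate}")
    return None
```

A caller would pass `Path(__file__).resolve().parents[1]` as `project_root` and forward the result to the inference function, which already treats a missing path as "use random initialization".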

coderabbitai · 2025-11-05T08:04:14Z

AI/transformer/modules/__init__.py

@@ -1,3 +1,3 @@
-# AI/finder/__init__.py
+# AI/tranformer/modules/__init__.py


⚠️ Potential issue | 🟡 Minor

The header comment has a typo and a BOM character.

There are two problems with the header comment:

Typo: "tranformer" → "transformer"

There is a hidden BOM character at the start of the file

Change it as follows:

-# AI/tranformer/modules/__init__.py
+# AI/transformer/modules/__init__.py

To remove the BOM character, save the file as UTF-8 without a BOM (in most editors, choose "UTF-8" or "UTF-8 without BOM").

🤖 Prompt for AI Agents

In AI/transformer/modules/__init__.py around line 1, the header comment has a typo ("tranformer" → "transformer") and a leading BOM character; remove the BOM by saving the file as UTF-8 without BOM and correct the header text to "AI/transformer/modules/__init__.py" (fix the spelling only in the header comment), then re-save the file ensuring no BOM remains.

twq110

Yep.
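
If re-saving in an editor is inconvenient, the BOM can also be removed with a few lines of Python. The sketch below is an illustration rather than part of the PR; `strip_utf8_bom` is a hypothetical helper that rewrites the file only when the UTF-8 BOM bytes are actually present.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file; return True if one was removed."""
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Write back everything after the three BOM bytes (EF BB BF).
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Running it over `AI/transformer/modules/__init__.py` and `AI/configs/config.json` would cover the two files where this review mentions a BOM.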

twq110 · 2025-11-06T01:21:04Z

AI/libs/core/pipeline.py

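+    #end_date = _utcnow() # use data up to today's close in Korean time; uncomment when running on the server
+    end_date = datetime.strptime("2024-10-30", "%Y-%m-%d") # temporary fixed date
     start_date = end_date - timedelta(days=600)


This will be changed later.
Please file an issue for it.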

twq110 · 2025-11-06T01:28:41Z

AI/transformer/main.py

+    weights_path = str(candidate)
+    if candidate.exists():
+
+            print(f"[TRANSFORMER] weights_path set: {weights_path}")

     if not weights_path:
-        _log("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
-        _log("  config example: {'transformer': {'weights_path': 'weights/inital.weights.h5'}}")
+        print("[TRANSFORMER][WARN] weights_path not set → inference may run with random initialization and no weights (lower quality).")
+        print("  config example: {'transformer': {'weights_path': 'weights/initial.weights.h5'}}")


Can you change this yourself? Please fix it.

twq110 added 6 commits November 1, 2025 23:01

[AI] SISC2-39 [FiX] Fix typo in weights file path

30d75f1

[AI] SISC2-39 [REFACTOR] Change DB connection method

17d8dfb

Now you only need to pass the DB name to get_db_conn() to get the matching db key

[AI] SISC2-39 [FEAT] Initial weight training

8e4bf76

[AI] SISC2-39 [FIX] Fix DB connection method

b5d04b6

[AI] SISC2-39 [FIX] Change XAI report from string to tuple

67b5935

[AI] SISC2-39 [FIX] Typo fix

3c62645

twq110 requested a review from Kosw6 as a code owner November 5, 2025 07:59

coderabbitai bot reviewed Nov 5, 2025

View reviewed changes

discipline24 and others added 2 commits November 5, 2025 17:28

Merge branch 'main' into SISC2-39-AI-석재빈-최종-파이프라인-검증

d9de0e2

[AI] SISC2-39 [FIX] Typo fix

0e8f353

twq110 commented Nov 6, 2025

View reviewed changes

twq110 merged commit 9e68e24 into main Nov 6, 2025
1 check passed

twq110 deleted the SISC2-39-AI-석재빈-최종-파이프라인-검증 branch November 6, 2025 01:43

coderabbitai bot mentioned this pull request Nov 6, 2025

Remove hard-coded end_date in pipeline #82

Open

twq110 changed the title ~~Sisc2 39 ai 석재빈 Final pipeline verification~~ [AI] SISC2-39 [FIX] Final pipeline verification Nov 6, 2025

coderabbitai bot mentioned this pull request Nov 6, 2025

[AI] SISC2-40 [FIX] Fix inference error #83

Merged

coderabbitai bot mentioned this pull request Nov 13, 2025

[AI] Sisc2 46 ai Write asset allocation training code #101

Merged

coderabbitai bot mentioned this pull request Nov 20, 2025

[AI] SISC - 42 [FEAT] Add ETL collection for daily stock prices, financial statements, and macro data #127

Closed

coderabbitai bot mentioned this pull request Jan 16, 2026

Refact/ai part refactoring #180

Merged
