[AI] SISC-185 [FEAT] Write basic reinforcement learning code by twq110 · Pull Request #186 · SISC-IT/sisc-web

twq110 · 2026-01-17T04:19:10Z

Applied reinforcement learning to portfolio rebalancing and trading.
Further refinement and testing are planned to improve trading returns.

Summary by CodeRabbit

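New features
- Added a reinforcement-learning-based trading agent and training script
- Added single-symbol and portfolio backtest runners (with plots and reports)
- Added a Gym-based trading environment
Bug fixes
- Strengthened data normalization and NaN/infinity handling
- Integrated the adjusted close (adjusted_close) and added a fallback for when it is missing
- Hardened boundary-value handling in technical indicator calculations (RSI, etc.)
Refactoring
- Cleaned up and reorganized the public API of the trader, strategy, and backtest packages
- Improved the simulator's initial balance handling and its reward/state return logic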
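
The RSI boundary handling listed above is not shown on this page. A minimal sketch of one common guard, assuming a simple-average RSI over the last `period` price changes (the function name `rsi_guarded` and the simple-average variant are assumptions for illustration, not the code in `features.py`):

```python
def rsi_guarded(closes, period=14):
    """Simple-average RSI over the last `period` price changes, with boundary guards."""
    if len(closes) < period + 1:
        return None  # not enough history to compute `period` changes
    start = len(closes) - period
    changes = [closes[i] - closes[i - 1] for i in range(start, len(closes))]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses: RS would divide by zero. Flat prices map to 50, pure gains to 100.
        return 50.0 if avg_gain == 0 else 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

Returning a neutral 50 for a flat window is a design choice in this sketch; some implementations prefer to return NaN there and clean it up afterwards.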


coderabbitai · 2026-01-17T04:19:20Z

Walkthrough

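Strengthens adjusted close (adjusted_close) handling, restructures the single/portfolio backtests (Backtrader), adds a reinforcement learning environment and a PPO training script, cleans up existing strategy classes and introduces new ones, and modifies parts of the signal indicator calculation flow and the simulator's reward and initialization behavior.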

Changes

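Cohort / File(s)	Summary of changes
Data loader and indicators `AI/libs/database/fetcher.py`, `AI/modules/signal/core/features.py`	Always converts the date column to datetime; fills NaN values in adjusted_close when it is present, or synthesizes it from close; selects the final columns from the desired columns that exist, preserving order; indicator calculations prefer adjusted_close, clean up infinite/NaN values, and add RSI guard logic
Backtest package exports `AI/modules/trader/backtest/__init__.py`	Re-exports the portfolio/single backtester runner functions (`run_portfolio_backtest`, `run_single_backtest`)
Portfolio backtest added/replaced `AI/modules/trader/backtest/run_portfolio.py`, removed `.../run_portfolio.py`	Adds `AIPortfolioStrategy` and `run_backtest()`, which perform lazy model loading, dynamic feature selection, portfolio allocation calculation, and rebalancing; the entire previous module is deleted
Single-asset backtest added/replaced `AI/modules/trader/backtest/run_backtrader_single.py`, removed `.../run_backtrader_single.py`	Adds a single-asset walk-forward, model-based backtester including AIScoreObserver, TransformerWalkForwardStrategy, and `run_single_backtest()`; the previous implementation file is deleted
Core and strategy package initialization `AI/modules/trader/core/__init__.py`, `AI/modules/trader/strategies/__init__.py`	Re-exports TradingAccount, Simulator, RuleBasedStrategy, calculate_portfolio_allocation, and others at the package level
Strategy implementation changes `AI/modules/trader/strategies/rule_based.py`, `AI/modules/trader/strategies/rl_agent.py`	RuleBasedStrategy converted to a class (returns an action based on a simple score); adds the PPO-model-based `RLAgentStrategy` (model load/fallback, get_action)
Reinforcement learning environment and training script `AI/modules/trader/train/rl_env.py`, `AI/modules/trader/train/train_ppo.py`	Adds a Gymnasium-compatible StockTradingEnv (observation/action mapping, step/reset); adds a Stable-Baselines3 PPO training script (train_agent) with model saving
Simulator/account changes `AI/modules/trader/core/simulator.py`	Adds an initial_balance argument to the Simulator constructor and uses it to initialize/reset the account; the step reward calculation now guards against a missing previous asset value and division by zero, and the return value changes (next_state or None)
Dependency updates `AI/requirements.txt`	Adds `stable-baselines3`, `shimmy`, `gymnasium`, and others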
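
The simulator row above mentions guarding the step reward against a missing previous asset value and division by zero. A minimal sketch of such a guard, assuming the reward is the fractional change in total assets (the reward definition and the name `step_reward` are assumptions; the actual formula in `simulator.py` is not shown here):

```python
def step_reward(prev_asset, curr_asset):
    """Fractional change in total assets, or 0.0 when it is undefined."""
    # Guard 1: no previous value yet (for example, the first step after a reset).
    # Guard 2: a previous value of zero would make the ratio divide by zero.
    if prev_asset is None or prev_asset == 0:
        return 0.0
    return (curr_asset - prev_asset) / prev_asset
```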

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant train_ppo as train_ppo()
    participant Env as StockTradingEnv
    participant Loader as SignalDataLoader
    participant Sim as Simulator
    participant PPO as PPO Model
    participant Storage as ModelStorage

    User->>train_ppo: train_agent()
    train_ppo->>Env: __init__()
    Env->>Loader: load historical data
    Loader-->>Env: OHLCV DataFrame
    train_ppo->>PPO: create MlpPolicy / learn()
    
    loop Episode
        PPO->>Env: reset()
        Env->>Sim: __init__ / reset
        Sim-->>Env: initial observation
        loop Step
            PPO->>Env: step(action)
            Env->>Sim: execute mapped action
            Sim-->>Env: reward, done
            Env-->>PPO: observation, reward, done
        end
    end

    train_ppo->>Storage: save model
    Storage-->>User: model saved
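
The episode and step loops in the diagram above can be sketched without any RL library. The class below is a toy stand-in, not the PR's `StockTradingEnv`: it only mirrors the Gymnasium return shapes (`reset()` returns `(observation, info)`, `step()` returns `(observation, reward, terminated, truncated, info)`), and the three-action mapping and all-in position sizing are assumptions for illustration.

```python
class ToyTradingEnv:
    """Toy stand-in that follows the Gymnasium reset/step return shapes."""

    def __init__(self, prices, initial_balance=10_000.0):
        self.prices = prices
        self.initial_balance = initial_balance

    def reset(self, seed=None):
        self.t = 0
        self.cash = self.initial_balance
        self.shares = 0.0
        return self._obs(), {}

    def _obs(self):
        return (self.prices[self.t], self.cash, self.shares)

    def _total(self):
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        # Assumed mapping: 0 = hold, 1 = buy with all cash, 2 = sell everything.
        price = self.prices[self.t]
        before = self._total()
        if action == 1 and self.cash > 0:
            self.shares += self.cash / price
            self.cash = 0.0
        elif action == 2 and self.shares > 0:
            self.cash += self.shares * price
            self.shares = 0.0
        self.t += 1  # advance one bar, then value the account at the new price
        reward = (self._total() - before) / before if before else 0.0
        terminated = self.t >= len(self.prices) - 1
        return self._obs(), reward, terminated, False, {}


def run_episode(env, policy):
    """Run one episode and return the summed reward."""
    obs, _info = env.reset()
    total_reward, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated
    return total_reward
```

In the PR itself the policy is a PPO model and the environment is wrapped for Stable-Baselines3; here `policy` is any callable that maps an observation to an action, for example `lambda obs: 1`.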

sequenceDiagram
    actor User
    participant run_backtest as run_backtest()
    participant Loader as SignalDataLoader
    participant TI as add_technical_indicators
    participant Strategy as AIPortfolioStrategy
    participant Model as TransformerModel (lazy)
    participant Alloc as calculate_portfolio_allocation
    participant Cerebro as Backtrader

    User->>run_backtest: run_backtest()
    run_backtest->>Loader: load data for tickers
    Loader-->>run_backtest: OHLCV per ticker
    run_backtest->>TI: apply indicators
    TI-->>run_backtest: features added
    run_backtest->>Cerebro: initialize and attach Strategy
    loop On each bar
        Cerebro->>Strategy: next()
        Strategy->>Model: lazy load on first need
        Model-->>Strategy: predict scores
        Strategy->>Alloc: compute target weights
        Alloc-->>Strategy: target weights
        Strategy->>Cerebro: order_target_percent per ticker
    end
    run_backtest->>User: print final metrics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

[Feature][AI] Add RL (reinforcement learning) code #185: The added reinforcement learning components (StockTradingEnv, RLAgent, training script) connect directly to the goal of that issue.

Possibly related PRs

Refact/ai part refactoring #180: Closely related at the code level in the data loader and indicator changes (adjusted_close handling) and in the features integration.

Suggested reviewers

Kosw6

Poem

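🐇 A little rabbit hopping into the field says,
the adjusted close has drawn the path,
the backtester dances and lines up its positions,
RL learns and whispers its decisions,
one carrot in the code field, and good luck to everyone's debugging!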

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.93% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title, "Write basic reinforcement learning code", clearly summarizes the main change and conveys its core well.


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@AI/modules/trader/backtest/run_portfolio.py`:
- Around line 120-130: The condition that prevents continuing when the model or
features failed initialization is wrong: change the guard in run_portfolio from
using logical AND to logical OR so it returns whenever either self.model or
self.feature_columns is None; locate the check that currently reads "if
self.model is None and self.feature_columns is None: return" and update it to
"if self.model is None or self.feature_columns is None: return" to avoid calling
calculate_portfolio_allocation with a missing model (references:
_initialize_model, self.model, self.feature_columns,
calculate_portfolio_allocation).
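
The difference between the two guards is easiest to see side by side. A minimal sketch (the helper names are hypothetical; in the PR the check is an inline `if` inside the strategy):

```python
def should_skip_and(model, feature_columns):
    # Original guard: skips only when BOTH are missing.
    return model is None and feature_columns is None


def should_skip_or(model, feature_columns):
    # Suggested guard: skips when EITHER is missing.
    return model is None or feature_columns is None


# The case the review describes: model failed to load, feature columns are set.
print(should_skip_and(None, ["close"]))  # False: execution continues with a None model
print(should_skip_or(None, ["close"]))   # True: execution stops early
```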

In `@AI/modules/trader/strategies/rl_agent.py`:
- Around line 20-25: Wrap the PPO.load(model_path) call in a try-except inside
the RL agent initialization (the code block in
AI/modules/trader/strategies/rl_agent.py that assigns self.model) so that if
PPO.load raises ValueError, KeyError, RuntimeError or other Exception you catch
it, log or print the exception details and then set self.model = None and emit
the existing fallback message (e.g., "⚠️ RL 모델 파일 없음... (랜덤 모드로 동작)"). Ensure
the try block contains self.model = PPO.load(model_path) and the except block
handles the exceptions, prints the error + context, and leaves the agent in
random mode.
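
A minimal sketch of the fallback pattern described above. To stay runnable without Stable-Baselines3, the loader is passed in as a callable; in the agent it would be `PPO.load`. The helper name `load_model_or_none` and the log wording are assumptions for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def load_model_or_none(loader, model_path):
    """Call loader(model_path); on any failure, log it and return None (random mode)."""
    try:
        return loader(model_path)
    except Exception as exc:
        # Broad on purpose: corrupted or incompatible files can raise varied errors.
        logger.warning(
            "RL model load failed for %s: %s (falling back to random mode)",
            model_path,
            exc,
        )
        return None
```

The caller then assigns the result to `self.model` and treats `None` as the signal to act randomly.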

In `@AI/modules/trader/train/rl_env.py`:
- Around line 19-31: The env's initial_balance isn't being propagated: update
StockTradingEnv.__init__ to pass the initial_balance into the Simulator
constructor, then modify Simulator.__init__ to accept an initial_balance
parameter (store it on self) and use that value when instantiating
TradingAccount; also update Simulator.reset to accept/use self.initial_balance
(or an optional initial_balance arg) when recreating TradingAccount so resets
respect the passed initial balance. Ensure you reference
StockTradingEnv.__init__, Simulator.__init__, Simulator.reset, and
TradingAccount when making these changes.
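
A minimal sketch of the propagation described above, using stripped-down stand-ins for the three classes. Only the `initial_balance` plumbing is shown; the default value and the `cash` attribute are assumptions, and the real classes carry much more state.

```python
class TradingAccount:
    def __init__(self, initial_balance):
        self.cash = initial_balance


class Simulator:
    def __init__(self, initial_balance=10_000_000):
        # Store the balance so reset() can recreate the account with the same value.
        self.initial_balance = initial_balance
        self.account = TradingAccount(initial_balance)

    def reset(self, initial_balance=None):
        if initial_balance is not None:
            self.initial_balance = initial_balance
        self.account = TradingAccount(self.initial_balance)
        return self.account


class StockTradingEnv:
    def __init__(self, initial_balance=10_000_000):
        # Pass the value through instead of letting Simulator use its own default.
        self.sim = Simulator(initial_balance=initial_balance)
```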

🧹 Nitpick comments (11)

AI/libs/database/fetcher.py (2)
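14-14: The interval parameter is declared but never used.

The interval parameter is not actually used in the function body. The docstring says "only daily bars are currently supported", but the mere presence of the parameter can suggest that other intervals are supported as well.

If there is no plan to extend this, consider removing the parameter or raising an error on an invalid interval.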

Also applies to: 24-24
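
One way to implement the second option is to reject unsupported values up front. A minimal sketch, assuming daily bars are labeled `"1d"` (that label, the constant, and the function name are hypothetical; the fetcher's actual signature is not shown here):

```python
SUPPORTED_INTERVALS = ("1d",)  # assumed label for daily bars


def validate_interval(interval="1d"):
    """Return the interval unchanged, or raise if it is not supported."""
    if interval not in SUPPORTED_INTERVALS:
        raise ValueError(
            f"Unsupported interval {interval!r}; supported: {SUPPORTED_INTERVALS}"
        )
    return interval
```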

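65-68: Consider logging a warning for missing columns.

The current logic silently skips columns that are in desired_cols but absent from the actual data. This can make debugging difficult when the DB schema changes or a data problem occurs.
♻️ Optional improvement: add logging for missing columns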
+    import logging
+    logger = logging.getLogger(__name__)
+
     # 컬럼 순서 정리
     desired_cols = ["ticker", "date", "open", "high", "low", "close", "adjusted_close", "volume"]
     cols_present = [c for c in desired_cols if c in df.columns]
+    missing_cols = set(desired_cols) - set(cols_present)
+    if missing_cols:
+        logger.warning(f"Missing columns in fetched data for {ticker}: {missing_cols}")
     df = df.loc[:, cols_present]
AI/modules/signal/core/features.py (1)

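73-76: A note on the data cleaning strategy

With the current bfill(), the first 59 rows of ma60 are filled with the value of the 60th row. This can create an artificial pattern in the early data.

For reinforcement learning training, consider excluding this initial span or setting a longer warm-up period. The current approach is also common, so this change is not required.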
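
The alternative to back-filling is to emit the moving average only for rows that have a full window and start training from the first such row. A minimal pure-Python sketch (the function name is hypothetical; the project computes its indicators on DataFrames):

```python
def moving_average_with_warmup(values, window):
    """Return (index, mean) pairs only for rows that have a full window behind them."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append((i, sum(chunk) / window))
    return out


# With window=3 the first two rows are skipped instead of being back-filled.
print(moving_average_with_warmup([1, 2, 3, 4], 3))  # [(2, 2.0), (3, 3.0)]
```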
AI/modules/trader/backtest/run_backtrader_single.py (1)
121-129: The decision['amount'] value is not used.

RuleBasedStrategy.get_action() returns amount: 0.99, but the actual buy uses a hardcoded 0.95. For consistency, it is better to use decision['amount'].
♻️ Suggested fix
         if decision['type'] == 'BUY':
-            # 보유 현금의 95%만큼 매수 계산 (Backtrader 로직)
             cash = self.broker.get_cash()
             price = self.datas[0].close[0]
-            # 수수료 고려하여 안전하게 계산
-            size = int((cash * 0.95) / price)
+            # decision['amount']를 활용하여 매수 비율 결정
+            buy_ratio = decision['amount'] * 0.95  # 수수료 버퍼 적용
+            size = int((cash * buy_ratio) / price)
             if size > 0:
                 self.log(f"BUY 신호 (Score: {score:.2f})")
                 self.order = self.buy(size=size)
AI/modules/trader/backtest/run_portfolio.py (3)
89-93: Please rename the ambiguous variable l.

The variable l can be confused with the digit 1. For readability, rename it to low.
♻️ Suggested fix
                 o = d.open.get(ago=0, size=fetch_len)
                 h = d.high.get(ago=0, size=fetch_len)
-                l = d.low.get(ago=0, size=fetch_len)
+                low = d.low.get(ago=0, size=fetch_len)
                 c = d.close.get(ago=0, size=fetch_len)
                 v = d.volume.get(ago=0, size=fetch_len)
                 
                 if len(o) < fetch_len: continue

-                df = pd.DataFrame({'open': o, 'high': h, 'low': l, 'close': c, 'volume': v})
+                df = pd.DataFrame({'open': o, 'high': h, 'low': low, 'close': c, 'volume': v})
115-116: Add logging when an exception occurs.

Exceptions are currently swallowed silently, which can make debugging difficult. At minimum, log a warning.
♻️ Suggested fix
-            except Exception:
+            except Exception as e:
+                print(f"⚠️ {ticker} 데이터 처리 실패: {e}")
                 continue
125-130: Prefix the unused variable scores with an underscore.

The scores variable is not used. If it is ignored intentionally, renaming it to _scores avoids the linter warning.
♻️ Suggested fix
-        target_weights, scores = calculate_portfolio_allocation(
+        target_weights, _scores = calculate_portfolio_allocation(
             data_map=data_map,
             model=self.model,
             feature_columns=self.feature_columns,
             config=self.p.strategy_config
         )
AI/requirements.txt (1)

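17-20: Please improve compatibility management for the RL dependencies.

Version compatibility matters when adding stable-baselines3, shimmy, and gymnasium. Stable-Baselines3 v2.x requires Gymnasium, and Shimmy v2.0+ is compatible with Gymnasium 1.x. There are currently no version constraints, so different environments may install different versions.

For reproducible builds, one of the following is recommended:

Use pip-compile or Poetry to generate requirements-lock.txt or poetry.lock (including all transitive dependencies)

Or add minimum version constraints: stable-baselines3>=2.0, gymnasium>=1.0, shimmy>=2.0 (as needed)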
AI/modules/trader/train/train_ppo.py (2)
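13-17: Modifying sys.path directly risks conflicts in deployment/runtime environments.

Switching to module execution or package installation, so that no path injection is needed, is more stable.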

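21-55: Moving the training parameters and save path into function arguments improves reusability.

The ticker, period, step count, and save path are currently hardcoded, which is inconvenient for repeated experiments. It is safer to keep the defaults but turn them into arguments.
♻️ Example fix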
-def train_agent():
+def train_agent(
+    ticker="AAPL",
+    start_date="2020-01-01",
+    end_date="2023-12-31",
+    total_timesteps=50_000,
+    save_dir=None,
+):
     print("🚀 [RL] PPO 트레이딩 에이전트 학습 시작")
-    
-    ticker = "AAPL" # 학습할 종목
-    start_date = "2020-01-01"
-    end_date = "2023-12-31"
+    # 학습 대상/기간 기본값은 인자로 관리
     
     # 1. 환경 생성 (벡터화된 환경)
     env = DummyVecEnv([lambda: StockTradingEnv(ticker, start_date, end_date)])
@@
-    total_timesteps = 50_000
     print(f"   - 학습 기간: {start_date} ~ {end_date}")
     print(f"   - 총 스텝: {total_timesteps}")
@@
-    save_dir = os.path.join(project_root, "AI", "data", "weights")
+    save_dir = save_dir or os.path.join(project_root, "AI", "data", "weights")
AI/modules/trader/train/rl_env.py (1)

17-18: Consider adding a ClassVar type hint to metadata.

This reduces the static analysis warning (RUF012) and makes the intent explicit.
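
A minimal sketch of the annotation, assuming the environment declares a `render_modes` entry (the dictionary contents are an assumption; the real class also subclasses the Gymnasium environment base class, which is omitted here so the sketch runs on its own):

```python
from typing import ClassVar


class StockTradingEnv:
    # Class-level constant shared by all instances; ClassVar tells linters and
    # type checkers that this mutable default is not a per-instance field.
    metadata: ClassVar[dict] = {"render_modes": ["human"]}
```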

coderabbitai · 2026-01-17T04:24:14Z

AI/modules/trader/backtest/run_portfolio.py

+        # 모델 로드 실패 시 중단 방지
+        if self.model is None and self.feature_columns is None:
+             return
+
+        # 2. 전략 코어 호출
+        target_weights, scores = calculate_portfolio_allocation(
+            data_map=data_map,
+            model=self.model,
+            feature_columns=self.feature_columns,
+            config=self.p.strategy_config
+        )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's explore the repository structure
find . -type f -name "run_portfolio.py" | head -20

Repository: SISC-IT/sisc-web

Length of output: 105

🏁 Script executed:

#!/bin/bash
# List files in the repository to understand structure
git ls-files | grep -E "(trader|portfolio)" | head -30

Repository: SISC-IT/sisc-web

Length of output: 617

🏁 Script executed:

#!/bin/bash
# Find Python files in AI/modules directory
fd -e py "trader" | head -20

Repository: SISC-IT/sisc-web

Length of output: 139

🏁 Script executed:

#!/bin/bash
# Read the run_portfolio.py file to understand the actual code
wc -l AI/modules/trader/backtest/run_portfolio.py

Repository: SISC-IT/sisc-web

Length of output: 107

🏁 Script executed:

#!/bin/bash
# Read the entire run_portfolio.py file
cat -n AI/modules/trader/backtest/run_portfolio.py

Repository: SISC-IT/sisc-web

Length of output: 9474

🏁 Script executed:

#!/bin/bash
# Read the portfolio_logic.py file to see calculate_portfolio_allocation
wc -l AI/modules/trader/strategies/portfolio_logic.py

Repository: SISC-IT/sisc-web

Length of output: 110

🏁 Script executed:

#!/bin/bash
# Read portfolio_logic.py to understand calculate_portfolio_allocation
cat -n AI/modules/trader/strategies/portfolio_logic.py

Repository: SISC-IT/sisc-web

Length of output: 3792

🏁 Script executed:

#!/bin/bash
# Verify the exact condition and context around line 121-122
sed -n '104,130p' AI/modules/trader/backtest/run_portfolio.py | cat -n

Repository: SISC-IT/sisc-web

Length of output: 1232

🏁 Script executed:

#!/bin/bash
# Double-check the _initialize_model return behavior
sed -n '43,71p' AI/modules/trader/backtest/run_portfolio.py | cat -n

Repository: SISC-IT/sisc-web

Length of output: 1340

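Logic error in the condition on lines 121-122: AND must be changed to OR.

If _initialize_model returns None when an exception occurs, self.model is None while self.feature_columns has already been set (line 107). The current and condition on lines 121-122 returns only when both are None, so when the model is None but the feature columns are set, the check passes and line 125 passes a None model to calculate_portfolio_allocation.

calculate_portfolio_allocation handles a None model with its if model: check on lines 63-68, so an immediate failure is avoided, but the condition does not match the intent.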

if self.model is None or self.feature_columns is None: return

🧰 Tools

🪛 Ruff (0.14.13)

125-125: Unpacked variable scores is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


AI/modules/trader/strategies/rl_agent.py

AI/modules/trader/train/rl_env.py

Add initial balance parameter to Simulator instantiation

twq110 added 2 commits January 17, 2026 13:14

[AI] SISC-185 [FEAT] Write basic RL code and fix adjusted close code

1b56751

[AI] SISC-185 [FEAT] Write RL training code

c086151

twq110 requested a review from Kosw6 as a code owner January 17, 2026 04:19

coderabbitai bot reviewed Jan 17, 2026


twq110 added 2 commits January 17, 2026 14:17

Update Simulator initialization with initial balance

0f7c59c

Add initial balance parameter to Simulator instantiation

[AI] SISC-185 [FIX] Fix initial funds initialization bug

97c8a08

twq110 merged commit c701355 into main Jan 17, 2026
1 check passed

This was referenced Jan 24, 2026

Feat/sisc 185 ai transformer model upgrade #194

Merged

[AI] SISC-195 Calculation process feature code fix and refactoring #201

Open

twq110 merged 4 commits into main from feat/SISC-185-AI-RL-trading-code