[AI] [FEAT]: Build a Kaggle-based automated training pipeline #344

Open
vmgfh878-art wants to merge 7 commits into main from 20260326-#343-AI-kaggle-pipeline

Conversation

@vmgfh878-art

@vmgfh878-art vmgfh878-art commented Mar 26, 2026

Overview

Deep-learning model training was impossible on the production server due to its 2 GB memory limit; this PR builds a pipeline that automates training on Kaggle GPUs.

Closes #343

Changes

Added

  • 5 pipeline scripts under AI/scripts/
  • AI/pipelines/weekly_routine.py
  • .github/workflows/train.yml
  • train_kaggle.py for 4 models
  • 4 model notebooks under AI/kaggle_notebooks/

Modified

  • PatchTST/wrapper.py (added get_signals, strip module. prefix)
  • TCN/wrapper.py (strip module. prefix)
  • artifact_paths.py (fixed PatchTST save paths)

Summary by CodeRabbit

  • New features

    • Weekly automated model retraining pipeline (scheduled plus manual trigger)
    • Kaggle training notebooks and standalone training scripts for TCN, iTransformer, PatchTST, and Transformer
    • Tooling that automatically downloads trained weights and deploys them to the remote server
  • Other

    • Extended .gitignore entries for AI data/model artifacts
    • Improved model-weight loading compatibility (supports the DataParallel save format) and hardened the training/deployment flow

@vmgfh878-art vmgfh878-art self-assigned this Mar 26, 2026
@vmgfh878-art vmgfh878-art added the AI (AI issue) label Mar 26, 2026
@coderabbitai

coderabbitai bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a weekly automated training/deployment pipeline plus the scripts, notebooks, and workflow for Kaggle-based model training and deployment. Implements the flow: DB→Parquet extraction, Kaggle dataset upload, notebook push and status polling, weight download, and server deployment over SSH/SCP.

Changes

Cohort / File(s) Summary
Workflow
.github/workflows/train.yml
Adds a GitHub Actions workflow with weekly (cron) and manual (workflow_dispatch) triggers; installs Python 3.10 and dependencies, sets up Kaggle credentials, runs AI/pipelines/weekly_routine.py, and handles the --skip-extract/--skip-upload inputs.
Pipeline / orchestration
AI/pipelines/weekly_routine.py, AI/scripts/extract_to_parquet.py, AI/scripts/upload_to_kaggle.py, AI/scripts/trigger_training.py, AI/scripts/download_weights.py, AI/scripts/deploy_to_server.py
Adds stepwise execution with abort-on-failure: DB→Parquet extraction (SSH tunnel/retries), Kaggle dataset version upload, notebook push and status polling (timeout/error handling), kernel output (weight) download, and SCP transfer to the remote server. Watch the SSH/private-key, timeout, and error handling.
Kaggle notebooks and metadata
AI/kaggle_notebooks/patchtst/notebook.ipynb, AI/kaggle_notebooks/patchtst/kernel-metadata.json, AI/kaggle_notebooks/transformer/notebook.ipynb, AI/kaggle_notebooks/transformer/kernel-metadata.json, AI/kaggle_notebooks/itransformer/notebook.ipynb, AI/kaggle_notebooks/itransformer/kernel-metadata.json, AI/kaggle_notebooks/TCN/notebook.ipynb, AI/kaggle_notebooks/TCN/kernel-metadata.json
Adds Kaggle notebooks and kernel metadata for PatchTST/TCN/Transformer/iTransformer, including external repo clone, dynamic imports, dependent-dataset declarations, and GPU/internet settings.
Per-model Kaggle training scripts
AI/modules/signal/models/PatchTST/train_kaggle.py, AI/modules/signal/models/TCN/train_kaggle.py, AI/modules/signal/models/itransformer/train_kaggle.py, AI/modules/signal/models/transformer/transformer_train_kaggle.py
Implements Parquet-based preprocessing, sequence building, model definition, and training loops (checkpointing, scaler and metadata saving). Data paths, scaler consistency, seeds, and device setup need review.
Model wrapper changes
AI/modules/signal/models/PatchTST/wrapper.py, AI/modules/signal/models/TCN/wrapper.py
Adds logic to strip the module. prefix from state_dicts saved via DataParallel. Simplifies the PatchTST get_signals signature and removes an unnecessary fallback implementation.
Artifact paths / ignore rules
AI/modules/signal/core/artifact_paths.py, .gitignore
Changes the PatchTST artifact basenames (patchtst_model.pt, patchtst_scaler.pkl) and adds ignore rules for AI data/model files (AI/data/kaggle_data/, *.parquet, *.pt, *.keras, *.pkl, kaggle.json, .kaggle/).
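The DataParallel compatibility change listed above (stripping the `module.` prefix from saved state_dicts) typically looks like the following. This is a minimal sketch of the pattern, not the wrapper's exact code:

```python
def strip_module_prefix(state_dict):
    """Remove the 'module.' prefix that torch.nn.DataParallel
    prepends to every key when a wrapped model is saved."""
    return {
        (k[len("module."):] if k.startswith("module.") else k): v
        for k, v in state_dict.items()
    }

# The function only manipulates dict keys, so it is framework-agnostic:
saved = {"module.encoder.weight": 1, "module.head.bias": 2, "scaler": 3}
print(strip_module_prefix(saved))
# → {'encoder.weight': 1, 'head.bias': 2, 'scaler': 3}
```

A wrapper would apply this before `model.load_state_dict(...)` so weights trained with DataParallel on Kaggle load on a single-GPU or CPU server.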

Sequence Diagram(s)

sequenceDiagram
    participant GHA as "GitHub Actions"
    participant Pipeline as "weekly_routine.py"
    participant DB as "PostgreSQL (SSH tunnel)"
    participant Parquet as "Parquet store\n(AI/data/kaggle_data)"
    participant Kaggle as "Kaggle (Datasets/Kernels)"
    participant Weights as "Local weights\n(AI/data/weights)"
    participant Server as "Production server\n(SSH/SCP)"

    GHA->>Pipeline: trigger (cron / workflow_dispatch)
    Pipeline->>DB: run extract_to_parquet.py (open SSH tunnel)
    DB-->>Parquet: write Parquet files
    Pipeline->>Kaggle: run upload_to_kaggle.py (datasets version)
    Pipeline->>Kaggle: run trigger_training.py (kernels push + status polling)
    Kaggle-->>Weights: kernel run produces weights as output
    Pipeline->>Kaggle: run download_weights.py (kaggle kernels output)
    Kaggle-->>Weights: download weights locally
    Pipeline->>Server: run deploy_to_server.py (SSH connect + SCP transfer)
    Server-->>Server: refresh model files
    Pipeline-->>GHA: report completion status
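The abort-on-failure orchestration shown in the diagram can be sketched as a simple step runner. The step names mirror the scripts in this PR, but the runner itself is illustrative, not the actual weekly_routine.py:

```python
import subprocess
import sys

# Illustrative step list; paths follow the PR's layout.
STEPS = [
    ("extract",  [sys.executable, "AI/scripts/extract_to_parquet.py"]),
    ("upload",   [sys.executable, "AI/scripts/upload_to_kaggle.py"]),
    ("train",    [sys.executable, "AI/scripts/trigger_training.py"]),
    ("download", [sys.executable, "AI/scripts/download_weights.py"]),
    ("deploy",   [sys.executable, "AI/scripts/deploy_to_server.py"]),
]

def run_pipeline(steps):
    """Run each step in order; stop at the first non-zero exit code."""
    for name, cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"[FAIL] step '{name}' exited with {result.returncode}")
            return False
    return True
```

Running a later step only when every earlier one succeeded is what lets the workflow expose a single pass/fail status back to GitHub Actions.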

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • discipline24
  • Kosw6

Poem

🐰 Hop, hop through the fields of code,
Planting Parquet seeds, floating notebook stars in the sky,
Once a week a model grows and the weights arrive,
Delivered to the server to predict once more,
Hop, hop, a bite of carrot to celebrate 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.11%, below the required threshold of 80.00%. Write docstrings for the functions that are missing them to meet the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the PR's main change: building a Kaggle-based automated training pipeline.
Linked Issues check ✅ Passed The PR satisfies all major goals of #343: the 5 scripts (extract_to_parquet, upload_to_kaggle, trigger_training, download_weights, deploy_to_server), weekly_routine.py, the GitHub Actions workflow, the 4 model notebooks, and the wrapper.py changes are all implemented.
Out of Scope Changes check ✅ Passed All changes relate directly to the Kaggle pipeline goal of #343. The .gitignore additions are needed to ignore new artifact files, and the artifact_paths.py change is an in-scope PatchTST path correction.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 20

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (2)
AI/kaggle_notebooks/TCN/notebook.ipynb-60-62 (1)

60-62: ⚠️ Potential issue | 🟡 Minor

Pin package versions.

Unpinned versions can cause reproducibility problems. In particular, the library_context manifest requires numpy<2.0, but the current install command may pull the latest (2.0+) release.

🔧 Suggested version pin
-"!pip install -q scikit-learn pandas numpy torch"
+"!pip install -q scikit-learn pandas 'numpy<2.0' torch"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/kaggle_notebooks/TCN/notebook.ipynb` around lines 60 - 62, The pip install
cell currently runs "!pip install -q scikit-learn pandas numpy torch" without
pinned versions; update that cell (the pip install command string) to pin
package versions to reproducible releases (e.g., ensure numpy is constrained to
<2.0 as required by the library_context/manifest and pin compatible versions of
scikit-learn, pandas, and torch) so the notebook installs deterministic,
manifest-compatible packages.
AI/scripts/deploy_to_server.py-76-77 (1)

76-77: ⚠️ Potential issue | 🟡 Minor

RSA keys only: Ed25519/ECDSA keys will fail

RSAKey.from_private_key() handles RSA keys only. Using an Ed25519 or ECDSA key raises an exception.

♻️ Support multiple key types
-    # Use the key string from the environment variable as a file-like object
-    private_key = paramiko.RSAKey.from_private_key(io.StringIO(SSH_KEY_STR))
+    # Support multiple key types
+    key_file = io.StringIO(SSH_KEY_STR)
+    private_key = None
+    for key_class in (paramiko.Ed25519Key, paramiko.ECDSAKey, paramiko.RSAKey):
+        try:
+            key_file.seek(0)
+            private_key = key_class.from_private_key(key_file)
+            break
+        except paramiko.SSHException:
+            continue
+    if private_key is None:
+        raise ValueError("Unsupported SSH key format.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/deploy_to_server.py` around lines 76 - 77, The current code only
calls paramiko.RSAKey.from_private_key(io.StringIO(SSH_KEY_STR)), which fails
for Ed25519/ECDSA keys; update the logic that creates private_key to try parsing
SSH_KEY_STR with multiple Paramiko key classes (e.g.,
paramiko.RSAKey.from_private_key, paramiko.ECDSAKey.from_private_key,
paramiko.Ed25519Key.from_private_key, and paramiko.DSSKey.from_private_key)
using io.StringIO(SSH_KEY_STR), catching exceptions and falling through until
one succeeds, and raise a clear error if none parse; reference the private_key
variable and the from_private_key calls to locate where to implement this
fallback parsing.
🧹 Nitpick comments (9)
AI/kaggle_notebooks/TCN/notebook.ipynb (1)

33-42: Add error handling for git clone failures.

The returncode of the subprocess.run() result is not checked, so a failed clone can surface as confusing errors in later cells.

♻️ Suggested error handling
 "result = subprocess.run(\n",
 "    ['git', 'clone', '--depth', '1',\n",
 "     'https://github.com/SISC-IT/sisc-web.git',\n",
 "     '/kaggle/working/sisc-web'],\n",
 "    capture_output=True, text=True\n",
 ")\n",
 "print(result.stdout)\n",
-"print(result.stderr)"
+"print(result.stderr)\n",
+"if result.returncode != 0:\n",
+"    raise RuntimeError(f'Git clone failed: {result.stderr}')"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/kaggle_notebooks/TCN/notebook.ipynb` around lines 33 - 42, The
subprocess.run call that clones the repo doesn't check for errors; update the
clone logic that uses subprocess.run and the result variable by either adding
check=True to subprocess.run or explicitly inspecting result.returncode (and
using result.stderr/result.stdout) after the call; if returncode != 0, log/print
a clear error message including result.stderr and raise an exception (e.g.,
RuntimeError or re-raise CalledProcessError) so downstream cells don't proceed
on a failed git clone.
.gitignore (1)

47-47: A global test_*.py ignore is too broad.

Because of line 47, new test files anywhere in the repository stay untracked. Tests added for the pipeline or models could never reach CI, so it is safer to scope the pattern to the specific directories that need throwaway scripts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore at line 47, the global pattern "test_*.py" currently ignores test
files across the entire repository, so they risk never reaching CI; remove the
"test_*.py" entry from .gitignore or narrow it to specific directories (e.g.,
"scripts/test_*.py" or "dev_tools/test_*.py"), verify that the changed pattern
only ignores the intended locations, then commit; reference: find and replace or
delete the "test_*.py" pattern in .gitignore.
AI/modules/signal/models/PatchTST/wrapper.py (1)

191-193: Keep the get_signals() signature aligned with the base contract.

It works as called today, but diverging from the BaseSignalModel signature weakens override and type-checking guarantees. Keeping ticker_id and sector_id explicit, as the other wrappers do, is safer for maintenance.

Example fix
-    def get_signals(self, df, **kwargs):
+    def get_signals(self, df: pd.DataFrame, ticker_id: int = 0, sector_id: int = 0) -> Dict[str, float]:
         """BaseSignalModel abstract method implementation - delegates to predict()"""
         return self.predict(df)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/PatchTST/wrapper.py` around lines 191 - 193, Update
the get_signals method to match the BaseSignalModel contract by adding the
explicit ticker_id and sector_id parameters (e.g., def get_signals(self, df,
ticker_id=None, sector_id=None, **kwargs)) while keeping the current behavior of
delegating to predict; ensure the method accepts those parameters and forwards
the dataframe to self.predict(df) (and any necessary kwargs) so
overrides/type-checking remain consistent with other wrappers and
BaseSignalModel.
AI/modules/signal/models/itransformer/train_kaggle.py (1)

326-327: Unnecessary int() call

len() already returns an integer.

-        "n_train_samples": int(len(X_train)),
-        "n_val_samples"  : int(len(X_val)),
+        "n_train_samples": len(X_train),
+        "n_val_samples"  : len(X_val),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/itransformer/train_kaggle.py` around lines 326 -
327, The dict entries for "n_train_samples" and "n_val_samples" wrap len(...)
with an unnecessary int() call; remove the int() so use "n_train_samples":
len(X_train) and "n_val_samples": len(X_val) (locate these keys in the training
metadata construction in train_kaggle.py).
.github/workflows/train.yml (2)

58-63: Handle special characters when building the Kaggle credentials JSON

If a secret value contains quotes or other special characters, the JSON can break or shell injection becomes possible. Using jq is safer.

🛡️ Suggested improvement
      - name: Setup Kaggle credentials
        run: |
          mkdir -p ~/.kaggle
-         echo '{"username":"${{ secrets.KAGGLE_USERNAME }}","key":"${{ secrets.KAGGLE_KEY }}"}' \
-           > ~/.kaggle/kaggle.json
+         jq -n --arg user "${{ secrets.KAGGLE_USERNAME }}" --arg key "${{ secrets.KAGGLE_KEY }}" \
+           '{"username": $user, "key": $key}' > ~/.kaggle/kaggle.json
          chmod 600 ~/.kaggle/kaggle.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/train.yml around lines 58 - 63, Replace the fragile echo
that constructs the Kaggle JSON with a safe JSON builder: use jq (or printf
piped to jq -n --arg) to create the JSON object from the GitHub secrets and
write it to ~/.kaggle/kaggle.json, then set chmod 600; specifically update the
"Setup Kaggle credentials" step to pass ${{ secrets.KAGGLE_USERNAME }} and ${{
secrets.KAGGLE_KEY }} as jq arguments (instead of interpolating them into a raw
string) to avoid JSON corruption or shell injection.

31-31: Update GitHub Actions versions

Static analysis warns that actions/checkout@v3 and actions/setup-python@v4 are outdated. Update them to the latest releases.

♻️ Suggested fix
      - name: Checkout repository
-       uses: actions/checkout@v3
+       uses: actions/checkout@v4

      # ────────────────────────────────────────
      # 2. Set up the Python environment
      # ────────────────────────────────────────
      - name: Setup Python
-       uses: actions/setup-python@v4
+       uses: actions/setup-python@v5

Also applies to: 37-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/train.yml at line 31, Update the GitHub Actions steps that
reference actions/checkout@v3 and actions/setup-python@v4 to their current
stable releases: locate the `uses: actions/checkout@v3` and `uses:
actions/setup-python@v4` lines and bump them to the latest recommended major
versions (e.g., actions/checkout@v4 and actions/setup-python@v5 or the most
recent stable tags), then commit the change and run a workflow lint/validation
to ensure compatibility with any inputs you rely on.
AI/modules/signal/models/PatchTST/train_kaggle.py (2)

185-194: Improve exception handling: preserve the exception chain and use specific exception types

Use raise ... from e to preserve the original exception; it helps identify the root cause when debugging.

♻️ Suggested improvement
         try:
             df_t = add_technical_indicators(df_t)
             df_t = add_multi_timeframe_features(df_t)
             processed.append(df_t)
-        except Exception as e:
+        except (ValueError, KeyError, TypeError) as e:
             fail_count += 1
             print(f"\n[경고] {ticker} 피처 계산 실패 ({fail_count}/{fail_limit}): {e}")
             if fail_count >= fail_limit:
-                raise RuntimeError(f"피처 계산 실패가 {fail_limit}개를 초과했습니다.")
+                raise RuntimeError(f"피처 계산 실패가 {fail_limit}개를 초과했습니다.") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/PatchTST/train_kaggle.py` around lines 185 - 194,
The except block in the try around add_technical_indicators and
add_multi_timeframe_features swallows the original exception; update the handler
in train_kaggle.py to re-raise a RuntimeError using "raise RuntimeError(... )
from e" so the exception chain is preserved and include the original exception
message, and consider narrowing the except to specific exceptions thrown by
add_technical_indicators/add_multi_timeframe_features (e.g.,
ValueError/TypeError) instead of a bare Exception; keep the existing
fail_count/fail_limit logic and logging but ensure the re-raise uses "from e".

321-321: Unnecessary f-string prefix

This f-string has no placeholders.

-    print(f"\n>> 완료")
+    print("\n>> 완료")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/PatchTST/train_kaggle.py` at line 321, Replace the
unnecessary f-string in the final print statement: locate the print call in
train_kaggle.py that currently uses print(f"\n>> 완료") and change it to a normal
string print (remove the leading "f") so it reads print("\n>> 완료"), leaving the
rest of the function and surrounding code unchanged.
AI/pipelines/weekly_routine.py (1)

30-30: Unused import: traceback

The traceback module is imported but never used.

♻️ Remove
 import subprocess
-import traceback
 from datetime import datetime
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/pipelines/weekly_routine.py` at line 30, Remove the unused import of the
traceback module: delete the "import traceback" line in
AI/pipelines/weekly_routine.py (the unused symbol is traceback) so there are no
unused imports left; run linters to confirm no other unused imports remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AI/kaggle_notebooks/itransformer/notebook.ipynb`:
- Around line 73-90: The current logic may load and run train_kaggle.py twice
and attempts a fallback using a non-existent path; replace the exec/open(...)
path with a single importlib-based loader: check that train_script exists and if
not raise/print and exit immediately, otherwise use
importlib.util.spec_from_file_location('train_kaggle', train_script),
module_from_spec(spec), spec.loader.exec_module(module) and call module.train();
remove the exec(open(...).read()) and the erroneous else branch that reuses the
missing path so training is loaded and invoked exactly once via module.train().

In `@AI/kaggle_notebooks/patchtst/notebook.ipynb`:
- Around line 36-41: Remove the second and third notebook cells that copy/load
non-existent files and keep the first cell's clone + sys.path approach; instead
import PatchTST_Model and the feature functions directly from the cloned repo
(from AI.modules.signal.models.PatchTST.architecture import PatchTST_Model and
from AI.modules.features.legacy.technical_features import
add_technical_indicators, add_multi_timeframe_features), set any needed env vars
(e.g., PARQUET_DIR, WEIGHTS_DIR), and if you must run the training script
dynamically, load the repository's train_kaggle.py via
importlib.util.spec_from_file_location pointing to
/kaggle/working/sisc-web/AI/modules/signal/models/PatchTST/train_kaggle.py and
invoke its train() entry point instead of copying non-existent files; also
ensure kernel_sources in kernel-metadata.json is populated or left empty
consistently with this import strategy.

In `@AI/kaggle_notebooks/TCN/notebook.ipynb`:
- Around line 69-79: Replace the exec-based loading of train_script with an
importlib-based dynamic module import: when train_script exists, create a module
spec from that path (use importlib.util.spec_from_file_location and
importlib.util.module_from_spec), load the module via spec.loader.exec_module,
then call the exported train_model_kaggle function from the loaded module (guard
the call with hasattr/module attribute check and raise or log a clear error if
train_model_kaggle is missing), and remove the open(...).read() pattern so no
file descriptor is left open; keep the existing fallback print for the else
branch.

In `@AI/kaggle_notebooks/transformer/notebook.ipynb`:
- Around line 48-51: The code sets os.environ before importing or defining os
which causes a NameError; add an import os at the top of the cell (before any
use of os.environ) so the os module is available; locate the cell containing the
os.environ assignments (references to os.environ and the symbol os) and insert
import os as the first statement in that cell.
- Around line 36-39: To fix the NameError/ModuleNotFoundError in this cell,
first import shutil and import the technical-indicator module from the correct
path: instead of from technical_features import add_technical_indicators,
add_multi_timeframe_features, import add_technical_indicators and
add_multi_timeframe_features from AI.modules.features.legacy.technical_features
(or reuse that existing import) so the module path resolves correctly, and make
sure shutil is imported before the shutil.copy call.

In `@AI/modules/signal/core/artifact_paths.py`:
- Around line 104-110: The resolver is returning lowercase "patchtst" paths
while training/saving uses "PatchTST", causing mismatched directories on
case-sensitive filesystems; update the handling in
AI/modules/signal/core/artifact_paths.py so that when normalized_model ==
"patchtst" you resolve to the same cased directory and filenames used by the
trainer (use "PatchTST" for resolved_model_dir and the matching artifact
filenames), i.e. change the resolved_model_dir, model_path and scaler_path
construction in the ModelArtifactPaths return so they reference "PatchTST" and
the exact saved filenames produced by AI/modules/signal/models/PatchTST/train.py
(use the same casing and basename as the saver).

In `@AI/modules/signal/models/itransformer/train_kaggle.py`:
- Around line 160-171: The scaler is being refit per ticker inside the for
ticker, group in df.groupby("ticker") loop (using scaler.fit_transform when
fit_scaler=True), causing leakage and inconsistent statistics; instead, if
fit_scaler is True, call scaler.fit once on the full dataset (e.g., on
df[available_feats].values.astype(np.float32)) before entering the loop, then
inside the loop only call scaler.transform on feat_vals; if fit_scaler is False,
ensure scaler is already fitted and continue using scaler.transform for
feat_vals (reference symbols: df.groupby("ticker"), available_feats, feat_vals,
fit_scaler, scaler.fit_transform, scaler.transform).
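The fit-once / transform-per-group pattern this comment describes looks like the following. For a self-contained sketch, a hand-rolled standard scaler over numpy arrays stands in for the project's sklearn scaler:

```python
import numpy as np

class SimpleStandardScaler:
    """Minimal stand-in for sklearn's StandardScaler."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_

def scale_groups(groups, scaler, fit_scaler=True):
    """Fit on ALL rows once, then transform each group with the shared
    statistics; never refit inside the per-ticker loop."""
    if fit_scaler:
        scaler.fit(np.vstack(list(groups.values())))
    return {ticker: scaler.transform(vals) for ticker, vals in groups.items()}
```

Fitting once over the concatenated training rows gives every ticker the same normalization statistics, which is what the fix in the comment requires.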

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py`:
- Around line 201-203: best_state is currently assigned a shallow reference to
model.state_dict(), so it gets mutated during later training and ends up
containing the final weights; replace assignments that save the best model
(where best_state = model.state_dict()) with a deep copy: import copy and set
best_state = copy.deepcopy(model.state_dict()) so best_state holds an
independent snapshot of parameters at the best validation loss.
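The shallow-reference bug this comment describes is easy to reproduce with any mutable mapping; torch state_dict tensors behave the same way, which is why copy.deepcopy is the fix:

```python
import copy

# Simulate a model whose parameters are mutated in place during training.
params = {"w": [1.0, 2.0]}

best_shallow = dict(params)              # bug: new dict, but shared values
best_snapshot = copy.deepcopy(params)    # fix: fully independent snapshot

params["w"][0] = 99.0                    # a later training step mutates weights

print(best_shallow["w"])    # → [99.0, 2.0]  (the "best" weights were lost)
print(best_snapshot["w"])   # → [1.0, 2.0]   (snapshot preserved)
```

`model.state_dict()` likewise returns a fresh dict whose values reference the live tensors, so `best_state = copy.deepcopy(model.state_dict())` is needed to freeze the best-validation weights.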

In `@AI/modules/signal/models/TCN/wrapper.py`:
- Around line 217-233: The batch-prep loop uses get_standard_training_data()
directly which bypasses the same preprocessing used by predict(); update the
loop over ticker_data_map in the TCN wrapper to call
_prepare_dataframe(df.copy()) (or the same helper used by predict()) to perform
required column validation and inf/NaN cleaning before extracting
self.feature_columns, then proceed with seq_len checks, scaling
(self.scaler.transform) and appending to tensor_list and valid_tickers; ensure
you still handle exceptions and preserve the existing behavior around seq_len,
feature_columns, scaler, tensor_list and valid_tickers so batch and single-shot
prediction use identical preprocessing.

In `@AI/modules/signal/models/transformer/transformer_train_kaggle.py`:
- Around line 71-73: The saved artifact names in transformer_train_kaggle.py
('model_name' and 'scaler_name') use unsuffixed filenames which will not match
the resolver conventions in AI/modules/signal/core/artifact_paths.py (which
expects names like multi_horizon_model_prod.keras /
multi_horizon_scaler_prod.pkl or *_test for tests); update the 'model_name' and
'scaler_name' values to include the proper suffix (e.g., append _prod) so they
match the resolver lookup scheme and ensure the stored files can be found during
download/deploy.
- Around line 149-167: The current shuffle-based train_test_split after calling
loader.create_dataset causes temporal leakage; instead, perform a chronological
split on full_df before creating sequences so the scaler is fit only on training
data and validation windows are strictly later. Concretely: split full_df by a
date cutoff into train_df and val_df (no random shuffle), call
loader.create_dataset(train_df, ...) to produce X_ts_train, X_ticker_train,
X_sector_train, y_train and fit the scaler there, then call
loader.create_dataset(val_df, ..., scaler=trained_scaler or with a flag to only
transform) to produce X_ts_val, X_tick_val, X_sec_val, y_val; remove the later
train_test_split(...) shuffle=True usage and ensure horizons / n_outputs logic
remains unchanged.
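A pure-Python sketch of the chronological split this comment asks for: cut by time before building sequences, never shuffle. The dates and the 80/20 cutoff here are illustrative:

```python
def chronological_split(rows, date_key="date", train_frac=0.8):
    """Split time-ordered records so every validation row is strictly
    later than every training row (no temporal leakage)."""
    rows = sorted(rows, key=lambda r: r[date_key])
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

data = [{"date": f"2026-01-{d:02d}", "y": d} for d in range(1, 11)]
train, val = chronological_split(data)
# Every training date precedes every validation date:
assert max(r["date"] for r in train) < min(r["date"] for r in val)
```

The scaler would then be fit on `train` only and merely applied to `val`, matching the comment's requirement.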

In `@AI/pipelines/weekly_routine.py`:
- Around line 155-172: The script currently ignores run_weekly_pipeline()'s
result so the process always exits 0; capture its return or exception in the
__main__ block and propagate failure by exiting with a non-zero code: call
result = run_weekly_pipeline(...), then if result indicates failure (e.g., False
or a non-zero status) call sys.exit(1) or re-raise the caught exception;
alternatively wrap run_weekly_pipeline(...) in try/except and on exception log
the error and call sys.exit(1). Ensure you import sys if needed and reference
the existing run_weekly_pipeline call in the __main__ block.
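Propagating pipeline failure to the process exit code, as this comment requires, can be sketched like this (run_weekly_pipeline is stubbed here so the failing path is visible):

```python
import sys

def run_weekly_pipeline():
    """Stub standing in for the real pipeline; returns False on failure."""
    return False

def main() -> int:
    """Map pipeline success/failure (or a crash) to a process exit code."""
    try:
        ok = run_weekly_pipeline()
    except Exception as e:
        print(f"pipeline crashed: {e}")
        return 1
    return 0 if ok else 1

# In the real script: sys.exit(main()) in the __main__ block, so a failed
# pipeline produces a non-zero exit code and the CI job is marked as failed.
```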

In `@AI/scripts/deploy_to_server.py`:
- Around line 89-112: The deploy_model function currently always returns True
even if files are missing or transfer errors occur; update deploy_model to track
failures by creating a failed list/local flag: for each fname in model['files']
check os.path.exists(local_path) and append fname to failed if missing, wrap
scp.put(local_path, remote_path) in a try/except to append fname on exception,
and at the end return False if failed is non-empty (and optionally log which
filenames failed) otherwise return True; refer to deploy_model, model['files'],
local_path, remote_path, os.path.exists, scp.put and the printed messages when
implementing.
- Around line 73-74: Replace the unsafe AutoAddPolicy usage by enforcing host
key verification: instead of
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()), set
ssh.set_missing_host_key_policy(paramiko.RejectPolicy()) and load the known host
key from a secret (e.g. SSH_HOST_KEY) into the client before connecting; read
SSH_HOST_KEY from the environment/CI secret, parse it and add it with
ssh.get_host_keys().add(hostname, key_type, key_data) (or use
SSHClient.load_host_keys when writing the secret to a temp known_hosts file) so
the code around SSHClient() and set_missing_host_key_policy(...) performs strict
verification.

In `@AI/scripts/download_weights.py`:
- Around line 30-40: The download script currently can return success without
validating downloaded files, risking stale weights being reused; update the
logic in download_weights.py (referencing MODELS) to download each model into a
temporary directory first, verify presence and exact names of the expected files
(the same filenames deploy_to_server.py expects), only after validation
atomically replace/rsync into the model's dst_dir (empty/backup the dst_dir
before move), and fail/return False if any expected file is missing or names
differ; apply the same change to the code block around lines 61-80 that handles
additional models.

In `@AI/scripts/extract_to_parquet.py`:
- Around line 178-184: The SELECT in read_sql_safe currently omits two features
used downstream (correlation_spike and surprise_cpi), causing a mismatch with
AI/modules/signal/models/itransformer/train_kaggle.py; fix by adding
correlation_spike and surprise_cpi to the SELECT projection from
macroeconomic_indicators in extract_to_parquet.py (ensure read_sql_safe and
resulting DataFrame include these columns so they persist into the output
parquet).
- Around line 123-132: The retry loop leaks DB connections because conn.close()
is only called on success; update the loop around get_conn() and
pd.read_sql(query, conn) (inside the retry in extract_to_parquet.py) to always
close the connection on each attempt — either by using a try/finally that calls
conn.close() if conn is not None or by acquiring the connection with a context
manager — so every opened connection from get_conn() is closed even when
pd.read_sql throws.
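The always-close retry shape this comment asks for, with the DB pieces stubbed as callables so the control flow stands alone:

```python
import time

def read_with_retry(get_conn, read_sql, query, retries=3, delay=0):
    """Open a fresh connection per attempt and close it even when the
    read raises, so no attempt leaks a connection."""
    last_err = None
    for attempt in range(retries):
        conn = None
        try:
            conn = get_conn()
            return read_sql(query, conn)
        except Exception as e:
            last_err = e
            time.sleep(delay)
        finally:
            if conn is not None:
                conn.close()   # runs on success AND on failure
    raise RuntimeError(f"query failed after {retries} attempts") from last_err
```

In the real script `get_conn`/`read_sql` would be the psycopg connection factory and `pd.read_sql`; the `finally` block is what guarantees the connection closes on every attempt.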

In `@AI/scripts/trigger_training.py`:
- Around line 34-44: NOTEBOOKS entries use project_root joined to
"kaggle_notebooks/..." but the repo layout places notebooks under
AI/kaggle_notebooks, so update the NOTEBOOKS constant (the dicts for "PatchTST"
and "Transformer") to include the missing "AI" path segment when building
notebook_dir (e.g., include "AI" as an extra path component in os.path.join for
each notebook_dir) so trigger_notebook() can locate the folders.
- Around line 94-117: In the status-polling loop (the code that calls
subprocess.run(["kaggle", "kernels", "status", slug], ...) inside the
trigger_training flow), check result.returncode immediately after subprocess.run
and treat any non-zero return code as a hard failure: log/print the error
details (result.stderr or result.stdout), return False, and stop polling instead
of sleeping and retrying; keep the existing checks for
"complete"/"error"/"running" only when returncode == 0 to avoid unnecessary
5-minute retries (follow the same returncode handling pattern used in
trigger_notebook()).

In `@AI/scripts/upload_to_kaggle.py`:
- Around line 34-39: The subprocess.run call in upload_to_kaggle.py references
an undefined variable KAGGLE_DATA_DIR causing a NameError; change that argument
to use the existing OUTPUT_DIR (or rename OUTPUT_DIR to KAGGLE_DATA_DIR
consistently) so the call in subprocess.run([... "-p", KAGGLE_DATA_DIR, ...])
uses the defined upload directory; update any other occurrences to keep the
variable name consistent and ensure the kaggle upload command runs.

---

Minor comments:
In `@AI/kaggle_notebooks/TCN/notebook.ipynb`:
- Around line 60-62: The pip install cell currently runs "!pip install -q
scikit-learn pandas numpy torch" without pinned versions; update that cell (the
pip install command string) to pin package versions to reproducible releases
(e.g., ensure numpy is constrained to <2.0 as required by the
library_context/manifest and pin compatible versions of scikit-learn, pandas,
and torch) so the notebook installs deterministic, manifest-compatible packages.

In `@AI/scripts/deploy_to_server.py`:
- Around line 76-77: The current code only calls
paramiko.RSAKey.from_private_key(io.StringIO(SSH_KEY_STR)), which fails for
Ed25519/ECDSA keys; update the logic that creates private_key to try parsing
SSH_KEY_STR with multiple Paramiko key classes (e.g.,
paramiko.RSAKey.from_private_key, paramiko.ECDSAKey.from_private_key,
paramiko.Ed25519Key.from_private_key, and paramiko.DSSKey.from_private_key)
using io.StringIO(SSH_KEY_STR), catching exceptions and falling through until
one succeeds, and raise a clear error if none parse; reference the private_key
variable and the from_private_key calls to locate where to implement this
fallback parsing.

---

Nitpick comments:
In @.github/workflows/train.yml:
- Around line 58-63: Replace the fragile echo that constructs the Kaggle JSON
with a safe JSON builder: use jq (or printf piped to jq -n --arg) to create the
JSON object from the GitHub secrets and write it to ~/.kaggle/kaggle.json, then
set chmod 600; specifically update the "Setup Kaggle credentials" step to pass
${{ secrets.KAGGLE_USERNAME }} and ${{ secrets.KAGGLE_KEY }} as jq arguments
(instead of interpolating them into a raw string) to avoid JSON corruption or
shell injection.
- Line 31: Update the GitHub Actions steps that reference actions/checkout@v3
and actions/setup-python@v4 to their current stable releases: locate the `uses:
actions/checkout@v3` and `uses: actions/setup-python@v4` lines and bump them to
the latest recommended major versions (e.g., actions/checkout@v4 and
actions/setup-python@v5 or the most recent stable tags), then commit the change
and run a workflow lint/validation to ensure compatibility with any inputs you
rely on.

In @.gitignore:
- Line 47: The global pattern "test_*.py" in .gitignore ignores test files across the entire repository, risking that they never reach CI; remove the "test_*.py" entry from .gitignore or narrow it to a specific directory (e.g., "scripts/test_*.py" or "dev_tools/test_*.py"), verify the changed pattern only ignores files in the intended location, and commit; reference: find and replace or delete the "test_*.py" pattern in .gitignore.

In `@AI/kaggle_notebooks/TCN/notebook.ipynb`:
- Around line 33-42: The subprocess.run call that clones the repo doesn't check
for errors; update the clone logic that uses subprocess.run and the result
variable by either adding check=True to subprocess.run or explicitly inspecting
result.returncode (and using result.stderr/result.stdout) after the call; if
returncode != 0, log/print a clear error message including result.stderr and
raise an exception (e.g., RuntimeError or re-raise CalledProcessError) so
downstream cells don't proceed on a failed git clone.
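The fail-fast clone check can be sketched as follows; the injectable run parameter is an illustrative addition so the error branch can be exercised without a network or a real repository.

```python
import subprocess

def clone_repo(url: str, dest: str, run=subprocess.run):
    # `run` defaults to the real subprocess.run; it is injectable for testing.
    result = run(["git", "clone", "--depth", "1", url, dest],
                 capture_output=True, text=True)
    if result.returncode != 0:
        # Fail loudly so later cells never run against a missing checkout.
        raise RuntimeError(
            f"git clone failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result
```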

In `@AI/modules/signal/models/itransformer/train_kaggle.py`:
- Around line 326-327: The dict entries for "n_train_samples" and
"n_val_samples" wrap len(...) with an unnecessary int() call; remove the int()
so use "n_train_samples": len(X_train) and "n_val_samples": len(X_val) (locate
these keys in the training metadata construction in train_kaggle.py).

In `@AI/modules/signal/models/PatchTST/train_kaggle.py`:
- Around line 185-194: The except block in the try around
add_technical_indicators and add_multi_timeframe_features swallows the original
exception; update the handler in train_kaggle.py to re-raise a RuntimeError
using "raise RuntimeError(... ) from e" so the exception chain is preserved and
include the original exception message, and consider narrowing the except to
specific exceptions thrown by
add_technical_indicators/add_multi_timeframe_features (e.g.,
ValueError/TypeError) instead of a bare Exception; keep the existing
fail_count/fail_limit logic and logging but ensure the re-raise uses "from e".
- Line 321: Replace the unnecessary f-string in the final print statement:
locate the print call in train_kaggle.py that currently uses print(f"\n>> 완료")
and change it to a normal string print (remove the leading "f") so it reads
print("\n>> 완료"), leaving the rest of the function and surrounding code
unchanged.
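The re-raise pattern the comment asks for can be sketched as follows; compute stands in for add_technical_indicators, and the names are illustrative assumptions.

```python
def add_features_safe(ticker, compute):
    # `compute` is a stand-in for add_technical_indicators; the real
    # fail_count/fail_limit bookkeeping is omitted for brevity.
    try:
        return compute(ticker)
    except (ValueError, TypeError) as e:  # narrowed, as the review suggests
        # "from e" keeps the original traceback reachable via __cause__.
        raise RuntimeError(f"feature computation failed for {ticker}: {e}") from e
```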

In `@AI/modules/signal/models/PatchTST/wrapper.py`:
- Around line 191-193: Update the get_signals method to match the
BaseSignalModel contract by adding the explicit ticker_id and sector_id
parameters (e.g., def get_signals(self, df, ticker_id=None, sector_id=None,
**kwargs)) while keeping the current behavior of delegating to predict; ensure
the method accepts those parameters and forwards the dataframe to
self.predict(df) (and any necessary kwargs) so overrides/type-checking remain
consistent with other wrappers and BaseSignalModel.
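A minimal sketch of the aligned signature, assuming the parameter names from the review; the predict body is a stand-in, not the real wrapper logic.

```python
class PatchTSTWrapper:
    # Minimal stand-in; the real wrapper's predict is far more involved.
    def predict(self, df):
        return {"signal": "hold", "rows": len(df)}

    # Explicit ticker_id/sector_id parameters match the BaseSignalModel
    # contract described above; the body still just delegates to predict.
    def get_signals(self, df, ticker_id=None, sector_id=None, **kwargs):
        return self.predict(df)
```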

In `@AI/pipelines/weekly_routine.py`:
- Line 30: Remove the unused import of the traceback module: delete the "import
traceback" line in AI/pipelines/weekly_routine.py (the unused symbol is
traceback) so there are no unused imports left; run linters to confirm no other
unused imports remain.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 583ecab4-5362-4c58-9617-171cf32089f0

📥 Commits

Reviewing files that changed from the base of the PR and between 96336e3 and 9cdc10c.

📒 Files selected for processing (23)
  • .github/workflows/train.yml
  • .gitignore
  • AI/kaggle_notebooks/TCN/kernel-metadata.json
  • AI/kaggle_notebooks/TCN/notebook.ipynb
  • AI/kaggle_notebooks/itransformer/kernel-metadata.json
  • AI/kaggle_notebooks/itransformer/notebook.ipynb
  • AI/kaggle_notebooks/patchtst/kernel-metadata.json
  • AI/kaggle_notebooks/patchtst/notebook.ipynb
  • AI/kaggle_notebooks/transformer/kernel-metadata.json
  • AI/kaggle_notebooks/transformer/notebook.ipynb
  • AI/modules/signal/core/artifact_paths.py
  • AI/modules/signal/models/PatchTST/train_kaggle.py
  • AI/modules/signal/models/PatchTST/wrapper.py
  • AI/modules/signal/models/TCN/tcn_train_kaggle.py
  • AI/modules/signal/models/TCN/wrapper.py
  • AI/modules/signal/models/itransformer/train_kaggle.py
  • AI/modules/signal/models/transformer/transformer_train_kaggle.py
  • AI/pipelines/weekly_routine.py
  • AI/scripts/deploy_to_server.py
  • AI/scripts/download_weights.py
  • AI/scripts/extract_to_parquet.py
  • AI/scripts/trigger_training.py
  • AI/scripts/upload_to_kaggle.py

Comment on lines +36 to +39
"shutil.copy('/kaggle/working/transformer_architecture.py', '/kaggle/working/architecture.py')\n",
"\n",
"from technical_features import add_technical_indicators, add_multi_timeframe_features\n",
"print('>> import 완료')"

⚠️ Potential issue | 🔴 Critical

As written, this cell stops with a NameError/ModuleNotFoundError.

shutil is never imported, and technical_features on line 38 is not a top-level module on the repo path added in the first cell. Reuse the already-imported AI.modules.features.legacy.technical_features, or import it here under its exact module path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/kaggle_notebooks/transformer/notebook.ipynb` around lines 36 - 39, fix the
NameError/ModuleNotFoundError in this cell by first importing shutil and importing
the technical-indicator functions from their correct path: instead of from
technical_features import add_technical_indicators, add_multi_timeframe_features,
import add_technical_indicators and add_multi_timeframe_features from the
already-used AI.modules.features.legacy.technical_features (or reuse that existing
import) so the module path is referenced correctly, and confirm shutil is imported
before the shutil.copy call.

Comment on lines +123 to +132
for attempt in range(1, 4):
try:
conn = get_conn()
df = pd.read_sql(query, conn)
conn.close()
return df
except Exception as e:
print(f" [시도 {attempt}/3] 실패: {e}")
time.sleep(3)
raise RuntimeError(f"'{desc}' 쿼리 3회 모두 실패")

⚠️ Potential issue | 🟠 Major

The retry path leaks DB connections.

conn.close() on line 127 runs only in the success case. If pd.read_sql() throws, the connection is left open, so retries or multi-table extraction can quickly exhaust the connection limit.

Suggested fix
 def read_sql_safe(query: str, desc: str = "") -> pd.DataFrame:
     """연결 끊김 시 최대 3회 재시도"""
     for attempt in range(1, 4):
+        conn = None
         try:
             conn = get_conn()
             df   = pd.read_sql(query, conn)
-            conn.close()
             return df
         except Exception as e:
             print(f"   [시도 {attempt}/3] 실패: {e}")
             time.sleep(3)
+        finally:
+            if conn is not None:
+                conn.close()
     raise RuntimeError(f"'{desc}' 쿼리 3회 모두 실패")
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 129-129: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/extract_to_parquet.py` around lines 123 - 132, The retry loop
leaks DB connections because conn.close() is only called on success; update the
loop around get_conn() and pd.read_sql(query, conn) (inside the retry in
extract_to_parquet.py) to always close the connection on each attempt — either
by using a try/finally that calls conn.close() if conn is not None or by
acquiring the connection with a context manager — so every opened connection
from get_conn() is closed even when pd.read_sql throws.
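The close-on-every-attempt pattern in the diff above can be demonstrated with stand-ins for the DB helpers; FakeConn and the simulated failure are illustrative, not the real psycopg2/pandas objects.

```python
import time

class FakeConn:
    """Stand-in for a DB connection that records whether it was closed."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

opened = []

def get_conn():
    conn = FakeConn()
    opened.append(conn)
    return conn

def read_sql_safe(query, attempts=3):
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = get_conn()
            raise ConnectionError("simulated drop")  # stand-in for pd.read_sql failing
        except ConnectionError:
            time.sleep(0)  # no real backoff in this sketch
        finally:
            if conn is not None:
                conn.close()  # runs on success *and* failure
    raise RuntimeError(f"'{query}' failed after {attempts} attempts")
```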

Comment on lines +178 to +184
df = read_sql_safe("""
SELECT date, cpi, gdp, interest_rate, unemployment_rate,
us10y, us2y, yield_spread, vix_close, dxy_close,
wti_price, gold_price, credit_spread_hy
FROM macroeconomic_indicators
ORDER BY date
""", "macroeconomic_indicators")

⚠️ Potential issue | 🟠 Major

The extracted macro-indicator columns don't match the downstream training script.

AI/modules/signal/models/itransformer/train_kaggle.py:115-120 treats correlation_spike and surprise_cpi as feature candidates. Dropping both here means Kaggle training silently runs without them, so the feature set diverges from the existing pipeline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/extract_to_parquet.py` around lines 178 - 184, The SELECT in
read_sql_safe currently omits two features used downstream (correlation_spike
and surprise_cpi), causing a mismatch with
AI/modules/signal/models/itransformer/train_kaggle.py; fix by adding
correlation_spike and surprise_cpi to the SELECT projection from
macroeconomic_indicators in extract_to_parquet.py (ensure read_sql_safe and
resulting DataFrame include these columns so they persist into the output
parquet).

Comment on lines +94 to +117
result = subprocess.run(
["kaggle", "kernels", "status", slug],
capture_output=True,
text=True
)

output = result.stdout.lower()

if "complete" in output:
print(f" [{notebook['name']}] 학습 완료! ✅")
return True
elif "error" in output or "failed" in output:
print(f" [{notebook['name']}] 학습 실패! ❌")
print(result.stdout)
return False
elif "running" in output:
check_count += 1
elapsed = check_count * 5
print(f" [{notebook['name']}] 학습 중... ({elapsed}분 경과)")
else:
check_count += 1
print(f" [{notebook['name']}] 상태: {result.stdout.strip()}")

time.sleep(300) # 5분 대기

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "trigger_training.py" -type f

Repository: SISC-IT/sisc-web

Length of output: 92


🏁 Script executed:

cat -n ./AI/scripts/trigger_training.py | head -150 | tail -70

Repository: SISC-IT/sisc-web

Length of output: 2361


🏁 Script executed:

rg "subprocess\.run" ./AI/scripts/trigger_training.py -A 5

Repository: SISC-IT/sisc-web

Length of output: 366


🏁 Script executed:

cat -n ./AI/scripts/trigger_training.py | head -80 | tail -40

Repository: SISC-IT/sisc-web

Length of output: 1624


Status polling must treat CLI errors as immediate failures.

Even when the kaggle kernels status command returns non-zero because of auth, network, or slug problems, the current code inspects only stdout and retries every 5 minutes. On GitHub Actions this can burn up to 12 hours on pointless retries. Check returncode first and abort immediately. (Note: trigger_notebook() in the same file already checks returncode correctly.)

Suggested fix
         result = subprocess.run(
             ["kaggle", "kernels", "status", slug],
             capture_output=True,
             text=True
         )
+        if result.returncode != 0:
+            print(f"   [{notebook['name']}] 상태 조회 실패! ❌")
+            print(result.stderr.strip())
+            return False
 
         output = result.stdout.lower()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
result = subprocess.run(
["kaggle", "kernels", "status", slug],
capture_output=True,
text=True
)
output = result.stdout.lower()
if "complete" in output:
print(f" [{notebook['name']}] 학습 완료! ✅")
return True
elif "error" in output or "failed" in output:
print(f" [{notebook['name']}] 학습 실패! ❌")
print(result.stdout)
return False
elif "running" in output:
check_count += 1
elapsed = check_count * 5
print(f" [{notebook['name']}] 학습 중... ({elapsed}분 경과)")
else:
check_count += 1
print(f" [{notebook['name']}] 상태: {result.stdout.strip()}")
time.sleep(300) # 5분 대기
result = subprocess.run(
["kaggle", "kernels", "status", slug],
capture_output=True,
text=True
)
if result.returncode != 0:
print(f" [{notebook['name']}] 상태 조회 실패! ❌")
print(result.stderr.strip())
return False
output = result.stdout.lower()
if "complete" in output:
print(f" [{notebook['name']}] 학습 완료! ✅")
return True
elif "error" in output or "failed" in output:
print(f" [{notebook['name']}] 학습 실패! ❌")
print(result.stdout)
return False
elif "running" in output:
check_count += 1
elapsed = check_count * 5
print(f" [{notebook['name']}] 학습 중... ({elapsed}분 경과)")
else:
check_count += 1
print(f" [{notebook['name']}] 상태: {result.stdout.strip()}")
time.sleep(300) # 5분 대기
🧰 Tools
🪛 Ruff (0.15.7)

[error] 94-94: subprocess call: check for execution of untrusted input

(S603)


[error] 95-95: Starting a process with a partial executable path

(S607)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/trigger_training.py` around lines 94 - 117, In the status-polling
loop (the code that calls subprocess.run(["kaggle", "kernels", "status", slug],
...) inside the trigger_training flow), check result.returncode immediately
after subprocess.run and treat any non-zero return code as a hard failure:
log/print the error details (result.stderr or result.stdout), return False, and
stop polling instead of sleeping and retrying; keep the existing checks for
"complete"/"error"/"running" only when returncode == 0 to avoid unnecessary
5-minute retries (follow the same returncode handling pattern used in
trigger_notebook()).
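The returncode-first polling can be sketched as follows, assuming the kaggle CLI invocation from the review; the injectable run parameter is an illustrative addition so the branch logic is testable without the CLI.

```python
import subprocess

def kernel_status(slug, run=subprocess.run):
    # `run` is injectable so the polling logic can be tested without kaggle.
    result = run(["kaggle", "kernels", "status", slug],
                 capture_output=True, text=True)
    if result.returncode != 0:
        # Hard failure: auth/slug/network problems should stop polling now.
        return ("error", (result.stderr or result.stdout).strip())
    out = result.stdout.lower()
    if "complete" in out:
        return ("complete", out)
    if "error" in out or "failed" in out:
        return ("failed", out)
    return ("running", out)
```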

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/train.yml (1)

35-44: Pin dependency versions and enable caching.

Unpinned dependency versions can lead to non-reproducible builds. Using setup-python's caching feature would also speed up builds.

♻️ Recommended fix

Create a requirements-train.txt to pin versions, or at minimum specify major versions:

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
+         cache: 'pip'

      - name: Install dependencies
        run: |
          pip install \
-           kaggle \
-           pandas \
-           pyarrow \
-           psycopg2-binary \
-           paramiko \
-           sshtunnel \
-           scp
+           kaggle>=1.6.0 \
+           pandas>=2.0.0 \
+           pyarrow>=14.0.0 \
+           psycopg2-binary>=2.9.0 \
+           paramiko>=3.0.0 \
+           sshtunnel>=0.4.0 \
+           scp>=0.14.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/train.yml around lines 35 - 44, The Install dependencies
step currently runs pip install with unpinned packages; create a
requirements-train.txt listing the exact package versions (or minimum major
versions) for kaggle, pandas, pyarrow, psycopg2-binary, paramiko, sshtunnel, scp
and update the "Install dependencies" step to pip install -r
requirements-train.txt; also enable caching by using the setup-python/cache or
actions/cache keyed on the Python version and requirements-train.txt (refer to
the existing setup-python action and the "Install dependencies" step) to speed
builds and improve reproducibility.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/train.yml:
- Around line 27-33: Update the GitHub Actions steps that reference
actions/checkout@v3 and actions/setup-python@v4 to their current supported
releases (e.g., actions/checkout@v4 and actions/setup-python@v5 or the latest
stable major versions); locate the steps that use the strings
"actions/checkout@v3" and "actions/setup-python@v4" in the workflow and replace
them with the updated action versions, then run a workflow lint or dry-run to
verify compatibility.
- Around line 46-51: Replace the fragile echo JSON creation in the "Setup Kaggle
credentials" step with a jq-based safe generator: capture the secrets
($KAGGLE_USERNAME and $KAGGLE_KEY) into environment variables and run something
like jq -n --arg u "$KAGGLE_USERNAME" --arg k "$KAGGLE_KEY"
'{username:$u,key:$k}' > ~/.kaggle/kaggle.json, then chmod 600
~/.kaggle/kaggle.json; this ensures special characters in
KAGGLE_KEY/KAGGLE_USERNAME are properly escaped and produces valid JSON.

---

Nitpick comments:
In @.github/workflows/train.yml:
- Around line 35-44: The Install dependencies step currently runs pip install
with unpinned packages; create a requirements-train.txt listing the exact
package versions (or minimum major versions) for kaggle, pandas, pyarrow,
psycopg2-binary, paramiko, sshtunnel, scp and update the "Install dependencies"
step to pip install -r requirements-train.txt; also enable caching by using the
setup-python/cache or actions/cache keyed on the Python version and
requirements-train.txt (refer to the existing setup-python action and the
"Install dependencies" step) to speed builds and improve reproducibility.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b1f2880-10f3-48f4-85bb-c96eb2f1744e

📥 Commits

Reviewing files that changed from the base of the PR and between 9cdc10c and 1004415.

📒 Files selected for processing (1)
  • .github/workflows/train.yml

Comment on lines +27 to +33
- name: Checkout repository
uses: actions/checkout@v3

- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

⚠️ Potential issue | 🟠 Major

The GitHub Actions versions need updating.

actions/checkout@v3 and actions/setup-python@v4 may no longer be supported on current GitHub Actions runners. Update them to the latest versions.

🔧 Recommended fix
      - name: Checkout repository
-       uses: actions/checkout@v3
+       uses: actions/checkout@v4

      - name: Setup Python
-       uses: actions/setup-python@v4
+       uses: actions/setup-python@v5
        with:
          python-version: '3.10'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Checkout repository
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Checkout repository
uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.10'
🧰 Tools
🪛 actionlint (1.7.11)

[error] 28-28: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)


[error] 31-31: the runner of "actions/setup-python@v4" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/train.yml around lines 27 - 33, Update the GitHub Actions
steps that reference actions/checkout@v3 and actions/setup-python@v4 to their
current supported releases (e.g., actions/checkout@v4 and
actions/setup-python@v5 or the latest stable major versions); locate the steps
that use the strings "actions/checkout@v3" and "actions/setup-python@v4" in the
workflow and replace them with the updated action versions, then run a workflow
lint or dry-run to verify compatibility.

Comment on lines +46 to +51
- name: Setup Kaggle credentials
run: |
mkdir -p ~/.kaggle
echo '{"username":"${{ secrets.KAGGLE_USERNAME }}","key":"${{ secrets.KAGGLE_KEY }}"}' \
> ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

⚠️ Potential issue | 🟡 Minor

The Kaggle credentials JSON generation needs hardening.

The current approach can corrupt the JSON when KAGGLE_KEY contains special characters (quotes, backslashes, and so on). Generating the JSON safely with jq is recommended.

🛡️ Recommended fix
      - name: Setup Kaggle credentials
        run: |
          mkdir -p ~/.kaggle
-         echo '{"username":"${{ secrets.KAGGLE_USERNAME }}","key":"${{ secrets.KAGGLE_KEY }}"}' \
-           > ~/.kaggle/kaggle.json
+         jq -n \
+           --arg user "${{ secrets.KAGGLE_USERNAME }}" \
+           --arg key "${{ secrets.KAGGLE_KEY }}" \
+           '{"username": $user, "key": $key}' > ~/.kaggle/kaggle.json
          chmod 600 ~/.kaggle/kaggle.json

jqubuntu-latest 러너에 기본 설치되어 있으므로 추가 설치가 필요 없습니다.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Setup Kaggle credentials
run: |
mkdir -p ~/.kaggle
echo '{"username":"${{ secrets.KAGGLE_USERNAME }}","key":"${{ secrets.KAGGLE_KEY }}"}' \
> ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
- name: Setup Kaggle credentials
run: |
mkdir -p ~/.kaggle
jq -n \
--arg user "${{ secrets.KAGGLE_USERNAME }}" \
--arg key "${{ secrets.KAGGLE_KEY }}" \
'{"username": $user, "key": $key}' > ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/train.yml around lines 46 - 51, Replace the fragile echo
JSON creation in the "Setup Kaggle credentials" step with a jq-based safe
generator: capture the secrets ($KAGGLE_USERNAME and $KAGGLE_KEY) into
environment variables and run something like jq -n --arg u "$KAGGLE_USERNAME"
--arg k "$KAGGLE_KEY" '{username:$u,key:$k}' > ~/.kaggle/kaggle.json, then chmod
600 ~/.kaggle/kaggle.json; this ensures special characters in
KAGGLE_KEY/KAGGLE_USERNAME are properly escaped and produces valid JSON.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

♻️ Duplicate comments (1)
AI/scripts/trigger_training.py (1)

92-98: ⚠️ Potential issue | 🟠 Major

Treat status-query errors as immediate failures.

Even when kaggle kernels status returns non-zero, the code currently inspects only stdout and retries every 5 minutes. Auth, slug, or network problems can then stall for up to 12 hours; checking returncode first and returning False immediately is safer.

Example fix
     while check_count < max_checks:
         result = subprocess.run(
             ["kaggle", "kernels", "status", slug],
             capture_output=True,
             text=True
         )
+        if result.returncode != 0:
+            print(f"   [{notebook['name']}] 상태 조회 실패! ❌")
+            print((result.stderr or result.stdout).strip())
+            return False
 
         output = result.stdout.lower()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/trigger_training.py` around lines 92 - 98, The code currently
inspects result.stdout regardless of process success; change the logic after the
subprocess.run(...) call to first check result.returncode and immediately return
False (and log an error including returncode/stderr) if it's non-zero, instead
of continuing into the 5-minute retry loop; update the logic around the
subprocess.run(...) invocation and the subsequent use of result/output to bail
out on failed `kaggle kernels status` calls.
🧹 Nitpick comments (1)
AI/scripts/trigger_training.py (1)

65-73: Keep a single source of truth for the kernel ID.

kernels push -p targets whatever each notebook_dir's kernel-metadata.json specifies, but the success URL and status polling rebuild the ID from KAGGLE_USERNAME. If the owner changes or the metadata drifts from the env, the push can succeed while polling watches a different kernel, so it is safer to read the id from the metadata and reuse it.

Also applies to: 81-88

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/scripts/trigger_training.py` around lines 65 - 73, The current success URL
and later polling reconstruct the kernel identifier from KAGGLE_USERNAME and
notebook['slug'], which can diverge from the pushed kernel; instead, after
running subprocess.run(...) to push (the kernels push -p using
notebook['notebook_dir']), open and parse the kernel-metadata.json in
notebook['notebook_dir'] to read the canonical kernel "id" and use that id for
both the success message in the block that prints notebook['name'] and for any
subsequent polling logic (the later block around lines 81-88), replacing usage
of KAGGLE_USERNAME/notebook['slug'] with the metadata id to ensure a single
authoritative source for the kernel identifier.
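Reading the canonical id can be sketched as follows, assuming the kernel-metadata.json layout the kaggle CLI uses; the function name is illustrative.

```python
import json
import os

def kernel_id_from_metadata(notebook_dir: str) -> str:
    # kernel-metadata.json is the file `kaggle kernels push -p` actually reads,
    # so its "id" field is the single source of truth for later status polls.
    with open(os.path.join(notebook_dir, "kernel-metadata.json")) as f:
        return json.load(f)["id"]
```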
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AI/modules/signal/models/itransformer/train_kaggle.py`:
- Around line 130-132: The global df.ffill() after merging price_df and macro_df
causes feature leakage across tickers; change the fill strategy to forward-fill
macro columns only within each ticker group and then fill remaining NaNs with 0.
Locate the merged dataframe operations around df = pd.merge(price_df, macro_df,
on="date", how="left"), df = df.sort_values(["ticker",
"date"]).reset_index(drop=True) and replace the df.ffill().fillna(0) step with a
grouped forward-fill for macro columns (e.g., using
df.groupby("ticker")[macro_cols].ffill()) followed by df.fillna(0) to handle
other columns.
- Around line 49-51: The save filenames in train_kaggle.py ("model_name",
"scaler_name", "metadata_name") use the new itransformer_* names but downstream
code (artifact_paths.py, itransformer/wrapper.py, deploy_to_server.py) still
expects the old multi_horizon_* names (e.g., multi_horizon_model.keras,
multi_horizon_scaler.pkl); pick one convention and make it consistent: either
change the keys in train_kaggle.py back to
"model_name":"multi_horizon_model.keras" and
"scaler_name":"multi_horizon_scaler.pkl" (and "metadata_name" if applicable) or
update artifact_paths.py, itransformer/wrapper.py (around the loader at
wrapper.py:339) and deploy_to_server.py to look for itransformer_model.keras /
itransformer_scaler.pkl / itransformer_metadata.json; ensure all references to
model_name/scaler_name/metadata_name across these modules match exactly.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py`:
- Line 260: Remove the unnecessary f-string used in the print statement by
replacing the literal print(f"\n>> 완료") with a plain string print("\n>> 완료");
locate the exact statement print(f"\n>> 완료") in tcn_train_kaggle.py (inside the
TCN training/finishing block) and remove the leading f to satisfy Ruff F541
linting.
- Around line 269-270: parser.add_argument("--end-date", default="2023-12-31") is
currently hardcoded, so automated retraining never reflects the latest data;
change the default of parser.add_argument("--end-date", ...) from the hardcoded
date to a dynamic value, using today's date (datetime.date.today().isoformat())
or the maximum date read from the parquet files (max_date), and apply that
dynamic value only when --end-date is not supplied (for example, compute
max_date_from_parquet() and set default_end = max(today, max_date), or after
argparse do: if args.end_date is None: args.end_date = default_end).
- Around line 119-141: The loop that calls add_technical_indicators silently
skips up to 19 failures and then later pd.concat(processed) can raise "No
objects to concatenate" without context; change the feature-calculation loop
(the for ticker in tqdm(...) block) to log each failing ticker and its exception
(include the ticker value and exception text) when add_technical_indicators
raises, keep fail_count but fail faster or at least include a list of
failed_tickers, and after the loop check if processed is empty (or number of
successful tickers is zero) and raise a clear RuntimeError that includes
start_date/end_date, total tickers attempted, fail_count, and the failed_tickers
list before attempting to create full_df; ensure references to processed,
fail_count, failed_tickers, add_technical_indicators, and full_df are updated
accordingly.
- Line 268: The default parquet path set in the
parser.add_argument("--parquet-dir", default=os.environ.get('PARQUET_DIR',
'/kaggle/input/datasets/jihyeongkimm/sisc-ai-trading-dataset')) is invalid for
Kaggle; update the logic so if PARQUET_DIR is not set the code either: 1) uses a
safe default root '/kaggle/input' (so users must provide the specific dataset
slug), or 2) dynamically detects the mounted dataset by scanning /kaggle/input
(e.g., os.walk('/kaggle/input') and picking the matching dataset folder) and
sets the --parquet-dir fallback accordingly; change the code around the
--parquet-dir argument handling to prefer os.environ['PARQUET_DIR'] if present,
otherwise perform the detection or fallback to '/kaggle/input'.
- Around line 155-160: The current split uses "split_date =
dates[split_date_idx]" and then train_df = raw_df[raw_df['date'] <= split_date]
which includes the split_date in the training set and can leave val_df empty;
change the split to make train use dates strictly before the split_date and val
use >= split_date (i.e., use raw_df[raw_df['date'] < split_date] for train_df
and raw_df[raw_df['date'] >= split_date] for val_df) so that the chosen
split_date becomes the first date in the validation set; update references to
dates, split_date_idx, split_date, train_df, and val_df accordingly.
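The leak-free split described in the last item can be sketched on toy data; the column names follow the review, the values are illustrative.

```python
import pandas as pd

raw_df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"]),
    "close": [1.0, 2.0, 3.0, 4.0],
})
dates = sorted(raw_df["date"].unique())
split_date = dates[2]  # intended first date of the validation set

# Strictly-before for train, >= for validation: split_date lands in val only.
train_df = raw_df[raw_df["date"] < split_date]
val_df = raw_df[raw_df["date"] >= split_date]
```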

---

Duplicate comments:
In `@AI/scripts/trigger_training.py`:
- Around line 92-98: The code currently inspects result.stdout regardless of
process success; change the logic after the subprocess.run(...) call to first
check result.returncode and immediately return False (and log an error including
returncode/stderr) if it's non-zero, instead of continuing into the 5-minute
retry loop; update the logic around the subprocess.run(...) invocation and the
subsequent use of result/output to bail out on failed `kaggle kernels status`
calls.

---

Nitpick comments:
In `@AI/scripts/trigger_training.py`:
- Around line 65-73: The current success URL and later polling reconstruct the
kernel identifier from KAGGLE_USERNAME and notebook['slug'], which can diverge
from the pushed kernel; instead, after running subprocess.run(...) to push (the
kernels push -p using notebook['notebook_dir']), open and parse the
kernel-metadata.json in notebook['notebook_dir'] to read the canonical kernel
"id" and use that id for both the success message in the block that prints
notebook['name'] and for any subsequent polling logic (the later block around
lines 81-88), replacing usage of KAGGLE_USERNAME/notebook['slug'] with the
metadata id to ensure a single authoritative source for the kernel identifier.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 88b44e3d-2441-4d9c-be85-f620b46bfdfa

📥 Commits

Reviewing files that changed from the base of the PR and between dd95843 and 9522964.

📒 Files selected for processing (5)
  • AI/modules/signal/core/artifact_paths.py
  • AI/modules/signal/models/TCN/tcn_train_kaggle.py
  • AI/modules/signal/models/itransformer/train_kaggle.py
  • AI/scripts/trigger_training.py
  • AI/scripts/upload_to_kaggle.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • AI/modules/signal/core/artifact_paths.py

Comment on lines +49 to +51
"model_name" : "itransformer_model.keras",
"scaler_name" : "itransformer_scaler.pkl",
"metadata_name" : "itransformer_metadata.json",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# iTransformer 아티팩트명이 downstream 소비 코드에 반영돼 있는지 확인합니다.
rg -n -C2 'itransformer_model\.keras|itransformer_scaler\.pkl|itransformer_metadata\.json|multi_horizon_model\.keras|multi_horizon_scaler\.pkl' AI

Repository: SISC-IT/sisc-web

Length of output: 4064


Align the saved-file naming contract with downstream consumers.

The iTransformer training script (train_kaggle.py lines 49-51) saves itransformer_model.keras, itransformer_scaler.pkl, and itransformer_metadata.json, but downstream code (artifact_paths.py, itransformer/wrapper.py, deploy_to_server.py) still recognizes only the old convention (multi_horizon_model.keras, multi_horizon_scaler.pkl). In particular, wrapper.py line 339 tries to load multi_horizon_scaler.pkl, so the automatic load/deploy step after training will fail to find the file.

Either revert to the old convention or update all downstream code to the new iTransformer names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/itransformer/train_kaggle.py` around lines 49 - 51,
The save filenames in train_kaggle.py ("model_name", "scaler_name",
"metadata_name") use the new itransformer_* names but downstream code
(artifact_paths.py, itransformer/wrapper.py, deploy_to_server.py) still expects
the old multi_horizon_* names (e.g., multi_horizon_model.keras,
multi_horizon_scaler.pkl); pick one convention and make it consistent: either
change the keys in train_kaggle.py back to
"model_name":"multi_horizon_model.keras" and
"scaler_name":"multi_horizon_scaler.pkl" (and "metadata_name" if applicable) or
update artifact_paths.py, itransformer/wrapper.py (around the loader at
wrapper.py:339) and deploy_to_server.py to look for itransformer_model.keras /
itransformer_scaler.pkl / itransformer_metadata.json; ensure all references to
model_name/scaler_name/metadata_name across these modules match exactly.

Comment on lines +130 to +132
df = pd.merge(price_df, macro_df, on="date", how="left")
df = df.sort_values(["ticker", "date"]).reset_index(drop=True)
df = df.ffill().fillna(0)

⚠️ Potential issue | 🟠 Major

A global ffill() creates cross-ticker feature leakage.

After sorting by ["ticker", "date"], a plain ffill() fills the first row of the next ticker's log_return/ret_1d and missing macro values with the previous ticker's last values. Macro values from a later date can even bleed into an earlier date, contaminating the training data. It is safer to forward-fill only the macro columns per ticker and finish the rest with fillna(0).

Example fix
     df = pd.merge(price_df, macro_df, on="date", how="left")
     df = df.sort_values(["ticker", "date"]).reset_index(drop=True)
-    df = df.ffill().fillna(0)
+    macro_fill_cols = [c for c in macro_df.columns if c != "date"]
+    if macro_fill_cols:
+        df[macro_fill_cols] = df.groupby("ticker")[macro_fill_cols].ffill()
+    df = df.fillna(0)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/itransformer/train_kaggle.py` around lines 130 -
132, The global df.ffill() after merging price_df and macro_df causes feature
leakage across tickers; change the fill strategy to forward-fill macro columns
only within each ticker group and then fill remaining NaNs with 0. Locate the
merged dataframe operations around df = pd.merge(price_df, macro_df, on="date",
how="left"), df = df.sort_values(["ticker", "date"]).reset_index(drop=True) and
replace the df.ffill().fillna(0) step with a grouped forward-fill for macro
columns (e.g., using df.groupby("ticker")[macro_cols].ffill()) followed by
df.fillna(0) to handle other columns.
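The grouped forward-fill from the suggested fix can be isolated into a helper, assuming a pandas DataFrame with `ticker` and `date` columns (the function name is illustrative):

```python
import pandas as pd

def fill_macro_per_ticker(df: pd.DataFrame, macro_cols: list[str]) -> pd.DataFrame:
    """Forward-fill macro columns within each ticker, then zero-fill the rest.

    Grouping by ticker stops the last row of one ticker from leaking into
    the first row of the next after a ["ticker", "date"] sort.
    """
    out = df.sort_values(["ticker", "date"]).reset_index(drop=True)
    out[macro_cols] = out.groupby("ticker")[macro_cols].ffill()
    return out.fillna(0)
```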

Comment on lines +119 to +141
raw_df = raw_df[
(raw_df['date'] >= start_date) &
(raw_df['date'] <= end_date)
].copy()

print(f">> 로드 완료: {len(raw_df):,}행, {raw_df['ticker'].nunique()}개 종목")

# 피처 계산
print(">> 피처 계산 중...")
processed = []
fail_count = 0

for ticker in tqdm(raw_df['ticker'].unique(), desc="피처 계산"):
df_t = raw_df[raw_df['ticker'] == ticker].copy()
try:
df_t = add_technical_indicators(df_t)
processed.append(df_t)
except Exception as e:
fail_count += 1
if fail_count >= 20:
raise RuntimeError("피처 계산 실패가 20개를 초과했습니다.")

full_df = pd.concat(processed).reset_index(drop=True)

⚠️ Potential issue | 🟠 Major

Silently dropping preprocessing failures ends in partial training or an opaque error.

The current implementation simply skips up to 19 add_technical_indicators() failures, so training can proceed on only a subset of tickers with no way to tell bugs from data issues. Worse, if the date filter yields nothing or every ticker fails, Line 141 leaves only "No objects to concatenate" and the real cause is lost.

🛠️ Suggested fix
     raw_df = raw_df[
-        (raw_df['date'] >= start_date) &
-        (raw_df['date'] <= end_date)
+        (raw_df["date"] >= start_date) &
+        (raw_df["date"] <= end_date)
     ].copy()
+    if raw_df.empty:
+        raise ValueError("지정한 날짜 범위에 해당하는 데이터가 없습니다.")
@@
-        except Exception as e:
+        except Exception as err:
             fail_count += 1
+            print(f">> 피처 계산 실패: {ticker} ({fail_count}개) - {err}")
             if fail_count >= 20:
-                raise RuntimeError("피처 계산 실패가 20개를 초과했습니다.")
+                raise RuntimeError("피처 계산 실패가 20개를 초과했습니다.") from err
 
-    full_df = pd.concat(processed).reset_index(drop=True)
+    if not processed:
+        raise ValueError("피처 계산에 성공한 종목이 없습니다.")
+    full_df = pd.concat(processed, ignore_index=True)
🧰 Tools
🪛 Ruff (0.15.7)

[warning] 136-136: Do not catch blind exception: Exception

(BLE001)


[warning] 139-139: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py` around lines 119 - 141, The
loop that calls add_technical_indicators silently skips up to 19 failures and
then later pd.concat(processed) can raise "No objects to concatenate" without
context; change the feature-calculation loop (the for ticker in tqdm(...) block)
to log each failing ticker and its exception (include the ticker value and
exception text) when add_technical_indicators raises, keep fail_count but fail
faster or at least include a list of failed_tickers, and after the loop check if
processed is empty (or number of successful tickers is zero) and raise a clear
RuntimeError that includes start_date/end_date, total tickers attempted,
fail_count, and the failed_tickers list before attempting to create full_df;
ensure references to processed, fail_count, failed_tickers,
add_technical_indicators, and full_df are updated accordingly.
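The logging-and-fail-fast loop the prompt describes, sketched generically (`process_tickers` and its messages are illustrative names, not code from this PR):

```python
def process_tickers(tickers, transform):
    """Apply `transform` per ticker, recording failures instead of hiding them.

    Each failure is logged with its ticker, and an empty result raises with
    the full failure list so the real cause is never lost.
    """
    processed, failed = [], []
    for t in tickers:
        try:
            processed.append(transform(t))
        except Exception as err:
            failed.append((t, str(err)))
            print(f">> feature computation failed: {t} - {err}")
    if not processed:
        raise RuntimeError(f"no tickers succeeded ({len(failed)} failed): {failed}")
    return processed, failed
```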

Comment on lines +155 to +160
dates = raw_df['date'].sort_values().unique()
split_date_idx = int(len(dates) * 0.8)
split_date = dates[split_date_idx]

train_df = raw_df[raw_df['date'] <= split_date].copy()
val_df = raw_df[raw_df['date'] > split_date].copy()

⚠️ Potential issue | 🟠 Major

The train/val split is off by one day, so the validation set can end up empty.

Splitting with <= / > after split_date = dates[int(len(dates) * 0.8)] puts the split date on the train side. For example, with 5 unique dates, int(5 * 0.8) == 4, so val on Line 160 is immediately an empty DataFrame.

🛠️ Suggested fix
     dates          = raw_df['date'].sort_values().unique()
     split_date_idx = int(len(dates) * 0.8)
-    split_date     = dates[split_date_idx]
+    if split_date_idx <= 0 or split_date_idx >= len(dates):
+        raise ValueError("Train/Val 분할을 위해 최소 2개 이상의 거래일이 필요합니다.")
+    train_dates    = dates[:split_date_idx]
+    val_dates      = dates[split_date_idx:]
 
-    train_df = raw_df[raw_df['date'] <= split_date].copy()
-    val_df   = raw_df[raw_df['date'] >  split_date].copy()
-    print(f">> Train: ~{split_date}, Val: {split_date}~")
+    train_df = raw_df[raw_df['date'].isin(train_dates)].copy()
+    val_df   = raw_df[raw_df['date'].isin(val_dates)].copy()
+    print(f">> Train: {train_dates[0]}~{train_dates[-1]}, Val: {val_dates[0]}~{val_dates[-1]}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py` around lines 155 - 160, The
current split uses "split_date = dates[split_date_idx]" and then train_df =
raw_df[raw_df['date'] <= split_date] which includes the split_date in the
training set and can leave val_df empty; change the split to make train use
dates strictly before the split_date and val use >= split_date (i.e., use
raw_df[raw_df['date'] < split_date] for train_df and raw_df[raw_df['date'] >=
split_date] for val_df) so that the chosen split_date becomes the first date in
the validation set; update references to dates, split_date_idx, split_date,
train_df, and val_df accordingly.
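A minimal sketch of the corrected split, clamping the index so validation is never empty (the helper name is illustrative):

```python
def split_dates(dates, train_frac=0.8):
    """Split sorted unique dates so the boundary date starts validation.

    Using dates[:idx] for train and dates[idx:] for val guarantees a
    non-empty validation side whenever at least two dates exist.
    """
    if len(dates) < 2:
        raise ValueError("need at least 2 distinct dates to split")
    idx = min(max(int(len(dates) * train_frac), 1), len(dates) - 1)
    return dates[:idx], dates[idx:]
```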

with open(metadata_path, "w", encoding="utf-8") as f:
json.dump(metadata, f, ensure_ascii=False, indent=2)

print(f"\n>> 완료")

⚠️ Potential issue | 🟡 Minor

The unnecessary f prefix triggers Ruff F541.

The string has no placeholder, so the f does nothing. If a lint gate is enabled, CI will fail here.

🛠️ Suggested fix
-    print(f"\n>> 완료")
+    print("\n>> 완료")
🧰 Tools
🪛 Ruff (0.15.7)

[error] 260-260: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py` at line 260, Remove the
unnecessary f-string used in the print statement by replacing the literal
print(f"\n>> 완료") with a plain string print("\n>> 완료"); locate the exact
statement print(f"\n>> 완료") in tcn_train_kaggle.py (inside the TCN
training/finishing block) and remove the leading f to satisfy Ruff F541 linting.


def parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Train TCN signal model (Kaggle/Actions 버전)")
parser.add_argument("--parquet-dir", default=os.environ.get('PARQUET_DIR', '/kaggle/input/datasets/jihyeongkimm/sisc-ai-trading-dataset'))

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What is the default mount path when a dataset is attached in a Kaggle Notebook? Is /kaggle/input/datasets/<owner>/<slug> also a valid runtime path, or should the default be /kaggle/input/<slug>?

💡 Result:

In a Kaggle Notebook, an attached dataset mounts under /kaggle/input/. Each attached dataset appears as a directory under /kaggle/input/ named after the dataset's slug (or its owner/slug form). For example, the "aviation-accident-database-synopses" dataset lives at /kaggle/input/aviation-accident-database-synopses/AviationData.csv. The /kaggle/input/datasets/<owner>/<slug> form is not standard and is not valid. The default is /kaggle/input/<slug>, and it is safest to confirm the actual path with os.walk('/kaggle/input') or the notebook's Data tab; relying on /kaggle/input/ alone is discouraged because of possible name collisions.


The default Kaggle input path needs fixing.

The current default /kaggle/input/datasets/jihyeongkimm/sisc-ai-trading-dataset is not a standard Kaggle mount path. In a Kaggle Notebook, attached datasets mount only as /kaggle/input/<dataset-slug> or /kaggle/input/<owner-dataset-slug>. The /kaggle/input/datasets/ prefix is invalid, so the script fails whenever the PARQUET_DIR environment variable is unset.

Confirm the dataset's exact slug and correct the default, or detect the mounted path at runtime with os.walk('/kaggle/input').

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py` at line 268, The default
parquet path set in the parser.add_argument("--parquet-dir",
default=os.environ.get('PARQUET_DIR',
'/kaggle/input/datasets/jihyeongkimm/sisc-ai-trading-dataset')) is invalid for
Kaggle; update the logic so if PARQUET_DIR is not set the code either: 1) uses a
safe default root '/kaggle/input' (so users must provide the specific dataset
slug), or 2) dynamically detects the mounted dataset by scanning /kaggle/input
(e.g., os.walk('/kaggle/input') and picking the matching dataset folder) and
sets the --parquet-dir fallback accordingly; change the code around the
--parquet-dir argument handling to prefer os.environ['PARQUET_DIR'] if present,
otherwise perform the detection or fallback to '/kaggle/input'.
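The runtime-detection fallback could look like this (a sketch; `slug_hint` is an assumed parameter, and datasets are expected one level below the input root):

```python
import os

def find_dataset_dir(root: str = "/kaggle/input", slug_hint: str = "") -> str:
    """Pick a dataset directory under the Kaggle input root.

    Datasets mount as /kaggle/input/<slug>, so scan the first level and
    optionally match a slug substring instead of hard-coding the path.
    """
    if not os.path.isdir(root):
        raise FileNotFoundError(f"{root} does not exist (not running on Kaggle?)")
    candidates = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    matches = [d for d in candidates if slug_hint in d] if slug_hint else candidates
    if not matches:
        raise FileNotFoundError(f"no dataset matching {slug_hint!r} under {root}")
    return os.path.join(root, matches[0])
```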

Comment on lines +269 to +270
parser.add_argument("--start-date", default="2015-01-01")
parser.add_argument("--end-date", default="2023-12-31")

⚠️ Potential issue | 🟠 Major

The default end-date is pinned to 2023-12-31, so automated retraining never sees recent data.

Today being 2026-03-27, the current default excludes all data after 2024-01-01 in every run without an override. For a weekly pipeline, defaulting to the run date or the parquet's maximum date is safer.

🛠️ Suggested fix
     parser.add_argument("--start-date",    default="2015-01-01")
-    parser.add_argument("--end-date",      default="2023-12-31")
+    parser.add_argument(
+        "--end-date",
+        default=os.environ.get("END_DATE", pd.Timestamp.today().strftime("%Y-%m-%d")),
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/TCN/tcn_train_kaggle.py` around lines 269 - 270, the
hard-coded parser.add_argument("--end-date", default="2023-12-31") keeps automated
retraining from reflecting the latest data; change the default of the --end-date
argument from a fixed date to a dynamic value, using today's date
(datetime.date.today().isoformat()) or the maximum date read from the parquet files
(max_date), and apply the dynamic value only when --end-date is not supplied (e.g.
compute max_date_from_parquet() and set default_end = max(today, max_date), or after
argparse do: if args.end_date is None: args.end_date = default_end).
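A hedged sketch of the dynamic default (an END_DATE env override plus today's date as fallback; the max-date-from-parquet variant would slot in the same way):

```python
import argparse
import datetime
import os

def parse_args(argv=None):
    """Argparse sketch where --end-date defaults dynamically.

    The default is resolved at call time: the END_DATE environment variable
    wins, otherwise today's date, so weekly runs always see recent data.
    """
    today = datetime.date.today().isoformat()
    parser = argparse.ArgumentParser()
    parser.add_argument("--start-date", default="2015-01-01")
    parser.add_argument("--end-date", default=os.environ.get("END_DATE", today))
    return parser.parse_args(argv)
```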

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AI/modules/signal/models/PatchTST/train.py`:
- Around line 217-219: The current split forces n_val = max(1, int(len(tickers)
* 0.2)) which makes train_tickers empty when len(tickers) == 1 and leads to
divide-by-zero later; change the logic around tickers, n_val and val_tickers so
n_val is at most len(tickers)-1 (e.g. n_val = min(max(1, int(len(tickers) *
0.2)), max(0, len(tickers) - 1))), and if len(tickers) < 2 either raise a clear
error or handle the single-ticker case explicitly (ensure train_tickers is
non-empty before proceeding and adjust downstream averaging logic in the
training loop that uses train_tickers/val_tickers).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3914cdca-d1bf-467b-adcb-47a133d02b6a

📥 Commits

Reviewing files that changed from the base of the PR and between 9522964 and f9a8b7d.

📒 Files selected for processing (1)
  • AI/modules/signal/models/PatchTST/train.py

Comment on lines +217 to +219
tickers = full_df['ticker'].unique()
n_val = max(1, int(len(tickers) * 0.2))
val_tickers = tickers[-n_val:] # 마지막 20% 티커를 val로 (시간 순서 보존)

⚠️ Potential issue | 🔴 Critical

The validation-split logic can leave the training set empty and fail immediately.

Line 218 forces n_val to at least 1, so with a single ticker train_tickers becomes empty. Proceeding in that state raises a divide-by-zero in the average-loss computations at Lines 291/301. It is safer to validate a minimum ticker count before splitting and to cap the split ratio as well.

Suggested fix
-    tickers       = full_df['ticker'].unique()
-    n_val         = max(1, int(len(tickers) * 0.2))
-    val_tickers   = tickers[-n_val:]   # 마지막 20% 티커를 val로 (시간 순서 보존)
+    tickers = full_df['ticker'].unique()
+    if len(tickers) < 2:
+        raise ValueError("train/val 분할을 위해 최소 2개 이상의 ticker가 필요합니다.")
+    n_val = min(max(1, int(len(tickers) * 0.2)), len(tickers) - 1)
+    val_tickers = tickers[-n_val:]   # 마지막 20% 티커를 val로
     train_tickers = tickers[:-n_val]
@@
     X_train, y_train = build_sequences(train_df, scaler, fit_scaler=True)
     X_val,   y_val   = build_sequences(val_df,   scaler, fit_scaler=False)
+    if len(X_train) == 0 or len(X_val) == 0:
+        raise ValueError("시퀀스 생성 결과가 비어 있습니다. 분할/입력 데이터를 확인하세요.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AI/modules/signal/models/PatchTST/train.py` around lines 217 - 219, The
current split forces n_val = max(1, int(len(tickers) * 0.2)) which makes
train_tickers empty when len(tickers) == 1 and leads to divide-by-zero later;
change the logic around tickers, n_val and val_tickers so n_val is at most
len(tickers)-1 (e.g. n_val = min(max(1, int(len(tickers) * 0.2)), max(0,
len(tickers) - 1))), and if len(tickers) < 2 either raise a clear error or
handle the single-ticker case explicitly (ensure train_tickers is non-empty
before proceeding and adjust downstream averaging logic in the training loop
that uses train_tickers/val_tickers).
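The clamp from the prompt, as a standalone helper (the function name is illustrative):

```python
def split_tickers(tickers, val_frac=0.2):
    """Split tickers into train/val, never letting val consume every ticker.

    n_val is at least 1 but at most len(tickers) - 1, and fewer than two
    tickers is rejected up front instead of failing later with a
    divide-by-zero.
    """
    if len(tickers) < 2:
        raise ValueError("need at least 2 tickers for a train/val split")
    n_val = min(max(1, int(len(tickers) * val_frac)), len(tickers) - 1)
    return list(tickers[:-n_val]), list(tickers[-n_val:])
```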


Labels

AI AI 이슈

Projects

None yet

Development

Successfully merging this pull request may close these issues.

⚙️ [기능추가][AI] Kaggle 기반 자동 학습 파이프라인 구축

1 participant