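[AI] Designing an AI Pipeline (2)

As I said in the last post, I'm going to start with the verification module, the very core of the AI pipeline (multi-agent).

Why is this the core, you ask??

lol, as anyone who has tried vibe coding knows, these things are all liars...

For real, they dump a plausible-looking pile of crap and go "All done!! hehehe!!!"

and when you take a close look, it's an absolute mess like no other... hmph.. (you're lucky if it even runs -_-)

So I think the one that grabs them by the collar and tests them is the most important piece.

To test properly, an Agent can't just "verify"; it has to verify and then actually fix whatever turns out to need fixing. So I'm going to build it out all the way through the fixing part.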

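✅ Development environment

Language	Python	3.10
AI framework	LangGraph	1.0.8
LLM integration	LangChain Core	1.2.9
LLM provider	Groq (langchain-groq)	1.1.2
Data validation	Pydantic v2	2.12.5
Build environment	Gradle

I'm most comfortable with Java and backend work, so I built the first version on the assumption that the target project builds with Gradle.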

✔ Groq?

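The reason I went with Groq is simple. lol I have no server T_T... and I don't have a spare computer lying around at home either.... So for now I used Groq, which offers models on a free tier. Even that ran dry after about 4-5 calls, so I desperately rotated through 3 accounts to get this done T_T 😭😭

Of course, I designed it to be extensible so that swapping in another LLM later is easy. (I WILL swap it out... I can't keep doing this..)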

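✔ Pydantic?

It's the thing that checks data types in Python. Here I used it to force the LLM's free-form natural-language rambling into a structure. (Pinning the format this way leaves far less room for hallucinated output to slip through.) I fixed the schema itself so that anything missing a required field gets bounced.

On top of that, if you declaratively attach logic like cmd_safety_check with @field_validator, it runs automatically at object creation time.

What does that even mean? It just means you get validation annotations like @Valid + @NotNull + @Pattern in Java. (Think of it as a DTO + Validation combo.)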
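
To make the "runs at object creation time" part concrete, here is a minimal sketch. It is a trimmed-down stand-in for the VerifyCommand model, not the real one: the field set and the BLOCKED_HEADS list are assumptions made up for illustration, and is_allowed is a hypothetical helper.

```python
from typing import List

from pydantic import BaseModel, ValidationError, field_validator

# Assumption: an illustrative blocklist; the real rule set is not shown in this post.
BLOCKED_HEADS = {"bash", "sh", "cmd.exe", "powershell"}


class VerifyCommand(BaseModel):
    name: str
    cmd: List[str]

    @field_validator("cmd")
    @classmethod
    def cmd_safety_check(cls, v: List[str]) -> List[str]:
        # Pydantic calls this automatically while the object is being built.
        if not v:
            raise ValueError("cmd must not be empty")
        if v[0].strip().lower() in BLOCKED_HEADS:
            raise ValueError(f"blocked executable: {v[0]}")
        return v


def is_allowed(cmd: List[str]) -> bool:
    """Hypothetical helper: True if the command survives validation."""
    try:
        VerifyCommand(name="check", cmd=cmd)
        return True
    except ValidationError:
        return False


print(is_allowed(["gradlew", "build"]))
print(is_allowed(["bash", "-c", "rm -rf /"]))
```

The nice part is the same as with Bean Validation in Java: nobody has to remember to call the check, because an invalid object simply cannot exist.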

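But wait... multi-agent? There's only one AI IDE, so how do you run multiple agents?

Nope..... This isn't about using several IDEs; think of it as a concept similar to multithreading.

To use multiple threads in Java you have to manage a thread pool, and there are classes like ThreadPoolExecutor that do that for you.

Here too there's something that manages these multiple agents, and that something is LangGraph!!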

✔ LangGraph?

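lol it's a framework that ties the AI uncertainty I hate so much into a controllable workflow. The LLM output itself is still non-deterministic, but the flow between steps is fixed by the graph, and I love that.

When there are several agents, it's the tool that orchestrates them.

It means I can point the arrows wherever I want!

e.g. A → B → failed? → B again → succeeded? → C

(For reference, a plain LangChain chain basically just runs A→B→C in sequence.)

In other words!! When you need the AI's creativity but the business process has to stay fixed, this is a must-have.

I used LangGraph to wire up the nodes.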

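from langgraph.graph import END, StateGraph


# GraphState and the node/router functions below are defined elsewhere in the project.
def build_graph():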
    g = StateGraph(GraphState)

    g.add_node("verifier", verifier_node)
    g.add_node("fixer", fixer_node)
    g.add_node("rerun_verifier", rerun_verifier_node)
    g.add_node("apply", apply_node)
    g.add_node("rollback", rollback_node)

    g.set_entry_point("verifier")

    routes = {
        "ok": END,
        "fix": "fixer",
        "rollback": "rollback",
        "stop": END,
    }

    g.add_conditional_edges("verifier", route_after_verifier, routes)

    g.add_conditional_edges(
        "fixer",
        route_after_fixer,
        {"rerun": "rerun_verifier", "apply": "apply", "stop": END},
    )

    g.add_conditional_edges("rerun_verifier", route_after_verifier, routes)

    g.add_edge("apply", "verifier")

    g.add_edge("rollback", "fixer")

    return g.compile()

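Five nodes (add_node): verify, fix, re-verify, apply, rollback
Conditional edges (add_conditional_edges) after verify, after fix, and after re-verify
The entry point (set_entry_point) is the verifier node
routes is the destination map
After the apply node, always go back to the verifier node
After the rollback node, always go back to the fixer node

add_conditional_edges? These are like the navigation system that answers when a node finishes its job and asks "where do we go now???? o_o". It looks at the GraphState shared bulletin board, checks the state, and sends the flow to the matching destination (routes).

add_edge? This is the command that forces "after this node, go to that node!" no matter what the state says.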
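
If the difference between the two kinds of edges still feels abstract, the idea can be sketched in plain Python with no LangGraph at all. Everything here (walk, the toy nodes, the local END string) is made up for illustration and is not LangGraph's API; it only mimics the "router picks a key, the routes map turns it into a destination" mechanic.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "END"  # local stand-in for this sketch, not langgraph's END constant


def walk(
    entry: str,
    nodes: Dict[str, Callable[[State], State]],
    conditional: Dict[str, Tuple[Callable[[State], str], Dict[str, str]]],
    fixed: Dict[str, str],
    state: State,
    max_steps: int = 20,
) -> List[str]:
    """Run nodes until END is reached and return the visited node names."""
    visited: List[str] = []
    current = entry
    for _ in range(max_steps):
        if current == END:
            break
        visited.append(current)
        state.update(nodes[current](state))   # the node writes to the shared state
        if current in conditional:            # like add_conditional_edges
            router, routes = conditional[current]
            current = routes[router(state)]
        else:                                 # like add_edge
            current = fixed[current]
    return visited


# Toy nodes: the "build" passes only after one fix has been made.
nodes = {
    "verifier": lambda s: {"ok": bool(s.get("fixed", False))},
    "fixer": lambda s: {"fixed": True},
}
conditional = {
    "verifier": (lambda s: "ok" if s["ok"] else "fix", {"ok": END, "fix": "fixer"}),
}
fixed = {"fixer": "verifier"}

path = walk("verifier", nodes, conditional, fixed, state={})
print(path)
```

The verifier's next stop depends on the state, while the fixer always goes back to the verifier. That is the whole difference between a conditional edge and a fixed edge.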

☑️ verifier_node

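In the graph this is both the starting point and the one that ends things. It's the main body that actually fires off the gradle build and decides from the result whether to finish or not.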

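import os
import subprocess
import time
from pathlib import Path
from typing import List, Optional


# TaskSpec, VerifyReport, CommandResult, resolve_workdir, resolve_gradle_wrapper and _tail
# are project code that is not shown in this post.
def run_verify(task: TaskSpec, repo_root: Path) -> VerifyReport: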
    if not repo_root.exists():
        raise RuntimeError(f"repo_root not found: {repo_root}")
    if not repo_root.is_dir():
        raise RuntimeError(f"repo_root is not a directory: {repo_root}")

    results: List[CommandResult] = []
    ok = True

    for c in task.verify.commands:
        wd = resolve_workdir(repo_root, c.workdir)
        if not wd.exists() or not wd.is_dir():
            ok = False
            results.append(
                CommandResult(
                    name=c.name,
                    workdir=c.workdir,
                    cmd=c.cmd,
                    resolved_cmd=[],
                    exit_code=2,
                    duration_sec=0.0,
                    stdout_tail="",
                    stderr_tail="",
                    error_summary=f"workdir not found or not a directory: {wd}",
                )
            )
            continue

        resolved = resolve_cmd(c.cmd, repo_root)

        proc_env = os.environ.copy()
        if os.name == "nt":
            jvm_utf8 = "-Dfile.encoding=UTF-8 -Dstdout.encoding=UTF-8 -Dstderr.encoding=UTF-8"
            existing = proc_env.get("JAVA_TOOL_OPTIONS", "")
            proc_env["JAVA_TOOL_OPTIONS"] = f"{existing} {jvm_utf8}".strip()

        start = time.time()
        try:
            proc = subprocess.run(
                resolved,
                cwd=str(wd),
                capture_output=True,
                text=True,
                encoding="utf-8",
                errors="replace",
                timeout=c.timeout_sec,
                shell=False,
                env=proc_env,
            )
            
            duration = time.time() - start

            stdout_tail = _tail(proc.stdout or "", task.verify.max_output_chars)
            stderr_tail = _tail(proc.stderr or "", task.verify.max_output_chars)

            err_summary: Optional[str] = None
            if proc.returncode != 0:
                ok = False
                err_summary = (
                    (stderr_tail or stdout_tail)[-500:]
                    if (stderr_tail or stdout_tail)
                    else "command failed"
                )

            results.append(
                CommandResult(
                    name=c.name,
                    workdir=c.workdir,
                    cmd=c.cmd,
                    resolved_cmd=resolved,
                    exit_code=proc.returncode,
                    duration_sec=round(duration, 3),
                    stdout_tail=stdout_tail,
                    stderr_tail=stderr_tail,
                    error_summary=err_summary,
                )
            )

        except subprocess.TimeoutExpired as e:
            duration = time.time() - start
            ok = False
            results.append(
                CommandResult(
                    name=c.name,
                    workdir=c.workdir,
                    cmd=c.cmd,
                    resolved_cmd=resolved,
                    exit_code=124,
                    duration_sec=round(duration, 3),
                    stdout_tail=_tail(getattr(e, "stdout", "") or "", task.verify.max_output_chars),
                    stderr_tail=_tail(getattr(e, "stderr", "") or "", task.verify.max_output_chars),
                    error_summary=f"timeout after {c.timeout_sec}s",
                )
            )

    return VerifyReport(task_id=task.task_id, ok=ok, results=results)

def resolve_cmd(cmd: List[str], repo_root: Path) -> List[str]:
    """
    If cmd[0] is a gradlew variant, substitute/wrap it for the current OS.
    - The JSON keeps a logical command such as ["gradlew", "build"] (passes the allowlist)
    - On Windows, wrap with cmd.exe /c to run the .bat (avoids WinError 2)
    """
    if not cmd:
        return cmd

    head = str(cmd[0]).strip()

    if head in ("gradlew", "./gradlew", "gradlew.bat"):
        wrapper = resolve_gradle_wrapper(repo_root)
        base = [str(wrapper)] + cmd[1:]

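        # On Windows, launching a .bat directly through CreateProcess can be flaky or fail (WinError 2),
        # so wrap it with cmd.exe to make sure it runs.
        if os.name == "nt" and wrapper.suffix.lower() == ".bat":
            # Build one command line that is safe for paths/args containing spaces and pass it to cmd.exe /c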
            cmdline = subprocess.list2cmdline(base)
            return ["cmd.exe", "/d", "/s", "/c", cmdline]

        return base

    return cmd

☑️ route_after_verifier

It looks at the state and decides where to go next.

def route_after_verifier(state: GraphState) -> str:
    report = state.get("verify_report")
    attempt = state.get("attempt", 0)

    if report and report.ok:
        ui.route_decision("END", "Verification passed", style="green")
        return "ok"

    if state.get("did_apply") is True:
        ui.route_decision("Rollback", "Verification failed after apply → revert", style="red")
        return "rollback"

    if attempt >= MAX_ATTEMPTS:
        ui.route_decision("END", f"Max attempts reached ({MAX_ATTEMPTS})", style="red")
        return "stop"

    ui.route_decision("Run Fixer", "Fix needed", style="purple")
    return "fix"

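The important thing to look at here is the order of ok and did_apply. You might think, what's the big deal.. but lol

When the apply node writes the LLM's fix into the source, did_apply becomes true. If the build that follows succeeds, report.ok is true; if it fails, it's false. A successful build is all that matters, regardless of whether did_apply is true or false, so I check that result first.

Reaching the next condition, did_apply is True, therefore means lol the fix was applied but the build still failed, so I send it to the rollback node.

And I capped the number of attempts with MAX_ATTEMPTS (3) to prevent an infinite loop. (No way I'm letting the LLM gobble up all my tokens on failures!!!)

If it can't be fixed within the max anyway, it's a problem a human has to step in and fix... so I settled on a modest 3.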
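
The ordering is easier to see with the UI calls stripped out. This is a simplified stand-in, not the real router: Report is a dummy with just the one field the router reads, and route takes plain arguments instead of GraphState.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 3


@dataclass
class Report:  # stand-in for VerifyReport, only the field the router reads
    ok: bool


def route(report: Optional[Report], did_apply: bool, attempt: int) -> str:
    if report and report.ok:       # 1) a passing build always wins
        return "ok"
    if did_apply:                  # 2) failed right after an apply: undo it
        return "rollback"
    if attempt >= MAX_ATTEMPTS:    # 3) out of budget: hand over to a human
        return "stop"
    return "fix"                   # 4) otherwise ask the LLM for a fix


print(route(Report(ok=True), did_apply=True, attempt=5))
print(route(Report(ok=False), did_apply=True, attempt=0))
print(route(Report(ok=False), did_apply=False, attempt=3))
print(route(Report(ok=False), did_apply=False, attempt=1))
```

If checks 1 and 2 were swapped, a fix that actually worked would get rolled back just because did_apply was set, which is exactly the bug the ordering avoids.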

☑️ fixer_node

def fixer_node(state: GraphState) -> GraphState:
    task = state["task"]
    report = state["verify_report"]
    repo_root = Path(state["repo_root"])

    prev_sigs = list(state.get("failed_plan_signatures", []))
    prev_notes = list(state.get("failed_failures_notes", []))

    fix_count = state.get("metrics_fix_count", 0) + 1
    ui.node_enter("Run Fixer", f"LLM call #{fix_count}")

    with ui.llm_spinner():
        plan, blocked = make_fix_plan_llm(
            task=task,
            repo_root=repo_root,
            report=report,
            previous_failed_signatures=prev_sigs,
            previous_failures_notes=prev_notes,
        )

    ui.show_fix_plan(plan)

    blocked_total = state.get("metrics_blocked_count", 0) + blocked
    metrics_update = {"metrics_fix_count": fix_count, "metrics_blocked_count": blocked_total}

    if any((e.new_content or "").strip() for e in (plan.edits or [])):
        return {"fix_plan": plan, "next_action": "apply", "stop_reason": None, **metrics_update}

    if plan.commands_to_rerun and any(cmd for cmd in plan.commands_to_rerun):
        return {"fix_plan": plan, "next_action": "rerun", "stop_reason": None, **metrics_update}

    return {
        "fix_plan": plan,
        "next_action": "stop",
        "stop_reason": "FixPlan has neither commands_to_rerun nor edits[].new_content",
        **metrics_update,
    }

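This is the fixer node, reached when the route is fix. The key here is that it brings along the earlier failed fixes (signatures) and compares against them. I added this so it doesn't repeat the same "I did that last time and it blew up!!!" mistake. (That's how you save tokens.... now I sound token-obsessed -_- my wallet is precious..)

If there is a code edit, it goes to apply; if just changing the command is enough (no edits), it branches to rerun for a partial verification.

I added metrics_update to collect my own metrics separately.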
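
The post doesn't show how a "signature" is computed, so here is one way it could be done: hash the edits and commands into a short fingerprint. The Edit shape, plan_signature and is_repeat are all hypothetical names for this sketch, not the project's real code.

```python
import hashlib
import json
from typing import Iterable, List, Tuple

Edit = Tuple[str, str]  # (path, new_content); hypothetical shape for this sketch


def plan_signature(edits: Iterable[Edit], commands: Iterable[List[str]]) -> str:
    """Stable fingerprint of a fix plan, independent of the order of the edits."""
    payload = {
        "edits": sorted(
            [path, hashlib.sha256(content.encode("utf-8")).hexdigest()]
            for path, content in edits
        ),
        "commands": [list(c) for c in commands],
    }
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def is_repeat(signature: str, failed_signatures: List[str]) -> bool:
    """True if this exact plan already failed once."""
    return signature in failed_signatures


failed = [plan_signature([("App.java", "class App {}")], [["gradlew", "build"]])]
retry = plan_signature([("App.java", "class App {}")], [["gradlew", "build"]])
print(is_repeat(retry, failed))
```

A repeated signature can be rejected before any file is touched, and the list of failed signatures can also be fed back into the prompt so the LLM knows what not to try again.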

✔ make_fix_plan_llm

def make_fix_plan_llm(
	...
    model: str = "llama-3.3-70b-versatile",
) -> tuple[FixPlan, int]:
....
for i, cmd in enumerate(out):
      VerifyCommand(name=f"rerun_{i+1}", cmd=cmd)
....

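Why llama-3.3-70b-versatile? The reason is simple. Fixing a whole Java file from a compile error log has a high failure rate with small models. On top of that, the FixPlan JSON schema has to be followed to the letter, so I needed a big model. Groq is fast, which made running the loop painless. For getting a v0 demo off the ground this was the most realistic choice. In short, I picked it because it satisfied free + fast (Groq) + 70B reasoning.

Blocking with Pydantic: the moment a VerifyCommand(cmd=cmd) object is created, Pydantic automatically runs the cmd_safety_check validator. If cmd contains something like bash or cmd.exe, the validator raises a ValueError (which Pydantic surfaces as a ValidationError) and the command gets rejected right there. So no separate validation node is needed, because it blows up automatically at object creation time.

🗨 But this only validates commands, it doesn't check files, right?

Right. For that, as a second line of defense, the apply node has a whitelist that checks file extensions and enforces path restrictions to block anything else.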
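
A check like that could look roughly like the following. This is only a sketch: ALLOWED_SUFFIXES and is_edit_allowed are names I made up for illustration, and the real whitelist in apply is not shown in this post.

```python
from pathlib import Path

# Assumption: an example whitelist for a Gradle/Java project.
ALLOWED_SUFFIXES = {".java", ".kt", ".gradle", ".properties"}


def is_edit_allowed(repo_root: Path, rel_path: str) -> bool:
    """Allow an edit only if the target stays inside the repo and has a whitelisted extension."""
    root = repo_root.resolve()
    target = (root / rel_path).resolve()   # collapses "..", follows symlinks
    inside_repo = root in target.parents
    return inside_repo and target.suffix.lower() in ALLOWED_SUFFIXES


print(is_edit_allowed(Path("repo"), "src/main/java/App.java"))
print(is_edit_allowed(Path("repo"), "../secret.java"))
print(is_edit_allowed(Path("repo"), "scripts/run.sh"))
```

Resolving the path before comparing is the important bit: a plain string prefix check would happily accept something like src/../../secret.java.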

☑️ route_after_fixer

def route_after_fixer(state: GraphState) -> str:
    action = state.get("next_action") or "stop"
    labels = {
        "apply": ("Apply Changes", "blue"),
        "rerun": ("Rerun Verifier", "blue"),
        "stop": ("END", "red"),
    }
    label, style = labels.get(action, ("END", "red"))
    reason_map = {
        "apply": "Apply code changes",
        "rerun": "Shortcut re-verification",
        "stop": "Cannot fix, stopping",
    }
    ui.route_decision(label, reason_map.get(action, ""), style=style)
    return action

As mentioned earlier, once the fixer is done, this looks at the state and acts as the navigation that decides where to send things next.

☑️ apply_node

def apply_node(state: GraphState) -> GraphState:
    ui.node_enter("Apply Changes", "Apply the LLM fix plan to the source code")

    repo_root = Path(state["repo_root"])
    plan = state["fix_plan"]

    with ui.apply_spinner():
        results, diff_path = apply_fix_plan(repo_root, plan)

    ui.show_apply_table([r.__dict__ for r in results])

    did_apply = any(getattr(r, "applied", False) for r in results)

    return {
        "apply_results": [r.__dict__ for r in results],
        "diff_path": diff_path,
        "did_apply": did_apply,
        "next_action": None,
        "stop_reason": None,
    }

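This is where the Fix Plan from the LLM is actually applied to the source. The key part is did_apply. It checks whether at least one thing was changed (True/False) and stores that in the State, because route_after_verifier later uses it to decide whether to go to rollback_node.

☑️ rollback_node

As mentioned above, this is the node you land in when did_apply is true and the build still failed. It reverts the source with git restore and records it in the history.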
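
The rollback code isn't shown in this post, so here is a rough sketch of the idea under my own assumptions: build_restore_cmd, rollback and the shape of the history entry are hypothetical, and runner is injectable only so the function can be tried without a real repository.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def build_restore_cmd(paths: Sequence[str]) -> List[str]:
    # "--" keeps file names from being parsed as git options.
    return ["git", "restore", "--", *paths]


def rollback(
    repo_root: Path,
    paths: Sequence[str],
    history: List[dict],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Revert the given tracked files and append a record to history."""
    if not paths:
        return True  # nothing was applied, so nothing to undo
    cmd = build_restore_cmd(paths)
    proc = runner(cmd, cwd=str(repo_root), capture_output=True, text=True)
    history.append({"action": "rollback", "paths": list(paths), "exit_code": proc.returncode})
    return proc.returncode == 0


print(build_restore_cmd(["src/main/java/App.java"]))
```

One caveat with this approach: git restore only reverts tracked files, so files newly created by a fix would need to be cleaned up separately.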

☑️ rerun_verifier_node

def rerun_verifier_node(state: GraphState) -> GraphState:
    ui.node_enter("Rerun Verifier", "Shortcut partial verification")

    task = state["task"]
    repo_root = Path(state["repo_root"])
    plan = state["fix_plan"]

    default_timeout = task.verify.commands[0].timeout_sec if task.verify.commands else 900

    new_commands = [
        VerifyCommand(
            name=f"rerun_{i}",
            workdir=".",
            cmd=cmd,
            timeout_sec=default_timeout,
        )
        for i, cmd in enumerate(plan.commands_to_rerun, start=1)
    ]

    rerun_task = task.model_copy(deep=True)
    rerun_task.verify.commands = new_commands

    with ui.build_spinner():
        report = run_verify(rerun_task, repo_root)

    if report.ok:
        ui.verify_result_panel(True)
    else:
        err = ""
        for r in report.results:
            if r.exit_code != 0:
                err = (r.error_summary or "")[:200]
                break
        ui.verify_result_panel(False, err)

    history = list(state.get("verify_history", []))
    history.append(report)

    return {
        "verify_report": report,
        "verify_history": history,
        "attempt": state.get("attempt", 0) + 1,
        "did_apply": False,
        "next_action": None,
        "stop_reason": None,
    }

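rerun_verifier_node is the partial verification node taken when the LLM proposes changing only the commands, without code edits.

Instead of running the whole verifier again, it executes only the commands_to_rerun the LLM proposed, for a quick re-verification.

The key here is that it doesn't touch the original TaskSpec: it deep-copies it and swaps out only the commands.

In Python, assignment doesn't copy an object; it just binds another name to the same object, so mutating it through the new name changes the original too. That's why I deep-copy to preserve the original. (To be fair, object references in Java behave the same way, but it still feels like a bit of a hassle..)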
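
Here is a tiny standard-library demo of why the copy has to be deep. Task and Verify are dummy dataclasses standing in for TaskSpec; the real code uses Pydantic's model_copy(deep=True), but the aliasing behaviour being illustrated is the same.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verify:
    commands: List[str] = field(default_factory=list)


@dataclass
class Task:
    verify: Verify


original = Task(Verify(["gradlew build"]))

alias = original                 # same object, just a second name
shallow = copy.copy(original)    # new Task, but it shares the inner Verify
deep = copy.deepcopy(original)   # fully independent copy

deep.verify.commands = ["gradlew test"]
after_deep_edit = list(original.verify.commands)      # original untouched

shallow.verify.commands = ["gradlew clean"]
after_shallow_edit = list(original.verify.commands)   # original changed

print(after_deep_edit, after_shallow_edit)
```

A shallow copy would not have been enough here, because rerun_task.verify.commands = new_commands reaches through the nested verify object, just like the shallow case above.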

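🗨 Uh.. but if you're verifying again anyway, why build a separate rerun_verifier_node instead of just sending it to verifier_node?

This one isn't meant to confirm overall integrity; it only needs the shortcut partial check the LLM specified, so it copies the original and runs just part of it. verifier_node is the one that does the full verification by the book. Because it runs everything, it takes tens of seconds.. If the code is fine and only the build command needs to change, why bother lol (our time is precious). So I made it execute only the commands the LLM specified, to speed up the verification cycle.

Let's make some sample code and run it.

It works.

-The End-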
