- Reference blog: https://recording-it.tistory.com/115
- Docs: https://github.com/ollama/ollama
To run an Agent pipeline, you need an API that can be reached through an endpoint.
External APIs cost money, so instead we use Ollama to serve a model locally.
What is Ollama?
A runtime tool that handles installing, running, and managing large language models (LLMs) on a local PC, all in one place.
Think of it as an engine that loads a model file and runs it. It can import and serve models in GGUF format (which bundles weights, tokenizer, and metadata) or Safetensors format (weights only).
- Download from the link: https://ollama.com/download/mac
- Download and run a model
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
Using the API
Ollama provides a RESTful API accessible over HTTP. By default, the API server runs at http://localhost:11434.
✔️ Key API endpoints
- Generate: POST /api/generate - generate text
- Chat: POST /api/chat - conversational responses
- Embeddings: POST /api/embeddings - generate vector representations of text
- List Models: GET /api/tags - return the list of available models
- Create Model: POST /api/create - create a new model or modify an existing one
- Delete Model: DELETE /api/delete - delete a model
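For example, listing the locally available models takes a single GET request. A minimal sketch using Python's requests library (my choice of client, any HTTP client works), against the default local server:
import requests

# List locally available models (GET /api/tags).
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. "gpt-oss:20b"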
Chat example
Request
curl http://localhost:11434/api/chat -d '{
  "model": "gpt-oss:20b",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ]
}'
Response
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.36044Z","message":{"role":"assistant","content":"","thinking":"The"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.394353Z","message":{"role":"assistant","content":"","thinking":" user"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.427897Z","message":{"role":"assistant","content":"","thinking":" gre"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.461904Z","message":{"role":"assistant","content":"","thinking":"ets"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.496616Z","message":{"role":"assistant","content":"","thinking":"."},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.530844Z","message":{"role":"assistant","content":"","thinking":" We"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.565671Z","message":{"role":"assistant","content":"","thinking":" respond"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.599426Z","message":{"role":"assistant","content":"","thinking":" in"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.634303Z","message":{"role":"assistant","content":"","thinking":" friendly"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.670811Z","message":{"role":"assistant","content":"","thinking":" tone"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.703562Z","message":{"role":"assistant","content":"","thinking":"."},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.937794Z","message":{"role":"assistant","content":"Hello"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:13.970307Z","message":{"role":"assistant","content":"!"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.003039Z","message":{"role":"assistant","content":" I'm"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.035902Z","message":{"role":"assistant","content":" doing"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.069329Z","message":{"role":"assistant","content":" great"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.102949Z","message":{"role":"assistant","content":"—"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.136839Z","message":{"role":"assistant","content":"thanks"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.170898Z","message":{"role":"assistant","content":" for"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.204661Z","message":{"role":"assistant","content":" asking"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.237437Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.269681Z","message":{"role":"assistant","content":" How"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.303046Z","message":{"role":"assistant","content":" about"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.338634Z","message":{"role":"assistant","content":" you"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.373138Z","message":{"role":"assistant","content":"?"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.408121Z","message":{"role":"assistant","content":" What"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.438736Z","message":{"role":"assistant","content":"’s"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.47084Z","message":{"role":"assistant","content":" on"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.503129Z","message":{"role":"assistant","content":" your"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.535386Z","message":{"role":"assistant","content":" mind"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.568999Z","message":{"role":"assistant","content":" today"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.601544Z","message":{"role":"assistant","content":"?"},"done":false}
{"model":"gpt-oss:20b","created_at":"2026-01-05T07:01:14.634607Z","message":{"role":"assistant","content":""},"done":true,"done_reason":"stop","total_duration":9847549875,"load_duration":6422573167,"prompt_eval_count":73,"prompt_eval_duration":2001336166,"eval_count":42,"eval_duration":1362484163}
Note: how to change a model's system prompt
For running on a MacBook Pro, something in the 2-4B parameter range seemed about right performance-wise, so I switched models.
- Create a Modelfile
FROM qwen3-vl:4b
# set the temperature [higher is more creative, lower is more coherent]
PARAMETER temperature 0.2
# set the system message
SYSTEM """
You are a helpful assistant.
Respond clearly and concisely.
Do not reveal your internal reasoning.
"""
- Run
# Create a model from the Modelfile
ollama create hello -f ./Modelfile
ollama run hello
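Once created, the hello model shows up next to the pulled models and can be used anywhere a model name is accepted. A quick sanity check via the generate endpoint (a sketch with a made-up prompt, assuming the default local server; the SYSTEM message baked into the Modelfile is applied automatically):
import requests

# One-shot generation with the custom model (POST /api/generate).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "hello", "prompt": "Introduce yourself in one sentence.", "stream": False},
)
resp.raise_for_status()
print(resp.json()["response"])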
Combining with LangChain
Open models served through Ollama kept printing their reasoning process along with the answer (gpt-oss, llama, and qwen all behaved the same).
So I added a post-processing step for the text response.
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models import ChatOllama

template = """{question}"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOllama(
    model="hello:latest",
    base_url="http://localhost:11434")

chain = prompt | model
response = chain.invoke({"question": "Hi, how's the weather today?"})

# Post-process the qwen model response: keep only the text after the closing </think> tag
text = response.content
if "</think>" in text:
    text = text.split("</think>")[-1].strip()

print('####')
print(text)
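If the reasoning block can appear with both opening and closing tags, stripping it with a regex is a bit more robust. A small sketch (assuming the reasoning is wrapped in <think>...</think>, as qwen3 emits it):
import re

# Remove any <think>...</think> reasoning blocks, wherever they appear.
text = re.sub(r"<think>.*?</think>", "", response.content, flags=re.DOTALL).strip()
print(text)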