Endpoint
POST https://www.samuraiapi.in/v1/chat/completions
Request Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | ✅ | - | Model ID (e.g. gpt-4o, claude-3-5-sonnet-20241022) |
| messages | array | ✅ | - | Conversation history with role + content |
| temperature | number | - | 1 | Creativity: 0 = deterministic, 2 = very creative |
| max_tokens | integer | - | model default | Max tokens to generate |
| stream | boolean | - | false | Stream partial tokens via SSE |
| top_p | number | - | 1 | Nucleus sampling threshold |
| frequency_penalty | number | - | 0 | Reduce repetition. Range: -2.0 to 2.0 |
| presence_penalty | number | - | 0 | Encourage new topics. Range: -2.0 to 2.0 |
| stop | string/array | - | - | Up to 4 stop sequences |
| n | integer | - | 1 | Number of completions to return |
| user | string | - | - | Your end-user ID for monitoring |
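
The ranges in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API or any SDK: `build_chat_payload` is a hypothetical helper name, and rejecting an empty `messages` list is an assumption (the table only says the field is required).

```python
from typing import Any


def build_chat_payload(
    model: str,
    messages: list[dict[str, str]],
    **options: Any,
) -> dict[str, Any]:
    """Assemble a request body, checking the ranges listed in the table.

    Hypothetical helper for illustration; not provided by the API.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty history is treated as missing.
        raise ValueError("messages is required")

    # Ranges taken from the parameter table.
    ranges = {
        "temperature": (0.0, 2.0),
        "frequency_penalty": (-2.0, 2.0),
        "presence_penalty": (-2.0, 2.0),
    }
    for name, (low, high) in ranges.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Leave unset options out so the server applies its own defaults.
    payload: dict[str, Any] = {"model": model, "messages": messages}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=None,  # dropped, so the model default applies
)
print(payload)
```

The resulting dictionary can be sent as the JSON body of the POST request, or unpacked into `client.chat.completions.create(**payload)` when using the OpenAI SDK.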
Code Examples
from openai import OpenAI

# Point the OpenAI SDK at the Samurai API endpoint.
client = OpenAI(
    api_key="sk-samurai-YOUR_KEY",
    base_url="https://www.samuraiapi.in/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    temperature=0.7,
    max_tokens=300
)

# The generated text is in the first choice; token counts are in usage.
print(response.choices[0].message.content)
print(f"Used {response.usage.total_tokens} tokens")
Example Response
{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is like having two magic coins..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 145,
    "total_tokens": 173
  }
}
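
When the endpoint is called without the SDK, the response arrives as plain JSON shaped like the example above. The sketch below pulls out the fields most callers need; `summarize_completion` is a hypothetical helper name, and it assumes every response carries at least one choice, as the example does.

```python
def summarize_completion(body: dict) -> dict:
    """Extract the reply text, finish reason, and token counts.

    Hypothetical helper; assumes the response shape shown in the example.
    """
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        # "stop" is the finish reason shown in the example response.
        "finished_normally": choice["finish_reason"] == "stop",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# The example response from this page, as a Python dictionary.
example_body = {
    "id": "chatcmpl-abc123xyz",
    "object": "chat.completion",
    "created": 1715000000,
    "model": "gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Quantum entanglement is like having two magic coins...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 145, "total_tokens": 173},
}

summary = summarize_completion(example_body)
print(summary["text"])
print(summary["total_tokens"])  # 173
```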
Multi-turn Conversations
Maintain context by including the full conversation history:
messages = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1
messages.append({"role": "user", "content": "What is the capital of Japan?"})
r = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = r.choices[0].message.content
messages.append({"role": "assistant", "content": reply})

# Turn 2: the earlier turns are resent, so the model has the context
messages.append({"role": "user", "content": "What is its population?"})
r = client.chat.completions.create(model="gpt-4o", messages=messages)
print(r.choices[0].message.content)
# => "Tokyo has a population of approximately 13.9 million in the city proper..."
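
The append, call, append pattern above can be wrapped in a small helper so each turn is a single call. This is a sketch only: `Conversation` is a hypothetical class, not part of the SDK, and it takes the completion function as an argument so it can be exercised without network access.

```python
from typing import Callable, Optional


class Conversation:
    """Keeps the running message history and resends it on every turn."""

    def __init__(
        self,
        complete: Callable[[list], str],
        system_prompt: Optional[str] = None,
    ) -> None:
        # complete receives the full message list and returns the reply text.
        self.complete = complete
        self.messages: list = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy so the callee cannot mutate the stored history.
        reply = self.complete(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Stand-in for a real API call: reports how many messages it was sent.
def fake_complete(messages: list) -> str:
    return f"received {len(messages)} messages"


chat = Conversation(fake_complete, system_prompt="You are a helpful assistant.")
print(chat.ask("What is the capital of Japan?"))  # received 2 messages
print(chat.ask("What is its population?"))        # received 4 messages
```

To use it against the API, pass a function that calls the client, for example `lambda msgs: client.chat.completions.create(model="gpt-4o", messages=msgs).choices[0].message.content`.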
Try It Live
Interactive Playground: test the chat API directly in your browser with your API key.
Popular Models for Chat
| Model | Best For | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
| --- | --- | --- | --- |
| gpt-4o | General purpose, vision | $1.25 | $5.00 |
| gpt-4o-mini | Fast, cheap, great quality | $0.075 | $0.30 |
| gpt-4.1 | Long context (1M tokens) | $1.00 | $4.00 |
| claude-3-5-sonnet-20241022 | Coding, reasoning | $1.50 | $7.50 |
| claude-3-5-haiku-20241022 | Fast Anthropic model | $0.40 | $2.00 |
| gemini-2.5-flash-preview-05-20 | Fastest Google model | $0.075 | $0.30 |
| deepseek-chat | Ultra cheap, smart | $0.007 | $0.014 |
| llama-3.3-70b-instruct | Best open-source | $0.05 | $0.16 |
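
The per-million prices can be combined with the `usage` block of a response to estimate what a request cost. The sketch below assumes billing is strictly linear in token count, which this page does not state; `estimate_cost` and `PRICES` are hypothetical names, and only a few rows of the table are copied in.

```python
# USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    "gpt-4o": (1.25, 5.00),
    "gpt-4o-mini": (0.075, 0.30),
    "deepseek-chat": (0.007, 0.014),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cost scales linearly with tokens at the listed rates.
    """
    input_price, output_price = PRICES[model]
    total = prompt_tokens * input_price + completion_tokens * output_price
    return total / 1_000_000


# Token counts from the example response on this page (28 in, 145 out).
cost = estimate_cost("gpt-4o", prompt_tokens=28, completion_tokens=145)
print(f"${cost:.6f}")  # $0.000760
```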
Endpoint
POST https://api.samuraiapi.in/v1/chat/completions
Request Body
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | ✅ | Model ID (e.g. gpt-4o, claude-3-5-sonnet-20241022) |
| messages | array | ✅ | Array of message objects with role and content |
| temperature | number | - | Sampling temperature, 0 to 2. Default: 1 |
| max_tokens | integer | - | Maximum tokens to generate |
| stream | boolean | - | Enable streaming. Default: false |
| top_p | number | - | Nucleus sampling. Default: 1 |
| frequency_penalty | number | - | Penalize frequent tokens (-2 to 2) |
| presence_penalty | number | - | Penalize tokens that have already appeared, which encourages new topics (-2 to 2) |
| stop | string/array | - | Stop sequences |
| n | integer | - | Number of completions to generate |
| user | string | - | Unique user identifier for abuse monitoring |
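
With `stream` enabled, partial tokens arrive as server-sent events. The sketch below parses such a stream by hand. It assumes the events follow the OpenAI-style wire format (`data: {json}` lines carrying `choices[0].delta.content`, ended by `data: [DONE]`); this page does not document the chunk format, so treat that shape as an assumption. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines.

    Assumes OpenAI-style chunks; the real wire format may differ.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Hand-written sample events for illustration, not captured from the API.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello world
```

When using the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` returns an iterator of parsed chunks, so manual parsing like this is only needed for raw HTTP clients.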
Message Roles
| Role | Description |
| --- | --- |
| system | Sets the assistant’s behavior and persona |
| user | Messages from the human user |
| assistant | Previous assistant responses (for multi-turn) |
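
A message history can be checked against the roles in this table before it is sent. This is an illustrative sketch: `find_message_problems` is a hypothetical helper, the API may accept roles not listed here, and flagging a system message that is not first reflects a common convention rather than a documented rule.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}  # the three roles listed in the table


def find_message_problems(messages: list) -> list:
    """Return readable problems; an empty list means the history looks fine."""
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {index}: unknown role {role!r}")
        if "content" not in message:
            problems.append(f"message {index}: missing content")
        # Assumption: the system message, if any, belongs at the start.
        if role == "system" and index != 0:
            problems.append(f"message {index}: system message is not first")
    return problems


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "bot", "content": "Paris."},
]
print(find_message_problems(history))
```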
Code Examples
from openai import OpenAI

# Point the OpenAI SDK at the Samurai API endpoint.
client = OpenAI(
    api_key="sk-samurai-YOUR_KEY",
    base_url="https://api.samuraiapi.in/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
Example Response
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits)..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 32,
    "completion_tokens": 150,
    "total_tokens": 182
  }
}
Multi-turn Conversations
Pass previous messages to maintain context:
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# First turn
messages.append({"role": "user", "content": "What is the capital of France?"})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_reply})

# Second turn
messages.append({"role": "user", "content": "What is its population?"})
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
Popular Models
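| Model | Provider | Context | Input ($ per 1M tokens) | Output ($ per 1M tokens) |
| --- | --- | --- | --- | --- |
| gpt-4o | OpenAI | 128K | $1.25 | $5.00 |
| gpt-4o-mini | OpenAI | 128K | $0.075 | $0.30 |
| claude-3-5-sonnet-20241022 | Anthropic | 200K | $1.50 | $7.50 |
| gemini-2.0-flash | Google | 1M | $0.05 | $0.20 |
| deepseek-chat | DeepSeek | 64K | $0.007 | $0.014 |
| llama-3.3-70b-instruct | Meta | 131K | $0.05 | $0.16 |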