
Endpoint

POST https://www.samuraiapi.in/v1/chat/completions

Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model ID (e.g. gpt-4o, claude-3-5-sonnet-20241022) |
| messages | array | Yes | — | Conversation history with role + content |
| temperature | number | No | 1 | Creativity: 0 = deterministic, 2 = very creative |
| max_tokens | integer | No | model default | Max tokens to generate |
| stream | boolean | No | false | Stream partial tokens via SSE |
| top_p | number | No | 1 | Nucleus sampling threshold |
| frequency_penalty | number | No | 0 | Reduce repetition. Range: -2.0 to 2.0 |
| presence_penalty | number | No | 0 | Encourage new topics. Range: -2.0 to 2.0 |
| stop | string/array | No | — | Up to 4 stop sequences |
| n | integer | No | 1 | Number of completions to return |
| user | string | No | — | Your end-user ID for abuse monitoring |
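Because the endpoint is OpenAI-compatible, any HTTP client works. A minimal sketch using only Python's standard library (the sk-samurai-YOUR_KEY placeholder mirrors the examples below; only model and messages are required, the other fields override the defaults above):

```python
import json
import urllib.request

API_KEY = "sk-samurai-YOUR_KEY"  # placeholder -- substitute your real key
URL = "https://www.samuraiapi.in/v1/chat/completions"

payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "temperature": 0.7,
    "max_tokens": 100,
}

def build_request(payload):
    """Build the POST request with bearer-token auth and a JSON body."""
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (network call):
# with urllib.request.urlopen(build_request(payload)) as resp:
#     data = json.load(resp)
#     print(data["choices"][0]["message"]["content"])
```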

Code Examples

from openai import OpenAI

client = OpenAI(
    api_key="sk-samurai-YOUR_KEY",
    base_url="https://www.samuraiapi.in/v1"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum entanglement in simple terms."}
    ],
    temperature=0.7,
    max_tokens=300
)

print(response.choices[0].message.content)
print(f"Used {response.usage.total_tokens} tokens")
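With stream=True the same call returns chunks instead of a finished message; each chunk carries a text delta at chunk.choices[0].delta.content. A sketch with the joining logic factored out (join_deltas is a helper invented here, not part of the SDK):

```python
def join_deltas(chunks):
    """Accumulate text deltas from an OpenAI-style stream of chunks.

    Each chunk is shaped like chunk.choices[0].delta.content;
    the content field is None on the final chunk, so skip falsy deltas.
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# Typical usage (network call, requires the configured client above):
# stream = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Write a haiku."}],
#     stream=True,
# )
# print(join_deltas(stream))
```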

Response Format

{
  "id": "chatcmpl-abc123xyz",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is like having two magic coins..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 145,
    "total_tokens": 173
  }
}
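finish_reason tells you why generation stopped: "stop" means the model ended naturally, while "length" means it hit max_tokens and the reply may be cut off mid-thought. A small guard written against the plain JSON shape shown above (check_truncation is a hypothetical helper, not an SDK function):

```python
def check_truncation(response):
    """Return (text, truncated) for an OpenAI-style chat completion dict."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"] == "length"

# The example response above, reduced to the fields the helper reads:
resp = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "Quantum entanglement is like having two magic coins..."},
            "finish_reason": "stop",
        }
    ]
}
text, truncated = check_truncation(resp)
# If truncated is True, retry with a larger max_tokens.
```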

Multi-turn Conversations

Maintain context by including the full conversation history:
messages = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1
messages.append({"role": "user", "content": "What is the capital of Japan?"})
r = client.chat.completions.create(model="gpt-4o", messages=messages)
reply = r.choices[0].message.content
messages.append({"role": "assistant", "content": reply})

# Turn 2 — model remembers the context
messages.append({"role": "user", "content": "What is its population?"})
r = client.chat.completions.create(model="gpt-4o", messages=messages)
print(r.choices[0].message.content)
# => "Tokyo has a population of approximately 13.9 million in the city proper..."
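The append-and-resend pattern above is easy to wrap so callers never touch the list directly. A minimal sketch (Chat is a name invented here, not part of the OpenAI SDK):

```python
class Chat:
    """Minimal multi-turn wrapper around an OpenAI-compatible client."""

    def __init__(self, client, model="gpt-4o",
                 system="You are a helpful assistant."):
        self.client = client
        self.model = model
        self.messages = [{"role": "system", "content": system}]

    def ask(self, text):
        """Send one user turn; record both it and the reply in history."""
        self.messages.append({"role": "user", "content": text})
        r = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        reply = r.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# chat = Chat(client)
# chat.ask("What is the capital of Japan?")
# chat.ask("What is its population?")  # prior turns are resent automatically
```

Note that the full history is resent on every call, so prompt-token usage grows with each turn; long sessions may need the oldest turns trimmed.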

Try It Live

Interactive Playground

Test the chat API directly in your browser with your API key.
Available Models

| Model | Best For | Input $/1M | Output $/1M |
|---|---|---|---|
| gpt-4o | General purpose, vision | $1.25 | $5.00 |
| gpt-4o-mini | Fast, cheap, great quality | $0.075 | $0.30 |
| gpt-4.1 | Long context (1M tokens) | $1.00 | $4.00 |
| claude-3-5-sonnet-20241022 | Coding, reasoning | $1.50 | $7.50 |
| claude-3-5-haiku-20241022 | Fast Anthropic model | $0.40 | $2.00 |
| gemini-2.5-flash-preview-05-20 | Fastest Google model | $0.075 | $0.30 |
| deepseek-chat | Ultra cheap, smart | $0.007 | $0.014 |
| llama-3.3-70b-instruct | Best open-source | $0.05 | $0.16 |
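Per-request cost follows directly from the usage block and the per-million-token prices above. A quick sketch, with a few prices hard-coded from the table (update the dict if the table changes):

```python
# ($/1M input tokens, $/1M output tokens), copied from the pricing table
PRICES = {
    "gpt-4o": (1.25, 5.00),
    "gpt-4o-mini": (0.075, 0.30),
    "deepseek-chat": (0.007, 0.014),
}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Dollar cost of one request, given the usage counts the API returns."""
    inp, out = PRICES[model]
    return (prompt_tokens * inp + completion_tokens * out) / 1_000_000

# Usage numbers from the example response above (28 prompt, 145 completion):
cost = estimate_cost("gpt-4o", 28, 145)
```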

Message Roles

| Role | Description |
|---|---|
| system | Sets the assistant's behavior and persona |
| user | Messages from the human user |
| assistant | Previous assistant responses (for multi-turn) |

Providers and Context Windows

| Model | Provider | Context | Input $/1M | Output $/1M |
|---|---|---|---|---|
| gpt-4o | OpenAI | 128K | $1.25 | $5.00 |
| gpt-4o-mini | OpenAI | 128K | $0.075 | $0.30 |
| claude-3-5-sonnet-20241022 | Anthropic | 200K | $1.50 | $7.50 |
| gemini-2.0-flash | Google | 1M | $0.05 | $0.20 |
| deepseek-chat | DeepSeek | 64K | $0.007 | $0.014 |
| llama-3.3-70b-instruct | Meta | 131K | $0.05 | $0.16 |