Models in LangChain

Large Language Models (LLMs) are powerful AI systems that can understand and generate human-like text. They are flexible enough to handle tasks such as writing content, translating languages, summarizing information, and answering questions—without requiring separate training for each specific task.

In addition to basic text generation, modern LLMs also support several advanced capabilities:

Tool calling: The model can interact with external tools, such as databases or APIs, and use the retrieved results in its responses.
Structured output: The responses can be generated in a predefined format, which is useful when consistency or machine-readable output is required.
Multimodality: The model can process and generate data beyond text, including images, audio, and video.
Reasoning: The model can perform multi-step thinking to analyze problems and arrive at a logical conclusion.

Models act as the core reasoning engine of agents. They guide how an agent makes decisions—deciding which tools to use, how to interpret the results obtained, and when to return a final response.

The choice of model plays a crucial role in determining the overall reliability and performance of an agent. Different models are optimized for different purposes: some handle complex instructions more effectively, others are stronger in structured reasoning, and some support larger context windows, allowing them to process and retain more information at once.

LangChain’s standard model interfaces give you access to many different provider integrations, which makes it easy to experiment with and switch between models to find the best fit for your use case.

Basic usage

Models can be used in two ways:

With agents - Models can be dynamically specified when creating an agent.
Standalone - Models can be called directly (outside of the agent loop) for tasks like text generation, classification, or extraction without the need for an agent framework.

The same model interface works in both contexts, which gives you the flexibility to start simple and scale up to more complex agent-based workflows as needed.

LangChain provides support for all major model providers through separate integration packages. Each of these packages follows a common interface, which means you can switch between providers without changing your application logic.

Another important aspect is that new model names can be used immediately, without waiting for a LangChain update. This is possible because the provider packages simply forward the model name directly to the provider’s API, rather than relying on predefined mappings within LangChain itself.

Initialize a model

The easiest way to get started with a standalone model in LangChain is to use init_chat_model to initialize one from a chat model provider of your choice. The following example uses LangChain with OpenAI:

pip install -U "langchain[openai]"

init_chat_model
import os
from langchain.chat_models import init_chat_model

os.environ["OPENAI_API_KEY"] = "sk-..."

model = init_chat_model("gpt-5.4")

This is a generic factory method provided by LangChain.
It acts as a wrapper/abstraction layer, so you don’t explicitly tie your code to one provider.
Based on configuration (model name, environment, defaults), it decides which underlying provider to use.
Useful when:
- You want flexibility to switch providers later (e.g., OpenAI → Anthropic) with minimal code changes.
- You are building framework-level or reusable code.
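
To make the provider-resolution idea concrete, here is a minimal sketch of how a combined 'provider:model' string such as 'openai:o1' could be split into its two parts. The helper split_model_id is hypothetical and written only for illustration; it is not LangChain's code, and the real factory may apply additional rules when no provider prefix is given.

```python
from typing import Optional, Tuple


def split_model_id(model_id: str) -> Tuple[Optional[str], str]:
    """Split 'provider:model' into (provider, model).

    Hypothetical helper: returns (None, model_id) when no provider prefix
    is present. Only the first colon is treated as the separator.
    """
    provider, sep, name = model_id.partition(":")
    if not sep:
        return None, model_id
    return provider, name


print(split_model_id("openai:o1"))  # ('openai', 'o1')
print(split_model_id("gpt-5.4"))    # (None, 'gpt-5.4')
```

When the provider is absent, a factory has to fall back on something else (an explicit argument or a default), which is why being explicit about the provider tends to be the safer choice in reusable code.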

Model Class
import os
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "sk-..."

model = ChatOpenAI(model="gpt-5.4")

This is a provider-specific implementation (OpenAI only).
You are explicitly saying: “Use OpenAI’s chat model.”
Gives you direct control over OpenAI-specific parameters and behavior.
Useful when:
- You are certain you will use OpenAI.
- You want access to provider-specific features (for example, the use_responses_api option on ChatOpenAI).

Whichever way you create the model, you call it the same way:

response = model.invoke("Why do parrots talk?")

Key methods

Invoke: The model takes messages as input and outputs messages after generating a complete response.
Stream: Invoke the model, but stream the output as it is generated in real-time.
Batch: Send multiple requests to a model in a batch for more efficient processing.

Parameters

A chat model takes parameters that can be used to configure its behavior. The full set of supported parameters varies by model and provider, but standard ones include:

model (string, required): The name or identifier of the specific model you want to use with a provider. You can also specify both the provider and the model in a single argument using the 'provider:model' format, for example, 'openai:o1'.
api_key (string): The key required for authenticating with the model's provider. This is usually issued when you sign up for access to the model, and is often supplied through an environment variable.
temperature (number): Controls the randomness of the model's output. Higher values make responses more creative; lower values make them more deterministic.
max_tokens (number): Limits the total number of tokens in the response, effectively controlling how long the output can be.
timeout (number): The maximum time (in seconds) to wait for a response from the model before canceling the request.
max_retries (number, default 6): The maximum number of times a failed request is resent. Retries use exponential backoff with jitter. Network errors, rate limits (429), and server errors (5xx) are retried automatically; client errors such as 401 (unauthorized) or 404 are not. For long-running agent tasks on unreliable networks, consider increasing this to 10–15.
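
The retry behavior described for max_retries can be sketched in plain Python. This is an illustration of exponential backoff with jitter under stated assumptions, not the code any provider package actually runs: HTTPError, is_retryable, call_with_retries, base_delay, and max_delay are all hypothetical names chosen for this example.

```python
import random
import time


class HTTPError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(exc: Exception) -> bool:
    # Network problems, rate limits (429), and server errors (5xx) are retried.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return True
    if isinstance(exc, HTTPError):
        return exc.status == 429 or 500 <= exc.status < 600
    return False  # e.g. 401 or 404: retrying would not help


def call_with_retries(fn, max_retries=6, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), resending up to max_retries times on retryable errors."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
            attempt += 1


# Usage: a call that is rate limited twice and then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise HTTPError(429)
    return "ok"


print(call_with_retries(flaky_call, sleep=lambda _seconds: None))  # ok
```

The jitter spreads retries from many clients over time so they do not all hit the provider at the same instant after a shared outage.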

Using init_chat_model, pass these parameters as inline **kwargs:

model = init_chat_model(
    "claude-sonnet-4-6",
    # Kwargs passed to the model:
    temperature=0.7,
    timeout=30,
    max_tokens=1000,
    max_retries=6,  # Default; increase for unreliable networks
)

Each chat model integration may have additional params used to control provider-specific functionality. For example, ChatOpenAI has use_responses_api to dictate whether to use the OpenAI Responses or Completions API.

Invocation

A chat model must be invoked to generate an output. There are three primary invocation methods, each suited to different use cases.

Invoke

The most straightforward way to call a model is to use invoke() with a single message or a list of messages.

response = model.invoke("What is LangChain?")
print(response)

A list of messages can be provided to a chat model to represent conversation history. Each message has a role that tells the model who sent it.

Dictionary format
conversation = [
    {"role": "system", "content": "You are a helpful assistant that translates English to French."},
    {"role": "user", "content": "Translate: I love programming."},
    {"role": "assistant", "content": "J'adore la programmation."},
    {"role": "user", "content": "Translate: I love building applications."}
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

Message objects
from langchain.messages import HumanMessage, AIMessage, SystemMessage

conversation = [
    SystemMessage("You are a helpful assistant that translates English to French."),
    HumanMessage("Translate: I love programming."),
    AIMessage("J'adore la programmation."),
    HumanMessage("Translate: I love building applications.")
]

response = model.invoke(conversation)
print(response)  # AIMessage("J'adore créer des applications.")

Stream

Most models can stream their output content while it is being generated. By displaying output progressively, streaming significantly improves user experience, particularly for longer responses. Calling stream() returns an iterator that yields output chunks as they are produced. You can use a loop to process each chunk in real-time:

Basic text streaming
for chunk in model.stream("Why do parrots have colorful feathers?"):
    print(chunk.text, end="|", flush=True)

Stream tool calls, reasoning, and other content
for chunk in model.stream("What color is the sky?"):
    for block in chunk.content_blocks:
        if block["type"] == "reasoning" and (reasoning := block.get("reasoning")):
            print(f"Reasoning: {reasoning}")
        elif block["type"] == "tool_call_chunk":
            print(f"Tool call chunk: {block}")
        elif block["type"] == "text":
            print(block["text"])
        else:
            ...

As opposed to invoke(), which returns a single AIMessage after the model has finished generating its full response, stream() returns multiple AIMessageChunk objects, each containing a portion of the output text. Importantly, each chunk in a stream is designed to be gathered into a full message via summation:

Construct an AIMessage
full = None  # None | AIMessageChunk
for chunk in model.stream("What color is the sky?"):
    full = chunk if full is None else full + chunk
    print(full.text)

# The
# The sky
# The sky is
# The sky is typically
# The sky is typically blue
# ...

print(full.content_blocks)
# [{"type": "text", "text": "The sky is typically blue..."}]

The resulting message can be treated the same as a message that was generated with invoke()—for example, it can be aggregated into a message history and passed back to the model as conversational context.
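
To show why summation is enough to rebuild a full message, here is a self-contained sketch that does not call any model. TextChunk is a hypothetical stand-in for AIMessageChunk that holds only text, and fake_stream is a made-up generator that plays the role of model.stream().

```python
from dataclasses import dataclass


@dataclass
class TextChunk:
    """Hypothetical stand-in for AIMessageChunk; carries text only."""

    text: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        # Adding two chunks concatenates their text into a new chunk.
        return TextChunk(self.text + other.text)


def fake_stream():
    """Made-up stream that yields a response piece by piece."""
    for piece in ["The", " sky", " is", " typically", " blue"]:
        yield TextChunk(piece)


full = None
for chunk in fake_stream():
    full = chunk if full is None else full + chunk

print(full.text)  # The sky is typically blue
```

The real chunk type merges more than text (tool call fragments, metadata, and so on), but the accumulation loop has the same shape.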

Streaming only works if every step in the program can process a stream of chunks. An application that must hold the entire output in memory before processing it, for instance, is not streaming-capable.

Batch

Batching a collection of independent requests to a model can significantly improve performance and reduce costs, as the processing can be done in parallel:

responses = model.batch([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
])
for response in responses:
    print(response)

batch() returns only once every request in the batch has finished, and it returns all of the outputs together. To receive each output as soon as its input finishes generating, use batch_as_completed():

Yield batch responses upon completion
for response in model.batch_as_completed([
    "Why do parrots have colorful feathers?",
    "How do airplanes fly?",
    "What is quantum computing?"
]):
    print(response)

When using batch_as_completed(), results may arrive out of order. Each result includes the index of its input, so you can restore the original order if needed.
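
Restoring the original order from indexed results can be sketched as follows. The example assumes each completed item arrives as an (index, output) pair; restore_order is a hypothetical helper and the completed list is hand-written sample data rather than real model output.

```python
def restore_order(indexed_results, n_inputs):
    """Place each output at the position of the input that produced it."""
    ordered = [None] * n_inputs
    for index, output in indexed_results:
        ordered[index] = output
    return ordered


# Sample data: results arriving in completion order, not input order.
completed = [(2, "answer C"), (0, "answer A"), (1, "answer B")]

print(restore_order(completed, 3))  # ['answer A', 'answer B', 'answer C']
```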

When processing a large number of inputs using batch() or batch_as_completed(), you may want to control the maximum number of parallel calls. This can be done by setting the max_concurrency key in the RunnableConfig dictionary.

Batch with max concurrency
model.batch(
    list_of_inputs,
    config={
        'max_concurrency': 5,  # Limit to 5 parallel calls
    }
)
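
The effect of a concurrency cap can be illustrated with the standard library alone. This sketch is not how LangChain implements max_concurrency; it uses a thread pool as a rough analogue, and run_batch and fake_model_call are hypothetical names standing in for the batch call and a slow model request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(fn, inputs, max_concurrency=5):
    """Run fn over inputs with at most max_concurrency calls in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in input order.
        return list(pool.map(fn, inputs))


_lock = threading.Lock()
_active = 0
peak = 0  # highest number of simultaneous calls observed


def fake_model_call(prompt):
    """Stand-in for a slow model request; records peak concurrency."""
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)  # simulate network latency
    with _lock:
        _active -= 1
    return prompt.upper()


results = run_batch(fake_model_call, [f"q{i}" for i in range(20)], max_concurrency=5)
print(len(results), "results; peak concurrency was at most 5:", peak <= 5)
```

A lower cap trades throughput for a smaller chance of hitting provider rate limits.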

Structured output

Models can be requested to provide their response in a format matching a given schema. This is useful for ensuring the output can be easily parsed and used in subsequent processing. LangChain supports multiple schema types and methods for enforcing structured output.

Here is an example with Pydantic. Pydantic models provide the richest feature set with field validation, descriptions, and nested structures.

from pydantic import BaseModel, Field

class Movie(BaseModel):
    """A movie with details."""
    title: str = Field(description="The title of the movie")
    year: int = Field(description="The year the movie was released")
    director: str = Field(description="The director of the movie")
    rating: float = Field(description="The movie's rating out of 10")

model_with_structure = model.with_structured_output(Movie)
response = model_with_structure.invoke("Provide details about the movie Inception")
print(response)  # Movie(title="Inception", year=2010, director="Christopher Nolan", rating=8.8)

Python's TypedDict and JSON Schema provide simpler alternatives to Pydantic models, ideal when you don't need runtime validation.
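
For comparison, the same Movie schema might look like this as a TypedDict and as a JSON Schema dictionary. This is a sketch: the names MovieDict and movie_json_schema are chosen for this example, and using Annotated strings to carry field descriptions is an assumption about how descriptions are conveyed, so check your integration's documentation before relying on it.

```python
from typing import Annotated, TypedDict, get_type_hints


class MovieDict(TypedDict):
    """A movie with details."""

    title: Annotated[str, "The title of the movie"]
    year: Annotated[int, "The year the movie was released"]
    director: Annotated[str, "The director of the movie"]
    rating: Annotated[float, "The movie's rating out of 10"]


# The equivalent schema written out by hand as JSON Schema.
movie_json_schema = {
    "title": "Movie",
    "description": "A movie with details.",
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "The title of the movie"},
        "year": {"type": "integer", "description": "The year the movie was released"},
        "director": {"type": "string", "description": "The director of the movie"},
        "rating": {"type": "number", "description": "The movie's rating out of 10"},
    },
    "required": ["title", "year", "director", "rating"],
}

# Either schema could be passed in place of the Pydantic class, for example
# model.with_structured_output(MovieDict). That call needs a configured model,
# so it is left as a comment here.
print(sorted(get_type_hints(MovieDict)))  # ['director', 'rating', 'title', 'year']
```

With these schema types the result is a plain dictionary rather than a validated object, which is the trade-off for the lighter definition.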

Multimodal

Certain models can process and return non-textual data such as images, audio, and video. You can pass non-textual data to a model by providing content blocks.
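
As a rough sketch of what passing content blocks can look like, the helper below builds a user message that combines a text block with a base64-encoded image block. The block shape mirrors the image blocks this section shows in model output; whether a given provider accepts exactly this shape as input is an assumption, and image_message is a hypothetical helper.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str) -> dict:
    """Build a user message with one text block and one image block."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "base64": base64.b64encode(image_bytes).decode("ascii"),
                "mime_type": mime_type,
            },
        ],
    }


# Placeholder bytes (the PNG file signature), not a real image.
placeholder = b"\x89PNG\r\n\x1a\n"
message = image_message("Describe this image.", placeholder, "image/png")

# response = model.invoke([message])  # requires a multimodal-capable model
print([block["type"] for block in message["content"]])  # ['text', 'image']
```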

Some models can return multimodal data as part of their response. If invoked to do so, the resulting AIMessage will have content blocks with multimodal types.

response = model.invoke("Create a picture of a cat")
print(response.content_blocks)
# [
#     {"type": "text", "text": "Here's a picture of a cat"},
#     {"type": "image", "base64": "...", "mime_type": "image/jpeg"},
# ]

Reasoning

Many models are capable of performing multi-step reasoning to arrive at a conclusion. This involves breaking down complex problems into smaller, more manageable steps.

If supported by the underlying model, you can surface this reasoning process to better understand how the model arrived at its final answer.

Stream reasoning output
for chunk in model.stream("Why do parrots have colorful feathers?"):
    reasoning_steps = [r for r in chunk.content_blocks if r["type"] == "reasoning"]
    print(reasoning_steps if reasoning_steps else chunk.text)

Complete reasoning output
response = model.invoke("Why do parrots have colorful feathers?")
reasoning_steps = [b for b in response.content_blocks if b["type"] == "reasoning"]
print(" ".join(step["reasoning"] for step in reasoning_steps))
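
When you want the reasoning and the final answer kept apart, the filtering above can be wrapped in a small helper. The sketch below runs on hand-written sample blocks rather than real model output; split_blocks is a hypothetical function, and the block shapes follow the ones used in this section.

```python
def split_blocks(content_blocks):
    """Return (reasoning_text, answer_text) from a list of content blocks."""
    reasoning = [
        block["reasoning"]
        for block in content_blocks
        if block["type"] == "reasoning" and block.get("reasoning")
    ]
    answer = [block["text"] for block in content_blocks if block["type"] == "text"]
    return " ".join(reasoning), "".join(answer)


# Sample blocks written by hand for illustration.
blocks = [
    {"type": "reasoning", "reasoning": "Bright colors can signal health."},
    {"type": "reasoning", "reasoning": "They may also help with camouflage."},
    {"type": "text", "text": "Mostly for signaling and camouflage."},
]

reasoning, answer = split_blocks(blocks)
print(reasoning)  # Bright colors can signal health. They may also help with camouflage.
print(answer)     # Mostly for signaling and camouflage.
```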
