Understanding Tenacity: Elegant and Simple Automatic Retries with a Single Line of Code

When interacting with AI model API services, we face an unavoidable reality: networks are not always reliable. Proxies can fail, APIs can rate-limit requests, connections can time out, and networks can experience brief outages. Fortunately, these issues are often temporary. If the first request fails, waiting a moment and trying again often leads to success.

This "try again" strategy is known as a retry. It's not a sophisticated technique, but it is a crucial part of building reliable and robust applications.

It All Starts with a Simple API Call

Let's begin with a real-world scenario: calling an AI model's API to translate subtitles. A basic piece of code might look like this:

python
# A basic API call function
from openai import OpenAI, APIConnectionError

def translate_text(text: str) -> str:
    message = [
        {'role': 'system', 'content': 'You are a top-tier subtitle translation engine.'},
        {'role': 'user', 'content': f'<INPUT>{text}</INPUT>'},
    ]
    model = OpenAI(api_key="YOUR_API_KEY", base_url="...") 

    try:
        response = model.chat.completions.create(model="gpt-4o", messages=message)
        if response.choices:
            return response.choices[0].message.content.strip()
        raise RuntimeError("API did not return a valid result")
    except APIConnectionError as e:
        print(f"Network connection failed: {e}. A retry is needed...")
        raise # The program crashes here
    except Exception as e:
        print(f"Another error occurred: {e}")
        raise

This code works, but it's "fragile." As soon as it encounters a network issue, it just prints a message and crashes. We could, of course, manually write a for loop and time.sleep to implement retries:

python
# Manually implementing retries
import time

def translate_text(text: str) -> str:
    for attempt in range(3):
        try:
            # ... API call logic ...
            return response.choices[0].message.content.strip()
        except APIConnectionError as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == 2:  # Check if it's the last attempt
                raise
            time.sleep(2)  # Wait a moment before the next attempt
        # ... have to write similar logic for other exceptions ...

This approach quickly makes the code complex and messy. The retry logic is entangled with the business logic, and if we need retries in multiple places, we're forced to write a lot of repetitive, error-prone code.

This is where the tenacity library comes in.

Getting Started with Tenacity: Elegant Retries with a Single Line of Code

tenacity is a general-purpose retrying library designed for Python. Its core philosophy is to add retry capabilities to any failing operation in a simple and clean way.

We can easily refactor the above function with the @retry decorator:

python
from tenacity import retry

@retry
def translate_text(text: str) -> str:
    # ... the internal logic is exactly the same as before, no changes needed ...

Just by adding a single line, @retry, the function is transformed. Now, if the translate_text function throws any exception, tenacity will automatically catch it and immediately call the function again. It will keep retrying, forever, until the function successfully returns a value.

Fine-Grained Control: Making Retries Behave as We Wish

"Retrying forever" is usually not what we want. We need to set some boundaries. tenacity provides a rich set of parameters for fine-grained control.

1. Setting Stop Conditions (stop)

We don't want to retry indefinitely. The most common requirement is to "try at most N times," which can be achieved with stop_after_attempt.

python
from tenacity import retry, stop_after_attempt

# Try a total of 3 times
@retry(stop=stop_after_attempt(3))
def translate_text(text: str) -> str:
    # ...

Note: An Important Detail to Understand

stop_after_attempt(N) refers to the total number of attempts, not the "number of retries."

  • stop_after_attempt(1) means: execute once, and if it fails, stop immediately. It won't retry at all.
  • stop_after_attempt(3) means: execute a total of 3 times, which is the initial attempt + 2 retries.

Remember this simple rule: If you want to retry Y additional times after a failure, you should set stop_after_attempt(Y + 1).

We can also set a time-based limit, like stop_after_delay(10) which means "stop after 10 seconds." Even better, you can combine them with the | (or) operator, and the retrying will stop as soon as either condition is met.

python
from tenacity import retry, stop_after_attempt, stop_after_delay

# Stop when the total number of attempts reaches 5 or the total time exceeds 30 seconds
@retry(stop=(stop_after_attempt(5) | stop_after_delay(30)))
def translate_text(text: str) -> str:
    # ...

2. Setting a Wait Strategy (wait)

Retrying rapidly and continuously can overwhelm a server or hit rate limits. Adding a wait between retries is a smart move. The simplest is a fixed wait, using wait_fixed:

python
from tenacity import retry, wait_fixed

# Wait 2 seconds before each retry
@retry(wait=wait_fixed(2))
def translate_text(text: str) -> str:
    # ...

When interacting with network services, exponential backoff (wait_exponential) is highly recommended. It progressively increases the wait time with each retry (e.g., 2s, 4s, 8s...), which effectively prevents a "retry storm" during service peak times.

python
from tenacity import retry, wait_exponential

# Waits grow exponentially between retries, clamped to at least 2s (min) and at most 10s (max)
@retry(wait=wait_exponential(multiplier=1, min=2, max=10))
def translate_text(text: str) -> str:
    # ...

3. Deciding When to Retry (retry)

By default, tenacity retries on any exception. This isn't always correct.

For instance, APIConnectionError (network issue) or RateLimitError (too many requests) are typical recoverable errors where a retry is likely to succeed. But AuthenticationError (bad key) or PermissionDeniedError (no permission) are fatal errors: no number of retries will make them succeed.

We can use retry_if_not_exception_type to tell tenacity not to retry on certain fatal errors.

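Note: A Common Syntax Pitfall

When specifying multiple exception types, you might instinctively write AuthenticationError | PermissionDeniedError.

python
# AVOID! Not portable: A | B on two classes raises TypeError on Python < 3.10
@retry(retry=retry_if_not_exception_type(AuthenticationError | PermissionDeniedError))

On Python 3.10+, A | B creates a UnionType object; on older versions the expression raises a TypeError before tenacity is even called. Because isinstance accepts union types on 3.10+, the union form may happen to work on newer interpreters, but the form this tenacity function is documented to accept is a single exception type or a tuple of exception types.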

The correct way is:

python
from openai import AuthenticationError, PermissionDeniedError
from tenacity import retry, retry_if_not_exception_type

# CORRECT! Use a tuple
@retry(retry=retry_if_not_exception_type((AuthenticationError, PermissionDeniedError)))

Those small parentheses are crucial.

When Retries Ultimately Fail

If tenacity exhausts all attempts and still fails, what does it do? By default, it raises a RetryError which contains the original exception from the last failed attempt.

But sometimes we don't want the program to crash. Instead, we might want to perform some custom cleanup, like logging a detailed error and returning a user-friendly error message. This is where retry_error_callback comes in.

python
from tenacity import RetryCallState, retry, stop_after_attempt

def my_error_callback(retry_state: RetryCallState):
    # The retry_state object contains all information about this retry attempt
    print(f"All {retry_state.attempt_number} attempts failed!")
    return "Default translation result or error message"

@retry(stop=stop_after_attempt(3), retry_error_callback=my_error_callback)
def translate_text(text: str) -> str:
    # ...

Now, if the function fails 3 times in a row, it won't raise an exception. Instead, it will return the return value of the my_error_callback function.

Note: A Subtle Pitfall in the Callback Function

How can we safely get the last exception information inside the callback?

python
def return_last_value(retry_state: RetryCallState):
    # DANGEROUS! This will re-raise the exception!
    return "Failed: " + retry_state.outcome.result()

retry_state.outcome represents the result of the last attempt. If that attempt failed, calling the .result() method will re-raise that exception, causing our callback function itself to crash.

The correct approach is to use the .exception() method, which safely returns the exception object without raising it:

python
def return_last_value(retry_state: RetryCallState):
    # SAFE! This only returns the exception object
    last_exception = retry_state.outcome.exception()
    return f"Failed after {retry_state.attempt_number} attempts. The last error was: {last_exception}"

When Tenacity Meets Object-Oriented Programming

As a codebase grows, we often encapsulate logic within classes. This introduces two deeper problems: scope and inheritance.

1. How Can the Callback Function Access self?

Suppose our callback function needs to access an instance variable of the class (like self.name). We might naturally write it like this:

python
class TTS:
    def __init__(self, name):
        self.name = name

    def _my_callback(self, retry_state):
        print(f"Task for instance {self.name} has failed.")
        # ...

    # This will fail! NameError: name 'self' is not defined
    @retry(retry_error_callback=self._my_callback)
    def run(self):
        # ...

This will fail immediately because the @retry decorator is executed when the class is defined, at which point there are no instances of the class, and thus no self.

The most elegant solution is the "inner function closure" pattern. We apply the decorator to a function defined inside the instance method:

python
class TTS:
    def __init__(self, name):
        self.name = name

    def run(self):
        # Here, self is available!
        @retry(
            # Because the decorator is inside the run method, it can "capture" self
            retry_error_callback=self._my_callback,
            # Without a stop condition the callback would never be reached
            stop=stop_after_attempt(3),
        )
        def _execute_task():
            # This is where the logic that needs retrying goes
            print(f"Executing task for {self.name}...")
            raise ValueError("Task failed")

        # Call the decorated inner function
        return _execute_task()

    def _my_callback(self, retry_state: RetryCallState):
        # ...

This is a very powerful and Pythonic pattern that perfectly solves the scope issue.

2. How to Define a Retry Policy in a Parent Class for All Subclasses to Inherit?

This is our final and most design-oriented question. Suppose we have a BaseProvider parent class and multiple subclasses MyProviderA, MyProviderB. We want all subclasses to adhere to a uniform retry policy.

A common incorrect idea is to apply the decorator to an empty method in the parent class. When a subclass overrides that method, the decorator from the parent is lost.

The correct solution is the Template Method Pattern.

  • The parent class defines a template method (_exec), which contains the immutable algorithm skeleton (i.e., our retry logic).
  • This template method calls an abstract hook method (_do_work).
  • Subclasses only need to implement this hook method, filling in the specific business logic.

Let's build this pattern with a more complete example:

python
from openai import OpenAI, AuthenticationError, PermissionDeniedError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_not_exception_type, RetryCallState

# 1. Define a generic, reusable exception handler class
class RetryRaise:
    # Define fatal exceptions that should not be retried
    NO_RETRY_EXCEPT = (AuthenticationError, PermissionDeniedError)

    @classmethod
    def _raise(cls, retry_state: RetryCallState):
        ex = retry_state.outcome.exception()
        if ex:
            # Based on different exception types, log and raise a custom, more friendly RuntimeError
            # ... more complex exception classification logic can be added here ...
            raise RuntimeError(f"Ultimately failed after {retry_state.attempt_number} attempts: {ex}") from ex
        raise RuntimeError(f"Failed after {retry_state.attempt_number} attempts, but no exception was captured.")

# 2. Implement the template parent class
class BaseProvider:
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        retry=retry_if_not_exception_type(RetryRaise.NO_RETRY_EXCEPT),
        retry_error_callback=RetryRaise._raise
    )
    def _exec(self) -> str:
        """This is the template method, responsible for retries. Subclasses should not override it."""
        # Call the hook method, to be implemented by subclasses
        return self._do_work()

    def _do_work(self) -> str:
        """This is the hook method, subclasses must implement it."""
        raise NotImplementedError("Subclasses must implement the _do_work method")

# 3. Implement a concrete subclass
class DeepSeekProvider(BaseProvider):
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        self.model = OpenAI(api_key=self.api_key, base_url=self.base_url)

    def _do_work(self) -> str:
        """This only cares about the core business logic, with no concern for retries."""
        response = self.model.chat.completions.create(
            model="deepseek-chat",
            messages=[{'role': 'user', 'content': 'Who are you?'}]
        )
        if response.choices:
            return response.choices[0].message.content.strip()
        raise RuntimeError(f"API did not return a valid result: {response}")

# --- How to use ---
provider = DeepSeekProvider(api_key="...", base_url="...")
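try:
    # We call _exec, which contains the retry logic
    result = provider._exec()
    print("Execution successful:", result)
except RetryRaise.NO_RETRY_EXCEPT as e:
    # Fatal errors are not retried; tenacity re-raises them unchanged
    print("Fatal error, not retried:", e)
except RuntimeError as e:
    # If all attempts fail, we catch the friendly exception raised by RetryRaise
    print("Execution failed:", e)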

Through this approach, we have perfectly separated the retry policy (the invariant part) from the business logic (the variant part), building a framework that is both robust and easy to extend.


tenacity is a library that seems simple on the surface but is incredibly powerful in practice. It not only handles simple retry scenarios with ease but can also solve complex reliability issues in object-oriented applications through clever design patterns.

For more complete usage instructions, please see the official documentation at http://tenacity.readthedocs.io/