Understanding Tenacity: Elegant and Simple Automatic Retries with a Single Line of Code
When interacting with AI model API services, we face an unavoidable reality: networks are not always reliable. Proxies can fail, APIs can rate-limit requests, connections can time out, and networks can experience brief outages. Fortunately, these issues are often temporary. If the first request fails, waiting a moment and trying again often leads to success.
This "try again" strategy is known as a retry. It's not a sophisticated technique, but it is a crucial part of building reliable and robust applications.
It All Starts with a Simple API Call
Let's begin with a real-world scenario: calling an AI model's API to translate subtitles. A basic piece of code might look like this:
# A basic API call function
from openai import OpenAI, APIConnectionError
def translate_text(text: str) -> str:
    message = [
        {'role': 'system', 'content': 'You are a top-tier subtitle translation engine.'},
        {'role': 'user', 'content': f'<INPUT>{text}</INPUT>'},
    ]
    model = OpenAI(api_key="YOUR_API_KEY", base_url="...")
    try:
        response = model.chat.completions.create(model="gpt-4o", messages=message)
        if response.choices:
            return response.choices[0].message.content.strip()
        raise RuntimeError("API did not return a valid result")
    except APIConnectionError as e:
        print(f"Network connection failed: {e}. A retry is needed...")
        raise  # Re-raised unhandled, so the program crashes here
    except Exception as e:
        print(f"Another error occurred: {e}")
        raise
This code works, but it's "fragile." As soon as it encounters a network issue, it just prints a message and crashes. We could, of course, manually write a for loop and time.sleep to implement retries:
# Manually implementing retries (inside translate_text; needs `import time`)
for attempt in range(3):
    try:
        # ... API call logic ...
        return response.choices[0].message.content.strip()
    except APIConnectionError as e:
        print(f"Attempt {attempt + 1} failed: {e}")
        if attempt == 2:  # Check if it's the last attempt
            raise
        time.sleep(2)  # Pause before the next attempt
    # ... have to write similar logic for other exceptions ...
This approach quickly makes the code complex and messy. The retry logic is entangled with the business logic, and if we need retries in multiple places, we're forced to write a lot of repetitive, error-prone code.
This is where the tenacity library comes in.
Getting Started with Tenacity: Elegant Retries with a Single Line of Code
tenacity is a general-purpose retrying library designed for Python. Its core philosophy is to add retry capabilities to any failing operation in a simple and clean way.
We can easily refactor the above function with the @retry decorator:
from tenacity import retry
@retry
def translate_text(text: str) -> str:
    # ... the internal logic is exactly the same as before, no changes needed ...
    ...
Just by adding a single line, @retry, the function is transformed. Now, if the translate_text function raises any exception, tenacity will automatically catch it and immediately call the function again. It will keep retrying, forever, until the function successfully returns a value.
Fine-Grained Control: Making Retries Behave as We Wish
"Retrying forever" is usually not what we want. We need to set some boundaries. tenacity provides a rich set of parameters for fine-grained control.
1. Setting Stop Conditions (stop)
We don't want to retry indefinitely. The most common requirement is to "try at most N times," which can be achieved with stop_after_attempt.
from tenacity import retry, stop_after_attempt
# Try a total of 3 times
@retry(stop=stop_after_attempt(3))
def translate_text(text: str) -> str:
    ...  # same body as before
Note: An Important Detail to Understand
stop_after_attempt(N) refers to the total number of attempts, not the "number of retries."
stop_after_attempt(1) means: execute once, and if it fails, stop immediately. It won't retry at all.
stop_after_attempt(3) means: execute a total of 3 times, which is the initial attempt + 2 retries.
Remember this simple rule: If you want to retry Y additional times after a failure, you should set stop_after_attempt(Y + 1).
We can also set a time-based limit, like stop_after_delay(10), which means "stop after 10 seconds." Even better, you can combine them with the | (or) operator, and the retrying will stop as soon as either condition is met.
from tenacity import retry, stop_after_attempt, stop_after_delay
# Stop when the total number of attempts reaches 5 or the total time exceeds 30 seconds
@retry(stop=(stop_after_attempt(5) | stop_after_delay(30)))
def translate_text(text: str) -> str:
    ...  # same body as before
2. Setting a Wait Strategy (wait)
Retrying rapidly and continuously can overwhelm a server or hit rate limits. Adding a wait between retries is a smart move. The simplest is a fixed wait, using wait_fixed:
from tenacity import retry, wait_fixed
# Wait 2 seconds before each retry
@retry(wait=wait_fixed(2))
def translate_text(text: str) -> str:
    ...  # same body as before
When interacting with network services, exponential backoff (wait_exponential) is highly recommended. It progressively increases the wait time with each retry (e.g., 2s, 4s, 8s...), which effectively prevents a "retry storm" during service peak times.
from tenacity import retry, wait_exponential
# The wait grows exponentially with each retry, but is never shorter than 2s (min) or longer than 10s (max).
# The exact starting point of the sequence depends on the tenacity version.
@retry(wait=wait_exponential(multiplier=1, min=2, max=10))
def translate_text(text: str) -> str:
    ...  # same body as before
3. Deciding When to Retry (retry)
By default, tenacity retries on any exception. This isn't always correct.
For instance, APIConnectionError (network issue) or RateLimitError (too many requests) are typical recoverable errors where a retry is likely to succeed. But AuthenticationError (bad key) or PermissionDeniedError (no permission) are fatal errors for which no number of retries can succeed.
We can use retry_if_not_exception_type to tell tenacity not to retry on certain fatal errors.
Note: A Common Syntax Pitfall
When specifying multiple exception types, you might instinctively write AuthenticationError | PermissionDeniedError.
# WRONG! This will not work as expected
@retry(retry=retry_if_not_exception_type(AuthenticationError | PermissionDeniedError))
In modern Python (3.10+), A | B creates a UnionType object, and on older versions the expression raises a TypeError outright. This tenacity function is declared to take an exception type or a tuple of exception types, so the tuple is the form to rely on. The correct way is:
from openai import AuthenticationError, PermissionDeniedError
# CORRECT! Use a tuple
@retry(retry=retry_if_not_exception_type((AuthenticationError, PermissionDeniedError)))
Those small parentheses are crucial.
When Retries Ultimately Fail
If tenacity exhausts all attempts and still fails, what does it do? By default, it raises a RetryError which contains the original exception from the last failed attempt.
But sometimes we don't want the program to crash. Instead, we might want to perform some custom cleanup, like logging a detailed error and returning a user-friendly error message. This is where retry_error_callback comes in.
from tenacity import RetryCallState, retry, stop_after_attempt
def my_error_callback(retry_state: RetryCallState):
    # The retry_state object contains all information about this retry attempt
    print(f"All {retry_state.attempt_number} attempts failed!")
    return "Default translation result or error message"
@retry(stop=stop_after_attempt(3), retry_error_callback=my_error_callback)
def translate_text(text: str) -> str:
    ...  # same body as before
Now, if the function fails 3 times in a row, it won't raise an exception. Instead, it will return the return value of the my_error_callback function.
Note: A Subtle Pitfall in the Callback Function
How can we safely get the last exception information inside the callback?
def return_last_value(retry_state: RetryCallState):
    # DANGEROUS! This will re-raise the exception!
    return "Failed: " + retry_state.outcome.result()
retry_state.outcome represents the result of the last attempt. If that attempt failed, calling the .result() method will re-raise that exception, causing our callback function itself to crash.
The correct approach is to use the .exception() method, which safely returns the exception object without raising it:
def return_last_value(retry_state: RetryCallState):
    # SAFE! This only returns the exception object
    last_exception = retry_state.outcome.exception()
    return f"Failed after {retry_state.attempt_number} attempts. The last error was: {last_exception}"
When Tenacity Meets Object-Oriented Programming
As a codebase grows, we often encapsulate logic within classes. This introduces two deeper problems: scope and inheritance.
1. How Can the Callback Function Access self?
Suppose our callback function needs to access an instance variable of the class (like self.name). We might naturally write it like this:
class TTS:
    def __init__(self, name):
        self.name = name
    def _my_callback(self, retry_state):
        print(f"Task for instance {self.name} has failed.")
        # ...
    # This will fail! NameError: name 'self' is not defined
    @retry(retry_error_callback=self._my_callback)
    def run(self):
        ...  # task logic
This will fail immediately because the @retry decorator is executed when the class is defined, at which point there are no instances of the class, and thus no self.
The most elegant solution is the "inner function closure" pattern. We apply the decorator to a function defined inside the instance method:
class TTS:
    def __init__(self, name):
        self.name = name
    def run(self):
        # Here, self is available!
        @retry(
            # Without a stop condition the callback would never fire
            stop=stop_after_attempt(3),
            # Because the decorator is inside the run method, it can "capture" self
            retry_error_callback=self._my_callback
        )
        def _execute_task():
            # This is where the logic that needs retrying goes
            print(f"Executing task for {self.name}...")
            raise ValueError("Task failed")
        # Call the decorated inner function
        return _execute_task()
    def _my_callback(self, retry_state: RetryCallState):
        ...  # handle the final failure
This is a very powerful and Pythonic pattern that perfectly solves the scope issue.
2. How to Define a Retry Policy in a Parent Class for All Subclasses to Inherit?
This is our final and most design-oriented question. Suppose we have a BaseProvider parent class and multiple subclasses MyProviderA, MyProviderB. We want all subclasses to adhere to a uniform retry policy.
A common incorrect idea is to apply the decorator to an empty method in the parent class. When a subclass overrides that method, the decorator from the parent is lost.
The correct solution is the Template Method Pattern.
- The parent class defines a template method (_exec), which contains the immutable algorithm skeleton (i.e., our retry logic).
- This template method calls an abstract hook method (_do_work).
- Subclasses only need to implement this hook method, filling in the specific business logic.
Let's build this pattern with a more complete example:
from openai import OpenAI, AuthenticationError, PermissionDeniedError
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_not_exception_type, RetryCallState
# 1. Define a generic, reusable exception handler class
class RetryRaise:
    # Define fatal exceptions that should not be retried
    NO_RETRY_EXCEPT = (AuthenticationError, PermissionDeniedError)
    @classmethod
    def _raise(cls, retry_state: RetryCallState):
        ex = retry_state.outcome.exception()
        if ex:
            # Based on different exception types, log and raise a custom, more friendly RuntimeError
            # ... more complex exception classification logic can be added here ...
            raise RuntimeError(f"Ultimately failed after {retry_state.attempt_number} attempts: {ex}") from ex
        raise RuntimeError(f"Failed after {retry_state.attempt_number} attempts, but no exception was captured.")
# 2. Implement the template parent class
class BaseProvider:
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_fixed(2),
        retry=retry_if_not_exception_type(RetryRaise.NO_RETRY_EXCEPT),
        retry_error_callback=RetryRaise._raise
    )
    def _exec(self) -> str:
        """This is the template method, responsible for retries. Subclasses should not override it."""
        # Call the hook method, to be implemented by subclasses
        return self._do_work()
    def _do_work(self) -> str:
        """This is the hook method, subclasses must implement it."""
        raise NotImplementedError("Subclasses must implement the _do_work method")
# 3. Implement a concrete subclass
class DeepSeekProvider(BaseProvider):
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        self.model = OpenAI(api_key=self.api_key, base_url=self.base_url)
    def _do_work(self) -> str:
        """This only cares about the core business logic, with no concern for retries."""
        response = self.model.chat.completions.create(
            model="deepseek-chat",
            messages=[{'role': 'user', 'content': 'Who are you?'}]
        )
        if response.choices:
            return response.choices[0].message.content.strip()
        raise RuntimeError(f"API did not return a valid result: {response}")
# --- How to use ---
provider = DeepSeekProvider(api_key="...", base_url="...")
try:
    # We call _exec, which contains the retry logic
    result = provider._exec()
    print("Execution successful:", result)
except RuntimeError as e:
    # If it ultimately fails, we catch the friendly exception raised by RetryRaise
    print("Execution failed:", e)
except RetryRaise.NO_RETRY_EXCEPT as e:
    # Fatal errors are never retried, so they bypass the callback and propagate unchanged
    print("Fatal error, not retried:", e)
Through this approach, we have perfectly separated the retry policy (the invariant part) from the business logic (the variant part), building a framework that is both robust and easy to extend.
tenacity is a library that seems simple on the surface but is incredibly powerful in practice. It not only handles simple retry scenarios with ease but can also solve complex reliability issues in object-oriented applications through clever design patterns.
For more complete usage instructions, please see the official documentation at http://tenacity.readthedocs.io/