In today's AI application development, high-quality automatic speech recognition (ASR) is a core competitive advantage for many products. For Chinese-language scenarios in particular, the FunASR project, open-sourced by Alibaba DAMO Academy, performs very well.
FunASR is not a single model but a comprehensive foundational speech recognition toolkit. It integrates powerful features including speech recognition (paraformer-zh/sensevoicesmall) and voice activity detection (VAD).
When using paraformer-zh and sensevoicesmall, you need to rely on the funasr and modelscope libraries. Although the models themselves are powerful, I encountered a tricky and confusing issue in offline environments or scenarios requiring stable deployment.
Core Issue: Why Does the local_files_only Parameter "Fail" in Offline Deployment?
To achieve true offline usage, it's natural to use the official local_files_only=True parameter. Its design purpose is to tell the program, "Use only locally cached models; do not access the network."
However, in practice, even when setting all conceivable "offline" parameters as shown below, the program still attempts to connect to the server in a network-free environment, ultimately causing failure.
# The ideal way to call
AutoModel(
    model=model_name,
    # ... other model parameters ...
    hub='ms',
    local_files_only=True,  # Hoping this parameter would work
    disable_update=True,
)
What's more frustrating is that regardless of network timeouts, download failures, or other I/O issues, funasr only throws a generic error: paraformer-zh is not registered. This message does not help us identify the real cause: network connection attempts.
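One way to make those hidden connection attempts visible is to block outgoing connections inside the Python process and record where the code tried to connect. The helper below is a minimal sketch using only the standard library; the name record_connections is my own, and it only intercepts socket.socket.connect (not connect_ex or DNS lookups), so treat it as a diagnostic aid rather than a complete network sandbox.

```python
import socket
from contextlib import contextmanager


@contextmanager
def record_connections():
    """Block outgoing connections and collect the addresses that were tried."""
    attempts = []
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        # Remember the target, then fail the way an unreachable network would.
        attempts.append(address)
        raise ConnectionError(f"network disabled for this test: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield attempts
    finally:
        # Always restore the real method, even if the body raised.
        socket.socket.connect = original_connect


# Hypothetical usage (needs funasr installed and the model already cached):
# from funasr import AutoModel
# with record_connections() as attempts:
#     try:
#         AutoModel(model="paraformer-zh", hub="ms", disable_update=True)
#     except Exception as exc:
#         print("load failed:", exc)
# print("connection attempts:", attempts)
```

If the list is non-empty after a failed load, the "not registered" message was most likely masking a network access rather than a genuinely missing model.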
Digging into the Root Cause: A Broken Parameter Chain
By tracing the source code, we quickly found the problem. The issue isn't with modelscope but with the calling layer in funasr. When funasr calls modelscope's download function snapshot_download, it fails to pass down the crucial local_files_only parameter.
Evidence here: site-packages/funasr/download/download_model_from_hub.py (around line 232)
# funasr's calling code; note the parameter list doesn't include local_files_only
model_cache_dir = snapshot_download(
    model, revision=model_revision, user_agent={Invoke.KEY: key, ThirdParty.KEY: "funasr"}
)
With the parameter "lost" midway, modelscope's underlying offline logic cannot be triggered, causing it to continue checking for model updates and ultimately fail in offline environments.
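The effect of a dropped keyword argument is easy to reproduce in isolation. The following toy sketch does not use funasr or modelscope: snapshot_download here is a stand-in I wrote for illustration, and the two load_* wrappers are hypothetical names showing a caller that swallows the flag and one that forwards it.

```python
def snapshot_download(model, revision=None, local_files_only=False):
    """Toy stand-in for a hub download function (not modelscope's real code)."""
    if local_files_only:
        return f"cache/{model}"  # offline path: no network involved
    raise ConnectionError("tried to reach the hub")


def load_dropping_flag(model, **kwargs):
    # Mirrors the bug: local_files_only is accepted but never forwarded.
    return snapshot_download(model, revision=kwargs.get("revision"))


def load_forwarding_flag(model, **kwargs):
    # A fix at the calling layer: pass the flag through explicitly.
    return snapshot_download(
        model,
        revision=kwargs.get("revision"),
        local_files_only=kwargs.get("local_files_only", False),
    )
```

Calling load_dropping_flag("paraformer-zh", local_files_only=True) still raises ConnectionError, because the callee only ever sees its default of False; load_forwarding_flag with the same arguments returns the cached path.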
Solution: Bypass the Upstream, Target the Downstream
Since modifying the upstream parameter chain is cumbersome, we can take a more direct route: patch modelscope's download logic so that it proactively uses the local cache.
Our goal: Regardless of how the upstream calls, as long as there is a local model cache, force its use and skip any network checks.
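Stripped of library details, this goal is a small decision rule. The sketch below expresses it with a hypothetical FakeCache class and a caller-supplied download function; only the attribute and method names cached_files and get_root_location are borrowed from the modelscope code discussed in this article, everything else is assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCache:
    """Hypothetical stand-in for the cache object used by the downloader."""
    root: str
    cached_files: list = field(default_factory=list)

    def get_root_location(self):
        return self.root


def resolve_model_dir(cache, download):
    """Return the cached directory if it looks populated, else download."""
    # More than one cached file is treated as "model already present".
    if len(cache.cached_files) > 1:
        return cache.get_root_location()
    # Nothing usable locally: fall back to the (network) download.
    return download()
```

The design choice worth noting is that the check happens before any network call, so a populated cache never depends on the hub being reachable.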
File to modify: site-packages/modelscope/hub/snapshot_download.py
Inside the _snapshot_download function, locate the if local_files_only: condition. Right above it, insert the following patch code:
    # ... beginning of function _snapshot_download ...
    # ==================== Force Use Local Cache Patch ====================
    # First, check if the model files already exist in the local cache (typically more than 1 file)
    if len(cache.cached_files) > 1:
        # If yes, print a message (optional) and directly return the local path, halting all subsequent operations.
        print("Found local model cache, using it directly. To re-download, delete the model folder.")
        return cache.get_root_location()
    else:
        # If no local cache exists, to prevent errors from upstream incorrectly passing local_files_only=True,
        # forcibly set it to False here to ensure the download process can continue.
        local_files_only = False
    # ===============================================================
    # Original condition logic
    if local_files_only:
        if len(cache.cached_files) == 0:
            raise ValueError(
                'Cannot find the requested files in the cached path and outgoing'
                ' traffic has been disabled. To enable look-ups and downloads'
                " online, set 'local_files_only' to False.")
This modification solves the problem for the installed copy of modelscope: it now prioritizes the local cache, which is exactly what offline deployment needs. Because the change lives in site-packages, it has to be reapplied after modelscope is upgraded or reinstalled.
Bonus: Resolving Conflicts with PySide6 and Other GUI Libraries
When integrating FunASR into PySide6 GUI applications, you might encounter another issue: the model fails to load due to a conflict between modelscope's lazy loading mechanism and PySide6's internal introspection behavior.
A simple solution is to modify the site-packages/modelscope/utils/import_utils.py file. Add two lines of code at the beginning of the __getattr__ method in the LazyImportModule class, so that it raises AttributeError as soon as it is asked for __wrapped__, before the original lazy-loading code runs.
# site-packages/modelscope/utils/import_utils.py
class LazyImportModule(ModuleType):
    def __getattr__(self, name: str) -> Any:
        # ===== Patch =====
        if name == '__wrapped__':
            raise AttributeError
        # =================
        # ... original code ...
For a detailed background and analysis of this issue, you can refer to my other article, which won't be repeated here: https://pyvideotrans.com/blog/paraformer-zh-is-not-registered
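To see why a two-line guard can matter, here is a toy lazy module that is not modelscope's implementation: ToyLazyModule and its guard_wrapped switch are names invented for this sketch. It assumes the introspecting code probes for __wrapped__ with hasattr or getattr, which is how tools such as inspect.unwrap detect wrapped callables.

```python
from types import ModuleType


class ToyLazyModule(ModuleType):
    """Toy lazy module: any unknown attribute triggers a simulated load."""

    guard_wrapped = False  # class-level defaults keep __getattr__ recursion-free
    load_count = 0

    def __init__(self, name, guard_wrapped=False):
        super().__init__(name)
        self.guard_wrapped = guard_wrapped

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup has failed.
        if self.guard_wrapped and name == "__wrapped__":
            raise AttributeError(name)
        self.load_count += 1  # stands in for importing the real submodule
        return f"loaded:{name}"
```

Without the guard, hasattr(module, "__wrapped__") reports True and forces a load, so an introspection tool believes the module wraps something else. With guard_wrapped=True the probe gets a plain AttributeError and no load is triggered, while ordinary attribute access still resolves lazily.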
I hope these two targeted modifications help you successfully deploy FunASR into any required environment.
