In today's AI application development, high-quality Automatic Speech Recognition (ASR) is a core competitive advantage for many products. For Chinese-language scenarios in particular, the open-source FunASR project from Alibaba DAMO Academy delivers excellent results.

FunASR is not just a single model, but a comprehensive, fundamental speech recognition toolkit. It integrates a range of powerful capabilities, including speech recognition models (paraformer-zh, SenseVoiceSmall) and Voice Activity Detection (VAD).
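
To ground the discussion, here is a minimal sketch of a typical call. It follows FunASR's AutoModel interface; the companion VAD/punctuation model names and the audio path are illustrative placeholders.

python
# Minimal usage sketch (model names and the audio path are placeholders).
# Both funasr and modelscope need to be installed, e.g. pip install funasr modelscope.
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",   # ASR model
    vad_model="fsmn-vad",    # Voice Activity Detection
    punc_model="ct-punc",    # punctuation restoration
    hub="ms",                # load from ModelScope
)
result = model.generate(input="audio.wav")
print(result)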

Using paraformer-zh or SenseVoiceSmall requires both the funasr and modelscope libraries. The models themselves are powerful, but I ran into a rather tricky and confusing problem in offline environments and in deployments that need to be stable.

Core Issue: Why Doesn't the local_files_only Parameter Work in Offline Deployment?

To achieve true offline usage, we naturally think of using the official local_files_only=True parameter. Its design intention is to tell the program "only use locally cached models and do not access the network."

However, in practice, even with every "offline" parameter set as shown below, the program still tries to connect to the server in a network-free environment and eventually fails.

python
# Ideal invocation method
from funasr import AutoModel

AutoModel(
    model=model_name,
    # ... other model parameters ...
    hub='ms',
    local_files_only=True,  # expect this parameter to take effect
    disable_update=True,
)

Even more frustrating: whether the underlying failure is a network timeout, a download error, or some other I/O issue, funasr ultimately throws the same vague error: paraformer-zh is not registered. That message is no help in pinpointing the real cause, namely the failed network connection attempt.
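
Before relying on any offline switch, it is also worth confirming that the model files really are in the local ModelScope cache. The sketch below assumes the common default cache location (~/.cache/modelscope) and the MODELSCOPE_CACHE environment variable override; check your own installation.

python
# Diagnostic sketch: list models present in the local ModelScope cache.
# The default cache root and the MODELSCOPE_CACHE override are assumptions
# based on typical installations; adjust to your environment if needed.
import os
from pathlib import Path

cache_root = Path(os.environ.get("MODELSCOPE_CACHE",
                                 Path.home() / ".cache" / "modelscope")) / "hub"

if cache_root.exists():
    for cfg in sorted(cache_root.rglob("configuration.json")):
        print("cached model:", cfg.parent)
else:
    print("no ModelScope cache found at", cache_root)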

Deep Dive: A Broken Parameter Passing Chain

Tracing the source code quickly reveals the problem. The issue lies not in modelscope, but in the funasr calling layer: when funasr calls modelscope's download function snapshot_download, it does not pass the critical local_files_only parameter down.

Evidence Here: site-packages/funasr/download/download_model_from_hub.py (line ~232)

python
# funasr's calling code: note that local_files_only is absent from the argument list
model_cache_dir = snapshot_download(
    model, revision=model_revision, user_agent={Invoke.KEY: key, ThirdParty.KEY: "funasr"}
)

The parameter is lost along the way, so modelscope's offline logic is never triggered: it keeps checking for model updates regardless, which ultimately fails in an offline environment.
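
For comparison, a call that forwarded the flag would look roughly like the sketch below. This is only an illustration: whether the flag is reachable in that function's scope (for example via its kwargs) is an assumption, and funasr does not currently do this.

python
# Sketch only: what the call in download_model_from_hub.py would need to look like
# for modelscope's offline logic to trigger. Assumes the flag is available in the
# surrounding scope (e.g. via kwargs); current funasr releases do not forward it.
model_cache_dir = snapshot_download(
    model,
    revision=model_revision,
    user_agent={Invoke.KEY: key, ThirdParty.KEY: "funasr"},
    local_files_only=kwargs.get("local_files_only", False),
)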

Solution: Bypass the Upstream, Target the Downstream

Since patching the upstream parameter-passing chain is cumbersome, we can take a more direct approach: modify modelscope's download logic itself so that it proactively prefers the local cache.

The goal: no matter how the upstream code calls it, if a local model cache exists it must be used, and no further network checks are performed.

Modify File: site-packages/modelscope/hub/snapshot_download.py

Inside the function _snapshot_download, find the line if local_files_only:. Directly above it, insert the following patch code:

python
    # ... opening section of the _snapshot_download function ...

    # ==================== Patch: force use of the local cache ====================
    # A fully cached model normally contains more than one file, so use that as the check.
    if len(cache.cached_files) > 1:
        # Cache hit: print an optional hint, then return the local path immediately,
        # skipping all of the network checks and download steps below.
        print("Found local model cache, using it directly. To re-download, delete the model folder.")
        return cache.get_root_location()
    else:
        # Cache miss: force local_files_only to False so that an upstream
        # local_files_only=True cannot block the initial download.
        local_files_only = False
    # ==============================================================================

    # Original logic
    if local_files_only:
        if len(cache.cached_files) == 0:
            raise ValueError(
                'Cannot find the requested files in the cached path and outgoing'
                ' traffic has been disabled. To enable look-ups and downloads'
                " online, set 'local_files_only' to False.")

This modification solves the problem once and for all: modelscope now prioritizes the local cache, which is exactly what an offline deployment needs.
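
If you would rather not edit files inside site-packages (a pip upgrade will silently overwrite the patch), a similar effect can be approximated at runtime by wrapping snapshot_download before the model is loaded. The sketch below is an alternative of my own, not an official API of either library; it assumes the module paths shown earlier and that snapshot_download accepts a local_files_only keyword, so verify against your installed versions.

python
# Runtime alternative (sketch): prefer the local cache without editing site-packages.
# Assumes the module paths shown above and that snapshot_download accepts a
# local_files_only keyword argument; check your installed versions first.
import functools
import importlib

ms_hub = importlib.import_module("modelscope.hub.snapshot_download")
_original_snapshot_download = ms_hub.snapshot_download

@functools.wraps(_original_snapshot_download)
def _offline_first_snapshot_download(*args, **kwargs):
    kwargs.setdefault("local_files_only", True)   # try the local cache first
    try:
        return _original_snapshot_download(*args, **kwargs)
    except Exception:
        kwargs["local_files_only"] = False        # nothing cached yet: allow a download
        return _original_snapshot_download(*args, **kwargs)

ms_hub.snapshot_download = _offline_first_snapshot_download

# funasr imports the function by name, so also patch the reference it holds
# (see download_model_from_hub.py above).
fa_hub = importlib.import_module("funasr.download.download_model_from_hub")
fa_hub.snapshot_download = _offline_first_snapshot_download

Run this before constructing AutoModel; unlike the file patch, it must be executed in every process that loads a model.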

By the Way: Resolving Conflicts with PySide6 and Other GUI Libraries

When integrating FunASR into a PySide6 graphical interface application, you may run into another problem: the model fails to load because modelscope's lazy-loading mechanism conflicts with PySide6's internal self-inspection behavior.

A simple solution is to modify the site-packages/modelscope/utils/import_utils.py file, adding two lines at the beginning of the __getattr__ method of the LazyImportModule class so that a probe for the __wrapped__ attribute is immediately answered with "this attribute does not exist" (an AttributeError), which avoids triggering the problem.

python
# site-packages/modelscope/utils/import_utils.py
class LazyImportModule(ModuleType):
    def __getattr__(self, name: str) -> Any:
        # ===== Patch: answer the introspection probe without triggering a lazy import =====
        if name == '__wrapped__':
            raise AttributeError
        # ==================================================================================
        # ... original code ...

For a detailed background and analysis of this issue, please refer to my other article: https://pyvideotrans.com/blog/paraformer-zh-is-not-registered. I won't repeat the details here.

Hopefully, these two targeted modifications can help you successfully deploy FunASR to any environment you need.