Two Errors I Encountered While Installing Python Packages and Their Solutions

Recently, while setting up Parakeet, an open-source project from NVIDIA, I needed to install quite a few Python dependencies. The process wasn't too complicated, but I ran into two interesting errors along the way. These problems aren't difficult in themselves, but if you're encountering them for the first time, you might get stuck for a while.

Here, I'm sharing my troubleshooting process, hoping it can help others facing similar issues.

First Problem: ModuleNotFoundError: No module named 'docopt'

When I ran pip install -r requirements.txt, the installation process stopped on a package called docopt. The error message was clear:

ModuleNotFoundError: No module named 'docopt'

The strange part is, I was trying to install docopt, but it was telling me the docopt module couldn't be found.

After taking a closer look at the full error log, I found the problem was in the package's setup.py installation script. Before executing the installation, this script actually tries to import docopt. This creates a classic "chicken and egg" problem: I want to install it, but its installation script requires it to be already installed.

This is a problem with how the package itself was packaged, not a fault of pip.
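To make the chicken-and-egg pattern concrete, here is a minimal sketch. It is not the actual setup.py of docopt; the function names and the `__version__` convention are assumptions I made up for illustration. The first function shows the fragile approach (importing the package just to read its version), and the second shows a common alternative that only parses the source text and never imports anything.

```python
import importlib
import re


def version_by_import(module_name):
    """Fragile: fails with ModuleNotFoundError if the module is not importable yet."""
    module = importlib.import_module(module_name)
    return module.__version__


def version_by_parsing(source_text):
    """Safer: read the version string out of the source without importing it."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)
```

An install script that uses something like the second function does not need the package to be importable while it is being installed.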

The solution is quite simple. Since it needs a docopt module file to proceed, we can just provide one manually.

  1. Search for docopt.py in your browser.
  2. In the search results, you can usually find the source code for this file on GitHub or another code hosting platform. Prefer the project's official repository over a random copy.
  3. Download docopt.py and place it in your project's root directory (in my case, the Parakeet root), which is also the directory you run pip from.
  4. Go back to the command line and re-run the same pip install command.

This time, when the installation script needed the docopt module, it found the docopt.py file in the current directory. Problem solved, and the installation continued.
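The reason a loose .py file can satisfy an import is that Python looks through the directories listed in sys.path. The sketch below demonstrates only that mechanism, using a throwaway stub module with a hypothetical name; whether pip's build step actually sees your project directory depends on the pip version and its build settings, so treat this as an illustration rather than a guarantee.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_directory(directory, module_name):
    """Import a module by temporarily putting `directory` at the front of sys.path."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()  # make sure the freshly written file is noticed
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a manually downloaded module file.
    stub = Path(tmp) / "hypothetical_stub_module.py"
    stub.write_text("__version__ = 'stub'\n", encoding="utf-8")
    module = import_from_directory(tmp, "hypothetical_stub_module")
    found_version = module.__version__
```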

Second Problem: UnicodeDecodeError: 'gbk' codec can't decode...

After solving the first problem, I continued the installation. It wasn't long before I hit a second roadblock, this time while installing the indic_numtowords package.

The error message was:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 268: illegal multibyte sequence

This is a very typical encoding error.

The cause is that on a Windows system configured for Simplified Chinese, the default text encoding is GBK (code page 936). When the indic_numtowords package's installation script tried to read a file (such as README.md), it didn't specify an encoding, so Python fell back to the system default, GBK. However, the file itself was most likely saved as UTF-8.

It's like asking someone who only understands Chinese to read an English article. When they see characters they don't recognize, an error naturally occurs.
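The failure is easy to reproduce without pip. The sketch below encodes a short sample string as UTF-8 and then tries to decode the same bytes as GBK. The sample text (a Devanagari letter followed by a space) is my own choice, not the actual content of the package's README.

```python
# Hypothetical sample: Devanagari letter "ma" (U+092E) followed by a space.
sample_text = "\u092e "
utf8_bytes = sample_text.encode("utf-8")


def decode_or_error(data, encoding):
    """Return the decoded text, or a short description of the decoding failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc.reason}"


as_utf8 = decode_or_error(utf8_bytes, "utf-8")  # round-trips cleanly
as_gbk = decode_or_error(utf8_bytes, "gbk")     # fails: the bytes are not valid GBK
```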

To fix this, we need to tell Python to default to UTF-8 whenever a file is opened without an explicit encoding, at least for the duration of this installation.

The method is straightforward: before running the installation command, set the temporary environment variable PYTHONUTF8, which enables Python's UTF-8 Mode (available since Python 3.7).

  1. Open your command-line tool (CMD or PowerShell).

  2. Enter the following command and press Enter. This command is only effective in the current window and will be gone once you close it, so it's perfectly safe.

    If you are using CMD:

        set PYTHONUTF8=1

    If you are using PowerShell:

        $env:PYTHONUTF8=1

  3. In the same window, re-run pip install indic_numtowords.

After that, the installation proceeded smoothly. With this environment variable set, Python read the file as UTF-8 and the decoding error disappeared.
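The environment variable is a workaround on the user's side. On the package side, the usual fix is to pass the encoding explicitly when opening the file. Here is a minimal sketch of that idea; the helper name and the sample README content are hypothetical, and this is not the actual code of indic_numtowords. It also shows how to check from inside Python whether UTF-8 Mode is active.

```python
import sys
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the system's default encoding."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# True when Python was started with PYTHONUTF8=1 (or -X utf8), Python 3.7+.
utf8_mode_enabled = bool(sys.flags.utf8_mode)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical README containing non-ASCII characters.
    readme = Path(tmp) / "README.md"
    readme.write_text("caf\u00e9 \u092e\n", encoding="utf-8")
    content = read_text_utf8(readme)
```

With the explicit encoding argument, the read succeeds whether or not PYTHONUTF8 is set.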

Neither of these issues had much to do with the pip tool itself, but rather with how the packages being installed were written. I hope my experience can save you some troubleshooting time.