Two Errors and Solutions I Encountered When Installing Python Packages

Recently, while setting up NVIDIA's open-source project Parakeet, I needed to install several Python dependencies. The process wasn't complicated, but I ran into two interesting errors along the way. Neither is hard to fix, but if you're seeing them for the first time, they can hold you up for a while.

Here, I'm sharing the troubleshooting process in the hope that it helps others facing similar problems.

The First Problem: `ModuleNotFoundError: No module named 'docopt'`

After I ran `pip install -r requirements.txt`, the installation halted on a package called docopt. The error message was clear:

ModuleNotFoundError: No module named 'docopt'

The odd part was that I was trying to install docopt, yet it was telling me the docopt module couldn't be found.

A closer look at the full error log showed that the failure came from the package's setup.py installation script, which tried to import docopt before anything had been installed. That is a classic chicken-and-egg problem: I wanted to install the package, but its installation script required it to be importable already.

This is actually a problem with how this specific package was packaged, not an issue with pip itself.
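
To illustrate what "a packaging problem" means here, the sketch below shows one common way a setup script can get a version string without importing the package at all: it reads the source file as text. This is not taken from docopt's actual setup.py; the function name `read_version` and the sample source are hypothetical.

```python
import re


def read_version(source_text):
    """Extract the value of a __version__ assignment from module source text.

    Because the source is only read as text, the module never has to be
    importable while the setup script runs.
    """
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']',
        source_text,
        re.MULTILINE,
    )
    if match is None:
        raise RuntimeError("no __version__ assignment found")
    return match.group(1)


# Hypothetical module source, standing in for a real package file.
example_source = '"""Hypothetical module."""\n__version__ = "1.2.3"\n'
print(read_version(example_source))
```

A setup script written this way has no import-time dependency on the package it is installing, which avoids the chicken-and-egg situation described above.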

The workaround is simple. Since the script needs a docopt module it can import, we just have to provide one manually.

Search the web for docopt.py.
The source for this file is usually easy to find on GitHub or another code hosting platform; prefer the docopt project's own repository over an arbitrary copy.
Download docopt.py and place it in the root directory of your Parakeet project.
Then return to the command line and re-run the same pip install command.

This time, when the installation script required the docopt module, it found the docopt.py file in the current directory. Problem solved, and the installation continued.
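
The mechanism behind this workaround can be reproduced in isolation. The sketch below is only an illustration: `fake_docopt_demo` is a made-up module name (not the real docopt), and a temporary directory stands in for the project root. The import fails until a matching .py file exists in a directory that Python searches.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "fake_docopt_demo"  # hypothetical stand-in, not the real docopt


def try_import(name):
    """Return the module's __version__ if it can be imported, else None."""
    try:
        module = importlib.import_module(name)
    except ModuleNotFoundError:
        return None
    return module.__version__


# No such module exists yet, so the import fails.
before = try_import(MODULE_NAME)

with tempfile.TemporaryDirectory() as tmp:
    # Drop a single .py file into a directory and put that directory on
    # the module search path, much like placing docopt.py in the project root.
    Path(tmp, MODULE_NAME + ".py").write_text(
        '__version__ = "0.0.0"\n', encoding="utf-8"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    after = try_import(MODULE_NAME)
    sys.path.remove(tmp)

print("before:", before)
print("after: ", after)
```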

The Second Problem: `UnicodeDecodeError: 'gbk' codec can't decode...`

After solving the first issue, I continued with the installation. Unexpectedly, I soon hit another roadblock. This time it occurred while installing the indic_numtowords package.

The error message looked like this:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 268: illegal multibyte sequence

This is a very typical encoding error.

The root cause is that on a Windows system set to a Simplified Chinese locale, the default text encoding (the ANSI code page) is GBK. The installation script for the indic_numtowords package was reading a file (likely README.md) without specifying an encoding, and when no encoding is given, Python falls back to the system default, which here was GBK. The file itself, however, was probably saved as UTF-8.

It's like decoding a message with the wrong codebook: some byte sequences that are perfectly valid UTF-8 have no meaning in GBK, so the decoder gives up with an error.
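
The mismatch is easy to reproduce on any system by passing the encoding explicitly. The following is a minimal sketch, not the package's actual code: the file content and the helper `read_with` are made up for the demonstration. It writes a small UTF-8 file and reads it back once as GBK and once as UTF-8.

```python
import tempfile
from pathlib import Path

# Hypothetical file content: plain ASCII plus one non-ASCII character (a euro sign).
SAMPLE = "Price: 5\u20ac"


def read_with(path, encoding):
    """Read a text file with an explicit encoding; return None if decoding fails."""
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc}")
        return None


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_text(SAMPLE, encoding="utf-8")  # saved as UTF-8

    as_gbk = read_with(readme, "gbk")     # wrong codec: fails or yields garbled text
    as_utf8 = read_with(readme, "utf-8")  # matching codec: round-trips cleanly

print("gbk  :", repr(as_gbk))
print("utf-8:", repr(as_utf8))
```

Only the read that names the encoding the file was actually saved in returns the original text, which is why a script that omits `encoding=` behaves differently from machine to machine.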

To fix this, we need to tell Python to use UTF-8 by default when reading and writing files during this installation.

The method is straightforward: before running the install command, set the temporary environment variable PYTHONUTF8, which enables Python's UTF-8 Mode (available in Python 3.7 and later).

Open your command line tool (CMD or PowerShell).
Enter the following command and press Enter. The variable applies only to the current window and is discarded when you close it, so it won't affect anything else on your system.
If you are using CMD:
```
set PYTHONUTF8=1
```
If you are using PowerShell:
```
$env:PYTHONUTF8=1
```
In the same window, re-run `pip install indic_numtowords`.

After executing this, the installation proceeded smoothly. Guided by this environment variable, Python used the correct UTF-8 encoding to read the files, and there were no further garbled characters or errors.
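
If you want to confirm that the variable actually reaches Python, one way is to start a child interpreter with PYTHONUTF8 set and ask it whether UTF-8 Mode is on. This is a small sketch of that check; the helper name `utf8_mode_in_child` is my own, and `sys.flags.utf8_mode` requires Python 3.7 or later.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(value):
    """Start a child Python with PYTHONUTF8=value and return its utf8_mode flag."""
    env = dict(os.environ)
    env["PYTHONUTF8"] = value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


print("PYTHONUTF8=1 ->", utf8_mode_in_child("1"))
print("PYTHONUTF8=0 ->", utf8_mode_in_child("0"))
```

In a real session you would simply set the variable as shown above and run `python -c "import sys; print(sys.flags.utf8_mode)"` in the same window.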

Both of these problems were related not to the pip tool itself, but to how the packages being installed were written. I hope this bit of experience saves you some time troubleshooting.