As a developer, have you ever been overwhelmed by these numerous codes?
- When users register, should the country list use `CN` or `CHN`?
- When doing multilingual translation (i18n), should the folder be named `zh` or `zh-CN`?
- When processing video subtitles, the specification requires an unfamiliar three-letter code, sometimes `zho` and sometimes `chi`. What are the differences?
- Not to mention time zone identifiers like `Asia/Shanghai`, which seem to have no rules.
After reading this, you will completely understand the logic behind these codes and be confident in using them correctly in your projects.
Core Idea: Divide and Conquer
These standards seem confusing because we try to understand them through a single vague concept of "region." Computers, however, demand precision. International standards organizations therefore "divide and conquer" that vague concept into several specific, orthogonal (independent) dimensions and establish a gold standard for each one.
Our journey of exploration begins with understanding these dimensions.
1. Geographical Location: Where Am I? - ISO 3166-1
This is the foundation of all codes, answering the simplest question: "What country/region is this?"
- Standard Name: ISO 3166-1
- Core Task: Provide unique identifiers for countries and regions worldwide.
- Main Forms:
  - alpha-2 (two-letter code): e.g., `US`, `CN`, `JP`. This is the most common and universal form.
  - alpha-3 (three-letter code): e.g., `USA`, `CHN`, `JPN`. More readable, often used in data statistics and official documents.
Developer Practical Guide:
- Database Design: When storing countries in the user table, create a `country_code` field and use the `CHAR(2)` type to store the two-letter (alpha-2) code. For example:
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(255),
country_code CHAR(2)
);
- API Design: Region-related APIs (such as e-commerce delivery range) should use the two-letter code as a parameter, for example:
GET /api/v1/shipping?country=CN HTTP/1.1
- Frontend Development: In the country selection drop-down box, the `value` attribute of each `<option>` should be the two-letter code. For example:
<select name="country">
<option value="CN">中国</option>
<option value="US">美国</option>
<option value="JP">日本</option>
</select>
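The storage and API guidance above comes down to normalizing whatever a client sends into an alpha-2 code before it reaches the database. Here is a minimal Python sketch of that step. The helper name `normalize_country_code` and the three-entry lookup table are assumptions made for illustration; a real project would load the complete ISO 3166-1 list from a maintained dataset.

```python
# Hypothetical, deliberately incomplete lookup table: alpha-3 -> alpha-2.
ALPHA3_TO_ALPHA2 = {"CHN": "CN", "USA": "US", "JPN": "JP"}
KNOWN_ALPHA2 = set(ALPHA3_TO_ALPHA2.values())


def normalize_country_code(raw: str) -> str:
    """Return the alpha-2 code for an alpha-2 or alpha-3 input.

    Raises ValueError for anything not present in the (illustrative) table.
    """
    code = raw.strip().upper()
    if code in KNOWN_ALPHA2:
        return code
    if code in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[code]
    raise ValueError(f"unknown country code: {raw!r}")
```

Rejecting unknown input at the boundary, rather than storing it as-is, keeps the `country_code` column limited to values the rest of the system can rely on.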
Learn More
- Wikipedia: ISO 3166-1
- Official Standard Query: ISO Online Browsing Platform
2. Language: What Language Do I Speak? - ISO 639
This standard only cares about one thing: Which language are we using?
- Standard Name: ISO 639
- Core Task: Encode the languages of the world.
- Main Forms:
  - ISO 639-1 (two-letter code): e.g., `en`, `zh`, `ja`. It covers about 184 major languages and is conventionally written in lowercase.
  - ISO 639-2 (three-letter code, divided into T and B categories): e.g., `eng`, `zho`, `jpn`. It covers nearly 500 languages and language groups, solving the problem of insufficient coverage of two-letter codes.
  - ISO 639-3 (three-letter code): e.g., `eng`, `zho`, `jpn`. ISO 639-3 extends the individual-language codes of ISO 639-2 and aims to cover every known individual language (more than 7,000 entries).
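The relationship between the two-letter and three-letter forms can be sketched as a simple lookup. The table below is a three-entry illustrative subset and the function names are hypothetical; the complete code lists are published by the Library of Congress.

```python
from typing import Optional

# Illustrative subset only: ISO 639-1 -> ISO 639-2/T (identical to ISO 639-3 here).
ISO639_1_TO_3 = {"en": "eng", "zh": "zho", "ja": "jpn"}
ISO639_3_TO_1 = {three: two for two, three in ISO639_1_TO_3.items()}


def to_three_letter(code: str) -> str:
    """Widen a two-letter code to three letters; pass three-letter codes through."""
    code = code.strip().lower()
    if len(code) == 3:
        return code
    return ISO639_1_TO_3[code]  # KeyError for codes missing from this subset


def to_two_letter(code: str) -> Optional[str]:
    """Narrow a three-letter code to two letters, or None when no such code is known."""
    code = code.strip().lower()
    if len(code) == 2:
        return code
    return ISO639_3_TO_1.get(code)
```

Narrowing returns `None` instead of raising because many languages with a three-letter code have no two-letter equivalent at all.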
Learn More
- Wikipedia: ISO 639
- Official Code List (ISO 639-1 & 639-2): Library of Congress
3. Precise Localization: Where Am I and What Language Do I Speak? - Locale
Now, we combine the previous two to answer a more precise question: "What specific language is used by users in a specific region?" This is the concept of Locale.
- Standard Name: There is no single standard; in practice most systems follow the IETF BCP 47 specification, which combines `ISO 639` and `ISO 3166-1`.
- Core Task: Accurately describe the variants of languages in specific regions to handle differences in spelling, wording, date formats, currency symbols, etc.
- Format: `language code-COUNTRY code` (language-COUNTRY)
  - `en-US`: English used in the United States.
  - `en-GB`: English used in the United Kingdom.
  - `zh-CN`: Chinese used in mainland China (by convention, Simplified Chinese).
  - `zh-TW`: Chinese used in Taiwan, China (by convention, Traditional Chinese).
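To make the format concrete, here is a small Python sketch that splits a tag into its parts and normalizes the casing. It is not a full BCP 47 parser: it assumes the simplified shape `language[-Script][-REGION]`, where the optional four-letter script subtag (such as `Hans` or `Hant`) is something BCP 47 allows but the list above does not show. The names `LocaleTag` and `parse_locale` are hypothetical.

```python
from typing import NamedTuple, Optional


class LocaleTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_locale(tag: str) -> LocaleTag:
    """Parse language[-Script][-REGION]; '_' is accepted as a separator too."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()   # scripts are title-cased, e.g. Hant
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()   # regions are upper-cased, e.g. CN
    return LocaleTag(language, script, region)
```

For example, `parse_locale("zh_hant_tw")` yields `LocaleTag(language="zh", script="Hant", region="TW")`, which is handy when clients send tags with inconsistent casing or underscores.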
Developer Practical Guide:
- Software Internationalization (i18n): Your resource files (such as translation strings) should be placed in folders named after the Locale. Note that Android resource qualifiers prefix the region with `r`, so `zh-CN` becomes `values-zh-rCN/strings.xml`. For example:
res/
  values/
    strings.xml
  values-zh-rCN/
    strings.xml
- HTTP Request Header: Parse the `Accept-Language` request header to return the most suitable language version for the user. For example:
Accept-Language: zh-CN,zh;q=0.9
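- Date/Currency Formatting: Libraries in most modern programming languages accept a Locale as a parameter. For example, in Java:
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
Locale locale = Locale.forLanguageTag("zh-CN"); // build from the BCP 47 tag; the Locale constructors are deprecated since Java 19
DateFormat dateFormat = DateFormat.getDateInstance(DateFormat.DEFAULT, locale);
String dateStr = dateFormat.format(new Date());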
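Parsing the `Accept-Language` header mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation of HTTP content negotiation: it ignores the `*` wildcard, only falls back from a regional tag to its bare language, and the function names and the `supported` list are assumptions.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (tag, q) pairs sorted by descending weight; q defaults to 1.0."""
    ranges = []
    for item in header.split(","):
        piece = item.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # malformed weight: treat the entry as not acceptable
        ranges.append((tag.strip(), q))
    # sorted() is stable, so entries with equal weight keep their header order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def pick_locale(header: str, supported: List[str], default: str) -> str:
    """Pick the best supported locale: exact match first, then the bare language."""
    by_lower = {name.lower(): name for name in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        tag = tag.lower()
        if tag in by_lower:
            return by_lower[tag]
        base = tag.split("-")[0]
        if base in by_lower:
            return by_lower[base]
    return default
```

With the header from the example, `pick_locale("zh-CN,zh;q=0.9", ["en-US", "zh-CN"], "en-US")` selects `zh-CN`; a request the server cannot satisfy falls back to the default.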
Learn More
- Wikipedia: IETF language tag
- Official Standard Definition (BCP 47): IETF Tools - BCP 47
4. Professional Fields and Special Situations: Subtitles, Multimedia and T/B Codes - ISO 639-2
Why don't video subtitles directly use `zh` or `en`? Because professional fields require broader language coverage, and this is the root cause of the "one language, multiple codes" problem.
- Standard Name: ISO 639-2 (three-letter code)
- Key Knowledge Point: T/B Codes (Terminology/Bibliographic Codes). About 20 languages have two three-letter codes in `ISO 639-2`, for historical reasons:
  - B Code (Bibliographic): Derived from the English name and mainly used for library cataloging; it is a legacy form. For example, `German` -> `ger`.
  - T Code (Terminology): Derived from the local name of the language; it is the code recommended for modern computer applications. For example, `Deutsch` -> `deu`.

The most common example is Chinese: `chi` is the B code (from Chinese), and `zho` is the T code (from 中文, Zhōngwén).
Language | English Name | Local Name | B Code (Old/Catalog) | T Code (New/Terminology) | Recommended Use |
---|---|---|---|---|---|
Chinese | Chinese | 中文 | chi | zho | zho |
German | German | Deutsch | ger | deu | deu |
French | French | Français | fre | fra | fra |
Tibetan | Tibetan | བོད་ཡིག | tib | bod | bod |
Developer Practical Guide:
- Golden Rule: Prioritize T codes! They are designed for technical applications. When dealing with old systems or external data, however, your code must stay compatible and recognize both T codes and B codes.
- Media Processing: When using FFmpeg, you should use the T code. For example:
ffmpeg -i input.mp4 -metadata:s:s:0 language=zho output.mp4
- Data Cleaning: When receiving data from external sources, you can use mapping functions to unify the code. For example, in Python:
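# Map legacy ISO 639-2/B codes to their ISO 639-2/T equivalents.
language_map = {
    "chi": "zho",
    "ger": "deu",
    "fre": "fra",
    "tib": "bod",
}

def normalize_language_code(code):
    # Return the T code for a known B code; pass any other code through unchanged.
    return language_map.get(code, code)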
5. The Ultimate Challenge: Time and Time Zones - IANA Time Zone Database
Why can't the country code `US` be used to represent US time? Because the contiguous United States alone spans 4 time zones (more once Alaska, Hawaii, and the territories are counted) and involves complex daylight saving time rules.
- Standard Name: IANA Time Zone Database (also known as tz database or Olson database)
- Core Task: Accurately define the boundaries of all time zones in the world, the offset from UTC, and all historical daylight saving time change rules.
- Format: `Continent/Representative City` (Area/Location)
  - `Asia/Shanghai`
  - `America/New_York`
  - `Europe/London`
Developer Practical Guide:
- Golden Rule: Never calculate time zones or daylight saving time yourself!
- Backend Development: On the server, all times should be stored in UTC. When converting to local time, use the IANA identifier. For example, in Java:
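import java.time.*;
Instant instant = Instant.now();                                   // the current moment, always in UTC
String timestamp = instant.toString();                             // ISO 8601 UTC string, suitable for storage
ZonedDateTime local = instant.atZone(ZoneId.of("Asia/Shanghai")); // convert with the IANA identifier only for display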
- Frontend Development: The browser API can get the user's time zone. For example, in JavaScript:
const timeZone = Intl.DateTimeFormat().resolvedOptions().timeZone;
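The "store UTC, convert with the IANA identifier" rule can be illustrated in Python with the standard-library `zoneinfo` module (Python 3.9+). This is a sketch under assumptions: the helper name `to_local` is hypothetical, and `zoneinfo` needs time zone data from the operating system or from the `tzdata` package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_local(utc_time: datetime, tz_name: str) -> datetime:
    """Convert a timezone-aware datetime to the given IANA zone."""
    if utc_time.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return utc_time.astimezone(ZoneInfo(tz_name))


# Two stored UTC instants, one in winter and one in summer.
winter = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=timezone.utc)

print(to_local(winter, "America/New_York").isoformat())  # 2024-01-15T07:00:00-05:00
print(to_local(summer, "America/New_York").isoformat())  # 2024-07-15T08:00:00-04:00
print(to_local(summer, "Asia/Shanghai").isoformat())     # 2024-07-15T20:00:00+08:00
```

The New York offset changes from -05:00 to -04:00 between the two dates without any daylight saving logic in the calling code; the rules come entirely from the time zone database.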
Learn More
- Wikipedia: tz database
- Official Data Source: IANA Time Zones
Quick Reference Cheat Sheet
Task Scenario | What Do I Need? | Standard to Use | Example Code | Developer Key Points |
---|---|---|---|---|
Select Country | Uniquely identify a country | ISO 3166-1 alpha-2 | CN , US | Database CHAR(2) storage, API parameters |
Webpage or Simple Translation | Identify a major language | ISO 639-1 | zh , en | HTML lang attribute, i18n base |
Precise Localization | Distinguish regional variations of language | IETF BCP 47 | zh-CN , en-US | i18n folder naming, HTTP request header, formatting |
Subtitle/Audio Track Marking | Cover as many languages as possible | ISO 639-2 | zho (recommended) | Prioritize T codes, compatible with B codes |
Handle Local Time | Precisely calculate time and daylight saving time | IANA Time Zone DB | Asia/Shanghai | Server stores UTC, client uses IANA identifier for conversion |
Now, the mist has cleared. These codes are not the product of confusion, but a well-designed, clearly divided system. Master them, and you will be able to:
- Establish a Clear Mental Model: Understand the applicable scenarios of each code and the historical reasons behind special cases such as `zho`/`chi`.
- Write More Robust Code: Elegantly handle global user needs while being compatible with old data.
- Collaborate Efficiently: Communicate with the team using precise terminology.