Match a language code with the countries where that language is official or commonly spoken

8 votes

5 answers

9112 views

Data Engineer

Asked 2025-04-15 21:52

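Is there a Python library that, given a language code, returns the list of countries where that language is an official or commonly spoken language?

For example, the language code "fr" maps to 29 countries where French is an official language and 8 more where it is commonly spoken.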

language code, official language, commonly spoken language, country list

5 Answers

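Look at the Babel package. It has a data file for each locale it supports. See the list() function in the localedata module, which returns the list of all locales. You can then write some code that splits those locales into (language, country) pairs and so on.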
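The splitting step described above could look roughly like the sketch below. It assumes the identifiers follow the usual `language[_Script][_TERRITORY][_VARIANT]` shape (for example `fr_CA` or `zh_Hans_CN`); the function name `group_territories_by_language` is made up for illustration and is not part of Babel. Keep in mind that a locale existing for a (language, territory) pair does not by itself say the language is official there.

```python
from collections import defaultdict


def group_territories_by_language(locale_ids):
    """Group locale identifiers into {language: [territory, ...]}.

    Hypothetical helper: assumes identifiers such as 'fr_CA' or 'zh_Hans_CN'.
    """
    territories = defaultdict(set)
    for locale_id in locale_ids:
        parts = locale_id.replace("-", "_").split("_")
        language = parts[0].lower()
        for part in parts[1:]:
            # A territory is a two-letter code or a three-digit UN M.49 area code;
            # four-letter script subtags such as 'Hans' are skipped.
            is_alpha2 = len(part) == 2 and part.isalpha()
            is_un_m49 = len(part) == 3 and part.isdigit()
            if is_alpha2 or is_un_m49:
                territories[language].add(part.upper())
                break
    return {lang: sorted(terrs) for lang, terrs in territories.items()}


sample = ["fr", "fr_FR", "fr_CA", "zh_Hans_CN", "es_419", "en_US_POSIX"]
print(group_territories_by_language(sample))
# {'fr': ['CA', 'FR'], 'zh': ['CN'], 'es': ['419'], 'en': ['US']}
```

With Babel installed, the input would come from the localedata module. As far as I know the function is named `locale_identifiers()` in current releases rather than `list()`, so check the version you have.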

Answered 2025-04-15 by Python Master


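Despite the accepted answer, as far as I can tell the XML files underlying pycountry do not provide any way to map languages to countries. They contain a list of languages with their ISO codes, a list of countries with their ISO codes, and other useful information, but not that mapping.

Similarly, the Babel package is great, but after digging around for a while I could not find any way to list all the languages for a particular country. The best you can do is find the "most likely" language: https://stackoverflow.com/a/22199367/202168

So I had to go and get the data myself...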

import lxml.etree
import urllib.request

def get_territory_languages():
    """Return {territory: {language: {'percent': float, 'official': bool}}} from CLDR."""
    url = "https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/supplementalData.xml"
    # Download and parse the CLDR supplemental data file
    with urllib.request.urlopen(url) as langxml:
        langtree = lxml.etree.XML(langxml.read())

    territory_languages = {}
    for t in langtree.find('territoryInfo').findall('territory'):
        langs = {}
        for l in t.findall('languagePopulation'):
            langs[l.get('type')] = {
                'percent': float(l.get('populationPercent')),
                # True whenever the officialStatus attribute is present
                'official': bool(l.get('officialStatus'))
            }
        territory_languages[t.get('type')] = langs
    return territory_languages

You probably want to store the result in a file rather than fetching it over the web every time you need it.
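One way to do that, as a minimal sketch: keep a JSON copy on disk and only call the fetch function when the file is missing. The name `load_territory_languages` and the `fetch` parameter are hypothetical; `fetch` stands for whatever function returns the territory-to-languages dictionary.

```python
import json
import os


def load_territory_languages(cache_path, fetch):
    """Return the cached mapping from cache_path, calling fetch() only on a cache miss.

    Hypothetical helper: `fetch` is any zero-argument callable returning a
    JSON-serializable dict, e.g. the download function from this answer.
    """
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f)
    data = fetch()
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

Delete the cache file whenever you want to pick up a newer CLDR release.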

The dataset also contains "unofficial" languages for each territory, which you may not want to include, so here is some more example code:

TERRITORY_LANGUAGES = get_territory_languages()

def get_official_locale_ids(country_code):
    country_code = country_code.upper()
    # most widely-spoken first:
    langs = sorted(
        TERRITORY_LANGUAGES[country_code].items(),
        key=lambda l: l[1]['percent'],
        reverse=True,
    )
    return [
        '{lang}_{terr}'.format(lang=lang, terr=country_code)
        for lang, spec in langs if spec['official']
    ]

>>> get_official_locale_ids('es')
['es_ES', 'ca_ES', 'gl_ES', 'eu_ES', 'ast_ES']
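The question asks for the opposite direction (language code to countries), so the same dictionary can be inverted. The sketch below assumes the `{territory: {language: {'percent': ..., 'official': ...}}}` shape built above; `countries_for_language` is a hypothetical name, and the sample uses placeholder territory codes and invented percentages, not real CLDR figures.

```python
def countries_for_language(territory_languages, language_code):
    """Return (official, other): sorted territory codes where the language
    is listed with or without official status.

    Hypothetical helper operating on the territory -> languages mapping.
    """
    official, other = [], []
    for territory, langs in territory_languages.items():
        spec = langs.get(language_code)
        if spec is None:
            continue  # language not listed for this territory
        (official if spec["official"] else other).append(territory)
    return sorted(official), sorted(other)


# Placeholder data for illustration only (XA/XB/XC are not real countries).
sample = {
    "XA": {"fr": {"percent": 100.0, "official": True}},
    "XB": {"en": {"percent": 80.0, "official": True},
           "fr": {"percent": 30.0, "official": True}},
    "XC": {"ar": {"percent": 70.0, "official": True},
           "fr": {"percent": 20.0, "official": False}},
}
print(countries_for_language(sample, "fr"))
# (['XA', 'XB'], ['XC'])
```

If "commonly spoken" should mean more than "listed at all", filtering the second list on a minimum `percent` value would be a natural extension.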

Answered 2025-04-15 by Python Master


-2 votes

You can use a library called pycountry (it is really good). You can download it from this site.

Answered 2025-04-15 by Python Master
