Replace all unicode literals in a string with their corresponding symbols

Posted 2024-04-19 20:21:07


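I have a standard Python 3 string that contains several unicode literals (for example "\u00c6").
I need to convert them into the corresponding characters (the Scandinavian letters æ, ø, å).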

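I tried googling it, and I tried the .encode() and .decode() functions with various combinations of latin-1, utf-8 and unicode-escape.
.decode() only works on the bytes type, and this is a str, so that did not work.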

The string comes from this website, scraped with BeautifulSoup4:

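import json
import urllib.request  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

landingPage = "https://www.kmdvalg.dk/Main/Home/KV"

def soupMe(pageLink):
    # Fetch the page and parse it, treating the response bytes as UTF-8
    return BeautifulSoup(urllib.request.urlopen(pageLink), "html.parser", from_encoding='utf-8')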

soup = soupMe(landingPage)

fullList = soup.find(class_="row-masonry")
letters = fullList.find_all("div", class_="masonry")
kommuner = []

for x in letters:
    for y in x.div.find_all("a"):
        kommuner.append({"label": y.string, "link": y.get("href")})

print(json.dumps(kommuner))

The output looks like this:

[{"label": "Albertslund ", "link": "https://www.kmdvalg.dk/kv/2017/K84982165.htm"}, {"label": "Aller\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84982201.htm"}, {"label": "Assens ", "link": "https://www.kmdvalg.dk/kv/2017/K84733420.htm"}, {"label": "Ballerup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982151.htm"}, {"label": "Billund ", "link": "https://www.kmdvalg.dk/kv/2017/K84733530.htm"}, {"label": "Bornholm ", "link": "https://www.kmdvalg.dk/kv/2017/K84982400.htm"}, {"label": "Br\u00f8ndby ", "link": "https://www.kmdvalg.dk/kv/2017/K84982153.htm"}, {"label": "Br\u00f8nderslev ", "link": "https://www.kmdvalg.dk/kv/2017/K84712810.htm"}, {"label": "Drag\u00f8r ", "link": "https://www.kmdvalg.dk/kv/2017/K84982155.htm"}, {"label": "Egedal ", "link": "https://www.kmdvalg.dk/kv/2017/K84982240.htm"}, {"label": "Esbjerg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733561.htm"}, {"label": "Fan\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84733563.htm"}, {"label": "Favrskov ", "link": "https://www.kmdvalg.dk/kv/2017/K84713710.htm"}, {"label": "Faxe ", "link": "https://www.kmdvalg.dk/kv/2017/K84979320.htm"}, {"label": "Fredensborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84982210.htm"}, {"label": "Fredericia ", "link": "https://www.kmdvalg.dk/kv/2017/K84733607.htm"}, {"label": "Frederiksberg ", "link": "https://www.kmdvalg.dk/kv/2017/K84982147.htm"}, {"label": "Frederikshavn ", "link": "https://www.kmdvalg.dk/kv/2017/K84712813.htm"}, {"label": "Frederikssund ", "link": "https://www.kmdvalg.dk/kv/2017/K84982250.htm"}, {"label": "Fures\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84982190.htm"}, {"label": "Faaborg-Midtfyn ", "link": "https://www.kmdvalg.dk/kv/2017/K84733430.htm"}, {"label": "Gentofte ", "link": "https://www.kmdvalg.dk/kv/2017/K84982157.htm"}, {"label": "Gladsaxe ", "link": "https://www.kmdvalg.dk/kv/2017/K84982159.htm"}, {"label": "Glostrup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982161.htm"}, {"label": "Greve ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84979253.htm"}, {"label": "Gribskov ", "link": "https://www.kmdvalg.dk/kv/2017/K84982270.htm"}, {"label": "Guldborgsund", "link": "https://www.kmdvalg.dk/kv/2017/K84979376.htm"}, {"label": "Haderslev ", "link": "https://www.kmdvalg.dk/kv/2017/K84733510.htm"}, {"label": "Halsn\u00e6s ", "link": "https://www.kmdvalg.dk/kv/2017/K84982260.htm"}, {"label": "Hedensted ", "link": "https://www.kmdvalg.dk/kv/2017/K84713766.htm"}, {"label": "Helsing\u00f8r ", "link": "https://www.kmdvalg.dk/kv/2017/K84982217.htm"}, {"label": "Herlev ", "link": "https://www.kmdvalg.dk/kv/2017/K84982163.htm"}, {"label": "Herning ", "link": "https://www.kmdvalg.dk/kv/2017/K84713657.htm"}, {"label": "Hiller\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84982219.htm"}, {"label": "Hj\u00f8rring ", "link": "https://www.kmdvalg.dk/kv/2017/K84712860.htm"}, {"label": "Holb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84979316.htm"}, {"label": "Holstebro ", "link": "https://www.kmdvalg.dk/kv/2017/K84713661.htm"}, {"label": "Horsens ", "link": "https://www.kmdvalg.dk/kv/2017/K84713615.htm"}, {"label": "Hvidovre ", "link": "https://www.kmdvalg.dk/kv/2017/K84982167.htm"}, {"label": "H\u00f8je-Taastrup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982169.htm"}, {"label": "H\u00f8rsholm ", "link": "https://www.kmdvalg.dk/kv/2017/K84982223.htm"}, {"label": "Ikast-Brande ", "link": "https://www.kmdvalg.dk/kv/2017/K84713756.htm"}, {"label": "Ish\u00f8j ", "link": "https://www.kmdvalg.dk/kv/2017/K84982183.htm"}, {"label": "Jammerbugt ", "link": "https://www.kmdvalg.dk/kv/2017/K84712849.htm"}, {"label": "Kalundborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84979326.htm"}, {"label": "Kerteminde ", "link": "https://www.kmdvalg.dk/kv/2017/K84733440.htm"}, {"label": "Kolding ", "link": "https://www.kmdvalg.dk/kv/2017/K84733621.htm"}, {"label": "K\u00f8benhavn ", "link": "https://www.kmdvalg.dk/kv/2017/K84982101.htm"}, {"label": "K\u00f8ge ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84979259.htm"}, {"label": "Langeland ", "link": "https://www.kmdvalg.dk/kv/2017/K84733482.htm"}, {"label": "Lejre ", "link": "https://www.kmdvalg.dk/kv/2017/K84979350.htm"}, {"label": "Lemvig ", "link": "https://www.kmdvalg.dk/kv/2017/K84713665.htm"}, {"label": "Lolland ", "link": "https://www.kmdvalg.dk/kv/2017/K84979360.htm"}, {"label": "Lyngby-Taarb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84982173.htm"}, {"label": "L\u00e6s\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84712825.htm"}, {"label": "Mariagerfjord ", "link": "https://www.kmdvalg.dk/kv/2017/K84712846.htm"}, {"label": "Middelfart ", "link": "https://www.kmdvalg.dk/kv/2017/K84733410.htm"}, {"label": "Mors\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84712773.htm"}, {"label": "Norddjurs ", "link": "https://www.kmdvalg.dk/kv/2017/K84713707.htm"}, {"label": "Nordfyns ", "link": "https://www.kmdvalg.dk/kv/2017/K84733480.htm"}, {"label": "Nyborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733450.htm"}, {"label": "N\u00e6stved ", "link": "https://www.kmdvalg.dk/kv/2017/K84979370.htm"}, {"label": "Odder ", "link": "https://www.kmdvalg.dk/kv/2017/K84713727.htm"}, {"label": "Odense ", "link": "https://www.kmdvalg.dk/kv/2017/K84733461.htm"}, {"label": "Odsherred ", "link": "https://www.kmdvalg.dk/kv/2017/K84979306.htm"}, {"label": "Randers ", "link": "https://www.kmdvalg.dk/kv/2017/K84713730.htm"}, {"label": "Rebild ", "link": "https://www.kmdvalg.dk/kv/2017/K84712840.htm"}, {"label": "Ringk\u00f8bing-Skjern", "link": "https://www.kmdvalg.dk/kv/2017/K84713760.htm"}, {"label": "Ringsted ", "link": "https://www.kmdvalg.dk/kv/2017/K84979329.htm"}, {"label": "Roskilde ", "link": "https://www.kmdvalg.dk/kv/2017/K84979265.htm"}, {"label": "Rudersdal ", "link": "https://www.kmdvalg.dk/kv/2017/K84982230.htm"}, {"label": "R\u00f8dovre ", "link": "https://www.kmdvalg.dk/kv/2017/K84982175.htm"}, {"label": "Sams\u00f8 ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84713741.htm"}, {"label": "Silkeborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713740.htm"}, {"label": "Skanderborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713746.htm"}, {"label": "Skive ", "link": "https://www.kmdvalg.dk/kv/2017/K84713779.htm"}, {"label": "Slagelse ", "link": "https://www.kmdvalg.dk/kv/2017/K84979330.htm"}, {"label": "Solr\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84979269.htm"}, {"label": "Sor\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84979340.htm"}, {"label": "Stevns ", "link": "https://www.kmdvalg.dk/kv/2017/K84979336.htm"}, {"label": "Struer ", "link": "https://www.kmdvalg.dk/kv/2017/K84713671.htm"}, {"label": "Svendborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733479.htm"}, {"label": "Syddjurs ", "link": "https://www.kmdvalg.dk/kv/2017/K84713706.htm"}, {"label": "S\u00f8nderborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733540.htm"}, {"label": "Thisted ", "link": "https://www.kmdvalg.dk/kv/2017/K84712787.htm"}, {"label": "T\u00f8nder ", "link": "https://www.kmdvalg.dk/kv/2017/K84733550.htm"}, {"label": "T\u00e5rnby ", "link": "https://www.kmdvalg.dk/kv/2017/K84982185.htm"}, {"label": "Vallensb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84982187.htm"}, {"label": "Varde ", "link": "https://www.kmdvalg.dk/kv/2017/K84733573.htm"}, {"label": "Vejen ", "link": "https://www.kmdvalg.dk/kv/2017/K84733575.htm"}, {"label": "Vejle ", "link": "https://www.kmdvalg.dk/kv/2017/K84733630.htm"}, {"label": "Vesthimmerlands ", "link": "https://www.kmdvalg.dk/kv/2017/K84712820.htm"}, {"label": "Viborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713791.htm"}, {"label": "Vordingborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84979390.htm"}, {"label": "\u00c6r\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84733492.htm"}, {"label": "Aabenraa", "link": "https://www.kmdvalg.dk/kv/2017/K84733580.htm"}, {"label": "Aalborg", "link": "https://www.kmdvalg.dk/kv/2017/K84712851.htm"}, 
{"label": "Aarhus", "link": "https://www.kmdvalg.dk/kv/2017/K84713751.htm"}]

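The problem is that the labels are converted into unicode escape sequences, so

  • æ becomes \u00e6
  • Tønder becomes T\u00f8nder
  • Ærø becomes \u00c6r\u00f8

How can I take the string and replace every instance of these unicode escapes with the corresponding character?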


1 answer
#1 · Posted 2024-04-19 20:21:07

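The labels in your list already contain the real characters; the escapes are introduced by json.dumps(). This serializer sets ensure_ascii=True by default, so the output string is always pure ASCII and every non-ASCII character is written as a \uXXXX escape. To get the actual characters in the result, just pass ensure_ascii=False to json.dumps().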
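As a minimal sketch of the difference, the snippet below serializes a small hand-written list twice. The two entries are copied from the output in the question and stand in for the scraped data; the variable names `escaped` and `readable` are only illustrative.

```python
import json

# Sample data mirroring the structure of `kommuner` in the question
# (two entries taken from the posted output, not a fresh scrape).
kommuner = [
    {"label": "Tønder ", "link": "https://www.kmdvalg.dk/kv/2017/K84733550.htm"},
    {"label": "Ærø ", "link": "https://www.kmdvalg.dk/kv/2017/K84733492.htm"},
]

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(kommuner)

# ensure_ascii=False: the characters are written out unchanged.
readable = json.dumps(kommuner, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode back to the same Python objects.
assert json.loads(escaped) == json.loads(readable) == kommuner
```

Note that the escaped form is still correct JSON, so any JSON parser will restore æ, ø and å when reading it. If you write the readable form to a file, it is probably safest to open the file with encoding="utf-8" explicitly, since the default file encoding depends on the platform.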
