Unicode error when calling the Google search API

0 votes

2 answers

1209 views

数据工程师

Asked 2025-04-16 10:29

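I need to search Google to get the number of results for a query. I found an answer here: Google search from a Python app.

However, for some queries I get the error below. I think those queries contain Unicode characters.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

I searched for the error and read that I need to convert the Unicode text to ASCII, which led me to the code below.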

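import unicodedata

def convertToAscii(text, action):
    # Python 2. text is a UTF-8 byte string; action is an encode error
    # handler such as 'ignore', 'replace' or 'xmlcharrefreplace'.
    try:
        temp = unicode(text, "utf-8")
        fixed = unicodedata.normalize('NFKD', temp).encode('ASCII', action)
        return fixed
    except Exception, errorInfo:
        print errorInfo
        print "Unable to convert the Unicode characters to xml character entities"
        raise  # re-raise with the original traceback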

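If I pass 'ignore' as the action, it strips those characters, but with any other action I get an exception.

Is there any way to solve this?

Thanks

== Edit ==

I am using the code below to encode the query before running the search, and this is the line that throws the error.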

query = urllib.urlencode({'q': searchfor})

error handling unicode character encoding urllib ascii unicodeerror google search api encoding

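2 Answers

You can't safely convert Unicode to ASCII. Doing so loses information (in particular, non-English letters are dropped).

You should use Unicode throughout the whole process so that no information is lost.

Answered 2025-04-16 by Python大师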
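To make the information loss concrete, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2), and the helper name `to_ascii` is hypothetical; it only mirrors the NFKD approach from the question.

```python
import unicodedata

def to_ascii(text, errors):
    """Hypothetical helper: decompose with NFKD, then force the result into ASCII."""
    return unicodedata.normalize("NFKD", text).encode("ascii", errors)

# 'ü' decomposes into 'u' plus a combining diaeresis; 'ignore' drops the diaeresis.
lossy_city = to_ascii("München", "ignore")      # b'Munchen'

# 'ß' has no decomposition, so 'ignore' removes the letter entirely.
lossy_street = to_ascii("Straße", "ignore")     # b'Strae'

# 'strict' refuses to drop anything and raises instead.
try:
    to_ascii("München", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# UTF-8, by contrast, round-trips without losing anything.
round_trip = "München".encode("utf-8").decode("utf-8")
```

Either the text is silently changed or the conversion fails, which is why keeping the text as Unicode and encoding to UTF-8 only at the boundary is the safer route.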


You can't call urlencode directly on raw Unicode strings. Encode them to UTF-8 first, then urlencode the result:

query = urllib.urlencode({'q': u"München".encode('UTF-8')})

This returns q=M%C3%BCnchen, which Google will happily accept.
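For readers on Python 3, a rough equivalent is sketched below. It assumes the standard `urllib.parse` module, where `urlencode` percent-encodes `str` values as UTF-8 by default; `build_query` is a hypothetical helper name, not part of the original code.

```python
from urllib.parse import urlencode, parse_qs

def build_query(searchfor):
    """Hypothetical helper: return a URL query string for a search term."""
    # A Python 3 str is already Unicode; urlencode percent-encodes it as UTF-8.
    return urlencode({"q": searchfor})

query = build_query("München")
print(query)

# Explicitly encoded UTF-8 bytes produce the same query string.
same_query = urlencode({"q": "München".encode("utf-8")})

# Parsing the query string recovers the original text.
decoded = parse_qs(query)["q"][0]
```

In Python 2 the explicit `.encode('UTF-8')` step is what avoids the implicit ASCII conversion; in Python 3 that step is optional because `urlencode` applies UTF-8 itself.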

Answered 2025-04-16 by Python大师
