How can I get SQLAlchemy to correctly insert a Unicode ellipsis into a MySQL table?

import sys
import feedparser
import sqlalchemy
from sqlalchemy import create_engine, MetaData, Table

COMMON_CHANNEL_PROPERTIES = [
    ('Channel title:', 'title', None),
    ('Channel description:', 'description', 100),
    ('Channel URL:', 'link', None),
]

COMMON_ITEM_PROPERTIES = [
    ('Item title:', 'title', None),
    ('Item description:', 'description', 100),
    ('Item URL:', 'link', None),
]

INDENT = u' ' * 4

def feedinfo(url, output=sys.stdout):
    feed_data = feedparser.parse(url)
    channel, items = feed_data.feed, feed_data.entries

    # Adding charset=utf8 here is what fixed the problem
    db = create_engine('mysql://user:pass@localhost/db?charset=utf8')
    metadata = MetaData(db)
    rssItems = Table('rss_items', metadata, autoload=True)
    i = rssItems.insert()

    for label, prop, trunc in COMMON_CHANNEL_PROPERTIES:
        value = channel[prop]
        if trunc:
            value = value[:trunc] + u'...'
        print >> output, label, value
    print >> output
    print >> output, "Feed items:"
    for item in items:
        i.execute({'title': item['title'], 'description': item['description'][:100]})
        for label, prop, trunc in COMMON_ITEM_PROPERTIES:
            value = item[prop]
            if trunc:
                value = value[:trunc] + u'...'
            print >> output, INDENT, label, value
        print >> output, INDENT, u'---'
    return

if __name__ == "__main__":
    url = sys.argv[1]
    feedinfo(url)

Channel title: [H]ardOCP News/Article Feed
Channel description: News/Article Feed for [H]ardOCP...
Channel URL: http://www.hardocp.com

Feed items:
    Item title: Windows 8 UI is Dropping the 'Start' Button
    Item description: After 15 years of occupying a place of honor on the desktop, the "Start" button will disappear from ...
    Item URL: http://www.hardocp.com/news/2012/02/05/windows_8_ui_dropping_lsquostartrsquo_button/
    ---
    Item title: Which Crashes More? Apple Apps or Android Apps
    Item description: A new study of smartphone apps between Android and Apple conducted over a two month period came up w...
    Item URL: http://www.hardocp.com/news/2012/02/05/which_crashes_more63_apple_apps_or_android/
    ---
Traceback (most recent call last):
  File "parse.py", line 47, in <module>
    feedinfo(url)
  File "parse.py", line 36, in feedinfo
    i.execute({'title':item['title'], 'description': item['description'][:100]})
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2758, in execute
    return e._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2304, in _execute_clauseelement
    return connection._execute_clauseelement(elem, multiparams, params)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1538, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1639, in _execute_context
    context)
  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 330, in do_execute
    cursor.execute(statement, parameters)
  File "build/bdist.linux-i686/egg/MySQLdb/cursors.py", line 159, in execute
  File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 264, in literal
  File "build/bdist.linux-i686/egg/MySQLdb/connections.py", line 202, in unicode_literal
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' in position 35: ordinal not in range(256)

2 answers

User

Answer 1 · Edited 2024-06-06 00:15:03

The error message

UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2026' 
in position 35: ordinal not in range(256)

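seems to indicate that some Python code tried to convert the character \u2026 into a Latin-1 (ISO 8859-1) string and failed. That is not surprising: the character is U+2026 HORIZONTAL ELLIPSIS, which has no equivalent in ISO 8859-1.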
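The diagnosis is easy to confirm in an interpreter. This is a small Python 3 sketch (the question's code is Python 2.7, where the literal would be written u'\u2026'); `can_encode` is a hypothetical helper written for this illustration. It shows that the ellipsis's code point lies far above 255, the last code point Latin-1 can represent, while UTF-8 handles it without trouble:

```python
import unicodedata

def can_encode(text, codec):
    """Return True if every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

ellipsis = "\u2026"
print(unicodedata.name(ellipsis))       # HORIZONTAL ELLIPSIS
print(ord(ellipsis))                    # 8230, well above Latin-1's limit of 255
print(can_encode(ellipsis, "latin-1"))  # False: "ordinal not in range(256)"
print(can_encode(ellipsis, "utf-8"))    # True
print(ellipsis.encode("utf-8"))         # b'\xe2\x80\xa6'
```

The "ordinal not in range(256)" in the traceback is exactly this: Latin-1 maps only code points 0 to 255 onto single bytes.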

You fixed the problem by adding the query ?charset=utf8 to the SQLAlchemy connection call:

import sqlalchemy
from sqlalchemy import create_engine, MetaData, Table

db = create_engine('mysql://user:pass@localhost/db?charset=utf8')

The Database Urls section of the SQLAlchemy documentation tells us that a URL beginning with mysql selects the MySQL dialect using the mysql-python driver.

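The next section, Custom DBAPI connect() arguments, tells us that query-string parameters in the URL are passed through to the underlying DBAPI's connect() call.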
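To picture that pass-through, here is a rough standard-library approximation of how such a URL could be split into keyword arguments for MySQLdb's connect(). The helper `url_to_connect_kwargs` is hypothetical and is not SQLAlchemy's code; SQLAlchemy parses URLs itself and its MySQL dialect does its own argument mapping. The sketch only illustrates the idea that charset=utf8 ends up as an ordinary connect() keyword:

```python
from urllib.parse import urlsplit, parse_qsl

def url_to_connect_kwargs(url):
    """Hypothetical helper: approximate how a mysql:// URL maps onto
    MySQLdb.connect() keyword arguments. Not SQLAlchemy's actual code."""
    parts = urlsplit(url)
    kwargs = {
        "host": parts.hostname,
        "user": parts.username,
        "passwd": parts.password,
        "db": parts.path.lstrip("/"),
    }
    # Query-string options are forwarded unchanged as extra keyword arguments.
    kwargs.update(parse_qsl(parts.query))
    return kwargs

print(url_to_connect_kwargs("mysql://user:pass@localhost/db?charset=utf8"))
# {'host': 'localhost', 'user': 'user', 'passwd': 'pass', 'db': 'db', 'charset': 'utf8'}
```

Without the query string, no charset keyword is produced and the driver falls back to whatever connection character set it would otherwise use.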

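So what does the MySQLdb driver do with the parameter {charset: 'utf8'}? The Functions and attributes section of its documentation says of the charset attribute: "…If present, the connection character set will be changed to this character set, if they are not equal."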

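To understand what "connection character set" means, we turn to section 10.1.4, Connection Character Sets and Collations, of the MySQL 5.6 Reference Manual. In short, MySQL can interpret incoming queries as being in one encoding, which may differ from the database's character set and also from the encoding in which query results are returned.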

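Because the error message you reported looks like a Python error rather than an SQL error, I will speculate that something in SQLAlchemy or MySQLdb (mysql-python) was trying to convert the query to the default connection encoding, latin-1, before sending it. That conversion is what raised the error. The ?charset=utf8 query string in the connect() call changed the connection encoding, so U+2026 HORIZONTAL ELLIPSIS could get through.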

Update: You also asked, "If I remove the charset option and instead encode the description with .encode('cp1252'), it runs fine. How can the ellipsis work with cp1252 but not with Unicode?"

The cp1252 encoding has the horizontal ellipsis character at byte value \x85. So a Unicode string containing U+2026 HORIZONTAL ELLIPSIS can be encoded into cp1252 without error.
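A quick Python 3 check of that mapping: cp1252 turns the ellipsis into the single byte 0x85 and decodes it back again, whereas Latin-1 treats the same byte as an invisible C1 control character rather than an ellipsis:

```python
ellipsis = "\u2026"

encoded = ellipsis.encode("cp1252")
print(encoded)                               # b'\x85'
print(encoded.decode("cp1252") == ellipsis)  # True: round-trips cleanly

# The same byte means something else in Latin-1: U+0085, a control character.
print(b"\x85".decode("latin-1") == "\x85")   # True
```

This is why the cp1252 workaround raises no error: the string never reaches the driver as Unicode, only as bytes that happen to be valid cp1252.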

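Also remember that in Python, Unicode strings and byte strings are two different data types. It is reasonable to guess that MySQLdb has a policy of sending only byte strings over the SQL connection. It would therefore encode anything it receives as a Unicode string into a byte string, but leave anything it receives as a byte string alone. (This is speculation; I have not looked at the source code.)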
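The speculated policy can be written down as a toy model in Python 3. Everything here is hypothetical: `to_wire` and `CHARSET_TO_CODEC` are illustrative names, not MySQLdb's API, and the real driver also escapes and quotes values. The sketch only captures the shape of the theory: text gets encoded with the connection's charset, while byte strings pass through untouched.

```python
# Hypothetical model of the speculated MySQLdb behaviour, not its real code.
CHARSET_TO_CODEC = {"latin1": "latin-1", "utf8": "utf-8"}  # assumed mapping

def to_wire(value, connection_charset="latin1"):
    """Return the bytes to send: encode text, pass byte strings through."""
    if isinstance(value, bytes):
        return value  # already bytes: sent as-is, whatever encoding produced them
    return value.encode(CHARSET_TO_CODEC[connection_charset])

description = "from \u2026"
print(to_wire(description, "utf8"))           # b'from \xe2\x80\xa6'
print(to_wire(description.encode("cp1252")))  # b'from \x85' (passed through)

try:
    to_wire(description)  # default latin1 connection charset
except UnicodeEncodeError as exc:
    print("same kind of failure as the traceback:", exc.reason)
```

Under this model, both of your workarounds succeed for different reasons: charset=utf8 changes the codec used for text, and .encode('cp1252') avoids the text path altogether.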

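In the traceback you posted, the last two frames (closest to where the error occurred) show the method names literal and then unicode_literal. This tends to support the theory that MySQLdb encodes values it receives as Unicode strings into byte strings.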

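When you encode the query string yourself, you bypass the part of MySQLdb that performs this encoding. Be aware, however, that if the encoding you apply differs from the one the MySQL connection character set expects, the two will not match and the text may be stored incorrectly.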
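What "stored incorrectly" can look like is easy to simulate without a database. This Python 3 sketch simply encodes with one codec and decodes with another, standing in for a client and server that disagree about the connection character set; it does not model any specific MySQL configuration:

```python
ellipsis = "\u2026"

# Written as UTF-8 but read as cp1252: three bytes become three wrong characters.
utf8_bytes = ellipsis.encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake == "\u00e2\u20ac\u00a6")  # True: the reader sees "â€¦"

# Written as cp1252 but read as UTF-8: 0x85 is not a valid UTF-8 start byte.
cp1252_bytes = ellipsis.encode("cp1252")
print(cp1252_bytes.decode("utf-8", errors="replace") == "\ufffd")  # True
```

Neither direction raises an error at write time, which is what makes such mismatches easy to miss.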

User

Answer 2 · Edited 2024-06-06 00:15:03

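Adding charset=utf8 to the connection string definitely helps, but with Python 2.7 I ran into a case where adding convert_unicode=True to create_engine was also necessary. The SQLAlchemy documentation says this option is only for performance, but in my case it actually fixed the problem of the wrong encoder being used.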
