SQLAlchemy 的 Unicode 问题

Question

我知道我在处理Unicode转换时遇到了问题，但不太清楚具体是哪里出了错。

我正在从一堆HTML文件中提取关于最近一次欧洲旅行的数据。有些地点名称里包含非ASCII字符（比如é、ô、ü）。我通过正则表达式从文件的字符串表示中获取这些数据。

如果我在找到地点时直接打印出来，它们会正常显示这些字符，所以编码应该没问题：

Le Pré-Saint-Gervais, France
Hôtel-de-Ville, France

我使用SQLAlchemy把数据存储在SQLite表中：

Base = declarative_base()
class Point(Base):
    __tablename__ = 'points'

    id = Column(Integer, primary_key=True)
    pdate = Column(Date)
    ptime = Column(Time)
    location = Column(Unicode(32))
    weather = Column(String(16))
    high = Column(Float)
    low = Column(Float)
    lat = Column(String(16))
    lon = Column(String(16))
    image = Column(String(64))
    caption = Column(String(64))

    def __init__(self, filename, pdate, ptime, location, weather, high, low, lat, lon, image, caption):
        self.filename = filename
        self.pdate = pdate
        self.ptime = ptime
        self.location = location
        self.weather = weather
        self.high = high
        self.low = low
        self.lat = lat
        self.lon = lon
        self.image = image
        self.caption = caption

    def __repr__(self):
        return "<Point('%s','%s','%s')>" % (self.filename, self.pdate, self.ptime)

engine = create_engine('sqlite:///:memory:', echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker(bind = engine)
session = Session()

我循环遍历这些文件，把每个文件中的数据插入到数据库里：

for filename in filelist:

    # open the file and extract the information using regex such as:
    location_re = re.compile("<h2>(.*)</h2>",re.M)
    # extract other data

    newpoint = Point(filename, pdate, ptime, location, weather, high, low, lat, lon, image, caption)
    session.add(newpoint)
    session.commit()

每次插入时，我都会看到以下警告：

/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/default.py:230: SAWarning: Unicode type received non-unicode bind param value 'Spitalfields, United Kingdom'
  param.append(processors[key](compiled_params[key]))

而当我尝试对这个表做任何操作，比如：

session.query(Point).all()

我得到的结果是：

Traceback (most recent call last):
  File "./extract_trips.py", line 131, in <module>
    session.query(Point).all()
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/orm/query.py", line 1193, in all
    return list(self)
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/orm/query.py", line 1341, in instances
    fetch = cursor.fetchall()
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/base.py", line 1642, in fetchall
    self.connection._handle_dbapi_exception(e, None, None, self.cursor, self.context)
  File "/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.4p2-py2.5.egg/sqlalchemy/engine/base.py", line 931, in _handle_dbapi_exception
    raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'points_location' with text 'Le Pré-Saint-Gervais, France' None None

我希望能够正确存储地点名称，并保持原始字符不变。任何帮助都将非常感谢。

数据库正则表达式 unicode sqlalchemy 数据提取非ascii字符 sqlite 编码问题

SQLAlchemy 的 Unicode 问题

3 个回答

撰写回答