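neo4j, bulbs and utf8
I'm having trouble inserting and looking up data in a neo4j database with Python's bulbs library. The problem is character encoding. When I try to look up a node in an index, I get the following error: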
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128)
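I've searched online for how to change the character encoding in neo4j or bulbs, but I can't find a solution.
Edit
This is the code that causes the error: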
from bulbs.model import Node
from bulbs.neo4jserver import Graph
from bulbs.property import String
import MySQLdb
import sys
class Topic(Node):
    element_type = 'node'
    name = String(nullable=False)
g = Graph()
g.add_proxy('topics', Topic)
con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', charset='utf8')
cur = con.cursor()
cur.execute('SELECT page_title FROM page')
while True:
    row = cur.fetchone()
    if not row:
        break
    sys.stdout.write(row[0] + '\n')
    nds = g.topics.index.lookup(name=row[0])
    if not nds:
        g.topics.create(name=row[0])
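The string that triggers the error is: !Xóõ.
Update
I'm now reading the data from an XML file (a dump of Wikipedia pages) with Python's sax parser. The code is essentially the same, but the error I get is: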
  File "graph.py", line 197, in <module>
    build_wikipedia_graph(WIKI_DUMP_PATH)
  File "graph.py", line 195, in build_wikipedia_graph
    filter_handler.parse(open(wiki_dump_path))
  File "/usr/lib/python2.7/xml/sax/saxutils.py", line 255, in parse
    self._parent.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "/home/pedro/wiki/1.0/page_parser.py", line 55, in method
    getattr(self._downstream, method_name)(*a, **k)
  File "/home/pedro/wiki/1.0/page_parser.py", line 87, in endElement
    self.pageCallBack(self.currentPage, self.callbackArgs)
  File "graph.py", line 181, in _callback
    kgraph.set_links_to(page.title, target)
  File "graph.py", line 59, in set_links_to
    topic_dst = self._g.topics.get_or_create('name', topic_dst, name=topic_dst)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/element.py", line 607, in get_or_create
    vertex = self.index.get_unique(key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/index.py", line 335, in get_unique
    resp = lookup(self.index_name,key,value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/client.py", line 878, in lookup_vertex
    path = build_path(index_path, vertex_path, index_name, key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 126, in build_path
    segments = [quote(str(segment), safe='') for segment in args if segment is not None]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128)
This error occurs when I try to create a node named: atp-toernooi van montréal/toronto
Another update
With the updated bulbs library, I get a different error:
  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 129, in build_path
    segments = [quote(unicode(segment), safe='') for segment in args if segment is not None]
  File "/usr/lib/python2.7/urllib.py", line 1238, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
Can anyone help?
Thanks!
1 Answer
Bulbs stores strings in Neo4j Server as unicode. Note that values of String properties are converted to unicode (in Python 3, unicode strings are the default).
See Python's Unicode HOWTO:
http://docs.python.org/2/howto/unicode.html#python-2-x-s-unicode-support
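To make the error message in the question concrete, here is a small sketch of what goes wrong with that node name. It only uses built-in string methods; the variable names are illustrative. In Python 2, calling str() on a unicode object performs the same implicit ASCII encoding that is done explicitly here.

```python
# -*- coding: utf-8 -*-
# The node name from the question, written with an escape for the e-acute.
title = u'atp-toernooi van montr\xe9al/toronto'

# Encoding to ASCII fails on the non-ASCII character. Python 2 does this
# implicitly when str() is called on a unicode object.
try:
    title.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failing_position = exc.start  # index of the offending character

# Encoding to UTF-8 succeeds and round-trips back to the same text.
utf8_bytes = title.encode('utf-8')
assert utf8_bytes.decode('utf-8') == title
```

The failing index lines up with the "position 22" reported in the traceback, which suggests the node name itself, not the database, is where the encoding breaks.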
First, make sure your MySQL server supports UTF-8:
mysql> show character set like 'utf%';
Also, note my changes and comments...
from bulbs.model import Node
from bulbs.neo4jserver import Graph
from bulbs.property import String
import MySQLdb
import sys
class Topic(Node):
    element_type = 'node' # by convention name this 'topic'
    name = String(nullable=False)
g = Graph()
g.add_proxy('topics', Topic)
# Make sure use_unicode is set to True
con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', use_unicode=True, charset='utf8')
cur = con.cursor()
cur.execute('SELECT page_title FROM page')
while True:
    row = cur.fetchone()
    if not row:
        break
    sys.stdout.write(row[0] + '\n')
    # Use Bulbs' get_or_create method to simplify your code.
    # The first argument is the index key, passed as the string 'name'.
    nds = g.topics.get_or_create('name', row[0], name=row[0])
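Both tracebacks in the question end in the quote() call inside build_path, so it may help to see how a non-ASCII path segment can be percent-encoded safely. The following is a minimal sketch, not Bulbs code: quote_segment is a hypothetical helper, and the assumption is that encoding the text as UTF-8 before quoting is what the server expects. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def quote_segment(segment):
    """Percent-encode one URL path segment (hypothetical helper).

    Text is encoded as UTF-8 first, so quote() only ever sees bytes and
    never has to fall back to an implicit ASCII conversion.
    """
    if isinstance(segment, text_type):
        segment = segment.encode('utf-8')
    elif not isinstance(segment, bytes):
        # Numbers and other values: use their ASCII string form.
        segment = str(segment).encode('utf-8')
    return quote(segment, safe='')


quoted = quote_segment(u'atp-toernooi van montr\xe9al/toronto')
```

With safe='' the slash is escaped as well, so the whole name stays inside a single path segment. If you cannot upgrade Bulbs, passing values that are already UTF-8 encoded byte strings might be worth trying, but that depends on how your Bulbs version handles them.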