I initially dumped a file containing a particular sentence using:
    with open(labelFile, "wb") as out:
        json.dump(result, out, indent=4)
In the JSON, the sentence looks like this:
"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .",
I then loaded it with:
    with open(sys.argv[1]) as sentenceFile:
        sentenceFile = json.loads(sentenceFile.read())
processed it, and then wrote it to a CSV with:
    with open(sys.argv[2], 'wb') as csvfile:
        fieldnames = ['x', 'y', 'z']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for sentence in sentence2locations2values:
            # Python 2: encode to UTF-8 bytes, since the csv module writes byte strings
            sentence = unicode(sentence['parsedSentence']).encode("utf-8")
            writer.writerow({'x': sentence})
Opened in Excel for Mac, the CSV file contains this sentence:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
I then moved the file from Excel for Mac to Google Sheets, where the sentence became:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
Note the very slight difference: Â has replaced Ã.
I then labeled the data and brought it back into Excel for Mac, at which point the sentence reverted to:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
How can I read a CSV containing a sentence like this:
-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating NUMBER_SLOT per year , is a significant contributor to its population growth à cents  $  `` a daily quota of 150 Mainland Chinese with family ties in LOCATION_SLOT are granted a `` one way permit '' .
into a value like this:
"-LSB- 97 -RSB- However , the influx of immigrants from mainland China , approximating 45,000 per year , is a significant contributor to its population growth \u00c3 cents \u00c2 $ \u00c2 `` a daily quota of 150 Mainland Chinese with family ties in Hong Kong are granted a `` one way permit '' .",
so that it matches what was in the original JSON dump at the start of this question?
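Edit
I checked, and the encoding that turns \u00c3 into Ã (the form shown in Google Sheets) is actually Latin-8.
Edit
I ran enca and saw that the originally dumped file is 7-bit ASCII, while my CSV is Unicode. So do I need to load it as Unicode and convert it to 7-bit ASCII?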
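A possible explanation for the enca result: `json.dump` escapes non-ASCII characters by default (`ensure_ascii=True`), so the dumped file holds only ASCII bytes even though the strings it represents are Unicode. The sketch below is written for Python 3 (the snippets in this question are Python 2), and the shortened `sample` string is an assumption used only for illustration.

```python
import json

# Shortened, illustrative sample; not the full sentence from the question.
sample = "population growth \u00c3 cents \u00c2 $"

# With the default ensure_ascii=True, non-ASCII characters are written
# as \uXXXX escape sequences, so the output is pure ASCII.
dumped = json.dumps(sample)
is_pure_ascii = all(ord(ch) < 128 for ch in dumped)

# Loading converts the escapes back into the original Unicode characters.
restored = json.loads(dumped)
```

So the file on disk being 7-bit ASCII and the in-memory value being Unicode are not in conflict; no conversion to ASCII should be needed after loading.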
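I figured out the solution: decode the CSV file from its original encoding (identified as UTF-8), after which the sentences match the originals. The very odd thing was that each time I edited the CSV file in Excel for Mac and saved it, it seemed to be converted to a different encoding. A warning to other users: this is a huge headache.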
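The decoding step could look roughly like the following. This is a minimal sketch in Python 3, not the exact code used here: `read_csv_rows` is a hypothetical helper, the sample sentence is shortened, and the CSV bytes are simulated in memory rather than read from the real file.

```python
import csv
import io

def read_csv_rows(raw_bytes, encoding="utf-8"):
    """Decode raw CSV bytes with an explicit encoding and return the rows as dicts."""
    text = raw_bytes.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))

# Simulate a UTF-8 encoded CSV with the same columns as in the question.
sentence = "population growth \u00c3 cents \u00c2 $"
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["x", "y", "z"])
writer.writeheader()
writer.writerow({"x": sentence})
raw = buffer.getvalue().encode("utf-8")

# Decoding with the encoding the file was actually saved in recovers the sentence.
rows = read_csv_rows(raw)

# Decoding with a different encoding yields a garbled value instead.
garbled = read_csv_rows(raw, encoding="latin-1")[0]["x"]
```

In Python 2 the equivalent would be to call `.decode("utf-8")` on each field after reading it with the `csv` module. Because a spreadsheet application may save the file in a different encoding, it is worth checking the encoding (for example with enca) before choosing what to pass as `encoding`.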