How do I fix "UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>"?


<p>I am trying to get a Python 3 program, run from the Spyder IDE/GUI, to process a text file full of information. However, when it tries to read the file, I get the following error:</p>
<pre><code>  File "<ipython-input-13-d81e1333b8cd>", line 77, in <module>
    parser(f)
  File "<ipython-input-13-d81e1333b8cd>", line 18, in parser
    data = infile.read()
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29815: character maps to <undefined>
</code></pre>
<p>The program code is as follows:</p>
<pre><code>import os
os.getcwd()
import glob
import re
import sqlite3
import csv

def parser(file):

    # Open a TXT file. Store all articles in a list. Each article is an item
    # of the list. Split articles based on the location of such string as
    # 'Document PRN0000020080617e46h00461'

    articles = []
    with open(file, 'r') as infile:
        data = infile.read()
    start = re.search(r'\n HD\n', data).start()
    for m in re.finditer(r'Document [a-zA-Z0-9]{25}\n', data):
        end = m.end()
        a = data[start:end].strip()
        a = '\n ' + a
        articles.append(a)
        start = end

    # In each article, find all used Intelligence Indexing field codes. Extract
    # content of each used field code, and write to a CSV file.

    # All field codes (order matters)
    fields = ['HD', 'CR', 'WC', 'PD', 'ET', 'SN', 'SC', 'ED', 'PG', 'LA', 'CY', 'LP',
              'TD', 'CT', 'RF', 'CO', 'IN', 'NS', 'RE', 'IPC', 'IPD', 'PUB', 'AN']

    for a in articles:
        used = [f for f in fields if re.search(r'\n ' + f + r'\n', a)]
        unused = [[i, f] for i, f in enumerate(fields) if not re.search(r'\n ' + f + r'\n', a)]
        fields_pos = []
        for f in used:
            f_m = re.search(r'\n ' + f + r'\n', a)
            f_pos = [f, f_m.start(), f_m.end()]
            fields_pos.append(f_pos)
        obs = []
        n = len(used)
        for i in range(0, n):
            used_f = fields_pos[i][0]
            start = fields_pos[i][2]
            if i < n - 1:
                end = fields_pos[i + 1][1]
            else:
                end = len(a)
            content = a[start:end].strip()
            obs.append(content)
        for f in unused:
            obs.insert(f[0], '')
        obs.insert(0, file.split('/')[-1].split('.')[0])  # insert Company ID, e.g., GVKEY
        # print(obs)
        cur.execute('''INSERT INTO articles
                       (id, hd, cr, wc, pd, et, sn, sc, ed, pg, la, cy, lp, td, ct, rf,
                        co, ina, ns, re, ipc, ipd, pub, an)
                       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
                               ?, ?, ?, ?, ?, ?, ?, ?)''', obs)

# Write to SQLITE
conn = sqlite3.connect('factiva.db')
with conn:
    cur = conn.cursor()
    cur.execute('DROP TABLE IF EXISTS articles')
    # Mirror all field codes except changing 'IN' to 'INA' because it is an invalid name
    cur.execute('''CREATE TABLE articles
                   (nid integer primary key, id text, hd text, cr text, wc text, pd text,
                    et text, sn text, sc text, ed text, pg text, la text, cy text, lp text,
                    td text, ct text, rf text, co text, ina text, ns text, re text,
                    ipc text, ipd text, pub text, an text)''')
    for f in glob.glob('*.txt'):
        print(f)
        parser(f)

# Write to CSV to feed Stata
with open('factiva.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    with conn:
        cur = conn.cursor()
        cur.execute('SELECT * FROM articles WHERE hd IS NOT NULL')
        colname = [desc[0] for desc in cur.description]
        writer.writerow(colname)
        for obs in cur.fetchall():
            writer.writerow(obs)
</code></pre>
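A note on the likely cause, offered as a sketch rather than a confirmed fix. The traceback passes through `encodings\cp1252.py`, which suggests that `open(file, 'r')` fell back to the Windows default encoding because none was given. Byte 0x9d has no mapping in Python's cp1252 codec, hence "character maps to <undefined>". That byte is also the last byte of the UTF-8 encoding of a right curly quote (U+201D, bytes E2 80 9D), so the input may well be UTF-8, though that is an assumption about files not shown here. The helper name `read_text` and the sample text below are hypothetical and exist only for illustration.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole text file with an explicit encoding (hypothetical helper)."""
    try:
        with open(path, "r", encoding=encoding) as infile:
            return infile.read()
    except UnicodeDecodeError:
        # Fallback: keep going, replacing undecodable bytes with U+FFFD.
        with open(path, "r", encoding=encoding, errors="replace") as infile:
            return infile.read()


def demo():
    # Build a small sample file containing curly quotes; the closing quote
    # U+201D is stored in UTF-8 as the bytes E2 80 9D.
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as out:
            out.write("He said \u201chello\u201d\n".encode("utf-8"))

        # Reading the same bytes as cp1252 reproduces the error in the question.
        try:
            with open(path, "r", encoding="cp1252") as infile:
                infile.read()
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True

        # Reading with an explicit UTF-8 encoding succeeds.
        return cp1252_failed, read_text(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

In the `parser` function from the question, the equivalent change would be `open(file, 'r', encoding='utf-8')`. If the files turn out to use a different encoding, that name has to be substituted instead. `errors='replace'` hides the problem rather than fixing it, so it is best kept as a last resort.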

