Answers to: CSV column values wrapping onto a new line cause loading errors

CSV column values wrapping onto a new line cause loading errors


Anonymous, 1 day ago

Expertise: python, mysql, java

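<h2>Fix the files:</h2>
<ul>
<li>Use <code>m = re.findall(r'(?<=[a-zA-Z])\s+\n[a-zA-Z]', text)</code> to find cases like <code>,green \ngrape</code>, where a value was broken across two lines
<ul>
<li>The pattern finds a letter followed by whitespace, a newline and another letter (<code>alpha \nalpha</code>), and ignores a line break followed by a digit (<code>alpha \nnumeric</code>), so normal row breaks before a numeric <code>orderid</code> are left alone</li>
<li><code>m</code> is a list of all the matches (e.g. <code>[' \ng']</code>)</li>
<li><code>.replace(' \ng', ' g')</code> then turns the text into <code>,green grape</code></li>
</ul></li>
<li>Use <a href="https://docs.python.org/3/library/pathlib.html" rel="nofollow noreferrer"><code>pathlib</code></a> to find all the files
<ul>
<li><code>.rglob</code> also searches all subdirectories. If all the files are in one directory, use <code>.glob</code></li>
<li><code>pathlib</code> treats paths as objects rather than strings, so a path object has many useful methods and attributes</li>
<li><code>.stem</code> returns the file name without its extension</li>
<li><code>.suffix</code> returns the file extension (e.g. <code>.csv</code>)</li>
</ul></li>
<li>This does not overwrite the existing files. It creates a new file with <code>_fixed</code> added to the name.</li>
</ul>
<pre class="lang-py prettyprint-override"><code>import csv
import re
from pathlib import Path

# list of all the csv files (.rglob also searches subdirectories)
files = list(Path(r'c:\some_path').rglob('*.csv'))

# iterate through each file
for file in files:
    # skip files created by an earlier run of this script
    if file.stem.endswith('_fixed'):
        continue

    # create the new filename: name_fixed.csv
    new_file = file.with_name(f'{file.stem}_fixed{file.suffix}')

    # read all the text in as a string
    text = file.read_text()

    # find and fix the sections that need fixing:
    # a letter, whitespace, a newline, then another letter
    m = re.findall(r'(?<=[a-zA-Z])\s+\n[a-zA-Z]', text)
    for match in m:
        # replace the whitespace + newline with one space, keep the letter
        text = text.replace(match, f' {match[-1:]}')

    # split into lines, strip surrounding whitespace and drop empty lines
    # (an empty trailing line would otherwise produce a spurious row)
    text_list = [x.strip() for x in text.split('\n') if x.strip()]

    # write the new file (naive split: assumes no quoted field contains a comma)
    with new_file.open('w', newline='') as f:
        w = csv.writer(f, delimiter=',')
        w.writerows([x.split(',') for x in text_list])
</code></pre>
<h2>Example:</h2>
<h3>With the following content in a <code>.csv</code>:</h3>
<pre class="lang-py prettyprint-override"><code>orderid,fruit,count,person
3523,apple,84,peter
2522,green 
grape, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green 
banana, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green 
apple, 99, mary
1299, watermelon, 93, paul
</code></pre>
<h3>New file:</h3>
<pre class="lang-py prettyprint-override"><code>orderid,fruit,count,person
3523,apple,84,peter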
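2522,green grape, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green banana, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green apple, 99, mary
1299, watermelon, 93, paul
</code></pre>
<h2>Create the dataframe:</h2>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
from pathlib import Path

# collect the fixed files (.rglob, matching the search used when fixing them)
new_files = list(Path(r'c:\some_path').rglob('*_fixed.csv'))

# skipinitialspace=True drops the space after each comma (e.g. " 99" -> "99")
df = pd.concat([pd.read_csv(f, skipinitialspace=True) for f in new_files], ignore_index=True)
</code></pre>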
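<p>A minimal sketch of the regex step on an in-memory string, so it can be tried without any files. The sample text is hypothetical. It compares the answer's <code>findall</code> + <code>.replace</code> loop with a single <code>re.sub</code> call. The <code>re.sub</code> form is an alternative swapped in here, not what the answer uses: its lookahead <code>(?=[a-zA-Z])</code> keeps the following letter out of the match, so only the whitespace and newline are replaced by one space.</p>
<pre class="lang-py prettyprint-override"><code>import re

# Hypothetical sample: one row whose "fruit" value was split across lines,
# i.e. a letter, trailing whitespace, a newline, then a letter
raw = (
    "orderid,fruit,count,person\n"
    "3523,apple,84,peter\n"
    "2522,green \n"
    "grape, 99, mary\n"
    "1299, watermelon, 93, paul\n"
)

PATTERN = r'(?<=[a-zA-Z])\s+\n[a-zA-Z]'

# Approach from the answer: collect the matches, then replace each one
matches = re.findall(PATTERN, raw)
fixed_findall = raw
for match in matches:
    fixed_findall = fixed_findall.replace(match, f' {match[-1:]}')

# Alternative single pass: the lookahead leaves the next letter untouched
fixed_sub = re.sub(r'(?<=[a-zA-Z])\s+\n(?=[a-zA-Z])', ' ', raw)

print(matches)
print(fixed_sub)
</code></pre>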
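<p>The <code>pathlib</code> attributes used for naming the output can be checked without touching the disk. This sketch uses <code>PureWindowsPath</code> (an assumption, chosen because the answer's paths are Windows-style; it only manipulates path text, so it runs on any OS) and a hypothetical file name.</p>
<pre class="lang-py prettyprint-override"><code>from pathlib import PureWindowsPath

# Hypothetical input file; a pure path never accesses the file system
file = PureWindowsPath(r'c:\some_path\2023\orders.csv')

print(file.stem)    # file name without its extension
print(file.suffix)  # the extension, including the dot

# Same naming rule as the answer: insert "_fixed" before the extension
new_file = file.with_name(f'{file.stem}_fixed{file.suffix}')
print(new_file)

# An output from a previous run can be recognised by its stem
already_fixed = new_file.stem.endswith('_fixed')
print(already_fixed)
</code></pre>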
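<p>A rough end-to-end check, assuming a hypothetical sample file written into a temporary directory: fix it, then read the <code>_fixed</code> file back with the <code>csv</code> module and confirm that every row has four fields. The helper name <code>fix_csv</code> is made up for this sketch, and it uses the <code>re.sub</code> form of the regex instead of the answer's loop.</p>
<pre class="lang-py prettyprint-override"><code>import csv
import re
import tempfile
from pathlib import Path

# Hypothetical sample: one row whose "fruit" value was split across lines
SAMPLE = "orderid,fruit,count,person\n3523,apple,84,peter\n2522,green \ngrape, 99, mary\n"


def fix_csv(file: Path) -> Path:
    """Join wrapped values and write a copy named stem + '_fixed' + suffix."""
    new_file = file.with_name(f'{file.stem}_fixed{file.suffix}')
    text = file.read_text()
    # replace "letter, whitespace, newline" before a letter with one space
    text = re.sub(r'(?<=[a-zA-Z])\s+\n(?=[a-zA-Z])', ' ', text)
    # naive split: assumes no quoted field contains a comma
    rows = [line.strip().split(',') for line in text.split('\n') if line.strip()]
    with new_file.open('w', newline='') as f:
        csv.writer(f).writerows(rows)
    return new_file


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'orders.csv'
    src.write_text(SAMPLE)
    out = fix_csv(src)
    # read the result back to check the shape of every row
    with out.open(newline='') as f:
        result = list(csv.reader(f))

print(out.name)
print(result)
</code></pre>
<p>Reading back with <code>csv.reader</code> keeps the leading spaces in values such as <code>' 99'</code>; passing <code>skipinitialspace=True</code> to <code>pd.read_csv</code> drops them when loading into pandas.</p>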
