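<h2>Solution</h2>
<p>Here is another solution. The logic is as follows:</p>
<p>A. First, find the lines that start with a 4-digit number.</p>
<p>B. Once those lines are identified, any line (except the topmost one, the header row) that</p>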
<ul>
<li>does not start with a 4-digit number</li>
<li>does not contain three <code>','</code> separators</li>
</ul>
<p>is appended to the previous line.</p>
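<p>The two conditions in the list can be checked line by line. The following is a minimal sketch, not part of the original answer: the helper name <code>is_complete_record</code> and its <code>n_separators</code> parameter are hypothetical, and it assumes four fields per record. Note that the function in Example-1 relies only on the 4-digit test to decide where a record starts.</p>

```python
import re


def is_complete_record(line, n_separators=3):
    """Return True if the line starts with 4 digits and has n_separators commas.

    Hypothetical helper for illustration; assumes four fields per record.
    """
    starts_with_id = re.match(r'\d{4}', line) is not None
    return starts_with_id and line.count(',') == n_separators


lines = [
    '3523,apple,84,peter',
    '2522,green',
    'grape, 99, mary',
    '1299, watermelon, 93, paul',
]

# Flag each line: False marks a record that is broken across lines.
flags = [is_complete_record(line) for line in lines]
print(flags)  # [True, False, False, True]
```

<p>A <code>False</code> flag only says that a line is not a complete record on its own; whether it starts a record or continues one is still decided by the 4-digit prefix.</p>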
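<p>C. Finally, trailing whitespace is stripped from each line and all lines are joined into a single string, which can be written to a .csv file if desired.</p>
<p>We then load this string as a DataFrame using <code>io.StringIO</code>.</p>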
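<p>Writing the cleaned string to a .csv file could look like the sketch below. It is only an illustration: the <code>cleaned</code> string and the temporary file path are made up, and the standard-library <code>csv</code> module is used here instead of pandas to read the file back and confirm its contents.</p>

```python
import csv
import os
import tempfile

# Hypothetical cleaned string, in the shape produced by the merging step.
cleaned = (
    "orderid, fruit, count, person\n"
    "3523, apple, 84, peter\n"
    "2522, green grape, 99, mary\n"
)

# Write the string to a .csv file (a temporary path, for illustration only).
path = os.path.join(tempfile.mkdtemp(), 'cleaned.csv')
with open(path, 'w', newline='') as f:
    f.write(cleaned)

# Read it back; skipinitialspace drops the blank that follows each comma.
with open(path, newline='') as f:
    rows = list(csv.reader(f, skipinitialspace=True))

print(rows[2])  # ['2522', 'green grape', '99', 'mary']
```

<p>Because the records are separated by <code>', '</code>, a reader that does not skip the initial space would keep a leading blank in every field after the first.</p>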
<h2>Example-1</h2>
<pre class="lang-py prettyprint-override"><code>import re
from io import StringIO

import pandas as pd


def get_clean_data(lines):
    # A. Flag the lines that start with a 4-digit order id.
    starts_record = [re.match(r'\d{4}', line) is not None for line in lines]
    correct_lines = []
    for i, line in enumerate(lines):
        if i == 0 or starts_record[i]:
            # Header line or the start of a new record.
            correct_lines.append(line.strip())
        else:
            # B. Continuation line: append it to the previous record.
            correct_lines[-1] += ' ' + line.strip()
    # C. Normalize separators to ', ' and end each record with a newline.
    correct_lines = [re.sub(r',\s*', ', ', line) + '\n' for line in correct_lines]
    return ''.join(correct_lines)


# Dummy data
s = """
orderid,fruit,count,person
3523,apple,84,peter
2522,green
grape, 99, mary
1299, watermelon, 93, paul
"""
lines = s.strip().split('\n')

# In case of a csv file, use readlines:
# with open('csv_file.csv', 'r') as f:
#     lines = f.readlines()

# Get cleaned data
ss = get_clean_data(lines)

# Make DataFrame; skipinitialspace drops the blank after each ', ' separator,
# so column names and string values carry no leading space.
df = pd.read_csv(StringIO(ss), sep=',', skipinitialspace=True)
print(df)
</code></pre>
<p><strong>Output</strong>:</p>
<pre><code>   orderid        fruit  count  person
0     3523        apple     84   peter
1     2522  green grape     99    mary
2     1299   watermelon     93    paul
</code></pre>
<h2>Example-2</h2>
<p>Now let us use the following dummy data.</p>
<pre class="lang-py prettyprint-override"><code>s = """
orderid,fruit,count,person
3523,apple,84,peter
2522,green
grape, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green
banana, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green
apple, 99, mary
1299, watermelon, 93, paul
"""
</code></pre>
<p><strong>Output</strong>:</p>
<pre><code>   orderid         fruit  count  person
0     3523         apple     84   peter
1     2522   green grape     99    mary
2     1299    watermelon     93    paul
3     3523         apple     84   peter
4     2522  green banana     99    mary
5     1299    watermelon     93    paul
6     3523         apple     84   peter
7     2522   green apple     99    mary
8     1299    watermelon     93    paul
</code></pre>