CSV column value wraps onto a new line, causing a loading error

2 answers

User

Answer 1 · edited 2024-05-26 20:45:12

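Fix the files:

Use m = re.findall(r'(?<=[a-zA-Z])\s+\n[a-zA-Z]', text) to find cases such as ,green \ngrape
- The pattern finds alpha \nalpha (a letter, trailing whitespace, a newline, then another letter) and ignores alpha \nnumeric, where the next line starts a new record
- m will be a list of all the matches (e.g. [' \ng'])
- .replace(' \ng', ' g') then turns the match into ,green grape
Use pathlib to find all the files
- .rglob searches all subdirectories. If all the files are in one directory, use .glob
- pathlib treats paths as objects rather than strings, so pathlib objects come with many useful methods
- .stem returns the file name without the extension
- .suffix returns the file extension (e.g. .csv)
This does not overwrite the existing files. It creates a new file with _fixed added to the name.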
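For readers new to pathlib, here is a quick sketch of the three attributes the script relies on. The path is made up, and PurePosixPath is used only so the printed result looks the same on every OS; stem, suffix and with_name work the same way on the script's Path objects.

```python
from pathlib import PurePosixPath

# Hypothetical path, used only to show the attributes the script relies on
p = PurePosixPath('data/2024/orders.csv')

print(p.stem)    # orders
print(p.suffix)  # .csv

# with_name swaps only the last path component, so the new file stays in the same folder
new_p = p.with_name(f'{p.stem}_fixed{p.suffix}')
print(new_p)     # data/2024/orders_fixed.csv
```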

import csv
import re
from pathlib import Path

# list of all the csv files, skipping _fixed files left over from an earlier run
files = [f for f in Path(r'c:\some_path').rglob('*.csv') if not f.stem.endswith('_fixed')]

# iterate through each file
for file in files:

    # create new filename name_fixed
    new_file = file.with_name(f'{file.stem}_fixed{file.suffix}')

    # read all the text in as a string
    text = file.read_text()

    # find the wrapped rows: a letter, trailing whitespace, a newline, then a letter
    m = re.findall(r'(?<=[a-zA-Z])\s+\n[a-zA-Z]', text)
    for match in m:
        # replace the whitespace + newline with one space and keep the letter
        text = text.replace(match, f' {match[-1:]}')

    # split into lines, strip surrounding whitespace and drop empty lines
    text_list = [x.strip() for x in text.split('\n') if x.strip()]

    # write the new file
    with new_file.open('w', newline='') as f:
        w = csv.writer(f, delimiter=',')
        w.writerows([x.split(',') for x in text_list])
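To check what the pattern does before running it over real files, here is a small sketch on an in-memory string (the sample text is made up in the question's format, including the trailing spaces). It also shows a re.sub variant with a lookahead; that variant is a swap-in for comparison, not part of the answer's code.

```python
import re

# Hypothetical sample shaped like the broken file (note the trailing spaces)
raw = "3523,apple,84,peter  \n2522,green  \ngrape, 99, mary   \n1299, watermelon, 93, paul\n"

# Only "green  \ng" matches; "peter  \n2" and "mary   \n1" are skipped
# because a digit, not a letter, follows the newline
pattern = r'(?<=[a-zA-Z])\s+\n[a-zA-Z]'
matches = re.findall(pattern, raw)
print(matches)  # ['  \ng']

# Same loop as the answer: keep the letter, replace whitespace + newline with one space
text = raw
for match in matches:
    text = text.replace(match, f' {match[-1:]}')
print(text.splitlines()[1].strip())  # 2522,green grape, 99, mary

# Swap-in alternative: the lookahead (?=...) keeps the letter out of the match
alt = re.sub(r'(?<=[a-zA-Z])\s+\n(?=[a-zA-Z])', ' ', raw)
print(alt == text)  # True
```

One design difference: str.replace rewrites every occurrence of the matched text anywhere in the file, while re.sub only touches places where the full pattern (including the lookbehind) matches.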

Example:

With the following content in the `.csv`:

orderid,fruit,count,person  
3523,apple,84,peter  
2522,green  
grape, 99, mary   
1299, watermelon, 93, paul
3523,apple,84,peter  
2522,green  
banana, 99, mary   
1299, watermelon, 93, paul
3523,apple,84,peter  
2522,green  
apple, 99, mary   
1299, watermelon, 93, paul

The new file:

orderid,fruit,count,person
3523,apple,84,peter
2522,green grape, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green banana, 99, mary
1299, watermelon, 93, paul
3523,apple,84,peter
2522,green apple, 99, mary
1299, watermelon, 93, paul

Create the DataFrame:

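import pandas as pd
from pathlib import Path

# collect the fixed files (rglob, to match the search above) and stack them into one DataFrame
new_files = list(Path(r'c:\some_path').rglob('*_fixed.csv'))
# skipinitialspace drops the blanks after commas, e.g. ", 99"
df = pd.concat([pd.read_csv(f, skipinitialspace=True) for f in new_files], ignore_index=True)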
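If the intermediate _fixed files are not needed, the same fix can be applied in memory and handed straight to pandas. A rough sketch under some assumptions: load_fixed_csv is a hypothetical helper name, a single re.sub with a lookahead stands in for the findall/replace loop, and skipinitialspace=True is added to drop the blanks after commas.

```python
import re
from io import StringIO
from pathlib import Path

import pandas as pd

# Same idea as the answer's pattern, as one re.sub: the lookahead keeps the first
# letter of the wrapped line, so only "whitespace + newline" becomes a space
WRAPPED = re.compile(r'(?<=[a-zA-Z])\s+\n(?=[a-zA-Z])')


def load_fixed_csv(path):
    """Hypothetical helper: repair wrapped rows in memory and return a DataFrame."""
    text = WRAPPED.sub(' ', Path(path).read_text())
    # strip stray whitespace at both ends of every line and drop blank lines
    text = '\n'.join(line.strip() for line in text.splitlines() if line.strip())
    # skipinitialspace ignores the blanks after commas, e.g. ", 99" -> "99"
    return pd.read_csv(StringIO(text), skipinitialspace=True)


# usage, assuming the same folder layout as the answer:
# files = Path(r'c:\some_path').rglob('*.csv')
# df = pd.concat([load_fixed_csv(f) for f in files], ignore_index=True)
```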

User

Answer 2 · edited 2024-05-26 20:45:12

Solution

Here is another solution:

A. The logic here is to first find the lines that start with a 4-digit number.

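B. Once those lines are identified, any line (other than the top line, i.e. the header row) that does not start with a 4-digit number, and therefore lacks the three ',' separators of a complete row, is appended to the previous line.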

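C. Finally, surrounding whitespace is stripped from each line, and all the lines are joined into a single string, which can be written to a .csv file if you like.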

The string is then loaded into a DataFrame through io.StringIO.
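A minimal illustration of the io.StringIO step on its own, using a made-up one-row string in the cleaned format:

```python
from io import StringIO

import pandas as pd

# A cleaned CSV held in a Python string (hypothetical data in the answer's format)
ss = "orderid,fruit,count,person\n2522,green grape,99,mary\n"

# StringIO wraps the string in a file-like object, so read_csv reads it like a file
df = pd.read_csv(StringIO(ss), sep=',')
print(df.shape)  # (1, 4)

# to keep the cleaned text as a file instead, write the same string out, e.g.:
# with open('clean.csv', 'w') as f:
#     f.write(ss)
```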

Example 1

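import pandas as pd
from io import StringIO
import re

def get_clean_data(lines):
    # for every line, its leading 4-digit orderid (or None if it has none)
    target_lines = [re.findall(r'^\d{4}', line) for line in lines]
    target_lines_dict = {i: (val[0] if val else None) for i, val in enumerate(target_lines)}

    correct_lines = list()
    line_index = 0  # position in correct_lines of the record being built
    for i, line in enumerate(lines):
        if i == 0:
            # keep the header row
            correct_lines.append(line.strip())
        elif target_lines_dict[i] is not None:
            # a line starting with 4 digits begins a new record
            correct_lines.append(line.strip())
            line_index += 1
        else:
            # anything else is the tail of the previous record
            correct_lines[line_index] += ' ' + line.strip()
    # drop the blanks after each comma so column names and values have no leading spaces
    correct_lines = [re.sub(r',\s*', ',', line) + '\n' for line in correct_lines]
    ss = ''.join(correct_lines)
    return ss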

# Dummy Data
s = """
orderid,fruit,count,person  
3523,apple,84,peter  
2522,green  
grape, 99, mary   
1299, watermelon, 93, paul
"""
lines = s.strip().split('\n')

# In case of a csv file, use readlines:
# with open('csv_file.csv', 'r') as f:
#     lines = f.readlines()

# Get cleaned data
ss = get_clean_data(lines)

# Make Dataframe
df = pd.read_csv(StringIO(ss), sep=',')
print(df)

Output:

   orderid         fruit   count  person
0     3523         apple      84   peter
1     2522   green grape      99    mary
2     1299    watermelon      93    paul

Example 2

Now let's use the following dummy data:

s = """
orderid,fruit,count,person  
3523,apple,84,peter  
2522,green  
grape, 99, mary   
1299, watermelon, 93, paul
3523,apple,84,peter  
2522,green  
banana, 99, mary   
1299, watermelon, 93, paul
3523,apple,84,peter  
2522,green  
apple, 99, mary   
1299, watermelon, 93, paul
"""

Output:

   orderid          fruit   count  person
0     3523          apple      84   peter
1     2522    green grape      99    mary
2     1299     watermelon      93    paul
3     3523          apple      84   peter
4     2522   green banana      99    mary
5     1299     watermelon      93    paul
6     3523          apple      84   peter
7     2522    green apple      99    mary
8     1299     watermelon      93    paul
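get_clean_data above only checks the first condition from step B (a leading 4-digit number). A sketch of the other condition, counting the ',' separators, might look like this; merge_wrapped_rows and its n_fields parameter are hypothetical, and it assumes no field contains a quoted comma.

```python
def merge_wrapped_rows(lines, n_fields=4):
    """Hypothetical helper: join continuation lines until a row has n_fields fields.

    A complete row is assumed to contain n_fields - 1 commas (no quoted commas).
    """
    rows = []
    current = ''
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # start a new row, or glue this piece onto the unfinished one with a space
        current = f'{current} {line}' if current else line
        if current.count(',') >= n_fields - 1:
            rows.append(current)
            current = ''
    if current:
        rows.append(current)  # keep a trailing partial row rather than dropping it
    return rows


sample = ["orderid,fruit,count,person  ", "3523,apple,84,peter  ",
          "2522,green  ", "grape, 99, mary   ", "1299, watermelon, 93, paul"]
print(merge_wrapped_rows(sample)[2])  # 2522,green grape, 99, mary
```

The returned rows can be joined with '\n' and loaded with pd.read_csv(StringIO(...), skipinitialspace=True). One difference from the 4-digit check: a wrapped piece that happens to start with four digits, for example "1000, mary" after "2522,green grape,", is still glued to its row here, while get_clean_data would treat it as a new record.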
