Python: Extract field values into new columns and write to Excel

import csv

with open('test.csv', mode='r') as testFile:
    reader = csv.DictReader(testFile, delimiter=',')
    for row in reader:
        # This is where I assume I need to perform the regex operation on the current row.
        pass

1 answer

User

#1 · Posted 2024-05-20 00:54:42

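Here is one technique, using the pandas df['column'].str.extract() function.

You can pass a regular expression, either compiled or as a plain string, to extract(). It uses the named groups in the expression and extracts each group into a column with the same name.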
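As a minimal sketch of that named-group behaviour: the two-row Series below is invented purely for illustration and is not part of the original sample data.

```python
import pandas as pd

# Hypothetical Series, made up for illustration: one matching row, one non-matching row.
s = pd.Series(['history 12345 at 2020-01-01', 'no match here'])

# Each named group becomes a column of the returned DataFrame;
# rows that do not match the pattern are filled with NaN.
out = s.str.extract(r'history\s(?P<history>\d+)\sat\s(?P<date>\d{4}-\d{2}-\d{2})')

print(list(out.columns))       # ['history', 'date']
print(out.loc[0, 'history'])   # 12345
```

Because the result is a DataFrame whose columns follow the group names, it can be assigned directly to a list of new columns on the original frame.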

Sample data:

name,file_info
test1,c:\folder1\subfolder1\subfolder2\example1.xls | history 12345 at 2020-01-01
test2,c:\folder1\subfolder1\subfolder2\example2.xls | history 24687 at 2020-01-12
test3,c:\folder1\subfolder1\subfolder2\example3.xls | history 33445 at 2020-01-13
test4,c:\folder1\subfolder1\subfolder2\example4.xls | history 55664 at 2020-01-14

Code:

import ntpath
import re

import pandas as pd

# Define constants.
COLS = ['name', 'path', 'file', 'history', 'date']
PATH = './test.csv'
PATH_XL = './test.xlsx'
# Raw strings keep the regex backslash escapes intact. The lazy path group
# followed by \s* leaves the space before the pipe out of the captured path.
RE_EXP = re.compile(r'^'
                    r'(?P<path>.*?)\s*\|\shistory\s'
                    r'(?P<history>\d+)\sat\s'
                    r'(?P<date>\d{4}-\d{2}-\d{2})$',
                    re.IGNORECASE)

# Read CSV file.
df = pd.read_csv(PATH)
# Create new columns using named regex groups.
df[['path', 'history', 'date']] = df['file_info'].str.extract(RE_EXP)
# Extract the filename from the path. ntpath splits on backslashes on any OS,
# whereas os.path.basename only does so when running on Windows.
df['file'] = df['path'].apply(ntpath.basename)
# Convert date to datetime format.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d').dt.date
# Subset DataFrame to only the columns we require.
df = df[COLS]
# Write results to Excel (needs an Excel writer engine such as openpyxl installed).
df.to_excel(PATH_XL, index=False)
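For comparison with the csv.DictReader loop in the question, the same extraction can be sketched with only the standard library. This is an illustration, not the answerer's method: the helper name extract_rows and the in-memory SAMPLE string (standing in for test.csv) are assumptions, and the pattern is modeled on the sample data shown earlier.

```python
import csv
import io
import ntpath
import re

# Pattern modeled on the sample data; named groups become dictionary keys.
RE_EXP = re.compile(
    r'^(?P<path>.*?)\s*\|\shistory\s'
    r'(?P<history>\d+)\sat\s'
    r'(?P<date>\d{4}-\d{2}-\d{2})$',
    re.IGNORECASE,
)

# In-memory stand-in for test.csv (assumption, for a self-contained example).
SAMPLE = (
    'name,file_info\n'
    'test1,c:\\folder1\\subfolder1\\subfolder2\\example1.xls | history 12345 at 2020-01-01\n'
    'test2,c:\\folder1\\subfolder1\\subfolder2\\example2.xls | history 24687 at 2020-01-12\n'
)


def extract_rows(handle):
    """Hypothetical helper: return one dict per CSV row whose file_info matches."""
    rows = []
    for row in csv.DictReader(handle, delimiter=','):
        match = RE_EXP.match(row['file_info'])
        if match is None:
            continue  # Skip rows that do not follow the expected layout.
        fields = match.groupdict()
        fields['name'] = row['name']
        # ntpath handles Windows-style backslash paths on any operating system.
        fields['file'] = ntpath.basename(fields['path'])
        rows.append(fields)
    return rows


rows = extract_rows(io.StringIO(SAMPLE))
print(rows[0]['file'], rows[0]['history'], rows[0]['date'])
# example1.xls 12345 2020-01-01
```

Writing these rows to an .xlsx file would still need a third-party library, which is one reason the pandas route above is convenient.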
