Loop through a CSV and write the new values to a CSV

Posted 2024-04-19 18:51:00

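Introduction

After working with Scrapy for two months, I took a break and started learning text formatting with Python. My web crawler delivers some data, which is stored in a .csv file like this: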

My .csv file:

SKU
"
                Article nr. : 560821800 / D26 x H10 cm
            "
"
                Article nr. : 560828100 / D14 x H11 cm
            "
"
                Article nr. : 560821400 / D13 x H10 cm
            "
"
                Article nr. : 560821900 / L17 x W17 x H14
            "
"
                Article nr. : 560828900 / L17 x W17 x H14
            "
"
                Article nr. : 560821600 / D16 x H13 cm
            "
"
                Article nr. : 560828300 / D16 x H13 cm
            "
"
                Article nr. : 560827900 / D13 x H10 cm
            "
"
                Article nr. : 560829000 / L17 x W17 x H14
            "

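As a side note on the format: Python's `csv.reader` treats each quoted block in this file as a single field, embedded newlines included, so every product arrives as a one-element row. That is why `row.pop()` in the code below returns the whole "Article nr." text. A minimal sketch using an in-memory copy of the first two records (`io.StringIO` stands in for the real file):

```python
import csv
import io

# Two records copied from the sample file; the quoted fields span several lines
sample = (
    'SKU\n'
    '"\n'
    '                Article nr. : 560821800 / D26 x H10 cm\n'
    '            "\n'
    '"\n'
    '                Article nr. : 560828100 / D14 x H11 cm\n'
    '            "\n'
)

rows = list(csv.reader(io.StringIO(sample)))
print(len(rows))           # 3: the header plus one row per product
print(rows[0])             # ['SKU']
print(rows[1][0].strip())  # Article nr. : 560821800 / D26 x H10 cm
```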
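There is too much whitespace and other stuff I don't want, so I read up on regular expressions.

After playing around a bit, I managed to remove all the whitespace and the other unneeded characters, so that only the product ID remains, e.g. 560821800.

Next I opened the .csv file, edited these values, and tried to write them to a new .csv file, which I call output.

The output file should contain only one column, which I want to call "SKU".

Code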

import csv
import re

with open(r'C:\Users\y.y\OneDrive - company name\Python3_Textformatierung\sku.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        sku = row.pop()  # the last (and only) field of the row
        sku = re.sub(r'[\s.:]+', '', sku)  # remove whitespace, dots and colons
        sku = sku.replace('Articlenr', '')
        print(sku)  # e.g. 560827900/D13xH10cm
        string_to_list = sku.split('/')  # splits the string into a list, e.g. ['560827900', 'D13xH10cm']
        print(string_to_list)
        sku_string = string_to_list.pop(0)
        print(sku_string)  # only the SKU value remains
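The cleanup loop above deletes characters until only the ID is left. An alternative is a regular expression with capture groups that picks the article number and the dimensions out of each raw field directly. This is a minimal sketch, assuming every field follows the `Article nr. : <digits> / <dimensions>` pattern shown in the sample; `FIELD_PATTERN` and `parse_field` are hypothetical names chosen for illustration:

```python
import re

# Assumed layout: "Article nr. : <digits> / <dimensions>", possibly padded with whitespace
FIELD_PATTERN = re.compile(r"Article nr\.\s*:\s*(\d+)\s*/\s*(.+?)\s*$")


def parse_field(raw):
    """Return (sku, dimensions) for a raw field, or None if it does not match."""
    match = FIELD_PATTERN.search(raw.strip())
    if match is None:
        return None  # e.g. the "SKU" header row
    return match.group(1), match.group(2)


raw = "\n                Article nr. : 560821800 / D26 x H10 cm\n            "
print(parse_field(raw))    # ('560821800', 'D26 x H10 cm')
print(parse_field("SKU"))  # None
```

Returning `None` for non-matching fields lets the caller skip the header row without special-casing it.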

After running this code, I get the following output:

SKU
['SKU']
SKU
560821800/D26xH10cm
['560821800', 'D26xH10cm']
560821800
560828100/D14xH11cm
['560828100', 'D14xH11cm']
560828100
560821400/D13xH10cm
['560821400', 'D13xH10cm']
560821400
560821900/L17xW17xH14
['560821900', 'L17xW17xH14']
560821900
560828900/L17xW17xH14
['560828900', 'L17xW17xH14']
560828900
560821600/D16xH13cm
['560821600', 'D16xH13cm']
560821600
560828300/D16xH13cm
['560828300', 'D16xH13cm']
560828300
560827900/D13xH10cm
['560827900', 'D13xH10cm']
560827900
560829000/L17xW17xH14
['560829000', 'L17xW17xH14']
560829000

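My problem: I want to collect every value of sku_string and write them to the output.csv file, but only the field name gets written to the new file.

I tried to do this with the following code: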

#write data to csv with fieldname['SKU']
with open(r'C:\Path\to\Output.csv', 'w') as csv_file:
    fieldname = ['SKU']
    csv_writer = csv.DictWriter(csv_file, fieldnames=fieldname, delimiter=',')
    csv_writer.writeheader()
    print(sku_string)
    for s in row:
        csv_writer.writerow(['SKU', sku_string])
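For reference, the two writer classes in Python's `csv` module take rows in different shapes: `csv.writer.writerow()` accepts a sequence of values, while `csv.DictWriter.writerow()` expects a dict keyed by the field names. Passing a list such as `['SKU', sku_string]` to a `DictWriter` raises an exception after the header has already been written. A small sketch, using `io.StringIO` buffers in place of real files and two SKUs from the sample data:

```python
import csv
import io

skus = ["560821800", "560828100"]

# csv.writer: each row is a sequence of values
plain_buffer = io.StringIO()
plain_writer = csv.writer(plain_buffer)
plain_writer.writerow(["SKU"])    # header row
for sku in skus:
    plain_writer.writerow([sku])  # one-element list per row

# csv.DictWriter: each row is a dict keyed by the fieldnames
dict_buffer = io.StringIO()
dict_writer = csv.DictWriter(dict_buffer, fieldnames=["SKU"])
dict_writer.writeheader()
for sku in skus:
    dict_writer.writerow({"SKU": sku})

# Both produce the same text; csv uses "\r\n" line endings by default
print(plain_buffer.getvalue() == dict_buffer.getvalue())  # True
print(repr(dict_buffer.getvalue()))  # 'SKU\r\n560821800\r\n560828100\r\n'
```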

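I also noticed that the last print statement (which I only use for testing) contains just one value. What am I missing?

I'm a real beginner. I've read a lot about loops here on Stack Overflow, but I couldn't transfer the solutions to my problem, because most of them are too advanced for my current skill level.

Update

I rewrote the code, but it still writes only the last output to the output.csv file: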

import csv
import re


with open(r'Path\to\sku.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        row = row.pop()#convert from list to str        
        row = row.split('/')#convert str to list with 2 elements, splitted by '/'
        sku_string = row.pop(0)#string with Articlenr + SKU
        sku_string = sku_string.split(':')
        only_sku = sku_string.pop()
        #every string contains only sku now
        print(only_sku)     
        
with open(r'C:\Path\to\Output.csv', 'w') as csv_file:
    fieldname = ['SKU']
    writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldname)
    writer.writeheader()
    for x in only_sku:
        writer.writerow({'SKU' : only_sku})
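Two things combine in the updated code above: once the reading loop has finished, `only_sku` holds only the value from the last row, and `for x in only_sku:` iterates over the characters of that string, so the same last value is written once per character. A minimal illustration with sample values (the list literal stands in for the rows read from the file):

```python
# Simulate the reading loop: only_sku is reassigned on every pass,
# so after the loop it holds only the value from the last row
only_sku = None
for value in ["560821800", "560828100", "560829000"]:
    only_sku = value
print(only_sku)  # 560829000

# Iterating over a str yields its characters, so this loop runs once per character
rows_written = []
for x in only_sku:
    rows_written.append({"SKU": only_sku})  # the whole last value each time, as in the updated code
print(len(rows_written))  # 9
```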

Output.csv:

[Screenshot: Output.csv]


2 answers

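I took a slightly different approach: I changed your .csv file into a .txt file, because, honestly, what you have doesn't really look like a CSV structure.

Here's what I came up with: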

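import csv

# Read the raw export as stripped text lines
with open("sample.txt") as f:
    lines = [line.strip() for line in f]

# Drop the header, the stray quote lines and any blank lines
parsed_lines = [l for l in lines if l and l not in ('"', "SKU")]
# Remove the label and split "SKU / dimensions" into two trimmed fields
parsed_lines = [
    [part.strip() for part in l.replace("Article nr. : ", "").split("/")]
    for l in parsed_lines
]

with open("output.csv", "w", newline="") as output:
    w = csv.writer(output)
    w.writerow(["Article nr.", "Dimensions"])
    w.writerows(parsed_lines)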

Output:

Article nr.  Dimensions
-----------  ---------------
560821800    D26 x H10 cm
560828100    D14 x H11 cm
560821400    D13 x H10 cm
560821900    L17 x W17 x H14
560828900    L17 x W17 x H14
560821600    D16 x H13 cm
560828300    D16 x H13 cm
560827900    D13 x H10 cm
560829000    L17 x W17 x H14

Or, in the .csv file:

[Screenshot: output.csv]

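Inside the `for row in csv_reader:` loop, the new value from each row is written to the `only_sku` string, replacing the previous one. If you want to access these values outside the loop, you need to collect them somewhere, for example by appending them to a list.

So your reading loop becomes: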

all_sku = []
with open(r'Path\to\sku.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the "SKU" header row
    for row in csv_reader:
        row = row.pop()  # take the single field out of the row list (now a str)
        row = row.split('/')  # split the str into a list of 2 elements at '/'
        sku_string = row.pop(0)  # string with "Article nr." + SKU
        sku_string = sku_string.split(':')
        only_sku = sku_string.pop().strip()
        # every string now contains only the SKU
        print(only_sku)
        all_sku.append(only_sku)

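Note how we use `all_sku.append()` to add the most recently read value to the list of all values. We also strip leading and trailing whitespace from each value.

Then you can write them out in a separate loop: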

with open(r'C:\Path\to\Output.csv', 'w', newline='') as csv_file:
    fieldname = ['SKU']
    writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldname)
    writer.writeheader()
    for x in all_sku:
        writer.writerow({'SKU': x})

Note that we now loop over `all_sku` and write each element of that list.


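Now, if you also want to write the dimensions to the new CSV file, you need to keep track of that value as well. Instead of the list of strings we built before, it's easier to build a list of dictionaries that we can later pass to `DictWriter.writerow()`. So we have: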

all_rows = []
with open(r'Path\to\sku.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader)  # skip the "SKU" header row, which has no '/'
    for row in csv_reader:
        row = row.pop()  # take the single field out of the row list (now a str)
        row = row.split('/')  # split the str into a list of 2 elements at '/'
        sku_string = row[0]
        dims_string = row[1].strip()
        sku_string = sku_string.split(':')
        only_sku = sku_string[1].strip()
        all_rows.append({'SKU': only_sku, 'Dimensions': dims_string})

Then write it out like this:

with open(r'C:\Path\to\Output.csv', 'w', newline='') as csv_file:
    fieldname = ['SKU', 'Dimensions']
    writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldname)
    writer.writeheader()
    for row_dict in all_rows:
        writer.writerow(row_dict)
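Putting the pieces of this answer together, here is a self-contained sketch that runs against an in-memory copy of two sample records instead of the real files (`io.StringIO` stands in for `Path\to\sku.csv` and `Output.csv`, an assumption made so the example runs anywhere). It skips the `SKU` header row and uses `writerows()` to write all collected dicts in one call:

```python
import csv
import io

# In-memory stand-in for sku.csv (two records from the question's sample data)
source = io.StringIO(
    'SKU\n'
    '"\n                Article nr. : 560821800 / D26 x H10 cm\n            "\n'
    '"\n                Article nr. : 560821900 / L17 x W17 x H14\n            "\n'
)

all_rows = []
csv_reader = csv.reader(source)
next(csv_reader)  # skip the "SKU" header row
for row in csv_reader:
    sku_part, dims_part = row[-1].split('/', 1)  # "...Article nr. : 560821800 ", " D26 x H10 cm..."
    only_sku = sku_part.split(':')[1].strip()
    all_rows.append({'SKU': only_sku, 'Dimensions': dims_part.strip()})

# In-memory stand-in for Output.csv
target = io.StringIO()
writer = csv.DictWriter(target, fieldnames=['SKU', 'Dimensions'])
writer.writeheader()
writer.writerows(all_rows)  # equivalent to calling writerow() for each dict
print(target.getvalue())
```

Using `split('/', 1)` splits only at the first slash, so a dimension string that happened to contain another `/` would stay intact.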
