Python program to replace duplicate strings with a number sequence without using a DataFrame

Posted 2024-03-29 00:31:07


Replace duplicate strings in a CSV file with a number sequence, without using a DataFrame

I have a CSV file with four columns. I want to replace the strings in every column with a sequence of numbers; if a string is a duplicate, it should get the number that was previously assigned to it. For that I have written the code below, which returns three dicts (dict1, dict2, dict3). Now I want to write those dictionary values to a file, with the layout shown below.

import csv

tempFile = "input1.csv"  # path to the input CSV; the original defines tempFile elsewhere

with open(tempFile, 'r', encoding="utf8") as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile, delimiter=',')
    next(csvreader, None)       # skip the header row
    firstRow = next(csvreader)  # the first data row starts every column at 1

    NameCount = 1
    AddressCount = 1
    EmailCount = 1
    # column value -> number assigned to it
    input_dict = {firstRow[1]: NameCount}
    input_dict2 = {firstRow[2]: AddressCount}
    input_dict3 = {firstRow[3]: EmailCount}

    # job_Id -> number of that row's value in each column
    dict1 = {firstRow[0]: NameCount}
    dict2 = {firstRow[0]: AddressCount}
    dict3 = {firstRow[0]: EmailCount}

    for row in csvreader:
        # Name: a new value gets the next number, a duplicate reuses its number
        value = input_dict.get(row[1])
        if value is None:
            NameCount = NameCount + 1
            input_dict[row[1]] = NameCount
            dict1[row[0]] = NameCount
        else:
            dict1[row[0]] = value

        # Address
        value1 = input_dict2.get(row[2])
        if value1 is None:
            AddressCount = AddressCount + 1
            input_dict2[row[2]] = AddressCount
            dict2[row[0]] = AddressCount
        else:
            dict2[row[0]] = value1

        # Email
        value2 = input_dict3.get(row[3])
        if value2 is None:
            EmailCount = EmailCount + 1
            input_dict3[row[3]] = EmailCount
            dict3[row[0]] = EmailCount
        else:
            dict3[row[0]] = value2

    print('dict1-', dict1)
    print('dict2-', dict2)
    print('dict3-', dict3)

This is the input data in the CSV file:

job_Id  Name          Address     Email
1       snehil singh  marathalli  ss@gmail.com
2       salman        marathalli  ss@gmail.com
3       Amir          HSR         ar@gmail.com
4       Rakhesh       HSR         rakesh@gmail.com
5       Ram           marathalli  r@gmail.com
6       Shyam         BTM         ss@gmail.com
7       salman        HSR         ss@gmail.com
8       Amir          BTM         ar@gmail.com
9       snehil singh  Majestic    sne@gmail.com

The required output, which I am unable to get, is:

job_Id  Name  Address  Email
1       1     1        1
2       2     1        1
3       3     2        2
4       4     2        3
5       5     1        4
6       6     3        1
7       2     2        1
8       3     3        2
9       1     4        5

Please help.

Hi guys, I tried it this way and it works:

import json

count = 1
iter_obj1 = iter(dict1.values())
iter_obj2 = iter(dict2.values())
iter_obj3 = iter(dict3.values())
while True:
    try:
        element1 = next(iter_obj1)
        element2 = next(iter_obj2)
        element3 = next(iter_obj3)
        s = count, element1, element2, element3
        print(s)

        with open("snehil.csv", 'w') as f:
            f.write('\n')
            f.write(json.dumps(s) + '\n')
        count = count + 1
    except StopIteration:
        break

The output is:

(1, 1, 1, 1)
(2, 2, 1, 1)
(3, 3, 2, 2)
(4, 4, 2, 3)
(5, 5, 1, 4)
(6, 6, 3, 1)
(7, 2, 2, 1)
(8, 3, 3, 2)
(9, 1, 4, 5)

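This is the correct output, but I can't get it into the CSV file: the file only shows the last row, (9, 1, 4, 5), as if all the data were being written to a single line. For writing, I used: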

with open("snehil.csv", 'w') as f:
    # f.write('\n')
    f.write(json.dumps(s) + '\n')
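In the loop version earlier, the file is opened with mode 'w' inside the `while` loop, and 'w' truncates the file every time it is opened, so only the last tuple survives. A minimal sketch, assuming the tuples are first collected in a list (the helper `write_rows` and the sample `rows` are hypothetical), opens the file once and lets `csv.writer` put each value in its own cell:

```python
import csv

def write_rows(path, header, rows):
    """Write a header and all rows to `path`, opening the file only once."""
    # newline='' is what the csv module docs recommend for files used with csv.writer
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)  # one CSV line per tuple, one cell per value

rows = [(1, 1, 1, 1), (2, 2, 1, 1), (3, 3, 2, 2)]
write_rows("snehil.csv", ["job_Id", "Name", "Address", "Email"], rows)
```

In the loop above, this would mean appending each `s` to a list and calling the writer after the loop ends, instead of reopening the file on every iteration.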

I also tried writing it to the CSV file with a DataFrame, but got this error: AttributeError: 'tuple' object has no attribute 'values'. For the DataFrame, I wrote:

import pandas as pd

df = pd.DataFrame.from_dict(s, orient='index')
print(df)

Please help me get this into the CSV file with every value in its own cell. Thanks.

A program that reads the CSV file, replaces the strings with numbers and writes the result to a CSV file:

import csv
import os
from io import StringIO
# tempFile="input1.csv"

with open("input1.csv", 'r') as csvfile:
    # creating a csv reader object
    reader = csv.reader(csvfile, delimiter=',')
    next(reader, None)

    data = {}
    for row in reader:
        for header, value in row.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]

    for key in data.keys():
        values = data[key]

        things = list(sorted(set(values), key=values.index))

        for i, x in enumerate(data[key]):
            data[key][i] = things.index(x) + 1

    with open("snehil.csv", "w") as outfile:
        writer = csv.writer(outfile)
        # Write headers
        writer.writerow(data.keys())
        # Make one row equal to one value from each list
        rows = zip(*data.values())
        # Write rows
        writer.writerows(rows)

When I run this program, I get an error:

for header, value in row.items():
AttributeError: 'list' object has no attribute 'items'

Please help me; I don't understand why I'm getting this error.
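The error comes from `csv.reader`, which yields each row as a list of strings, and lists have no `.items()` method. `csv.DictReader` instead yields one dict per row, keyed by the header line. A small comparison on an illustrative two-row in-memory sample:

```python
import csv
from io import StringIO

sample = "job_Id,Name\n1,snehil singh\n2,salman\n"

# csv.reader: every row, including the header, is a plain list,
# so calling row.items() on it raises AttributeError
list_rows = list(csv.reader(StringIO(sample)))
print(list_rows)

# csv.DictReader: the first line becomes the keys and each row is a dict,
# so iterating over row.items() works
dict_rows = list(csv.DictReader(StringIO(sample)))
for row in dict_rows:
    for header, value in row.items():
        print(header, value)
```

If you switch to `DictReader`, note that it already consumes the header line, so the extra `next(reader, None)` call in your code would then skip the first data row.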


1 answer
Anonymous user
Reply #1 · Posted 2024-03-29 00:31:07

You can read your CSV as dictionaries, build a list of values for each key (column), and then use the set of unique values as an index.

First we read the data:

import csv
from io import StringIO

reader = csv.DictReader(StringIO("""
1,snehil singh,marathalli,ss@gmail.com
2,salman,marathalli,ss@gmail.com
3,Amir,HSR,ar@gmail.com
4,Rakhesh,HSR,rakesh@gmail.com
5,Ram,marathalli,r@gmail.com
6,Shyam,BTM,ss@gmail.com
7,salman,HSR,ss@gmail.com
8,Amir,BTM,ar@gmail.com
9,snehil singh,Majestic,sne@gmail.com"""),
    delimiter=",", fieldnames=["job_Id", "Name", "Address", "Email"])

Then we restructure the data into a dict that maps each key to a list of values, {key_1: [], key_2: []}:

data = {}
for row in reader:
    for header, value in row.items():
        try:
            data[header].append(value)
        except KeyError:
            data[header] = [value]
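The same restructuring can be written without the `try`/`except KeyError` by using `collections.defaultdict(list)`, which creates an empty list the first time a key is seen. A sketch on two hypothetical rows shaped like the ones `DictReader` produces:

```python
from collections import defaultdict

rows = [
    {"job_Id": "1", "Name": "snehil singh"},
    {"job_Id": "2", "Name": "salman"},
]

data = defaultdict(list)  # a missing key starts as an empty list
for row in rows:
    for header, value in row.items():
        data[header].append(value)

print(dict(data))
```

Keys keep their insertion order, so `data.keys()` still gives the columns in file order when the headers are written later.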

Next, assign a unique identifier to each value in every list.

# Loop through all keys
for key in data.keys():
    values = data[key]

    # Unique values in order of first appearance, used as the index
    things = list(sorted(set(values), key=values.index))

    # Loop through each value in columns
    for i, x in enumerate(data[key]):

        # Replace old value with unique index
        data[key][i] = things.index(x) + 1
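`things.index(x)` scans the list on every lookup, so this step grows quadratically with the column length. For larger files, a dict that remembers the number already given to each value performs the same first-appearance numbering in a single pass. A sketch (the helper name `number_column` is hypothetical), checked against the Address column from the question:

```python
def number_column(values):
    """Replace each value with the 1-based position of its first appearance."""
    seen = {}      # value -> number assigned to it
    numbered = []
    for v in values:
        # setdefault returns the existing number, or stores len(seen) + 1 for a new value
        numbered.append(seen.setdefault(v, len(seen) + 1))
    return numbered

address = ["marathalli", "marathalli", "HSR", "HSR", "marathalli",
           "BTM", "HSR", "BTM", "Majestic"]
print(number_column(address))  # [1, 1, 2, 2, 1, 3, 2, 3, 4]
```

In the loop above, the inner `enumerate` loop would become `data[key] = number_column(data[key])`.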

How to save data to a new CSV file:

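csv.writer.writerows() takes an iterable of rows and writes each item as one row, but data holds one list per column, so we need to restructure it so that each row contains one value from each list. This can be done with zip():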

with open("test.csv", "w", newline="") as outfile:  # newline="" avoids blank lines between rows on Windows
    writer = csv.writer(outfile)
    # Write headers
    writer.writerow(data.keys())
    # Make one row equal to one value from each list
    rows = zip(*data.values())
    # Write rows
    writer.writerows(rows)
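To see what `zip(*data.values())` does: the `*` passes each column list to `zip()` as a separate argument, and `zip()` yields tuples holding one value from each list, i.e. one tuple per row. An illustration using the first three numbered values of three columns from the question:

```python
data = {"job_Id": [1, 2, 3], "Name": [1, 2, 3], "Address": [1, 1, 2]}

# *data.values() unpacks the three column lists as three arguments to zip()
rows = list(zip(*data.values()))
print(rows)  # [(1, 1, 1), (2, 2, 1), (3, 3, 2)]
```

`zip()` stops at the shortest list, which is fine here because every column has the same number of values.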
