从两个不同的CSV文件中复制与第一个元素匹配的行,然后在Python中一个接一个地粘贴到第三个CSV文件中

2024-06-07 03:37:59 发布

您现在位置:Python中文网/ 问答频道 /正文

我有两个CSV文件:

dump1.csv

Primary     attribute1    attribute2    attribute3
rowA        3992372839    3778338494    9180339101
rowB        8291392010    3739203044    2840493019

dump2.csv

Primary     attribute1    attribute2    attribute3
rowA        8911849302    9018383910    8103293202
rowB        7310393021    8301940301    7209301030

现在我想创建第三个CSV文件dump1_dump2.csv

  1. With same headers of dump1.csv & dump2.csv
  2. Match Rows and copy these one by one in the third file.
  3. Rename the row names prepending dump1_ and dump2_ respectively to distinguish within these.
  4. After two rows picked up from each dump, third CSV file should put a row there which will give us the change percentage.

第三个CSVdump1_dump2.csv应该如下所示:

Primary     attribute1    attribute2    attribute3
dump1_rowA  3992372839    3778338494    9180339101
dump2_rowA  8911849302    9018383910    8103293202
change %    123.22        138.68        -11.73
dump1_rowB  8291392010    3739203044    2840493019
dump2_rowB  7310393021    8301940301    7209301030
change %    -11.83        122.02        153.80

我写了一个小片段:

import csv

f1 = open('dump2.csv', 'r')
f2 = open('dump2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

finallist = list(c2)

for hosts_row in c1:
    row = 1
    for final_row in finallist:
        results_row = hosts_row
        if hosts_row[0] == final_row[0]:
            # copy line from dump1.csv, paste it to dump1_dump2.csv and modify hosts_row[0] by appending dump1 & dump2 respectively
            results_row_1 = #new row containing content of dump1
            results_row_2 = #new row containing content of dump2
            break
        row = row + 2
    c3.writerow(results_row_1)
    c3.writerow(results_row_2)

f1.close()
f2.close()
f3.close()

我使用的是python3.6。你知道吗

任何帮助都将不胜感激!你知道吗


Tags: andofcsvresultsrowhostsprimaryattribute1
1条回答
网友
1楼 · 发布于 2024-06-07 03:37:59

欢迎使用Stackoverflow。你知道吗

回答你的问题,你可以使用熊猫更容易和更快的评估。你知道吗

import pandas as pd

df1 = pd.read_csv("dump1.csv")
df2 = pd.read_csv("dump2.csv")

df1["Primary"] = "dump1_"+df1["Primary"].astype(str)
df2["Primary"] = "dump2_"+df2["Primary"].astype(str)

df3 = pd.concat([df1,df2]).sort_index()


df4 = pd.DataFrame(columns=["Primary","attribute1","attribute2","attribute3"])
indx = df3.index.value_counts()
for i, j in indx.iteritems():
    if j == 2:
        tempdf = df3.loc[1].append({"attribute1":((df3.loc[i]["attribute1"].astype(int).diff()/df3.loc[i]["attribute1"].astype(int).values[0])*100).values[1],"attribute2":((df3.loc[i]["attribute2"].astype(int).diff()/df3.loc[i]["attribute2"].astype(int).values[0])*100).values[1], "attribute3":((df3.loc[i]["attribute3"].astype(int).diff()/df3.loc[i]["attribute3"].astype(int).values[0])*100).values[1], "Primary":"Change %"}, ignore_index=True)
        df4 = tempdf.append(df4)

df4.to_csv("dump1_dump2.csv", index=False)

输出:

Primary     attribute1    attribute2    attribute3
dump1_rowA  3992372839    3778338494    9180339101
dump2_rowA  8911849302    9018383910    8103293202
Change %       123.222       138.687      -11.7321
dump1_rowB  8291392010    3739203044    2840493019
dump2_rowB  7310393021    8301940301    7209301030
Change %      -11.8315       122.024       153.805

希望这能解决你的问题,如果不请让我知道。 谢谢

相关问题 更多 >