在Python中创建两个CSV文件的笛卡尔结果(交叉连接)
我尝试通过交叉连接两个现有的csv文件来创建一个新的csv文件。
csv文件 #1:
hour Elevation Azimuth x y z sunx suny sunz
06:29:00 -0.833 67.72 0.379094033 0.925243946 -0.014538068 0.379094033 0.925243946 -0.014538068
07:00:00 6.28 68.75 0.360264063 0.92641472 0.109387255 0.360264063 0.92641472 0.109387255
csv文件 #2:
ID SURFACES A1X A1Y A1Z A2X A2Y A2Z B1X B1Y B1Z B2X B2Y B2Z AX AY AZ BX BY BZ ABX ABY ABZ planex planey planez
1 GROUND 800085.3323 961271.977 -3.07E-18 800080.8795 961246.1978 -3.07E-18 800097.1572 961269.9344 -3.07E-18 800085.3323 961271.977 -3.07E-18 4.4528 25.7792 0.00E+00 11.8249 -2.0426 0.00E+00 0 0 -313.9317514 0 0 -1
2 ROOF 800019.3994 961242.7732 12 800021.442 961254.5981 12 800090.3488 961230.5181 12 800019.3994 961242.7732 12 -2.0426 -11.8249 0.00E+00 70.9494 -12.2551 0.00E+00 0 0 864.0018273 0
我想要这两个文件的笛卡尔积(也就是每个小时和所有表面的组合,就像在SQL中进行交叉连接一样)。
这是我想要的效果示意图:
http://dotnetslackers.com/images/articleimages/sqljoins5.jpg
3 个回答
0
根据用户的评论更新后的版本:
f1 = open("file1")
f2 = open("file2")
f3 = open("result", "wt")
for a in f1:
for b in f2:
f3.write(a.rstrip('\n'))
f3.write(' ')
f3.write(b)
8
你可以使用pandas来实现,就像在SQL中对两个表进行join
操作一样。它们之间的区别是:
使用你提供的文件example1.csv
和example2.csv
:
import pandas as pd
df_1 = pd.read_csv('example1.csv', delim_whitespace=True)
df_2 = pd.read_csv('example2.csv', delim_whitespace=True)
df_1['key'] = 1
df_2['key'] = 1
product = pd.merge(df_1, df_2, on='key')
product[['hour', 'SURFACES']]
结果是:
hour SURFACES
0 06:29:00 GROUND
1 06:29:00 ROOF
2 07:00:00 GROUND
3 07:00:00 ROOF
2
我不知道有没有现成的解决方案,所以我自己做了这个:
import csv
from itertools import product
def main():
with open('file1.csv', 'rb') as f1, open('file2.csv', 'rb') as f2:
reader1 = csv.reader(f1, dialect=csv.excel_tab)
reader2 = csv.reader(f2, dialect=csv.excel_tab)
# Step 1: Read and write the headers separately.
header1, header2 = next(reader1), next(reader2)
with open('output.csv', 'wb') as out:
writer = csv.writer(out, dialect=csv.excel_tab)
writer.writerow(header1 + header2)
# Step 2: Write the product of the rest of the rows.
writer.writerows(
row1 + row2 for row1, row2 in product(reader1, reader2))
main()
涉及到的文件有:
file1.csv
hour Elevation Azimuth
06:29:00 -0.833 67.72
07:00:00 6.28 68.75
file2.csv
ID SURFACES
1 GROUND
2 ROOF
你会得到以下的 output.csv
文件:
hour Elevation Azimuth ID SURFACES
06:29:00 -0.833 67.72 1 GROUND
06:29:00 -0.833 67.72 2 ROOF
07:00:00 6.28 68.75 1 GROUND
07:00:00 6.28 68.75 2 ROOF