How do I average the values in two files?

Posted 2024-05-15 23:30:05


I have two files, each holding a matrix, that look like this:

File 1:

{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10].....'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}

File 2:

{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,
0.26, 0.11].....'key100',g,l,i,o,+: [0.2, 0.0, 0.23, 0.16, 0.21]}

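Both files have the same keys. I want to average the values across the two files, element by element, so that the resulting file looks like this:

Desired output file: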

{'key1',g,l,i,o,+: [0.0, 0.0, 0.94, 0.04, 0.01],'key2',g,l,i,o,+: [0.05, 0.15, 0.925,
0.26, 0.105].....'key100',g,l,i,o,+: [0.15, 0.05, 0.26, 0.175, 0.205]}

I have sketched a Python script for this, but I am very new to Python, so any quick ideas are welcome:

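import gzip
import numpy as np

# Open both gzip-compressed files in text mode
with gzip.open('/home/file1', 'rt') as inFile1, gzip.open('/home/file2', 'rt') as inFile2:
    # Skip the first line of each file
    next(inFile1)
    next(inFile2)
    # Walk both files in step, one line from each
    for line1, line2 in zip(inFile1, inFile2):
        data = np.array(line1.strip().split('\t')[6:], dtype=float)
        data2 = np.array(line2.strip().split('\t')[6:], dtype=float)
        # Element-wise mean; with plain lists, + would concatenate instead
        newdata = (data + data2) / 2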

2 answers

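You can use a regex to rewrite the string so that it becomes valid JSON. It is then easy to convert it to a dict, and from there you can work with the data (for example, compare the dicts) in plain Python: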

import re
import json

s = '''{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,
0.26, 0.10],'key100',g,l,i,o,+: [0.1, 0.1, 0.29, 0.19, 0.20]}'''

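# Replace each 'keyN',g,l,i,o,+ with "keyN" so the string is valid JSON
# (raw string, so \d and \+ reach the regex engine unchanged)
s2 = re.sub(r"'(key\d+)',g,l,i,o,\+", r'"\1"', s)
print(s2)
# Parse the JSON text into a dict mapping each key to its list of floats
d = json.loads(s2)
print(d)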
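Once both files parse to dicts, the averaging itself is a short comprehension. The following is only a minimal sketch, assuming both inputs contain the same keys and lists of equal length. `to_dict` and `average_dicts` are hypothetical helper names, and the two strings are the `key1` and `key2` entries from the question with the elided part left out.

```python
import json
import re

# Assumed key shape: 'keyN',g,l,i,o,+ (taken from the samples in the question)
KEY_PATTERN = re.compile(r"'(key\d+)',g,l,i,o,\+")


def to_dict(text):
    """Turn the custom format into a dict by making it valid JSON first."""
    return json.loads(KEY_PATTERN.sub(r'"\1"', text))


def average_dicts(d1, d2):
    """Element-wise mean of the lists stored under each key of d1."""
    return {key: [(a + b) / 2 for a, b in zip(d1[key], d2[key])] for key in d1}


s_file1 = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90, 0.26, 0.10]}"
s_file2 = "{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95, 0.26, 0.11]}"

averaged = average_dicts(to_dict(s_file1), to_dict(s_file2))
print(averaged)
```

Note that the resulting dict uses the shortened keys (`key1`, `key2`), so the `,g,l,i,o,+` suffix would have to be added back when writing the output file.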

The problem is the data format, as 沃丁 put it:

what is this format? It looks a bit like a Python dict, but the ,g,l,i,o,+ doesn't make sense for a dict.

I tried this with your data; the code below may give you some hints. I used the following files:

File1.txt:

{'key1',g,l,i,o,+: [0.0, 0.0, 0.92, 0.02, 0.01],'key2',g,l,i,o,+: [0.1, 0.2, 0.90,0.26, 0.10]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}

File2.txt:

{'key1',g,l,i,o,+: [0.0, 0.0, 0.96, 0.06, 0.01],'key2',g,l,i,o,+: [0.0, 0.1, 0.95,0.26, 0.11]}
{'key3',g,l,i,o,+: [0.0, 0.0, 0.98, 0.02, 0.01],'key4',g,l,i,o,+: [0.1, 0.2, 0.90,0.268, 0.10]}

Code:

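import ast
import re

# Group 1 is the full key ('keyN',g,l,i,o,+), group 2 is the bracketed list
pattern = r"('key\w+',g,l,i,o,\+):\s(\[.+?\])"

# Read File2.txt once and index its value lists by key
values2 = {}
with open('File2.txt', 'r') as ff:
    for line in ff:
        for find1 in re.finditer(pattern, line):
            # literal_eval parses the list text safely, unlike eval
            values2[find1.group(1)] = ast.literal_eval(find1.group(2))

with open('File1.txt', 'r') as f:
    for line in f:
        average = {}
        for find in re.finditer(pattern, line):
            key = find.group(1)
            if key in values2:
                values1 = ast.literal_eval(find.group(2))
                # Element-wise mean of the two lists for this key
                average[key] = [(a + b) / 2 for a, b in zip(values1, values2[key])]
        print(average)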

output:

{"'key1',g,l,i,o,+": [0.0, 0.0, 0.94, 0.04, 0.01], "'key2',g,l,i,o,+": [0.05, 0.15000000000000002, 0.925, 0.26, 0.10500000000000001]}
{"'key3',g,l,i,o,+": [0.0, 0.0, 0.98, 0.02, 0.01], "'key4',g,l,i,o,+": [0.1, 0.2, 0.9, 0.268, 0.1]}
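The printed dicts carry floating-point noise (`0.15000000000000002`) and are not in the layout of the input files. The following is a rough sketch of one way to round the values and render a line in the original layout. `format_averages` and its `ndigits` parameter are hypothetical, and the choice of three decimal places is an assumption rather than something the question specifies.

```python
def format_averages(average, ndigits=3):
    """Render a {key: [floats]} dict in the same layout as the input files."""
    parts = []
    for key, values in average.items():
        # Round away floating-point noise such as 0.15000000000000002
        rounded = [round(v, ndigits) for v in values]
        parts.append(f"{key}: {rounded}")
    return "{" + ",".join(parts) + "}"


# Sample data copied from the output shown above
average = {
    "'key1',g,l,i,o,+": [0.0, 0.0, 0.94, 0.04, 0.01],
    "'key2',g,l,i,o,+": [0.05, 0.15000000000000002, 0.925, 0.26, 0.10500000000000001],
}

line = format_averages(average)
print(line)
```

Each such line could then be written to the result file with `out.write(line + '\n')` in place of the `print(average)` call.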
