Merging records by ID

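User

Reply 1 · edited 2024-04-19 08:18:06

Here is a Python script.

It reads the lines in order and overwrites a section of an existing record only when the section in the later line is not empty.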

def merge_lines():
    with open('data.txt', 'r') as file, open('output.txt', 'w') as file_out:
        output_dict = {}
        for line in file:
            # Drop the line ending so the last field is not polluted by '\n'
            split_line = line.rstrip('\n').split('|')
            # Remove the empty string that precedes the leading '|'
            del split_line[0]
            if not split_line:
                continue  # skip blank lines
            key = split_line[0]
            # If we haven't seen this record before, add it to the dictionary
            if key not in output_dict:
                output_dict[key] = split_line
            else:
                merged = output_dict[key]
                # Pad so that a longer later line cannot raise IndexError
                merged.extend([''] * (len(split_line) - len(merged)))
                # Update the sections, provided they are not the empty string ('')
                for index, val in enumerate(split_line):
                    if val != '':
                        merged[index] = val

        # Join sections back together and write one record per line
        for line_values in output_dict.values():
            file_out.write('|' + '|'.join(line_values) + '\n')


if __name__ == "__main__":
    merge_lines()
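
To see the overwrite rule in isolation, here is a minimal sketch that applies the same idea to a list of strings instead of files. The function name `merge_records` and the sample rows are assumptions made for illustration; they are not taken from the post.

```python
def merge_records(lines):
    """Merge '|'-delimited rows that share the same key (hypothetical helper)."""
    merged = {}  # key -> list of fields; dicts keep insertion order in Python 3.7+
    for line in lines:
        # Drop the empty string that precedes the leading '|'
        fields = line.rstrip('\n').split('|')[1:]
        if not fields:
            continue  # skip blank lines
        row = merged.setdefault(fields[0], [])
        # Pad the stored row so a longer line cannot raise IndexError
        row.extend([''] * (len(fields) - len(row)))
        for index, value in enumerate(fields):
            if value != '':
                row[index] = value  # later non-empty values win
    return ['|' + '|'.join(row) for row in merged.values()]


# Assumed sample input, not from the original post
sample = [
    '|1|a|b|c|||||||',
    '|1||||zz|yy|dd||||',
    '|1|||||||aaa|bbb|ccc|',
    '|2|fd|ef|gf|||||||',
]
result = merge_records(sample)
```

With this input, the three rows for key `1` collapse into a single row and the row for key `2` passes through unchanged.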

User

Reply 2 · edited 2024-04-19 08:18:06

my %merged_rows;
while (<>) {
   chomp;
   my @fields = split(/\|/, $_, -1);
   my $id = $fields[1];
   my $merged_row = $merged_rows{$id} ||= [];

   $merged_row->[$_] = $fields[$_]
      for grep { length($fields[$_]) || $_ > $#$merged_row } 0..$#fields;
}

for my $id ( sort { $a <=> $b } keys(%merged_rows) ) {
   print(join('|', @{ $merged_rows{$id} }), "\n");
}

If the keys are all small numbers, you could use an array instead of a hash to hold the merged rows, which would be slightly faster.
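
The array idea carries over to other languages. As a rough illustration in Python, a list indexed by the key can stand in for the dictionary. The function name `merge_by_small_int_key` and the sample rows are assumptions, and the sketch presumes every key is a small non-negative integer.

```python
def merge_by_small_int_key(lines):
    """Merge rows keyed by small non-negative integers, using a list as storage."""
    slots = []  # slots[key] holds the merged fields for that key, or None
    for line in lines:
        fields = line.rstrip('\n').split('|')
        key = int(fields[1])  # fields[0] is the empty string before the leading '|'
        # Grow the list until the key is a valid index
        if key >= len(slots):
            slots.extend([None] * (key + 1 - len(slots)))
        if slots[key] is None:
            slots[key] = fields
            continue
        row = slots[key]
        # Pad the stored row so a longer line cannot raise IndexError
        row.extend([''] * (len(fields) - len(row)))
        for index, value in enumerate(fields):
            if value:
                row[index] = value
    # Walking the list in index order yields rows sorted by key
    return ['|'.join(row) for row in slots if row is not None]


# Assumed sample input, not from the original post
rows = merge_by_small_int_key(['|2|x||', '|1|a||', '|1||b|'])
```

Because the list is walked in index order, no separate sort is needed. Whether this is actually faster than a dictionary in Python is not something the post establishes, so measure before relying on it.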

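Without a limit, split removes empty trailing fields, so |1|a|b|c||||||| would be treated the same as |1|a|b|c. The -1 limit keeps them.
$z = $x ||= $y; is the same as $x ||= $y; $z = $x;
$x ||= $y; is basically the same as $x = $x || $y; it assigns the RHS to the LHS if the LHS is false. In this context, it performs $merged_rows{$id} = []; if this is the first time we encounter $id.
[] creates an empty array and returns a reference to it.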
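
For readers more at home in Python, the notes above map onto familiar behavior. This is only a comparison sketch with made-up values: Python's `str.split` with an explicit separator keeps trailing empty fields without any extra argument, and `dict.setdefault` plays a role similar to `||=` here, with the difference that it tests for a missing key rather than a false value.

```python
line = '|1|a|b|c|||||||'

# str.split with an explicit separator keeps trailing empty strings,
# so no counterpart of Perl's -1 limit is needed.
fields = line.split('|')

merged_rows = {}
# setdefault stores [] only when the key is missing, then returns the stored
# list, so `row` and merged_rows['1'] refer to the same object.
row = merged_rows.setdefault(fields[1], [])
row.append('x')
same_object = row is merged_rows['1']
```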

User

Reply 3 · edited 2024-04-19 08:18:06

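def update_col(l1, l2):
    # Copy every non-empty field of l2 over the matching field of l1
    for i, v in enumerate(l2):
        if not v:
            continue
        l1[i] = v

out = []
with open('rec.txt') as f:
    for l in f:
        l = l.strip().split('|')
        if len(l) < 2:
            continue  # skip blank lines
        for r in out:
            # Field 0 is the empty string before the leading '|'; field 1 is the key
            if r[1] == l[1]:
                update_col(r, l)
                break
        else:
            # No existing record has this key, so start a new one
            out.append(l)

for l in out:
    print('|'.join(l))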

Output
|1|a|b|c|zz|yy|dd|aaa|bbb|ccc|
|2|fd|ef|gf|||||||
