How to get all the information for the same word

User

Answer 1 · edited 2024-04-26 09:30:24

Try this: while parsing the text file, keep track of the ids you have already seen, and write a line to a new text file only when its id has not been seen before.

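seen_ids = set()  # ids already written to the new file

with open("file.txt", "r") as src, open("file_new.txt", "w") as dst:
    for line in src:
        fields = line.split(",")
        if len(fields) < 3:
            continue  # skip blank or malformed lines instead of raising IndexError
        # the third value of the line is the id
        record_id = fields[2]

        # if the id is new, record it and copy its line to the new file
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            dst.write(line)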
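The same keep-the-first-line-per-id idea can be wrapped in a function that accepts any iterable of lines, so it can be tried without touching files. This is a sketch: the function name dedupe_by_field and its field_index parameter are hypothetical, and the sample lines use the name:...,id:... format assumed in answer 2.

```python
from typing import Iterable, Iterator


def dedupe_by_field(lines: Iterable[str], field_index: int = 2) -> Iterator[str]:
    """Yield each line whose comma-separated field at field_index was not seen before."""
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) <= field_index:
            continue  # skip blank or malformed lines
        key = fields[field_index]
        if key not in seen:
            seen.add(key)
            yield line


sample = [
    "name:z,surnames:zz,id:zzz,country:zzzz\n",
    "name:y,surnames:yy,id:yyy,country:yyyy\n",
    "name:x,surnames:xx,id:xxx,country:xxxx\n",
    "name:z,surnames:zz,id:zzz,country:zzzz\n",
]
for line in dedupe_by_field(sample):
    print(line, end="")
```

Because it is a generator, it can also be fed an open file directly, e.g. dst.writelines(dedupe_by_field(src)).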

User

Answer 2 · edited 2024-04-26 09:30:24

First, I'm assuming your data looks like this:

name:z,surnames:zz,id:zzz,country:zzzz
name:y,surnames:yy,id:yyy,country:yyyy
name:x,surnames:xx,id:xxx,country:xxxx
name:z,surnames:zz,id:zzz,country:zzzz
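Since every field in this assumed format carries its own key: prefix, a line can be turned into a dict so the id is looked up by name rather than by position. A minimal sketch, assuming values never contain commas; parse_record is a hypothetical helper, and it splits each field on its first colon only.

```python
def parse_record(line: str) -> dict:
    """Turn 'key:value,key:value,...' into {'key': 'value', ...}."""
    record = {}
    for field in line.strip().split(","):
        key, sep, value = field.partition(":")  # split on the first colon only
        if sep:  # ignore fields that have no "key:" prefix
            record[key] = value
    return record


print(parse_record("name:z,surnames:zz,id:zzz,country:zzzz"))
# {'name': 'z', 'surnames': 'zz', 'id': 'zzz', 'country': 'zzzz'}
```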

I suggest using the pandas package and its read_csv function. It gives you a DataFrame, which makes working with tabular data straightforward.

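import pandas as pd

# your_file_here / new_file_name are placeholders for your own paths; this assumes the file has no header row
df = pd.read_csv(your_file_here, header=None, names=['name', 'surnames', 'id', 'country'])
temp = df[df['name'] == 'name:z'].iloc[[0]]  # save the first row with name:z (as a one-row DataFrame)
df_new = df[df['name'] != 'name:z']  # drop all rows with name:z
df_new = pd.concat([df_new, temp])  # append the first row back (DataFrame.append was removed in pandas 2.0)
df_new.to_csv(new_file_name, header=False, index=False)  # if you want to save it in the input format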
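The code above only handles the hardcoded value name:z and puts the kept row at the end. A different technique, pandas' built-in DataFrame.drop_duplicates, removes every repeated id in one call and keeps the original row order. A sketch that reads an in-memory sample (io.StringIO standing in for a real file) in the format assumed above:

```python
import io

import pandas as pd

# In-memory sample standing in for the real file (no header row).
data = io.StringIO(
    "name:z,surnames:zz,id:zzz,country:zzzz\n"
    "name:y,surnames:yy,id:yyy,country:yyyy\n"
    "name:x,surnames:xx,id:xxx,country:xxxx\n"
    "name:z,surnames:zz,id:zzz,country:zzzz\n"
)
df = pd.read_csv(data, header=None, names=["name", "surnames", "id", "country"])

# Keep the first row for every id, whatever its name is.
df_unique = df.drop_duplicates(subset="id", keep="first")
print(df_unique)

# Write back without index or header so the output matches the input format.
out = io.StringIO()
df_unique.to_csv(out, header=False, index=False)
print(out.getvalue(), end="")
```

Passing keep="last" instead keeps the last row for each id.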

User

Answer 3 · edited 2024-04-26 09:30:24

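Passing the file path as a command-line argument, you can extract the id from each line and store the line in a dict keyed by that id: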

import re
import sys

ref = dict()  # id -> line
with open(sys.argv[1], 'r') as f:
    for line in f:
        # capture the word characters between "id:" and the next comma
        m = re.search(r"id:(\w*),", line)
        if m is not None:
            # a later line with the same id overwrites the earlier one
            ref[m.group(1)] = line.strip()

for line in ref.values():
    print(line)
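In the script above, the assignment ref[m.group(1)] = line.strip() overwrites any earlier entry, so it keeps the last line seen for each id (on Python 3.7+ the dict still lists ids in the order they first appeared). That makes no difference when duplicate lines are identical, but if lines sharing an id can differ and you want the first one, dict.setdefault keeps the first instead. A sketch with a hypothetical lines_by_id helper and made-up sample lines:

```python
import re

ID_PATTERN = re.compile(r"id:(\w*),")


def lines_by_id(lines, keep="first"):
    """Map each id to one line: the first line seen with that id, or the last."""
    ref = {}
    for line in lines:
        m = ID_PATTERN.search(line)
        if m is None:
            continue  # no "id:...," field on this line
        if keep == "first":
            ref.setdefault(m.group(1), line.strip())  # only stores the first line
        else:
            ref[m.group(1)] = line.strip()  # later lines overwrite earlier ones
    return ref


sample = [
    "name:a,surnames:aa,id:111,country:x",
    "name:b,surnames:bb,id:222,country:y",
    "name:c,surnames:cc,id:111,country:z",
]
print(lines_by_id(sample, keep="first")["111"])  # name:a,surnames:aa,id:111,country:x
print(lines_by_id(sample, keep="last")["111"])   # name:c,surnames:cc,id:111,country:z
```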
