Preserving a unique index (or ID) in Pandas when appending

pd.DataFrame.from_dict({0: ['company1 is bankrupt'], 1: ['company2 and company3 are going to merge. company1 is good'], 2: ['company3 is going to create a new product. CEO of company3 says it will be a good product']}, orient='index', columns=['text'])

1 answer

User

#1 · Posted 2024-05-19 02:12:14

Why not simply create an ID for each news item, as you suggest? It doesn't need to live in the index. Just do:

df['news_id'] = range(len(df))

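This attaches an ID to each news item (it may coincide with the index). You can now refer to news_id in the appended rows without worrying about the index.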
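When rows are appended later, the new IDs have to continue from the ones already assigned. A minimal sketch of one way to do that, assuming the existing IDs are the sequential integers created above; `new_rows` and `start` are hypothetical names used only for illustration:

```python
import pandas as pd

# Existing news items with sequential IDs, as in the snippet above.
df = pd.DataFrame({'text': ['company1 is bankrupt',
                            'company2 and company3 are going to merge']})
df['news_id'] = range(len(df))

# Hypothetical batch of new rows to append.
new_rows = pd.DataFrame({'text': ['company3 is going to create a new product']})

# Continue numbering from the largest existing ID so nothing collides.
start = int(df['news_id'].max()) + 1 if len(df) else 0
new_rows['news_id'] = range(start, start + len(new_rows))

# ignore_index=True rebuilds the positional index; news_id is unaffected.
df = pd.concat([df, new_rows], ignore_index=True)
```

Because news_id is an ordinary column, resetting or rebuilding the index during the append does not change it.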

Edit: if you update the database regularly, you can use a more elaborate ID generator. For example:

import random

characters = 'abcdefghijklmnopqrstuvwxyz123456789'
used_ids = []  # IDs that have already been handed out

def pick_id():
    global used_ids  # refer to the module-level list of used IDs
    # Generate a random ID; 16 characters is an arbitrary length.
    ID = ''.join(random.choice(characters) for _ in range(16))
    # Regenerate until the ID is not already in used_ids.
    while ID in used_ids:
        ID = ''.join(random.choice(characters) for _ in range(16))
    used_ids.append(ID)  # record the new ID as used
    return ID

Now you just do the following:

df['news_id'] = [pick_id() for i in range(len(df))]

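This requires you to keep track of the news_id values, but that's not so bad: whenever you run this program, you can read them back from the new dataframe like this: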

used_ids = new_dataframe['news_id'].unique().tolist()

This gives you all the unique IDs that have already been taken.
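If keeping the list of used IDs around is inconvenient, one alternative (not part of the answer above, just a sketch) is Python's standard uuid module. A random UUID is so unlikely to collide that no bookkeeping is normally needed; `pick_uuid` is a hypothetical helper name:

```python
import uuid

def pick_uuid():
    """Return a random ID as a 32-character hexadecimal string."""
    return uuid.uuid4().hex

# Generate a batch of IDs, e.g. one per row of a dataframe.
ids = [pick_uuid() for _ in range(1000)]
```

It would be used the same way as pick_id, for instance df['news_id'] = [pick_uuid() for _ in range(len(df))]. The trade-off is longer, less readable IDs.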
