Pandas: is searching really that hard?

import pandas as pd

d = {
    "_id": ["Y100", "Y100", "Y100", "Y101", "Y101", "Y101", "Y102", "Y102", "Y102"],
    "paper_title": [
        "translation using information on dialogue participants",
        "translation using information on dialogue participants",
        "translation using information on dialogue participants",
        "#emotional tweets",
        "#emotional tweets",
        "#emotional tweets",
        "#supportthecause: identifying motivations to participate in online health campaigns",
        "#supportthecause: identifying motivations to participate in online health campaigns",
        "#supportthecause: identifying motivations to participate in online health campaigns",
    ],
    "reference": [
        "beattie, gs (2005, november) #supportthecause: identifying motivations to participate in online health campaigns may 31, 2017, from",
        "burton, n (2012, june 5) depressive realism retrieved may 31, 2017, from",
        "gotlib, i h, 27 hammen, c l (1992) #supportthecause: identifying motivations to participate in online health campaigns new york: wiley",
        "paul ekman 1992 an argument for basic emotions cognition and emotion, 6(3):169200",
        "saif m mohammad 2012a #tagspace: semantic embeddings from hashtags in mail and books to appear in decision support systems",
        "robert plutchik 1985 on emotion: the chickenand-egg problem revisited motivation and emotion, 9(2):197200",
        "alastair iain johnston, rawi abdelal, yoshiko herrera, and rose mcdermott, editors 2009 translation using information on dialogue participants cambridge university press",
        "j richard landis and gary g koch 1977 the measurement of observer agreement for categorical data biometrics, 33(1):159174",
        "tomas mikolov, kai chen, greg corrado, and jeffrey dean 2013 #emotional tweets arxiv:13013781",
    ],
}

df = pd.DataFrame(d)
df

def return_id(paper_title, reference, _id):
    # Return the row's _id when its own title occurs in its own reference
    if (paper_title is None) or (reference is None):
        return None
    if paper_title in reference:
        return _id
    else:
        return None

df['paper_present_in'] = df.apply(lambda row: return_id(row['paper_title'], row['reference'], row['_id']), axis=1)

1 answer

User

#1 · Posted on 2024-04-25 13:42:49

So, to solve your problem, you need two dictionaries and a list to temporarily store some values:

# A list to store unique paper titles
unique_paper_title = []


# A dict to store mapping of unique paper to unique ids
mapping_dict_paper_to_id = dict()

# A dict mapping each row index to the id of the paper found in that row's reference
mapping_id_to_idx = dict()


# This gives us the list of unique paper titles
unique_paper_title = df["paper_title"].unique()



# Storing values in the dict mapping_dict_paper_to_id

for value in unique_paper_title:
    mapping_dict_paper_to_id[value] = df["_id"][df["paper_title"]==value].unique()[0]



# Storing values in the dict mapping_id_to_idx

for value in unique_paper_title:

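    # This gives us the indexes of the rows whose reference contains the paper_title.
    # regex=False makes this a literal substring match, so characters in a title are never read as a pattern.
    idx_list = df[df['reference'].str.contains(value, regex=False)].index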

    # Storing values in the dictionary
    for idx in idx_list:
        mapping_id_to_idx[idx] = mapping_dict_paper_to_id[value]


# This loop checks whether each row index has a matched reference id and updates the paper_present_in column accordingly

# Create the column with the default first, then assign through .loc to avoid chained assignment
df['paper_present_in'] = "None"
for i in df.index:
    if i in mapping_id_to_idx:
        df.loc[i, 'paper_present_in'] = mapping_id_to_idx[i]

The code above checks each reference against every paper title and writes the matching id into the data frame.
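The same idea can be written more compactly. The following is only a sketch on made-up toy data, not the code from the question: the titles, references, and the helper name `find_cited_id` are hypothetical. It also differs from the loops above in one detail: when a reference contains several known titles, the first title wins rather than the last, and unmatched rows get a real `None` rather than the string "None".

```python
import pandas as pd

# Toy data (hypothetical, much shorter than the frame in the question).
df = pd.DataFrame({
    "_id": ["Y100", "Y101", "Y102"],
    "paper_title": ["alpha study", "#beta tweets", "gamma: a survey"],
    "reference": [
        "smith 2005 gamma: a survey retrieved 2017",
        "jones 2012 unrelated work",
        "lee 2013 #beta tweets arxiv",
    ],
})

# One _id per unique title (the first occurrence of each title is kept).
title_to_id = df.drop_duplicates("paper_title").set_index("paper_title")["_id"]


def find_cited_id(reference):
    """Return the _id of the first known title contained in `reference`, else None."""
    if not isinstance(reference, str):
        return None
    for title, paper_id in title_to_id.items():
        if title in reference:  # plain substring test, no regex involved
            return paper_id
    return None


# Check every reference against every title, not only the title in the same row.
df["paper_present_in"] = df["reference"].map(find_cited_id)
print(df[["_id", "paper_present_in"]])
```

Because the lookup runs once per row over all unique titles, the cost grows with rows times titles, just as in the loops above; for a large frame that may be worth revisiting.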
