在这里,我想搜索reference
列中paper_title
列的值,如果匹配/找到整个文本,则获取该引用行的_id
(而不是paper_title
行的_id
中匹配的_id
列,并将其保存在paper_title_in
列中
In[1]:
d ={
"_id":
[
"Y100",
"Y100",
"Y100",
"Y101",
"Y101",
"Y101",
"Y102",
"Y102",
"Y102"
]
,
"paper_title":
[
"translation using information on dialogue participants",
"translation using information on dialogue participants",
"translation using information on dialogue participants",
"#emotional tweets",
"#emotional tweets",
"#emotional tweets",
"#supportthecause: identifying motivations to participate in online health campaigns",
"#supportthecause: identifying motivations to participate in online health campaigns",
"#supportthecause: identifying motivations to participate in online health campaigns"
]
,
"reference":
[
"beattie, gs (2005, november) #supportthecause: identifying motivations to participate in online health campaigns may 31, 2017, from",
"burton, n (2012, june 5) depressive realism retrieved may 31, 2017, from",
"gotlib, i h, 27 hammen, c l (1992) #supportthecause: identifying motivations to participate in online health campaigns new york: wiley",
"paul ekman 1992 an argument for basic emotions cognition and emotion, 6(3):169200",
"saif m mohammad 2012a #tagspace: semantic embeddings from hashtags in mail and books to appear in decision support systems",
"robert plutchik 1985 on emotion: the chickenand-egg problem revisited motivation and emotion, 9(2):197200",
"alastair iain johnston, rawi abdelal, yoshiko herrera, and rose mcdermott, editors 2009 translation using information on dialogue participants cambridge university press",
"j richard landis and gary g koch 1977 the measurement of observer agreement for categorical data biometrics, 33(1):159174",
"tomas mikolov, kai chen, greg corrado, and jeffrey dean 2013 #emotional tweets arxiv:13013781"
]
}
import pandas as pd
df=pd.DataFrame(d)
df
输出:
预期成果:
And finally the final result dataframe with unique values as:
注意这里paper_title_in
列将所有_id
标题作为列表显示在reference
列中
我尝试了这个方法,但它返回了paper_presented_in
中的paper_title
列的_id
,该列被搜索到,而不是它匹配的reference
列。预期结果dataframe给出了更清晰的概念。看看那里
def return_id(paper_title,reference, _id):
if (paper_title is None) or (reference is None):
return None
if paper_title in reference:
return _id
else:
return None
df1['paper_present_in'] = df1.apply(lambda row: return_id(row['paper_title'], row['reference'], row['_id']), axis=1)
因此,要解决您的问题,您需要两个字典和一个列表来临时存储一些值
上面的代码将检查并更新数据框中的搜索值
相关问题 更多 >
编程相关推荐