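I'm new to Python and a bit stuck. I have a dataframe of journal articles and their subject headings. The headings come back from an API as strings, in which subheadings modify the descriptor.
For example, one of the subject headings returned by the API is: "Cardiovascular Diseases/*drug therapy/epidemiology"
This essentially describes an article about the drug therapy of cardiovascular diseases and the epidemiology of cardiovascular diseases. In this case, I want to create one column in the dataframe for each of these items, and I want each column to contain the initial term + the modifier. Some articles have a single term with no modifiers, and some have one term + many subheadings.
Current dataframe: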
+-----------------+------+----------------------------------------------------+
| Article Title   | ID   | Subject                                            |
+-----------------+------+----------------------------------------------------+
| an article      | 123  | Cardiovascular Diseases/*drug therapy/epidemiology |
| another article | 324  | Adult                                              |
| One more        | 234  | United Kingdom/epidemiology                        |
+-----------------+------+----------------------------------------------------+

What I want:

+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+
| Article Title   | ID   | Subject                                            | Modifier 1                           | Modifier 2                             | Modifier 3   |
+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+
| an article      | 123  | Cardiovascular Diseases/*drug therapy/epidemiology | Cardiovascular diseases/drug therapy | cardiovascular diseases/epidemiology   |              |
| another article | 324  | Adult                                              | Adult                                |                                        |              |
| One more        | 234  | United Kingdom/epidemiology                        | United Kingdom/epidemiology          |                                        |              |
+-----------------+------+----------------------------------------------------+--------------------------------------+----------------------------------------+--------------+
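My first attempt only tried to separate the initial heading from the modifiers (below). I'm having a hard time wrapping my head around this, because it doesn't work for multiple subheadings: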
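descriptor, qualifier = [], []
for term in df['Subject']:
    # partition() splits only at the first '/', so tail keeps all remaining subheadings joined
    head, sep, tail = term.partition('/')
    descriptor.append(head)
    qualifier.append(tail)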
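You can use the str.split() method with some starred unpacking to split a heading into variables. Splitting on the / separator and unpacking the result as title, *classifiers puts the first element in the title variable and all remaining elements in the classifiers list variable.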
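As a minimal sketch of how str.split() and starred unpacking could be applied to the dataframe from the question: the helper name expand_subject, the result variable, and the choice to strip the leading * from a subheading (as the desired output appears to do) are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def expand_subject(subject):
    """Return one 'descriptor/qualifier' string per qualifier.

    A heading without qualifiers comes back unchanged as a single item.
    (Hypothetical helper, named here only for illustration.)
    """
    # Starred unpacking: the first piece is the descriptor, the rest are qualifiers.
    title, *classifiers = subject.split("/")
    if not classifiers:
        return [title]
    # Assumption: the leading '*' marker is dropped, as in the desired output.
    return [f"{title}/{c.lstrip('*')}" for c in classifiers]


df = pd.DataFrame({
    "Article Title": ["an article", "another article", "One more"],
    "ID": [123, 324, 234],
    "Subject": [
        "Cardiovascular Diseases/*drug therapy/epidemiology",
        "Adult",
        "United Kingdom/epidemiology",
    ],
})

# One list of modifiers per article; shorter lists are padded with missing values.
modifiers = pd.DataFrame(df["Subject"].map(expand_subject).tolist(), index=df.index)
modifiers.columns = [f"Modifier {i + 1}" for i in range(modifiers.shape[1])]

# Replace the padding with empty strings and attach the new columns.
result = df.join(modifiers.fillna(""))
print(result)
```

With this sample data only two modifier columns are produced, because the number of columns follows the longest list of subheadings; a heading with more subheadings would add further Modifier columns automatically.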