How can I efficiently add a new column to a DataFrame from another DataFrame, based on the values of a column in each?

Posted 2024-06-17 15:22:07


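Thanks for stopping by. I have two DataFrames: one called news_test, which stores 3 million news articles, and another called company_fuzzy_name, which stores 280,000 company names (with their fuzzy names). Here are some examples: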

news_test
+=======+===========================================================================+
| index |                                  content                                  |
+=======+===========================================================================+
|   0   | Apple and Google are two of the strongest companies in the world.         |
+-------+---------------------------------------------------------------------------+
|   1   | Working in Facebook and Google is my dream, however, it is still a dream. |
+-------+---------------------------------------------------------------------------+

company_fuzzy_name
+=======+========+==============+=======================+
| index |   ID   | Company_Name | Company_FuzzyName_new |
+=======+========+==============+=======================+
|   0   | 123456 | Apple Inc.   | Apple Inc.|Apple      |
+-------+--------+--------------+-----------------------+
|   1   | 789111 | Google LLC   | Google LLC|Google     |
+-------+--------+--------------+-----------------------+
|   2   | 333333 | Facebook     | Facebook|FB           |
+-------+--------+--------------+-----------------------+

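Now, if any of the names in "Company_FuzzyName_new" (DataFrame: company_fuzzy_name, names separated by |) matches any word in "content" (DataFrame: news_test), I want to add a new column named "Com" to news_test whose value is the corresponding "ID" from company_fuzzy_name. For the examples above, the result would be: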

+=======+===========================================================================+==================+
| index |                                  content                                  |       Com        |
+=======+===========================================================================+==================+
|   0   | Apple and Google are two of the strongest companies in the world.         | [123456, 789111] |
+-------+---------------------------------------------------------------------------+------------------+
|   1   | Working in Facebook and Google is my dream, however, it is still a dream. | [789111, 333333] |
+-------+---------------------------------------------------------------------------+------------------+

I already have the code below, and it works:

import re

list_total = []
for i in range(len(news_test)):
    # Python 2.7: encode the unicode content to a UTF-8 byte string
    content = news_test.iloc[i]['content'].encode('utf-8')
    list_match = []
    for j in range(len(company_fuzzy_name)):
        # "Apple Inc.|Apple" is used directly as a regex alternation
        if re.search(company_fuzzy_name.iloc[j]['Company_FuzzyName_new'], content):
            list_match.append(company_fuzzy_name.iloc[j]['ID'])
    list_total.append(list_match)
news_test['Com'] = list_total

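However, this is far too slow (3M × 280K regex searches). Is there a way to speed this up, or to restructure the code to make it more efficient? The format of the "Com" column is not fixed; it can be a list, a string, etc. Thanks for your help.

My Python environment is 2.7.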


1 Answer

User

#1 · Posted 2024-06-17 15:22:07

Sorry, can anyone help me? I have been stuck on this for a long time.
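One possible direction, offered as a sketch rather than a definitive answer: instead of running a regex for every (article, company) pair, build a dictionary from each fuzzy name to its ID once, then look up the word n-grams of each article in that dictionary. The work per article then depends on its length and on the longest alias, not on the number of companies. The function names `tokenize`, `build_alias_index`, and `match_companies` are hypothetical. The sketch assumes whole-word, case-sensitive matching with punctuation ignored, which is not identical to the substring behavior of `re.search` in the question.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits, or underscores;
# punctuation such as the "." in "Apple Inc." is dropped.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into case-sensitive word tokens."""
    return TOKEN_RE.findall(text)


def build_alias_index(rows):
    """Map each alias (as a tuple of tokens) to the IDs that use it.

    rows: iterable of (company_id, fuzzy_names) pairs, where fuzzy_names
    holds the aliases separated by '|', as in Company_FuzzyName_new.
    Returns the index and the length in tokens of the longest alias.
    """
    index = defaultdict(list)
    max_len = 1
    for company_id, fuzzy_names in rows:
        for alias in fuzzy_names.split("|"):
            key = tuple(tokenize(alias))
            if key and company_id not in index[key]:
                index[key].append(company_id)
                max_len = max(max_len, len(key))
    return dict(index), max_len


def match_companies(text, index, max_len):
    """Return the IDs whose aliases occur in text, in order of first appearance."""
    tokens = tokenize(text)
    found = []
    for start in range(len(tokens)):
        for size in range(1, max_len + 1):
            if start + size > len(tokens):
                break
            key = tuple(tokens[start:start + size])
            for company_id in index.get(key, ()):
                if company_id not in found:
                    found.append(company_id)
    return found


# Example with the rows from the question.
rows = [
    (123456, "Apple Inc.|Apple"),
    (789111, "Google LLC|Google"),
    (333333, "Facebook|FB"),
]
index, max_len = build_alias_index(rows)
print(match_companies(
    "Apple and Google are two of the strongest companies in the world.",
    index, max_len))  # [123456, 789111]

# With the DataFrames from the question, this could be applied as:
#   index, max_len = build_alias_index(
#       zip(company_fuzzy_name['ID'], company_fuzzy_name['Company_FuzzyName_new']))
#   news_test['Com'] = news_test['content'].apply(
#       lambda text: match_companies(text, index, max_len))
```

Note that the IDs come back in the order the names appear in the text, so the second example row yields Facebook's ID before Google's. Since the question says the format of "Com" is not fixed, that may be acceptable; sort the list if a stable order is needed.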
