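I'm a machine learning beginner exploring datasets for my NLP project. I got the data from http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I'm trying to build a pandas DataFrame from the parsed XML data, and I also want to add a label (1) to the positive reviews. Could someone help me write the code? Here is what I have so far, followed by a sample of the output: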
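from bs4 import BeautifulSoup
# Open the file in a with-block so it is closed, and name the parser explicitly (html.parser is assumed here)
with open('/content/drive/MyDrive/sorted_data_acl/electronics/positive.review', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
positive_reviews = soup.find_all('review_text')  # find_all is the current spelling of findAll
positive_reviews[0]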
<review_text>
I purchased this unit due to frequent blackouts in my area and 2 power supplies going bad. It will run my cable modem, router, PC, and LCD monitor for 5 minutes. This is more than enough time to save work and shut down. Equally important, I know that my electronics are receiving clean power.
I feel that this investment is minor compared to the loss of valuable data or the failure of equipment due to a power spike or an irregular power supply.
As always, Amazon had it to me in <2 business days
</review_text>
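One possible way to get from the parsed tags to a labelled DataFrame is sketched below. This is a minimal sketch, not a definitive solution: the helper name `reviews_to_frame`, the column names `review_text` and `label`, and the inline `sample` string are all assumptions made for illustration, and the sample only imitates the layout of `positive.review` so that the snippet runs without the dataset.

```python
import pandas as pd
from bs4 import BeautifulSoup


def reviews_to_frame(markup, label):
    """Return a DataFrame with one row per <review_text> element.

    `reviews_to_frame` is a hypothetical helper name; `label` is the
    sentiment value written into every row (1 for positive reviews).
    """
    soup = BeautifulSoup(markup, "html.parser")
    # get_text() drops the surrounding tag; strip() removes the
    # leading and trailing newlines inside each element.
    texts = [tag.get_text().strip() for tag in soup.find_all("review_text")]
    frame = pd.DataFrame({"review_text": texts})
    frame["label"] = label
    return frame


# Made-up two-review sample standing in for the contents of positive.review.
sample = """
<review>
<rating>5.0</rating>
<review_text>
Works well during blackouts.
</review_text>
</review>
<review>
<rating>4.0</rating>
<review_text>
Arrived quickly.
</review_text>
</review>
"""

positive_df = reviews_to_frame(sample, label=1)
print(positive_df)
```

With the real data, the string read from `positive.review` would be passed in place of `sample`. Assuming the dataset also ships a matching file of negative reviews, the same helper could be called with `label=0` and the two frames combined with `pd.concat`.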