How to fix the dictionary output

Posted 2024-06-09 10:31:50


I'm scraping a news page with Scrapy; each item is basically a title, a meta line and a text excerpt. The code itself runs fine, but I have a problem with the dictionary output: it prints all the titles first, then all the meta lines, and finally all the excerpts. What I need instead is the title, meta line and excerpt for each news item, one item after another. I'm guessing something is wrong with the for loop or with the selectors.

Thanks for your help.

My code:

import scrapy


class testspider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://oilprice.com/Latest-Energy-News/World-News']    

    def parse(self, response):
        all_news = response.xpath('//div[@class="tableGrid__column tableGrid__column--articleContent category"]')

        for singlenews in all_news:         
            title_item = singlenews.xpath('//div[@class="categoryArticle__content"]//a//text()').extract()
            meta_item = singlenews.xpath('//div[@class="categoryArticle__content"]//p[@class="categoryArticle__meta"]//text()').extract()
            extract_item = singlenews.xpath('//div[@class="categoryArticle__content"]//p[@class="categoryArticle__excerpt"]//text()').extract()      

            yield {
                'title_data' : title_item,
                'meta_data' :  meta_item,
                'extract_data' : extract_item        
            }

Output:

{'title_data': ['Global Energy-Related CO2 Emissions Stopped Rising In 2019', 'BHP Is Now The World’s Top Copper Miner', 'U.S. Budget Proposal Includes Sale Of 15 Mln Barrels Strategic Reserve Oil', ...],
 'meta_data': ['Feb 11, 2020 at 12:02 | Tsvetana Paraskova', 'Feb 11, 2020 at 11:27 | MINING.com', 'Feb 11, 2020 at 09:59 | Irina Slav', ...],
 'extract_data': ['The world’s energy-related carbon dioxide (CO2) emissions remained flat in 2019, halting two years of emissions increases, as lower emissions in advanced economies offset growing emissions elsewhere, the International Energy…', 'BHP Group on Monday became the world’s largest copper miner based on production after Chile’s copper commission announced a slide in output at state-owned Codelco.\r\nHampered by declining grades Codelco…', 'The budget proposal President Trump released yesterday calls for the sale of 15 million barrels of oil from the Strategic Petroleum Reserve of the United States.\r\nThe proceeds from the…', ...]}
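
In other words, what I'm after is one dictionary per news item, something like this (pieced together from the data above):

{'title_data': 'Global Energy-Related CO2 Emissions Stopped Rising In 2019', 'meta_data': 'Feb 11, 2020 at 12:02 | Tsvetana Paraskova', 'extract_data': 'The world’s energy-related carbon dioxide (CO2) emissions remained flat in 2019, …'}
{'title_data': 'BHP Is Now The World’s Top Copper Miner', 'meta_data': 'Feb 11, 2020 at 11:27 | MINING.com', 'extract_data': 'BHP Group on Monday became the world’s largest copper miner based on production …'}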

2 Answers

When you use // in an XPath expression, the search is performed over the whole document, so

title_item = singlenews.xpath('//div[@class="categoryArticle__content"]//a//text()').extract()

will return a list with all the text from every div that matches the filter div[@class="categoryArticle__content"].

What you need to do is filter relative to singlenews, so try the following:

title_item = singlenews.xpath('./div[@class="categoryArticle__content"]//a//text()').extract()

Reference: https://devhints.io/xpath
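
For reference, here is a minimal sketch of the whole spider with that relative-path fix applied. It iterates over the categoryArticle__content divs inside the column (a slight variation on the single line above, and an assumption about the page structure), and uses .get() to return the first matching text node per field:

import scrapy

class testspider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://oilprice.com/Latest-Energy-News/World-News']

    def parse(self, response):
        all_news = response.xpath('//div[@class="tableGrid__column tableGrid__column--articleContent category"]')

        # Assumption: each categoryArticle__content div wraps exactly one news item.
        for singlenews in all_news.xpath('.//div[@class="categoryArticle__content"]'):
            # The leading "." keeps every query scoped to the current article
            # instead of searching the whole document again.
            yield {
                'title_data': singlenews.xpath('.//a//text()').get(),
                'meta_data': singlenews.xpath('.//p[@class="categoryArticle__meta"]//text()').get(),
                'extract_data': singlenews.xpath('.//p[@class="categoryArticle__excerpt"]//text()').get(),
            }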

From your output it looks like your code extracts all of the title, meta_data and extract_data values at once and saves them in a single dictionary. If you want one dictionary per news item on the site you are scraping, you should first collect all the data you need and then parse it into dictionaries however you prefer. So your code would look something like this:

def parse(self, response):
    all_news = response.xpath('//div[@class="tableGrid__column tableGrid__column--articleContent category"]')  
    titles = all_news.xpath('//div[@class="categoryArticle__content"]//a//text()').extract()
    meta_items = all_news.xpath('//div[@class="categoryArticle__content"]//p[@class="categoryArticle__meta"]//text()').extract()
    extract_items = all_news.xpath('//div[@class="categoryArticle__content"]//p[@class="categoryArticle__excerpt"]//text()').extract()      

    # at this point titles, meta_items and extract_items should be three parallel lists of the same length, and you can now combine them as you need

    news_items = []
    for i in range(len(titles)): 
        news = { 'title': titles[i], 'meta_data': meta_items[i], 'extract_data': extract_items[i] }
        news_items.append(news)
    return news_items

This will return the news posts the way you want them.
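
As a side note (not part of the original answer), the index-based loop at the end can also be written with zip, which pairs the three lists without manual indexing:

    # Same pairing as the range(len(titles)) loop above, written with zip.
    news_items = [
        {'title': title, 'meta_data': meta, 'extract_data': excerpt}
        for title, meta, excerpt in zip(titles, meta_items, extract_items)
    ]
    return news_items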
