I'm using Python to extract some data (page titles), but the output is not in the same order as the URLs I entered in my code

Posted 2024-05-28 02:12:48

I wrote the code below and ran it, and it produced the .xlsx file, but the rows are not in the same order as the URL list in my code:

# importing the libraries
import bs4
import pandas as pd
import requests
from fake_useragent import UserAgent

urls = list(('https://isabad.com/advanced-professional-email-templates-opencart-extension',
             'https://isabad.com/seo-basic-pack-opencart-extension',
             'https://isabad.com/x-shipping-pro',
             'https://isabad.com/bot-blocker-opencart-extension',
             'https://isabad.com/opencart-mobile-application'
             ))

dit = {}
user_agent = UserAgent()
for url in urls:
    data = requests.get(url, headers={"user-agent": user_agent.chrome})
    soup = bs4.BeautifulSoup(data.content, "lxml")
    dit[url] = soup.find_all("title")
    ex = pd.DataFrame({"title": dit})
    print(ex)
    ex.to_excel('sasa.xlsx', index=False, engine='xlsxwriter')


How can I fix this?


Tags: 代码fromhttpsimportcomurlextensionxlsx
2 Answers

Use a list, so that the results come out in the same order you defined them:

urls = ['https://www.sample.com/search/category-mobile/' ,
'https://www.sample.com/search/category-tablet-ebook-reader',
'https://www.sample.com/search/category-laptop/',
'https://www.sample.com/search/category-computer-parts/',
'https://www.sample.com/search/category-office-machines/'
]
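To see the difference yourself, here is a small standard-library sketch (the URLs are the sample ones above, with one duplicate added purely for illustration). A list is always iterated in the order it was written; a set makes no ordering promise. If a set was being used to drop duplicates, `dict.fromkeys` does the same job while keeping the original order, since dicts preserve insertion order as of Python 3.7.

```python
# Sample URLs from the answer above; the repeated entry is added only for illustration.
urls_list = [
    "https://www.sample.com/search/category-mobile/",
    "https://www.sample.com/search/category-laptop/",
    "https://www.sample.com/search/category-mobile/",
]

# A list is always iterated in the order it was written, duplicates included.
for position, url in enumerate(urls_list):
    print(position, url)

# A set removes the duplicate, but its iteration order is an implementation
# detail (string hashes are randomized per process), so it may not match the list.
urls_set = set(urls_list)
print(len(urls_set))  # → 2

# dict.fromkeys keeps the first occurrence of each URL in insertion order,
# so this de-duplicates without reordering.
unique_in_order = list(dict.fromkeys(urls_list))
print(unique_in_order)
# → ['https://www.sample.com/search/category-mobile/', 'https://www.sample.com/search/category-laptop/']
```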

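You are using a set to store the URL list, and a Python set is unordered: it does not keep the order in which items were added. To get the output in the same order, store the URLs in a list instead, like this: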

urls = [
  'https://www.sample.com/search/category-mobile/' ,
  'https://www.sample.com/search/category-tablet-ebook-reader',
  'https://www.sample.com/search/category-laptop/',
  'https://www.sample.com/search/category-computer-parts/',
  'https://www.sample.com/search/category-office-machines/'
]
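Putting it together, here is a minimal sketch of the whole loop with the URLs in a list. It assumes `requests`, `beautifulsoup4` and `pandas` are installed (plus `openpyxl` for writing `.xlsx`). A few changes go beyond the list fix and are my own choices: it uses the built-in `html.parser` instead of `lxml`, stores the title text rather than the result of `find_all`, and writes the file once after the loop instead of on every iteration. The names `fetch_html` and `titles_in_order` and the placeholder User-Agent are hypothetical; the fetch function is a parameter so the ordering logic can be tried without network access.

```python
from typing import Callable, Iterable

import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download one page; the User-Agent value is a placeholder assumption."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text


def titles_in_order(
    urls: Iterable[str], fetch: Callable[[str], str] = fetch_html
) -> pd.DataFrame:
    """Return one row per URL, in exactly the order the URLs are given."""
    rows = []  # appending to a list keeps the rows in URL order
    for url in urls:
        soup = BeautifulSoup(fetch(url), "html.parser")
        # soup.title is the first <title> tag, or None if the page has none;
        # store its text rather than the Tag objects that find_all returns.
        title = soup.title.get_text(strip=True) if soup.title else ""
        rows.append({"url": url, "title": title})
    return pd.DataFrame(rows, columns=["url", "title"])


# Usage (this makes real HTTP requests, so it is left as a comment):
#   urls = [...]  # the list from the answer above
#   df = titles_in_order(urls)
#   print(df)
#   df.to_excel("sasa.xlsx", index=False)  # write once, after the loop
```

Keeping the URL as its own column (rather than as a dict key that ends up in the DataFrame index, which `index=False` then drops) also makes it easy to check that each row lines up with the input list.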

Cheers

相关问题 更多 >