Scraping mobile phone specifications from a web page

Posted 2024-04-20 08:19:54


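I'm trying to get the details of all the mobile phones listed on a web page, such as name, price, and specifications. I can get the names and prices successfully, but the specifications come out jumbled. There are 24 phones listed, and when I try to get the specifications, they all end up in a single list. I can't find a proper way to separate them by the phone they belong to. Any help would be appreciated. Here is the function definition: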

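import time

import requests
from lxml import html

def get_link(self, link):
    page = requests.get(link)
    tree = html.fromstring(page.content)
    # Phone names, one text node per listing
    name = tree.xpath("//div[@class='_3wU53n']/text()")
    print(name)
    time.sleep(5)
    # Price text nodes; keep every second one
    price = tree.xpath("//div[@class='_1vC4OE _2rQ-NK']/text()")[1::2]
    print(price)
    time.sleep(5)
    # Matches every <li> on the page, so all phones' specs land in one flat list
    highlights = tree.xpath("//ul[@class='vFw0gD']/li/text()")
    print(highlights)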


'''
    dictionary={}
    for i in range(len(name)):
        dictionary[name[i]]=price[i]
    print(dictionary)


    return
'''

The link passed in is: https://www.flipkart.com/mobiles-accessories/mobiles/pr?count=40&otracker=categorytree&p%5B%5D=sort%3Dpopularity&sid=tyy%2F4io

The output so far is:

['Mi A1 (Black, 64 GB)', 'Redmi Note 4 (Gold, 32 GB)', 'Mi A1 (Rose Gold, 64 GB)', 'Redmi Note 4 (Gold, 64 GB)', 'Redmi Note 4 (Black, 32 GB)', 'Honor 9i (Graphite Black, 64 GB)', 'Redmi Note 4 (Black, 64 GB)', 'Moto E4 Plus (Fine Gold, 32 GB)', 'Moto E4 Plus (Iron Gray, 32 GB)', 'Intex Aqua 5.5 VR (Champagne, White, 8 GB)', 'Lenovo K8 Plus (Venom Black, 32 GB)', 'Redmi Note 4 (Dark Grey, 64 GB)', 'Panasonic Eluga Ray (Gold, 16 GB)', 'Moto C Plus (Pearl White, 16 GB)', 'Moto C Plus (Starry Black, 16 GB)', 'Moto C Plus (Fine Gold, 16 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Panasonic Eluga Ray 700 (Champagne Gold, 32 GB)', 'Panasonic Eluga I5 (Gold, 16 GB)', 'OPPO F5 (Black, 64 GB)', 'Lenovo K8 Plus (Fine Gold, 32 GB)', 'Moto X4 (Super Black, 64 GB)', 'Swipe ELITE Sense- 4G with VoLTE', 'Swipe ELITE Sense- 4G with VoLTE']


['14,999', '9,999', '14,999', '11,999', '9,999', '17,999', '11,999', '9,999', '9,999', '4,499', '9,999', '11,999', '6,999', '6,999', '6,999', '6,999', '9,999', '9,999', '6,499', '24,990', '10,999', '22,999', '5,555', '5,555']


['4 GB RAM | 64 GB ROM | Expandable Upto 128 GB', '5.5 inch Full HD Display', '12MP + 12MP Dual Rear Camera | 5MP Front Camera', '3080 mAh Li-polymer Battery', 'Qualcomm Snapdragon 625 64 bit Octa Core 2GHz Processor', 'Android Nougat 7.1.2 | Stock Android Version', 'Android One Smartphone - with confirmed upgrades to Android Oreo and Android P', 'Brand Warranty of 1 Year Available for Mobile and 6 Months for Accessories', .....]
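A side note on the commented-out dictionary in the function above: the name list contains repeated entries (for example, 'Lenovo K8 Plus (Fine Gold, 32 GB)' appears twice with different prices), so a dictionary keyed by name would keep only the last price for each repeated name. A minimal sketch of one alternative, pairing names and prices by position into a list of records; the helper name `pair_records` is hypothetical, and the sample values are copied from the output above:

```python
def pair_records(names, prices):
    """Pair names and prices by position, keeping repeated names."""
    return [{"name": n, "price": p} for n, p in zip(names, prices)]


# Three entries taken from the output above; one name occurs twice.
names = [
    "Lenovo K8 Plus (Fine Gold, 32 GB)",
    "OPPO F5 (Black, 64 GB)",
    "Lenovo K8 Plus (Fine Gold, 32 GB)",
]
prices = ["9,999", "24,990", "10,999"]

records = pair_records(names, prices)  # one record per listing
by_name = dict(zip(names, prices))     # repeated name collapses to one key

print(len(records), len(by_name))
```

This only works for lists that have exactly one item per phone; the flat specification list cannot be paired this way because each phone contributes several items.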

1 answer
User
Answer #1 · Posted 2024-04-20 08:19:54

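Try this. It selects each product card first and then reads the name, price, and specifications inside that card, so the specifications stay grouped with their phone. I think this is the output you're expecting: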

import requests
from bs4 import BeautifulSoup

res = requests.get('https://www.flipkart.com/mobiles/pr?count=40&otracker=categorytree&p=sort%3Dpopularity&sid=tyy%2C4io')
soup = BeautifulSoup(res.text, "lxml")
for items in soup.select("._1UoZlX"):
    name = items.select("._3wU53n")[0].text
    price = items.select("._1vC4OE._2rQ-NK")[0].text
    specifics = ' '.join([item.text for item in items.select(".tVe95H")])
    print("Name: {}\nPrice: {}\nSpecification: {}\n".format(name,price,specifics))

Output for a single item:

Name: Mi A1 (Black, 64 GB)
Price: ₹14,999
Specification: 4 GB RAM | 64 GB ROM | Expandable Upto 128 GB 5.5 inch Full HD Display 12MP + 12MP Dual Rear Camera | 5MP Front Camera 3080 mAh Li-polymer Battery Qualcomm Snapdragon 625 64 bit Octa Core 2GHz Processor Android Nougat 7.1.2 | Stock Android Version Android One Smartphone - with confirmed upgrades to Android Oreo and Android P Brand Warranty of 1 Year Available for Mobile and 6 Months for Accessories
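The same per-card grouping can probably be done with lxml and XPath, as in the question, by querying relative to each card (note the leading `.` in the inner expressions). The following is a minimal sketch, not a tested scraper for the live site: the class names are taken from the code in this thread and may no longer match the page, the `SAMPLE` markup is an invented stand-in so the example runs offline, and `group_by_card` is a hypothetical helper name.

```python
from lxml import html

# Invented stand-in for the listing page (assumption); only the class
# names come from the code in this thread.
SAMPLE = """
<html><body>
<div class="_1UoZlX">
  <div class="_3wU53n">Mi A1 (Black, 64 GB)</div>
  <ul class="vFw0gD">
    <li class="tVe95H">4 GB RAM | 64 GB ROM | Expandable Upto 128 GB</li>
    <li class="tVe95H">5.5 inch Full HD Display</li>
  </ul>
  <div class="_1vC4OE _2rQ-NK">₹14,999</div>
</div>
<div class="_1UoZlX">
  <div class="_3wU53n">Redmi Note 4 (Gold, 32 GB)</div>
  <ul class="vFw0gD">
    <li class="tVe95H">3 GB RAM | 32 GB ROM</li>
  </ul>
  <div class="_1vC4OE _2rQ-NK">₹9,999</div>
</div>
</body></html>
"""


def group_by_card(page_source):
    """Return one dict per product card so specs stay with their phone."""
    tree = html.fromstring(page_source)
    phones = []
    for card in tree.xpath("//div[@class='_1UoZlX']"):
        # The leading "." makes each query relative to the current card.
        name = card.xpath(".//div[@class='_3wU53n']/text()")
        price = card.xpath(".//div[@class='_1vC4OE _2rQ-NK']/text()")
        specs = card.xpath(".//ul[@class='vFw0gD']/li/text()")
        phones.append({
            "name": name[0] if name else None,
            # Join in case the price is split across several text nodes.
            "price": "".join(price),
            "specs": [str(s) for s in specs],
        })
    return phones


for phone in group_by_card(SAMPLE):
    print(phone["name"], phone["price"], phone["specs"])
```

For a real page, the same function could be called with `requests.get(link).content` in place of `SAMPLE`, assuming the server returns the listing markup without requiring JavaScript.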
