Beautiful Soup: finding and navigating HTML

Published 2024-05-12 18:44:52


I want to scrape the timetable from this site.

Specifically, I want the text contained in

div #tabs-4 > h3 > a > span 

I tried this, but it only returns the first item, not the full tree under it. To make matters worse, the site uses the id #tabs-4 four times.

alilauro_timetable = []
alilauro_departures_table = soup.select('#tabs-4')
for div in alilauro_departures_table:
    # All spans under this div; fixed indices pick out only the first sailing
    span = div.select('span')
    alilauro_timetable.append({
        "COMPANY": span[2].text,
        "DEPARTURE DATE TIME": span[0].text,
        "ARRIVAL DATE TIME": span[4].text,
        "ITINERARIO": span[1].text,
        "FERRY NAME": span[3].text
    })

2 answers

Try the code below. You don't need to select #tab, because you are already using the URL link.

import bs4
import re
import requests
html_doc=requests.get("https://alilauronew.forth-crs.gr/italian_b2c/npgres.exe?func=TT&tripcount=1&StartDateLeg1=22%2F02%2F2019&StartDateLeg2=22%2F02%2F2019&StartDateLeg3=22%2F02%2F2019&StartDateLeg4=22%2F02%2F2019&Leg1ilabel=NAPOLI%28BEVERELLO%29&Leg1i=BEV&Leg1iilabel=ISCHIA&Leg1ii=ISH&Leg1Date=22%2F02%2F2019&Leg2ilabel=ISCHIA&Leg2i=ISH&Leg2iilabel=NAPOLI%28BEVERELLO%29&Leg2ii=BEV&Leg2Date=22%2F02%2F2019&Leg3ilabel=NAPOLI%28BEVERELLO%29&Leg3i=BEV&Leg3iilabel=FORIO&Leg3ii=FRD&Leg3Date=22%2F02%2F2019&Leg4ilabel=FORIO&Leg4i=FRD&Leg4iilabel=NAPOLI%28BEVERELLO%29&Leg4ii=BEV&Leg4Date=22%2F02%2F2019&TotalPassengers=1&TotalVehicles=0")
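soup = bs4.BeautifulSoup(html_doc.text, 'html.parser')
# Every <h3> whose id contains "Leg1" is one row of the first leg
headers = soup.find_all('h3', id=re.compile("Leg1"))

for h in headers:
    # Print the text of each span inside the row
    spans = h.find_all('span')
    for span in spans:
        print(span.text)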
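
To get from printed span text to the dictionary structure the question asks for, something like the following may work. It is only a sketch: the markup in SAMPLE_HTML, the ids Leg1Trip1 and Leg2Trip1, and the helper parse_timetable are assumptions made for illustration, and the real page may be laid out differently. The point it shows is building one record per h3 instead of indexing the spans of the whole div.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not copied from the real site).
# Note that the id "tabs-4" is deliberately duplicated, as in the question.
SAMPLE_HTML = """
<div id="tabs-4">
  <h3 id="Leg1Trip1"><a href="#">
    <span>Ven 22 Feb 2019, 07:05</span>
    <span>NAPOLI(BEVERELLO) - ISCHIA</span>
    <span>ALILAURO</span>
    <span>AIRONE JET</span>
    <span>Ven 22 Feb 2019, 08:05</span>
  </a></h3>
</div>
<div id="tabs-4">
  <h3 id="Leg2Trip1"><a href="#">
    <span>Ven 22 Feb 2019, 06:30</span>
    <span>ISCHIA - NAPOLI(BEVERELLO)</span>
    <span>ALILAURO</span>
    <span>CELESTINA LAURO</span>
    <span>Ven 22 Feb 2019, 07:30</span>
  </a></h3>
</div>
"""

# Field names follow the question; the order matches span positions 0..4.
FIELDS = [
    "DEPARTURE DATE TIME",
    "ITINERARIO",
    "COMPANY",
    "FERRY NAME",
    "ARRIVAL DATE TIME",
]


def parse_timetable(html):
    """Return one dict per <h3> row found under any div with id tabs-4."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for h3 in soup.select("div#tabs-4 > h3"):
        texts = [s.get_text(strip=True) for s in h3.select("a > span")]
        # Skip rows that do not have exactly the expected five spans.
        if len(texts) == len(FIELDS):
            rows.append(dict(zip(FIELDS, texts)))
    return rows


timetable = parse_timetable(SAMPLE_HTML)
for row in timetable:
    print(row)
```

Looping over h3 elements rather than over the divs avoids the fixed span[0]..span[4] indexing that only ever reaches the first sailing in each div.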

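The main problem is that only the first item is in a table in the HTML itself; the other items are produced by JavaScript. So you need to either follow the approach in Kajal's answer or use Selenium.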
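
One way to check whether rows are missing because they are added at runtime is to count matches in the HTML exactly as served. The snippet below is a rough sketch: served_html is a made-up response body and count_static_matches is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def count_static_matches(html, selector):
    """Count elements matching a CSS selector in the HTML as served,
    that is, before any JavaScript has had a chance to run."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


# Made-up response body (an assumption): one row is present in the markup,
# and a script would add the remaining rows in a real browser.
served_html = """
<div id="tabs-4"><h3><a><span>Ven 22 Feb 2019, 07:05</span></a></h3></div>
<script>/* further rows would be inserted here at runtime */</script>
"""

static_rows = count_static_matches(served_html, "div#tabs-4 > h3")
print(static_rows)
```

If the count from the raw response is lower than what the browser shows, the difference is probably rendered by a script, and a plain requests call will not see it.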

Selenium code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service(r'your path'), options=options)
driver.get('https://alilauronew.forth-crs.gr/italian_b2c/npgres.exe?func=TT&tripcount=1&StartDateLeg1=22%2F02%2F2019&StartDateLeg2=22%2F02%2F2019&StartDateLeg3=22%2F02%2F2019&StartDateLeg4=22%2F02%2F2019&Leg1ilabel=NAPOLI%28BEVERELLO%29&Leg1i=BEV&Leg1iilabel=ISCHIA&Leg1ii=ISH&Leg1Date=22%2F02%2F2019&Leg2ilabel=ISCHIA&Leg2i=ISH&Leg2iilabel=NAPOLI%28BEVERELLO%29&Leg2ii=BEV&Leg2Date=22%2F02%2F2019&Leg3ilabel=NAPOLI%28BEVERELLO%29&Leg3i=BEV&Leg3iilabel=FORIO&Leg3ii=FRD&Leg3Date=22%2F02%2F2019&Leg4ilabel=FORIO&Leg4i=FRD&Leg4iilabel=NAPOLI%28BEVERELLO%29&Leg4ii=BEV&Leg4Date=22%2F02%2F2019&TotalPassengers=1&TotalVehicles=0')

# The page reuses the id "tabs-4" for several divs, so collect all of them
x = driver.find_elements(By.CSS_SELECTOR, "div#tabs-4")
alilauro_timetable = []
for div in x:
    print(div.text)

driver.close()

Output:

| | Ven 22 Feb 2019, 07:05 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | AIRONE JET| Ven 22 Feb 2019, 08:05
| | Ven 22 Feb 2019, 07:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 08:35
| | Ven 22 Feb 2019, 09:40 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 10:40
| | Ven 22 Feb 2019, 10:50 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 11:50
| | Ven 22 Feb 2019, 12:55 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 13:55
| | Ven 22 Feb 2019, 14:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 15:35
| | Ven 22 Feb 2019, 15:35 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 16:35
| | Ven 22 Feb 2019, 17:55 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 18:55
| | Ven 22 Feb 2019, 20:20 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 21:20
| | Ven 22 Feb 2019, 06:30 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 07:30
| | Ven 22 Feb 2019, 07:10 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 08:10
| | Ven 22 Feb 2019, 08:40 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 09:40
| | Ven 22 Feb 2019, 09:35 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 10:35
| | Ven 22 Feb 2019, 11:45 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 12:45
| | Ven 22 Feb 2019, 13:20 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 14:20
| | Ven 22 Feb 2019, 14:05 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 15:05
| | Ven 22 Feb 2019, 16:15 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 17:15
| | Ven 22 Feb 2019, 16:50 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 17:50
| | Ven 22 Feb 2019, 19:10 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 20:10
| | Ven 22 Feb 2019, 07:05 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 08:30
| | Ven 22 Feb 2019, 09:40 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 11:05
| | Ven 22 Feb 2019, 10:50 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 12:15
| | Ven 22 Feb 2019, 14:35 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 16:00
| | Ven 22 Feb 2019, 17:20 | NAPOLI(BEVERELLO) - FORIO | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 18:45
| | Ven 22 Feb 2019, 06:45 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 08:10
| | Ven 22 Feb 2019, 09:15 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 10:35
| | Ven 22 Feb 2019, 11:20 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 12:45
| | Ven 22 Feb 2019, 13:00 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | AIRONE JET | Ven 22 Feb 2019, 14:20
| | Ven 22 Feb 2019, 15:55 | FORIO - NAPOLI(BEVERELLO) | ALILAURO | NETTUNO JET | Ven 22 Feb 2019, 17:15
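
If the printed text is all you have, the pipe-separated lines above can be turned into records with plain string handling. This is a minimal sketch that assumes every line has the five non-empty cells seen in the output; the field names and the parse_line helper are illustrative choices, not something the page defines.

```python
# Illustrative field names (assumptions), in the column order of the output.
FIELDS = ("departure", "itinerary", "company", "ferry", "arrival")


def parse_line(line):
    """Turn one '| | a | b | c | d | e' line into a dict."""
    # Split on the pipe, trim whitespace, and drop the empty leading cells.
    cells = [cell.strip() for cell in line.split("|")]
    cells = [cell for cell in cells if cell]
    if len(cells) != len(FIELDS):
        raise ValueError("unexpected number of cells: %r" % (line,))
    record = dict(zip(FIELDS, cells))
    # The itinerary cell has the form 'ORIGIN - DESTINATION'.
    origin, destination = record["itinerary"].split(" - ", 1)
    record["origin"] = origin
    record["destination"] = destination
    return record


# Two lines copied from the output above.
sample_lines = [
    "| | Ven 22 Feb 2019, 07:05 | NAPOLI(BEVERELLO) - ISCHIA | ALILAURO | AIRONE JET| Ven 22 Feb 2019, 08:05",
    "| | Ven 22 Feb 2019, 06:30 | ISCHIA - NAPOLI(BEVERELLO) | ALILAURO | CELESTINA LAURO | Ven 22 Feb 2019, 07:30",
]

records = [parse_line(line) for line in sample_lines]
for record in records:
    print(record)
```

Stripping each cell also absorbs spacing inconsistencies such as the missing space after AIRONE JET in the first line.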
