Trying to scrape the date of the last document from a web page with Python

Posted 2024-04-19 23:22:01


I am trying to extract a date such as 1/19/2021 from the markup below, and I want the day, "19", in a Python variable.

<span class="grayItalic">
    Received: 01/19/2021
</span>

Here is the code that is not working:

date = soup.find('span', {'class': 'grayItalic'}).get_text()
converted_date = int(date[13:14])
print(date)

I get this error: 'NoneType' object has no attribute 'get_text'. Can anyone help?


3 Answers

Try the following, which sends request headers:

import requests
from bs4 import BeautifulSoup

url = "https://iapps.courts.state.ny.us/nyscef/DocumentList?docketId=npvulMdOYzFDYIAomW_PLUS_elw==&display=all"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content,'html.parser')
date = soup.find('span', {'class': 'grayItalic'}).get_text().strip()
converted_date = int(date.split("/")[-2])
print(converted_date)
print(date)
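The `split("/")` approach above assumes the span text always has the exact shape `Received: MM/DD/YYYY`. A minimal sketch of a more defensive alternative, using only the standard library, is shown below. The helper name `extract_day` and the sample string are hypothetical and not part of the original answer.

```python
import re

# Matches M/D/YYYY or MM/DD/YYYY anywhere in the text.
DATE_PATTERN = re.compile(r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})")


def extract_day(text):
    """Return the day of the first M/D/YYYY date in text, or None if absent."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group("day"))


# Hypothetical sample: the text that get_text() yields for the span in the question.
span_text = "\n    Received: 01/19/2021\n"
print(extract_day(span_text))       # 19
print(extract_day("no date here"))  # None
```

Returning None instead of raising lets the caller decide what to do when the page layout changes or the span is missing.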

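I could not load the URL with either the requests or the urllib module. My guess is that the site blocks automated connections. So I opened the page in a browser, saved its source to a file named page.html, and ran the BeautifulSoup operations on that file. This seemed to work: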

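from bs4 import BeautifulSoup

# page.html is the page source saved manually from the browser
with open("page.html") as html:
    soup = BeautifulSoup(html, 'html.parser')
date_span = soup.find('span', {'class': 'grayItalic'})

# find() returns None when no matching tag exists, so check before using it
if date_span is not None:
    print(date_span.get_text().strip().replace("Received: ", ""))
    # output: 04/25/2019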
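Since the question title asks for the last document's date, note that `find` returns only the first matching tag. With BeautifulSoup, the usual way to reach the last one is to call `soup.find_all('span', {'class': 'grayItalic'})` and take the final element of the result. The sketch below shows the same idea with only the standard library so that it runs without extra installs. The class `GrayItalicCollector` and the sample markup are hypothetical, and nested spans are not handled.

```python
from html.parser import HTMLParser


class GrayItalicCollector(HTMLParser):
    """Collect the text of every <span class="grayItalic"> element."""

    def __init__(self):
        super().__init__()
        self.texts = []        # one entry per matching span
        self._inside = False   # True while inside a matching span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "grayItalic" in classes:
            self._inside = True
            self.texts.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing </span> ends the current match.
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


# Hypothetical sample markup with two dated entries.
sample_html = """
<span class="grayItalic">Received: 04/25/2019</span>
<span class="grayItalic">Received: 01/19/2021</span>
"""

collector = GrayItalicCollector()
collector.feed(sample_html)
spans = [text.strip() for text in collector.texts]

# Guard against an empty result instead of indexing blindly.
last_received = spans[-1] if spans else None
print(last_received)
```

Checking for an empty list here plays the same role as the `is not None` check on the result of `find`.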

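I also tried fetching the page source with the requests library, as shown below, but it did not succeed (the site may be blocking the request). See whether it works on your machine: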

import requests

url = "..."
headers = {
    'Access-Control-Allow-Origin': '*',
    'Access-Control-Allow-Methods': 'GET',
    'Access-Control-Allow-Headers': 'Content-Type',
    'Access-Control-Max-Age': '3600',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
}

response = requests.get(url, headers=headers)
html = response.content

print(html)

import dateutil.parser
from bs4 import BeautifulSoup

html_doc = """<span class="grayItalic">
    Received: 01/19/2021
</span>"""
soup = BeautifulSoup(html_doc, 'html.parser')

date_ = soup.find('span', {'class': 'grayItalic'}).get_text()

# fuzzy=True makes the parser skip text it cannot interpret, such as "Received:"
dateutil.parser.parse(date_, fuzzy=True)

Output:

datetime.datetime(2021, 1, 19, 0, 0)

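Here date_ evaluates to '\n Received: 01/19/2021\n'. You could use string slicing, but you can use dateutil.parser.parse instead, which returns a datetime.datetime object. In this example I assume you only need the date. If you also need the surrounding text, you can pass fuzzy_with_tokens=True. From the documentation: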

 if the fuzzy_with_tokens option is True, returns a tuple, the first element being a datetime.datetime object, the second a tuple containing the fuzzy tokens.

dateutil.parser.parse(date_,fuzzy_with_tokens=True)

(datetime.datetime(2021, 1, 19, 0, 0), (' Received: ', ' '))
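To get the "19" the question asks for, the day can be read directly from the resulting datetime.datetime object. A small sketch follows, with the value from the output above written out by hand so that it runs without dateutil installed. The variable names are illustrative only.

```python
import datetime

# The value shown in the output above, constructed by hand for illustration.
parsed = datetime.datetime(2021, 1, 19, 0, 0)

day = parsed.day                       # 19, as an int
as_text = parsed.strftime("%m/%d/%Y")  # "01/19/2021"
print(day, parsed.month, parsed.year, as_text)
```

With fuzzy_with_tokens=True the datetime is the first element of the returned tuple, so the day would be read from that element instead.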
