Scraping while moving to the next page

for urls in url_list:
    html = requests.get(urls)
    soup = BeautifulSoup(html.text, 'html.parser')  # Create a BeautifulSoup object
    # Retrieve a list of all the links and the titles for the respective links
    # word1, word2, word3 = "US", "USA", "USFDA"
    sub_links = soup.find_all('a', class_='arial11_summ')
    for links in sub_links:
        sp = BeautifulSoup(str(links), 'html.parser')  # first convert into a string
        tag = sp.a
        # if word1 in tag['title'] or word2 in tag['title'] or word3 in tag['title']:
        category_links = Base_url + tag["href"]
        List_of_links.append(category_links)
    time.sleep(3)

1 Answer

User

#1 · Posted on 2024-04-19 09:39:41

To move to the next page:

Add the parameters to the URL like this: https://www.moneycontrol.com/stocks/company_info/stock_news.php?sc_id=CHC&durationType=Y&Year=2018
You can get the list of available years from page 1.
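The URL pattern above can be generated for several years at once. This is only a minimal sketch: the parameter names (sc_id, durationType, Year) are copied from the example URL, while the function name build_year_urls and the year list are hypothetical and would need to be replaced with the years actually found on page 1.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example URL in the answer.
NEWS_URL = "https://www.moneycontrol.com/stocks/company_info/stock_news.php"


def build_year_urls(sc_id, years):
    """Return one news-listing URL per year for the given stock code."""
    urls = []
    for year in years:
        # Parameter names and the durationType value are copied as-is
        # from the example URL.
        query = urlencode({"sc_id": sc_id, "durationType": "Y", "Year": year})
        urls.append(f"{NEWS_URL}?{query}")
    return urls


# Hypothetical year list; in practice it would be read from page 1.
url_list = build_year_urls("CHC", [2018, 2019])
for url in url_list:
    print(url)
```

The resulting url_list can be fed to the loop in the question unchanged.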

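To extract the date: take a substring so that only the datetime part remains, then parse the time and the time zone like this.

I have updated the answer to set the time zone with pytz.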

from datetime import datetime

import pytz

raw = 'Feb 07, 2019 03:05 PM IST'
str_time = raw[:-4]       # 'Feb 07, 2019 03:05 PM'
str_timezone = raw[-3:]   # 'IST'

datetime_object = datetime.strptime(str_time, '%b %d, %Y %I:%M %p')
if str_timezone == 'IST':
    # See https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
    # Assumption: IST here means India Standard Time (the site is Indian),
    # whose tz database name is Asia/Kolkata (UTC+05:30), not Indian/Mauritius.
    tz = pytz.timezone('Asia/Kolkata')
else:
    tz = pytz.timezone('UTC')

# localize() attaches the time zone to the naive datetime
output = tz.localize(datetime_object)
# test
print(output.strftime('%X %x %z'))
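
If installing pytz is not an option, the same parsing can be sketched with the standard library only. This is a swapped-in variant, not the answer's method: it uses a fixed UTC offset instead of a tz database lookup, and it assumes that "IST" means India Standard Time (UTC+05:30, no daylight saving). The names KNOWN_OFFSETS and parse_timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "IST" means India Standard Time (UTC+05:30, no daylight saving).
KNOWN_OFFSETS = {
    "IST": timezone(timedelta(hours=5, minutes=30), "IST"),
    "UTC": timezone.utc,
}


def parse_timestamp(text):
    """Parse strings like 'Feb 07, 2019 03:05 PM IST' into an aware datetime."""
    # Split off the trailing time zone abbreviation at the last space.
    str_time, _, abbreviation = text.rpartition(" ")
    naive = datetime.strptime(str_time, "%b %d, %Y %I:%M %p")
    # Unknown abbreviations fall back to UTC, as in the answer's code.
    tz = KNOWN_OFFSETS.get(abbreviation, timezone.utc)
    return naive.replace(tzinfo=tz)


parsed = parse_timestamp("Feb 07, 2019 03:05 PM IST")
print(parsed.isoformat())                           # 2019-02-07T15:05:00+05:30
print(parsed.astimezone(timezone.utc).isoformat())  # 2019-02-07T09:35:00+00:00
```

Converting every timestamp to UTC this way makes articles from different pages directly comparable and sortable.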
