Web scraping in Python

    import urllib.request
    import re

    companyList = ["aapl","goog","nflx"]
    for i in range(len(companyList)):
        url = "https://finance.yahoo.com/quote/"+companyList[i]+"?p="+companyList[i]
        htmlfile = urllib.request.urlopen(url)
        htmltext = htmlfile.read()
        regex = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="35">()(.+?)</span>'
        pattern = re.compile(regex)
        price = re.findall(pattern, str(htmltext))
        print(price)

2 Answers

User

Answer 1 · edited 2024-06-16 11:42:16

See whether the script below helps. It also covers authentication.

    https://github.com/PraveenKandregula/JenkinsRSSScrappingWithPython/blob/master/JenkinsRSSScrappingWithPython.py

User

Answer 2 · edited 2024-06-16 11:42:16

I'll do it for one of the companies. But I want your firm promise that you won't tell anyone I showed you how.

Get a copy of the page's HTML and save it locally.

>>> import urllib.request
>>> import re
>>> url = 'https://finance.yahoo.com/quote/AAPL/?p=AAPL'
>>> htmlfile = urllib.request.urlopen(url)
>>> htmltext = htmlfile.read()
>>> open('temp.htm', 'w').write(str(htmltext))
533900
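
The same step can also be written as a small script. This is a minimal sketch rather than the answerer's code: the helper names `fetch_html` and `save_html`, the `User-Agent` header value, and the UTF-8 decoding are all assumptions. Decoding the bytes instead of calling `str()` on them keeps the `b'...'` wrapper and escaped newlines out of the saved file.

```python
import urllib.request


def fetch_html(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    # Some sites reject urllib's default User-Agent; this header value is an assumption.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def save_html(html, path):
    """Write the page text to a local file and return the number of characters written."""
    with open(path, "w", encoding="utf-8") as handle:
        return handle.write(html)


# Usage (requires network access):
# page = fetch_html("https://finance.yahoo.com/quote/AAPL/?p=AAPL")
# print(save_html(page, "temp.htm"))
```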

Inspect the page, then copy and paste the item you want to identify in this page and in similar pages. Put it in a comment for reference.

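>>> # <span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="35"><!-- react-text: 36 -->161.38<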

Save it in a variable, say `exp`.

>>> exp = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="35"><!-- react-text: 36 -->161.38<'

Verify that the string contains no runs of multiple whitespace characters. If it does, replace each whole run of whitespace with \s+

>>> exp.find('  ')
-1
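
If the check had found a run of whitespace, the replacement could be done with `re.sub`. A minimal sketch, using a made-up sample string and a hypothetical helper name `collapse_whitespace`; it is meant to run before the escaping step that follows, since that step only touches `(`, `)` and `.` and therefore leaves the inserted `\s+` alone.

```python
import re


def collapse_whitespace(text):
    """Replace every run of two or more whitespace characters with the regex token \\s+."""
    # A callable replacement returns the literal characters \s+ without re.sub
    # trying to interpret the backslash as a template escape.
    return re.sub(r"\s{2,}", lambda match: r"\s+", text)


sample = 'class="a"   data-x="1">\n   42<'  # hypothetical markup, not from the Yahoo page
pattern = collapse_whitespace(sample)
print(pattern)
# The collapsed pattern still matches the original, unevenly spaced text.
print(re.fullmatch(pattern, sample) is not None)
```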

Put a "\" character in front of every character in the string that has a special meaning in a regex.

>>> re.sub(r'[().]', lambda m: '\\'+m.group(), exp)
'<span class="Trsdu\\(0\\.3s\\) Fw\\(b\\) Fz\\(36px\\) Mb\\(-4px\\) D\\(ib\\)" data-reactid="35"><!-- react-text: 36 -->161\\.38<'
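
As an alternative to listing the special characters by hand, the standard library's `re.escape` backslash-escapes every regex metacharacter in a literal string. The sketch below applies it to the opening `<span ...>` tag only; the variable names `prefix` and `escaped` are mine, not the answerer's.

```python
import re

# The opening tag copied from the page, full of characters that are special in a regex.
prefix = '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="35">'

# re.escape returns a pattern that matches the input text literally.
escaped = re.escape(prefix)

# The escaped pattern matches the original string character for character.
print(re.fullmatch(escaped, prefix) is not None)

# Without escaping, "(0.3s)" is read as a capture group, so the literal
# parentheses in the page are never matched.
print(re.fullmatch(prefix, prefix) is not None)
```

`re.escape` also escapes characters such as spaces and hyphens, so its output looks noisier than the hand-escaped version, but it matches the same text.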

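Display and inspect the result, then replace the price itself with a capture group, ([^<]+), to form the regex.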

>>> regex = '<span class="Trsdu\\(0\\.3s\\) Fw\\(b\\) Fz\\(36px\\) Mb\\(-4px\\) D\\(ib\\)" data-reactid="35"><!-- react-text: 36 -->([^<]+)<'

Use the regex to find the target item.

>>> re.findall(regex, str(htmltext))
['161.38']
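
Putting the steps together for the question's list of companies might look like the sketch below. It is not the answerer's code: the names `PRICE_PREFIX`, `PRICE_REGEX`, `find_prices` and `quote_url` are hypothetical, and the sample fragment is hand-written. The class names, the `data-reactid` value and the `react-text` comment are whatever the page served when the answer was written, so the pattern will stop matching as soon as that markup changes.

```python
import re

# Literal markup that preceded the price when the answer was written (an assumption
# about the page; it may have changed since).
PRICE_PREFIX = (
    '<span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" '
    'data-reactid="35"><!-- react-text: 36 -->'
)
# Escape the literal part, then capture everything up to the next "<".
PRICE_REGEX = re.compile(re.escape(PRICE_PREFIX) + r"([^<]+)<")


def find_prices(html):
    """Return every capture that follows the price markup in the page text."""
    return PRICE_REGEX.findall(html)


def quote_url(symbol):
    """Build the quote URL in the same shape the question uses."""
    return "https://finance.yahoo.com/quote/" + symbol + "?p=" + symbol


# Offline check against a hand-written fragment modeled on the answer's markup.
sample = PRICE_PREFIX + "161.38<!-- /react-text --></span>"
print(find_prices(sample))

# To query the live site (requires network access):
# import urllib.request
# for symbol in ["aapl", "goog", "nflx"]:
#     html = urllib.request.urlopen(quote_url(symbol)).read().decode("utf-8", errors="replace")
#     print(symbol, find_prices(html))
```

Compiling the pattern once and reusing it for every symbol avoids rebuilding the same regex inside the loop, as the question's code does.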
