Extracting the text of an element with BeautifulSoup

import requests
import pandas as pd
from bs4 import BeautifulSoup

record = []
hksi = ['CKH']

url = "http://www.etnet.com.hk/www/tc/futures/futures_stockoptions.php?atscode={}&month=202101"

for s in hksi:
    response = requests.get(url.format(s))
    info = response.text
    soup = BeautifulSoup(info, "lxml")

    bid = soup.find('td', {'style': 'padding:10px 0 5px 10px; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ratio = soup.find('td', {'style': 'padding:10px 0 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ask = soup.find('td', {'style': 'padding:10px 10px 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text

    record.append({
        'symbol' : s,
        'bid' : bid,
        'ask' : ask,
        'ratio': ratio
    })
print(bid)

3 answers

User

Answer 1 · edited 2024-05-26 11:56:29

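You can use a regular expression to pick out only the content that matches the pattern you expect.

Suppose you are looking for something like 2.3 or 2:3.

For the 2.3 case, use:

\d+(?:\.)+\d+

For the 2:3 case, use:

\d+(?:\:)+\d+

The following code works when the input looks like 23:2: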

import re

import requests
import pandas as pd
from bs4 import BeautifulSoup


record = []
hksi = ['CKH']

url = "http://www.etnet.com.hk/www/tc/futures/futures_stockoptions.php?atscode={}&month=202101"

# Matches values such as 23:2 (digits, colon, digits)
numbers = re.compile(r'\d+(?:\:)+\d+')

for s in hksi:
    response = requests.get(url.format(s))
    info = response.text
    soup = BeautifulSoup(info, "lxml")

    # Each cell is located by its exact inline style string
    bid = soup.find('td', {'style': 'padding:10px 0 5px 10px; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ratio = soup.find('td', {'style': 'padding:10px 0 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ask = soup.find('td', {'style': 'padding:10px 10px 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text

    record.append({
        'symbol' : s,
        'bid' : bid,
        'ask' : ask,
        'ratio': ratio
    })

    # findall returns an empty list when nothing matches, so check before indexing
    output = numbers.findall(bid)
    print(output[0] if output else None)
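
As a quick offline check of the two patterns in this answer, here is a minimal sketch that runs them against a few made-up strings. The sample values and the helper name `first_match` are assumptions for illustration, not data or code from the page. It also shows that `findall` returns an empty list when nothing matches, which is why indexing `[0]` without a check can raise `IndexError`.

```python
import re

# Patterns from the answer: digits, one or more separators, digits.
DECIMAL = re.compile(r'\d+(?:\.)+\d+')   # e.g. 2.3
RATIO = re.compile(r'\d+(?:\:)+\d+')     # e.g. 2:3


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    found = pattern.findall(text)
    return found[0] if found else None


# Hypothetical sample strings, used only for illustration.
samples = ['ratio 23:2', 'price 2.3', 'no digits here']
decimal_hits = [first_match(DECIMAL, s) for s in samples]
ratio_hits = [first_match(RATIO, s) for s in samples]
print(decimal_hits)
print(ratio_hits)
```

Returning `None` for a missing value keeps the scraping loop running when a cell does not contain the expected format.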

User

Answer 2 · edited 2024-05-26 11:56:29

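You can use a regular expression to extract only the number from the string. The pattern ' (\d+)' matches only digits that come directly after a space.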

import re

bid = '認購總數 1441'
number = re.findall(r' (\d+)', bid)  # raw string, so \d is not treated as a string escape
print(int(number[0]))

Output:

1441

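Alternatively, if bid always has the same structure (characters, then a space, then the number), you can split on the space and take the last element: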

bid = '認購總數 1441'
number = bid.split(' ')[-1]  # '1441' (still a string; wrap in int() if needed)
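
Both approaches in this answer can be wrapped in a small helper. The sketch below is only an illustration: the function name `trailing_int` is hypothetical, and it assumes the value is the last whitespace-separated token, possibly written with thousands separators.

```python
import re


def trailing_int(text):
    """Return the last whitespace-separated token of text as an int.

    Returns None when the text is empty or the last token is not a number.
    """
    tokens = text.split()  # no argument: also splits on tabs and newlines
    if not tokens:
        return None
    token = tokens[-1].replace(',', '')  # tolerate "1,441" (an assumption)
    return int(token) if re.fullmatch(r'\d+', token) else None


bid = '認購總數 1441'
print(trailing_int(bid))
```

Validating the token before calling `int()` avoids a `ValueError` when the cell holds a placeholder instead of a number.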

User

Answer 3 · edited 2024-05-26 11:56:29

To handle all three variables (bid, ratio, and ask), one concise approach is to apply this re.sub substitution:

val = re.sub(r'[^!-~]', '', val)

Apply it to each of bid, ratio, and ask.

This removes every character that is not printable ASCII, and it removes spaces as well. If you want to keep spaces, use this instead:

val = re.sub(r'[^ -~]', '', val)

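You can also make the pattern more specific and keep only digits, ., :, %, or whichever characters are meaningful. The right choice depends on the other fields you may need to extract. For example: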

val = re.sub(r'[^0-9:\.%]', '', val)
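
To compare the three substitutions side by side, here is a small sketch applied to one made-up string. The sample text (a CJK label, a full-width space, and a ratio) is an assumption about what a scraped cell might look like, not actual page content.

```python
import re

# Hypothetical scraped text: CJK label, full-width space (U+3000), then a value.
raw = '認購/認沽比率\u30001.25 : 1'

# Keep printable ASCII except the space character.
printable_no_space = re.sub(r'[^!-~]', '', raw)
# Keep printable ASCII including the ASCII space.
printable_with_space = re.sub(r'[^ -~]', '', raw)
# Keep only digits, '.', ':' and '%'.
digits_only = re.sub(r'[^0-9:\.%]', '', raw)

print(repr(printable_no_space))    # '/1.25:1'
print(repr(printable_with_space))  # '/1.25 : 1'
print(repr(digits_only))           # '1.25:1'
```

Note that the ASCII slash inside the label survives the first two patterns, which is one reason the more specific character class can be the safer choice.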

Here is the complete working version:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

record = []
hksi = ['CKH']

url = "http://www.etnet.com.hk/www/tc/futures/futures_stockoptions.php?atscode={}&month=202101"

for s in hksi:
    response = requests.get(url.format(s))
    info = response.text
    soup = BeautifulSoup(info, "lxml")

    bid = soup.find('td', {'style': 'padding:10px 0 5px 10px; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ratio = soup.find('td', {'style': 'padding:10px 0 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ask = soup.find('td', {'style': 'padding:10px 10px 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text

    # Clean each field inside the loop so every symbol is processed,
    # keeping only printable, non-space ASCII characters
    bid, ratio, ask = (re.sub(r'[^!-~]', '', val) for val in (bid, ratio, ask))

    record.append({
        'symbol' : s,
        'bid' : bid,
        'ask' : ask,
        'ratio': ratio
    })
    for val in (bid, ratio, ask):
        print(val)
