<p>To handle the three variables <code>bid</code>, <code>ratio</code>, and <code>ask</code>, one concise approach is to use <a href="https://docs.python.org/3/library/re.html#re.sub" rel="nofollow noreferrer"><code>re.sub</code></a> with this substitution:</p>
<pre><code>val = re.sub(r'[^!-~]', '', val)
</code></pre>
<p>applied to each of <code>bid</code>, <code>ratio</code>, and <code>ask</code>.</p>
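<p>This removes every character that is not printable ASCII (the range <code>!</code> through <code>~</code>), which also removes spaces. If you want to keep spaces, use this instead:</p>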
<pre><code>val = re.sub(r'[^ -~]', '', val)
</code></pre>
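<p>You can also make the pattern more specific and keep only digits, <code>.</code>, <code>:</code>, <code>%</code>, or whatever characters make sense, depending on which other fields you may need to extract, for example:</p>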
<pre><code>val = re.sub(r'[^0-9:\.%]', '', val)
</code></pre>
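<p>To compare the three patterns side by side, here is a minimal sketch. The <code>raw</code> string is a made-up example of scraped cell text (it is not taken from the actual page), and the dictionary keys are illustrative names only:</p>

```python
import re

# Hypothetical scraped cell text (an assumption, not real page data):
# a no-break space, an ordinary space, an ideographic space and a newline
# are mixed in with the value.
raw = "\xa012.5 %\u3000(1:2)\n"

# Illustrative names mapped to the three patterns discussed above.
PATTERNS = {
    "printable_no_space": r"[^!-~]",    # keep '!' (0x21) through '~' (0x7E)
    "printable_with_space": r"[^ -~]",  # additionally keep ' ' (0x20)
    "numeric_only": r"[^0-9:\.%]",      # keep only digits, ':', '.', '%'
}

# Apply each pattern to the same input and collect the results.
cleaned = {name: re.sub(pattern, "", raw) for name, pattern in PATTERNS.items()}

for name, value in cleaned.items():
    print(f"{name}: {value!r}")
```

<p>With this input, the first pattern should yield <code>12.5%(1:2)</code>, the second <code>12.5 %(1:2)</code>, and the third <code>12.5%1:2</code>. The stricter the character class, the more of the surrounding punctuation is dropped, so choose the one that matches the fields you need.</p>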
<p>Here is the complete working version:</p>
<pre><code>import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

record = []
hksi = ['CKH']
url = "http://www.etnet.com.hk/www/tc/futures/futures_stockoptions.php?atscode={}&month=202101"

for s in hksi:
    response = requests.get(url.format(s))
    info = response.text
    soup = BeautifulSoup(info, "lxml")
    # Each cell is located by its exact inline style string.
    bid = soup.find('td', {'style': 'padding:10px 0 5px 10px; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ratio = soup.find('td', {'style': 'padding:10px 0 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    ask = soup.find('td', {'style': 'padding:10px 10px 5px 0; border-top:1px dotted #e2e2e2; font-weight:bold;'}).text
    # Strip everything except printable, non-space ASCII before storing,
    # so that the record holds the cleaned values.
    bid, ratio, ask = (re.sub(r'[^!-~]', '', val) for val in (bid, ratio, ask))
    record.append({
        'symbol': s,
        'bid': bid,
        'ask': ask,
        'ratio': ratio
    })
    for val in [bid, ratio, ask]:
        print(val)
</code></pre>