Is there an alternative to using a backslash (\) inside an f-string in Python?

Posted 2024-05-15 04:53:59


I am scraping this website, https://www.americanexpress.com/in/credit-cards/payback-card/, with Beautiful Soup and Python:

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')

details = []

for span in soup.select(".why-amex__subtitle span"):
    details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True)}')

print(details)

Output:

['EARN POINTS: Earn multiple Points from more than 50 PAYBACK partners2and 2 PAYBACK Points from American\xa0Express PAYBACK Credit\xa0Card for every Rs.\xa0100 spent', 'WELCOME GIFT: Get Flipkart voucher worth Rs. 7503on taking 3 transactions within 60 days of Cardmembership', 'MILESTONE BENEFITS: Flipkart vouchers4worth Rs. 7,000 on spending Rs. 2.5 lacs in a Cardmembership yearYou will earn a Flipkart voucher4worth Rs. 2,000 on spending Rs. 1.25 lacs in a Cardmembership year. Additionally, you will earn a Flipkart voucher4worth Rs. 5,000 on spending Rs. 2.5 lacs in a Cardmembership year.']

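As you can see in the output, there are \xa0 characters that need to be removed from the strings.

I tried using the replace function, but it cannot be used inside the f-string because the argument contains a backslash (\):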

details.append(f'{span.get_text(strip=True)}: {span.find_next("span").get_text(strip=True).replace("\xa0","")}')

Is there another way to do this?

Any help is much appreciated.


2 Answers

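You can use unicodedata to get rid of the \xa0 characters: NFKD normalization turns each non-breaking space into a regular space. Doing this before building the f-string keeps escape sequences out of the f-string expression: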

from urllib.request import urlopen
from bs4 import BeautifulSoup
from unicodedata import normalize

link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')

details = []

for span in soup.select(".why-amex__subtitle span"):
    a = normalize('NFKD', span.get_text(strip=True))
    b = normalize('NFKD', span.find_next("span").get_text(strip=True))
    details.append(f'{a}: {b}')

print(details)
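
To see what the normalization does without fetching the page, here is a minimal sketch that runs it on a fragment copied from the output in the question. The variable names raw and cleaned are only illustrative. Note that NFKD is broader than a targeted replace: it also decomposes other compatibility characters and accented letters, so it may change more than the \xa0 characters.

```python
from unicodedata import normalize

# Fragment copied from the question's output; "\xa0" is a non-breaking space (U+00A0).
raw = "American\xa0Express PAYBACK Credit\xa0Card for every Rs.\xa0100 spent"

# NFKD applies compatibility decomposition, which maps U+00A0 to an ordinary space.
cleaned = normalize("NFKD", raw)

print(repr(cleaned))
# 'American Express PAYBACK Credit Card for every Rs. 100 spent'
```

Because each \xa0 becomes a space rather than disappearing, words such as "American Express" stay separated.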

This may serve as a workaround: since .replace("\xa0","") does not work inside the f-string, apply it outside, before building the string:

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = 'https://www.americanexpress.com/in/credit-cards/payback-card/'
html = urlopen(link)
soup = BeautifulSoup(html, 'lxml')

details = []

for span in soup.select(".why-amex__subtitle span"):
    # Strip the non-breaking spaces first, then interpolate the cleaned values.
    element = span.get_text(strip=True).replace("\xa0", "")
    next_element = span.find_next("span").get_text(strip=True).replace("\xa0", "")
    details.append(f'{element}: {next_element}')

print(details)
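
If you would rather keep the replace call inside the f-string, one possible sketch is to keep the backslash out of the braces, either by naming the character beforehand or by building it with chr(160). The names NBSP, title and body below are hypothetical and the strings are sample data taken from the question's output, not a live scrape. This version replaces with " " instead of "" so that "American Express" does not collapse into one word; use "" if you really want the character dropped.

```python
# Hypothetical names for illustration only: NBSP, title, body.
NBSP = "\xa0"  # the escape sequence is written outside any f-string

title = "EARN POINTS"
body = "American\xa0Express PAYBACK Credit\xa0Card for every Rs.\xa0100 spent"

# Option 1: refer to the character by name inside the braces.
line_a = f'{title}: {body.replace(NBSP, " ")}'

# Option 2: chr(160) yields the same character without writing a backslash.
line_b = f'{title}: {body.replace(chr(160), " ")}'

print(line_a)
print(line_a == line_b)
```

Both options work on Python versions before 3.12, where a backslash is not allowed inside an f-string expression. From Python 3.12 onward that restriction is lifted, so the one-liner from the question is accepted as written.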
