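I have a list of names, and some of them contain accented characters. I would like to find each person's page without manually stripping the accents from the names, because the accents break the search. Is there a way to do this?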
import requests
from bs4 import BeautifulSoup
import pandas as pd
from pandas import DataFrame

base_url = 'https://basketball.realgm.com'
player_names = ['Ante Žižić', 'Anžejs Pasečņiks', 'Dario Šarić', 'Dāvis Bertāns', 'Jakob Pöltl']

# Empty DataFrame
result = pd.DataFrame()

for name in player_names:
    url = f'{base_url}/search?q={name.replace(" ", "+")}'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    if url == response.url:
        # Get all NBA players
        for player in soup.select('.tablesaw tr:has(a[href*="/nba/teams/"]) a[href*="/player/"]'):
            response = requests.get(base_url + player['href'])
            player_soup = BeautifulSoup(response.content, 'lxml')
            player_data = get_player_stats(search_name=player.text, real_name=name, player_soup=player_soup)
            result = result.append(player_data, sort=False).reset_index(drop=True)
    else:
        player_data = get_player_stats(search_name=name, real_name=name, player_soup=soup)
        result = result.append(player_data, sort=False).reset_index(drop=True)
Try answer 2 of the following question: "Replace non-ASCII characters with a single space" (the unidecode module).
^{} can handle both spaces and Unicode characters. Then, since you are working with a search string, a simple replace('-', '+') converts each - to +.
Output:
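Of course, the unidecode module that others mentioned works as well.
The URL does not seem to care about the case of the name.
Here are the links, so you can verify that it works.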
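For what it's worth, here is a rough sketch of that idea using only the standard library. It is a stand-in, not the function the answer above refers to: `slugify_name` and `search_query` are hypothetical names I made up, and the accent stripping relies on Unicode NFKD decomposition, which only works for letters that split into a base letter plus a combining mark (so something like "ł" or "ø" would simply be dropped).

```python
import re
import unicodedata


def slugify_name(name: str) -> str:
    """Hypothetical helper: ASCII-fold a name, lowercase it, join words with '-'."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping every non-ASCII code point removes those combining marks.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def search_query(name: str) -> str:
    """Hypothetical helper: turn the slug into the '+'-separated search form."""
    return slugify_name(name).replace("-", "+")


player_names = ['Ante Žižić', 'Anžejs Pasečņiks', 'Dario Šarić', 'Dāvis Bertāns', 'Jakob Pöltl']
for name in player_names:
    print(f"https://basketball.realgm.com/search?q={search_query(name)}")
```

The query comes out lowercase (for example `ante+zizic`), which should be fine given the remark above that the URL ignores case.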
You can install a package called unidecode.
Now, before processing the list any further, you can do the following:
Output:
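The original snippet for this step did not survive, so the following is only a minimal sketch of what it could look like. It assumes the package was installed with `pip install unidecode`. The `except ImportError` branch is my own addition so the example still runs without the package; it falls back to a standard-library approximation that happens to give the same result for these five names but is not equivalent to unidecode in general.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party: pip install unidecode
except ImportError:
    # Assumed fallback, not part of unidecode: strip combining marks via NFKD.
    def unidecode(text: str) -> str:
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")

player_names = ['Ante Žižić', 'Anžejs Pasečņiks', 'Dario Šarić', 'Dāvis Bertāns', 'Jakob Pöltl']

# Transliterate every name to plain ASCII before building the search URLs.
ascii_names = [unidecode(name) for name in player_names]
print(ascii_names)
```

After this, `ascii_names` can be used in place of `player_names` in the loop from the question, while the original spelling can still be passed as `real_name` if you keep both lists.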