Scraping country names from a Wikipedia page

Posted 2024-04-23 21:18:49


from bs4 import BeautifulSoup
from urllib.request import urlopen

webpage = urlopen('https://en.wikipedia.org/wiki/List_of_largest_banks')
bs = BeautifulSoup(webpage,'html.parser')
print(bs)
spanList= bs.find_all('span',{'class':'flagicon'})
for span in spanList:
        print(span.a['title'])

This prints the list of countries from the first table, but then fails with an error:

Traceback (most recent call last):
  File "C:/Users/Jegathesan/Desktop/python programmes/scrape5.py", line 10, in <module>
    print(span.a['title'])
TypeError: 'NoneType' object is not subscriptable

2 answers

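The original code checks every span tag in the parsed HTML.

The modified code collects all table tags found in the HTML and stores them in a list.

It then indexes into that list to get the span tags of one specific table (here, the first table).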

from bs4 import BeautifulSoup
from urllib.request import urlopen

webpage = urlopen('https://en.wikipedia.org/wiki/List_of_largest_banks')
bs = BeautifulSoup(webpage, 'html.parser')

# tableList holds every <table> element found in the page
tableList = bs.find_all('table')

# spanList holds the flag-icon spans of one table in tableList;
# change the list index to read a different table
spanList = tableList[0].find_all('span', {'class': 'flagicon'})

for span in spanList:
    try:
        print(span.a['title'])
    except (TypeError, KeyError):
        # span.a is None when the span has no <a> child (TypeError);
        # KeyError covers a link that has no title attribute
        print("title attribute not found.")
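
The traceback in the question comes from `span.a` being `None` for a span that contains no `<a>` tag. As a minimal offline sketch of the same idea, the snippet below runs against a small hand-written HTML string instead of the live page. The markup in `SAMPLE_HTML` and the helper name `flag_titles` are assumptions made for illustration; the real page's structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Wikipedia page: two tables,
# and one flag icon in the first table that has no <a> child.
SAMPLE_HTML = """
<table>
  <tr><td><span class="flagicon"><a title="China"></a></span></td></tr>
  <tr><td><span class="flagicon"><img alt="flag"></span></td></tr>
  <tr><td><span class="flagicon"><a title="Japan"></a></span></td></tr>
</table>
<table>
  <tr><td><span class="flagicon"><a title="France"></a></span></td></tr>
</table>
"""


def flag_titles(html, table_index=0):
    """Return link titles of the flag icons in one table, skipping icons without a titled link."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")  # every <table> element, in document order
    titles = []
    for span in tables[table_index].find_all("span", class_="flagicon"):
        link = span.a  # None when the span contains no <a> tag
        if link is not None and link.has_attr("title"):
            titles.append(link["title"])
    return titles


print(flag_titles(SAMPLE_HTML))                 # first table
print(flag_titles(SAMPLE_HTML, table_index=1))  # second table
```

Checking for `None` explicitly, rather than catching an exception, keeps the loop from hiding unrelated errors; whether to skip or report the spans without a link is a matter of preference.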
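
# Second answer: let pandas parse every table on the page
import pandas as pd

# read_html returns a list of DataFrames, one per table it finds
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_largest_banks")

# Print the second column of each of the first four tables
for table in tables[:4]:
    print(table.iloc[:, 1].values.tolist())
    print("*" * 100)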

Output:


['Industrial and Commercial Bank of China', 'China Construction Bank', 'Agricultural Bank of China', 'Bank of China', 'Mitsubishi UFJ Financial Group', 'JPMorgan Chase', 'HSBC Holdings PLC', 'Bank of America', 'BNP Paribas', 'Crédit Agricole', 'Citigroup Inc.', 'Japan Post Bank', 'Wells Fargo', 'Sumitomo Mitsui Financial Group', 'Mizuho Financial Group', 'Banco Santander', 'Deutsche Bank', 'Société Générale', 'Groupe BPCE', 'Barclays', 'Bank of Communications', 'Postal Savings Bank of China', 'Royal Bank of Canada', 'Lloyds Banking Group', 'ING Group', 'Toronto-Dominion Bank', 'China Merchants Bank', 'Crédit Mutuel', 'Norinchukin Bank', 'UBS', 'Industrial Bank (China)', 'UniCredit', 'Goldman Sachs', 'Shanghai Pudong Development Bank', 'Intesa Sanpaolo', 'Royal Bank of Scotland Group', 'China CITIC Bank', 'China Minsheng Bank', 'Morgan Stanley', 'Scotiabank', 'Credit Suisse', 'Banco Bilbao Vizcaya Argentaria', 'Commonwealth Bank', 'Standard Chartered', 'Australia and New Zealand Banking Group', 'Rabobank', 'Nordea', 'Westpac', 'China Everbright Bank', 'Bank of Montreal', 'DZ Bank', 'National Australia Bank', 'Danske Bank', 'State Bank of India', 'Resona Holdings', 'Commerzbank', 'Sumitomo Mitsui Trust Holdings', 'Ping An Bank', 'Canadian Imperial Bank of Commerce', 'U.S. Bancorp', 'CaixaBank', 'Truist Financial', 'ABN AMRO Group', 'KB Financial Group Inc', 'Shinhan Bank', 'Sberbank of Russia', 'Nomura Holdings', 'DBS Bank', 'Itaú Unibanco', 'PNC Financial Services', 'Huaxia Bank', 'Nonghyup Bank', 'Capital One', 'Bank of Beijing', 'The Bank of New York Mellon', 'Banco do Brasil', 'Hana Financial Group', 'OCBC Bank', 'Banco Bradesco', 'Handelsbanken', 'Caixa Econômica Federal', 'KBC Bank', 'China Guangfa Bank', 'Nationwide Building Society', 'Woori Bank', 'DNB ASA', 'SEB Group', 'Bank of Shanghai', 'United Overseas Bank', 'Bank of Jiangsu', 'La Banque postale', 'Landesbank Baden-Württemberg', 'Erste Group', 'Industrial Bank of Korea', 'Qatar National Bank', 'Banco Sabadell', 'Swedbank', 'BayernLB', 'State Street Corporation', 'China Zheshang Bank', 'Bankia']
****************************************************************************************************
['China', 'United States', 'Japan', 'France', 'South Korea', 'United Kingdom', 'Canada', 'Germany', 'Spain', 'Australia', 'Brazil', 'Netherlands', 'Singapore', 'Sweden', 'Italy', 'Switzerland', 'Austria', 'Belgium', 'Denmark', 'Finland', 'India', 'Luxembourg', 'Norway', 'Russia']
****************************************************************************************************
['JPMorgan Chase', 'Industrial and Commercial Bank of China', 'Bank of America', 'Wells Fargo', 'China Construction Bank', 'HSBC Holdings PLC', 'Agricultural Bank of China', 'Citigroup Inc.', 'Bank of China', 'China Merchants Bank', 'Royal Bank of Canada', 'Banco Santander', 'Commonwealth Bank', 'Mitsubishi UFJ Financial Group', 'Toronto-Dominion Bank', 'BNP Paribas', 'Goldman Sachs', 'Sberbank of Russia', 'Morgan Stanley', 'U.S. Bancorp', 'HDFC Bank', 'Itaú Unibanco', 'Westpac', 'Scotiabank', 'ING Group', 'UBS', 'Charles Schwab', 'PNC Financial Services', 'Lloyds Banking Group', 'Sumitomo Mitsui Financial Group', 'Bank of Communications', 'Australia and New Zealand Banking Group', 'Banco Bradesco', 'National Australia Bank', 'Intesa Sanpaolo', 'Banco Bilbao Vizcaya Argentaria', 'Japan Post Bank', 'The Bank of New York Mellon', 'Shanghai Pudong Development Bank', 'Industrial Bank (China)', 'Bank of China (Hong Kong)', 'Bank of Montreal', 'Crédit Agricole', 'DBS Bank', 'Nordea', 'Capital One', 'Royal Bank of Scotland Group', 'Mizuho Financial Group', 'Credit Suisse', 'Postal Savings Bank of China', 'China Minsheng Bank', 'UniCredit', 'China CITIC Bank', 'Hang Seng Bank', 'Société Générale', 'Barclays', 'Canadian Imperial Bank of Commerce', 'Bank Central Asia', 'Truist Financial', 'Oversea-Chinese Banking Corp', 'State Bank of India', 'State Street', 'Deutsche Bank', 'KBC Bank', 'Danske Bank', 'Ping An Bank', 'Standard Chartered', 'United Overseas Bank', 'QNB Group', 'Bank Rakyat']
****************************************************************************************************
['United States', 'China', 'United Kingdom', 'Canada', 'Australia', 'Japan', 'France', 'Spain', 'Brazil', 'India', 'Singapore', 'Switzerland', 'Italy', 'Hong Kong', 'Indonesia', 'Russia', 'Netherlands']
****************************************************************************************************
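
In the output above, column position 1 holds bank names in some tables and country names in others, so selecting by position mixes the two. One possible refinement, sketched here under assumptions, is to select by column label and skip tables that lack that label. The frames below are hand-built stand-ins for the list that `pd.read_html` returns; the labels "Rank", "Bank name" and "Country" and the helper name `country_columns` are hypothetical, and the real page's headers may differ.

```python
import pandas as pd

# Hand-built frames standing in for the list returned by pd.read_html.
# The column labels are assumptions, not taken from the real page.
tables = [
    pd.DataFrame({
        "Rank": [1, 2],
        "Bank name": ["Industrial and Commercial Bank of China", "China Construction Bank"],
    }),
    pd.DataFrame({
        "Rank": [1, 2],
        "Country": ["China", "United States"],
    }),
]


def country_columns(frames, label="Country"):
    """Return the values of the `label` column for each frame that has one."""
    return [frame[label].tolist() for frame in frames if label in frame.columns]


print(country_columns(tables))
```

With the real page, it would be worth printing `table.columns` for each table first to see which labels are actually present before relying on any of them.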
