import pandas as pd
# read_html returns a list of DataFrames.
# In this case we know only one table on the page matches
df = pd.read_html('http://www.basketball-reference.com/leagues/NBA_2015_per_poss.html',
                  attrs={'id': 'per_poss'})[0]
# the headers repeat every 20 lines, filtering them out
df = df[df['Rk'] != 'Rk']
# replace empty cells (NaN) with 0
# could also use the inplace=True kwarg instead of reassigning, or pass a
# dictionary to use a different value for each column
df = df.fillna(0)
My suggestion: use pandas.DataFrame. It can load data from many sources, including HTML, and its fillna method makes it easy to handle empty cells, as the example above shows.
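The dictionary form of fillna mentioned in the comments can be sketched as follows. This is only a minimal illustration: the small in-memory table and its values are made up to stand in for the scraped page (so it runs without network access), and the defaults chosen for each column are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped table: it contains one
# repeated header row and a couple of empty cells.
df = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Player": ["A", "B", "Player", "C"],
    "Pos": ["PG", None, "Pos", "C"],
    "PTS": ["20.1", None, "PTS", "15.3"],
})

# Drop the repeated header rows, as in the answer's code.
df = df[df["Rk"] != "Rk"].copy()

# Tables parsed from HTML often come back as strings, so convert the
# numeric column first; empty cells become NaN.
df["PTS"] = pd.to_numeric(df["PTS"])

# Pass a dict to fillna to use a different default for each column.
df = df.fillna({"PTS": 0, "Pos": "unknown"})

print(df)
```

Columns not listed in the dictionary are left untouched, which is useful when 0 is a sensible default for statistics but not for text fields.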