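IndexError: list index out of range?
I keep getting a "list index out of range" error when I run this code. The script parses a table on a website, visiting several pages of the site and writing the data into an Excel spreadsheet.
The error occurs on this line: revenue = cols[0].string. Here is the full script: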
from urllib.request import urlopen
from bs4 import BeautifulSoup
from openpyxl import Workbook
from openpyxl.cell import get_column_letter
import datetime
now = datetime.datetime.now()
wb = Workbook()
dest_filename = r'iOS Top Grossing Data.xlsx'
ws = wb.active
ws = wb.create_sheet()
ws.title = now.strftime("%m-%d-%y")
sh = wb.get_sheet_by_name('Sheet')
wb.remove_sheet(sh)
ws['A1'] = "REVENUE"
ws.column_dimensions['A'].width = 11
ws.cell('A1').style.alignment.horizontal = 'center'
ws.cell('A1').style.font.bold = True
ws['B1'] = "FREE"
ws.column_dimensions['B'].width = 7
ws.cell('B1').style.alignment.horizontal = 'center'
ws.cell('B1').style.font.bold = True
ws['C1'] = "PAID"
ws.column_dimensions['C'].width = 7
ws.cell('C1').style.alignment.horizontal = 'center'
ws.cell('C1').style.font.bold = True
ws['D1'] = "GAME"
ws.column_dimensions['D'].width = 27
ws.cell('D1').style.alignment.horizontal = 'center'
ws.cell('D1').style.font.bold = True
ws['E1'] = "PRICE"
ws.column_dimensions['E'].width = 7
ws.cell('E1').style.alignment.horizontal = 'center'
ws.cell('E1').style.font.bold = True
ws['F1'] = "REVENUE"
ws.column_dimensions['F'].width = 11
ws.cell('F1').style.alignment.horizontal = 'center'
ws.cell('F1').style.font.bold = True
ws['G1'] = "ARPU INDEX"
ws.column_dimensions['G'].width = 15
ws.cell('G1').style.alignment.horizontal = 'center'
ws.cell('G1').style.font.bold = True
ws['H1'] = "DAILY NEW USERS"
ws.column_dimensions['H'].width = 17
ws.cell('H1').style.alignment.horizontal = 'center'
ws.cell('H1').style.font.bold = True
ws['I1'] = "DAILY ACTIVE USERS"
ws.column_dimensions['I'].width = 19
ws.cell('I1').style.alignment.horizontal = 'center'
ws.cell('I1').style.font.bold = True
ws['J1'] = "ARPU"
ws.column_dimensions['J'].width = 7
ws.cell('J1').style.alignment.horizontal = 'center'
ws.cell('J1').style.font.bold = True
ws['K1'] = "RANK CHANGE"
ws.column_dimensions['K'].width = 14
ws.cell('K1').style.alignment.horizontal = 'center'
ws.cell('K1').style.font.bold = True
page = 0
while page < 6:
    page += 1
    url = "http://thinkgaming.com/app-sales-data/?page=" + str(page)
    html = str(urlopen(url).read())
    soup = BeautifulSoup(html)
    table = soup.find("table")
    counter = 0
    while counter < 51:
        rows = table.findAll('tr')[counter]
        cols = rows.findAll('td')
        revenue = cols[0].string
        revenue = revenue.replace('\\n', '')
        revenue = revenue.replace(' ', '')
        free = cols[1].string
        free = free.replace('\\n', '')
        free = free.replace(' ', '')
        paid = cols[2].string
        paid = paid.replace('\\n', '')
        paid = paid.replace(' ', '')
        game = cols[3].string
        price = cols[4].string
        price = price.replace('\\n', '')
        price = price.replace(' ', '')
        revenue2 = cols[5].string
        revenue2 = revenue2.replace('\\n', '')
        revenue2 = revenue2.replace(' ', '')
        dailynewusers = cols[6].string
        dailynewusers = dailynewusers.replace('\\n', '')
        dailynewusers = dailynewusers.replace(' ', '')
        cell_location = counter
        cell_location += 1
        ws['A'+str(cell_location)] = revenue
        counter += 1
wb.save(filename = dest_filename)
Here is the full error:
Traceback (most recent call last):
  File "C:\Users\shiver_admin\Desktop\script.py", line 89, in <module>
    revenue = cols[0].string
IndexError: list index out of range
1 Answer
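As mentioned in the comments, you are not finding any <td> tags, simply because there are none in that row, so index [0] does not exist. The first <tr> in this table is the header row: if you look closely, it contains header cells rather than <td> cells. In short, start your counter at 1 instead of 0.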
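A minimal sketch of that fix is shown here. To keep it runnable without network access it parses a small inline HTML snippet; the markup, the sample values, and the helper name `first_column_values` are all hypothetical stand-ins, not the real page. It also skips any row that has no <td> cells instead of relying on the starting index alone.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one header row followed by two data rows.
SAMPLE_HTML = """
<table>
  <tr><th>Revenue</th><th>Free</th><th>Game</th></tr>
  <tr class="odd"><td> 1 </td><td> 10 </td><td>Clash of Clans</td></tr>
  <tr class="even"><td> 2 </td><td> 31 </td><td>Example Game</td></tr>
</table>
"""


def first_column_values(html):
    """Return the stripped first-cell text of every data row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find("table").find_all("tr")
    values = []
    # Start at index 1 to skip the header row, which has <th> but no <td>.
    for row in rows[1:]:
        cols = row.find_all("td")
        if not cols:
            # Defensive check: skip any other row without <td> cells.
            continue
        values.append(cols[0].get_text().strip())
    return values


print(first_column_values(SAMPLE_HTML))
```

Iterating over the rows directly also avoids the hard-coded limit of 51, so a page with fewer rows does not raise the same IndexError from the other direction.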
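Another way to make sure you get the right rows is to check whether they have a class. If you look, the data <tr> rows have class names (for example odd and even), while the header row does not. You can use something like table.find_all("tr", class_=True) to get only those rows.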
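Sample code (note: this is written for Python 2.7, but it is easy to adapt to Python 3.x):
import requests as rq
from bs4 import BeautifulSoup as bsoup
url = "http://thinkgaming.com/app-sales-data/?page=1"
r = rq.get(url)
soup = bsoup(r.content)
# Select the data table, then keep only rows that carry a class attribute
# (the header row has none, so it is filtered out).
table = soup.find("table", class_="table")
rows = table.find_all("tr", class_=True)
# Collect the stripped text of every cell in the first data row.
cols = [td.get_text().strip().encode("utf-8") for td in rows[0].find_all("td")]
print cols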
Result:
['1', '10', '-', 'Clash of Clans', 'Free', 'n/a', '44,259']
[Finished in 2.8s]
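For Python 3, the same idea can be extended to every data row and the results written to a worksheet. This is only a sketch under stated assumptions: the inline HTML is a hypothetical stand-in for one page of the site, the helper names `parse_rows` and `rows_to_sheet` are made up for illustration, and only four of the columns are shown.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook

# Hypothetical sample markup standing in for one page of the real site.
SAMPLE_HTML = """
<table class="table">
  <tr><th>Revenue</th><th>Free</th><th>Paid</th><th>Game</th></tr>
  <tr class="odd"><td> 1 </td><td> 10 </td><td> - </td><td>Clash of Clans</td></tr>
  <tr class="even"><td> 2 </td><td> 31 </td><td> - </td><td>Example Game</td></tr>
</table>
"""


def parse_rows(html):
    """Return a list of cell-text lists, one per data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table")
    # class_=True keeps only rows with a class attribute, dropping the header.
    data_rows = table.find_all("tr", class_=True)
    return [
        [td.get_text().strip() for td in row.find_all("td")]
        for row in data_rows
    ]


def rows_to_sheet(rows, headers):
    """Build a workbook whose active sheet holds the headers and the rows."""
    wb = Workbook()
    ws = wb.active
    ws.append(headers)  # row 1: column headers
    for row in rows:
        ws.append(row)  # each call writes the next free row
    return wb


rows = parse_rows(SAMPLE_HTML)
wb = rows_to_sheet(rows, ["REVENUE", "FREE", "PAID", "GAME"])
print(rows)
# wb.save("iOS Top Grossing Data.xlsx")  # uncomment to write the file
```

Using ws.append means there is no cell index to compute by hand, so the row counter from the original script is not needed.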
Let us know if this helps.