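I want to scrape data from a website that is organized by province. To speed things up, I tried running two or more Python scripts at the same time, each crawling a different province. The only difference between the scripts is the set of URLs they fetch. Every time, they work fine for the first 30 seconds to a minute, but then each script fails with the following error, and the errors always appear at the same moment: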
Traceback (most recent call last):
  File "EOLGrades-A.py", line 157, in <module>
    dealEachCollege(url_2,tableName2,cNameList[i])
  File "EOLGrades-A.py", line 58, in dealEachCollege
    insertData(getData(sp),tableName,collegeName)
  File "EOLGrades-A.py", line 33, in getData
    if FAtd[x+j].text == '--' or FAtd[x+j].text ==' ':
IndexError: list index out of range

Traceback (most recent call last):
  File "EOLGrades-B.py", line 157, in <module>
    dealEachCollege(url_2,tableName2,cNameList[i])
  File "EOLGrades-B.py", line 58, in dealEachCollege
    insertData(getData(sp),tableName,collegeName)
  File "EOLGrades-B.py", line 33, in getData
    if FAtd[x+j].text == '--' or FAtd[x+j].text ==' ':
IndexError: list index out of range
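The traceback means FAtd[x+j] asked for a <td> cell that does not exist: the page that came back held fewer cells than count rows times the number of columns implies. When several scrapers hit the same site at once, one plausible cause (an assumption, not confirmed by the post) is that the server starts returning a different page, such as a rate-limit notice or an empty table. Since the getData code is not shown, here is a minimal sketch of a defensive version of the indexing step. It works on the flat list of cell texts (what FAtd[k].text would yield) and assumes 6 columns per row, matching the six fields insertData reads; cells_to_rows and COLUMNS are hypothetical names.

```python
from typing import List, Optional

# Assumption: each data row has 6 columns, matching m[i][0]..m[i][5] in insertData.
COLUMNS = 6


def cells_to_rows(cell_texts: List[str], count: int,
                  columns: int = COLUMNS) -> List[List[Optional[str]]]:
    """Group a flat list of <td> texts into `count` rows of `columns` cells.

    Raises ValueError with a readable message instead of IndexError when the
    page holds fewer cells than expected (e.g. an error or throttling page).
    """
    needed = count * columns
    if len(cell_texts) < needed:
        raise ValueError(
            f"expected {needed} cells for {count} rows, got {len(cell_texts)}; "
            "the page layout may differ from what the parser expects"
        )
    rows = []
    for x in range(0, needed, columns):  # x = offset of the current row
        row = []
        for j in range(columns):          # j = column within the row
            text = cell_texts[x + j]
            # Mirror the original check: '--' or ' ' means "no value".
            row.append(None if text in ('--', ' ') else text)
        rows.append(row)
    return rows
```

Catching this ValueError in the caller lets the script log the offending URL and retry it later instead of crashing the whole run.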
My getData method:
[code missing]

count comes from the getCount method:
def getCount(soup):
    FAtr = soup.find_all(name='tr')
    count = len(FAtr) - 1
    return count
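getCount assumes that every <tr> on the page belongs to the score table and that exactly one of them is a header. If the server returns a different page, or the layout has extra rows, count no longer matches the number of <td> cells, and that mismatch produces the IndexError above. A hypothetical alternative counts only rows that actually contain <td> cells. This sketch uses the standard-library html.parser so it runs without BeautifulSoup; DataRowCounter and count_data_rows are illustrative names.

```python
from html.parser import HTMLParser


class DataRowCounter(HTMLParser):
    """Count <tr> elements that contain at least one <td> (i.e. data rows).

    Header rows made only of <th> cells are not counted, so no hard-coded
    "- 1" is needed.
    """

    def __init__(self):
        super().__init__()
        self.count = 0
        self._in_row = False
        self._row_has_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._close_row()  # tolerate a missing </tr> before the next row
            self._in_row = True
        elif tag == 'td' and self._in_row:
            self._row_has_td = True

    def handle_endtag(self, tag):
        if tag == 'tr':
            self._close_row()

    def close(self):
        super().close()
        self._close_row()  # flush a final row that was never closed

    def _close_row(self):
        if self._in_row and self._row_has_td:
            self.count += 1
        self._in_row = False
        self._row_has_td = False


def count_data_rows(html: str) -> int:
    parser = DataRowCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

With BeautifulSoup the same idea is roughly sum(1 for tr in soup.find_all('tr') if tr.find('td')).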
The dealEachCollege method:
def dealEachCollege(URL,tableName,collegeName):
    page = s.get(URL,headers=headers)
    page.encoding='utf-8'
    sp = BeautifulSoup(page.text,"html.parser")
    count = getCount(sp)
    insertData(getData(sp,count),tableName,collegeName,count)
    page.close()
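dealEachCollege parses whatever the server returns without checking the status code or whether the page looks like the expected table. A minimal sketch, assuming the failures are transient server-side throttling, is a retry helper with exponential backoff that only hands a page to the parser once a validity check passes. fetch_with_retry, its parameters, and the default delays are hypothetical; fetch and is_valid are passed in as callables so the helper does not depend on requests.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[], str],
                     is_valid: Callable[[str], bool],
                     attempts: int = 4,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until is_valid() accepts the result, backing off between tries.

    The wait doubles after each failed attempt: base_delay, 2*base_delay, ...
    Raises RuntimeError if no attempt yields a valid page.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            text = fetch()
            if is_valid(text):
                return text
            last_error = "response did not look like the expected page"
        except Exception as exc:  # network errors, timeouts, HTTP errors
            last_error = repr(exc)
        if attempt < attempts - 1:  # no need to wait after the last attempt
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"giving up after {attempts} attempts: {last_error}")
```

With requests, fetch could wrap s.get(URL, headers=headers, timeout=10), call raise_for_status() on the response, set encoding to 'utf-8' and return .text; is_valid could be as simple as checking that '<td' appears in the text.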
The insertData method:
def insertData(dataList,tableName,collegeName,count):
    try:
        m=dataList
        if m[0]=='0':
            return
        for i in range(count):
            cursor.execute("INSERT INTO " + tableName + " VALUES (%s,%s,%s,%s,%s,%s,%s)",(collegeName,m[i][0],m[i][1],m[i][2],m[i][3],m[i][4],m[i][5]))
        conn.commit()
        print ("Successfully inserted into %s." %tableName)
    except pymysql.Error as e:
        print ("Mysql Error %d: %s" %(e.args[0], e.args[1]))
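insertData loops over range(count) rather than over the rows it actually received, so a short dataList would raise another IndexError, and it builds the SQL by concatenating the table name. A minimal sketch of a sturdier insert iterates over the parsed rows, whitelists table names (they cannot be bound as query parameters), and uses executemany with a single commit. It uses the standard-library sqlite3 purely so the sketch runs without a MySQL server; with pymysql the placeholders would be %s instead of ?, and pymysql cursors also provide executemany. insert_rows and ALLOWED_TABLES are hypothetical.

```python
import sqlite3
from typing import Sequence

# Hypothetical whitelist: table names cannot be bound as query parameters,
# so only accept names the script itself created.
ALLOWED_TABLES = {"grades_a", "grades_b"}


def insert_rows(conn: sqlite3.Connection, table: str, college: str,
                rows: Sequence[Sequence[str]]) -> int:
    """Insert (college, 6 fields) for each complete row; return how many were inserted."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # Iterate over the rows actually parsed instead of a separate `count`;
    # rows with fewer than 6 fields are skipped rather than raising IndexError.
    params = [(college, *row[:6]) for row in rows if len(row) >= 6]
    # sqlite3 uses '?' placeholders; pymysql would use '%s' instead.
    sql = f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?)"
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(sql, params)
    return len(params)
```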
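When I run only one script, the error never appears. Can anyone tell me how to fix this, or suggest another way to run two Python crawlers at the same time? Thanks a lot!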
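If the goal is only to go faster, one option (an assumption about what suits this site, not a confirmed fix) is to crawl all provinces from a single script with a small thread pool and one shared rate limiter, so adding workers does not multiply the request rate the server sees. crawl_all, RateLimiter, and the default values are hypothetical; fetch would wrap the download-and-parse part of dealEachCollege. Database inserts are best kept in the main thread, because a single pymysql connection should not be shared between threads.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Allow at most one call every `interval` seconds across all threads."""

    def __init__(self, interval: float):
        self.interval = interval
        self._lock = threading.Lock()
        self._next_time = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = self._next_time - now
            # Reserve the next slot before releasing the lock.
            self._next_time = max(now, self._next_time) + self.interval
        if delay > 0:
            time.sleep(delay)


def crawl_all(urls: Iterable[str], fetch: Callable[[str], T],
              workers: int = 2, interval: float = 1.0) -> Dict[str, T]:
    """Fetch every URL with `workers` threads, paced by one shared limiter."""
    limiter = RateLimiter(interval)

    def task(url: str) -> T:
        limiter.wait()  # global pacing shared by every worker
        return fetch(url)

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps input order; an exception in any task re-raises here.
        return dict(zip(url_list, pool.map(task, url_list)))
```

The interval then becomes the single knob controlling how hard the site is hit, regardless of how many provinces are processed in parallel.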