Calling a user-defined function inside a for loop in Python 2.7

from BeautifulSoup import BeautifulSoup
import requests
import re
from collections import defaultdict
import itertools
import pandas as pd

def wego(weburl,annot):
    print 'Go Term: ', weburl.split('=')[-1]
    html=requests.get(weburl).text
    soup=BeautifulSoup(html)
    desc=r"desc=\".*\""
    print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])
    pattern=r"Unigene.*A"
    idDF = pd.DataFrame(columns=['GeneID']) #creates a new dataframe
    idDF['GeneID'] = pd.Series(re.findall(pattern,str(soup))).unique()
    print "Total Go term is :",idDF.shape[0]
    old=pd.read_csv(annot,usecols=[0,7,8])
    getset=pd.merge(left=idDF,right=old,left_on=idDF.columns[0],\
                    right_on=old.columns[0])
    updown=getset.groupby(getset.columns[1]).count()
    print updown
    print "Max P-value: ","{:.3e}".format(getset['P-value'].max())

with open("gourl.txt") as ur:
    d=[]
    for url in ur:
        we=wego(url,annot="file.csv")
        d.append(we)

IndexError: list index out of range

IndexError                                Traceback (most recent call last)
<ipython-input-79-a852fe95d69c> in <module>()
      2     d=[]
      3     for url in ur:
----> 4         we=wego(url,annot="file.csv")
      5         d.append(we)

<ipython-input-4-9fdf25e75434> in wego(weburl, annot)
      5     soup=BeautifulSoup(html)
      6     desc=r"desc=\".*\""
----> 7     print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])
      8     pattern=r"Unigene.*A"
      9     idDF = pd.DataFrame(columns=['GeneID']) #creates a new dataframe

IndexError: list index out of range

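2 Answers

User

Answer 1 · edited 2024-06-16 13:18:46

The stack trace you posted already contains the answer. Its last line says that you tried to access a list element that does not exist ("out of range"), and the arrow points at this line: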

print "GO leave 2 term:",(re.findall(desc,str(soup))[0].split('"')[1])

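You index into a list twice on this line: once to take the first match of the pattern, and once to take the second item produced by split('"'). Either access fails if the list is shorter than expected.

So the second URL probably does not return a page containing the pattern you expect.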
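The failure can be reproduced without any network access. The sketch below uses two made-up page strings (both are assumptions for illustration, not the real pages) and the same `desc` pattern from the question: `re.findall` returns an empty list when nothing matches, and indexing that empty list with `[0]` raises the same `IndexError`.

```python
import re

desc = r"desc=\".*\""

# Hypothetical page contents, invented for this illustration.
page_with_desc = '<input desc="cellular process">'
page_without_desc = '<html><body>Not Found</body></html>'

def first_desc(page):
    # Same two index accesses as in the question:
    # [0] takes the first match, [1] takes the text between the quotes.
    return re.findall(desc, page)[0].split('"')[1]

print(re.findall(desc, page_with_desc))     # a list with one match
print(re.findall(desc, page_without_desc))  # an empty list

try:
    first_desc(page_without_desc)
except IndexError as exc:
    print("IndexError: %s" % exc)
```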

You could guard both accesses like this:

matches = re.findall(desc, str(soup))
tokens = []
if matches:
    tokens = matches[0].split('"')
if len(tokens) > 1:
    print("GO leave 2 term:", tokens[1])
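An alternative that avoids both index accesses is to let the regular expression capture the quoted text directly. This is only a sketch: the helper name `extract_desc` and the sample strings are hypothetical, and the pattern here is non-greedy, so it stops at the first closing quote instead of the last one as the original `desc` pattern does.

```python
import re

# Capture the text between the quotes that follow desc=
# (non-greedy, unlike the pattern in the question).
DESC_RE = re.compile(r'desc="(.*?)"')

def extract_desc(html, default=None):
    """Return the first desc="..." value in html, or default if absent."""
    match = DESC_RE.search(html)
    if match is None:
        return default
    return match.group(1)

# Made-up inputs for illustration.
print(extract_desc('<a desc="cellular process" id="x">'))
print(extract_desc('<html>no description here</html>'))
```

The caller can then test the result against `None` and skip or log the URL, rather than letting the whole loop stop on the first page that lacks the pattern.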

User

Answer 2 · edited 2024-06-16 13:18:46

Glad the question got an answer. The problem was in how my gourl.txt file was being read. Here is what I found:

>>> with open("wegourl.txt") as ur:
...     d=[]
...     for url in ur:
...         print url
...         

http://stackoverflow.com/questions=1

http://stackoverflow.com/questions=2

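The blank lines in the output show that every line read from the file still ends with a newline character. A URL with a trailing \n is not a valid URL, so the request does not return the expected page and the script breaks. I can get rid of the \n when reading the file: url=url.strip('\n')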
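A slightly more defensive version of that fix, sketched below, strips all surrounding whitespace and also skips lines that are empty after stripping. The helper name `clean_urls` is hypothetical, and the sample list stands in for the lines an open file would yield.

```python
def clean_urls(lines):
    """Yield each non-empty line with surrounding whitespace removed."""
    for line in lines:
        url = line.strip()
        if url:
            yield url

# Stand-in for iterating over the open file; a real file yields lines like these.
sample = [
    "http://stackoverflow.com/questions=1\n",
    "\n",
    "http://stackoverflow.com/questions=2\n",
]
urls = list(clean_urls(sample))
print(urls)
```

In the original loop this would read `for url in clean_urls(ur):`. Note that `wego` as posted has no return statement, so the list `d` would still collect `None` values unless the function returns something.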
