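I have a Python project that crawls some *.txt files and copies certain words from each *.txt file into a new file.
The project works fine on my first device, where the output is a .txt file with the correct content. On my second device it also runs without any errors, but it creates an empty .txt file.
The Python version is the same, and both machines run Windows 10. Do you know why?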
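Two things can differ between Windows 10 machines that have the same Python version, and neither raises an error in this script. One is the default text encoding that `open()` uses when no `encoding=` argument is given, which comes from the system locale. The other is the raw bytes of the saved page files themselves, for example a BOM or UTF-16 output from a different editor or browser. The sketch below is a hypothetical diagnostic, not a confirmed explanation of the problem. The helper names `describe_environment` and `describe_page` are illustrative. It prints these facts so the output from the two machines can be compared.

```python
import locale
import sys
from pathlib import Path


def describe_environment():
    """Print settings that can differ between two Windows machines."""
    print("Python:", sys.version)
    # open() without encoding= decodes files with this codec
    print("Default text encoding:", locale.getpreferredencoding(False))


def describe_page(path):
    """Return a few facts about the raw bytes of one saved page."""
    raw = Path(path).read_bytes()
    return {
        "size": len(raw),
        "utf8_bom": raw.startswith(b"\xef\xbb\xbf"),
        "utf16_bom": raw[:2] in (b"\xff\xfe", b"\xfe\xff"),
        # Many NUL bytes usually mean the file is UTF-16, not UTF-8/ANSI
        "nul_bytes": raw.count(b"\x00"),
        # False here means the search pattern cannot match these bytes as-is
        "has_marker": b'id="p-name">' in raw,
    }


if __name__ == "__main__":
    describe_environment()
    for j in range(10):
        page = Path("pages") / f"{j}.txt"
        if page.exists():
            print(page, describe_page(page))
```

If `has_marker` is `False` or `nul_bytes` is large on only one machine, the `re.search` call in the code below would find nothing there. The output file would then stay empty without any error being raised.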
Here is the code:
```python
import re

# patterns to find
pattern_name_start = r'id="p-name">'
pattern_name_end = r'</div>'

crawlfile = open("product-name.txt", "w")
for j in range(10):
    # creating file locations and assigning them to address
    address = "pages/{0}.txt".format(j)
    # opening the web page file, which is saved in .txt format, and reading its content
    pagesfile = open(address, "r")
    pagetext = pagesfile.read()
    # establishing the first character location of the product name and writing it to the file
    pn = ""
    product_name = ""
    matchname = re.search(pattern_name_start, pagetext)
    if matchname:
        strtchar = matchname.start()
        # 49 is the number of characters in id="p-name"> plus the number of spaces
        for i in range(49, 350):
            pn = pn + pagetext[strtchar + i]
        matchnameend = re.search(pattern_name_end, pn)
        if matchnameend:
            endchar = matchnameend.start()
            # 32 is the number of spaces
            for i in range(endchar - 33):
                product_name = product_name + pn[i]
            crawlfile.write(product_name + '\n')
    pagesfile.close()
crawlfile.close()
```
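The code above depends on hard-coded offsets (49 characters after the start marker, 33 before `</div>`) and on the platform's default encoding. Both can silently change from one machine to another. The following sketch swaps in a different technique: a regex capture group plus `str.strip()`, with the encoding stated explicitly. It is not the author's method. The names `extract_name`, `crawl`, and `PAGE_ENCODING` are hypothetical, and so is the assumption that the pages are UTF-8. With `utf-8-sig`, a leading BOM is skipped. A UTF-16 file would raise `UnicodeDecodeError` instead of quietly producing an empty result.

```python
import re
from pathlib import Path

# Assumption: the saved pages are UTF-8 (a BOM, if present, is skipped).
# Change this to whatever encoding the pages were actually saved in.
PAGE_ENCODING = "utf-8-sig"

# Capture everything between id="p-name"> and the next </div>, across newlines
NAME_RE = re.compile(r'id="p-name">(.*?)</div>', re.DOTALL)


def extract_name(pagetext):
    """Return the product name without surrounding whitespace, or None if absent."""
    match = NAME_RE.search(pagetext)
    if match is None:
        return None
    # strip() removes the padding that the original code counted by hand
    return match.group(1).strip()


def crawl(pages_dir="pages", out_path="product-name.txt", count=10):
    """Write one product name per page and report pages where nothing matched."""
    with open(out_path, "w", encoding="utf-8") as out:
        for j in range(count):
            path = Path(pages_dir) / f"{j}.txt"
            pagetext = path.read_text(encoding=PAGE_ENCODING)
            name = extract_name(pagetext)
            if name is None:
                print(f"no product name found in {path}")
                continue
            out.write(name + "\n")
```

Run `crawl()` from the project folder on both machines. A "no product name found" line shows which pages fail to match, rather than leaving an unexplained empty file.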