I'm trying to write a CSV file, using an input file that contains URLs and IDs, but I can't figure out how.
My input CSV file has the following format:
ID Links
P51800010436 https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTcxNzkmRGl2aXNpb249NiZVc2VySUQ9MzQ5MjAmUm9sZUlEPTEmQXBwSUQ9NzUzNjYmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTI2JkV4dEFwcElEPQ%3d%3d
P51800001202 https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTMxOTcmRGl2aXNpb249NiZVc2VySUQ9MjU5MjQmUm9sZUlEPTEmQXBwSUQ9MjM3MzQmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTk3JkV4dEFwcElEPQ%3d%3d
P51800000150 https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTY1NSZEaXZpc2lvbj02JlVzZXJJRD03MjU3JlJvbGVJRD0xJkFwcElEPTExOTY2JkFjdGlvbj1TRUFSQ0gmQ2hhcmFjdGVyRD04MSZFeHRBcHBJRD0%3d
P51800001785 https://maharerait.mahaonline.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTU2NjUmRGl2aXNpb249NiZVc2VySUQ9MjgxODEmUm9sZUlEPTEmQXBwSUQ9MjY4NjcmQWN0aW9uPVNFQVJDSCZDaGFyYWN0ZXJEPTIxJkV4dEFwcElEPQ%3d%3d
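A minimal sketch of reading that file into (ID, link) pairs with the standard csv module. It assumes the file is comma-separated with a header row ID,Links (the script below passes delimiter=','; the sample above may have lost its commas when it was pasted). The helper name read_pairs and the example.invalid URLs are placeholders, not part of the original.

```python
import csv
import io


def read_pairs(f):
    """Return a list of (rera_id, url) tuples from a two-column CSV file object."""
    reader = csv.reader(f, delimiter=",")
    next(reader, None)  # skip the "ID,Links" header row
    # Skip blank lines and keep only the first two columns of each row
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


# Usage with an in-memory sample; for the real file, pass
# open('D:/TF_Vishnu/link_with_rera_id.csv', newline='') instead.
sample = io.StringIO(
    "ID,Links\n"
    "P51800010436,https://example.invalid/a\n"
    "P51800001202,https://example.invalid/b\n"
)
pairs = read_pairs(sample)
print(pairs)
```

Keeping each ID next to its link in one tuple avoids maintaining two parallel lists that have to be re-paired later.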
The script I tried:
from datetime import datetime
start_time = datetime.now()
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import re
import csv

# Read the ID column and the link column from the input file
link = []
rera_id = []
with open('D:/TF_Vishnu/link_with_rera_id.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        rera_id.append(row[0])
        link.append(row[1])

# Fetch and parse each page
for index, rera_id, url in enumerate(rera_id, link):
    df_url = pd.read_csv(pd.compat.StringIO(url), header=None)
    df_rera_id = pd.read_csv(pd.compat.StringIO(rera_id), header=None)
    html = requests.get(url).content
    soup = BeautifulSoup(html, 'lxml')
    if (soup.find(text="Other Than Individual") == "Other Than Individual"):
        print("Processing Other Than Individual Link.......")
        table = soup.find_all("table", {"class": "table table-bordered table-responsive table-striped"})[1]
        # df and df_1 are not defined in the code as posted
        df_2 = pd.concat([df_rera_id, df_url, df, df_1], axis=1)
        df_2.to_csv('D:/scrape_data/test.csv', index=False, header=False, mode='a')
I want to write the output CSV with pandas so that the first column is rera_id, the second is the link, the third is the scraped data, and so on.
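One way to get that column order, sketched under assumptions: build one list per link (ID first, link second, then the scraped values) and create the DataFrame once at the end, instead of concatenating single-value DataFrames for every row. scrape_values is a hypothetical stand-in for the BeautifulSoup table parsing, which is not shown in the question; here it returns fixed placeholder values so the sketch runs offline.

```python
import pandas as pd


def scrape_values(url):
    # Hypothetical placeholder: the real version would fetch the page with
    # requests, parse it with BeautifulSoup and return the table cell texts.
    return ["value-1", "value-2"]


def build_frame(pairs):
    """Build a DataFrame whose columns are rera_id, link, then the scraped values."""
    rows = [[rera_id, url, *scrape_values(url)] for rera_id, url in pairs]
    return pd.DataFrame(rows)


pairs = [("P51800010436", "https://example.invalid/a"),
         ("P51800001202", "https://example.invalid/b")]
df = build_frame(pairs)

# Given a path such as 'D:/scrape_data/test.csv', to_csv writes the file;
# with no path it returns the CSV text as a string instead.
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```

Writing once at the end also avoids the repeated mode='a' appends; if the rows must be appended incrementally, to_csv(path, mode='a', header=False, index=False) per batch works the same way.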
Any suggestions would be appreciated. Apologies for any mistakes.
The error I'm getting:
TypeError: 'list' object cannot be interpreted as an integer
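The problem is how you are using the built-in enumerate function. Its second (optional) argument is not treated as another iterable; it is the start value of the counter (index in your case), which is why it has to be an integer. It is better to enumerate reader directly.
Hope this helps.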