Python web scraper

1 answer

User

#1 · Posted 2024-04-26 06:33:56

Here is one way to loop over an array of URLs and import data from each one.

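import json
import urllib.request

# Read one date per line. Assumption: each line looks like YYYY-MM-DD.
with open("C:/Users/rshuell001/Desktop/dates/dates.txt") as f:
    dateslist = f.read().split("\n")

for thedate in dateslist:
    thedate = thedate.strip()
    if not thedate:
        continue  # skip blank lines
    theyear, themonth, theday = thedate.split("-")

    url = ("http://www.hockey-reference.com/friv/dailyleaders.cgi?month=" + themonth
           + "&day=" + theday + "&year=" + theyear)
    # Assumption carried over from the original snippet: the response is JSON
    # with a "data_values" key. Adjust the parsing if the site returns HTML.
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    datapoints = data["data_values"]

    # Write one comma-separated line per data point, prefixed with the date.
    with open("C:/Users/rshuell001/Desktop/dates/" + thedate + ".txt", "w") as myfile:
        for point in datapoints:
            myfile.write(thedate + "," + str(point[0]) + "," + str(point[1]) + "\n")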
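
Gluing the query string together by hand is where the script above is most likely to break (stray spaces, missing `+` signs). A minimal sketch of building the same URL with the standard library instead, assuming each date is written as YYYY-MM-DD; the helper name `build_daily_leaders_url` and that date format are my own assumptions, not part of the original answer:

```python
from datetime import datetime
from urllib.parse import urlencode

BASE = "http://www.hockey-reference.com/friv/dailyleaders.cgi"


def build_daily_leaders_url(date_text):
    """Return the daily-leaders URL for a date given as YYYY-MM-DD (assumed format)."""
    parsed = datetime.strptime(date_text.strip(), "%Y-%m-%d")
    # urlencode joins the pairs with "&" and escapes anything that needs it.
    query = urlencode({"month": parsed.month, "day": parsed.day, "year": parsed.year})
    return BASE + "?" + query


print(build_daily_leaders_url("2016-03-05"))
```

A malformed line in the dates file raises `ValueError` from `strptime`, so bad input fails before any request is sent.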

import requests
from bs4 import BeautifulSoup

# Leave the page number off the end so it can be appended inside the loop.
base_url = "http://www.privredni-imenik.com/pretraga?abcd=&keyword=&cities_id=0&category_id=0&sub_category_id=0&page="
current_page = 1

while current_page < 200:
    print(current_page)
    url = base_url + str(current_page)
    r = requests.get(url)
    zute_soup = BeautifulSoup(r.text, 'html.parser')
    # Each listing is a <div class="jobs-item">.
    firme = zute_soup.find_all('div', {'class': 'jobs-item'})

    for title in firme:
        title1 = title.find_all('h6')[0].text
        descriptions = title.find_all('div', {'class': 'description'})
        adresa = descriptions[0].text   # first description block: address
        kontakt = descriptions[1].text  # second description block: contact
        page_line = "{title1}\n{adresa}\n{kontakt}".format(
            title1=title1,
            adresa=adresa,
            kontakt=kontakt
        )
        print(page_line)
        print('\n')
    current_page += 1
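
The second script only prints each record. If the goal is to keep the data, one option is to collect the three fields per listing and write them out as CSV. This is a rough sketch rather than part of the original answer: `write_records`, the column names, and the sample row are all made up for illustration.

```python
import csv
import io


def write_records(records, fileobj):
    """Write (name, address, contact) tuples to fileobj as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["name", "address", "contact"])
    for title1, adresa, kontakt in records:
        # Scraped text often carries stray whitespace and newlines; trim it.
        writer.writerow([title1.strip(), adresa.strip(), kontakt.strip()])


# Placeholder row standing in for values scraped in the loop above.
sample = [("Firma A\n", " Ulica 1, Beograd ", "011 123 456")]
buffer = io.StringIO()
write_records(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")`. The `csv` module quotes fields that contain commas, so addresses survive intact.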

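Keep in mind that there are many, many ways to do this kind of thing, and every site is different from every other, so whatever you end up with will be highly customized and specific to its intended use.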
