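I have tried many ways to clean up this data, but none of them work. The strip() and replace() methods, used as shown in picture 1, have no effect. Please help me.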
`import requests
from lxml import html, etree
from selenium import webdriver
import time

file_name = 'dubanxinlixue.json'
driver = webdriver.Chrome()
url_string = []
name_data, price_data = [], []
jd_goods_data = {}

# Build the list of page URLs: start=0, 20, ..., 980
page = 0
while True:
    url = 'https://book.douban.com/tag/%E5%BF%83%E7%90%86%E5%AD%A6?start={page}&type=S'.format(page=page)
    url_string.append(url)
    page += 20
    if page > 980:
        break

for i in url_string:
    driver.get(i)
    base_html = driver.page_source
    selctor = etree.HTML(base_html)
    # XPath positions are 1-based, so iterate over li[1] .. li[20]
    for j in range(1, 21):
        name = '//*[@id="subject_list"]/ul/li[%d]/div[2]/h2/a[1]/@title' % j
        get_name = selctor.xpath(name)[0]
        describe = '//*[@id="subject_list"]/ul/li[%d]/div[2]/div[1]/text()' % j
        get_describe = selctor.xpath(describe)[0]
        # The attempted cleanup: this is the call that does not work
        get_describe.string.strip()
        print(get_describe)`
The value of get_describe looks like this: [the result of get_describe][1]
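One thing worth checking with the code above: Python strings are immutable, so strip() and replace() return a new string and leave the original untouched. If the return value is not assigned, the printed text will look unchanged. Below is a minimal sketch of that behaviour using only the standard library. The `sample` string is made up to resemble a text node surrounded by newlines and indentation (it is not real page data), and `clean_text` is a hypothetical helper name, not something from lxml or selenium.

```python
def clean_text(raw):
    """Strip the ends of raw and collapse inner whitespace runs to one space."""
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops the empty pieces at both ends.
    return " ".join(raw.split())


# Made-up sample: a text node padded with newlines and indentation.
sample = "\n\n        Author Name / Some Publisher / 2010-1 / 39.00\n\n      "

# Calling strip() without using the result changes nothing:
sample.strip()

# Keeping the result is what makes the cleanup visible:
stripped = sample.strip()
cleaned = clean_text(sample)

print(repr(sample))
print(repr(stripped))
print(repr(cleaned))
```

Assuming the XPath result behaves like an ordinary str, the same idea would apply inside the loop by assigning the cleaned value back to get_describe before printing it.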