How to convert a .php HTML page to CSV in Python

2 answers

Anonymous user

Answer 1 · edited 2024-05-13 19:28:06

Using requests and lxml:

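import requests
from lxml.html import fromstring
# lxml >= 5.2 ships the cleaner separately: pip install lxml_html_clean
from lxml.html.clean import Cleaner


# download the page
url = 'http://water.weather.gov/ahps2/crests.php?wfo=lch&gage=bsll1&crest_type=historic'
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error status
html = response.text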

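Now you have the raw HTML text, and you need to strip the tags out. Here we use lxml, a Python library for processing HTML/XML text. Its fromstring function parses a string into an element.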

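# parse the raw HTML string into an element tree
doc = fromstring(html)

# Cleaner options: the answer's original list was lost, so these values are
# an illustrative assumption. remove_tags strips the tags but keeps their text.
tags = ['div', 'span', 'h1', 'h2', 'h3']
args = {'remove_tags': tags}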
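To see what fromstring and the /html/body XPath give back without touching the network, here is a minimal sketch on a made-up page. The HTML below is an assumption for illustration, not the real crests page:

```python
from lxml.html import fromstring

# Made-up page standing in for response.text, so this runs offline.
sample_html = """
<html>
  <head><title>Crests</title></head>
  <body>
    <div class="water_information">
      <h3>Historic Crests</h3>
      <span>(1) 12.50 ft on 01/01/2000</span>
    </div>
  </body>
</html>
"""

doc = fromstring(sample_html)       # a whole document parses to the <html> root
body = doc.xpath('/html/body')[0]   # xpath() always returns a list
# text_content() joins every text node; collapse the layout whitespace
text = ' '.join(body.text_content().split())
print(doc.tag, '|', text)
```

Because xpath() returns a list, indexing with [0] raises IndexError if the page has no body, which is a quick signal that the download went wrong.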

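Next, decide which tags to remove. The Cleaner class removes problematic markup from an HTML document, so we create a Cleaner object and pass it, as keyword arguments, the options to blacklist (and the tags to remove). See the lxml Cleaner class documentation for each option's default value. Note that remove_tags strips only the tags, not their content.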
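A small offline check of the remove_tags behaviour, sketched on a made-up fragment. It assumes lxml is installed and, for lxml 5.2 or newer, the separate lxml_html_clean package that now provides lxml.html.clean:

```python
from lxml.html import fromstring
# In lxml >= 5.2 this import needs the separate lxml_html_clean package
# (pip install lxml_html_clean).
from lxml.html.clean import Cleaner

# Made-up fragment; fromstring() returns the single <div> element here.
fragment = fromstring('<div><p>Crest: <span>12.50</span> ft</p></div>')

# remove_tags drops the <p> and <span> tags themselves but keeps their text.
cleaner = Cleaner(remove_tags=['p', 'span'])
cleaned = cleaner.clean_html(fragment)   # given an element, returns an element

print(cleaned.tag, repr(cleaned.text_content()))
```

The surviving text is merged into the parent <div>, which is why text_content() on the cleaned body is enough to get the readable data.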

cleaner = Cleaner(**args)
path = '/html/body'
body = doc.xpath(path)[0]  # only the <body> of the response is needed
clean_response = cleaner.clean_html(body).text_content()  # strip the markup, keep the text

# split into lines
table = clean_response.splitlines()

# parse the lines however you like
# your code here
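One way to fill in that last step, sketched under the assumption that each useful line holds whitespace-separated fields: drop the blank lines and write the rest with the standard csv module. The helper name lines_to_csv and the sample lines are hypothetical:

```python
import csv
import io


def lines_to_csv(lines, out):
    """Write each non-blank line as one CSV row, splitting on whitespace.

    Splitting on whitespace is an assumption about the page layout;
    adapt it to the columns the real text has.
    """
    writer = csv.writer(out)
    rows = 0
    for line in lines:
        fields = line.split()   # no argument: split on any run of whitespace
        if fields:              # skip blank lines left behind by the markup
            writer.writerow(fields)
            rows += 1
    return rows


# Hypothetical stand-in for `table` from the answer above
table = ['', '  (1) 12.50 ft on 01/01/2000', '   ', '(2) 11.00 ft on 02/02/2001']

buf = io.StringIO()
count = lines_to_csv(table, buf)
print(count)
print(buf.getvalue())
```

To write a real file, open it with newline='' as the csv module documentation recommends, e.g. with open('crests.csv', 'w', newline='') as f: lines_to_csv(table, f). Splitting on whitespace is the crudest option; if a field can itself contain spaces, a regular expression is safer.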

Anonymous user

Answer 2 · edited 2024-05-13 19:28:06

Extracting data from a website like this is called web scraping.

This code should help:

from urllib.request import urlopen  # Python 3 replacement for urllib2

from bs4 import BeautifulSoup

url = 'http://water.weather.gov/ahps2/crests.php?wfo=lch&gage=bsll1&crest_type=historic'
# read the HTML page with urlopen()
r = urlopen(url).read()

# create a soup to navigate through the tags
soup = BeautifulSoup(r, 'lxml')

# find the data inside the <div> with class "water_information"
results = soup.find('div', {'class': 'water_information'})
if results is None:
    raise SystemExit('no <div class="water_information"> found on the page')

# keep only the text from the results
water_data = results.text

# write this text to an output file
with open('outputfile.txt', 'w', encoding='utf-8') as f:
    f.write(water_data)

Here is a sample of what my outputfile.txt contains:

[sample output not preserved]

Now you can easily process the water_data string with regex and split() to build your own CSV file.
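A minimal sketch of the regex route, assuming each crest appears in water_data as a line like "(1) 12.50 ft on 01/01/2000". That format, the pattern CREST_RE, the helper crests_to_csv and the sample text are all assumptions to check against your own outputfile.txt:

```python
import csv
import io
import re

# Hypothetical pattern for lines like "(1) 12.50 ft on 01/01/2000":
# rank in parentheses, stage in feet, then an MM/DD/YYYY date.
CREST_RE = re.compile(r'\((\d+)\)\s+([\d.]+)\s*ft\s+on\s+(\d{2}/\d{2}/\d{4})')


def crests_to_csv(water_data, out):
    """Write one CSV row per crest found in water_data; return the row count."""
    writer = csv.writer(out)
    writer.writerow(['rank', 'stage_ft', 'date'])   # header row
    count = 0
    for rank, stage, date in CREST_RE.findall(water_data):
        writer.writerow([rank, stage, date])
        count += 1
    return count


# Made-up text standing in for water_data
water_data = 'Historic Crests\n(1) 12.50 ft on 01/01/2000\n(2) 11.00 ft on 02/02/2001\n'

buf = io.StringIO()
n = crests_to_csv(water_data, buf)
print(n)
print(buf.getvalue())
```

With three capture groups, findall returns one tuple per match, so lines that don't fit the pattern (headings, notes) are skipped instead of producing broken rows.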

You didn't expect me to write it all for you, did you? :P
