<p>The process of extracting data from a website is called <a href="https://en.wikipedia.org/wiki/Web_scraping" rel="nofollow noreferrer">web scraping</a>.</p>
<p>This code should help you:</p>
<pre><code>from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for Python 2's urllib2

url = 'http://water.weather.gov/ahps2/crests.php?wfo=lch&gage=bsll1&crest_type=historic'
# read the HTML page using urlopen()
r = urlopen(url).read()
# create a soup object to navigate through the tags
soup = BeautifulSoup(r, 'lxml')
# find the div tag that has the water_information class
results = soup.find('div', {'class': 'water_information'})
# get only the text from the results soup
water_data = results.text
# write this info to an output file
with open('outputfile.txt', 'w') as f:
    f.write(water_data)
</code></pre>
<p>Here is a sample of the contents of my <code>outputfile.txt</code>:</p>
^{pr2}$
<p>Now you can easily process the <code>water_data</code> string with <code>regex</code> and <code>split()</code> to create your own CSV file.</p>
<p>You didn't expect me to write that part for you, did you? <code>:P</code></p>
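<p>That said, the regex-and-<code>split()</code> step mentioned above could be sketched roughly as follows (Python 3). This is only an illustration: it assumes each crest line in <code>water_data</code> looks like <code>(1) 12.50 ft on 01/02/2000</code>, which may not match the real page, and the names <code>CREST_PATTERN</code>, <code>parse_crests</code>, <code>write_csv</code>, and <code>crests.csv</code> are hypothetical. Adjust the pattern to whatever your <code>outputfile.txt</code> actually contains.</p>

```python
import csv
import re

# ASSUMPTION: each crest line looks like "(1) 12.50 ft on 01/02/2000".
# This layout is a guess; change the pattern to match your real data.
CREST_PATTERN = re.compile(
    r"\((\d+)\)\s+([\d.]+)\s+ft\s+on\s+(\d{2}/\d{2}/\d{4})"
)


def parse_crests(water_data):
    """Return a list of (rank, height_ft, date) tuples for matching lines."""
    rows = []
    # Split on newlines; a bare split() would also break on every space.
    for line in water_data.split("\n"):
        match = CREST_PATTERN.search(line.strip())
        if match:
            rank, height, date = match.groups()
            rows.append((int(rank), float(height), date))
    return rows


def write_csv(rows, path):
    """Write the parsed rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["rank", "height_ft", "date"])
        writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample text, used only to illustrate the parsing step.
    sample = "Historic Crests\n(1) 12.50 ft on 01/02/2000\n(2) 10.00 ft on 03/04/1999\n"
    write_csv(parse_crests(sample), "crests.csv")
```

<p>In practice you would pass the scraped <code>water_data</code> string to <code>parse_crests()</code> instead of the sample text. Lines that do not match the pattern, such as headings, are simply skipped.</p>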