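Scraping data with Scrapy
I want to scrape data from this link: http://money.moneygram.com.au/ (the link could also be https://www.moneygram.com/wps/portal/moneygramonline/home/estimator?LC=en-GB). When I open the page to scrape it, the exchange rate shown is from Australian dollars (AUD) to US dollars (USD). However, I want Indian rupees (INR) selected in the first drop-down menu. Every time I select it and then use this URL, it still defaults to AUD. The code I'm currently using is...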
from __future__ import absolute_import
#import __init__
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import MySQLdb

class DmozSpider(BaseSpider):
    name = "moneygram"
    allowed_domains = ["moneygram.com"]
    start_urls = ["http://money.moneygram.com.au/"]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)
        hxs = HtmlXPathSelector(response)
The part of the HTML where I want to select Indian rupees is...
<div class="firstSelector">
<select id="FromCurrency_dropDown" name="FromCurrency_dropDown" style="width:100%">
<option value="AED">AED (UAE Dirham)</option>
<option value="ARS">ARS (Argentine Peso)</option>
<option selected="selected" value="AUD">AUD (Australian Dollar)</option>
<option value="BGN">BGN (Bulgarian Lev)</option>
<option value="BND">BND (Brunei Dollar)</option>
<option value="BRL">BRL (Brazilian Real)</option>
<option value="CAD">CAD (Canadian Dollar)</option>
<option value="CHF">CHF (Swiss Franc)</option>
<option value="CLP">CLP (Chilean Peso)</option>
<option value="CNH">CNH (Chinese Renminbi Off-Shore)</option>
<option value="CNY">CNY (Chinese Yuan)</option>
<option value="CZK">CZK (Czech Koruna)</option>
<option value="DKK">DKK (Danish Kroner)</option>
<option value="EGP">EGP (Egyptian Pound)</option>
<option value="EUR">EUR (Euro)</option>
<option value="FJD">FJD (Fiji Dollar)</option>
<option value="GBP">GBP (British Pound)</option>
<option value="HKD">HKD (Hong Kong Dollar)</option>
<option value="HUF">HUF (Hungarian Forint)</option>
<option value="IDR">IDR (Indonesian Rupiah)</option>
<option value="ILS">ILS (Israeli New Shekel)</option>
<option value="INR">INR (Indian Rupee)</option> <!-- I want this one to be selected -->
<option value="ISK">ISK (Icelandic Krona)</option>
<option value="JPY">JPY (Japanese Yen)</option>
<option value="KRW">KRW (Korean Won)</option>
<option value="KWD">KWD (Kuwaiti Dinar)</option>
<option value="LKR">LKR (Sri Lanka Rupee)</option>
<option value="MAD">MAD (Moroccan Dirham)</option>
<option value="MGA">MGA (Malagasy Ariary)</option>
<option value="MXN">MXN (Mexican Peso)</option>
<option value="MYR">MYR (Malaysian Ringgit)</option>
<option value="NOK">NOK (Norway Kroner)</option>
<option value="NZD">NZD (New Zealand Dollar)</option>
<option value="OMR">OMR (Omani Rial)</option>
<option value="PEN">PEN (Peruvian Nuevo Sol)</option>
<option value="PGK">PGK (Papua New Guinea Kina)</option>
</select>
</div>
1 Answer
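This is an AJAX issue.
In short, a JavaScript script on the page sends some parameters to the server, and the server sends the data back. Choosing a currency in the drop-down therefore never changes the page URL, which is why reloading the same URL always falls back to AUD.
You can use Chrome's developer tools to see exactly which URL the data is sent to.
That URL is "http://money.moneygram.com.au/forex-tools/currency-converter-widget-part".
The parameters sent are: "FromCurrency=AED&ToCurrency=VND&FromCurrency_dropDown=AED&ToCurrency_dropDown=VND&FromAmount=2561&ToAmount=&X-Requested-With=XMLHttpRequest".
So you can use Scrapy to send these parameters to that URL, get the HTML back, and then parse out the information you want.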
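The captured parameter string can be edited so that the source currency is INR instead of AED. Below is a minimal sketch using only the standard library. The helper name with_source_currency is hypothetical, and it assumes the server accepts "INR" in the two FromCurrency fields the same way it accepts "AED".

```python
from urllib.parse import parse_qsl, urlencode

# Parameter string as captured in the browser's network panel (copied from the answer).
CAPTURED = (
    "FromCurrency=AED&ToCurrency=VND&FromCurrency_dropDown=AED"
    "&ToCurrency_dropDown=VND&FromAmount=2561&ToAmount="
    "&X-Requested-With=XMLHttpRequest"
)


def with_source_currency(captured, code):
    """Return the captured parameters as a dict with the source currency replaced.

    Hypothetical helper: assumes both FromCurrency fields must carry the same code.
    """
    # keep_blank_values=True preserves the empty ToAmount field.
    params = dict(parse_qsl(captured, keep_blank_values=True))
    params["FromCurrency"] = code
    params["FromCurrency_dropDown"] = code
    return params


inr_params = with_source_currency(CAPTURED, "INR")
print(urlencode(inr_params))
```

The resulting dict can be passed directly as form data to whichever HTTP client sends the request.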