HTML:
<td>
<p>China's Changing Trade Structure and its Implications
<br>
Kevin Chow, Xiao Hong, John Fu and Sylvia Li
</p>
<p>25 August 2017
<br>
<a href="/media/eng/publication-and-research/research/research-memorandums/2017/RM13-2017.pdf" target="_blank">Full Paper</a>
(PDF File, 465KB)
</p>
</td>
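I have selected the `a` element shown above and am trying to get the title, "China's Changing Trade Structure and its Implications", and the date, "25 August 2017", each through a path relative to `a`. But I can't get them. The code is as follows: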
for a in response.xpath('//div[@class="prContent"]//a[@href]'):
    url = response.urljoin(a.xpath('@href').extract_first())
    # extract_text is the asker's own helper (its definition is not shown)
    title = extract_text(a.xpath('../../p[1]/text()[1]'))
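You can get the desired output with expressions relative to `a`, starting from its parent `<p>`.
To get "China's Changing Trade Structure and its Implications", take the first text node of the `<p>` that precedes the parent `<p>`.
To get "25 August 2017", take the first text node of the parent `<p>` itself.
Also, this only works if the link (`a`) is selected correctly.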