<p>Use the XML parser's <code>recover</code> option:</p>
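<pre><code>from lxml import etree

# recover=True tells the parser to keep going past malformed input instead of raising
parser = etree.XMLParser(recover=True)
EstadoDoc_root = etree.fromstring(sub_element.text, parser=parser)
</code></pre>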
<p>Then extract the URLs from the recovered tree (or adapt the lookup to whatever you need).</p>
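<p>As a minimal sketch of that step, assuming the embedded document looks roughly like the sample below: the element names <code>EstadoDoc</code>, <code>UrlPdf</code> and <code>UrlCaratula</code>, the <code>example.invalid</code> URLs and the helper <code>urls_with_recover</code> are all assumptions made for illustration, not taken from the original data.</p>

```python
from lxml import etree

# Hypothetical sample: the second URL contains a bare "&", which is not valid XML.
SAMPLE = (
    "<EstadoDoc>"
    "<UrlPdf>http://example.invalid/PDFServlet?docId=abc</UrlPdf>"
    "<UrlCaratula>http://example.invalid/XMLServlet?docId=abc&kind=xml</UrlCaratula>"
    "</EstadoDoc>"
)


def urls_with_recover(markup):
    """Parse leniently and return the text of the two URL elements."""
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(markup, parser=parser)
    # XML element names are case-sensitive, so match the (assumed) original casing.
    return [el.text for el in root.iter("UrlPdf", "UrlCaratula")]


urls = urls_with_recover(SAMPLE)
print(urls)
```

<p>The first URL comes back intact. The second one may lose or mangle the part around the bare <code>&amp;</code>, because a recovering XML parser has to discard input it cannot interpret.</p>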
<blockquote>
<p>The second URL is missing the portion of the URL that comes after &
... Is there a way to avoid this?</p>
</blockquote>
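<p>Use the HTML parser instead, which normalizes the markup and tolerates the offending characters (note that the tag names become lowercase):</p>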
<pre><code>from lxml import html

# html.fromstring() expects a string, so pass the element's text
# (assuming sub_element.text holds the embedded markup, as in the XML example)
EstadoDoc_root = html.fromstring(sub_element.text)
print([x.text for x in EstadoDoc_root.xpath('//urlcaratula|//urlpdf')])
# ['http://G500603svGLH:8080/Facturacion/PDFServlet?docId=uR1v4VhQHvmQJLl22c1DFOLW3c4qbQ47',
#  'http://G500603svGLH:8080/Facturacion/XMLServlet?docId=&uR1v4VhQHvmQJLl22c1DFOLW3c4qbQ47']
</code></pre>
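<p>A self-contained version of the HTML-parser approach, again only a sketch: the sample markup, the <code>example.invalid</code> URLs and the helper name <code>urls_with_html_parser</code> are hypothetical stand-ins for the real data.</p>

```python
from lxml import html

# Hypothetical sample with a bare "&" inside the second URL.
SAMPLE = (
    "<EstadoDoc>"
    "<UrlPdf>http://example.invalid/PDFServlet?docId=abc</UrlPdf>"
    "<UrlCaratula>http://example.invalid/XMLServlet?docId=abc&kind=xml</UrlCaratula>"
    "</EstadoDoc>"
)


def urls_with_html_parser(markup):
    """Parse with the lenient HTML parser and return the URL texts."""
    root = html.fromstring(markup)
    # The HTML parser lowercases tag names, so the XPath must be lowercase too.
    return [el.text for el in root.xpath("//urlpdf|//urlcaratula")]


urls = urls_with_html_parser(SAMPLE)
print(urls)
```

<p>Because the HTML parser treats an unrecognized <code>&amp;name</code> sequence as literal text, the query string should survive in full, at the cost of losing the original tag casing.</p>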