Answers to: How to parse an SVG document from a URL with Python (getting the points of a polyline)


1 answer

Anonymous, 1 day ago

I'm sure there is an HTML extraction package for this somewhere, but here is how I would do it with core Python and the regular-expression module. Let `txt` be the text you showed, `<polyline ...`. Then:

Import the regular-expression module:

```python
In [22]: import re
```

Run the search:

```python
In [24]: g = re.search('polyline points="(.*?)"', txt)
```

In the regular expression above, `polyline points="` serves as the anchor, and the group captures everything after it up to the next double quote. I left out the leading `<`; it is not needed for the match, although a bare `<` is an ordinary literal in Python's `re`, so including it would work too.

The text you want is then available as:

```python
In [25]: g.group(1)
Out[25]: '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280'
```

## Update

Parsing the data as XML is safer, because the regex only matches when `points` is the first attribute and is written with double quotes. Here is one way to do it (`xml.etree` is part of the standard library):

```python
In [32]: import xml.etree.ElementTree as ET

In [33]: root = ET.fromstring(txt)
```

Since the data you gave is itself the root tag, no further extraction is needed:

```python
In [35]: root.tag
Out[35]: 'polyline'
```

All of the element's properties are XML attributes, exposed as a dictionary:

```python
In [37]: root.attrib
Out[37]:
{'points': '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280',
 'style': 'fill: none; stroke: #000000; stroke-width: 1; stroke-linejoin: round; stroke-linecap: round; stroke-antialiasing: false; stroke-antialias: 0; opacity: 0.8'}
```

So the points are simply:

```python
In [38]: root.attrib['points']
Out[38]: '239,274 239,274 239,274 239,275 239,275 238,276 238,276 237,276 237,276 236,276 236,276 236,277 236,277 235,277 235,277 234,278 234,278 233,279 233,279 232,280 232,280 231,280 231,280 230,280 230,280 230,280 229,280 229,280'
```

If you want to go further and split the string into groups on the spaces and commas, this is what I would do.

Call `split` with no arguments to get all the whitespace-separated groups:

```python
>>> p = g.group(1).split()
>>> p
['239,274', '239,274', '239,274', '239,275', '239,275', '238,276', '238,276', '237,276', '237,276', '236,276', '236,276', '236,277', '236,277', '235,277', '235,277', '234,278', '234,278', '233,279', '233,279', '232,280', '232,280', '231,280', '231,280', '230,280', '230,280', '230,280', '229,280', '229,280']
```

Now split each of those strings at the comma, which returns a list of strings. I use `map` to convert each such list into a list of `int`:

```python
>>> p2 = [list(map(int, numbers.split(','))) for numbers in p]
>>> p2
[[239, 274], [239, 274], [239, 274], [239, 275], [239, 275], [238, 276], [238, 276], [237, 276], [237, 276], [236, 276], [236, 276], [236, 277], [236, 277], [235, 277], [235, 277], [234, 278], [234, 278], [233, 279], [233, 279], [232, 280], [232, 280], [231, 280], [231, 280], [230, 280], [230, 280], [230, 280], [229, 280], [229, 280]]
```

A single pair shows what happens at each step:

```python
>>> '123,456'.split(',')
['123', '456']
>>> list(map(int, '123,456'.split(',')))
[123, 456]
```
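The steps above work on a single `<polyline>` string, while the question asks about a whole SVG document fetched from a URL. Here is a minimal sketch of how the pieces might fit together. It rests on a few assumptions: the document is well-formed XML, polylines may or may not carry the standard SVG namespace, and numbers in `points` are separated by commas and/or whitespace. The helper names (`parse_points`, `polylines_from_svg`, `polylines_from_url`) are illustrative and do not come from any library.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Standard SVG namespace in ElementTree's "{uri}tag" notation.
SVG_NS = "{http://www.w3.org/2000/svg}"


def parse_points(points_attr):
    """Turn a 'points' attribute such as '239,274 239,275' into (x, y) tuples."""
    # Treat commas like whitespace, then convert every token to a float.
    numbers = [float(n) for n in points_attr.replace(",", " ").split()]
    # Pair consecutive numbers as (x, y); a trailing unpaired number is dropped.
    return list(zip(numbers[0::2], numbers[1::2]))


def polylines_from_svg(svg_text):
    """Return one list of (x, y) points per <polyline> element in the document."""
    root = ET.fromstring(svg_text)
    result = []
    for elem in root.iter():  # iter() also visits the root element itself
        # Accept both namespaced and un-namespaced polyline tags.
        if elem.tag in ("polyline", SVG_NS + "polyline"):
            result.append(parse_points(elem.attrib.get("points", "")))
    return result


def polylines_from_url(url):
    """Download an SVG document and extract its polylines (hypothetical helper)."""
    with urlopen(url) as response:
        # ET.fromstring accepts bytes, so the XML declaration decides the encoding.
        return polylines_from_svg(response.read())
```

Usage would look like `polylines_from_url("https://example.com/drawing.svg")`, where the URL is only a placeholder. Coordinates are parsed as `float` rather than `int` because SVG allows fractional values; swap in `int` if your data is known to be integral.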
