Parsing code from text with Python

['[u"I want to use a track-bar to change a form\'s opacity.

 This is my code:

<pre><code>decimal trans = trackBar1.Value / 5000;
this.Opacity = trans;
</code></pre>

 When I try to build it, I get this error:

<blockquote>
 Cannot implicitly convert type \'decimal\' to \'double\'. 
</blockquote>

I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.', '", u\'This code has worked fine for me in VB.NET in the past.', '\', u"
 When setting a form\'s opacity should I use a decimal or double?"]']

2 answers

User

Answer 1 · edited 2024-05-16 22:21:00

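You can use XPath to extract the contents of the code elements (the lxml library helps here), and then select everything else to get the text content, for example: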

import lxml.etree


data = '''<p>I want to use a track-bar to change a form's opacity.</p>
          <p>This is my code:</p> <pre><code>decimal trans = trackBar1.Value / 5000; this.Opacity = trans;</code></pre>
          <p>When I try to build it, I get this error:</p>
          <p>Cannot implicitly convert type 'decimal' to 'double'.</p>
          <p>I tried making <code>trans</code> a <code>double</code>.</p>'''

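html = lxml.etree.HTML(data)

# Text nodes inside <code> elements
code_blocks = html.xpath('//code/text()')

# Every other non-blank text node. Testing the text node's ancestors (rather than
# excluding whole elements that contain <code>) keeps prose around inline code,
# such as "I tried making ... a ..." in the last paragraph.
text_blocks = [t.strip() for t in html.xpath('//text()[not(ancestor::code)]')
               if t.strip()]

print(code_blocks)
print(text_blocks)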
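lxml is a third-party package. If you would rather stay in the standard library, the same split can be sketched with html.parser.HTMLParser. This is a swapped-in alternative, not the technique used in the answer above, and the class name CodeTextSplitter is hypothetical. It counts how many <code> elements the parser is currently inside and routes each run of text to the code list or the text list; blank runs between tags are dropped and the rest are stripped.

```python
from html.parser import HTMLParser


class CodeTextSplitter(HTMLParser):
    """Hypothetical helper: collect text inside <code> separately from other text."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &#xA;
        super().__init__()
        self.depth = 0          # number of <code> elements currently open
        self.code_blocks = []
        self.text_blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "code" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            # Inside <code>: keep the text exactly as written
            self.code_blocks.append(data)
        elif data.strip():
            # Outside <code>: skip whitespace-only runs between tags
            self.text_blocks.append(data.strip())


data = ("<p>This is my code:</p> <pre><code>decimal trans = trackBar1.Value / 5000;</code></pre>"
        "<p>I tried making <code>trans</code> a <code>double</code>.</p>")

parser = CodeTextSplitter()
parser.feed(data)
parser.close()
print(parser.code_blocks)
print(parser.text_blocks)
```

Inline fragments such as "a" between two <code> elements come out as separate pieces; join them if you need whole sentences back.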

User

Answer 2 · edited 2024-05-16 22:21:00

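The simplest approach is probably to apply a regular expression to the text that matches the markers '<code>' and '</code>'. That way you can find the code blocks. You don't say what you want to do with them afterwards, though. So, as a starting point: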

from itertools import zip_longest

sample_paras = [
    """<p>I want to use a track-bar to change a form\'s opacity.</p>&#xA;&#xA;<p>This is my code:</p>&#xA;&#xA;<pre><code>decimal trans = trackBar1.Value / 5000;&#xA;this.Opacity = trans;&#xA;</code></pre>&#xA;&#xA;<p>When I try to build it, I get this error:</p>&#xA;&#xA;<blockquote>&#xA;  <p>Cannot implicitly convert type \'decimal\' to \'double\'. </p>&#xA;</blockquote>&#xA;&#xA;<p>I tried making <code>trans</code> a <code>double</code>, but then the control doesn\'t work.""",
    """This code has worked fine for me in VB.NET in the past.""",
    """</p>&#xA; When setting a form\'s opacity should I use a decimal or double?""",
]

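single_block = " ".join(sample_paras)

import re

# Splitting on both <code> and </code> yields text, code, text, code, ...
separate_code = re.split(r"</?code>", single_block)

# Group the list into consecutive (text, code) pairs, then transpose the pairs.
# fillvalue="" stops a trailing None from showing up when the count is odd.
text_blocks, code_blocks = zip(*zip_longest(*[iter(separate_code)] * 2, fillvalue=""))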

print("Text:\n")
for t in text_blocks:
    print(" ")
    print(t)

print("\n\nCode:\n")
for t in code_blocks:
    print(" ")
    print(t)
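The split above leaves the HTML character references from the question's data in place, so a code block still contains &#xA; where the original had line breaks. A minimal sketch, assuming you want the code back as plain text, decodes them with the standard library's html.unescape. It also takes every other item with slicing (parts[1::2]) instead of the zip_longest grouping; both pick out the odd positions of the split result.

```python
import html
import re

# A fragment in the same shape as the question's data: line breaks stored as &#xA;
snippet = "<pre><code>decimal trans = trackBar1.Value / 5000;&#xA;this.Opacity = trans;&#xA;</code></pre>"

# Same alternating split: even indexes are text, odd indexes are code
parts = re.split(r"</?code>", snippet)

# Decode character references so &#xA; becomes a real newline
code_blocks = [html.unescape(c) for c in parts[1::2]]

print(code_blocks[0])
```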
