<p>You should take a look at <a href="https://stackoverflow.com/questions/181095/regular-expression-to-extract-text-from-html">Regular expression to extract text from HTML</a>.</p>
<p>From that post:</p>
<blockquote>
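<p>You can't really parse HTML with regular expressions. It's too
complex. RE's won't handle <code>&lt;![CDATA[</code> sections correctly at all.
Further, some kinds of common HTML things like <code>&amp;lt;text&gt;</code> will work in
a browser as proper text, but might baffle a naive RE.</p>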
<p>You'll be happier and more successful with a proper HTML parser.
Python folks often use something like Beautiful Soup to parse HTML and
strip out tags and scripts.</p>
<p>Also, browsers, by design, tolerate malformed HTML. So you will often
find yourself trying to parse HTML which is clearly improper, but
happens to work okay in a browser.</p>
<p>You might be able to parse bad HTML with RE's. All it requires is
patience and hard work. But it's often simpler to use someone else's
parser.</p>
</blockquote>
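<p>As a rough illustration of the quoted advice, here is a minimal sketch that extracts text with a real parser instead of a regular expression. It uses Python's standard-library <code>html.parser</code> as a stand-in for Beautiful Soup so that it runs without extra installs; the names <code>TextExtractor</code> and <code>extract_text</code> and the sample markup are made up for this example and do not come from the linked post.</p>

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into characters.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return parser.text()


# Hypothetical sample: markup inside a script, an escaped tag, an unclosed <p>.
sample = (
    "<p>Hello <b>world</b>"
    "<script>var x = '<p>no</p>';</script>"
    " &lt;text&gt; <p>unclosed"
)
print(extract_text(sample))
```

<p>The sample deliberately includes the cases the quote warns about: tag-like text inside a script, an escaped <code>&amp;lt;text&amp;gt;</code>, and an unclosed paragraph. A naive pattern such as <code>&lt;[^&gt;]+&gt;</code> would likely leave the script source in the output, whereas the parser skips it. If Beautiful Soup is installed, <code>BeautifulSoup(html, "html.parser").get_text()</code> is the more common route, though script contents may still need to be removed separately.</p>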