Searching for a pattern in a string with Python

Posted 2024-06-17 09:36:32


Hello, can you help me? I have spent many days trying to write a script that finds a pattern in a string.

The string I am searching (held in the variable html) is:

<meta content="" property="news_keywords"/>
<meta content="Tough and Truthful - Bostonians read the Boston Herald for solid reporting, whether in print or online, on the issues affecting their daily lives. The Boston Herald gets people talking. Our reporters are second-to-none, our photographers are Pulitzer Prize-winning and we present news that Bostonians care about and respond to." property="description"/>
<meta content='{"link":"http:\/\/bostonherald.com\/","type":"frontpage"}' name="parsely-page"/><meta content="" property="keywords"/>
<meta content="Drupal 7 (http://drupal.org)" name="generator"/>
<link href="http://www.bostonherald.com/" rel="canonical"/>
<link href="http://www.bostonherald.com/" rel="shortlink"/>
<meta content="420" http-equiv="refresh"/>
<link href="http://www.bostonherald.com/sites/default/files/images/favicon.ico" rel="shortcut icon" type="image/vnd.microsoft.icon"/>
<title>Boston Herald | Boston Herald</title>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/system/system.base.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.menus.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.messages.css?nd76bo");
@import url("http://www.bostonherald.com/modules/system/system.theme.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/modules/aggregator/aggregator.css?nd76bo");
@import url("http://www.bostonherald.com/modules/comment/comment.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/date/date_api/date.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/date/date_popup/themes/datepicker.1.7.css?nd76bo");
@import url("http://www.bostonherald.com/modules/field/theme/field.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/mollom/mollom.css?nd76bo");
@import url("http://www.bostonherald.com/modules/node/node.css?nd76bo");
@import url("http://www.bostonherald.com/modules/poll/poll.css?nd76bo");
@import url("http://www.bostonherald.com/modules/user/user.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/views/css/views.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/modules/ctools/css/ctools.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/lightbox2/css/lightbox.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/panels/css/panels.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/rate/rate.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-vertical.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/libraries/superfish/css/superfish-navbar.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/views_slideshow/views_slideshow.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/jcarousel/skins/default/jcarousel-default.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/modules/panels/plugins/layouts/twocol_stacked/twocol_stacked.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/basics.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/custom_blocks.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/navigation.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/view-story_slots.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/taxonomy/taxonomy-styles.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/bhr.css?nd76bo");</style>
<style media="print" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/print.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-reset.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/alpha-alpha.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/formalize.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-branding.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/omega/css/omega-forms.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/layout-front.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/global.css?nd76bo");</style>
<style media="all" type="text/css">@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/ike_omega/css/ike-omega-alpha-default-normal.css?nd76bo");
@import url("http://www.bostonherald.com/sites/all/themes/omega/alpha/css/grid/alpha_default/normal/alpha-default-normal-24.css?nd76bo");</style>

The pattern is:

<meta content(+.?)refresh">

The string is very large, so I tried several different approaches, but none of them worked. I would rather not save the string to a txt file.

These are the attempts I tried, without success:

#Try 1
import re
re.findall("<meta content(+.?)refresh">",html)

#Try 2
matching = [s for s in html if "<meta content(+.?)refresh">" in s]

1 answer
Answer 1 · posted 2024-06-17 09:36:32

The question, as clarified in the comments, is: 'I want to grab the part of the string that starts with "meta content" and ends with "refresh">.'

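I split the string into lines so that ^ matches the start of each line rather than the start of the whole string. I use ^ to match the start and $ to match the end. In fact, these are probably unnecessary, because the < and > are enough to delimit the tag (and re.match already anchors at the start of the string it is given). Also note that the double quote in the pattern is escaped by the backslash in front of it.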

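Another key point: the group should be (.*?), not (+.?). The .*? form matches any run of characters, as few as possible, so it captures everything in the middle of the string.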
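To illustrate the difference between the two groups, here is a minimal sketch. The variable name line and the one-line sample are assumptions made for illustration (the sample is copied from the question's HTML); the rest uses only the standard re module. It shows that the regex compiler rejects (+.?) outright, while (.*?) captures the text between the two fixed parts.

```python
import re

# One-line sample copied from the question's HTML (assumed stand-in for a
# single element of html.splitlines()).
line = '<meta content="420" http-equiv="refresh"/>'

# "(+.?)" cannot be compiled: the "+" quantifier has nothing before it to
# repeat, so re raises re.error before any matching happens.
try:
    re.compile(r'<meta content(+.?)refresh"/>')
    broken_pattern_error = None
except re.error as exc:
    broken_pattern_error = str(exc)
print("compile error:", broken_pattern_error)

# "(.*?)" reads as: any character (.), zero or more times (*), as few as
# possible (?). A raw string avoids having to escape the double quote.
m = re.match(r'<meta content(.*?)refresh"/>', line)
print("captured group:", m.group(1))
```

Here group(1) holds only the text between the fixed parts of the pattern, while group(0) is the whole matched tag.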

>>> import re
>>> for line in html.splitlines():
...     m = re.match("^<meta content(.*?)refresh\"/>$", line)
...     if m:
...         print(m.group(0))
...
<meta content="420" http-equiv="refresh"/>
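Since the anchors and the line splitting may not be needed, the same search can probably be run on the whole string at once. The following is a sketch under stated assumptions, not the answer's own method: the short html value is a made-up stand-in for the question's much longer string, and the variable names are illustrative. It relies on the fact that "." does not match a newline by default, so a match cannot spill over onto the next line.

```python
import re

# Short stand-in for the large `html` string in the question (assumed sample).
html = (
    '<meta content="" property="keywords"/>\n'
    '<meta content="Drupal 7 (http://drupal.org)" name="generator"/>\n'
    '<meta content="420" http-equiv="refresh"/>\n'
    '<title>Boston Herald | Boston Herald</title>\n'
)

# Variant 1: no anchors, no splitting. With no capture group in the
# pattern, findall returns the full text of every match.
matches = re.findall(r'<meta content.*?refresh"/>', html)
print(matches)

# Variant 2: keep ^ and $ but let them match at every line boundary by
# passing re.MULTILINE, instead of calling splitlines() first.
multiline_matches = re.findall(
    r'^<meta content.*?refresh"/>$', html, flags=re.MULTILINE
)
print(multiline_matches)
```

Both variants avoid writing the string to a file. If several meta tags can share one line, replacing .*? with [^>]*? should keep a match from starting in an earlier tag on that line.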

The documentation for Python regular expressions can be found here: https://docs.python.org/2/library/re.html
