Print a URL if it contains a certain keyword

Posted 2024-04-27 14:49:32


I have a function that extracts URLs from ESPN. The URLs look like this:

http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010
http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010

I have created a list of countries. I want to print a message if the URL contains a country from the list; otherwise, move on to the next URL.

all_countries=['England','India','West Indies']

#one_day will have all the links
for day in one_day:
        d=day.split('-')
        if d in all_countries:
            print(day)
        else:
            next

It doesn't work. Thanks for your help.


3 answers

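This is because .split() returns a list, so you have to iterate over the items in that list. Basically, you are asking the computer whether this entire list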

["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland", "vs", "england", "only", "odi", "england", "in", "scotland", "odi", "match", "2010"]

is itself one element of a list that (I assume) looks like this:

["england", "scotland", "ireland", ...]

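That is never true, because every element of all_countries is a string, not a list. I suggest adding a few print statements; a simple print(d) would reveal this behavior. You have to iterate over d: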

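countries_lower = [c.lower() for c in all_countries]  # URL words are lowercase, so compare lowercase names
for day in one_day:
    d = day.split('-')
    for word in d:
        if word in countries_lower:
            print(day)
            break  # otherwise multiple words will trigger your logic multiple times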
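To see both why the original `d in all_countries` check never fires and how the per-word loop fixes it, here is a small self-contained sketch. The function name `urls_with_country_words` is hypothetical, and splitting only the last path segment (so the first country in the slug is not glued to the path, as "scotland" is in the first URL) is an assumption added for illustration, not part of the answer above.

```python
def urls_with_country_words(urls, countries):
    """Return URLs whose slug contains a '-'-separated word equal to a country name (case-insensitive)."""
    wanted = {c.lower() for c in countries}
    matches = []
    for url in urls:
        # Assumption: only the last path segment (the slug) holds the team names.
        slug = url.lower().rsplit('/', 1)[-1]
        words = slug.split('-')
        # Check each word individually instead of testing the whole list against the names.
        if any(word in wanted for word in words):
            matches.append(url)
    return matches


one_day = [
    "http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
]
all_countries = ['England', 'India', 'West Indies']

d = one_day[0].split('-')
print(d in all_countries)  # a list is never equal to a string, so this is False
for url in urls_with_country_words(one_day, all_countries):
    print(url)
```

Note that a two-word name such as 'West Indies' can never equal a single word, so any split-on-'-' approach misses it unless multi-word names are handled separately.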

Here is a simple approach (assuming one_day is a list of URLs and all_countries is a list of country names):

# (some example values for urls and country names) 
one_day = ['http://www.espncricinfo.com/...-vs-australia-only-odi-au...', 
           'http://www.espncricinfo.com/...scotland-vs-england-only-...'] 
all_countries = ['India', 'Ireland', 'Australia'] 

for day in one_day:
  for country in all_countries:
    if country.lower() in day:
      print(f'found a match for {country}: `{day}`')
      # or just: print(day) 

This works because `in` checks for substrings when both operands are strings, for example:

'Australia'.lower() in '...-vs-australia-only-odi-au...'
## True 

That is exactly what the condition country.lower() in day checks in each iteration of the inner loop.
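The same substring check can be wrapped in a function that reports each URL only once, together with the first country found. The function name `find_country_substrings` and its dict return value are assumptions for illustration; unlike the loop above, it also lowercases the URL before comparing.

```python
def find_country_substrings(urls, countries):
    """Map each URL to the first country whose lowercased name occurs anywhere in it."""
    found = {}
    for url in urls:
        lowered = url.lower()
        for country in countries:
            # `in` on two strings is a substring test: "australia" matches "...-vs-australia-..."
            if country.lower() in lowered:
                found[url] = country
                break  # report each URL only once
    return found


urls = [
    "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010",
]
for url, country in find_country_substrings(urls, ['India', 'Ireland', 'Australia']).items():
    print(f'found a match for {country}: `{url}`')
```

Because the countries are tried in list order, the first URL is reported under 'Ireland' even though 'Australia' also occurs in it.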

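Alternatively, you can split on '-' as in the original post, in case you are worried about something like 'USA' matching a URL that contains '-musac...'. For that, you could write: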

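countries_lower = {c.lower() for c in all_countries}  # build the lowercase names once
for day in one_day:
  day_split = day.split('-')
  for elem in day_split:
    if elem in countries_lower:
      print(f'found a match: `{day}`')
      break  # stop after the first match so each URL prints once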
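The difference between the substring check and the split-based check can be shown side by side. Both helper names and the URL below are hypothetical, chosen only to reproduce the 'USA' / 'musac' case mentioned above.

```python
def substring_match(url, country):
    # Substring test: matches the name anywhere, even inside another word.
    return country.lower() in url.lower()


def token_match(url, country):
    # Compare whole '-'-separated words only, as in the split-based loop.
    return country.lower() in url.lower().split('-')


url = "http://www.example.com/series/1/scorecard/2/team-musac-vs-team-b-2010"  # hypothetical URL
print(substring_match(url, 'USA'))  # True: "usa" occurs inside "musac"
print(token_match(url, 'USA'))      # False: no word equals "usa"
```

The trade-off is that the token check cannot match multi-word names like 'West Indies', while the substring check can produce false positives like the one above.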

Or, for more flexibility, use a regex ;):

import re

urls = ["http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
        "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
        "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010"
       ]

countries = ['England',
             'India',
             'West Indies']

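for url in urls:
    # (?i) makes the match case-insensitive; \W lets the space in
    # "West Indies" match the "-" used in the URL slug.
    if re.search('(?i)(' + '|'.join(countries).replace(' ', r'\W') + ')', url):
        print(url)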

Result:

http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010
http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010
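The regex above is still a substring match, so it shares the 'USA' / 'musac' problem. A sketch that combines both ideas, handling multi-word names and matching only whole words, could use lookarounds. The helper `country_pattern` is hypothetical, and treating any letter or digit as a word continuation is an assumption about how the URL slugs are built.

```python
import re


def country_pattern(countries):
    """Compile a case-insensitive regex that matches whole country names in a URL."""
    # Escape each word, then allow a run of non-word characters (e.g. '-') between words.
    alternatives = [r'\W+'.join(re.escape(word) for word in c.split()) for c in countries]
    # Lookarounds reject matches glued to other letters or digits, e.g. "usa" inside "musac".
    return re.compile(r'(?<![a-z0-9])(?:' + '|'.join(alternatives) + r')(?![a-z0-9])', re.IGNORECASE)


urls = [
    "http://www.espncricinfo.com/series/13224/scorecard/426406/scotland-vs-england-only-odi-england-in-scotland-odi-match-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/ireland-vs-australia-only-odi-australia-tour-of-england-and-ireland-2010",
    "http://www.espncricinfo.com/series/13240/scorecard/426384/titi-2010",
]
pattern = country_pattern(['England', 'India', 'West Indies'])
for url in urls:
    if pattern.search(url):
        print(url)
```

Compiling the pattern once outside the loop avoids rebuilding the same regex for every URL.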
