Python regex for removing scraped results based on substrings?

import re

def find_jobs(self, company, soup):
    # A text node must contain at least one of these job-title keywords.
    allowed = re.compile(
        r"Developer|Designer|Engineer|Admin|Manager|Writer|Executive|Lead|Analyst|Editor|"
        r"Associate|Architect|Recruiter|Specialist|Scientist|Support|Expert|SSE|Head|"
        r"Producer|Evangelist|Ninja",
        re.IGNORECASE,
    )
    # A text node containing any of these substrings is discarded.
    not_allowed = [
        'responsibilities', 'description', 'requirements', 'experience', 'empowering',
        'engineering', 'find', 'skills', 'recruiterbox', 'google', 'communicating',
        'associated', 'internship', 'proficient', 'leadsquared', 'referral', 'should',
        'must', 'become', 'global', 'degree', 'good', 'capabilities', 'leadership',
        'services', 'expertise', 'architecture', 'hire', 'follow', 'procedures',
        'conduct', 'perk', 'missed', 'generation', 'search', 'tools', 'worldwide',
        'contact', 'question', 'intern', 'classes', 'trust', 'ability', 'businesses',
        'join', 'industry', 'response', 'you', 'using', 'work', 'based', 'grow', 'provide',
    ]
    profile_list = set()
    k = soup.body.findAll(text=allowed)
    for i in k:
        # Keep short strings that contain none of the blocked substrings.
        if len(i) < 60 and not any(x in i.lower() for x in not_allowed):
            profile_list.add(i.strip().upper())
    self.update_jobs(company, profile_list)

2 Answers

User

#1 · Edited 2024-04-27 05:10:03

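It looks like your regex is wrong. Your not_allowed regex only matches lines where one of those words is the entire line.

re.compile(r'^something_i_dont_like$') matches something_i_dont_like only when it is the only thing on the line.

If you want to exclude strings that contain something, you need a negative lookahead: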

re.compile(r'^((?!something_i_dont_like).)*$')
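
To make the difference concrete, here is a minimal sketch comparing the anchored pattern with the negative lookahead. The sample strings and the word "engineering" are assumptions chosen for illustration; they are not taken from the asker's scraped data.

```python
import re

# Hypothetical sample strings, not from the original scrape.
titles = ["Senior Engineer", "engineering", "Engineering skills required"]

# Anchored pattern: matches only when the whole string is the word.
anchored = re.compile(r'^engineering$', re.IGNORECASE)

# Negative lookahead: matches only strings that do NOT contain the word anywhere.
lookahead = re.compile(r'^((?!engineering).)*$', re.IGNORECASE)

anchored_hits = [t for t in titles if anchored.search(t)]
kept = [t for t in titles if lookahead.search(t)]

print(anchored_hits)  # strings that consist solely of the word
print(kept)           # strings that do not contain the word
```

With these samples, the anchored pattern only catches the bare word, while the lookahead rejects every string that contains it. For a plain containment check, `not re.search('engineering', t, re.IGNORECASE)` is usually easier to read than the lookahead form.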

User

#2 · Edited 2024-04-27 05:10:03

The regex

^ability$

means "the line consists solely of the word 'ability'". If you want a substring match, just change it to

ability

If you want to exclude the word "ability" but not "disability", use

\bability\b
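
The three variants above can be compared side by side. This is a sketch under stated assumptions: the sample strings, the `matching` helper, and the shortened `blocked_words` list are hypothetical and only illustrate how the patterns behave.

```python
import re

# Hypothetical sample strings for illustration.
samples = ["ability", "Ability to code", "disability support", "Support Engineer"]

exact = re.compile(r'^ability$', re.IGNORECASE)         # whole string only
substring = re.compile(r'ability', re.IGNORECASE)       # anywhere, even inside a word
whole_word = re.compile(r'\bability\b', re.IGNORECASE)  # standalone word only

def matching(pattern):
    """Return the samples in which the pattern is found."""
    return [s for s in samples if pattern.search(s)]

# Assumed shortened block list; one word-boundary pattern is built for all words.
blocked_words = ['ability', 'intern']
blocked = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, blocked_words)) + r')\b',
    re.IGNORECASE,
)

# Keep only the samples that contain no blocked word as a standalone word.
kept = [s for s in samples if not blocked.search(s)]

print(matching(exact))
print(matching(substring))
print(matching(whole_word))
print(kept)
```

Building one combined pattern with `re.escape` avoids looping over the block list for every string, and the `\b` anchors prevent short entries such as 'intern' from rejecting unrelated words like "Internal".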
