Sorting the unique values from regex matches in Python

import sys
import re

file = open('/Users/me/Desktop/test.txt', 'r')
temp = []
for line in file.readlines():
    if '->' in line:
        temp = line.split('->')
    elif '=>' in line:
        temp = line.split('=>')
    if temp:
        #temp[1].strip()
        pattern = re.match('^\x20\w{1,}@\w{1,}\.\w{2,3}\x20?', str(temp[1]), re.M)
        if pattern is not None:
            print pattern.group()
        else:
            print "nono"

3 Answers

User

Answer 1 · edited 2024-06-16 14:21:06

As danidee (who got there first) said, a set will do the job.

Try this:

from __future__ import print_function

import re

with open('test.txt') as f:
    data = f.read().splitlines()

emails = set(re.sub(r'^.*\s+(\w+\@[^\s]*?)\s+.*', r'\1', line) for line in data if '@' in line)

print('\n'.join(emails)) if len(emails) else print('nono')

PS: you will probably want a proper email regex check, since the one I used is very primitive.
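As a rough illustration of a somewhat stricter check, the sketch below swaps in `re.findall` with a more explicit pattern. The pattern `EMAIL_RE`, the helper `unique_emails`, and the sample lines are all assumptions made up for this example; the pattern is a pragmatic approximation, not a complete email validator.

```python
import re

# Assumed pattern: a pragmatic approximation of an email address,
# not a full RFC 5322 validator.
EMAIL_RE = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+')


def unique_emails(lines):
    """Return a sorted list of the distinct email-like tokens in lines."""
    found = set()
    for line in lines:
        # findall returns every non-overlapping match in the line
        found.update(EMAIL_RE.findall(line))
    return sorted(found)


# Hypothetical sample input, invented for illustration only.
sample = [
    "alice -> bob@example.com ok",
    "carol => bob@example.com ok",
    "dave -> amy.lee@mail.example.org ok",
    "no address here",
]
print(unique_emails(sample))
# ['amy.lee@mail.example.org', 'bob@example.com']
```

Unlike the `re.sub` approach, a line that contains `@` but no address-shaped token simply contributes nothing instead of being added to the set whole.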

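User

Answer 2 · edited 2024-06-16 14:21:06

Some of the duplicates are caused by a bug in your code: temp is not reset when each line is processed. A line that contains neither -> nor =>, but is preceded by a line that does contain one of them, still passes the if temp: test, so the email address from the previous line (if there was one) is printed again.

This can be fixed by jumping back to the start of the loop whenever the line contains neither -> nor =>.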
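To make the stale-value problem concrete, here is a small sketch that compares the two loop shapes on an in-memory list. The function names `buggy` and `fixed` and the sample lines are hypothetical, and the regex step is left out so that only the control flow is shown.

```python
def buggy(lines):
    """Mimics the question's loop: temp is never reset between lines."""
    out = []
    temp = []
    for line in lines:
        if '->' in line:
            temp = line.split('->')
        elif '=>' in line:
            temp = line.split('=>')
        if temp:  # still true for a line without a separator
            out.append(temp[1].strip())
    return out


def fixed(lines):
    """Skips lines with neither separator, so no stale value is reused."""
    out = []
    for line in lines:
        if '->' in line:
            temp = line.split('->')
        elif '=>' in line:
            temp = line.split('=>')
        else:
            continue  # jump back to the start of the loop
        out.append(temp[1].strip())
    return out


# Hypothetical sample input, invented for illustration only.
lines = ["a -> x@example.com", "unrelated line", "b => y@example.com"]
print(buggy(lines))  # ['x@example.com', 'x@example.com', 'y@example.com']
print(fixed(lines))  # ['x@example.com', 'y@example.com']
```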

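The remaining duplicates are genuine ones, caused by the same email address appearing on more than one line, and can be filtered out with a set.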

import sys
import re

addresses = set()
# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r'^\x20\w{1,}@\w{1,}\.\w{2,3}\x20?')

with open('/Users/me/Desktop/test.txt', 'r') as f:
    for line in f:
        if '->' in line:
            temp = line.split('->')
        elif '=>' in line:
            temp = line.split('=>')
        else:
            # neither '=>' nor '->' present in the line
            continue

        match = pattern.match(temp[1])
        if match is not None:
            addresses.add(match.group())
        else:
            print("nono")

for address in sorted(addresses):
    print(address)

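The addresses are stored in a set to eliminate duplicates; they are then sorted and printed. Note also the use of the with statement to open the file in a context manager, which guarantees that the file is always closed.

In addition, since you apply the same regex pattern many times, it is worth compiling it once up front for efficiency.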

With a properly written regex pattern, your code can be simplified considerably.
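A minimal sketch of what such a simplification might look like, under the assumption that every address of interest directly follows a -> or => separator. The pattern `PATTERN`, the helper `extract_addresses`, and the sample text are invented for this example and are not the answerer's own code.

```python
import re

# Assumed pattern: '->' or '=>', optional whitespace, then an address of
# the same shape as the original regex (word@word.tld, 2-3 character TLD).
PATTERN = re.compile(r'[-=]>\s*(\w+@\w+\.\w{2,3})\b')


def extract_addresses(text):
    """Return the distinct addresses found after '->' or '=>', sorted."""
    # With one capture group, findall returns just the captured addresses.
    return sorted(set(PATTERN.findall(text)))


# Hypothetical sample input, invented for illustration only.
sample_text = (
    "a -> x@example.com\n"
    "some unrelated line\n"
    "b => x@example.com\n"
    "c -> b@test.org\n"
)
for address in extract_addresses(sample_text):
    print(address)
```

Because the separator is part of the pattern, lines without -> or => never match, so no explicit split or continue is needed. To run this against the file from the question, the text could be read inside a with block and passed to `extract_addresses`.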

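User
Answer 3 · edited 2024-06-16 14:21:06

You can use a set to hold the unique results: each time you are about to print a matched email, check whether it is already in the set, and print it only if it is not: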

import sys
import re

file = open('/Users/me/Desktop/test.txt', 'r')
temp =[]
seen = set()
for line in file.readlines():
    if '->' in line:
        temp = line.split('->')
    elif '=>' in line:
        temp = line.split('=>')

    if temp:
        #temp[1].strip()
        pattern = re.match(r'^\x20\w{1,}@\w{1,}\.\w{2,3}\x20?', str(temp[1]), re.M)
        if pattern is not None:
            matched = pattern.group()
            if matched not in seen:
                # Record the address so later occurrences are skipped
                seen.add(matched)
                print(matched)
        else:
            print("nono")
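Printing as you go keeps the addresses in first-seen order, but it cannot sort them, because sorting needs the full collection first. The sketch below contrasts the two; the helper `unique_in_order` and the sample list are hypothetical names used only for illustration.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, preserving input order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Hypothetical sample input, invented for illustration only.
matches = ["b@example.com", "a@example.com", "b@example.com"]

print(list(unique_in_order(matches)))  # ['b@example.com', 'a@example.com']
print(sorted(set(matches)))            # ['a@example.com', 'b@example.com']
```

If sorted output is what you want, collecting into a set and calling sorted once at the end is the simpler route.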
