"Python中的difflib.get_close_matches()函数如何运行？"

import difflib import scipy import numpy a1=numpy.array(['198.129.254.73','134.55.221.58','134.55.219.121','134.55.41.41','198.124.252.101'], dtype='|S15') b1=numpy.array(['198.124.252.102','134.55.41.41','134.55.219.121','134.55.219.137','134.55.220.45', '198.124.252.130'],dtype='|S15') difflib.get_close_matches(a1[-1],b1,2)

2条回答

网友

1楼 · 编辑于 2024-06-08 15:53:17

嗯，在docs中有一部分解释了您的问题：

This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.

为了得到您期望的结果，可以使用Levenshtein_distance。

但是为了比较IPs，我建议使用整数比较：

>>> parts = [int(s) for s in '198.124.252.130'.split('.')]
>>> parts2 = [int(s) for s in '198.124.252.101'.split('.')]
>>> from operator import sub
>>> diff = sum(d * 10**(3-pos) for pos,d in enumerate(map(sub, parts, parts2)))
>>> diff
29

可以使用此样式创建比较函数：

from functools import partial
from operator import sub

def compare_ips(base, ip1, ip2):
    base = [int(s) for s in base.split('.')]
    parts1 = (int(s) for s in ip1.split('.'))
    parts2 = (int(s) for s in ip2.split('.'))
    test1 = sum(abs(d * 10**(3-pos)) for pos,d in enumerate(map(sub, base, parts1)))
    test2 = sum(abs(d * 10**(3-pos)) for pos,d in enumerate(map(sub, base, parts2)))
    return cmp(test1, test2)

base = '198.124.252.101'
test_list = ['198.124.252.102','134.55.41.41','134.55.219.121',
             '134.55.219.137','134.55.220.45', '198.124.252.130']
sorted(test_list, cmp=partial(compare_ips, base))
# yields:
# ['198.124.252.102', '198.124.252.130', '134.55.219.121', '134.55.219.137', 
#  '134.55.220.45', '134.55.41.41']

网友

2楼 · 编辑于 2024-06-08 15:53:17

difflib的一些提示：

SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching". The basic idea is to find the longest contiguous matching subsequence that contains no "junk" elements (R-O doesn't address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people.

关于基于自定义逻辑比较IP的需求。您应该首先验证字符串是否是正确的ip。然后，使用简单的整数算法编写比较逻辑应该是一个容易的任务，以满足您的要求。根本不需要图书馆。

相关问题更多 >

编程相关推荐

热门问题

热门文章