Iterating over text and finding the distance between predefined substrings

Posted 2024-04-29 05:57:18


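I have a text and want to find out how close certain labels are to each other within it. Basically, the idea is to check whether two people are less than 14 words apart; if they are, we say they are related.

My naive implementation works, but only when a person is a single word, because I iterate over words.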

text = """At this moment Robert  who rises at seven and works before 
       breakfast   came in  He glanced at his wife  her cheek was 
       slightly flushed  he  patted it caressingly      What s the 
       matter  my dear   he asked      She objects to my doing nothing 
       and having red hair   said I  in an  injured tone      Oh  of 
       course he can t help his hair   admitted Rose      It generally 
       crops out once in a generation   said my brother   So does  the 
       nose  Rudolf has got them both I must premise that I am going  
       perforce  to rake up the  very scandal which my dear Lady 
       Burlesdon wishes forgotten--in the year  1733  George II  
       sitting then on the throne  peace reigning for  the moment  and 
       the King and the Prince of Wales being not yet at  loggerheads  
       there came on a visit to the English Court a certain  prince  
      who was afterwards known to history as Rudolf the Third of Ruritania"""
involved = ['Robert', 'Rose', 'Rudolf the Third', 
            'a Knight of the Garter', 'James', 'Lady Burlesdon']

# my naive implementation
ws = text.split()
l = len(ws)
x = 14  # two people count as related if they are fewer than x words apart
for wi, w in enumerate(ws):
    # Skip if the word is not a person
    if w not in involved:
        continue
    # Check the following words (distance < x) for any involved person,
    # stopping at the end of the list to avoid an index error
    for i in range(wi + 1, min(wi + x, l)):
        # Skip if the word is not a person
        if ws[i] not in involved:
            continue
        # Print related
        print(ws[wi], ws[i])

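Now I would like to upgrade this script to allow multi-word names such as 'Lady Burlesdon'. I am not entirely sure what the best approach is. Any hints are welcome.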


1 answer
User
#1 · Posted 2024-04-29 05:57:18

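You could first preprocess the text so that every name in text is replaced by a single-word id. The id must be a string that you do not expect to appear as an ordinary word in the text. While preprocessing, keep a mapping from id to name so you know which name each id stands for. That lets you keep your current algorithm as it is.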
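One way this preprocessing could look is sketched below. It is only an illustration under some assumptions: the function name find_related and the id scheme __PERSON_n__ are made up for this example, and the regular expression allows any whitespace between the parts of a name, because in the sample text 'Lady Burlesdon' is split across a line break.

```python
import re


def find_related(text, names, window=14):
    """Return (name, name) pairs that occur fewer than `window` words apart.

    Hypothetical helper: each name is first replaced by a single-token id,
    then the original word-by-word scan is applied to the ids.
    """
    # Hypothetical id scheme; assumed never to occur in the text itself.
    id_to_name = {f"__PERSON_{i}__": name for i, name in enumerate(names)}

    # Replace longer names first so a short name cannot break up a longer one.
    for pid, name in sorted(id_to_name.items(), key=lambda kv: -len(kv[1])):
        # Match the name parts separated by any whitespace, including newlines.
        pattern = r"\b" + r"\s+".join(map(re.escape, name.split())) + r"\b"
        # Pad with spaces so adjacent punctuation does not stick to the id.
        text = re.sub(pattern, f" {pid} ", text)

    words = text.split()
    pairs = []
    for wi, w in enumerate(words):
        # Skip if the word is not a person id
        if w not in id_to_name:
            continue
        # Slicing never raises, so no explicit bounds check is needed.
        for other in words[wi + 1 : wi + window]:
            if other in id_to_name:
                pairs.append((id_to_name[w], id_to_name[other]))
    return pairs


if __name__ == "__main__":
    involved = ['Robert', 'Rose', 'Rudolf the Third',
                'a Knight of the Garter', 'James', 'Lady Burlesdon']
    sample = "Robert met Lady \n   Burlesdon and later Rudolf the Third of Ruritania"
    for a, b in find_related(sample, involved):
        print(a, "|", b)
```

Note that after the replacement a multi-word name counts as a single word, so distances are measured in tokens of the preprocessed text rather than in words of the original.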
