How to find character offsets in text with Python

Asked 2025-04-17 20:31

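My goal is to find the strings that match between two aligned text files, and then find the starting character position of each matching string in each file.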

doc1=['the boy is sleeping', 'in the class', 'not at home']
doc2=['the girl is reading', 'in the class', 'a serious student']

What I tried:

# find matching string(s) that exist in both document list:
matchstring=[x for x in doc1 if x in doc2]
# matchstring == ['in the class']

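The remaining problem is to find the character position at which the matching string starts in doc1 and in doc2 (punctuation is not counted; spaces are).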
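One way to read "punctuation not counted, spaces counted" is to strip punctuation from each segment before adding its length to the running offset. A minimal sketch, assuming the segments are concatenated with no separator and that Python's `string.punctuation` is an adequate definition of punctuation; the helper `match_offsets` and the comma added to the first segment are hypothetical, for illustration only.

```python
import string

# Translation table that deletes ASCII punctuation; spaces are left intact
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def match_offsets(doc, target):
    """Return the 0-based offsets at which segments equal to `target` start.

    Hypothetical helper: offsets count spaces but not punctuation, and the
    segments are assumed to be concatenated with no separator.
    """
    offsets = []
    pos = 0
    for segment in doc:
        if segment == target:
            offsets.append(pos)
        # Advance by the segment's length with punctuation removed
        pos += len(segment.translate(_STRIP_PUNCT))
    return offsets

# Hypothetical variant of doc1 with a comma, to show it is not counted
doc1 = ['the boy, is sleeping', 'in the class', 'not at home']
print(match_offsets(doc1, 'in the class'))  # [19]
```

Adding 1 to each offset gives a 1-based position.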

Desired result:

Position of starting character for matching string in doc1=20
Position of starting character for matching string in doc2=20
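The expected value can be checked by hand: `len('the boy is sleeping')` and `len('the girl is reading')` are both 19, so with the segments joined with no separator the match starts at 0-based index 19, i.e. 1-based position 20. A small sketch that computes every segment's start offset from cumulative lengths (the helper `segment_starts` and its `sep` parameter are hypothetical; `accumulate(..., initial=...)` needs Python 3.8+):

```python
from itertools import accumulate

def segment_starts(doc, sep=''):
    """0-based start offset of each segment when `doc` is joined with `sep`."""
    # Each segment contributes its own length plus one separator
    lengths = [len(s) + len(sep) for s in doc]
    # Running totals starting at 0; drop the last total (the end of the text)
    return list(accumulate(lengths, initial=0))[:-1]

doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

for name, doc in (('doc1', doc1), ('doc2', doc2)):
    i = doc.index('in the class')
    print(name, segment_starts(doc)[i] + 1)  # 1-based: prints 20 for both
```

Note that 20 is also the 0-based offset when the segments are joined with a single space (`segment_starts(doc1, ' ')` gives `[0, 20, 33]`), so it may be worth confirming which convention is intended.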

Any ideas on how to approach this text alignment? Thanks.
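Since the files are described as aligned, one possible alternative (not the approach used in the answer below) is the standard library's `difflib.SequenceMatcher`, which aligns the two lists segment by segment and preserves order, then maps each matched segment to a character offset. A sketch under the same no-separator assumption; the helpers `char_starts` and `aligned_matches` are hypothetical:

```python
from difflib import SequenceMatcher
from itertools import accumulate

doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

def char_starts(doc):
    # 0-based start offset of each segment (segments joined with no separator)
    return list(accumulate((len(s) for s in doc), initial=0))

def aligned_matches(a, b):
    """Yield (segment, offset_in_a, offset_in_b) for each aligned match."""
    sa, sb = char_starts(a), char_starts(b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block covers block.size consecutive equal segments;
        # the final sentinel block has size 0 and yields nothing
        for k in range(block.size):
            i, j = block.a + k, block.b + k
            yield a[i], sa[i], sb[j]

for seg, off1, off2 in aligned_matches(doc1, doc2):
    print(seg, off1 + 1, off2 + 1)  # 1-based positions
```

Unlike a plain membership test, the alignment pairs segments in order, so a string repeated in one file is not matched twice against a single occurrence in the other.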

Tags: text-processing, string-matching, file-comparison, text-alignment, character-offset

1 Answer

Hey, try this:

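doc1 = ['the boy is sleeping', 'in the class', 'not at home']
doc2 = ['the girl is reading', 'in the class', 'a serious student']

# Keep the matches in doc1 order; joining a set would concatenate
# several matches in arbitrary order into one string
matches = [s for s in doc1 if s in doc2]

# Join each document with no separator, then locate every match
text1 = ''.join(doc1)
text2 = ''.join(doc2)
for m in matches:
    # str.find returns a 0-based index, so add 1 for a 1-based position
    print("Position of starting character for matching string in doc1=%d" % (text1.find(m) + 1))
    print("Position of starting character for matching string in doc2=%d" % (text2.find(m) + 1))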

This prints 20 for both documents, which matches your expected result.
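One caveat with this approach: `str.find` returns only the first occurrence in the joined text, and because the segments are joined with no separator, that occurrence may lie inside an earlier segment or even straddle two segments. A hypothetical illustration that lists every occurrence using the `start` argument of `str.find` (the helper `all_occurrences` and the sample documents are made up for this example):

```python
def all_occurrences(text, sub):
    """Return every 0-based index at which `sub` occurs in `text`."""
    hits = []
    i = text.find(sub)
    while i != -1:
        hits.append(i)
        # Resume one character later so overlapping hits are also found
        i = text.find(sub, i + 1)
    return hits

# Hypothetical documents: the first segment already contains the target
doc = ['sat in the class today', 'in the class']
text = ''.join(doc)
print(text.find('in the class'))              # 4, inside the first segment
print(all_occurrences(text, 'in the class'))  # [4, 22]
```

Here the matching segment actually starts at index 22, so computing the offset from the segment's position in the list is more reliable than searching the joined text.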

Answered 2025-04-17 by Python大师
