Get the boundary indices of non-unique words in a string

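2 answers

User

#1 · Edited 2024-04-16 15:25:08

The more typical approach in Clojure/Java is to use the index of the starting character and the index one past the ending character, which gives [0, 5] and [13, 18]. Java's Matcher returns the start and end of each match in exactly this form.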

(def strg "apple orange apple")

(defn re-indices [re s] 
  (let [m (re-matcher re s)] 
    ((fn step [] 
       (when (. m find) 
         (cons [(. m start) (. m end)] (lazy-seq (step))))))))

(re-indices #"\S+" strg)
;=> ([0 5] [6 12] [13 18])

And subs accepts them as-is, since it takes a start index and an exclusive end index:

(->> (re-indices #"\S+" strg)
     (group-by (partial apply subs strg)))
;=> {"apple" [[0 5] [13 18]], "orange" [[6 12]]}

From here you can filter the map down to only those substring keys that have more than one index pair.
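For readers following along in Python, the whole pipeline above (match on \S+, group the spans by word, keep the words with more than one span) might be sketched as below. The function name non_unique_word_spans is made up for illustration and is not part of the original answer; the spans are half-open, matching the [start, end) convention used here.

```python
import re
from collections import defaultdict


def non_unique_word_spans(s):
    """Map each repeated word in s to its half-open (start, end) spans.

    Hypothetical helper name, for illustration only.
    """
    spans = defaultdict(list)
    # \S+ matches each run of non-whitespace characters, as in the Clojure regex.
    for m in re.finditer(r"\S+", s):
        spans[m.group()].append((m.start(), m.end()))
    # Keep only the words that occur more than once.
    return {word: idx for word, idx in spans.items() if len(idx) > 1}


print(non_unique_word_spans("apple orange apple"))
# {'apple': [(0, 5), (13, 18)]}
```

Because the end index is exclusive, each pair can be passed straight to a slice: s[0:5] gives "apple".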

User

#2 · Edited 2024-04-16 15:25:08

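In [9]: import re

In [10]: s = "apple orange apple"

In [13]: def find_ind(word, s):
    ...:     # end() is exclusive, so subtract 1 to get an inclusive end index
    ...:     return [(w.start(), w.end() - 1) for w in re.finditer(word, s) if s.count(word) > 1]

In [14]: find_ind("apple", s)
Out[14]: [(0, 4), (13, 17)]

In [15]: find_ind("orange", s)
Out[15]: []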

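This uses Python and re.finditer, which, per the documentation:

Returns an iterator yielding match objects over all non-overlapping matches for the RE pattern in the string.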
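One caveat with find_ind as written: the word is passed to re.finditer as a regular expression, and s.count counts substrings, so find_ind("app", s) would also report matches inside "apple". A minimal sketch of a whole-word variant follows, assuming words are made of ordinary word characters (so that \b behaves as expected). The name find_word_ind is hypothetical and not from the original answer.

```python
import re


def find_word_ind(word, s):
    """Inclusive (start, end) indices of whole-word matches, if word repeats.

    Hypothetical variant of find_ind, for illustration only.
    """
    # re.escape makes the word match literally; \b restricts to whole words.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    matches = [(m.start(), m.end() - 1) for m in pattern.finditer(s)]
    # Report indices only when the word occurs more than once.
    return matches if len(matches) > 1 else []


s = "apple orange apple"
print(find_word_ind("apple", s))   # [(0, 4), (13, 17)]
print(find_word_ind("orange", s))  # []
print(find_word_ind("app", s))     # []
```

Collecting the matches once and checking the length afterwards also avoids calling s.count for every match.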
