Indices that align a list of strings to a string

I need a function that returns the indices at which a list of strings best aligns with a larger string.

For example, given the string:

<pre><code>text = 'Kir4.3 is a inwardly-rectifying potassium channel. Dextran-sulfate is useful in glucose-mediated channels.'
</code></pre>

and a list of strings (the token list <code>tok</code> defined in the script below), is it possible to write a function that produces:

<pre><code>indices = [7, 10, 12, 32, 42, 49, 51, 67, 70, 77, 80, 87, 88, 97, 105]
</code></pre>

<hr/>

Here is a script I wrote to illustrate the problem:

<pre><code>from re import split
from numpy import vstack, zeros
import numpy as np

# I need a function which takes a string and the tokenized list
# and returns the indices at which the tokens were split
def index_of_split(text_str, list_of_strings):
    #?????
    return indices

# The text string, string token list, and character binary annotations
# are all given
text = 'Kir4.3 is a inwardly-rectifying potassium channel. Dextran-sulfate is useful in glucose-mediated channels.'
tok = ['Kir4.3', 'is', 'a', 'inwardly-rectifying', 'potassium', 'channel', '.',
       'Dextran-sulfate', 'is', 'useful', 'in', 'glucose', '-', 'mediated',
       'channels', '.']
# (This binary array labels the following terms ['Kir4.3', 'Dextran-sulfate', 'glucose'])
bin_ann = [1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

# Here we would apply our function
indices = index_of_split(text, tok)
# This list is the desired output
#indices = [7, 10, 12, 32, 42, 49, 51, 67, 70, 77, 80, 87, 88, 97, 105]

# We could now split the binary array based on these indices
bin_ann_toked = np.split(bin_ann, indices)
# and combine with the tokenized list
tokenized_strings = np.vstack((tok, bin_ann_toked)).T

# Then we can remove the trailing zeros,
# which are likely caused by spaces
# or other non-tokenized text
for i, el in enumerate(tokenized_strings):
    tokenized_strings[i][1] = el[1][:len(el[0])]
print(tokenized_strings)
</code></pre>

If the function works as described, the script would print the following output:

<pre><code>[['Kir4.3' array([1, 1, 1, 1, 1, 1])]
 ['is' array([0, 0])]
 ['a' array([0])]
 ['inwardly-rectifying' array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])]
 ['potassium' array([0, 0, 0, 0, 0, 0, 0, 0, 0])]
 ['channel' array([0, 0, 0, 0, 0, 0, 0])]
 ['.' array([0])]
 ['Dextran-sulfate' array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])]
 ['is' array([0, 0])]
 ['useful' array([0, 0, 0, 0, 0, 0])]
 ['in' array([0, 0])]
 ['glucose' array([1, 1, 1, 1, 1, 1, 1])]
 ['-' array([0])]
 ['mediated' array([0, 0, 0, 0, 0, 0, 0, 0])]
 ['channels' array([0, 0, 0, 0, 0, 0, 0, 0])]
 ['.' array([0])]]
</code></pre>
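One possible way to fill in `index_of_split` is sketched below. It is not an accepted answer from this thread, just a minimal sketch that assumes the tokens appear in the text in the same order as in the list, do not overlap, and each occur verbatim. It scans left to right with `str.find`, starting each search where the previous token ended, and returns the start offset of every token except the first (which is what `np.split` expects as split points). Raising `ValueError` for a missing token is an illustrative choice.

```python
def index_of_split(text_str, list_of_strings):
    """Return the start offset of every token except the first.

    Assumes the tokens occur in text_str verbatim, in order, without overlap.
    """
    starts = []
    cursor = 0  # search for each token only after the end of the previous one
    for token in list_of_strings:
        pos = text_str.find(token, cursor)
        if pos == -1:
            raise ValueError(
                f"token {token!r} not found at or after offset {cursor}"
            )
        starts.append(pos)
        cursor = pos + len(token)
    # The first token's start is not a split point, so drop it.
    return starts[1:]


text = ('Kir4.3 is a inwardly-rectifying potassium channel. '
        'Dextran-sulfate is useful in glucose-mediated channels.')
tok = ['Kir4.3', 'is', 'a', 'inwardly-rectifying', 'potassium', 'channel', '.',
       'Dextran-sulfate', 'is', 'useful', 'in', 'glucose', '-', 'mediated',
       'channels', '.']

indices = index_of_split(text, tok)
print(indices)
# [7, 10, 12, 32, 42, 49, 51, 67, 70, 77, 80, 87, 88, 97, 105]
```

Because the search always resumes after the previous match, repeated tokens such as `'is'` and `'.'` resolve to their successive occurrences rather than the first one. With the start offsets in hand, the per-token labels can also be taken by plain slicing, for example `bin_ann[start:start + len(token)]`, which avoids building a NumPy array from rows of unequal length.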
