Merging strings across the rows of a column in Pandas

Posted 2024-04-19 12:51:29


I am trying to merge strings in a DataFrame called df. It looks like this:

import pandas as pd

s = ['vic', 'tory', 'ban', 'ana']
df = pd.DataFrame(s, columns=['Tokens'])

Please note that I only intend to use this for another language, not English.

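What I want to do is merge consecutive rows of the df column and look up the merged word in a dictionary. If the word exists, it should be saved to another dataset and its parts should be removed from df. For example, I combine df[0] and df[1] to get "victory", look it up in the dictionary, and find it. Then 'vic' and 'tory' should be removed from df. How can I solve this? Any help is appreciated.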


Tags: columns, data, string, language, dataframe, df, dictionary, word
1 answer
User
#1 · Posted 2024-04-19 12:51:29

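If you have a list of strings and want to check whether combinations of consecutive strings form a word, you can iterate over the strings and test the possible combinations. Plain built-in Python is enough for this: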

LIMIT = 3  # max amount of strings to combine


def process_strings(strings, words):

    ans = list()

    stop = len(strings)
    current = 0
    # iterate over strings
    while current < stop:
        word = ''
        counter = 0
        # iterate over LIMIT strings starting from current string
        while True:
            # check boundary conditions
            if counter >= LIMIT or current + counter >= stop:
                current += 1
                break
            word += strings[current + counter]
            # word found among words
            if word in words:
                current += 1 + counter
                ans.append(word)
                # print('found word: {}'.format(word))
                break
            # word not found
            else:
                counter += 1

    return ans


words = {'victory', 'banana', 'python'}
strings = [
    'vic', 'tory',
    'mo', 'th', 'er',
    'ban', 'ana',
    'pyt', 'on',
    'vict', 'ory',
    'pyt', 'hon',
    'vi', 'ct', 'or', 'y',
    'ba', 'na', 'na']

words_found = process_strings(strings, words)
print('found words:\n{}'.format(words_found))

Output:

found words:
['victory', 'banana', 'victory', 'python', 'banana']
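The question also asks to save the found words to another dataset and to drop their parts from df, which the list-based function above does not do. A minimal sketch of one way to do that follows, using the same greedy left-to-right matching. The function name split_found_words, the output column name 'Words', and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

LIMIT = 3  # max number of consecutive rows to combine, as in the answer


def split_found_words(df, words, column='Tokens', limit=LIMIT):
    """Hypothetical helper: return (found_df, remaining_df)."""
    tokens = df[column].tolist()
    found = []  # combined words that exist in `words`
    used = []   # positional indices of the rows that formed those words
    current = 0
    while current < len(tokens):
        word = ''
        matched = False
        # try combining up to `limit` rows starting at `current`
        for counter in range(limit):
            if current + counter >= len(tokens):
                break
            word += tokens[current + counter]
            if word in words:
                found.append(word)
                used.extend(range(current, current + counter + 1))
                current += counter + 1
                matched = True
                break
        if not matched:
            current += 1
    found_df = pd.DataFrame(found, columns=['Words'])
    # drop the rows that were consumed by a match
    remaining_df = df.drop(df.index[used]).reset_index(drop=True)
    return found_df, remaining_df


df = pd.DataFrame(['vic', 'tory', 'mo', 'ban', 'ana'], columns=['Tokens'])
found_df, remaining_df = split_found_words(df, {'victory', 'banana'})
print(found_df['Words'].tolist())       # ['victory', 'banana']
print(remaining_df['Tokens'].tolist())  # ['mo']
```

Collecting the row positions first and dropping them in one call avoids modifying df while iterating over it.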

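Edit

A modified version that handles 1) combinations of any number of strings, and 2) cases like words = {'victory', 'victor'} with strings = ['vi', 'ct', 'or', 'y'], where both words will be found: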

def process_strings(strings, words):

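    # length of the longest dictionary word; default=0 avoids a ValueError when words is empty
    MAXLEN = max(map(len, words), default=0)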

    ans = list()

    stop = len(strings)
    current = 0
    # iterate over strings
    while current < stop:
        word = ''
        counter = 0
        # iterate over some amount of strings starting from current string
        while True:
            # check boundary conditions
            if len(word) > MAXLEN or current + counter >= stop:
                current += 1
                break
            word += strings[current + counter]
            # word found among words
            if word in words:
                ans.append(word)
            # there is no case `word not found`, exit only by boundary condition (length of the combined substrings)
            counter += 1

    return ans
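To see the behaviour of the modified version on the example from the edit, here is a compact sketch that should behave the same way. It is a rewrite with for loops rather than the answer's exact code, and the name find_all_words is an assumption for illustration.

```python
def find_all_words(strings, words):
    """Hypothetical compact variant: collect every dictionary word that is
    a concatenation of consecutive strings, trying every start position."""
    max_len = max(map(len, words), default=0)  # longest dictionary word
    found = []
    for start in range(len(strings)):
        word = ''
        for end in range(start, len(strings)):
            word += strings[end]
            # a candidate longer than the longest word can never match
            if len(word) > max_len:
                break
            if word in words:
                found.append(word)  # keep extending: a longer word may also match
    return found


words = {'victory', 'victor'}
strings = ['vi', 'ct', 'or', 'y']
print(find_all_words(strings, words))  # ['victor', 'victory']
```

Because every start position is tried and the search does not stop at the first match, the same string can contribute to more than one found word. That differs from the first version, which skips past the strings it has consumed.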
