How do I replace long strings throughout a DataFrame with different, shorter strings?

import pandas as pd
from io import StringIO

replacement_dict = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}

exampledata = StringIO("""id;Long String
1;This is a long substring1 of text that has lots of words
2;This is substring2 and also contains more text than needed
3;This is a long substring1 of text that has lots of words
4;This is substring2 and also contains more text than needed
5;This is substring2 and also contains more text than needed
6;This is substring2 and also contains more text than needed
7;Within this string is a short substring that is unique
8;This is a long substring1 of text that has lots of words
9;Within this string is a short substring that is unique
10;Within this string is a short substring that is unique
""")

df = pd.read_csv(exampledata, sep=";")
print(df)

for s in replacement_dict.keys():
    # str.contains returns a boolean Series (one value per row),
    # so this `if` raises the ValueError shown below.
    if df['Long String'].str.contains(s):
        df['Long String'] = replacement_dict[df['Long String'].str.contains(s)]

Running this fails with:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    if df['Long String'].str.contains(s):
  File "h:\Anaconda\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
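
The error comes from putting a boolean Series inside an `if` statement: `str.contains` returns one True/False per row, and pandas refuses to collapse that into a single truth value. If you want to keep the loop, one possible sketch (an alternative to the answer below, assuming plain substring matching, hence `regex=False`) is to use the Series as a row mask with `.loc`. The sample data is trimmed to three rows:

```python
from io import StringIO

import pandas as pd

replacement_dict = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}

data = StringIO(
    "id;Long String\n"
    "1;This is a long substring1 of text that has lots of words\n"
    "2;This is substring2 and also contains more text than needed\n"
    "7;Within this string is a short substring that is unique\n"
)
df = pd.read_csv(data, sep=";")

for substring, replacement in replacement_dict.items():
    # Boolean Series: True for every row whose text contains `substring`.
    mask = df["Long String"].str.contains(substring, regex=False)
    # Overwrite only the matching rows with the short replacement string.
    df.loc[mask, "Long String"] = replacement

print(df)
```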

1 Answer

Anonymous user

#1 · Posted 2024-06-16 17:40:33

pandas can do this kind of thing directly, without a loop. However, you have to modify your dictionary slightly to get the result you want:

replacement_dict = {
    ".*substring1.*": "substring1",
    ".*substring2.*": "substring2",
    ".*a short substring.*": "substring3",
}

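What I did was turn the keys into regular expressions. Each one matches everything before and after the substring you are looking for. This is important.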
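
To see why the surrounding `.*` matters, here is a small sketch using the standard-library `re.sub` on a single string (pandas applies a comparable regex substitution to each cell when `regex=True`):

```python
import re

text = "Within this string is a short substring that is unique"

# Without wildcards, only the matched substring itself is replaced.
partial = re.sub("a short substring", "substring3", text)
print(partial)  # Within this string is substring3 that is unique

# With .* on both sides, the match spans the entire string,
# so the whole value is replaced.
whole = re.sub(".*a short substring.*", "substring3", text)
print(whole)  # substring3
```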

Next, replace the entire for loop with:

df['Long String'] = df['Long String'].replace(replacement_dict, regex=True)

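.replace() can take a dictionary whose keys are the patterns to match and whose values are the replacement text. The keys were changed to capture everything before and after the substring so that the whole value gets replaced, not just the small matched portion.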
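
Putting the pieces together, a minimal end-to-end sketch of this approach (Python 3, with the sample data trimmed to three rows) could look like this:

```python
from io import StringIO

import pandas as pd

# Keys are regular expressions; the leading and trailing .* make each
# pattern consume the entire cell, so the whole long string is swapped out.
replacement_dict = {
    ".*substring1.*": "substring1",
    ".*substring2.*": "substring2",
    ".*a short substring.*": "substring3",
}

exampledata = StringIO(
    "id;Long String\n"
    "1;This is a long substring1 of text that has lots of words\n"
    "2;This is substring2 and also contains more text than needed\n"
    "7;Within this string is a short substring that is unique\n"
)
df = pd.read_csv(exampledata, sep=";")

# One vectorized call replaces the for loop from the question.
df["Long String"] = df["Long String"].replace(replacement_dict, regex=True)
print(df)
```

Because each pattern matches the whole cell, the replacement value becomes the entire new cell contents.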

For example, using the dictionary without the .* parts (still with regex=True) turns the DataFrame into this:

   id                                        Long String
0   1  This is a long substring1 of text that has lot...
1   2  This is substring2 and also contains more text...
2   3  This is a long substring1 of text that has lot...
3   4  This is substring2 and also contains more text...
4   5  This is substring2 and also contains more text...
5   6  This is substring2 and also contains more text...
6   7    Within this string is substring3 that is unique
7   8  This is a long substring1 of text that has lot...
8   9    Within this string is substring3 that is unique
9  10    Within this string is substring3 that is unique

Note that the only change you actually see is in the rows containing "a short substring", because "substring1" and "substring2" are simply being replaced with themselves.
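
A small sketch that reproduces this effect: the same `.replace(..., regex=True)` call, but with the original, wildcard-free keys, applied to two of the sample strings:

```python
import pandas as pd

s = pd.Series([
    "This is a long substring1 of text that has lots of words",
    "Within this string is a short substring that is unique",
])

# Original keys with no .* wildcards: only the matched text is replaced,
# and "substring1" is replaced with itself, so that row looks unchanged.
no_wildcards = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}
result = s.replace(no_wildcards, regex=True)
print(result.tolist())
```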

Now, if we add the regex wildcards back, we get:

   id Long String
0   1  substring1
1   2  substring2
2   3  substring1
3   4  substring2
4   5  substring2
5   6  substring2
6   7  substring3
7   8  substring1
8   9  substring3
9  10  substring3
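
If you would rather keep your original dictionary of plain substrings, you could build the wildcard keys programmatically. In this sketch `to_whole_cell_patterns` is a hypothetical helper name, and `re.escape` is there in case a substring ever contains regex metacharacters such as `.` or `(`:

```python
import re

import pandas as pd

# The question's original mapping: plain substring -> short replacement.
replacement_dict = {
    "substring1": "substring1",
    "substring2": "substring2",
    "a short substring": "substring3",
}


def to_whole_cell_patterns(mapping):
    """Hypothetical helper: wrap each literal key as .*<escaped key>.*
    so the regex matches the entire cell. re.escape makes characters
    like '.', '(' or '+' in a key match literally."""
    return {".*" + re.escape(key) + ".*": value for key, value in mapping.items()}


patterns = to_whole_cell_patterns(replacement_dict)

s = pd.Series([
    "This is substring2 and also contains more text than needed",
    "Within this string is a short substring that is unique",
])
print(s.replace(patterns, regex=True).tolist())
```

One caveat: if a cell contains more than one key, the result can depend on the order in which the patterns are applied, so it is simplest to keep the keys mutually exclusive.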
