How do I bin data alphabetically with pandas?

Posted 2024-03-28 20:43:13


I have a DataFrame with a column that contains a series of strings:

books = pd.DataFrame([[1,'In Search of Lost Time'],[2,'Don Quixote'],[3,'Ulysses'],[4,'The Great Gatsby'],[5,'Moby Dick']], columns = ['Book ID', 'Title'])

   Book ID                   Title
0        1  In Search of Lost Time
1        2             Don Quixote
2        3                 Ulysses
3        4        The Great Gatsby
4        5               Moby Dick

and a list of boundaries:

boundaries = ['AAAAAAA','The Great Gatsby', 'zzzzzzzz']

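I want to use these boundaries to sort the values in the DataFrame into alphabetical bins, similar to how pd.cut() handles numeric data. My desired output looks like this: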

   Book ID                   Title                          binning
0        1  In Search of Lost Time   ['AAAAAAA','The Great Gatsby')
1        2             Don Quixote   ['AAAAAAA','The Great Gatsby')
2        3                 Ulysses  ['The Great Gatsby','zzzzzzzz')
3        4        The Great Gatsby  ['The Great Gatsby','zzzzzzzz')
4        5               Moby Dick   ['AAAAAAA','The Great Gatsby')

Is this possible?


1 answer
Answer #1 · Posted 2024-03-28 20:43:13

searchsorted

import numpy as np
boundaries = np.array(['The Great Gatsby'])
bins = np.array(['[A..The Great Gatsby)', '[The Great Gatsby..Z]'])

# side='right' sends a title equal to the boundary to the upper, left-closed bin
books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])

   Book ID                   Title                binning
0        1  In Search of Lost Time  [A..The Great Gatsby)
1        2             Don Quixote  [A..The Great Gatsby)
2        3                 Ulysses  [The Great Gatsby..Z]
3        4        The Great Gatsby  [The Great Gatsby..Z]
4        5               Moby Dick  [A..The Great Gatsby)

Extending this to some other set of boundaries:

from string import ascii_uppercase as letters
# 'B'..'Y' are the interior boundaries; 'A' and 'Z' only appear in the labels
boundaries = np.array([*letters[1:-1]])
bins = np.array([f'[{a}..{b})' for a, b in zip(letters, letters[1:])])

books.assign(binning=bins[boundaries.searchsorted(books.Title, side='right')])

   Book ID                   Title binning
0        1  In Search of Lost Time  [I..J)
1        2             Don Quixote  [D..E)
2        3                 Ulysses  [U..V)
3        4        The Great Gatsby  [T..U)
4        5               Moby Dick  [M..N)
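
One caveat with letter boundaries is that the comparison is by code point, so a lowercase title sorts after every uppercase boundary. The sketch below illustrates this with made-up titles and shows one possible workaround, upper-casing before the lookup. The `titles` data and the variable names are assumptions for illustration only.

```python
from string import ascii_uppercase as letters

import numpy as np
import pandas as pd

titles = pd.Series(["moby dick", "Moby Dick", "ulysses"])  # illustrative data
boundaries = np.array(list(letters[1:-1]))
bins = np.array([f"[{a}..{b})" for a, b in zip(letters, letters[1:])])

raw = np.array([str(t) for t in titles])
folded = np.array([str(t).upper() for t in titles])

# Lowercase letters compare greater than 'Y', so raw lowercase titles
# all land in the last bin.
raw_bins = bins[boundaries.searchsorted(raw, side="right")]
# Upper-casing first bins by letter regardless of the original case.
folded_bins = bins[boundaries.searchsorted(folded, side="right")]
```

Whether case folding is wanted depends on the data. The boundaries in the question ('AAAAAAA' to 'zzzzzzzz') suggest the asker may rely on case-sensitive ordering.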
