How to limit repeated column values in Python

Name, City, State
Foo, L.A., CA
Bar, L.A., CA
Sam, L.A., CA
Tricia, Kent, WA
Bob, Kent, WA
Ida, Boo, PA
Monster Mash, Whack, PA
Zoomacroom, L.A., CA
Otter Pop, Boo, PA
Snake, HP, WA
Ronnie the Bear, Boo, PA

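3 Answers

User

Answer 1 · edited 2024-05-28 19:14:14

Edit: I think the OP changed the desired DataFrame within the first 5 minutes of posting. This answer describes how to remove repeats across all columns (not just for this city/state example, where that doesn't make much sense).

You can do this for a single column (dropping every city name that occurs more than 3 times):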

In [11]: g = df.groupby('City')

In [12]: g.filter(lambda x: len(x['City']) < 4)
Out[12]: 
               Name   City State
5               Ida    Boo    PA
8         Otter Pop    Boo    PA
10  Ronnie the Bear    Boo    PA
9             Snake     HP    WA
3            Tricia   Kent    WA
4               Bob   Kent    WA
6      Monster Mash  Whack    PA
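
For readers who want to reproduce the single-column case, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question and uses a boolean mask from `transform("size")` instead of `filter`; that swap is my own choice, not the answer's, and the names `rows`, `city_counts` and `kept` are made up for illustration. One side effect of the mask approach is that the rows stay in their original order.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("Foo", "L.A.", "CA"), ("Bar", "L.A.", "CA"), ("Sam", "L.A.", "CA"),
    ("Tricia", "Kent", "WA"), ("Bob", "Kent", "WA"), ("Ida", "Boo", "PA"),
    ("Monster Mash", "Whack", "PA"), ("Zoomacroom", "L.A.", "CA"),
    ("Otter Pop", "Boo", "PA"), ("Snake", "HP", "WA"),
    ("Ronnie the Bear", "Boo", "PA"),
]
df = pd.DataFrame(rows, columns=["Name", "City", "State"])

# Size of the City group each row belongs to, aligned with df's index.
city_counts = df.groupby("City")["City"].transform("size")

# Keep rows whose city occurs fewer than 4 times (L.A. occurs 4 times).
kept = df[city_counts < 4]
print(kept)
```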

To do this across all columns (it's a bit messy, but you could write a function that does it for an arbitrary frame):

In [13]: less_than_4 = ((df.groupby('City').City.transform(lambda x: len(x) < 4))
                      & (df.groupby('State').State.transform(lambda x: len(x) < 4))
                      & ((df.groupby('Name').Name.transform(lambda x: len(x) < 4))))

In [14]: df[less_than_4]
Out[14]: 
     Name  City State
3  Tricia  Kent    WA
4     Bob  Kent    WA
9   Snake    HP    WA

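A little more elegant:

from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import and_

# AND together one boolean mask per column, then index the frame with it
df[reduce(and_, (df.groupby(col)[col].transform(lambda x: len(x) < 4)
                      for col in df.columns))]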
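
The "function for an arbitrary frame" mentioned above could look roughly like the sketch below. The helper name `limit_repeats` and its `max_count` parameter are hypothetical, and the sketch assumes the frame has no missing values (by default `groupby` leaves NaN keys out of its groups).

```python
from functools import reduce
from operator import and_

import pandas as pd


def limit_repeats(frame, max_count):
    """Keep rows in which every column's value occurs at most max_count times.

    Hypothetical helper, not part of pandas.
    """
    masks = (
        frame.groupby(col)[col].transform("size") <= max_count
        for col in frame.columns
    )
    # A row survives only if it passes the check in every column.
    return frame[reduce(and_, masks)]


# Sample data copied from the question.
df = pd.DataFrame(
    [
        ("Foo", "L.A.", "CA"), ("Bar", "L.A.", "CA"), ("Sam", "L.A.", "CA"),
        ("Tricia", "Kent", "WA"), ("Bob", "Kent", "WA"), ("Ida", "Boo", "PA"),
        ("Monster Mash", "Whack", "PA"), ("Zoomacroom", "L.A.", "CA"),
        ("Otter Pop", "Boo", "PA"), ("Snake", "HP", "WA"),
        ("Ronnie the Bear", "Boo", "PA"),
    ],
    columns=["Name", "City", "State"],
)

limited = limit_repeats(df, max_count=3)
print(limited)
```

With `max_count=3` this corresponds to the `len(x) < 4` condition used above: CA and PA each occur four times in State, so only the three WA rows remain.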

User

Answer 2 · edited 2024-05-28 19:14:14

For example:

>>> small_cities = df.groupby(["City", "State"]).filter(lambda x: len(x) < 3)
>>> small_cities
           Name   City State
3        Tricia   Kent    WA
4           Bob   Kent    WA
6  Monster Mash  Whack    PA
9         Snake     HP    WA

[4 rows x 3 columns]
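
To see why exactly those four rows survive, it can help to look at the size of each (City, State) group first. The following is a sketch under the assumption that `df` holds the sample data from the question; `pair_sizes` is a name introduced only for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        ("Foo", "L.A.", "CA"), ("Bar", "L.A.", "CA"), ("Sam", "L.A.", "CA"),
        ("Tricia", "Kent", "WA"), ("Bob", "Kent", "WA"), ("Ida", "Boo", "PA"),
        ("Monster Mash", "Whack", "PA"), ("Zoomacroom", "L.A.", "CA"),
        ("Otter Pop", "Boo", "PA"), ("Snake", "HP", "WA"),
        ("Ronnie the Bear", "Boo", "PA"),
    ],
    columns=["Name", "City", "State"],
)

# Number of rows in each (City, State) group.
pair_sizes = df.groupby(["City", "State"]).size()
print(pair_sizes)

# Keep only the groups with fewer than 3 rows.
small_cities = df.groupby(["City", "State"]).filter(lambda g: len(g) < 3)
print(small_cities)
```

(Boo, PA) has three rows and (L.A., CA) has four, so both are dropped by the `< 3` condition.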

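User

Answer 3 · edited 2024-05-28 19:14:14

Roughly like this:

from collections import Counter

with open(filename) as f:
    # Skip the header row and drop blank lines and trailing newlines
    rows = [line.strip() for line in f.readlines()[1:] if line.strip()]
# Count how often each state (the last two characters of a row) occurs
state_counts = Counter(row[-2:] for row in rows)
# Keep only the rows whose state occurs fewer than 4 times
output = [row for row in rows if state_counts[row[-2:]] < 4]

Hope this helps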
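
A runnable variant of the same idea without pandas, as a sketch only: it reads the question's sample data from an in-memory string rather than a file (my assumption, so that no `filename` is needed) and uses the `csv` module so the state is taken from its own column instead of the last two characters.

```python
import csv
import io
from collections import Counter

# Sample data copied from the question, held in memory for illustration.
raw = """Name, City, State
Foo, L.A., CA
Bar, L.A., CA
Sam, L.A., CA
Tricia, Kent, WA
Bob, Kent, WA
Ida, Boo, PA
Monster Mash, Whack, PA
Zoomacroom, L.A., CA
Otter Pop, Boo, PA
Snake, HP, WA
Ronnie the Bear, Boo, PA
"""

# skipinitialspace drops the blank that follows each comma.
reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
header, *records = [row for row in reader if row]

# Count how often each state occurs.
state_counts = Counter(state for _name, _city, state in records)

# Keep the records whose state occurs fewer than 4 times.
output = [rec for rec in records if state_counts[rec[2]] < 4]
print(output)
```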
