What happens when you OR two Series whose indexes do not match?

Posted 2024-05-23 14:02:25


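So, I created two Series of 100 elements each and ORed them together. But first I called sort_index() on the first Series, which means the two indexes were no longer in the same order. I expected an error, or garbage results. Instead I got a third Series with 126 elements! Quite a surprise. Any idea why?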

Note the 4 "Richardson" rows in the billy_or_peter listing in the output. There are 4 values: two True and two False.

I thought some kind of "Cartesian product" might be going on, which would give 200 rows. But instead I see 126 rows, which is odd.

Any ideas?

# loc and iloc also allow conditional statements to filter rows of data
# using loc with the logical test above returns only the rows where the result is True
only_billys = df.loc[df["first_name"] == "Billy", :]
print(only_billys)

only_peters = df.loc[df["first_name"] == "Peter", :]
print(only_peters)
print()

only_richardsons = df.loc["Richardson", :]
print(only_richardsons)
print()

isBilly = (df["first_name"] == "Billy").sort_index()
print(isBilly.describe())
print()

isPeter = (df["first_name"] == "Peter")
print(isPeter.describe())
print()

billy_or_peter = isPeter | isBilly
print(billy_or_peter.describe())
print(billy_or_peter)

Output:


(only_billys)
           id first_name      Phone Number       Time zone
last_name                                                 
Clark      20      Billy  62-(213)345-2549   Asia/Makassar
Andrews    23      Billy  86-(859)746-5367  Asia/Chongqing
Price      59      Billy  86-(878)547-7739   Asia/Shanghai

(only_peters)
            id first_name     Phone Number      Time zone
last_name                                                
Richardson   1      Peter  7-(789)867-9023  Europe/Moscow

(only_richardsons)
            id first_name      Phone Number      Time zone
last_name                                                 
Richardson   1      Peter   7-(789)867-9023  Europe/Moscow
Richardson  25     Donald  62-(259)282-5871   Asia/Jakarta
Richardson  25     Donald  62-(259)282-5871   Asia/Jakarta

(isBilly.describe() - sorted index)
count       100
unique        2
top       False
freq         97
Name: first_name, dtype: object

(isPeter.describe())
count       100
unique        2
top       False
freq         99
Name: first_name, dtype: object

(billy_or_peter.describe() - 126 rows???)
count       126
unique        2
top       False
freq        121
Name: first_name, dtype: object

(billy_or_peter listing - notice 4 Richardsons where before there were only 2)
last_name
Adams         False
Allen         False
Andrews        True
Austin        False
Baker         False
Banks         False
Bell          False
Berry         False
Bishop        False
Black         False
Brooks        False
Brown         False
Bryant        False
Bryant        False
Bryant        False
Bryant        False
Burke         False
Butler        False
Butler        False
Butler        False
Butler        False
Carroll       False
Chapman       False
Chavez        False
Clark          True
Collins       False
Cook          False
Day           False
Day           False
Day           False
              ...  
Price          True
Reid          False
Reyes         False
Rice          False
*Richardson     True
*Richardson     True
*Richardson    False
*Richardson    False
Riley         False
Roberts       False
Robertson     False
Robinson      False
Rogers        False
Scott         False
Shaw          False
Shaw          False
Shaw          False
Shaw          False
Simmons       False
Snyder        False
Sullivan      False
Torres        False
Tucker        False
Vasquez       False
Wagner        False
Walker        False
Washington    False
Watkins       False
Wells         False
Williamson    False
Name: first_name, Length: 126, dtype: bool

1 Answer

Answer #1 · posted 2024-05-23 14:02:25

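The mismatched order is not the problem here; pandas aligns the two Series on their index labels before applying |. Your problem is caused by duplicate index labels. With duplicates, the alignment behaves like an outer join on matching labels, so every occurrence of a label in one Series is paired with every occurrence of that label in the other. That is why 2 Richardsons in one Series and 2 Richardsons in the other produce 4 rows in the output.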
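As a minimal sketch of the first point, using made-up labels a, b, c rather than the question's data: when every label is unique, a different row order alone does not change the length of the result, because values are paired by label.

```python
import pandas as pd

# Hypothetical toy data: unique labels, deliberately in a different order.
left = pd.Series([True, False, False], index=["c", "a", "b"])
right = pd.Series([False, False, True], index=["a", "b", "c"])

# Values are paired by label, not by position.
result = left | right

print(len(result))          # same number of rows as each input
print(result.sort_index())  # only label "c" is True
```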

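To make this clearer, look at what happens when you add strings whose indexes contain duplicates and are not aligned. For index 1 we get 6 (2 x 3) rows from the Cartesian product: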

import pandas as pd

df1 = pd.DataFrame(list('abcd'), index=[1,1,2,3])
df2 = pd.DataFrame(list('1243'), index=[1,1,3,1])
df1+df2

     0
1   a1
1   a2
1   a3
1   b1
1   b2
1   b3
2  NaN
3   d4
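The same effect can be sketched with boolean Series and |, as in the question. The data below is a made-up stand-in (the names is_peter and is_billy and their values are assumptions, not the asker's DataFrame), and the row-count formula assumes both Series carry the same set of labels, which holds when both are derived from the same DataFrame:

```python
import pandas as pd

# Hypothetical toy data: "Richardson" appears twice in each Series.
is_peter = pd.Series([True, False, False],
                     index=["Richardson", "Richardson", "Adams"])
is_billy = pd.Series([False, False, False],
                     index=["Adams", "Richardson", "Richardson"])

combined = is_peter | is_billy

# Each label contributes (occurrences on the left) * (occurrences on the right)
# rows, assuming every label is present in both Series.
left_counts = is_peter.index.value_counts()
right_counts = is_billy.index.value_counts()
expected_rows = int((left_counts * right_counts).sum())

print(len(combined), expected_rows)
print(combined.loc["Richardson"])
```

Under that assumption, the 126 rows in the question would be the sum of the squared occurrence counts of each last name, since both Series share the same index labels.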
