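So, I created two Series of 100 elements each and then combined them. But first I sorted the first Series, which means the indexes were no longer aligned. I expected an error, or at least a bad result. Instead I got a third Series with 126 elements! Quite a surprise. Any idea why?
Note the 4 "Richardson" rows in the billy_or_peter output listing. There are 4 values: two True and two False.
I thought some kind of Cartesian product might be involved, which would give 200 rows. But instead I see 126 rows, which is odd.
Any ideas?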
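import pandas as pd

# df is assumed to be a 100-row DataFrame indexed by last_name (its construction is not shown here)
# loc and iloc also accept conditional statements to filter rows of data
# using loc with the boolean test below returns only the rows where the result is True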
only_billys = df.loc[df["first_name"] == "Billy", :]
print(only_billys)
only_peters = df.loc[df["first_name"] == "Peter", :]
print(only_peters)
print()
only_richardsons = df.loc["Richardson", :]
print(only_richardsons)
print()
# sort_index() reorders this mask, so its index order no longer matches isPeter's
isBilly = (df["first_name"] == "Billy").sort_index()
print(isBilly.describe())
print()
isPeter = (df["first_name"] == "Peter")
print(isPeter.describe())
print()
# | aligns the two masks on their index labels before combining them
billy_or_peter = isPeter | isBilly
print(billy_or_peter.describe())
print(billy_or_peter)
Output
(only_billys)
id first_name Phone Number Time zone
last_name
Clark 20 Billy 62-(213)345-2549 Asia/Makassar
Andrews 23 Billy 86-(859)746-5367 Asia/Chongqing
Price 59 Billy 86-(878)547-7739 Asia/Shanghai
(only_peters)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
(only_richardsons)
id first_name Phone Number Time zone
last_name
Richardson 1 Peter 7-(789)867-9023 Europe/Moscow
Richardson 25 Donald 62-(259)282-5871 Asia/Jakarta
(isBilly.describe() - sorted index)
count 100
unique 2
top False
freq 97
Name: first_name, dtype: object
(isPeter.describe())
count 100
unique 2
top False
freq 99
Name: first_name, dtype: object
(billy_or_peter.describe() - 126 rows???)
count 126
unique 2
top False
freq 121
Name: first_name, dtype: object
(billy_or_peter listing - notice 4 Richardsons where before there were only 2)
last_name
Adams False
Allen False
Andrews True
Austin False
Baker False
Banks False
Bell False
Berry False
Bishop False
Black False
Brooks False
Brown False
Bryant False
Bryant False
Bryant False
Bryant False
Burke False
Butler False
Butler False
Butler False
Butler False
Carroll False
Chapman False
Chavez False
Clark True
Collins False
Cook False
Day False
Day False
Day False
...
Price True
Reid False
Reyes False
Rice False
*Richardson True
*Richardson True
*Richardson False
*Richardson False
Riley False
Roberts False
Robertson False
Robinson False
Rogers False
Scott False
Shaw False
Shaw False
Shaw False
Shaw False
Simmons False
Snyder False
Sullivan False
Torres False
Tucker False
Vasquez False
Wagner False
Walker False
Washington False
Watkins False
Wells False
Williamson False
Name: first_name, Length: 126, dtype: bool
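The mismatch in order is not the problem here: pandas aligns the two Series on their index before applying |. Your problem is caused by duplicate labels in the index. The alignment is performed as an outer join on matching index labels, so 2 Richardsons in one Series and 2 Richardsons in the other produce 4 rows in the output. This becomes clearer with duplicated index labels and misaligned Series: a label that appears 2 times in one Series and 3 times in the other yields 6 (2 x 3) rows for that label, the Cartesian product.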
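The snippet that originally accompanied this answer is not reproduced here. As a minimal sketch with made-up data (the names left, right and combined are illustrative and not taken from the original answer), the effect can be reproduced like this:

```python
import pandas as pd

# Hypothetical data: label 1 appears twice in `left` and three times in `right`.
left = pd.Series([True, False, False], index=[1, 1, 2])
right = pd.Series([False, True, False, True], index=[1, 1, 1, 3])

# The indexes differ, so pandas aligns them (outer join) before applying |.
combined = left | right

# Count how many rows the duplicated label 1 produced after alignment.
rows_for_label_1 = int((combined.index == 1).sum())

print(combined)
print("rows for label 1:", rows_for_label_1)
print("total rows:", len(combined))
```

Each label shared by both Series contributes (count in left) x (count in right) rows, and labels present in only one Series are still kept because the join is an outer one. That is how two 100-row masks with repeated last names can grow to 126 rows once their indexes stop matching exactly.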