How to perform an ordered selection on multiple columns by value

3 answers

User

Answer 1 · edited 2024-05-19 02:26:59

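If, as in your post "select all rows between September 2013 and May 2008", you want to find the rows between 2008 and 2013, you can try pandas.Series.between:

Dataset borrowed from @jezrael.

DataFrame used for demonstration: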

>>> stats_month_census_2
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

Using pandas.Series.between():

>>> stats_month_census_2[stats_month_census_2['year'].between(2008, 2013, inclusive="both")]
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
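A minimal sketch, assuming pandas 1.3 or later, where `inclusive` takes one of the strings "both", "neither", "left" or "right". The boolean form `inclusive=True` was deprecated in 1.3 and recent versions raise an error for it. The data is the demo DataFrame above; `both` and `left` are just illustrative names:

```python
import pandas as pd

# Demo data from above: 'year' holds plain integers
df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# inclusive="both" (the default) keeps rows equal to either endpoint
both = df[df['year'].between(2008, 2013, inclusive='both')]

# inclusive="left" keeps 2008 but drops rows equal to the right endpoint 2013
left = df[df['year'].between(2008, 2013, inclusive='left')]

print(both['data'].tolist())  # [1, 3, 4, 6, 5]
print(left['data'].tolist())  # [1, 3, 4]
```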

If the year column is already in datetime format (the note at the end of this answer shows the conversion), you can try:

>>> stats_month_census_2[stats_month_census_2['year'].between('2008-05', '2013-09', inclusive="both")]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
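Why the string bounds work: pandas converts a string such as '2013-09' to a Timestamp before comparing, and a month-only string becomes midnight on the first of that month. A small sketch on the demo data (assumed already converted to datetime) that makes the bounds explicit:

```python
import pandas as pd

# Demo data from above, with 'year' already holding datetimes
df = pd.DataFrame({
    'year': pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01',
                            '2013-10-01', '2014-11-01', '2014-12-01']),
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# '2008-05' and '2013-09' mean 2008-05-01 00:00 and 2013-09-01 00:00
lo, hi = pd.Timestamp('2008-05'), pd.Timestamp('2013-09')
print(lo, hi)  # 2008-05-01 00:00:00 2013-09-01 00:00:00

mask = df['year'].between(lo, hi)  # both endpoints included by default
print(df.loc[mask, 'data'].tolist())  # [3, 4, 6]
```

Because the upper bound is the first of September, a hypothetical row dated later in September 2013 would not be selected.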

Using DataFrame.query:

>>> stats_month_census_2.query('"2008-05" <= year <= "2013-09"')
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
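If the bounds live in Python variables rather than literals, query can refer to them with the `@` prefix. A sketch on the same demo data; `start` and `end` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'year': pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01',
                            '2013-10-01', '2014-11-01', '2014-12-01']),
    'data': [1, 3, 4, 6, 5, 6, 7],
})

start = pd.Timestamp('2008-05-01')
end = pd.Timestamp('2013-09-01')

# '@name' reads a local variable; chained comparisons are allowed in query
out = df.query('@start <= year <= @end')
print(out['data'].tolist())  # [3, 4, 6]
```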

Using isin to select rows between two dates:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05-01', '2013-09-01'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6

Alternatively, you can pass year-month strings, like this:

>>> stats_month_census_2[stats_month_census_2['year'].isin(pd.date_range('2008-05', '2013-09'))]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
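Note that isin does exact matching: pd.date_range defaults to a daily frequency, and the rows match only because each date falls exactly at midnight on a day in that range. The sketch below (the timestamp in `odd` is a hypothetical value) shows that a month-start range (`freq='MS'`) is enough for this data, and that a value with a time of day is missed by isin but still caught by a range comparison:

```python
import pandas as pd

dates = pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01', '2013-10-01'])
df = pd.DataFrame({'year': dates, 'data': [1, 3, 4, 6, 5]})

# Default daily range: one candidate timestamp per day
daily = pd.date_range('2008-05-01', '2013-09-01')
# Month-start range: sufficient here because every date is the 1st of a month
monthly = pd.date_range('2008-05-01', '2013-09-01', freq='MS')

print(len(monthly))                                    # 65
print(df[df['year'].isin(daily)]['data'].tolist())     # [3, 4, 6]
print(df[df['year'].isin(monthly)]['data'].tolist())   # [3, 4, 6]

# Hypothetical value with a time of day: isin misses it, between does not
odd = pd.Series(pd.to_datetime(['2010-02-01 12:00']))
print(odd.isin(daily).tolist())                          # [False]
print(odd.between('2008-05-01', '2013-09-01').tolist())  # [True]
```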

Using label-based slicing between the index labels of the start and end dates:

Start = stats_month_census_2[stats_month_census_2['year'] =='2008-05'].index[0]
End = stats_month_census_2[stats_month_census_2['year']=='2013-09'].index[0]

>>> stats_month_census_2.loc[Start:End]
        year      month  data
1 2008-05-01        May     3
2 2008-06-01       June     4
3 2013-09-01  September     6
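Two details behind this approach: `.loc` label slicing includes both the start and the end label (unlike positional `.iloc`), and it assumes the rows are already in date order and that both exact dates exist (otherwise `.index[0]` raises IndexError). A sketch on the demo data:

```python
import pandas as pd

df = pd.DataFrame({
    'year': pd.to_datetime(['2008-04-01', '2008-05-01', '2008-06-01', '2013-09-01',
                            '2013-10-01', '2014-11-01', '2014-12-01']),
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# Index labels of the first row matching each date (default RangeIndex)
start = df[df['year'] == '2008-05'].index[0]  # label 1
end = df[df['year'] == '2013-09'].index[0]    # label 3

# .loc: label-based, the end label is included
print(df.loc[start:end, 'data'].tolist())   # [3, 4, 6]
# .iloc: position-based, the end position is excluded
print(df.iloc[start:end]['data'].tolist())  # [3, 4]
```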

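Note: to answer @jezrael's question in the comments, here is how I converted the year column to datetime format:

The sample DataFrame below has two separate columns, year and month: year contains only years, and month contains month names as text strings. So we first convert the month names to integers, and then combine year and month into one date with pandas' pd.to_datetime, using 1 as the day.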

df
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5
5  2014   November     6
6  2014   December     7

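Above is the original DataFrame before the datetime conversion. I use the approach below, which I learned via SO:

1- First convert the month names to integers and assign them to a new column named Month, so we can use it for the conversion later: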

df['Month'] = pd.to_datetime(df.month, format='%B').dt.month

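2- Finally, convert year and Month into a proper datetime and assign the result directly back to the year column itself, so the conversion effectively happens in place: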

df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))

Now the year column of the DataFrame is in datetime form:

print(df)
        year      month  data  Month
0 2008-04-01      April     1      4
1 2008-05-01        May     3      5
2 2008-06-01       June     4      6
3 2013-09-01  September     6      9
4 2013-10-01    October     5     10
5 2014-11-01   November     6     11
6 2014-12-01   December     7     12
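Putting both steps together with the final filter, as a self-contained sketch on the demo data. It relies on pd.to_datetime accepting a DataFrame whose columns name the date components (year, month, day; the column names are matched case-insensitively, which is why 'Month' and 'Day' work):

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013, 2014, 2014],
    'month': ['April', 'May', 'June', 'September', 'October', 'November', 'December'],
    'data': [1, 3, 4, 6, 5, 6, 7],
})

# Step 1: full month name -> month number ('%B' parses full month names)
df['Month'] = pd.to_datetime(df['month'], format='%B').dt.month

# Step 2: assemble year/Month/Day into one datetime and overwrite 'year'
df['year'] = pd.to_datetime(df[['year', 'Month']].assign(Day=1))

# The converted column can now be filtered with string bounds
selected = df[df['year'].between('2008-05', '2013-09')]['data'].tolist()
print(selected)  # [3, 4, 6]
```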

User
Answer 2 · edited 2024-05-19 02:26:59

You can easily convert these columns into a datetime column with pd.to_datetime:
>>> df
      month  year
0   January  2000
1     April  2001
2      July  2002
3  February  2010
4  February  2018
5     March  2014
6      June  2012
7      June  2011
8       May  2009
9  November  2016
>>> df['date'] = pd.to_datetime(df['month'].astype(str) + '-' + df['year'].astype(str), format='%B-%Y')
>>> df
      month  year       date
0   January  2000 2000-01-01
1     April  2001 2001-04-01
2      July  2002 2002-07-01
3  February  2010 2010-02-01
4  February  2018 2018-02-01
5     March  2014 2014-03-01
6      June  2012 2012-06-01
7      June  2011 2011-06-01
8       May  2009 2009-05-01
9  November  2016 2016-11-01
>>> df[(df.date <= "2013-09") & (df.date >= "2008-05")]
      month  year       date
3  February  2010 2010-02-01
6      June  2012 2012-06-01
7      June  2011 2011-06-01
8       May  2009 2009-05-01
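The same idea wrapped in a reusable function, as a sketch. `select_between` is a hypothetical helper name, and it assumes 'month' holds full English month names and 'year' holds integers. It builds the dates as a temporary Series, so the original DataFrame is not modified:

```python
import pandas as pd


def select_between(df, start, end):
    """Hypothetical helper: rows whose month/year falls in [start, end]."""
    # e.g. 'February-2010' -> 2010-02-01
    dates = pd.to_datetime(df['month'] + '-' + df['year'].astype(str), format='%B-%Y')
    return df[(dates >= start) & (dates <= end)]


df = pd.DataFrame({
    'month': ['January', 'April', 'July', 'February', 'February',
              'March', 'June', 'June', 'May', 'November'],
    'year': [2000, 2001, 2002, 2010, 2018, 2014, 2012, 2011, 2009, 2016],
})

print(select_between(df, '2008-05', '2013-09').index.tolist())  # [3, 6, 7, 8]
```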

User

Answer 3 · edited 2024-05-19 02:26:59

You can create a DatetimeIndex and then select rows by slicing it with date strings:

stats_month_census_2 = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013,2013],
    'month': ['April','May','June','September','October'],
    'data':[1,3,4,6,5]
})
print (stats_month_census_2)
   year      month  data
0  2008      April     1
1  2008        May     3
2  2008       June     4
3  2013  September     6
4  2013    October     5

s = stats_month_census_2.pop('year').astype(str) + stats_month_census_2.pop('month')
# if you need to keep the year and month columns:
#s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
stats_month_census_2.index = pd.to_datetime(s, format='%Y%B')
print (stats_month_census_2)
            data
2008-04-01     1
2008-05-01     3
2008-06-01     4
2013-09-01     6
2013-10-01     5

print (stats_month_census_2['2008':'2013'])
            data
2008-04-01     1
2008-05-01     3
2008-06-01     4
2013-09-01     6
2013-10-01     5    

print (stats_month_census_2['2008-05':'2013-09'])
            data
2008-05-01     3
2008-06-01     4
2013-09-01     6
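A self-contained sketch of the same DatetimeIndex idea, built with set_index instead of pop so that year and month stay in the original DataFrame. Partial string slicing is safest on a sorted index, hence sort_index. A partial string bound covers the whole period it names, so '2013' as an end bound reaches through 2013-12-31:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013],
    'month': ['April', 'May', 'June', 'September', 'October'],
    'data': [1, 3, 4, 6, 5],
})

# e.g. '2008April' -> 2008-04-01
idx = pd.to_datetime(df['year'].astype(str) + df['month'], format='%Y%B')
ts = df.set_index(idx).sort_index()

print(ts.loc['2008':'2013', 'data'].tolist())        # [1, 3, 4, 6, 5]
print(ts.loc['2008-05':'2013-09', 'data'].tolist())  # [3, 4, 6]
```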

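Or, starting again from the original DataFrame (with its year and month columns), create a date column and filter it with boolean indexing and between: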

s = stats_month_census_2['year'].astype(str) + stats_month_census_2['month']
stats_month_census_2['date'] = pd.to_datetime(s, format='%Y%B')
print (stats_month_census_2)
   year      month  data       date
0  2008      April     1 2008-04-01
1  2008        May     3 2008-05-01
2  2008       June     4 2008-06-01
3  2013  September     6 2013-09-01
4  2013    October     5 2013-10-01

df = stats_month_census_2[stats_month_census_2['date'].between('2008-05', '2013-09')]
print (df)
   year      month  data       date
1  2008        May     3 2008-05-01
2  2008       June     4 2008-06-01
3  2013  September     6 2013-09-01

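Unfortunately, selecting between whole years does not work with this datetime-column approach: a bound like '2013' is parsed as 2013-01-01, so every month after January 2013 is dropped. For that case you need a solution based on the year column: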

#wrong output
df = stats_month_census_2[stats_month_census_2['date'].between('2008', '2013')]
print (df)

   year  month  data       date
0  2008  April     1 2008-04-01
1  2008    May     3 2008-05-01
2  2008   June     4 2008-06-01
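A sketch of three ways to get whole-year selection right on the same data: filter the integer year column, extract the year from the datetime column with `.dt.year`, or use a half-open range ending at the start of the following year. The variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2008, 2008, 2008, 2013, 2013],
    'month': ['April', 'May', 'June', 'September', 'October'],
    'data': [1, 3, 4, 6, 5],
})
df['date'] = pd.to_datetime(df['year'].astype(str) + df['month'], format='%Y%B')

# Wrong: '2013' becomes 2013-01-01, so September and October 2013 are lost
wrong = df[df['date'].between('2008', '2013')]

# Option 1: filter on the integer year column
by_year = df[df['year'].between(2008, 2013)]
# Option 2: take the year out of the datetime column
by_dt_year = df[df['date'].dt.year.between(2008, 2013)]
# Option 3: half-open range up to the start of the next year
half_open = df[(df['date'] >= '2008') & (df['date'] < '2014')]

print(len(wrong), len(by_year), len(by_dt_year), len(half_open))  # 3 5 5 5
```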
