Selecting rows using DatetimeIndex

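Asked 2025-04-18 04:09

I'm using the Python library pandas to compare two dataframes, each of which has a date column and two value columns. One of them, LongDF, covers more dates than the other, ShortDF. Both are indexed by date, using pandas.tseries.index.DatetimeIndex. Simplified examples of each are below.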

LongDF

╔════════════╦════════╦════════╗
║ Date       ║ Value1 ║ Value2 ║
╠════════════╬════════╬════════╣
║ 1990-03-17 ║ 6.84   ║ 1.77   ║
║ 1990-03-18 ║ 0.99   ║ 7.00   ║
║ 1990-03-19 ║ 4.90   ║ 8.48   ║
║ 1990-03-20 ║ 2.57   ║ 2.41   ║
║ 1990-03-21 ║ 4.10   ║ 8.33   ║
║ 1990-03-22 ║ 8.86   ║ 1.31   ║
║ 1990-03-23 ║ 6.01   ║ 6.22   ║
║ 1990-03-24 ║ 0.74   ║ 1.69   ║
║ 1990-03-25 ║ 5.56   ║ 7.30   ║
║ 1990-03-26 ║ 8.05   ║ 1.67   ║
║ 1990-03-27 ║ 8.87   ║ 8.22   ║
║ 1990-03-28 ║ 9.00   ║ 6.83   ║
║ 1990-03-29 ║ 1.34   ║ 6.00   ║
║ 1990-03-30 ║ 1.69   ║ 0.40   ║
║ 1990-03-31 ║ 8.71   ║ 3.26   ║
║ 1990-04-01 ║ 4.05   ║ 4.53   ║
║ 1990-04-02 ║ 9.75   ║ 4.79   ║
║ 1990-04-03 ║ 7.74   ║ 0.44   ║
╚════════════╩════════╩════════╝

ShortDF

╔════════════╦════════╦════════╗
║ Date       ║ Value1 ║ Value2 ║
╠════════════╬════════╬════════╣
║ 1990-03-25 ║ 1.98   ║ 3.92   ║
║ 1990-03-26 ║ 3.37   ║ 3.40   ║
║ 1990-03-27 ║ 2.93   ║ 7.93   ║
║ 1990-03-28 ║ 2.35   ║ 5.34   ║
║ 1990-03-29 ║ 1.41   ║ 7.62   ║
║ 1990-03-30 ║ 9.85   ║ 3.17   ║
║ 1990-03-31 ║ 9.95   ║ 0.35   ║
║ 1990-04-01 ║ 4.42   ║ 7.11   ║
║ 1990-04-02 ║ 1.33   ║ 6.47   ║
║ 1990-04-03 ║ 6.63   ║ 1.78   ║
╚════════════╩════════╩════════╝

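What I want to do is look up the data for the same day in each dataset, put the values from both datasets into a formula and, if the result is greater than some number, copy the date and values into another dataframe.

I assume I should use something like for row in ShortDF.iterrows(): to step through each date in ShortDF, but I can't figure out how to select the corresponding row in LongDF using the DatetimeIndex.

Any help would be appreciated.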

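2 Answers

OK, so sometimes I like to think of pandas DataFrames as simple dictionaries. Dictionaries are very easy to work with, and thinking of a DataFrame as one can often lead you to a solution without having to dig into the more involved parts of pandas.

In your case, I'd first build a list of the common dates for which the values in the DataFrames satisfy some condition, and then use those dates to build a new data frame from the values in the existing ones. In my example, the condition is that Value1 in DF1 plus Value1 in DF2 is greater than 10: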

import pandas as pd
import random

random.seed(123)

# Create some data (range replaces the Python 2-only xrange)
DF1 = pd.DataFrame({'Date'      :   ['1990-03-17', '1990-03-18', '1990-03-19',
                                     '1990-03-20', '1990-03-21', '1990-03-22',
                                     '1990-03-23', '1990-03-24', '1990-03-25',
                                     '1990-03-26', '1990-03-27', '1990-03-28',
                                     '1990-03-29', '1990-03-30', '1990-03-31',
                                     '1990-04-01', '1990-04-02', '1990-04-03'],
                    'Value1'    :   [round(random.uniform(1, 10), 2)
                                     for _ in range(18)],
                    'Value2'    :   [round(random.uniform(1, 10), 2)
                                     for _ in range(18)]
                   })

DF2 = pd.DataFrame({'Date'      :   ['1990-03-25', '1990-03-26', '1990-03-27',
                                     '1990-03-28', '1990-03-29', '1990-03-30',
                                     '1990-03-31', '1990-04-01', '1990-04-02',
                                     '1990-04-03'],
                    'Value1'    :   [round(random.uniform(1, 10), 2)
                                     for _ in range(10)],
                    'Value2'    :   [round(random.uniform(1, 10), 2)
                                     for _ in range(10)]
                   })

# Index both frames by date
DF1.set_index('Date', inplace = True)
DF2.set_index('Date', inplace = True)

# Create a list of common dates where DF1.Value1 plus DF2.Value1
# is greater than 10
Common_Set = list(DF1.index.intersection(DF2.index))
Common_Dates =  [date for date in Common_Set if
             DF1.Value1[date] + DF2.Value1[date] > 10]

# And now create the data frame I think you want using the Common_Dates
DF_Output = pd.DataFrame({'L_Value1' : [DF1.Value1[date] for date in Common_Dates],
                          'L_Value2' : [DF1.Value2[date] for date in Common_Dates],
                          'S_Value1' : [DF2.Value1[date] for date in Common_Dates],
                          'S_Value2' : [DF2.Value2[date] for date in Common_Dates]
                         }, index = Common_Dates)

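This can definitely be done in pandas itself, as the comments mention, but to me this is simply an easy solution. The Common_Dates step could be done in one line, but I split it up for clarity.

Of course, if both data frames have many columns, writing out the DF_Output constructor by hand can get very tedious. In that case you can do this instead: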
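
For what it's worth, the condensed Common_Dates step could look something like the sketch below. It leans on pandas aligning the two Series by index when they are added, so the explicit intersection is not needed. The tiny frames are made-up values for illustration, not the random data generated above.

```python
import pandas as pd

# Made-up sample frames, indexed by date string like DF1 and DF2 above
DF1 = pd.DataFrame(
    {"Value1": [6.0, 2.0, 9.0, 4.0], "Value2": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index(["1990-03-24", "1990-03-25", "1990-03-26", "1990-03-27"], name="Date"),
)
DF2 = pd.DataFrame(
    {"Value1": [9.0, 3.0, 5.0], "Value2": [5.0, 6.0, 7.0]},
    index=pd.Index(["1990-03-25", "1990-03-26", "1990-03-27"], name="Date"),
)

# Adding two Series aligns them on the index. A date missing from either
# side produces NaN, and NaN > 10 is False, so only shared dates can pass.
total = DF1.Value1 + DF2.Value1
Common_Dates = list(total[total > 10].index)
print(Common_Dates)
```

With these sample values only 1990-03-25 (2 + 9) and 1990-03-26 (9 + 3) clear the threshold.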

DF1_Out = {'L' + col : [DF1[col][date] for date in Common_Dates] 
            for col in DF1.columns}
DF2_Out = {'S' + col : [DF2[col][date] for date in Common_Dates] 
            for col in DF2.columns}

DF_Out = {}
DF_Out.update(DF1_Out)
DF_Out.update(DF2_Out)

DF_Output2 = pd.DataFrame(DF_Out, index = Common_Dates)

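Both approaches give me this result (the first one names the columns L_Value1, S_Value1 and so on, with an underscore):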

            LValue1  LValue2  SValue1  SValue2
1990-03-25     8.67     6.16     3.84     4.37
1990-03-27     4.03     8.54     7.92     7.79
1990-03-29     3.21     4.09     7.16     8.38
1990-03-31     4.93     2.86     7.00     6.92
1990-04-01     1.79     6.48     9.01     2.53
1990-04-02     6.38     5.74     5.38     4.03
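
If you would rather stay inside pandas for the many-columns case, one possible sketch is to prefix the column names with add_prefix and combine the frames with an inner join on the index. The sample frames and the hard-coded Common_Dates list are assumptions made up for this illustration.

```python
import pandas as pd

# Made-up sample frames, indexed by date string
DF1 = pd.DataFrame(
    {"Value1": [6.0, 2.0, 9.0, 4.0], "Value2": [1.0, 2.0, 3.0, 4.0]},
    index=pd.Index(["1990-03-24", "1990-03-25", "1990-03-26", "1990-03-27"], name="Date"),
)
DF2 = pd.DataFrame(
    {"Value1": [9.0, 3.0, 5.0], "Value2": [5.0, 6.0, 7.0]},
    index=pd.Index(["1990-03-25", "1990-03-26", "1990-03-27"], name="Date"),
)

# Assumed to be computed beforehand by whatever condition you need
Common_Dates = ["1990-03-25", "1990-03-26"]

# add_prefix renames every column, the inner join keeps only the dates
# present in both frames, and .loc picks out the dates that passed the test
DF_Output3 = (
    DF1.add_prefix("L")
       .join(DF2.add_prefix("S"), how="inner")
       .loc[Common_Dates]
)
print(DF_Output3)
```

This avoids listing the columns by hand, however many there are.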

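I suspect this won't suit everyone, but it's how I'd approach the problem. By the way, it would be great if in future questions you did some of the groundwork of creating the dataframes yourself.

Answered 2025-04-18 by Python大师

OK, I'm awake now. Using your data (with df1 as your LongDF and df2 as your ShortDF), you can do this: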

In [425]:
# key here is to tell the merge to use both sides indices
merged = df1.merge(df2,left_index=True, right_index=True)
# the resultant merged dataframe will have duplicate columns, this is fine
merged
Out[425]:
            Value1_x  Value2_x  Value1_y  Value2_y
Date                                              
1990-03-25      5.56      7.30      1.98      3.92
1990-03-26      8.05      1.67      3.37      3.40
1990-03-27      8.87      8.22      2.93      7.93
1990-03-28      9.00      6.83      2.35      5.34
1990-03-29      1.34      6.00      1.41      7.62
1990-03-30      1.69      0.40      9.85      3.17
1990-03-31      8.71      3.26      9.95      0.35
1990-04-01      4.05      4.53      4.42      7.11
1990-04-02      9.75      4.79      1.33      6.47
1990-04-03      7.74      0.44      6.63      1.78

[10 rows x 4 columns]
In [432]:
# now using boolean indexing we want just the rows where there are values larger than 9 and then select the highest value
merged[merged.max(axis=1) > 9].max(axis=1)
Out[432]:
Date
1990-03-30    9.85
1990-03-31    9.95
1990-04-02    9.75
dtype: float64
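
To tie this back to the original question, here is a rough sketch of the whole flow: look up one row by date, merge on the index, apply a formula across both sides, and keep the rows that pass. The names long_df and short_df, the formula (the sum of the two Value1 columns) and the threshold of 10 are all placeholders I made up. The rows are a small subset copied from the tables in the question.

```python
import pandas as pd

# A few rows from the question's tables, indexed by a real DatetimeIndex
long_df = pd.DataFrame(
    {"Value1": [0.74, 5.56, 8.05, 8.87], "Value2": [1.69, 7.30, 1.67, 8.22]},
    index=pd.DatetimeIndex(
        ["1990-03-24", "1990-03-25", "1990-03-26", "1990-03-27"], name="Date"
    ),
)
short_df = pd.DataFrame(
    {"Value1": [1.98, 3.37, 2.93], "Value2": [3.92, 3.40, 7.93]},
    index=pd.DatetimeIndex(["1990-03-25", "1990-03-26", "1990-03-27"], name="Date"),
)

# Selecting a single row by date: pass a Timestamp to .loc
row = long_df.loc[pd.Timestamp("1990-03-25")]
print(row["Value1"])

# Merge on the index; suffixes make it clear which frame a column came from
merged = long_df.merge(
    short_df, left_index=True, right_index=True, suffixes=("_long", "_short")
)

# Placeholder formula and threshold; substitute your own
score = merged["Value1_long"] + merged["Value1_short"]
result = merged[score > 10].assign(score=score)
print(result)
```

No explicit loop over ShortDF is needed, because the merge already lines the two frames up by date.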

Answered 2025-04-18 by Python大师
