Find the length of a dataframe over a moving time period for a specific category

    Name    Date_1      Date_2
0   Alan    2013-06-21  2013-06-26
1   Bob     2011-01-29  2011-02-01
2   Chris   2010-11-15  2010-11-17
3   Bob     2016-03-14  2016-03-16
4   Doug    2011-03-07  2011-03-10
5   Elijah  2011-02-24  2011-03-01
6   Bob     2011-01-03  2011-01-13
7   Bob     2011-02-07  2011-02-25
8   Frank   2014-07-21  2014-07-23
9   Chris   2011-02-18  2011-02-22
10  Doug    2010-09-13  2010-09-17
11  Chris   2011-01-15  2011-01-19
12  George  2010-06-29  2010-06-30

        Date        Name    Period
12971   2015-08-18  Alan    2015-08-16
12972   2015-08-19  Alan    2015-08-17
12973   2015-08-20  Alan    2015-08-18
12974   2015-08-21  Alan    2015-08-19
12975   2015-08-22  Alan    2015-08-20
12976   2015-08-23  Alan    2015-08-21
12977   2015-08-24  Alan    2015-08-22
12978   2015-08-25  Alan    2015-08-23
12979   2015-08-26  Alan    2015-08-24
12980   2015-08-27  Alan    2015-08-25
12981   2015-08-28  Alan    2015-08-26
12982   2015-08-29  Alan    2015-08-27

1 answer

User

#1 · Posted 2024-06-06 06:38:15

I had to change the tables you supplied slightly, because your current version contains no overlaps. I assume you want to do the following:

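The pattern is simple: first outer-join the two tables, then pivot the result, and finally left-join the pivot onto the dataframe you are interested in. This should be faster than your approach, but it may use more memory.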
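The three steps above can be sketched end to end on toy data. This is only a minimal illustration: the frames `events` and `windows` and their values are made up for the example, and `groupby(...).size()` is swapped in for the pivot table as an equivalent way to count rows.

```python
import pandas as pd

# Hypothetical toy data: `events` holds date ranges, `windows` holds the rows to look up.
events = pd.DataFrame({
    "Name": ["Alan", "Alan", "Bob"],
    "Date_1": pd.to_datetime(["2013-06-21", "2013-07-01", "2011-01-29"]),
    "Date_2": pd.to_datetime(["2013-06-26", "2013-07-05", "2011-02-01"]),
})
windows = pd.DataFrame({
    "Name": ["Alan", "Alan"],
    "Date": pd.to_datetime(["2013-06-24", "2015-08-18"]),
    "Period": pd.to_datetime(["2013-06-25", "2015-08-16"]),
})

# Step 1: the outer join pairs every window with every event of the same Name.
pairs = windows.merge(events, on="Name", how="outer")

# Step 2: keep the pairs that satisfy the date condition and count them per window.
hits = pairs[(pairs["Date_1"] < pairs["Period"]) & (pairs["Date_2"] > pairs["Date"])]
counts = (hits.groupby(["Name", "Date", "Period"]).size()
              .reset_index(name="Volume"))

# Step 3: left-join the counts back; windows without any match get 0.
result = windows.merge(counts, on=["Name", "Date", "Period"], how="left")
result["Volume"] = result["Volume"].fillna(0).astype(int)
print(result)
```

Filling only the `Volume` column, rather than the whole frame, leaves any missing values in other columns untouched.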

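Parsing the data

The first step is some basic parsing of the data you supplied to get it into dataframes (you can probably skip this; I include it so the steps can be followed):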

import pandas as pd
from io import StringIO
import re

# First set tables
table = """    Name    Date_1  Date_2
0   Alan    2013-06-21  2013-06-26
1   Bob     2011-01-29  2011-02-01
2   Chris   2010-11-15  2010-11-17
3   Bob     2016-03-14  2016-03-16
4   Doug    2011-03-07  2011-03-10
5   Elijah  2011-02-24  2011-03-01
6   Bob     2011-01-03  2011-01-13
7   Bob     2011-02-07  2011-02-25
8   Frank   2014-07-21  2014-07-23
9   Chris   2011-02-18  2011-02-22
10  Doug    2010-09-13  2010-09-17
11  Chris   2011-01-15  2011-01-19
12  George  2010-06-29  2010-06-30"""

table2 = """        Date        Name    Period
12971   2015-08-18  Alan    2015-08-16
12972   2015-08-19  Alan    2015-08-17
12973   2015-08-20  Alan    2015-08-18
12974   2015-08-21  Alan    2015-08-19
12975   2015-08-22  Alan    2015-08-20
12976   2015-08-23  Alan    2015-08-21
12977   2015-08-24  Alan    2015-08-22
12978   2015-08-25  Alan    2015-08-23
12979   2015-08-26  Alan    2015-08-24
12980   2015-08-27  Alan    2015-08-25
12981   2015-08-28  Alan    2015-08-26
12982   2015-08-29  Alan    2015-08-27
12983   2013-06-24  Alan    2013-06-25"""

# Parse the whitespace-separated tables. The header row is skipped because it
# has no label for the leading index column, so the column names are given explicitly.
df = pd.read_csv(StringIO(table), sep=r"\s+", skiprows=1,
                 names=["index", "Name", "Date_1", "Date_2"],
                 parse_dates=["Date_1", "Date_2"])

df1 = pd.read_csv(StringIO(table2), sep=r"\s+", skiprows=1,
                  names=["index", "Date", "Name", "Period"],
                  parse_dates=["Date", "Period"])

Outer join

The outer join on the Name column is straightforward:

outer = pd.merge(df1,df, on="Name",how="outer")
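Because the join pairs every row of one table with every row of the other that shares the same Name, the result can grow quickly, which is where the extra memory goes. A rough way to estimate the row count before merging is sketched below; the frames `left` and `right` are hypothetical stand-ins for `df1` and `df`.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df
left = pd.DataFrame({"Name": ["Alan"] * 3 + ["Bob"], "Date": range(4)})
right = pd.DataFrame({"Name": ["Alan", "Alan", "Chris"], "Date_1": range(3)})

# Rows per Name on each side
n_left = left["Name"].value_counts()
n_right = right["Name"].value_counts()

# A Name missing on one side still yields one NaN-padded row per row on the
# other side in an outer join, so treat its missing count as 1.
names = n_left.index.union(n_right.index)
expected_rows = int((n_left.reindex(names, fill_value=1)
                     * n_right.reindex(names, fill_value=1)).sum())

outer = pd.merge(left, right, on="Name", how="outer")
print(expected_rows, len(outer))
```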

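Merging onto df1 (the assumed requirement)

First pivot the data by Name, Date and Period and count the matching rows. Then call reset_index and merge with the original table; where the lookup finds nothing, I assume a value of 0.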

# Keep only the joined rows that satisfy the date condition
matches = outer[(outer["Date_1"] < outer["Period"]) & (outer["Date_2"] > outer["Date"])]

# Pivot table: count the matches per (Name, Date, Period)
pivot = matches.pivot_table(index=["Name", "Date", "Period"],
                            values=["Date_1"],
                            aggfunc="count").reset_index()
# Rename columns for merging (a flat list, so the labels stay plain strings)
pivot.columns = ["Name", "Date", "Period", "Volume"]

pd.merge(df1, pivot, how="left", on=["Name", "Date", "Period"]).fillna(0)

Merging onto df (for completeness)

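# Keep only the joined rows that satisfy the date condition
matches = outer[(outer["Date_1"] < outer["Period"]) & (outer["Date_2"] > outer["Date"])]

# Pivot table: count the matches per Name
pivot = matches.pivot_table(index=["Name"],
                            values=["Date"],
                            aggfunc="count").reset_index()
# Rename columns for merging (a flat list, so the labels stay plain strings)
pivot.columns = ["Name", "Volume"]

pd.merge(df, pivot, how="left", on="Name").fillna(0)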
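Note that the count above is per Name, so every row of df with the same Name receives the same Volume. If a count per individual row of df is wanted instead (an assumption on my part, since the question does not say), one possible variant is to group by the full key of each row. The toy values below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with two date ranges for the same Name
df = pd.DataFrame({
    "Name": ["Bob", "Bob"],
    "Date_1": pd.to_datetime(["2011-01-29", "2011-02-07"]),
    "Date_2": pd.to_datetime(["2011-02-01", "2011-02-25"]),
})
df1 = pd.DataFrame({
    "Name": ["Bob", "Bob", "Bob"],
    "Date": pd.to_datetime(["2011-01-30", "2011-02-10", "2011-02-15"]),
    "Period": pd.to_datetime(["2011-01-31", "2011-02-12", "2011-02-20"]),
})

outer = pd.merge(df1, df, on="Name", how="outer")
matches = outer[(outer["Date_1"] < outer["Period"]) & (outer["Date_2"] > outer["Date"])]

# Per-Name count: a single number shared by every Bob row
per_name = matches.groupby("Name").size()

# Per-row count: one number for each (Name, Date_1, Date_2) combination
keys = ["Name", "Date_1", "Date_2"]
per_row = matches.groupby(keys).size().reset_index(name="Volume")

result = df.merge(per_row, on=keys, how="left")
result["Volume"] = result["Volume"].fillna(0).astype(int)
print(result)
```

This variant assumes the combination of Name, Date_1 and Date_2 identifies a row of df uniquely; duplicated rows would share a count.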
