Calculating the impact of each factor on the final change
I have a data table that records property prices.
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})
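I need to calculate the daily change in the price per square meter, and how much each factor contributes to that change.
The factors are:
- Changes in property prices
- New premises
- Sold premises
Calculation rules:
- The average price per square meter only takes into account current premises (cur) and premises added that day (new).
- Premises that were previously not for sale (nf_sale) can be added, at which point they become new premises (new).
- Sold premises (sold) are excluded from the price calculation.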
The desired result is:
date area price avg avg/avg_yest by_price_change by_new_premises by_sale
0 2024-01-01 500 50000000 100000.0 0.0000 0.00 0.00 0.0000
1 2024-01-02 500 56170000 112340.0 0.1234 0.05 0.04 0.0334
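As a quick check of the first few columns for 2024-01-02, the figures can be reproduced by hand from the rules above. This is only a sketch; the variable names are made up for illustration. Note that the column labelled avg/avg_yest holds the relative change (the ratio minus 1), not the ratio itself.

```python
# Prices on 2024-01-02 for the rows that count toward the average:
# three 'cur' premises and two 'new' premises, 100 m² each.
prices_today = [11_080_000, 11_090_000, 10_000_000, 12_000_000, 12_000_000]
area_today = 100 * len(prices_today)          # total area in m²
avg_today = sum(prices_today) / area_today    # average price per m²

# On 2024-01-01 only the five 'cur' premises count ('nf_sale' is excluded).
avg_yest = 5 * 10_000_000 / 500

relative_change = avg_today / avg_yest - 1
print(area_today, sum(prices_today), avg_today, round(relative_change, 4))
# prints: 500 56170000 112340.0 0.1234
```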
Any help is greatly appreciated!
1 Answer
I wrote an example:
import pandas as pd
import numpy as np
# Create the dataframe
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})
# Convert date to datetime for operations
df['date'] = pd.to_datetime(df['date'])
# Filter out only 'cur' and 'new' for calculation
df_filtered = df[df['status'].isin(['cur', 'new'])]
# Calculate the daily total price and total area
daily_totals = df_filtered.groupby('date').agg({'area': 'sum', 'price': 'sum'}).reset_index()
# Calculate average price per square meter
daily_totals['avg'] = daily_totals['price'] / daily_totals['area']
# Calculate change in average price per square meter
daily_totals['avg_change'] = daily_totals['avg'].pct_change()
daily_totals['avg_change'] = daily_totals['avg_change'].fillna(0)
# Placeholder only: random values stand in for by_price_change, by_new_premises
# and by_sale; no real attribution is computed here
daily_totals['by_price_change'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_new_premises'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_sale'] = np.random.rand(len(daily_totals)) * 0.1
print(daily_totals)
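The three by_* columns above are random placeholders. One possible way to fill them with real numbers is a sequential (chain substitution) split: start from yesterday's pool at yesterday's prices, then reprice it, then drop the sold premises, then add the new ones, and attribute each step's change in the average to the corresponding factor. The sketch below makes several assumptions that the question does not confirm: a row marked cur or sold was part of the pool the day before, price_yest holds that previous-day price, and the steps are applied in the order price, sale, new. The helper names avg_per_m2 and decompose are hypothetical.

```python
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    'num': list(range(1, 8)) * 2,
    'date': ['2024-01-01'] * 7 + ['2024-01-02'] * 7,
    'area': [100] * 14,
    'price': [10_000_000] * 5 + [12_000_000] * 2
             + [11_080_000, 11_090_000] + [10_000_000] * 3 + [12_000_000] * 2,
    'price_yest': ([10_000_000] * 5 + [12_000_000] * 2) * 2,
    'status': ['cur'] * 5 + ['nf_sale'] * 2
              + ['cur'] * 3 + ['sold'] * 2 + ['new'] * 2,
})

def avg_per_m2(rows, price_col):
    """Area-weighted average price per square meter for a set of rows."""
    return rows[price_col].sum() / rows['area'].sum()

def decompose(day):
    """Split one day's relative change in the average into three effects."""
    status = day['status']
    base = day[status.isin(['cur', 'sold'])]    # assumed: counted yesterday
    kept = day[status == 'cur']                 # still counted after sales
    final = day[status.isin(['cur', 'new'])]    # counted today

    avg_yest = avg_per_m2(base, 'price_yest')   # yesterday's pool, old prices
    avg_repriced = avg_per_m2(base, 'price')    # same pool, today's prices
    avg_after_sale = avg_per_m2(kept, 'price')  # sold premises removed
    avg_today = avg_per_m2(final, 'price')      # new premises added

    return {
        'area': final['area'].sum(),
        'price': final['price'].sum(),
        'avg': avg_today,
        'avg/avg_yest': avg_today / avg_yest - 1,
        'by_price_change': (avg_repriced - avg_yest) / avg_yest,
        'by_new_premises': (avg_today - avg_after_sale) / avg_yest,
        'by_sale': (avg_after_sale - avg_repriced) / avg_yest,
    }

result = pd.DataFrame(
    [{'date': date, **decompose(day)} for date, day in df.groupby('date')]
)
print(result.round(4))
```

With this ordering, 2024-01-02 comes out as roughly 0.0434 for by_price_change, 0.0511 for by_new_premises and 0.0289 for by_sale, which add up to the total change of 0.1234. These do not match the 0.05 / 0.04 / 0.0334 shown in the question; the question does not state how its split was obtained, and a sequential split depends on the order in which the steps are applied, so a different ordering or method gives different per-factor values with the same total.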