Calculating the impact of individual factors on the final change

Asked 2025-04-14 16:00

I have a table that records price information for properties.

import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                   'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',  '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
                   'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
                   'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})

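I need to calculate the daily change in the average price per square meter, and how much each factor contributes to that change.

The factors are:

  1. Changes in property prices
  2. Newly added properties
  3. Sold properties

Calculation rules:

  • The average price per square meter takes into account only current properties (cur) and properties added that day (new).
  • Properties that were previously not for sale (nf_sale) can be added, at which point they become new properties (new).
  • Sold properties (sold) are excluded from the price calculation.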

The desired result is:

         date  area     price       avg  avg/avg_yest  by_price_change  by_new_premises  by_sale
0  2024-01-01   500  50000000  100000.0        0.0000             0.00             0.00   0.0000
1  2024-01-02   500  56170000  112340.0        0.1234             0.05             0.04   0.0334
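
For reference, the first columns of the second row can be checked by hand. The sketch below assumes that the column labelled avg/avg_yest holds the relative change (avg / avg_yest - 1), which is what the values 0.0000 and 0.1234 suggest; the variable names are only illustrative.

```python
# Day 2 rows that count toward the average: three 'cur' units and two 'new' units.
prices = [11080000, 11090000, 10000000, 12000000, 12000000]
areas = [100] * 5

avg_today = sum(prices) / sum(areas)   # 56170000 / 500
avg_yest = 50000000 / 500              # day 1: five 'cur' units at 10000000 each
rel_change = avg_today / avg_yest - 1  # assumed meaning of the avg/avg_yest column

print(avg_today, avg_yest, round(rel_change, 4))
```

This reproduces avg = 112340.0 and a relative change of 0.1234 for 2024-01-02.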

Many thanks for any help!

1 Answer

I wrote an example:

import pandas as pd
import numpy as np

# Create the dataframe
df = pd.DataFrame({'num': [1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7],
                   'date': ['2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01', '2024-01-01',  '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02', '2024-01-02'],
                   'area': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
                   'price': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'price_yest': [10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000, 10000000, 10000000, 10000000, 10000000, 10000000, 12000000, 12000000],
                   'status': ['cur', 'cur', 'cur', 'cur', 'cur', 'nf_sale', 'nf_sale', 'cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new']})

# Convert date to datetime for operations
df['date'] = pd.to_datetime(df['date'])

# Filter out only 'cur' and 'new' for calculation
df_filtered = df[df['status'].isin(['cur', 'new'])]

# Calculate the daily total price and total area
daily_totals = df_filtered.groupby('date').agg({'area': 'sum', 'price': 'sum'}).reset_index()

# Calculate average price per square meter
daily_totals['avg'] = daily_totals['price'] / daily_totals['area']

# Calculate change in average price per square meter
daily_totals['avg_change'] = daily_totals['avg'].pct_change()
daily_totals['avg_change'] = daily_totals['avg_change'].fillna(0)

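# Placeholder values only: random numbers stand in for by_price_change,
# by_new_premises and by_sale. They are not a real decomposition and will
# not add up to avg_change.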
daily_totals['by_price_change'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_new_premises'] = np.random.rand(len(daily_totals)) * 0.1
daily_totals['by_sale'] = np.random.rand(len(daily_totals)) * 0.1

print(daily_totals)
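
The three factor columns above are random placeholders. One way they could be filled in is a sequential (chain-substitution) decomposition. This is a minimal sketch under my own assumptions, not a method confirmed by the question: yesterday's stock is taken to be today's cur and sold rows valued at price_yest, and the steps are applied in the order price change, then sales, then new premises. The helper name decompose_day is hypothetical. A different order would give a different split.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'num': [1, 2, 3, 4, 5, 6, 7] * 2,
    'date': ['2024-01-01'] * 7 + ['2024-01-02'] * 7,
    'area': [100] * 14,
    'price': [10000000] * 5 + [12000000] * 2
             + [11080000, 11090000, 10000000, 10000000, 10000000, 12000000, 12000000],
    'price_yest': [10000000] * 5 + [12000000] * 2
                  + [10000000] * 5 + [12000000] * 2,
    'status': ['cur'] * 5 + ['nf_sale'] * 2
              + ['cur', 'cur', 'cur', 'sold', 'sold', 'new', 'new'],
})


def decompose_day(g):
    """Hypothetical helper: split one day's relative change in price per m2."""
    def avg(statuses, price_col):
        # Price per m2 over the rows with the given statuses (NaN if none match).
        rows = g[g['status'].isin(statuses)]
        total_area = rows['area'].sum()
        return rows[price_col].sum() / total_area if total_area else float('nan')

    # Assumption: yesterday's stock = today's 'cur' + 'sold' rows at price_yest.
    base = avg(['cur', 'sold'], 'price_yest')  # yesterday's stock, yesterday's prices
    repriced = avg(['cur', 'sold'], 'price')   # same stock, today's prices
    after_sale = avg(['cur'], 'price')         # sold units removed
    final = avg(['cur', 'new'], 'price')       # new units added
    return pd.Series({
        'avg': final,
        'avg_change': final / base - 1,
        'by_price_change': (repriced - base) / base,
        'by_sale': (after_sale - repriced) / base,
        'by_new_premises': (final - after_sale) / base,
    })


# One row per date; each contribution is expressed relative to yesterday's average.
rows = {date: decompose_day(g) for date, g in df.groupby('date')}
result = pd.DataFrame(rows).T.rename_axis('date').reset_index()
print(result.round(4))
```

On the sample data the three contributions for 2024-01-02 come out at roughly 0.0434 (price change), 0.0289 (sales) and 0.0511 (new premises). They sum to the total change of 0.1234 by construction, but they do not match the 0.05 / 0.04 / 0.0334 split shown in the question, so the rule behind those figures would need to be stated before it can be reproduced.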
