How can I automatically repeat a calculation over the rows of a DataFrame?

   datetime             date    time  type  ...  day  minute  second  dayofyear
1  2017-12-19 17:08:30  171219  1708  air   ...   19       8      30        353
2  2018-01-05 15:22:30  180105  1522  air   ...    5      22      30          5
3  2018-01-05 15:23:30  180105  1523  air   ...    5      23      30          5
4  2018-01-05 15:24:30  180105  1524  air   ...    5      24      30          5
5  2018-01-05 15:25:30  180105  1525  air   ...    5      25      30          5

from numpy.polynomial.polynomial import polyfit
from scipy import stats

# defining the time period I want from the data
period = MyData[(MyData['year']==2019) & (MyData['month']==12) & (MyData['day']==31)]
p = (period['total_co2'])**-1 # defining the x axis data
q = period['d13C'] # defining the y axis data
c, m = polyfit(p, q, 1) # creating a regression line, with y intercept c and gradient m
# calculating some statistical properties of the regression line. I'm mainly interested in the R^2 value
slope, intercept, r_value, p_value, std_err = stats.linregress(p, q)
print('R-squared: ', r_value**2)
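For reference, a small self-contained check (made-up points, not the question's data) of what the two fitting calls return. `numpy.polynomial.polynomial.polyfit` returns coefficients lowest degree first, so `c, m` unpacks to intercept then gradient (the older `numpy.polyfit` uses the opposite, highest-degree-first order). `stats.linregress` reports the same line together with `rvalue`, whose square is the R² the question prints.

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit
from scipy import stats

# Made-up points lying exactly on y = 2 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# numpy.polynomial.polynomial.polyfit: constant term first, then the x coefficient
c, m = polyfit(x, y, 1)

# linregress fits the same straight line and also returns the correlation coefficient r
fit = stats.linregress(x, y)

print("intercept:", c, fit.intercept)
print("gradient: ", m, fit.slope)
print("R-squared:", fit.rvalue ** 2)
```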

1 answer

Anonymous user

#1 · Posted on 2024-05-27 12:45:43

I tested this code and I believe it gives the output you want:

import pandas as pd
import numpy as np
from numpy.polynomial.polynomial import polyfit
from scipy import stats

# Restricted the columns and set the dtypes to deal with memory issues when importing a large csv
MyData = pd.read_csv('.../MyData.txt', usecols=['total_co2', 'd13C', 'year', 'month', 'day', 'datetime'], dtype={'total_co2':np.float64, 'd13C':np.float64, 'year':str, 'month':str, 'day':str})

# Created a helper column that is used later to filter and report out the period
MyData['ymd'] = MyData['year'] +'-'+ MyData['month'] +'-'+ MyData['day']

# Empty list that will receive all of the periods with acceptable r-squareds
accepted_date_list = []

# for loop to filter the dataframe according to the unique periods (created with the helper column above)
for d in MyData['ymd'].unique():
    acceptable_date = {} # create a dictionary to populate and send to the list
    period = MyData[MyData.ymd == d] # filter the dataframe with the unique periods created above
    p = period['total_co2']**-1 # x axis: inverse of total CO2
    q = period['d13C'] # y axis: d13C
    c, m = polyfit(p, q, 1) # intercept c and gradient m (kept from the question; not used below)
    slope, intercept, r_value, p_value, std_err = stats.linregress(p, q)
    r_squared = r_value**2

    # If R^2 is acceptable, populate the dictionary and append it to the list; otherwise skip this period
    if r_squared > 0.8:
        acceptable_date['period'] = d
        acceptable_date['r-squared'] = r_squared
        accepted_date_list.append(acceptable_date)

accepted_dates = pd.DataFrame(accepted_date_list) # convert the list to a Pandas DataFrame (or whatever else you want to do with it)

print(accepted_dates)

Output:

        period  r-squared
0     2018-1-6   0.910516
1     2018-1-9   0.917216
2    2018-1-10   0.980263
3    2018-1-11   0.965971
4    2018-1-12   0.894795
5    2018-1-13   0.831683
6    2018-1-18   0.852207
7    2018-1-21   0.944162
8    2018-1-22   0.871262
9    2018-1-26   0.844020
10   2018-1-27   0.890742
11   2018-1-30   0.971747
...
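A possible alternative to the manual loop, not part of the original answer: let `DataFrame.groupby` split the data by year, month and day, so each day's rows arrive already filtered. This is a minimal sketch assuming the same column names as above (`year`, `month`, `day`, `total_co2`, `d13C`) stored as numbers; the function name `r_squared_per_day`, the demo data, and the rule of skipping days with fewer than 3 rows are hypothetical choices for illustration.

```python
import pandas as pd
from scipy import stats


def r_squared_per_day(df: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Regress d13C on 1/total_co2 separately for each (year, month, day)
    and keep the days whose R^2 exceeds `threshold`."""
    rows = []
    # groupby yields each day's rows directly, replacing the
    # "loop over unique ymd strings and filter" pattern
    for (year, month, day), period in df.groupby(["year", "month", "day"]):
        if len(period) < 3:
            # Hypothetical rule: fewer than 3 points is too few for a meaningful fit
            continue
        fit = stats.linregress(1.0 / period["total_co2"], period["d13C"])
        r_squared = fit.rvalue ** 2
        if r_squared > threshold:
            rows.append({"period": f"{year}-{month}-{day}", "r-squared": r_squared})
    return pd.DataFrame(rows, columns=["period", "r-squared"])


# Made-up demo data: day 5 follows an exact linear relation in 1/total_co2,
# day 6 is uncorrelated noise, day 7 has only two rows and is skipped
co2 = [400.0, 410.0, 420.0, 430.0]
demo = pd.DataFrame({
    "year": [2018] * 10,
    "month": [1] * 10,
    "day": [5] * 4 + [6] * 4 + [7] * 2,
    "total_co2": co2 + co2 + [400.0, 410.0],
    "d13C": [-8 + 2000 / c for c in co2] + [1.0, -1.0, -1.0, 1.0] + [-3.0, -3.1],
})
accepted_dates = r_squared_per_day(demo)
print(accepted_dates)
```

Grouping on the three columns avoids building the `ymd` helper string. If the `datetime` column has been parsed with `pd.to_datetime`, grouping on `MyData['datetime'].dt.date` would be another option.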
