Read a CSV into an array, run a linear regression on the array, and write to a CSV depending on the gradient, in Python

Posted 2024-04-23 18:01:40


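I have to solve a problem that is well beyond my current Python skills. I am struggling to combine the different modules (csv reader, numpy, and so on) into a single script. My data is a long list of weather variables recorded at one-minute resolution over many days. My goal is to determine the wind-speed trend between 9 am and 12 pm for each day in the list. If the gradient of the wind speed is positive, I want to write the date on which this happens to a new CSV file, together with the wind direction.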

The data runs to thousands of rows and looks like this:

hd,Station Number,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Local standard time,Year Month Day Hours Minutes in YYYY,MM,DD,HH24,MI format in Universal coordinated time,Precipitation since last (AWS) observation in mm,Quality of precipitation since last (AWS) observation value,Air Temperature in degrees Celsius,Quality of air temperature,Air temperature (1-minute maximum) in degrees Celsius,Quality of air temperature (1-minute maximum),Air temperature (1-minute minimum) in degrees Celsius,Quality of air temperature (1-minute minimum),Wet bulb temperature in degrees Celsius,Quality of Wet bulb temperature,Wet bulb temperature (1 minute maximum) in degrees Celsius,Quality of wet bulb temperature (1 minute maximum),Wet bulb temperature (1 minute minimum) in degrees Celsius,Quality of wet bulb temperature (1 minute minimum),Dew point temperature in degrees Celsius,Quality of dew point temperature,Dew point temperature (1-minute maximum) in degrees Celsius,Quality of Dew point Temperature (1-minute maximum),Dew point temperature (1 minute minimum) in degrees Celsius,Quality of Dew point Temperature (1 minute minimum),Relative humidity in percentage %,Quality of relative humidity,Relative humidity (1 minute maximum) in percentage %,Quality of relative humidity (1 minute maximum),Relative humidity (1 minute minimum) in percentage %,Quality of Relative humidity (1 minute minimum),Wind (1 minute) speed in km/h,Wind (1 minute) speed quality,Minimum wind speed (over 1 minute) in km/h,Minimum wind speed (over 1 minute) quality,Wind (1 minute) direction in degrees true,Wind (1 minute) direction quality,Standard deviation of wind (1 minute),Standard deviation of wind (1 minute) direction quality,Maximum wind gust (over 1 minute) in km/h,Maximum wind gust (over 1 minute) quality,Visibility (automatic - one minute data) in km,Quality of visibility (automatic - one minute data),Mean sea level 
pressure in hPa,Quality of mean sea level pressure,Station level pressure in hPa,Quality of station level pressure,QNH pressure in hPa,Quality of QNH pressure,#
hd, 40842,2000,03,20,10,50,2000,03,20,10,50,2000,03,20,00,50,      ,N, 25.7,N, 25.7,N, 25.6,N, 21.5,N, 21.5,N, 21.4,N, 19.2,N, 19.2,N, 19.0,N, 67,N, 68,N, 66,N, 13,N,  9,N,100,N,  4,N, 15,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,51,2000,03,20,10,51,2000,03,20,00,51,   0.0,N, 25.6,N, 25.8,N, 25.6,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.4,N, 19.2,N, 68,N, 68,N, 66,N, 11,N,  9,N,107,N, 11,N, 13,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,52,2000,03,20,10,52,2000,03,20,00,52,   0.0,N, 25.8,N, 25.8,N, 25.6,N, 21.7,N, 21.7,N, 21.5,N, 19.5,N, 19.5,N, 19.2,N, 68,N, 69,N, 66,N, 11,N,  9,N, 83,N, 13,N, 13,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,53,2000,03,20,10,53,2000,03,20,00,53,   0.0,N, 25.8,N, 25.9,N, 25.8,N, 21.6,N, 21.8,N, 21.6,N, 19.3,N, 19.6,N, 19.3,N, 67,N, 68,N, 66,N,  9,N,  8,N, 87,N, 14,N, 11,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,54,2000,03,20,10,54,2000,03,20,00,54,   0.0,N, 25.8,N, 25.8,N, 25.8,N, 21.6,N, 21.6,N, 21.6,N, 19.3,N, 19.3,N, 19.2,N, 67,N, 67,N, 67,N,  8,N,  4,N, 98,N, 23,N,  9,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,55,2000,03,20,10,55,2000,03,20,00,55,   0.0,N, 25.7,N, 25.8,N, 25.7,N, 21.5,N, 21.6,N, 21.5,N, 19.2,N, 19.3,N, 19.2,N, 67,N, 68,N, 66,N,  8,N,  4,N, 68,N, 15,N,  9,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,56,2000,03,20,10,56,2000,03,20,00,56,   0.0,N, 25.9,N, 25.9,N, 25.7,N, 21.7,N, 21.7,N, 21.5,N, 19.4,N, 19.4,N, 19.2,N, 67,N, 68,N, 66,N,  8,N,  5,N, 69,N, 16,N,  9,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,57,2000,03,20,10,57,2000,03,20,00,57,   0.0,N, 26.0,N, 26.0,N, 25.9,N, 21.8,N, 21.8,N, 21.7,N, 19.5,N, 19.5,N, 19.4,N, 67,N, 68,N, 66,N,  9,N,  5,N, 72,N, 10,N, 11,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#
hd, 40842,2000,03,20,10,58,2000,03,20,10,58,2000,03,20,00,58,   0.0,N, 26.0,N, 26.1,N, 26.0,N, 21.7,N, 21.8,N, 21.7,N, 19.4,N, 19.5,N, 19.3,N, 66,N, 67,N, 66,N,  8,N,  5,N, 69,N, 13,N, 11,N,     ,N,1018.6,N,1017.5,N,1018.6,N,#

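The finished file should contain only the days on which the wind speed increased between 9 am and 12 pm, ideally in the following format: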

[example output format not preserved in the source]

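The exact value of the gradient does not matter, only whether it is positive, so a second array of the form (1, 2, 3, 4, 5, …) could be constructed to serve as the second dimension for the linear regression. The data between 9 am and 12 pm should in fact be 180 rows long.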
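A minimal sketch of that idea, assuming the wind speeds for one morning are already in a list: `np.arange` supplies the 0, 1, 2, … index and `np.polyfit` with degree 1 returns the slope first. The function name `trend_is_positive` is made up for illustration. The first sample uses the wind speeds from the nine data rows shown above; the second is simply that list reversed.

```python
import numpy as np

def trend_is_positive(values):
    """Return True if the least-squares slope of values against 0, 1, 2, ... is positive."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))  # 0, 1, 2, ... stands in for the minute index
    slope, _intercept = np.polyfit(x, y, 1)  # degree-1 fit: slope comes first
    return bool(slope > 0)

# Wind speeds (km/h) from the sample rows above, 10:50 to 10:58
sample = [13, 11, 11, 9, 8, 8, 8, 9, 8]
falling = trend_is_positive(sample)
# The same values reversed, purely for illustration
rising = trend_is_positive(sample[::-1])
```

Only the sign of the slope is used, so the spacing of the index does not matter as long as the rows are evenly spaced in time.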

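Would this be easier to solve with several scripts (bearing in mind that I have to do this for more than 100 files), or is there a simple way to do it in a single script?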

Attempted code:

import glob
import pandas as pd
import numpy as np

for file in glob.glob('X:/brisbaneweatherdata/*.txt'):
    df = pd.read_csv(file)
    for date, group in df.groupby(['Year Month Day Hours Minutes in YYYY','MM','DD']):
        morning_data = group[group.HH24.between('09','12')]
        # calculate your linear regression here
        gradient, intercept = np.polyfit(morning_data.HH24,morning_data['Wind (1 minute) speed in km/h'], 1)
        wind_direction= np.average(morning_data.HH24,morning_data['Wind (1 minute) direction in degrees true'])
        if gradient > 0 :
            print(date + "," + gradient + "," + wind_direction)

Error message received:

runfile('X:/python/linearregression.py', wdir='X:/python')
X:/python/linearregression.py:1: DtypeWarning: Columns (17,25,27,29,31,33,35,37,55,57,59) have mixed types. Specify dtype option on import or set low_memory=False.
  import glob
Traceback (most recent call last):

  File "<ipython-input-26-ace8af14da2c>", line 1, in <module>
    runfile('X:/python/linearregression.py', wdir='X:/python')

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)

  File "X:/python/linearregression.py", line 8, in <module>
    morning_data = group[group.HH24.between('09','12')]

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\series.py", line 2486, in between
    lmask = self >= left

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\ops.py", line 761, in wrapper
    res = na_op(values, other)

  File "C:\Users\kirkj\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\ops.py", line 716, in na_op
    raise TypeError("invalid type comparison")

TypeError: invalid type comparison
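The traceback points at `between('09','12')`. That error is consistent with a column that was parsed as numbers being compared with strings, although the dump alone does not prove it. A hedged sketch of one way to make both sides numeric is below; the `hours` series is a made-up stand-in for the real column, including one blank field like those in the sample rows.

```python
import pandas as pd

# Hypothetical stand-in for an hour column read as text, with one blank field
hours = pd.Series(["09", "10", "  ", "12"])

# Convert to numbers; anything unparseable becomes NaN instead of raising
hours_num = pd.to_numeric(hours, errors="coerce")

# Compare numbers with numbers: 9 <= hour <= 11 covers 09:00 to 11:59
mask = hours_num.between(9, 11)
```

`between` is inclusive at both ends by default, so an upper bound of 12 would also keep every row from 12:00 to 12:59.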

1 answer

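I think you should be able to do this in a fairly simple script, using glob to iterate over the files and {} to read the data. Here is a basic structure: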

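import glob
import numpy as np
import pandas as pd

for file in glob.glob('data/*'):
    df = pd.read_csv(file)
    # 'year', 'month', 'day', 'wind speed' and 'wind direction' are placeholder column names
    for date, group in df.groupby(['year', 'month', 'day']):
        # hours 9 to 11 cover 09:00-11:59; assumes HH24 was parsed as integers
        morning_data = group[group.HH24.between(9, 11)]
        # calculate your linear regression here; the slope is the wind-speed trend
        gradient, intercept = np.polyfit(morning_data.HH24, morning_data['wind speed'], 1)
        wind_direction = morning_data['wind direction'].mean()
        if gradient > 0:
            # print accepts mixed types, so no string concatenation is needed
            print(date, gradient, wind_direction)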
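To go from that structure to the CSV output the question asks for, something along these lines might work. It is a sketch under assumptions, not a tested solution for the real files: the column names `year`, `month`, `day`, `hour`, `minute`, `wind_speed` and `wind_dir` are hypothetical and the real headers would need renaming to match, and `rising_wind_days` and `write_days` are names made up here. The wind direction is a plain arithmetic mean, as in the attempted code, which misbehaves for directions that straddle 0/360 degrees.

```python
import csv
import numpy as np
import pandas as pd

def rising_wind_days(df):
    """Return (year, month, day, mean wind direction) for each day whose
    09:00-11:59 wind speed has a positive least-squares slope.

    All column names used here are hypothetical placeholders.
    """
    df = df.copy()
    # Blank or non-numeric fields become NaN rather than raising
    for col in ["hour", "minute", "wind_speed", "wind_dir"]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    morning = df[df["hour"].between(9, 11)].dropna(subset=["minute", "wind_speed"])
    rows = []
    for (year, month, day), group in morning.groupby(["year", "month", "day"]):
        if len(group) < 2:
            continue  # a slope needs at least two points
        minutes = (group["hour"] * 60 + group["minute"]).to_numpy()
        slope, _ = np.polyfit(minutes, group["wind_speed"].to_numpy(), 1)
        if slope > 0:
            rows.append((year, month, day, group["wind_dir"].mean()))
    return rows

def write_days(rows, path):
    """Write the selected days to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "month", "day", "mean_wind_dir"])
        writer.writerows(rows)

# Tiny made-up data set: wind rises on the 20th and falls on the 21st
demo = pd.DataFrame({
    "year": [2000] * 6,
    "month": [3] * 6,
    "day": [20, 20, 20, 21, 21, 21],
    "hour": [9, 10, 11, 9, 10, 11],
    "minute": [0] * 6,
    "wind_speed": [5, 8, 12, 12, 8, 5],
    "wind_dir": [90, 100, 110, 200, 210, 220],
})
result = rising_wind_days(demo)
```

For 100+ files, the same function could be called inside the `glob` loop, collecting the rows from every file before a single call to `write_days`, so one script is enough.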
