PySpark type conversion problem: date to string

Posted 2024-05-16 05:48:49


I am using PySpark 2.1. Below is the content of my DataFrame:

expecteddays,date

139,30.JUl.2017

134,01.NOV.2018

My output should look like this:

^{pr2}$

The last column is populated by the functions dateRangeBetween and {} shown below.

Here is my code:

from datetime import datetime, timedelta
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, StructType, StructField, IntegerType, StringType

maintenance_final_join = spark.read.csv('/user/NaveenSri/adh_dev_engg/test.csv', header=True)

def get_date(dateFormat="%d-%m-%Y", addDays=0 ,timeNow=0 ): 
    #print('inside get date',timesNow)
    if (addDays!=0):
        anotherTime = timeNow + timedelta(days=addDays)
    else:
        anotherTime = timeNow
    return anotherTime.strftime(dateFormat)
def dateRangebetween(expectedDate, estimatedDays):
    output_format = '%d-%m-%Y'
    dateRangeList = []
    j = 2
    #print('inside Date range',expectedDate)
    rangeEnddate = datetime.strptime(get_date(output_format, 730, expectedDate), '%d-%m-%Y').date()
    #print('rangeEnddate---',rangeEnddate)
    calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, expectedDate), '%d-%m-%Y').date()
    #print('calculatedDate----',calculatedDate)

    while calculatedDate <= rangeEnddate:
        #print(calculatedDate)
        #print(estimatedDays)
        dateRangeList.append(calculatedDate)
        calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, calculatedDate), '%d-%m-%Y').date()

    #print('-----', datetime.strptime(get_date(output_format,estimatedDays ,calculatedDate), '%d-%m-%Y').date())
    return dateRangeList

dateRange = udf(dateRangebetween, types.ArrayType(types.StringType()))
addDays=182
result = maintenance_final_join.withColumn('Part_Dates',dateRange(maintenance_final_join.Expected,maintenance_final_join.estimateddays)).show()

When I run it, I get the following error:

TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found

1 answer
User
#1 · Posted 2024-05-16 05:48:49

First, could you please fix your indentation? Your dateRangebetween() function as posted is hard to read.

That said, your problem is here:

dateRangeList.append(calculatedDate)
calculatedDate = datetime.strptime(get_date(output_format,estimatedDays, 
        calculatedDate), '%d-%m-%Y').date()

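calculatedDate is a date object (the result of calling .date() on a datetime). You append this object itself, not its string representation, to dateRangeList and return the list. Then, in the main program, the udf is declared to return an array of strings, yet it is handed an array of date objects.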
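To make the distinction concrete, here is a small sketch in plain Python (no Spark needed). The sample value is made up for illustration; only the '%d-%m-%Y' format comes from the question's code.

```python
from datetime import datetime

# Hypothetical sample value in the '%d-%m-%Y' format used by the question's code.
calculated = datetime.strptime("30-07-2017", "%d-%m-%Y").date()

# The parsed value is a date object, not a string.
print(type(calculated).__name__)

# strftime() produces the string representation that an array-of-strings
# return type would expect.
as_text = calculated.strftime("%d-%m-%Y")
print(type(as_text).__name__, as_text)
```

Appending `as_text` rather than `calculated` is what keeps the list's contents consistent with a string element type.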

I think you intended to use the string representation. If you change

^{pr2}$

and put a proper format string in place of the dots, you will at least be working with string objects rather than datetimes.
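Putting that advice together, the following is a minimal sketch of how the function might look when it returns strings. It is not the original poster's code: the name `date_range_between`, the `horizon_days` parameter, and the input format `'%d.%b.%Y'` (guessed from sample values such as 01.NOV.2018) are assumptions. The reported TypeError also looks consistent with adding a timedelta to a Unicode string under Python 2, which would happen if the CSV's string column reached `get_date` unparsed, so the sketch parses both inputs first.

```python
from datetime import datetime, timedelta

INPUT_FORMAT = "%d.%b.%Y"   # assumption: matches values such as 01.NOV.2018
OUTPUT_FORMAT = "%d-%m-%Y"  # the output format used in the question's code


def date_range_between(expected_date, estimated_days, horizon_days=730):
    """Return date strings spaced `estimated_days` apart, up to `horizon_days` ahead."""
    # Columns read from CSV arrive as strings, so parse them before doing arithmetic.
    start = datetime.strptime(expected_date, INPUT_FORMAT).date()
    step = timedelta(days=int(estimated_days))
    if step.days <= 0:
        return []  # a non-positive step would never reach the end of the range

    range_end = start + timedelta(days=horizon_days)
    current = start + step
    dates = []
    while current <= range_end:
        # Append the string representation, not the date object itself.
        dates.append(current.strftime(OUTPUT_FORMAT))
        current += step
    return dates


result = date_range_between("01.NOV.2018", "134")
print(result)
```

In the Spark job, a function like this could be wrapped the same way the question does it, with `udf(date_range_between, types.ArrayType(types.StringType()))`, since every element it returns is now a string.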
