I am using PySpark 2.1. Below is the content of my DataFrame:
expecteddays,date
139,30.JUl.2017
134,01.NOV.2018
My output should look like this:
^{pr2}$
The last column is populated by the dateRangebetween module below.
Here is my code:
from datetime import datetime, timedelta

import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

maintenance_final_join = spark.read.csv('/user/NaveenSri/adh_dev_engg/test.csv', header=True)

def get_date(dateFormat="%d-%m-%Y", addDays=0, timeNow=0):
    #print('inside get date',timesNow)
    if (addDays!=0):
        anotherTime = timeNow + timedelta(days=addDays)
    else:
        anotherTime = timeNow
    return anotherTime.strftime(dateFormat)

def dateRangebetween(expectedDate, estimatedDays):
    output_format = '%d-%m-%Y'
    dateRangeList = []
    j = 2
    #print('inside Date range',expectedDate)
    rangeEnddate = datetime.strptime(get_date(output_format, 730, expectedDate), '%d-%m-%Y').date()
    #print('rangeEnddate---',rangeEnddate)
    calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, expectedDate), '%d-%m-%Y').date()
    #print('calculatedDate----',calculatedDate)
    while (calculatedDate <= rangeEnddate):
        #print(calculatedDate)
        #print(estimatedDays)
        dateRangeList.append(calculatedDate)
        calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, calculatedDate), '%d-%m-%Y').date()
        #print('-----', datetime.strptime(get_date(output_format,estimatedDays ,calculatedDate), '%d-%m-%Y').date())
    return dateRangeList

dateRange = udf(dateRangebetween, types.ArrayType(types.StringType()))
addDays=182
result = maintenance_final_join.withColumn('Part_Dates',dateRange(maintenance_final_join.Expected,maintenance_final_join.estimateddays)).show()
When I run it, I get the following error:
TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found
First, could you please fix your indentation? Your dateRangebetween() function is very hard to read. That said, your problem is this:
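calculatedDate is a date object (the result of calling .date() on a datetime). You append that object, not its string representation, to dateRangeList and return the list. In the main program, however, the udf is declared as returning an array of strings, so it ends up being run on an array of date objects.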
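To make the distinction concrete, here is a minimal sketch in plain Python (no Spark needed). The specific date is only an illustration and does not come from the question's data; the format string is the '%d-%m-%Y' already used in the question.

```python
from datetime import date

# A stand-in for what `.date()` produces inside the loop (illustrative value).
calculated = date(2019, 3, 15)

# A date object is not a string, so it does not fit ArrayType(StringType()).
is_string = isinstance(calculated, str)
print(is_string)

# strftime produces the string representation in the question's output format.
as_text = calculated.strftime("%d-%m-%Y")
print(as_text)
```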
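I think your intention was to use the string representation. If you change
^{pr2}$
and insert the correct format string in place of the dots, you will at least be working with string objects rather than datetimes.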
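For reference, here is one way the whole function might look once it returns strings. This is a sketch under assumptions, not the original poster's final code: the name date_range_between and the constants are hypothetical, the input format "%d.%b.%Y" is inferred from the sample rows (for example 01.NOV.2018) and relies on English month abbreviations, and both arguments are assumed to arrive as strings because the CSV is read without a schema. It also parses the incoming date string before adding a timedelta to it, since adding a timedelta directly to a string raises a TypeError.

```python
from datetime import datetime, timedelta

INPUT_FORMAT = "%d.%b.%Y"   # assumption: inferred from the sample rows
OUTPUT_FORMAT = "%d-%m-%Y"  # the output format used in the question
HORIZON_DAYS = 730          # the question's range end: start date + 730 days


def date_range_between(expected_date, estimated_days):
    """Return dates from start + step up to start + 730 days, as strings."""
    # Parse the string first; a string cannot be added to a timedelta.
    start = datetime.strptime(expected_date, INPUT_FORMAT).date()
    step = timedelta(days=int(estimated_days))  # CSV values arrive as strings
    if step.days <= 0:
        return []  # guard: a non-positive step would never reach the end
    range_end = start + timedelta(days=HORIZON_DAYS)

    dates = []
    current = start + step
    while current <= range_end:
        # Append the string representation, not the date object itself.
        dates.append(current.strftime(OUTPUT_FORMAT))
        current += step
    return dates


result = date_range_between("01.NOV.2018", "134")
print(result)
```

A function like this could then be wrapped the same way as in the question, with udf(date_range_between, ArrayType(StringType())), so that the declared return type matches what the function actually returns.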