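My initial files are in AWS S3. Can someone tell me how to set this up in a Luigi Task?
I looked at the documentation and found luigi.S3, but it is not clear to me what to do with it. I then searched the web and only found links to mortar-luigi and to implementations built on top of Luigi.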
Update
After following the example provided by @matagus (I also created the ~/.boto file as suggested):
    # coding: utf-8
    import luigi
    from luigi.s3 import S3Target, S3Client

    class MyS3File(luigi.ExternalTask):
        def output(self):
            return S3Target('s3://my-bucket/19170205.txt')

    class ProcessS3File(luigi.Task):
        def requieres(self):
            return MyS3File()

        def output(self):
            return luigi.LocalTarget('/tmp/resultado.txt')

        def run(self):
            result = None
            for input in self.input():
                print("Doing something ...")
                with input.open('r') as f:
                    for line in f:
                        result = 'This is a line'
            if result:
                out_file = self.output().open('w')
                out_file.write(result)
When I run it, nothing happens:
DEBUG: Checking if ProcessS3File() is complete
INFO: Informed scheduler that task ProcessS3File() has status PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 21171] Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) running ProcessS3File()
INFO: [pid 21171] Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) done ProcessS3File()
DEBUG: 1 running tasks, waiting for next task to finish
INFO: Informed scheduler that task ProcessS3File() has status DONE
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: Worker Worker(salt=226574718, workers=1, host=heliodromus, username=nanounanue, pid=21171) was stopped. Shutting down Keep-Alive thread
As you can see, the message Doing something ... is never printed. What is wrong?
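The key here is to define an external task that has no inputs and whose output is the file you already have in S3. The Luigi documentation mentions this in the section Requiring another Task.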
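So, basically, you end up with an ExternalTask whose output() points at the existing S3 file, plus a regular Task that requires it.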
Update:
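Luigi uses boto to read files from and/or write files to AWS S3, so for this code to work you need to provide your credentials in the boto config file ~/.boto (the boto documentation lists other possible config file locations).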
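For reference, a boto config file is an INI-style file with a [Credentials] section holding aws_access_key_id and aws_secret_access_key. The sketch below embeds a sample of that layout with placeholder values (not real keys) and parses it with Python's standard configparser, which can serve as a quick sanity check of your own file. The helper name read_credentials is made up for illustration and is not part of boto or Luigi.

```python
import configparser

# Sample contents of ~/.boto; the values are placeholders, not real keys.
SAMPLE_BOTO_CONFIG = """\
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
"""


def read_credentials(config_text):
    """Return (access_key, secret_key) from boto-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["Credentials"]  # raises KeyError if the section is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]


access_key, secret_key = read_credentials(SAMPLE_BOTO_CONFIG)
print("Found credentials for key id:", access_key)
```

To check a real file instead of the sample, the text could be read from the expanded path of ~/.boto and passed to the same helper.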