Reading an object from Alibaba Cloud OSS and modifying it with Python

Posted 2024-05-29 03:52:28


My data is stored as CSV files in an Alibaba Cloud OSS bucket. I currently run a Python script that works like this:

  1. I download the file to my local machine
  2. I modify it on the local machine with a Python script
  3. I store the result in AWS

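I have to change this approach and schedule a cron job in Alibaba Cloud so the script runs automatically. The Python script will be uploaded to Task Management in Alibaba Cloud.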

So the new steps will be:

  1. Read a file from the OSS bucket into pandas
  2. Modify it: merge it with other data and change some columns. This will be done in pandas
  3. Store the modified file in AWS RDS
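The three steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the bytes in `orders_raw` stand in for what `bucket.get_object(key).read()` would return, the column names and the `customers` table are made up, and an in-memory SQLite database stands in for AWS RDS so the example runs without any cloud access. For a MySQL RDS instance, `to_sql` would need a SQLAlchemy engine instead of the SQLite connection.

```python
import io
import sqlite3

import pandas as pd


def csv_bytes_to_frame(raw: bytes) -> pd.DataFrame:
    """Parse CSV bytes (e.g. a downloaded object body) into a DataFrame."""
    return pd.read_csv(io.BytesIO(raw))


# Step 1: stand-in for the bytes read from the OSS object (hypothetical data).
orders_raw = b"order_id,customer_id,amount\n1,10,5.0\n2,11,7.5\n"
orders = csv_bytes_to_frame(orders_raw)

# Step 2: merge with other data and change a column (hypothetical lookup table).
customers = pd.DataFrame({"customer_id": [10, 11], "region": ["east", "west"]})
merged = orders.merge(customers, on="customer_id", how="left")
merged["amount"] = merged["amount"].round(2)

# Step 3: write the result to a database table.
# SQLite is used here only as a local stand-in for RDS.
conn = sqlite3.connect(":memory:")
merged.to_sql("orders", conn, if_exists="replace", index=False)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

Nothing is written to local disk in this flow, which matches the goal of running the script as a scheduled task.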

I am stuck on the very first step. The error log:

"No module found" for OSS2 & pandas.

What is the correct way to do this?
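A "No module found" error generally means the package is not installed for the interpreter that runs the script, which for a scheduled task may differ from the one used locally. A minimal sketch for checking this from inside that environment; `missing_modules` is a hypothetical helper name, not part of any library:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Shows which Python executable is actually running the script.
    print("Interpreter:", sys.executable)
    print("Missing:", missing_modules(["oss2", "pandas", "mysql.connector"]))
```

Anything reported as missing can then be installed for that same interpreter, for example with `python -m pip install oss2 pandas mysql-connector-python`, assuming the environment allows installing packages.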

Here is a draft of my script (based on how I run it on my local machine):

import os,re
import oss2  # throws an error: No module found.
import datetime as dt
import pandas as pd  # throws an error: No module found.
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice
dates = (dt.datetime.now()+dt.timedelta(days=-1)).strftime("%Y%m%d")
def download_file(access_key_id,access_key_secret,endpoint,bucket):

    #Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)

    # Bucket name
    bucket = oss2.Bucket(auth, endpoint, bucket)

    # Download the file
    try:
        # List all objects in the fun folder and its subfolders.
        for obj in oss2.ObjectIterator(bucket, prefix=dates+'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]
            df = pd.read_csv(bucket.get_object(order_file)) # to read into pandas
            # FUNCTION to modify and upload
        print("File downloaded")
    except:
        print("Pls check!!! File not read")
    return objectName

1 answer
User
#1 · Posted 2024-05-29 03:52:28
import os,re
import oss2 
import datetime as dt
import pandas as pd 
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice

import io  # new import: io.BytesIO wraps the downloaded bytes in a file-like object

dates = (dt.datetime.now()+dt.timedelta(days=-1)).strftime("%Y%m%d")
def download_file(access_key_id,access_key_secret,endpoint,bucket):

    #Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)

    # Bucket name
    bucket = oss2.Bucket(auth, endpoint, bucket)

    # Download the file
    objectName = None  # stays None if no matching object is found or reading fails
    try:
        # List all objects whose key starts with the date prefix.
        for obj in oss2.ObjectIterator(bucket, prefix=dates+'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]
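
            # Read the whole object from OSS into memory as bytes.
            bucket_object = bucket.get_object(order_file).read()
            # Wrap the bytes in an in-memory buffer that pandas can read like a file.
            csv_buf = io.BytesIO(bucket_object)

            df = pd.read_csv(csv_buf) # to read into pandas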
            # FUNCTION to modify and upload
        print("File downloaded")
    except Exception as exc:
        # Report the underlying error instead of hiding it.
        print("Pls check!!! File not read:", exc)
    return objectName
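
A small aside on the key handling in the function above: `order_file.split('/')[1]` returns the second path segment, which is the file name only when the key has exactly one folder level. A sketch using the standard library's `posixpath.basename`, which returns the last segment regardless of depth; `object_file_name` is a hypothetical helper name and the sample keys are made up:

```python
import posixpath


def object_file_name(key: str) -> str:
    """Return the last path segment of an object key (keys use '/' separators)."""
    return posixpath.basename(key)


# Works for keys with one folder level, several levels, or none.
one_level = object_file_name("20240528order/orders.csv")
nested = object_file_name("20240528order/eu/orders.csv")
flat = object_file_name("orders.csv")
```

With this helper, a key without any `/` no longer raises an `IndexError`.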
