How to delete the contents of a folder in an S3 bucket

Asked 2025-04-12 22:01

I'm running a Lambda function that queries ALB access logs and sends the results to an S3 bucket. It runs two queries:

Daily Logs
Monthly Logs

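I created a bucket with two folders, DailyLogs and MonthlyLogs, so that when the Lambda function runs, the logs are stored in the corresponding folder.

I also added a step that deletes the previous CSV files and replaces them with the newly generated logs. However, when the Lambda function runs, the entire DailyLogs and MonthlyLogs folders are deleted.

I only want to delete the contents of the folders and replace them with the new logs, not delete the folders themselves.

Can you help me?

The Lambda code is attached.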

import boto3
import json
import time
from datetime import datetime

# Query string to execute
daily_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE user_agent LIKE '%test%' AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') >= date_parse(date_format(date_add('day', -1, current_date), '%Y-%m-%d'), '%Y-%m-%d') AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') < date_parse(date_format(current_date, '%Y-%m-%d'), '%Y-%m-%d') ORDER BY time ASC"
monthly_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime(CAST(date_trunc('month', current_date) AS varchar), 'yyyy-MM-dd') AND parse_datetime(CAST(current_date AS varchar), 'yyyy-MM-dd') AND user_agent LIKE '%test%' ORDER BY time ASC"

# Database to execute the query against
DATABASE = 'DATABASE'

# Output bucket
bucket_name = 'BUCKET_NAME'

# Initialize Boto3 clients
s3_client = boto3.client('s3')
athena_client = boto3.client('athena')

def lambda_handler(event, context):
    try:
        # Get current date
        current_date = datetime.now()

        # Create folder names for daily and monthly logs
        daily_folder = f"DailyLogs/{current_date.strftime('%Y-%m-%d')}/"
        monthly_folder = f"MonthlyLogs/{current_date.strftime('%Y-%m')}/"

        # Delete existing files in the S3 bucket
        delete_daily_files(daily_folder)
        delete_monthly_files(monthly_folder)

        # Start the query executions
        response = athena_client.start_query_execution(
            QueryString=daily_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{daily_folder}'}
        )

        response = athena_client.start_query_execution(
            QueryString=monthly_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{monthly_folder}'}
        )

        return response
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}

def delete_daily_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})

def delete_monthly_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})

Please see the code above. Right now, instead of only the files being deleted, the entire folder is deleted.

1 Answer

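S3 buckets have no "folders". They contain only files, also called "objects", and each object has a unique identifier (its key), which is the entire file name.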
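
One way to picture this is to model a bucket as a flat set of key strings and derive the "folders" from them, which is roughly what a listing with a delimiter does. The sketch below is a local simulation in plain Python, not a call to S3; the helper name `visible_folders` and the sample keys are made up for illustration.

```python
def visible_folders(keys, prefix="", delimiter="/"):
    """Return the 'folders' directly under prefix, derived purely from keys."""
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to and including the first delimiter acts as a folder name.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(folders)


# Hypothetical bucket contents: just a flat collection of key strings.
bucket = {
    "DailyLogs/2024-03-24/file-1.csv",
    "MonthlyLogs/2024-03/file-1.csv",
}

before = visible_folders(bucket)

# Remove every key under DailyLogs/ and the "folder" is gone as well,
# because no remaining key starts with that prefix.
bucket = {k for k in bucket if not k.startswith("DailyLogs/")}
after = visible_folders(bucket)

print(before)
print(after)
```

A folder created through the S3 console is typically a zero-byte object whose key ends in `/`, so a prefix-based delete can remove it just like any other object.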

You can restrict certain operations (such as listing files) to a particular prefix string. For example, suppose a bucket contains the following:

DailyLogs/2024-03-23/file-1.txt
DailyLogs/2024-03-23/file-2.txt
DailyLogs/2024-03-23/file-3.txt
DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt

We can request the files under 2024-03-24 like this (as you do in your code):

response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="DailyLogs/2024-03-24")

This returns the list of objects:

DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt

But the Prefix argument does not have to be a "folder"; we could also do this:

response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="DailyL")

This returns:

DailyLogs/2024-03-23/file-1.txt
DailyLogs/2024-03-23/file-2.txt
DailyLogs/2024-03-23/file-3.txt
DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt
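
The prefix filtering shown here can be reproduced locally, since it behaves like a plain "starts with" test on each key. This is a simulation in plain Python rather than an S3 call, and `list_keys` is a made-up helper name.

```python
keys = [
    "DailyLogs/2024-03-23/file-1.txt",
    "DailyLogs/2024-03-23/file-2.txt",
    "DailyLogs/2024-03-23/file-3.txt",
    "DailyLogs/2024-03-24/file-1.txt",
    "DailyLogs/2024-03-24/file-2.txt",
    "DailyLogs/2024-03-24/file-3.txt",
]


def list_keys(keys, prefix):
    """Mimic the filtering of a prefix listing: keep keys that start with prefix."""
    return [k for k in keys if k.startswith(prefix)]


march_24 = list_keys(keys, "DailyLogs/2024-03-24")
everything = list_keys(keys, "DailyL")

# Without a trailing slash, the prefix also matches look-alike names.
# The extra key below is hypothetical and only illustrates the pitfall.
extra = keys + ["DailyLogs/2024-03-24-old/file-1.txt"]
loose = list_keys(extra, "DailyLogs/2024-03-24")
strict = list_keys(extra, "DailyLogs/2024-03-24/")

print(march_24)
print(len(everything), len(loose), len(strict))
```

Ending the prefix with `/`, as the `daily_folder` and `monthly_folder` strings in the question do, keeps the match limited to that one "folder".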

In other words, the Prefix argument is just an arbitrary string, and the names of the objects in your bucket are arbitrary strings too.
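
Applied to the question, the delete step could be written once and reused for both prefixes. The following is a rough sketch, not a verified fix for the behaviour described. It assumes `s3_client` is a boto3 S3 client, pages through the listing (a single `list_objects_v2` call returns at most 1,000 keys), and optionally writes a zero-byte placeholder object at the prefix so that the "folder" still shows up while it is empty. The function name `delete_prefix_contents` and the `keep_placeholder` flag are invented for this example.

```python
def delete_prefix_contents(s3_client, bucket, prefix, keep_placeholder=True):
    """Delete all objects under prefix and return the number of deleted keys.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Each page holds at most 1,000 keys, which is also the per-request
        # limit of delete_objects, so one delete call per page is enough.
        objects = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if objects:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": objects})
            deleted += len(objects)

    if keep_placeholder and prefix.endswith("/"):
        # Assumption: a zero-byte object whose key ends in "/" is what keeps
        # an otherwise empty "folder" visible in the console.
        s3_client.put_object(Bucket=bucket, Key=prefix, Body=b"")
    return deleted
```

In the handler from the question, calls such as `delete_prefix_contents(s3_client, bucket_name, daily_folder)` and `delete_prefix_contents(s3_client, bucket_name, monthly_folder)` would take the place of the two near-identical delete functions.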
