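How to delete the contents of a folder in an S3 bucket
I'm running a Lambda function that queries the ALB access logs and writes the results to an S3 bucket. It runs two queries: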
Daily Logs
Monthly Logs
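I created a bucket with two folders, DailyLogs and MonthlyLogs, so that when the Lambda function runs, each set of logs is stored in the corresponding folder.
I also added logic that deletes the previous CSV files in those folders and replaces them with the newly generated logs. However, when the Lambda function runs, the entire DailyLogs and MonthlyLogs folders are deleted.
I only want to delete the contents of the folders and replace them with the new logs, not delete the folders themselves.
Can you help me with this?
Here is the Lambda code: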
import boto3
import json
import time
from datetime import datetime

# Query string to execute
daily_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE user_agent LIKE '%test%' AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') >= date_parse(date_format(date_add('day', -1, current_date), '%Y-%m-%d'), '%Y-%m-%d') AND date_parse(time, '%Y-%m-%dT%H:%i:%s.%fZ') < date_parse(date_format(current_date, '%Y-%m-%d'), '%Y-%m-%d') ORDER BY time ASC"
monthly_query = "SELECT * FROM \"DATABASE\".\"TABLE\" WHERE parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') BETWEEN parse_datetime(CAST(date_trunc('month', current_date) AS varchar), 'yyyy-MM-dd') AND parse_datetime(CAST(current_date AS varchar), 'yyyy-MM-dd') AND user_agent LIKE '%test%' ORDER BY time ASC"

# Database to execute the query against
DATABASE = 'DATABASE'
# Output bucket
bucket_name = 'BUCKET_NAME'

# Initialize Boto3 clients
s3_client = boto3.client('s3')
athena_client = boto3.client('athena')


def lambda_handler(event, context):
    try:
        # Get current date
        current_date = datetime.now()

        # Create folder names for daily and monthly logs
        daily_folder = f"DailyLogs/{current_date.strftime('%Y-%m-%d')}/"
        monthly_folder = f"MonthlyLogs/{current_date.strftime('%Y-%m')}/"

        # Delete existing files in the S3 bucket
        delete_daily_files(daily_folder)
        delete_monthly_files(monthly_folder)

        # Start the query executions
        response = athena_client.start_query_execution(
            QueryString=daily_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{daily_folder}'}
        )
        response = athena_client.start_query_execution(
            QueryString=monthly_query,
            QueryExecutionContext={'Database': DATABASE},
            ResultConfiguration={'OutputLocation': f's3://{bucket_name}/{monthly_folder}'}
        )
        return response
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return {'statusCode': 500, 'body': json.dumps({'error': str(e)})}


def delete_daily_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})


def delete_monthly_files(folder):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder)
    if 'Contents' in response:
        keys_to_delete = [{'Key': obj['Key']} for obj in response['Contents']]
        if keys_to_delete:
            s3_client.delete_objects(Bucket=bucket_name, Delete={'Objects': keys_to_delete})
Please see the code above. Right now, instead of only the files being deleted, the entire folders are being deleted.
1 Answer
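There are no "folders" in an S3 bucket. A bucket contains only files, also called "objects", and each object is uniquely identified by its full name (its "key").
You can restrict certain operations (such as listing files) to keys that start with a particular prefix string. For example, suppose a bucket contains the following: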
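To illustrate, here is a small pure-Python model (not a boto3 call) of how "folders" fall out of the keys. The S3 console lists a bucket roughly the way `list_objects_v2(..., Delimiter='/')` does, rolling keys up into `CommonPrefixes`; the function name `common_prefixes` and the example keys are made up for this sketch. It shows that a "folder" exists only while some key starts with it, or while a zero-byte placeholder object whose key is the folder name itself (for example `DailyLogs/`) exists.

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Rough model of list_objects_v2(Prefix=prefix, Delimiter=delimiter).

    Returns (folders, objects): keys under `prefix` that contain `delimiter`
    after the prefix are rolled up into a folder name ending in the delimiter
    (S3's CommonPrefixes); the other matching keys are returned as objects.
    """
    folders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the requested prefix
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no delimiter left: a plain object
        else:
            folders.add(prefix + rest[:cut + len(delimiter)])
    return sorted(folders), objects


keys = ["DailyLogs/2024-03-24/file-1.txt", "MonthlyLogs/2024-03/file-1.txt"]
print(common_prefixes(keys))  # (['DailyLogs/', 'MonthlyLogs/'], [])

# Delete every key that starts with "DailyLogs/": the "folder" is gone too.
remaining = [k for k in keys if not k.startswith("DailyLogs/")]
print(common_prefixes(remaining))  # (['MonthlyLogs/'], [])

# With a zero-byte placeholder object named "DailyLogs/", it stays visible.
print(common_prefixes(remaining + ["DailyLogs/"]))  # (['DailyLogs/', 'MonthlyLogs/'], [])
```

Creating a folder in the S3 console creates exactly such a zero-byte object whose key ends in `/`.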
DailyLogs/2024-03-23/file-1.txt
DailyLogs/2024-03-23/file-2.txt
DailyLogs/2024-03-23/file-3.txt
DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt
We can request the files under 2024-03-24 like this (as you do in your code):
response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="DailyLogs/2024-03-24")
This returns the list of objects:
DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt
But the Prefix argument doesn't have to be a "folder"; we could also do this:
response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix="DailyL")
This returns:
DailyLogs/2024-03-23/file-1.txt
DailyLogs/2024-03-23/file-2.txt
DailyLogs/2024-03-23/file-3.txt
DailyLogs/2024-03-24/file-1.txt
DailyLogs/2024-03-24/file-2.txt
DailyLogs/2024-03-24/file-3.txt
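The same behaviour can be modelled in a few lines of plain Python, since S3 applies Prefix as an ordinary string-prefix match on the key. The helper `keys_with_prefix` and the extra key `DailyLogs/2024-03-24-old/file-1.txt` are hypothetical, added only to show why a prefix that is meant to stand for a folder should end with `/`.

```python
def keys_with_prefix(keys, prefix):
    """Model of the Prefix filter of list_objects_v2: a plain str.startswith."""
    return [key for key in keys if key.startswith(prefix)]


# The example keys from above, plus one hypothetical key to show a pitfall.
KEYS = [
    "DailyLogs/2024-03-23/file-1.txt",
    "DailyLogs/2024-03-23/file-2.txt",
    "DailyLogs/2024-03-23/file-3.txt",
    "DailyLogs/2024-03-24/file-1.txt",
    "DailyLogs/2024-03-24/file-2.txt",
    "DailyLogs/2024-03-24/file-3.txt",
    "DailyLogs/2024-03-24-old/file-1.txt",  # hypothetical
]

print(keys_with_prefix(KEYS, "DailyLogs/2024-03-24"))   # also matches the -old key
print(keys_with_prefix(KEYS, "DailyLogs/2024-03-24/"))  # only the 2024-03-24 files
print(len(keys_with_prefix(KEYS, "DailyL")))            # 7: every key
```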
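In other words, the Prefix argument is just an arbitrary string, and the names of the objects in your bucket are just arbitrary strings too. This is also why your "folders" disappear: a folder is shown only while at least one object key begins with that prefix, so deleting every object under the prefix makes the folder vanish as well.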
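Applied to your case, one way to empty a "folder" without losing it is to delete the objects under the prefix and then make sure a zero-byte placeholder object with exactly that key exists. This is a minimal sketch, assuming a boto3 S3 client is passed in; the function name `empty_prefix` is hypothetical. It uses the `list_objects_v2` paginator because a single call returns at most 1000 keys, and it deletes in batches of at most 1000 keys, which is the `delete_objects` limit.

```python
def empty_prefix(s3_client, bucket, prefix):
    """Delete every object under `prefix` but keep the "folder" itself.

    `prefix` must end with "/" (for example "DailyLogs/"). The zero-byte
    placeholder object whose key equals `prefix` is not deleted, and it is
    (re)written at the end so the S3 console keeps showing the folder even
    when it is empty. Returns the list of deleted keys.
    """
    if not prefix.endswith("/"):
        raise ValueError("prefix must end with '/', e.g. 'DailyLogs/'")

    # A single list_objects_v2 call returns at most 1000 keys, so paginate.
    paginator = s3_client.get_paginator("list_objects_v2")
    to_delete = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without matching keys have no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"] != prefix:  # keep the folder placeholder
                to_delete.append(obj["Key"])

    # delete_objects accepts at most 1000 keys per request.
    for start in range(0, len(to_delete), 1000):
        batch = [{"Key": key} for key in to_delete[start:start + 1000]]
        response = s3_client.delete_objects(
            Bucket=bucket, Delete={"Objects": batch, "Quiet": True}
        )
        # Per-key failures are reported in "Errors" instead of being raised.
        if response.get("Errors"):
            raise RuntimeError(f"some deletions failed: {response['Errors']}")

    # Write the zero-byte placeholder so the "folder" stays visible.
    s3_client.put_object(Bucket=bucket, Key=prefix, Body=b"")
    return to_delete
```

In your handler you could call `empty_prefix(s3_client, bucket_name, daily_folder)` and `empty_prefix(s3_client, bucket_name, monthly_folder)` in place of `delete_daily_files` and `delete_monthly_files`; both prefixes already end with `/`. Try it against a test bucket first.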