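Direct upload to S3 with Python/Boto/Django: building the policy
I have been through many iterations of this problem, searched through many different examples, and read the documentation closely.
I am trying to combine Plupload (http://www.plupload.com/) with the AWS S3 direct POST upload method (http://aws.amazon.com/articles/1434). However, I believe something is wrong with the way I construct my policy and signature. When I submit the form, I get no response from the server; instead, my connection is reset.
I have tried the Python code from the example: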
import base64
import hmac, sha
policy = base64.b64encode(policy_document)
signature = base64.b64encode(
    hmac.new(aws_secret_key, policy, sha).digest())
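I have also tried the newer hashlib library in Python. Whatever method I use to build my policy and signature, the values I get always differ from the ones generated here: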
http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html
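I have also read through a related question, but I found the example provided there overly complicated and could not implement it accurately.
My most recent attempt uses part of the boto library: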
http://boto.cloudhackers.com/ref/s3.html#module-boto.s3.connection
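But using the S3Connection.build_post_form_args method has not worked for me either.
If anyone could provide a proper example of how to create the upload form using Python, I would greatly appreciate it. Even some simple insight into why the connection keeps being reset would be helpful.
Some caveats:
I would like to use hashlib if possible.
I want to get an XML response from Amazon (presumably "success_action_status = '201'" does this).
I need to be able to upload large files, up to about 2 GB.
One final note: when I run this in Chrome, it shows upload progress, and the upload usually fails at around 37%.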
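4 Answers
I have been struggling with this same problem for the past few days, using nearly identical code. (See "Python Generated Signature for S3 POST".) I just tried encoding my policy following White Box Dev's code, but I still could not get the result AWS suggests I should get. In the end I gave up and used...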
http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html
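...and then put the values it returned directly into the HTML form. That works great.
@Mr. Oodles: if you store your aws_secret_key in a separate file, use the bash command ls -al to check its byte count and make sure it is exactly 40 bytes long before generating the signature. As White Box Dev mentioned, AWS does not like the \n character, and you may have accidentally saved this hidden character (or a carriage return, ^M) along with the aws_secret_key string, which makes it 41 bytes. You can try .replace("\n", "") or .rstrip() to remove it after reading the key into your script; .encode("utf-8") may also work for you. None of those worked for me, though. I'm curious whether you are running Python on Windows or Unix. You could also try saving the string with emacs so that no \n is inserted automatically.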
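The trailing-newline check described above could be sketched roughly as follows. The helper name load_secret_key and the temporary file are hypothetical and only there for illustration; the 40-character length is the one mentioned above.

```python
import os
import tempfile


def load_secret_key(path):
    """Read a secret key from a file and strip surrounding whitespace.

    `path` is a hypothetical location; adapt it to wherever the key is stored.
    """
    with open(path, "r") as f:
        raw = f.read()
    key = raw.strip()  # removes "\n", "\r" and stray spaces at both ends
    if len(key) != 40:
        raise ValueError("expected a 40-character key, got %d" % len(key))
    return key


# Demonstration with a fake 40-character key saved with a Windows line ending.
fake_key = "A" * 40
with tempfile.NamedTemporaryFile("w", suffix=".key", delete=False) as tmp:
    tmp.write(fake_key + "\r\n")
clean_key = load_secret_key(tmp.name)
os.remove(tmp.name)
```

Raising an error on an unexpected length makes a damaged key file fail loudly before any signature is computed, rather than producing a signature that S3 silently rejects.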
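I tried using Boto, but found that it did not let me add all of the headers I wanted. Below you can see how I generate the policy, the signature, and a dictionary of form values.
Note that all of the x-amz-meta-* tags are custom header properties, and you do not necessarily need them. Also, nearly everything that goes into the form must be included in the policy that gets encoded and signed.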
def generate_post_form(bucket_name, key, post_key, file_id, file_name, content_type):
    import base64
    import hmac
    from datetime import datetime
    from hashlib import sha1
    from django.conf import settings
    policy = """{"expiration": "%(expires)s","conditions": [{"bucket":"%(bucket)s"},["eq","$key","%(key)s"],{"acl":"private"},{"x-amz-meta-content_type":"%(content_type)s"},{"x-amz-meta-file_name":"%(file_name)s"},{"x-amz-meta-post_key":"%(post_key)s"},{"x-amz-meta-file_id":"%(file_id)s"},{"success_action_status":"200"}]}"""
    policy = policy % {
        # This has to be formatted this way; settings.TIMEOUT is assumed to be a datetime.timedelta
        "expires": (datetime.utcnow() + settings.TIMEOUT).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "bucket": bucket_name, # the name of your bucket
        "key": key, # this is the S3 key where the posted file will be stored
        "post_key": post_key, # custom properties begin here
        "file_id": file_id,
        "file_name": file_name,
        "content_type": content_type,
    }
    # Here we base64 encode a UTF-8 version of our policy. Make sure there are no new lines, Amazon doesn't like them (b64encode emits none).
    encoded = base64.b64encode(policy.encode("utf-8"))
    # Generate the policy signature using our Amazon Secret Key
    signature = base64.b64encode(hmac.new(settings.AWS_SECRET_KEY.encode("utf-8"), encoded, sha1).digest())
    return ("%s://%s.s3.amazonaws.com/" % (settings.HTTP_CONNECTION_TYPE, bucket_name),
            {"policy": encoded.decode("ascii"),
             "signature": signature.decode("ascii"),
             "key": key,
             "AWSAccessKeyId": settings.AWS_ACCESS_KEY, # Obviously the Amazon Access Key
             "acl": "private",
             "x-amz-meta-post_key": post_key,
             "x-amz-meta-file_id": file_id,
             "x-amz-meta-file_name": file_name,
             "x-amz-meta-content_type": content_type,
             "success_action_status": "200",
            })
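The returned tuple can then be used to generate a form that posts to the generated S3 URL, with all of the key-value pairs from the dictionary as hidden fields plus the actual file input field, whose name/id should be "file".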
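As a rough illustration of that last step, here is a minimal sketch that turns a (URL, fields) pair into an HTML form. The function name render_upload_form, the bucket URL, and the field values are placeholders, not part of the code above; in a Django project a template would normally do this instead.

```python
from html import escape


def render_upload_form(action_url, fields):
    """Build an HTML form with one hidden input per entry in `fields`."""
    lines = ['<form action="%s" method="post" enctype="multipart/form-data">'
             % escape(action_url, quote=True)]
    for name, value in fields.items():
        lines.append('  <input type="hidden" name="%s" value="%s">'
                     % (escape(name, quote=True), escape(value, quote=True)))
    # S3 ignores form fields that come after the file field, so it goes last.
    lines.append('  <input type="file" name="file" id="file">')
    lines.append('  <input type="submit" value="Upload">')
    lines.append('</form>')
    return "\n".join(lines)


# Placeholder values; a real call would pass the URL and dictionary from the tuple.
form_html = render_upload_form(
    "https://example-bucket.s3.amazonaws.com/",
    {"key": "uploads/report.pdf", "acl": "private", "policy": "cG9saWN5"},
)
```

Escaping both names and values keeps characters such as quotes in a file name from breaking the markup.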
I hope this example helps.
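Nathan's answer helped get me started. I have included two solutions that are currently working for me.
The first solution uses plain Python. The second uses the boto library.
I tried to get boto working first, but kept getting errors. So I went back to the Amazon Ruby documentation and found that I could get S3 to accept files using Python without boto. (Uploading files to S3 via HTML POST)
Once I understood what was going on, I fixed my mistakes and was eventually able to use boto as well, which is simpler.
I am including the first solution because it shows explicitly how to set up the policy document and signature in Python.
My goal was to create a dynamic HTML upload page, along with the "success" page the user sees after a successful upload. Solution 1 shows how to generate the upload form page dynamically, while Solution 2 shows how to create both the upload form page and the success page.
Solution 1: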
import base64
import hmac, hashlib
###### EDIT ONLY THE FOLLOWING ITEMS ######
DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
EXPIRE_DATE = "2015-01-01T00:00:00Z" # Jan 1, 2015 GMT; must be in the future, S3 rejects the form once the policy has expired
FILE_TO_UPLOAD = "${filename}"
BUCKET = "media.mysite.com"
KEY = ""
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = ""
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or "https" for better security
PAGE_TITLE = "My Html Upload to S3 Form"
ACTION = "%s://%s.s3.amazonaws.com/" % (HTTP_OR_HTTPS, BUCKET)
###### DON'T EDIT FROM HERE ON DOWN ######
policy_document_data = {
"expire": EXPIRE_DATE,
"bucket_name": BUCKET,
"key_name": KEY,
"acl_name": ACL,
"success_redirect": SUCCESS,
"content_name": CONTENT_TYPE,
"content_length": CONTENT_LENGTH,
}
policy_document = """
{"expiration": "%(expire)s",
"conditions": [
{"bucket": "%(bucket_name)s"},
["starts-with", "$key", "%(key_name)s"],
{"acl": "%(acl_name)s"},
{"success_action_redirect": "%(success_redirect)s"},
["starts-with", "$Content-Type", "%(content_name)s"],
["content-length-range", 0, %(content_length)d]
]
}
""" % policy_document_data
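# Encode to bytes before base64/HMAC so this also runs on Python 3, then decode back to text for the HTML template
policy = base64.b64encode(policy_document.encode("utf-8")).decode("ascii")
signature = base64.b64encode(hmac.new(AWS_SECRET_KEY.encode("utf-8"), policy.encode("ascii"), hashlib.sha1).digest()).decode("ascii")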
html_page_data = {
"page_title": PAGE_TITLE,
"action_name": ACTION,
"filename": FILE_TO_UPLOAD,
"access_name": AWS_ACCESS_KEY,
"acl_name": ACL,
"redirect_name": SUCCESS,
"policy_name": policy,
"sig_name": signature,
"content_name": CONTENT_TYPE,
}
html_page = """
<html>
<head>
<title>%(page_title)s</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<form action="%(action_name)s" method="post" enctype="multipart/form-data">
<input type="hidden" name="key" value="%(filename)s">
<input type="hidden" name="AWSAccessKeyId" value="%(access_name)s">
<input type="hidden" name="acl" value="%(acl_name)s">
<input type="hidden" name="success_action_redirect" value="%(redirect_name)s">
<input type="hidden" name="policy" value="%(policy_name)s">
<input type="hidden" name="signature" value="%(sig_name)s">
<input type="hidden" name="Content-Type" value="%(content_name)s">
<!-- Include any additional input fields here -->
Browse to locate the file to upload:<br /> <br />
<input name="file" type="file"><br /> <br />
<input type="submit" value="Upload File to S3">
</form>
</body>
</html>
""" % html_page_data
with open(HTML_NAME, "w") as f:
    f.write(html_page)
###### Dump output if testing ######
if DEBUG:
    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data): print(data)
        g = G()
    items = [
        "",
        "",
        "policy_document: %s" % policy_document,
        "policy: %s" % policy,
        "signature: %s" % signature,
        "",
        "",
    ]
    for item in items:
        g.es(item)
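For comparison, here is a minimal sketch of the same policy-and-signature step written as a function, using json.dumps instead of string formatting so that quoting is handled automatically. It uses the same HMAC-SHA1 scheme as Solution 1. The function name sign_policy, the expiration date, and the 2 GB limit are illustrative assumptions rather than values from the code above.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(secret_key, policy_dict):
    """Return (policy_b64, signature_b64) as text for an HMAC-SHA1 signed POST policy."""
    policy_json = json.dumps(policy_dict)
    policy_b64 = base64.b64encode(policy_json.encode("utf-8"))
    # The signature is computed over the base64 policy, not over the raw JSON.
    digest = hmac.new(secret_key.encode("utf-8"), policy_b64, hashlib.sha1).digest()
    return policy_b64.decode("ascii"), base64.b64encode(digest).decode("ascii")


# Illustrative policy; bucket, key prefix and ACL mirror the placeholders in Solution 1.
example_policy = {
    "expiration": "2030-01-01T00:00:00Z",
    "conditions": [
        {"bucket": "media.mysite.com"},
        ["starts-with", "$key", ""],
        {"acl": "public-read"},
        ["content-length-range", 0, 2 * 1024 ** 3],  # allow uploads up to 2 GB
    ],
}
policy_b64, signature_b64 = sign_policy("MySecretKey", example_policy)
```

Decoding policy_b64 with base64.b64decode and comparing it against the form fields is a quick way to spot a field that is in the form but missing from the policy.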
Solution 2:
from boto.s3 import connection
###### EDIT ONLY THE FOLLOWING ITEMS ######
DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
SUCCESS_NAME = "success.html"
EXPIRES = 60*60*24*365 # seconds = 1 year
BUCKET = "media.mysite.com"
KEY = "${filename}" # will match file entered by user
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = "" # seems to work this way
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or https for better security
PAGE_TITLE = "My Html Upload to S3 Form"
###### DON'T EDIT FROM HERE ON DOWN ######
conn = connection.S3Connection(AWS_ACCESS_KEY,AWS_SECRET_KEY)
args = conn.build_post_form_args(
BUCKET,
KEY,
expires_in=EXPIRES,
acl=ACL,
success_action_redirect=SUCCESS,
max_content_length=CONTENT_LENGTH,
http_method=HTTP_OR_HTTPS,
fields=None,
conditions=None,
storage_class='STANDARD',
server_side_encryption=None,
)
form_fields = ""
line = ' <input type="hidden" name="%s" value="%s" >\n'
for item in args['fields']:
    new_line = line % (item["name"], item["value"])
    form_fields += new_line
html_page_data = {
"page_title": PAGE_TITLE,
"action": args["action"],
"input_fields": form_fields,
}
html_page = """
<html>
<head>
<title>%(page_title)s</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<form action="%(action)s" method="post" enctype="multipart/form-data" >
%(input_fields)s
<!-- Include any additional input fields here -->
Browse to locate the file to upload:<br /> <br />
<input name="file" type="file"><br /> <br />
<input type="submit" value="Upload File to S3">
</form>
</body>
</html>
""" % html_page_data
with open(HTML_NAME, "w") as f:
    f.write(html_page)
success_page = """
<html>
<head>
<title>S3 POST Success Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script src="jquery.js"></script>
<script src="purl.js"></script>
<!--
Amazon S3 passes three data items in the url of this page if
the upload was successful:
bucket = bucket name
key = file name upload to the bucket
etag = hash of file
The following script parses these values and puts them in
the page to be displayed.
-->
<script type="text/javascript">
var pname,url,val,params=["bucket","key","etag"];
$(document).ready(function()
{
url = $.url();
for (param in params)
{
pname = params[param];
val = url.param(pname);
if(typeof val != 'undefined')
document.getElementById(pname).value = val;
}
});
</script>
</head>
<body>
<div style="margin:0 auto;text-align:center;">
<p>Congratulations!</p>
<p>You have successfully uploaded the file.</p>
<form action="#" method="get"
>Location:
<br />
<input type="text" name="bucket" id="bucket" />
<br />File Name:
<br />
<input type="text" name="key" id="key" />
<br />Hash:
<br />
<input type="text" name="etag" id="etag" />
</form>
</div>
</body>
</html>
"""
with open(SUCCESS_NAME, "w") as f:
    f.write(success_page)
###### Dump output if testing ######
if DEBUG:
    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data): print(data)
        g = G()
    g.es("conn = %s" % conn)
    for key in args.keys():
        if key != "fields": # compare by value; "is not" tests object identity
            g.es("%s: %s" % (key, args[key]))
            continue
        for item in args['fields']:
            g.es(item)
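The success page above reads the bucket, key, and etag values from the URL in JavaScript. If the success URL points at your own server instead, the same three values could be read in Python. This is only a sketch: the function name parse_s3_redirect and the sample URL are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def parse_s3_redirect(url):
    """Extract bucket, key and etag from the query string of a redirect URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each name to a list of values; take the first one if present.
    return {name: query[name][0] for name in ("bucket", "key", "etag") if name in query}


# Hypothetical redirect URL, for illustration only.
result = parse_s3_redirect(
    "http://media.mysite.com/success.html"
    "?bucket=media.mysite.com&key=photo.jpg&etag=%22abc123%22"
)
```

Parameters missing from the URL are simply left out of the returned dictionary, which mirrors the undefined check in the JavaScript version.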