Uploading directly to S3 with Python/Boto/Django and constructing the policy

5 votes
4 answers
5576 views
Asked 2025-04-16 23:59

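I have been through many iterations of this problem, looked at many different examples, and read the relevant documentation closely.

I am trying to combine Plupload (http://www.plupload.com/) with the AWS S3 direct POST upload method (http://aws.amazon.com/articles/1434). However, I believe something is wrong with the way I construct my policy and signature. When I submit the form, I get no response from the server; instead, my connection is reset.

I have tried the Python code from the example: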

import base64
import hmac, sha

policy = base64.b64encode(policy_document)

signature = base64.b64encode(
hmac.new(aws_secret_key, policy, sha).digest())

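I have also tried the newer hashlib library in Python. Whatever method I use to construct my policy and signature, the values I get always differ from those generated here:

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

I have also read through this question:

How do I make Plupload upload directly to Amazon S3?

But I found the examples provided there too complicated, and I was not able to implement them accurately.

My most recent attempts have used parts of the boto library:

http://boto.cloudhackers.com/ref/s3.html#module-boto.s3.connection

But using the S3Connection.build_post_form_args method has not worked for me either.

If anyone could provide a proper example of how to create the upload form using Python, I would greatly appreciate it. Even some simple insight into why the connection is always reset would be nice.

Some caveats:

I would like to use hashlib if possible.
I would like to get an XML response from Amazon (presumably success_action_status = '201' does this).
I need to be able to upload large files, with a maximum size of about 2GB.

One final note: when I run this in Chrome, it shows upload progress, and the upload usually fails at around 37%.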

4 Answers

1

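I have been struggling with this same problem for the past few days, using nearly identical code. (See "Python Generated Signature for S3 POST".) I just tried encoding my policy the way White Box Dev's code does, but I still did not get the result AWS says I should. In the end I gave up and used...

http://s3.amazonaws.com/doc/s3-example-code/post/post_sample.html

...and simply put the values it returns into the HTML form. It works great.

@Mr. Oodles: if you store your aws_secret_key in a separate file, use the bash command ls -al to check its size in bytes, and make sure it is 40 bytes long before you generate the signature. As White Box Dev mentioned, AWS does not like the \n character, and you may have inadvertently saved that hidden character (or a carriage return, ^M) along with the aws_secret_key string, which makes it 41 bytes. After reading the key into your script, you can try .replace("\n", "") or .rstrip() to get rid of it; .encode("utf-8") may also work for you. None of those worked for me, though. I am curious whether you are running Python on Windows or Unix... You could also try saving the string with emacs, so that no \n is inserted automatically.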
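
The key-file check described above can be sketched in Python as follows. This is only an illustration: load_secret_key is a hypothetical helper name, and the 40-byte length is the figure mentioned in this answer, not something the code verifies against AWS.

```python
def load_secret_key(path):
    """Read a secret key from a file and drop stray whitespace around it.

    Hypothetical helper: the name and the 40-byte check are assumptions
    taken from the discussion above, not part of any AWS library.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # bytes.strip() removes leading/trailing \n, \r, spaces and tabs,
    # which covers both Unix (\n) and Windows (\r\n) line endings.
    key = raw.strip()
    if len(key) != 40:
        raise ValueError("expected a 40-byte secret key, got %d bytes" % len(key))
    return key
```

The helper returns bytes, which is what hmac.new expects for its key argument on Python 3.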

3

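I tried using Boto, but found that it did not let me add all of the headers I wanted. Below you can see how I generate the policy, the signature, and a dictionary of values for the POST form.

Note that all of the x-amz-meta-* tags are custom header properties, and you do not necessarily need them. Also note that pretty much everything that goes in the form needs to be included in the policy that gets encoded and signed.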

def generate_post_form(bucket_name, key, post_key, file_id, file_name, content_type):
  import base64
  import hmac
  from datetime import datetime
  from hashlib import sha1
  from django.conf import settings
  policy = """{"expiration": "%(expires)s","conditions": [{"bucket":"%(bucket)s"},["eq","$key","%(key)s"],{"acl":"private"},{"x-amz-meta-content_type":"%(content_type)s"},{"x-amz-meta-file_name":"%(file_name)s"},{"x-amz-meta-post_key":"%(post_key)s"},{"x-amz-meta-file_id":"%(file_id)s"},{"success_action_status":"200"}]}"""
  policy = policy%{
    "expires":(datetime.utcnow()+settings.TIMEOUT).strftime("%Y-%m-%dT%H:%M:%SZ"), # This has to be formatted this way; settings.TIMEOUT is assumed to be a timedelta
    "bucket": bucket_name, # the name of your bucket
    "key": key, # this is the S3 key where the posted file will be stored
    "post_key": post_key, # custom properties begin here
    "file_id":file_id,
    "file_name": file_name,
    "content_type": content_type,
  }
  # Base64 encode a UTF-8 version of our policy. b64encode emits no newlines, which Amazon doesn't like.
  encoded = base64.b64encode(policy.encode("utf-8"))
  # Generate the policy signature (HMAC-SHA1) using our Amazon Secret Key, then base64 encode the digest.
  signature = base64.b64encode(hmac.new(settings.AWS_SECRET_KEY.encode("utf-8"), encoded, sha1).digest())
  return ("%s://%s.s3.amazonaws.com/"%(settings.HTTP_CONNECTION_TYPE, bucket_name),
          {"policy":encoded.decode("ascii"),
           "signature":signature.decode("ascii"),
           "key": key,
           "AWSAccessKeyId": settings.AWS_ACCESS_KEY, # Obviously the Amazon Access Key
           "acl":"private",
           "x-amz-meta-post_key":post_key,
           "x-amz-meta-file_id":file_id,
           "x-amz-meta-file_name": file_name,
           "x-amz-meta-content_type": content_type,
           "success_action_status":"200",
          })

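The returned tuple can then be used to generate a form that posts to the generated S3 URL, with all of the key/value pairs from the dictionary as hidden fields plus the actual file input field, whose name/id should be "file".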
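
As a rough sketch of that last step, the (url, fields) tuple could be turned into HTML like this. render_upload_form is a hypothetical helper, not part of Django or boto, and the markup is deliberately minimal.

```python
from html import escape


def render_upload_form(action_url, fields):
    """Build an HTML form that POSTs a file plus hidden fields to action_url.

    Hypothetical helper for illustration only.
    """
    hidden = "\n".join(
        '  <input type="hidden" name="%s" value="%s">'
        % (escape(str(name), quote=True), escape(str(value), quote=True))
        for name, value in fields.items()
    )
    # The file input goes last: S3 ignores form fields that come after it.
    return (
        '<form action="%s" method="post" enctype="multipart/form-data">\n'
        "%s\n"
        '  <input type="file" name="file" id="file">\n'
        '  <input type="submit" value="Upload">\n'
        "</form>"
    ) % (escape(action_url, quote=True), hidden)
```

Escaping the values matters here because file names and other metadata supplied by users may contain quotes.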

Hope this example helps.

5

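Nathan's answer got me on the right track. I have included two solutions that currently work for me.

The first solution is written in plain Python. The second uses the boto library.

I first tried boto, but kept getting errors. So I went back to Amazon's Ruby documentation and found that I could get S3 to accept files using Python without boto. (Uploading files to S3 via HTML POST)

Once I understood what was going on, I fixed my errors and was eventually able to use boto as well, which is simpler.

I am including the first solution because it shows explicitly how to set up the policy document and the signature in Python.

My goal was to create a dynamic HTML upload page, along with the "success" page the user sees after a successful upload. Solution 1 shows how to generate the upload form page dynamically, while Solution 2 shows how to create both the upload form page and the success page.

Solution 1: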

import base64
import hmac, hashlib

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
EXPIRE_DATE = "2015-01-01T00:00:00Z" # Jan 1, 2015 gmt
FILE_TO_UPLOAD = "${filename}"
BUCKET = "media.mysite.com"
KEY = ""
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = ""
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or "https" for better security
PAGE_TITLE = "My Html Upload to S3 Form"
ACTION = "%s://%s.s3.amazonaws.com/" % (HTTP_OR_HTTPS, BUCKET)

###### DON'T EDIT FROM HERE ON DOWN ######

policy_document_data = {
"expire": EXPIRE_DATE,
"bucket_name": BUCKET,
"key_name": KEY,
"acl_name": ACL,
"success_redirect": SUCCESS,
"content_name": CONTENT_TYPE,
"content_length": CONTENT_LENGTH,
}

policy_document = """
{"expiration": "%(expire)s",
  "conditions": [ 
    {"bucket": "%(bucket_name)s"}, 
    ["starts-with", "$key", "%(key_name)s"],
    {"acl": "%(acl_name)s"},
    {"success_action_redirect": "%(success_redirect)s"},
    ["starts-with", "$Content-Type", "%(content_name)s"],
    ["content-length-range", 0, %(content_length)d]
  ]
}
""" % policy_document_data

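# Base64-encode the policy, then sign the encoded bytes with HMAC-SHA1.
policy_bytes = base64.b64encode(policy_document.encode("utf-8"))
policy = policy_bytes.decode("ascii")
signature = base64.b64encode(
    hmac.new(AWS_SECRET_KEY.encode("utf-8"), policy_bytes, hashlib.sha1).digest()
).decode("ascii")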

html_page_data = {
"page_title": PAGE_TITLE,
"action_name": ACTION,
"filename": FILE_TO_UPLOAD,
"access_name": AWS_ACCESS_KEY,
"acl_name": ACL,
"redirect_name": SUCCESS,
"policy_name": policy,
"sig_name": signature,
"content_name": CONTENT_TYPE,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action_name)s" method="post" enctype="multipart/form-data">
  <input type="hidden" name="key" value="%(filename)s">
  <input type="hidden" name="AWSAccessKeyId" value="%(access_name)s">
  <input type="hidden" name="acl" value="%(acl_name)s">
  <input type="hidden" name="success_action_redirect" value="%(redirect_name)s">
  <input type="hidden" name="policy" value="%(policy_name)s">
  <input type="hidden" name="signature" value="%(sig_name)s">
  <input type="hidden" name="Content-Type" value="%(content_name)s">

  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br /> <br />

  <input name="file" type="file"><br /> <br />
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "w") as f:  # text mode: html_page is a str
    f.write(html_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    items = [
    "",
    "",
    "policy_document: %s" % policy_document,
    "policy: %s" % policy,
    "signature: %s" % signature,
    "",
    "",
    ]
    for item in items:
        g.es(item)
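
As a variation on Solution 1, the policy could be built with json.dumps instead of string formatting, which avoids hand-written JSON. The sketch below is an assumption-laden illustration: sign_policy is a hypothetical name, the bucket, key, and date are placeholders, and it uses the same HMAC-SHA1 scheme as the rest of this thread rather than the newer Signature Version 4. It also shows the success_action_status of "201" and the roughly 2GB size limit asked about in the question.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(secret_key, expiration, conditions):
    """Return (policy, signature) as ASCII strings for an S3 POST form.

    Hypothetical helper; uses HMAC-SHA1 as in the examples above.
    """
    document = json.dumps({"expiration": expiration, "conditions": conditions})
    # b64encode never inserts newlines, so no stripping is needed.
    policy = base64.b64encode(document.encode("utf-8"))
    digest = hmac.new(secret_key.encode("utf-8"), policy, hashlib.sha1).digest()
    return policy.decode("ascii"), base64.b64encode(digest).decode("ascii")


# Placeholder values for illustration only.
conditions = [
    {"bucket": "media.mysite.com"},
    ["starts-with", "$key", ""],
    {"acl": "private"},
    {"success_action_status": "201"},          # ask S3 for an XML response
    ["content-length-range", 0, 2 * 1024**3],  # allow uploads up to 2 GiB
]
policy, signature = sign_policy("MySecretKey", "2030-01-01T00:00:00Z", conditions)
```

Every condition here still needs a matching form field (for example a hidden success_action_status input with the value 201), otherwise S3 rejects the POST.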

Solution 2:

from boto.s3 import connection

###### EDIT ONLY THE FOLLOWING ITEMS ######

DEBUG = 1
AWS_SECRET_KEY = "MySecretKey"
AWS_ACCESS_KEY = "MyAccessKey"
HTML_NAME = "S3PostForm.html"
SUCCESS_NAME = "success.html"
EXPIRES = 60*60*24*365 # seconds = 1 year
BUCKET = "media.mysite.com"
KEY = "${filename}" # will match file entered by user
ACL = "public-read" # or "private"
SUCCESS = "http://media.mysite.com/success.html"
CONTENT_TYPE = "" # seems to work this way
CONTENT_LENGTH = 1024**3 # One gigabyte
HTTP_OR_HTTPS = "http" # Or https for better security
PAGE_TITLE = "My Html Upload to S3 Form"

###### DON'T EDIT FROM HERE ON DOWN ######

conn = connection.S3Connection(AWS_ACCESS_KEY,AWS_SECRET_KEY)
args = conn.build_post_form_args(
    BUCKET,
    KEY,
    expires_in=EXPIRES,
    acl=ACL,
    success_action_redirect=SUCCESS,
    max_content_length=CONTENT_LENGTH,
    http_method=HTTP_OR_HTTPS,
    fields=None,
    conditions=None,
    storage_class='STANDARD',
    server_side_encryption=None,
    )

form_fields = ""
line = '  <input type="hidden" name="%s" value="%s" >\n'
for item in args['fields']:
    new_line = line % (item["name"], item["value"])
    form_fields += new_line

html_page_data = {
"page_title": PAGE_TITLE,
"action": args["action"],
"input_fields": form_fields,
}

html_page = """
<html> 
 <head>
  <title>%(page_title)s</title> 
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
 </head>
<body>
 <form action="%(action)s" method="post" enctype="multipart/form-data" >
%(input_fields)s
  <!-- Include any additional input fields here -->

  Browse to locate the file to upload:<br /> <br />

  <input name="file" type="file"><br /> <br />
  <input type="submit" value="Upload File to S3"> 
 </form> 
</body>
</html>
""" % html_page_data

with open(HTML_NAME, "w") as f:  # text mode: html_page is a str
    f.write(html_page)

success_page = """
<html>
  <head>
    <title>S3 POST Success Page</title>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <script src="jquery.js"></script>
      <script src="purl.js"></script>
<!--

    Amazon S3 passes three data items in the url of this page if
        the upload was successful:
        bucket = bucket name
        key = file name upload to the bucket
        etag = hash of file

    The following script parses these values and puts them in
    the page to be displayed.

-->

<script type="text/javascript">
var pname,url,val,params=["bucket","key","etag"];
$(document).ready(function()
{
  url = $.url();
  for (param in params)
  {
    pname = params[param];
    val = url.param(pname);
    if(typeof val != 'undefined')
      document.getElementById(pname).value = val;
  }
});
</script>

  </head>
  <body>
      <div style="margin:0 auto;text-align:center;">
      <p>Congratulations!</p>
      <p>You have successfully uploaded the file.</p>
        <form action="#" method="get"
          >Location:
        <br />
          <input type="text" name="bucket" id="bucket" />
        <br />File Name:
        <br />
          <input type="text" name="key" id="key" />
        <br />Hash:
        <br />
          <input type="text" name="etag" id="etag" />
      </form>
    </div>
  </body>
</html>
"""

with open(SUCCESS_NAME, "w") as f:  # text mode: success_page is a str
    f.write(success_page)

###### Dump output if testing ######
if DEBUG:

    if 1: # Set true if not using the LEO editor
        class G:
            def es(self, data):print(data)
        g = G()

    g.es("conn = %s" % conn)
    for key in args.keys():
        if key != "fields":  # compare by value; "is not" tests identity
            g.es("%s: %s" % (key, args[key]))
            continue
        for item in args['fields']:
            g.es(item)
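
If the success page were handled on the server instead of in JavaScript, the three values that S3 appends to the redirect URL (bucket, key, etag, as described in the comment inside success_page) could be read with the standard library. A minimal sketch, assuming Python 3; parse_success_redirect is a hypothetical name and the URL in the usage note is made up.

```python
from urllib.parse import parse_qs, urlparse


def parse_success_redirect(url):
    """Extract bucket, key and etag from an S3 success redirect URL.

    Hypothetical helper; parameters missing from the URL are left out.
    """
    query = parse_qs(urlparse(url).query)
    return {
        name: query[name][0]
        for name in ("bucket", "key", "etag")
        if name in query
    }
```

For example, calling it with "http://media.mysite.com/success.html?bucket=media.mysite.com&key=photo.jpg" would give a dictionary holding just the bucket and key entries.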
