Transferring data from an S3 bucket to a GCP bucket using temporary credentials

Posted 2024-04-18 09:33:55


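I want to download a public dataset from the NIMH Data Archive. After creating an account on their website and accepting their data usage agreement, I can download a CSV file that contains the paths to all the files in the dataset I am interested in. Each path has the form s3://NDAR_Central_1/....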

1. Downloading to my personal computer

In the NDA GitHub repository, the nda-tools Python library exposes some useful Python code for downloading these files to my own computer. Suppose I want to download the following file:

s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz

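Given my username (USRNAME) and password (PASSWD), the ones I used to create my account on the NIMH Data Archive, the following code lets me download this file to TARGET_PATH on my personal computer: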

from NDATools.clientscripts.downloadcmd import configure
from NDATools.Download import Download

config = configure(username=USRNAME, password=PASSWD)
s3Download = Download(TARGET_PATH, config)

target_fnames = ['s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz']

s3Download.get_links('paths', target_fnames, filters=None)
s3Download.get_tokens()
s3Download.start_workers(False, None, 1)

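Under the hood, the get_tokens() method of s3Download uses USRNAME and PASSWD to generate a temporary access key, secret key, and security token. The start_workers() method then uses the boto3 and s3transfer Python libraries to download the selected file.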
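As a rough illustration of that last step, here is a minimal sketch of downloading one object with boto3 and a set of temporary credentials. It is not the actual nda-tools implementation; the function names split_s3_uri and download_with_temporary_credentials are hypothetical, and the three credential arguments are assumed to come from the token described above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_with_temporary_credentials(
    uri, target_path, access_key, secret_key, session_token
):
    """Download a single S3 object using temporary (session) credentials."""
    # Imported lazily so split_s3_uri can be used without boto3 installed.
    import boto3

    client = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,  # required for temporary credentials
    )
    bucket, key = split_s3_uri(uri)
    client.download_file(bucket, key, target_path)
```

The session token is what distinguishes temporary credentials from a long-lived key pair: a request signed with only the access key and secret key of a temporary credential set is rejected.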

Everything works fine!

2. Downloading to a GCP bucket

Now, suppose I have created a project on GCP and want to download this file directly into a GCP bucket.

Ideally, I would like to do something like:

gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket

To that end, I run the following Python code in Cloud Shell (by launching python3):

from NDATools.TokenGenerator import NDATokenGenerator
data_api_url = 'https://nda.nih.gov/DataManager/dataManager'
generator = NDATokenGenerator(data_api_url)
token = generator.generate_token(USRNAME, PASSWD)

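This gives me the access key, the secret key, and the session token. In what follows:

  • ACCESS_KEY refers to the value of token.access_key
  • SECRET_KEY refers to the value of token.secret_key
  • SECURITY_TOKEN refers to the value of token.session

I then set these credentials as environment variables in Cloud Shell: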

export AWS_ACCESS_KEY_ID=[copy-paste ACCESS_KEY here]
export AWS_SECRET_ACCESS_KEY=[copy-paste SECRET_KEY here]
export AWS_SECURITY_TOKEN=[copy-paste SECURITY_TOKEN here]
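To avoid copy-paste mistakes, the export lines can be generated from the token object itself. The following is only a sketch: the helper name export_lines is hypothetical, the token is assumed to expose the access_key, secret_key, and session attributes listed above, and the values in the example are placeholders rather than real credentials.

```python
import shlex
from types import SimpleNamespace


def export_lines(token):
    """Return shell `export` lines for the three temporary credentials."""
    pairs = [
        ("AWS_ACCESS_KEY_ID", token.access_key),
        ("AWS_SECRET_ACCESS_KEY", token.secret_key),
        ("AWS_SECURITY_TOKEN", token.session),
    ]
    # No spaces around '=', and shlex.quote protects any shell-special characters.
    return [f"export {name}={shlex.quote(value)}" for name, value in pairs]


# Placeholder object standing in for the token returned by generate_token().
example = SimpleNamespace(
    access_key="AKIAEXAMPLE", secret_key="abc/def+ghi", session="tok en=="
)
print("\n".join(export_lines(example)))
```

The printed lines can be pasted into Cloud Shell as they are. Note that the shell does not accept spaces on either side of the equals sign in an assignment.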

Finally, I also set up a .boto configuration file in my home directory. It looks like this:

[Credentials]
aws_access_key_id = $AWS_ACCESS_KEY_ID
aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
aws_session_token = $AWS_SECURITY_TOKEN
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
use-sigv4=True
host=s3.us-east-1.amazonaws.com

When I run the following command:

gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket

I end up with:

AccessDeniedException: 403 AccessDenied

The full traceback is as follows:

Non-MD5 etag ("a21a0b2eba27a0a32a26a6b30f3cb060-6") present for key <Key: NDAR_Central_1,submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz>, data integrity checks are not possible.
Copying s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz [Content-Type=application/x-gzip]...
Exception in thread Thread-2:iB]
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 213, in PerformDownload
    decryption_tuple=self.decryption_tuple)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 353, in GetObjectMedia
    decryption_tuple=decryption_tuple)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 590, in GetObjectMedia
    generation=generation)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1723, in _TranslateExceptionAndRaise
    raise translated_exception  # pylint: disable=raising-bad-type
AccessDeniedException: AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>

AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>

I would like to be able to download this file directly from an S3 bucket to my GCP bucket (without having to create a VM, set up Python, and run the code above [which works]). Why do the temporary generated credentials work on my computer but not in GCP Cloud Shell?

The full log of the debug command

gsutil -DD cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket

can be found here.


1 answer
User
Answer 1 · posted 2024-04-18 09:33:55

The process you are trying to carry out is called a "Transfer Job".

To transfer files from an Amazon S3 bucket to a Cloud Storage bucket:

A. Click the Burger Menu on the top left corner

B. Go to Storage > Transfer

C. Click Create Transfer

  1. Under Select source, select Amazon S3 bucket.

  2. In the Amazon S3 bucket text box, specify the source Amazon S3 bucket name. The bucket name is the name as it appears in the AWS Management Console.

  3. In the respective text boxes, enter the Access key ID and Secret key associated with the Amazon S3 bucket.

  4. To specify a subset of files in your source, click Specify file filters beneath the bucket field. You can include or exclude files based on file name prefix and file age.

  5. Under Select destination, choose a sink bucket or create a new one.

    • To choose an existing bucket, enter the name of the bucket (without the prefix gs://), or click Browse and browse to it.
    • To transfer files to a new bucket, click Browse and then click the New bucket icon.
  6. Enable overwrite/delete options if needed.

    By default, your transfer job only overwrites an object when the source version is different from the sink version. No other objects are overwritten or deleted. Enable additional overwrite/delete options under Transfer options.

  7. Under Configure transfer, schedule your transfer job to Run now (one time) or Run daily at the local time you specify.

  8. Click Create.

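Before setting up the transfer job, make sure your account has been assigned the necessary roles and has the required permissions described here.

Also note that Storage Transfer Service is currently available only for certain Amazon S3 regions, as described under the Amazon S3 tab of Setting up a transfer job.

Transfer jobs can also be created programmatically. More information here.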
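As a sketch of what the programmatic route involves, the function below builds a request body in the shape that, to my understanding, the Storage Transfer Service REST method transferJobs.create expects. The function name build_transfer_job and all argument values are hypothetical, and the body would still need to be sent with an authenticated client. Setting the same start and end date is assumed here to produce a one-time run.

```python
import datetime


def build_transfer_job(
    project_id, s3_bucket, gcs_bucket, access_key_id, secret_access_key, start=None
):
    """Build a transfer job body copying an S3 bucket into a GCS bucket."""
    start = start or datetime.date.today()
    day = {"year": start.year, "month": start.month, "day": start.day}
    return {
        "description": f"Copy s3://{s3_bucket} to gs://{gcs_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        # Identical start and end dates: intended as a single run.
        "schedule": {"scheduleStartDate": day, "scheduleEndDate": day},
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                # Only a key ID and a secret key; no session token field.
                "awsAccessKey": {
                    "accessKeyId": access_key_id,
                    "secretAccessKey": secret_access_key,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }
```

The access key structure in this body has room for a key ID and a secret key only, which mirrors the Access key ID and Secret key text boxes in step 3 of the console procedure.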

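Let me know if this was helpful.

EDIT

Neither the Transfer Service nor the gsutil command currently supports "Temporary Security Credentials", even though AWS supports them. A workaround would be to change the source code of the gsutil command.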
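Another possible route, since boto3 does accept the temporary credentials, is to skip gsutil and copy the object through a local temporary file in Python. This is only a sketch under several assumptions: the function names are hypothetical, boto3 and google-cloud-storage are assumed to be installed, the three AWS environment variables from the question are assumed to be set, and the machine running it needs enough free disk space for the file.

```python
import os
import tempfile


def copy_s3_object_to_gcs(s3_client, gcs_bucket, s3_bucket_name, key):
    """Download one S3 object to a temporary file, then upload it to GCS."""
    blob_name = key  # keep the same object name in the destination bucket
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(key))
        s3_client.download_file(s3_bucket_name, key, local_path)
        gcs_bucket.blob(blob_name).upload_from_filename(local_path)
    return blob_name


def main():
    # Imported here so the helper above can be exercised without these packages.
    import boto3
    from google.cloud import storage

    s3_client = boto3.client(
        "s3",
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        aws_session_token=os.environ["AWS_SECURITY_TOKEN"],
    )
    gcs_bucket = storage.Client().bucket("my-bucket")
    copy_s3_object_to_gcs(
        s3_client,
        gcs_bucket,
        "NDAR_Central_1",
        "submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz",
    )
```

Calling main() performs the copy for the file from the question. Passing the two clients into copy_s3_object_to_gcs keeps that function independent of how the credentials were obtained.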

I have also filed a Feature Request on your behalf, and I recommend that you star it to receive updates on its progress.
