I want to download a public dataset from the NIMH Data Archive. After creating an account on their website and accepting their Data Usage Agreement, I can download a CSV file that contains the paths to all the files in the dataset I am interested in. Each path is of the form s3://NDAR_Central_1/...
On the NDA GitHub repository, the nda-tools Python library exposes some useful Python code to download these files to my own computer. Say I want to download the following file:
s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz
Given my username (USRNAME) and password (PASSWD) (the ones I used to create my account on the NIMH Data Archive), the following code lets me download this file to TARGET_PATH on my personal computer:
from NDATools.clientscripts.downloadcmd import configure
from NDATools.Download import Download
config = configure(username=USRNAME, password=PASSWD)
s3Download = Download(TARGET_PATH, config)
target_fnames = ['s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz']
s3Download.get_links('paths', target_fnames, filters=None)
s3Download.get_tokens()
s3Download.start_workers(False, None, 1)
Under the hood, s3Download uses USRNAME and PASSWD to generate a temporary access key, secret key, and security token; the workers then download the file. Everything works fine!
Now, say I have created a project on GCP and would like to download this file directly into a GCP bucket. Ideally, I would like to do something like:
gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
To do so, I execute the following Python code in Cloud Shell (after starting an interpreter with python3):
from NDATools.TokenGenerator import NDATokenGenerator
data_api_url = 'https://nda.nih.gov/DataManager/dataManager'
generator = NDATokenGenerator(data_api_url)
token = generator.generate_token(USRNAME, PASSWD)
This gives me an access key, a secret key, and a session token. In the following, ACCESS_KEY refers to the value of token.access_key, SECRET_KEY refers to the value of token.secret_key, and SECURITY_TOKEN refers to the value of token.session. I then set these credentials as environment variables in Cloud Shell:
export AWS_ACCESS_KEY_ID=[copy-paste ACCESS_KEY here]
export AWS_SECRET_ACCESS_KEY=[copy-paste SECRET_KEY here]
export AWS_SECURITY_TOKEN=[copy-paste SECURITY_TOKEN here]
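To avoid copy-paste mistakes, these export lines can also be generated from the token object itself. The helper below is a minimal sketch (it is not part of nda-tools) and only assumes the access_key, secret_key, and session attributes described above:

```python
from types import SimpleNamespace

def export_lines(token):
    """Render temporary NDA credentials as shell `export` statements.

    Assumes `token` exposes `access_key`, `secret_key`, and `session`,
    as described above for NDATokenGenerator.generate_token.
    """
    return "\n".join([
        f"export AWS_ACCESS_KEY_ID={token.access_key}",
        f"export AWS_SECRET_ACCESS_KEY={token.secret_key}",
        f"export AWS_SECURITY_TOKEN={token.session}",
    ])

# Demo with placeholder values; a real token comes from generate_token().
fake_token = SimpleNamespace(access_key="AKIA...", secret_key="abc...",
                             session="FQoG...")
print(export_lines(fake_token))
```

The printed lines can be pasted into the Cloud Shell session as-is.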
Finally, I also set up a .boto configuration file in my home directory. It looks like this:
[Credentials]
aws_access_key_id = $AWS_ACCESS_KEY_ID
aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
aws_session_token = $AWS_SECURITY_TOKEN
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
use-sigv4=True
host=s3.us-east-1.amazonaws.com
When I run the following command:
gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
I end up with:
AccessDeniedException: 403 AccessDenied
The full traceback is:
Non-MD5 etag ("a21a0b2eba27a0a32a26a6b30f3cb060-6") present for key <Key: NDAR_Central_1,submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz>, data integrity checks are not possible.
Copying s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz [Content-Type=application/x-gzip]...
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/google/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 213, in PerformDownload
decryption_tuple=self.decryption_tuple)
File "/google/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 353, in GetObjectMedia
decryption_tuple=decryption_tuple)
File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 590, in GetObjectMedia
generation=generation)
File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1723, in _TranslateExceptionAndRaise
raise translated_exception # pylint: disable=raising-bad-type
AccessDeniedException: AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>
I would like to be able to directly download this file from an S3 bucket to my GCP bucket (without having to create a VM, set up Python, and run the code above [which works]). Why do the temporary generated credentials work on my computer but not in GCP Cloud Shell?
The full log of the debug command
gsutil -DD cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
can be found here.
The process you are trying to achieve is called a "Transfer Job".
To transfer a file from an Amazon S3 bucket to a Cloud Storage bucket:
Before setting up the transfer job, make sure you have assigned the necessary roles to your account and that it has the required permissions described here.
Also take into account that the Storage Transfer Service is currently available for certain Amazon S3 regions, as described in the Amazon S3 tab of the transfer job setup.
Transfer jobs can also be created programmatically. More information here.
Let me know if this helps.
EDIT
Neither the Transfer Service nor the gsutil command currently supports "temporary security credentials", even though AWS supports them. A workaround is to modify the source code of gsutil. I have also filed a Feature Request on your behalf, which I suggest you star in order to receive updates on it.
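For the programmatic route, the Storage Transfer API's transferJobs.create call takes a JSON body whose awsS3DataSource.awsAccessKey object has only accessKeyId and secretAccessKey fields, with no slot for a session token, which matches the limitation above. The sketch below only builds such a request body; the project id and bucket names are placeholders:

```python
def make_transfer_job_body(project_id, s3_bucket, gcs_bucket,
                           access_key_id, secret_access_key):
    """Build a request body for Storage Transfer API transferJobs.create.

    Note the awsAccessKey object: it accepts an access key id and a
    secret only -- there is no field for a temporary session token.
    """
    return {
        "description": "NDA S3 to GCS transfer",
        "status": "ENABLED",
        "projectId": project_id,
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                "awsAccessKey": {
                    "accessKeyId": access_key_id,
                    "secretAccessKey": secret_access_key,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }

# Placeholder values for illustration only:
body = make_transfer_job_body("my-project", "NDAR_Central_1", "my-bucket",
                              "AKIA...", "SECRET...")
print(sorted(body["transferSpec"]["awsS3DataSource"]["awsAccessKey"]))
```

With the google-api-python-client package, such a body could then be submitted via googleapiclient.discovery.build('storagetransfer', 'v1').transferJobs().create(body=body).execute(), given suitable GCP credentials; but the body can only carry long-lived AWS keys, not the temporary ones NDA issues.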