Compatibility issue between ibm_boto3 and scikit-learn on Mac

Published 2024-05-14 23:53:52


I have a Python 3.6 application that uses scikit-learn, deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is macOS High Sierra.

Recently, I added IBM Cloud Object Storage functionality (ibm_boto3) to the application. COS itself works fine: I can upload, download, list, and delete objects with the ibm_boto3 library.

The strange thing is that the parts of the application that use scikit-learn now freeze.

If I comment out the ibm_boto3 import statements (and the corresponding code), the scikit-learn code works fine.

Even more puzzling, the problem only occurs on my local development machine running OSX.

Our only hypothesis so far is that the ibm_boto3 library is surfacing a known issue in scikit-learn (see this: the parallel version of the K-means algorithm is broken when numpy uses Accelerate on OSX). Note that we only started seeing the problem after adding ibm_boto3 to the project.
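A minimal sketch of how one might test that hypothesis locally (the n_jobs parameter assumes an older scikit-learn release that still accepts it; the dummy data is purely illustrative):

import numpy as np  # used to generate dummy data and inspect the BLAS backend
from sklearn.cluster import KMeans

np.__config__.show()  # shows whether numpy is linked against Accelerate

X = np.random.rand(1000, 10)  # dummy data, just to exercise the solver

# On setups hit by the Accelerate/multiprocessing bug, the single-process
# run completes while the parallel run hangs.
KMeans(n_clusters=4, n_jobs=1).fit(X)
KMeans(n_clusters=4, n_jobs=2).fit(X)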

However, we need to be able to test on localhost before deploying to IBM Cloud. Is there a known compatibility issue between ibm_boto3 and scikit-learn on Mac OS?

Any suggestions on how to avoid this on the dev machine?

Cheers.


1 Answer
User
#1 · Posted 2024-05-14 23:53:52

So far there aren't any known compatibility issues. :)

At one point there were some issues with the vanilla SSL libraries that ship with OSX, but if you're able to read and write data, that shouldn't be the problem here.
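If you want to rule that out anyway, it's easy to check which SSL library your local Python is linked against (just a diagnostic sketch):

import ssl  # standard library; reports the linked SSL implementation
print(ssl.OPENSSL_VERSION)  # e.g. a LibreSSL version on stock macOS builds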

Are you using HMAC credentials? If so, I wonder whether the behavior persists if you use the original boto3 library instead of the IBM fork.

Here's a simple example showing how to use pandas with the original boto3:

import boto3  # package used to connect to IBM COS using the S3 API
import io  # python package used to stream data
import pandas as pd  # lightweight data analysis package

access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo'  # the bucket holding the objects being worked on.
object_key = 'demo-data'  # the name of the data object being analyzed.
result_key = 'demo-data-results'  # the name of the output data object.


# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))

# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. This returns a
# dictionary containing the response metadata and a streaming body.
obj = cos.get_object(Bucket=bucket, Key=object_key)

# Now, because this is all REST API based, the actual contents of the file are
# transported in the response body, so we need to identify where to find the
# data stream containing the actual CSV file we want to analyze.
data = obj['Body'].read()

# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))

# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()

# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)
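
For comparison, swapping back to the IBM fork should be a near drop-in change. Here's a sketch assuming HMAC credentials and that ibm_boto3 mirrors boto3's module layout (it's a fork, so the client API is the same):

import ibm_boto3  # IBM's fork of boto3

# Same endpoint and HMAC credentials as above; only the package name changes.
cos_ibm = ibm_boto3.client('s3', endpoint_url=pub_endpoint,
                           aws_access_key_id=access_key,
                           aws_secret_access_key=secret_key,
                           region_name='us',
                           config=ibm_boto3.session.Config(signature_version='s3v4'))

# If the scikit-learn code freezes with this client in the project but not
# with the plain boto3 client above, that points at the fork rather than
# your S3 API usage.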
