How can I convert xlsb to xlsx with an Azure Function?
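I wrote a function that converts an xlsb file to xlsx, and it works when I test it locally. When I run it on Azure, however, it fails with Result: Failure Exception: OSError: [Errno 30] Read-only file system: 'TEST.xlsx'.
After some research I found that the Python environment for Azure Functions is Linux-based, so files can only be saved to the temporary directory. I changed the function to save the file there, but then got a new error: Result: Failure Exception: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/TEST.xlsb'.
Any suggestions on how to achieve this: a blob-triggered Azure Function that converts an xlsb file in a blob container to xlsx and saves the result back to the blob container? Below are my first attempt and the later revision I made after finding out about the temporary directory: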
First attempt:

import os
import logging
import pandas as pd
#from io import BytesIO
import azure.functions as func
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient

app = func.FunctionApp()

@app.blob_trigger(arg_name="myblob", path="{containerName}/{name}.xlsb",
                  connection="BlobStorageConnectionString")
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                 f"Name: {myblob.name}"
                 f"Blob Size: {myblob.length} bytes")
    accountName = "name"
    accountKey = "key"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"
    containerName = "{containerName}"
    # myblob.name is "<container>/<blob>"; strip the container prefix
    inputBlobname = myblob.name.replace(f"{containerName}/", "")
    outputBlobname = inputBlobname.replace(".xlsb", ".xlsx")
    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)
    blob_client = container_client.get_blob_client(inputBlobname)
    blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=containerName, blob_name=outputBlobname)
    df = pd.read_excel(blob_client.download_blob().readall(), engine="pyxlsb")
    # Relative path: this write is what fails on Azure with
    # OSError: [Errno 30] Read-only file system
    df.to_excel(outputBlobname, index=False)
    with open(outputBlobname, "rb") as data:
        blob.upload_blob(data, overwrite=True)

Second attempt (writing to /tmp):

import os
import logging
import pandas as pd
#from io import BytesIO
import azure.functions as func
from azure.storage.blob import BlobServiceClient, ContainerClient, BlobClient

app = func.FunctionApp()

@app.blob_trigger(arg_name="myblob", path="{containerName}/{name}.xlsb",
                  connection="BlobStorageConnectionString")
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                 f"Name: {myblob.name}"
                 f"Blob Size: {myblob.length} bytes")
    accountName = "name"
    accountKey = "key"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"
    containerName = "{containerName}"
    # myblob.name is "<container>/<blob>"; strip the container prefix
    inputBlobname = myblob.name.replace(f"{containerName}/", "")
    localBlobname = "/tmp/" + inputBlobname  # still ends in .xlsb
    outputBlobname = inputBlobname.replace(".xlsb", ".xlsx")
    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)
    blob_client = container_client.get_blob_client(inputBlobname)
    blob = BlobClient.from_connection_string(conn_str=connectionString, container_name=containerName, blob_name=outputBlobname)
    df = pd.read_excel(blob_client.download_blob().readall(), engine="pyxlsb")
    # Writes /tmp/<name>.xlsx
    df.to_excel("/tmp/" + outputBlobname, index=False)
    ROOT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
    # localBlobname is an absolute path, so os.path.join ignores ROOT_DIR and
    # this opens /tmp/<name>.xlsb, which was never written (FileNotFoundError)
    with open(file=os.path.join(ROOT_DIR, localBlobname), mode="rb") as data:
        blob.upload_blob(data, overwrite=True)
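
The second traceback is consistent with the code writing one path and opening another: the workbook is written to /tmp/<name>.xlsx, while open() is given /tmp/<name>.xlsb, and os.path.join drops ROOT_DIR because the second argument is absolute. The sketch below reproduces both effects without Azure. The root directory and file names are placeholders chosen for illustration (TEST.xlsb comes from the error message), not values from the real deployment.

```python
import os
import tempfile

# Hypothetical stand-ins for the values used in the second attempt.
root_dir = "/home/site/wwwroot"      # assumed project root, illustration only
local_blob_name = "/tmp/TEST.xlsb"   # path the attempt passes to open()
written_path = "/tmp/TEST.xlsx"      # path df.to_excel() actually wrote

# os.path.join discards earlier components when a later one is absolute,
# so root_dir has no effect on the result.
opened_path = os.path.join(root_dir, local_blob_name)

# Building one path and reusing it for both the write and the read
# avoids the mismatch. tempfile.gettempdir() returns the temp directory.
tmp_xlsx = os.path.join(tempfile.gettempdir(), "TEST.xlsx")
with open(tmp_xlsx, "wb") as f:
    f.write(b"placeholder bytes")    # stands in for df.to_excel(tmp_xlsx)
with open(tmp_xlsx, "rb") as f:
    data = f.read()                  # stands in for blob.upload_blob(f)
os.remove(tmp_xlsx)
```

If no file on disk is required at all, an in-memory buffer sidesteps the question of which directory is writable.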
1 Answer
* Executing task: .venv\Scripts\activate && func host start
Found Python version 3.10.11 (py).
Azure Functions Core Tools
Core Tools Version: 4.0.5030 Commit hash: N/A (64-bit)
Function Runtime Version: 4.15.2.20177
[2024-03-17T04:00:10.684Z] Host lock lease acquired by instance ID '000000xxxxxxxxxxxx'.
[2024-03-17T04:00:22.921Z] Worker process started and initialized.
Functions:
blob_trigger: blobTrigger
For detailed output, run func with --verbose flag.
[2024-03-17T04:00:43.865Z] Executing 'Functions.blob_trigger' (Reason='New blob detected(LogsAndContainerScan): kamcontainer/kamb.xlsb', Id=4c9d45e5xxxxxxxxxxxxxxxx)
[2024-03-17T04:00:43.870Z] Trigger Details: MessageId: a1416e18xxxxxxxxxxxxxx, DequeueCount: 1, InsertedOn: 2024-03-17T04:00:43.000+00:00, BlobCreated: 2024-03-17T04:00:39.000+00:00, BlobLastModified: 2024-03-17T04:00:39.000+00:00
[2024-03-17T04:00:44.005Z] Python blob trigger function processed blobBlob Size: None bytes
[2024-03-17T04:00:47.081Z] Request URL: 'https://kamblobstr.blob.core.windows.net/kamcontainer/kamb.xlsx'
Request method: 'PUT'
Request headers:
'Content-Length': '4976'
'x-ms-blob-type': 'REDACTED'
'x-ms-version': 'REDACTED'
'Content-Type': 'application/octet-stream'
'Accept': 'application/xml'
'User-Agent': 'azsdk-python-storage-blob/12.19.1 Python/3.10.11 (Windows-10-10.0.22631-SP0)'
'x-ms-date': 'REDACTED'
'x-ms-client-request-id': 'ef51c49exxxxxxxxxxxxxxx'
'Authorization': 'REDACTED'
A body is sent with the request
[2024-03-17T04:00:49.113Z] Response status: 201
Response headers:
'Content-Length': '0'
'Content-MD5': 'REDACTED'
'Last-Modified': 'Sun, 17 Mar 2024 04:00:49 GMT'
'ETag': '"0x8DC4636D53FDEE9"'
'Server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0'
'x-ms-request-id': '94236a36xxxxxxxxxxxxxxxxx'
'x-ms-client-request-id': 'ef51c49exxxxxxxxxxxxxxxxxx'
'x-ms-version': 'REDACTED'
'x-ms-content-crc64': 'REDACTED'
'x-ms-request-server-encrypted': 'REDACTED'
'Date': 'Sun, 17 Mar 2024 04:00:49 GMT'
[2024-03-17T04:00:49.167Z] Executed 'Functions.blob_trigger' (Succeeded, Id=4c9d45e5xxxxxxxxxxxxxxx, Duration=6129ms)
I tried the following code to convert an .xlsb file to .xlsx and store the result in an Azure storage container from an Azure Function app.
Code:
import logging
import pandas as pd
import azure.functions as func
from azure.storage.blob import BlobServiceClient
from io import BytesIO

app = func.FunctionApp()

@app.blob_trigger(arg_name="myblob", path="<container_name>/<file_name>.xlsb",
                  connection="kamblobstr_STORAGE")
def blob_trigger(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob"
                 f"Blob Size: {myblob.length} bytes")
    accountName = "<storage_name>"
    accountKey = "<storage_key>"
    connectionString = f"DefaultEndpointsProtocol=https;AccountName={accountName};AccountKey={accountKey};EndpointSuffix=core.windows.net"
    containerName = "<container_name>"
    outputBlobname = "<file_name>.xlsx"
    blob_service_client = BlobServiceClient.from_connection_string(connectionString)
    container_client = blob_service_client.get_container_client(containerName)
    # Read the triggering blob straight from the input stream (no download, no temp file)
    input_data = myblob.read()
    df = pd.read_excel(BytesIO(input_data), engine="pyxlsb")
    # Write the workbook to an in-memory buffer instead of the file system
    output_data = BytesIO()
    df.to_excel(output_data, index=False)
    output_data.seek(0)
    # Upload the converted workbook back to the same container
    blob_client = container_client.get_blob_client(outputBlobname)
    blob_client.upload_blob(output_data.getvalue(), overwrite=True)
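
The code above hard-codes the container and output file name. As a minimal sketch, assuming `myblob.name` has the form `<container>/<blob>` (as in the `kamcontainer/kamb.xlsb` entry in the log), the output name could instead be derived from the triggering blob. The helper name `output_blob_for` is hypothetical and not part of any Azure SDK.

```python
import posixpath


def output_blob_for(blob_path: str) -> tuple[str, str]:
    """Split '<container>/<name>.xlsb' into (container, '<name>.xlsx').

    Hypothetical helper: assumes the first path segment is the container.
    """
    container, _, blob_name = blob_path.partition("/")
    stem, ext = posixpath.splitext(blob_name)
    if ext.lower() != ".xlsb":
        raise ValueError(f"expected an .xlsb blob, got {blob_path!r}")
    return container, stem + ".xlsx"


# Example with the blob name that appears in the log output.
container, out_name = output_blob_for("kamcontainer/kamb.xlsb")
```

Inside the function, `containerName` and `outputBlobname` could then be set from `output_blob_for(myblob.name)`, so that one trigger path with a `{name}` binding expression can serve any .xlsb file in the container.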
local.settings.json:
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage_connec_string>",
    "FUNCTIONS_WORKER_RUNTIME": "python",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    "kamblobstr_STORAGE": "<storage_connec_string>"
  }
}
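
Values in local.settings.json (and application settings on a deployed Function app) are exposed to the function as environment variables. As a hedged sketch, the connection string could therefore be read from the `kamblobstr_STORAGE` setting rather than rebuilt from an account name and key in code. The helper name `storage_connection_string` is hypothetical.

```python
import os


def storage_connection_string(setting_name: str = "kamblobstr_STORAGE") -> str:
    """Return the storage connection string from the named app setting.

    Hypothetical helper: the default setting name matches the one used in
    local.settings.json above.
    """
    value = os.environ.get(setting_name)
    if not value:
        raise RuntimeError(f"app setting {setting_name!r} is missing or empty")
    return value
```

The result could be passed to `BlobServiceClient.from_connection_string(...)`, which keeps the account key out of the source file.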
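Output:
With the blob trigger function running, I uploaded kamb.xlsb to the Azure blob storage container, as shown below:
I received the output message "blob kamb.xlsb converted to kamb.xlsx", as shown below:
After that, I successfully deployed the project to the Azure Function app, as shown in the screenshot below:
I uploaded kamb.xlsb to the Azure blob storage container, and the function ran successfully in the Function app in the Azure portal, as shown below:
In the storage container, kamb.xlsb was successfully converted to kamb.xlsx, as shown below.
Data in kamb.xlsx:
Function app monitoring logs: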