Unknown document type error when using Azure OpenAI

Asked 2025-04-14 16:31

I'm trying to follow the code in the documentation at https://docs.llamaindex.ai/en/stable/examples/customization/llms/AzureOpenAI.html, but after running index = VectorStoreIndex.from_documents(documents) I get the following error:

raise ValueError(f"Unknown document type: {type(document)}")
ValueError: Unknown document type: <class 'llama_index.legacy.schema.Document'>

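Because these generative AI libraries keep changing, I had to change the SimpleDirectoryReader import to from llama_index.legacy.readers.file.base import SimpleDirectoryReader. Everything else is the same as in the tutorial (llama_index==0.10.18, Python 3.9.16). I've spent hours on this and really don't know how to proceed. Any help would be great :)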

Thanks a lot!

Tags: programming-error, imports, openai, azure, document-type-error, generative-ai, llamaindex, custom-model

1 Answer

This error occurs because the documents you pass to VectorStoreIndex.from_documents() are of the wrong type.

When you import SimpleDirectoryReader from the legacy module, the documents it returns are of type llama_index.legacy.schema.Document.

You then pass these documents to a VectorStoreIndex imported from the core module (from llama_index.core import VectorStoreIndex), which does not recognize the legacy Document class.
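
To see why the two classes don't mix, here is a minimal pure-Python simulation, assuming the index dispatches on isinstance as the error message suggests. Two classes can share the name Document and still be unrelated types, so a check written against one rejects the other. make_document_class, LegacyDocument, CoreDocument and the toy from_documents are hypothetical stand-ins, not LlamaIndex code; the real library's internal check may be structured differently.

```python
# Hypothetical simulation, not LlamaIndex code: two unrelated classes that are
# both called "Document" but report different modules.


def _init(self, text: str) -> None:
    self.text = text


def make_document_class(module_name: str) -> type:
    """Create a class named 'Document' whose repr reports module_name."""
    return type("Document", (), {"__init__": _init, "__module__": module_name})


LegacyDocument = make_document_class("llama_index.legacy.schema")
CoreDocument = make_document_class("llama_index.core.schema")


def from_documents(documents: list) -> list:
    """Toy index builder that only accepts the 'core' Document class."""
    texts = []
    for document in documents:
        if not isinstance(document, CoreDocument):
            raise ValueError(f"Unknown document type: {type(document)}")
        texts.append(document.text)
    return texts


print(from_documents([CoreDocument("hello")]))  # ['hello']
try:
    from_documents([LegacyDocument("hello")])
except ValueError as err:
    print(err)  # Unknown document type: <class 'llama_index.legacy.schema.Document'>
```

The same name is not enough: isinstance compares class identity, not class names.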

The documentation you linked is correct for the core module. Import both classes from core, from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, and everything should work.

If you want to use the legacy module instead, use the code below.

from llama_index.legacy.llms.azure_openai import AzureOpenAI
from llama_index.legacy.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.legacy import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
import logging
import sys

logging.basicConfig(
    stream=sys.stdout, level=logging.INFO
)  # use logging.DEBUG for more verbose output; basicConfig already adds a stdout handler

api_key = "3c9xxxyyyyzzzzzssssssdb9"
azure_endpoint = "https://<resource_name>.openai.azure.com/"
api_version = "2023-07-01-preview"

llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="gpt4",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="embeding1",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

documents = SimpleDirectoryReader(input_files=["./data/s1.txt"]).load_data()
print(type(documents[0]))  # should be llama_index.legacy.schema.Document

service_context = ServiceContext.from_defaults(
    llm=llm, embed_model=embed_model
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

Then query the index:

query = "What is the model name and who updated it last?"
query_engine = index.as_query_engine()
answer = query_engine.query(query)
print("query was:", query)
print("answer was:", answer)

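When using the legacy module, import all the tools and models (reader, index, LLM and embedding model) from the same llama_index.legacy package, and pass an additional ServiceContext to the vector store index so that it uses your Azure LLM and embedding model.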
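
The same-package rule can also be checked statically. Below is a rough sketch using Python's standard ast module; llama_index_families and mixes_legacy_and_new are hypothetical helper names. As a simplification, it assumes that any llama_index.* module other than llama_index.legacy.* belongs to the newer layout (llama_index.core, or integration packages such as llama_index.llms.azure_openai).

```python
import ast


def llama_index_families(source: str) -> set:
    """Return which llama_index families ('legacy', 'non-legacy') a script imports from."""
    families = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            # Only look at submodules such as llama_index.legacy.* or llama_index.core.
            if parts[0] != "llama_index" or len(parts) < 2:
                continue
            families.add("legacy" if parts[1] == "legacy" else "non-legacy")
    return families


def mixes_legacy_and_new(source: str) -> bool:
    """True if the script imports from both the legacy and the newer llama_index layout."""
    return {"legacy", "non-legacy"} <= llama_index_families(source)


# The import combination described in the question:
question_script = """
from llama_index.legacy.readers.file.base import SimpleDirectoryReader
from llama_index.core import VectorStoreIndex
"""
print(sorted(llama_index_families(question_script)))  # ['legacy', 'non-legacy']
print(mixes_legacy_and_new(question_script))  # True
```

Running it over your own script's source (for example, read from a file) shows at a glance whether legacy and core imports are mixed.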

Answered 2025-04-14 by Python大师
