Integrating a LlamaIndex VectorStoreIndex with LangChain agents for a RAG application
I've been going through the docs all day and still can't work out how to build a VectorStoreIndex with llama_index and then use the resulting embeddings as additional context in a RAG application/chatbot that converses with users. I want to use llama_index because it offers some nice advanced retrieval techniques, such as sentence-window retrieval and auto-merging retrieval (honestly, I haven't checked whether LangChain supports these vector-retrieval methods too). I want to use LangChain because it is strong for building more complex prompt templates (and again, I haven't dug into whether llama_index supports that).
My goal is to evaluate how these different retrieval methods perform in the application/chatbot. I know how to evaluate them against a separate file of evaluation questions, but I'd also like to compare things like response speed, how natural the responses feel, token usage, and so on.
A minimal reproducible example:
1) LangChain chatbot initialization
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
from langchain.memory import ChatMessageHistory

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are the world's greatest... \
Use this document base to help you provide the best support possible to everyone you engage with.
""",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chat = ChatOpenAI(model=llm_model, temperature=0.7)  # llm_model is defined elsewhere, e.g. "gpt-4"
chain = prompt | chat

chat_history = ChatMessageHistory()
while True:
    user_input = input("You: ")
    if user_input.lower() == 'exit':
        break
    chat_history.add_user_message(user_input)
    response = chain.invoke({"messages": chat_history.messages})
    print("AI:", response.content)
    chat_history.add_ai_message(response)
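Since one of the things I want to compare is response speed, per-turn latency can be measured with nothing beyond the standard library. A minimal sketch around the chain built above (the helper name is just illustrative):

import time

def timed_invoke(chain, payload):
    # Wrap chain.invoke and report wall-clock latency for a single turn.
    start = time.perf_counter()
    response = chain.invoke(payload)
    elapsed = time.perf_counter() - start
    print(f"Response latency: {elapsed:.2f}s")
    return response

# Example: response = timed_invoke(chain, {"messages": chat_history.messages})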
2) Sentence-window retrieval with llama_index
import os

from llama_index.core import (
    ServiceContext,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.core.postprocessor import LLMRerank


class SentenceWindowUtils:
    def __init__(self, documents, llm, embed_model, sentence_window_size):
        self.documents = documents
        self.llm = llm
        self.embed_model = embed_model
        self.sentence_window_size = sentence_window_size
        # self.save_dir = save_dir
        self.node_parser = SentenceWindowNodeParser.from_defaults(
            window_size=self.sentence_window_size,
            window_metadata_key="window",
            original_text_metadata_key="original_text",
        )
        self.sentence_context = ServiceContext.from_defaults(
            llm=self.llm,
            embed_model=self.embed_model,
            node_parser=self.node_parser,
        )

    def build_sentence_window_index(self, save_dir):
        # Build the index once and persist it; on later runs reload it from disk.
        if not os.path.exists(save_dir):
            os.makedirs(save_dir)
            sentence_index = VectorStoreIndex.from_documents(
                self.documents, service_context=self.sentence_context
            )
            sentence_index.storage_context.persist(persist_dir=save_dir)
        else:
            sentence_index = load_index_from_storage(
                StorageContext.from_defaults(persist_dir=save_dir),
                service_context=self.sentence_context,
            )
        return sentence_index

    def get_sentence_window_query_engine(self, sentence_index, similarity_top_k=6, rerank_top_n=3):
        # Replace each retrieved sentence with its surrounding window, then rerank with an LLM.
        postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
        rerank = LLMRerank(top_n=rerank_top_n, service_context=self.sentence_context)
        sentence_window_engine = sentence_index.as_query_engine(
            similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
        )
        return sentence_window_engine


sentence_window = SentenceWindowUtils(documents=documents, llm=llm, embed_model=embed_model, sentence_window_size=1)
sentence_window_1 = sentence_window.build_sentence_window_index(save_dir='./indexes/sentence_window_index_1')
sentence_window_engine_1 = sentence_window.get_sentence_window_query_engine(sentence_window_1)
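As a quick sanity check, the engine can be queried directly before wiring it into anything else. A minimal sketch (the question string is a placeholder; response.response and response.source_nodes are the standard fields on a llama_index query result):

# Sanity-check the sentence-window engine in isolation.
response = sentence_window_engine_1.query("What does the document base say about <topic>?")
print(response.response)                 # synthesized answer
for node in response.source_nodes:       # retrieved (and reranked) sentence windows
    print(node.metadata.get("original_text"), node.score)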
The two code blocks run fine independently. My goal, though, is that when a query comes in that requires retrieval from the existing document base, I can use the sentence_window_engine that has already been built. I suppose I could retrieve the relevant information for the query and then pass it into a follow-up prompt for the chatbot, but I'd like to avoid stuffing document data into the prompt if I can.
Any suggestions?
1 Answer
I never found an exact way to pull the information through llama_index the way I had hoped. Instead, I essentially settled on the workaround I originally wanted to avoid: querying my document base and adding that information to my chatbot as context.
#### Conversation Prompt Chain ####
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are the world's greatest...
You have access to an extensive document base of information.
Relevant Information to the user query is provided below. Use the information at your own discretion if it improves the quality of the response.
A summary of the previous conversation is also provided to contextualize you on previous conversation.

<<Relevant Information>>
{relevant_information}

<<Previous Conversation Summary>>
{previous_conversation}

<<Current Prompt>>
{user_input}
""",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)
chat = ChatOpenAI(model=llm_model, temperature=0.0)
chain = prompt | chat
from langchain_core.messages import AIMessage

### Application Start ###
while True:
    # Some code....
    if route['destination'] == "data querying":
        formatted_response = query_and_format_sql(username, password, host, port, mydatabase, query_prompt, model='gpt-4', client_name=client_name, user_input=user_input)
        print(formatted_response)
        chat_history.add_ai_message(AIMessage(f'The previous query triggered a SQL agent response that was {formatted_response}'))
    else:
        # Search Document Base
        RAG_Context = sentence_window_engine_1.query(user_input)

        # Inject the retrieved information into the chatbot's context
        context_with_relevant_info = {
            "user_input": user_input,
            "messages": chat_history.messages,
            "previous_conversation": memory.load_memory_variables({}),
            "relevant_information": RAG_Context,  # ==> Inject relevant information from llama_index here
        }

        response = chain.invoke(context_with_relevant_info)
I haven't run into token problems yet, but I can imagine that as my application grows and scales, I might hit issues trying to inject the relevant information, message history, and prompt all at once. I've capped my memory with ConversationBufferMemoryHistory, and so far it seems fine.
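If token usage does become an issue, one way to keep an eye on it per turn is the OpenAI callback from langchain_community (assuming that package is installed). A minimal sketch:

from langchain_community.callbacks import get_openai_callback

# Track prompt/completion token counts for a single chatbot turn.
with get_openai_callback() as cb:
    response = chain.invoke(context_with_relevant_info)
print(f"Prompt tokens: {cb.prompt_tokens}, completion tokens: {cb.completion_tokens}, total: {cb.total_tokens}")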