Implement #RAG (Retrieval Augmented Generation) with #langchain #chroma #OpenAI #MicrosoftFabric
Can you implement RAG (Retrieval-Augmented Generation) in #Microsoft #Fabric without any Azure PaaS services? Absolutely!
Discover how #LangChain and open-source vector database #ChromaDB come to the rescue. Dive into the step-by-step guide in this video:
https://lnkd.in/eda-2wsy
#GenAI #LLM #RAG
In this video, we’re taking our RAG architecture to intergalactic levels! Building on our previous video (https://youtu.be/oCHinlZRsLU), we’re removing dependencies on PaaS components like Document Intelligence and Azure AI Search. Instead, we’re leveraging LangChain to process PDF documents and using the open-source vector database, Chroma DB, as our vector store.
This is the relevant code from the notebook. First, let's install our libraries:
%pip install langchain
%pip install langchain-core
%pip install langchain-experimental
%pip install langchain_openai
%pip install langchain-chroma
%pip install langchainhub
%pip install PyPDF2
Now we set up our parameters:
import os, openai
from synapse.ml.core.platform import find_secret

openai_key = find_secret(secret_name="openaikey", keyvault="yourservice-keys")
openai_service_name = "yourservice"
openai_endpoint = "https://yourservice.openai.azure.com/"
openai_deployment_for_embeddings = "text-embedding-ada-002"
openai_deployment_for_query = "gpt-35-turbo"
openai_deployment_for_completions = "davinci-002"
openai_api_type = "azure"
openai_api_version = "2023-12-01-preview"

os.environ["OPENAI_API_TYPE"] = openai_api_type
os.environ["OPENAI_API_VERSION"] = openai_api_version
os.environ["OPENAI_API_KEY"] = openai_key
os.environ["AZURE_OPENAI_ENDPOINT"] = openai_endpoint

base_path = "/lakehouse/default/Files/prohabits/"
Now we have to delete the OPENAI_API_BASE environment variable, or our models won't instantiate (the current Azure OpenAI clients refuse to run when both OPENAI_API_BASE and AZURE_OPENAI_ENDPOINT are set):
del os.environ['OPENAI_API_BASE']
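Note that if you re-run the notebook, the variable may already be gone and the del will raise a KeyError. A slightly more defensive variant (my suggestion, not in the original notebook):

os.environ.pop("OPENAI_API_BASE", None)  # removes the variable only if it is set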
Now we import everything we need:
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings, AzureOpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.llms import AzureOpenAI, OpenAI
Now we read our PDF files:
from PyPDF2 import PdfReader
from langchain.schema import Document

folder_path = base_path

def load_pdfs_from_folder(folder_path):
    documents = []
    for filename in os.listdir(folder_path):
        if filename.endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            reader = PdfReader(file_path)
            # Concatenate the extracted text of every page into one string
            text = ""
            for page in reader.pages:
                text += page.extract_text()
            # Wrap the text in a LangChain Document, keeping the filename as metadata
            document = Document(page_content=text, metadata={"document_name": filename})
            documents.append(document)
    return documents

# Load documents
documents = load_pdfs_from_folder(folder_path)

# Print the name of each document (uncomment the second line to dump its full text)
for doc in documents:
    print(f"Document Name: {doc.metadata['document_name']}")
    #print(doc.page_content)
    print("\n---\n")
Then we chunk our PDFs and store the chunks in our vector store, the open-source vector database Chroma:
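The chunking cell itself didn't make it into this post, so here is a minimal sketch of what it looks like, assuming a standard RecursiveCharacterTextSplitter and the embeddings deployment defined above (the chunk size and overlap are assumptions):

# Split each PDF into overlapping chunks so each piece fits the embedding model
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)

# Embed every chunk with Azure OpenAI and index the vectors in Chroma
embeddings = AzureOpenAIEmbeddings(azure_deployment=openai_deployment_for_embeddings)
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)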
Let’s test if it works:
query = "what is a prohabits?"
answers = vectorstore.similarity_search(query)
display(answers[0].page_content)
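Since the loader stored each source filename in the document metadata (and the splitter carries that metadata over to every chunk), we can also check which PDF the top match came from:

display(answers[0].metadata["document_name"])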
And now it all comes together!
from langchain_openai import AzureChatOpenAI
from langchain.schema import HumanMessage
import openai

llm = AzureChatOpenAI(azure_deployment=openai_deployment_for_query)
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

# Quick sanity check that the chat model responds at all
message = HumanMessage(content="Tell me about solar eclipse.")
result = llm.invoke([message])

# Join the retrieved chunks into a single context string for the prompt
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Prohabits?")
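If you also want the answer to come back together with the chunks that supported it, a common variation (a sketch based on the standard LangChain pattern for returning sources, not part of the original notebook) looks like this:

from langchain_core.runnables import RunnableParallel

# Run the generation step on an input dict that already contains the retrieved docs
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs(x["context"])))
    | prompt
    | llm
    | StrOutputParser()
)

# Return both the raw retrieved documents and the generated answer
rag_chain_with_source = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

rag_chain_with_source.invoke("What is Prohabits?")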
Understanding the Costs of Fabric DW Queries: A Deep Dive
Welcome to our deep dive into Microsoft Fabric! If you're navigating the complexities of Microsoft Fabric, one critical aspect you'll want to master is understanding the cost of each activity, especially when it comes to Fabric Data Warehouse (DW) queries.
Tools You'll Need: Fabric Capacity Metrics App and Query Insights
To gather the necessary information, you'll need to utilize two key tools:
Fabric Capacity Metrics App: This app provides detailed metrics and insights into your Fabric capacity usage, helping you track and analyze various activities.
Query Insights: This tool offers in-depth information on the performance and cost of your queries, allowing you to pinpoint exactly where your resources are being consumed (see the sketch after this list).
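As an illustration of the Query Insights side: the data lives in views under the queryinsights schema of your warehouse, so you can pull the most expensive recent statements with plain T-SQL. Below is a minimal sketch from Python; the server and database names are placeholders, the authentication mode is an assumption, and the exact column set may differ by version:

import pyodbc

# Placeholder connection details: copy the SQL connection string from your
# Fabric warehouse settings; Azure AD interactive login is assumed here.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=yourworkspace.datawarehouse.fabric.microsoft.com;"
    "Database=yourwarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)

# The exec_requests_history view records completed requests; sorting by elapsed
# time surfaces the statements most likely to dominate capacity consumption.
sql = """
SELECT TOP 10 start_time, total_elapsed_time_ms, command
FROM queryinsights.exec_requests_history
ORDER BY total_elapsed_time_ms DESC;
"""
for row in conn.execute(sql):
    print(row.start_time, row.total_elapsed_time_ms, row.command[:80])

You can then line these statements up against the capacity units the Metrics App reports for the same time window.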
Join Us on This Journey
By following these steps, you'll be well-equipped to figure out the costs associated with your Fabric DW queries. This understanding is crucial for optimizing your use of Microsoft Fabric and ensuring you're getting the most value from your resources.
Watch our detailed video guide here:
How to create a stacked chart with Better or Worse indicators
Every inch of your dashboard real estate is precious. That's why it has to be packed with as much insight as possible. I almost never use stacked charts because they don't make it easy to add useful context for each data point. Well, until now! In this video, I talk about how you can add a Better/Worse indicator to turn your stacked charts into an insight superpower!