Andre Fomin 5/21/24

Implement #RAG (Retrieval Augmented Generation) with #langchain #chroma #OpenAI #MicrosoftFabric

Can you implement RAG (Retrieval-Augmented Generation architecture) in #Microsoft #Fabric without any Azure PaaS services? Absolutely!

Discover how #LangChain and open-source vector database #ChromaDB come to the rescue. Dive into the step-by-step guide in this video:

https://lnkd.in/eda-2wsy

#GenAI #LLM #RAG

In this video, we’re taking our RAG architecture to intergalactic levels! Building on our previous video (https://youtu.be/oCHinlZRsLU), we’re removing dependencies on PaaS components like Document Intelligence and Azure AI Search. Instead, we’re leveraging LangChain to process PDF documents and using the open-source vector database, Chroma DB, as our vector store.

This is the relevant code from the notebook. First, let’s install our libraries.

%pip install langchain
%pip install langchain-core
%pip install langchain-experimental
%pip install langchain_openai
%pip install langchain-chroma
%pip install langchainhub
%pip install PyPDF2

Now we set up our parameters:

import os
import openai
from synapse.ml.core.platform import find_secret


openai_key = find_secret(secret_name="openaikey", keyvault="yourservice-keys")
openai_service_name = "yourservice"
openai_endpoint = "https://yourservice.openai.azure.com/"
openai_deployment_for_embeddings = "text-embedding-ada-002"
openai_deployment_for_query = "gpt-35-turbo"
openai_deployment_for_completions = "davinci-002"
openai_api_type = "azure"
openai_api_version = "2023-12-01-preview"


os.environ["OPENAI_API_TYPE"] = openai_api_type
os.environ["OPENAI_API_VERSION"] = openai_api_version
#os.environ["OPENAI_API_BASE"] = """"
os.environ["OPENAI_API_KEY"] = openai_key
os.environ["AZURE_OPENAI_ENDPOINT"] = openai_endpoint

base_path = "/lakehouse/default/Files/prohabits/"

Now we have to delete the OPENAI_API_BASE environment variable, or our models won’t instantiate:

# pop() with a default avoids a KeyError when the variable is not set
os.environ.pop("OPENAI_API_BASE", None)

Now we import what we need:

# Only the imports the rest of the notebook actually uses
from langchain import hub
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

Now we read our PDF files:

from PyPDF2 import PdfReader
from langchain_core.documents import Document

folder_path = base_path

def load_pdfs_from_folder(folder_path):
    documents = []
    for filename in os.listdir(folder_path):
        if filename.endswith('.pdf'):
            file_path = os.path.join(folder_path, filename)
            reader = PdfReader(file_path)
            text = ""
            for page in reader.pages:
                # Guard against pages that yield no extractable text
                text += page.extract_text() or ""
            document = Document(page_content=text, metadata={"document_name": filename})
            documents.append(document)
    return documents

# Load documents
documents = load_pdfs_from_folder(folder_path)

# Print the content of each document
for doc in documents:
    print(f"Document Name: {doc.metadata['document_name']}")
    #print(doc.page_content)
    print("\n---\n")

Then we chunk our PDFs and store the chunks in the open-source vector database, Chroma:

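# Name the embeddings deployment explicitly instead of relying on the default
embeddings = AzureOpenAIEmbeddings(azure_deployment=openai_deployment_for_embeddings)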
text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)

vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
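
To get a feel for what chunk_size and chunk_overlap do, here is a minimal sketch in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter also tries to break on separators such as paragraphs and sentences); it only shows the sliding-window idea. The function name split_with_overlap and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample text: 25 digits, small enough to inspect by eye
sample = "".join(str(i % 10) for i in range(25))
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary is likely to appear whole in at least one chunk, at the cost of storing some text twice.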

Let’s test if it works:

query = "what is a prohabits?"
answers = vectorstore.similarity_search(query)
display(answers[0].page_content)
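
Under the hood, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to it. The sketch below shows that idea with cosine similarity in plain Python. The three-dimensional vectors and chunk labels are invented for illustration, and the distance metric Chroma actually uses depends on how the collection is configured, so treat this as a conceptual picture rather than Chroma's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=4):
    """Return the texts of the k stored chunks most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings", invented for illustration only
stored_chunks = [
    ("chunk about habits", [0.9, 0.1, 0.0]),
    ("chunk about pricing", [0.1, 0.9, 0.1]),
    ("chunk about onboarding", [0.6, 0.3, 0.4]),
]
query_vector = [1.0, 0.0, 0.1]

results = top_k(query_vector, stored_chunks, k=2)
print(results)
```

In the notebook, vectorstore.similarity_search(query) returns the four closest chunks by default; pass k to change that.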

And now it all comes together!

from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage

llm = AzureChatOpenAI(azure_deployment=openai_deployment_for_query)

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")

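# Optional smoke test: call the chat model directly (no retrieval) to confirm the deployment responds
message = HumanMessage(
    content="Tell me about solar eclipse."
)
result = llm.invoke([message])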

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Prohabits?")
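
If the chain syntax looks like magic, here is roughly what its first two steps do, written in plain Python: the retrieved chunks are joined into one context string, and that string plus the question are dropped into a prompt template. The Chunk class, the TEMPLATE text, and build_prompt are stand-ins invented for this sketch; the real prompt text comes from rlm/rag-prompt on LangChain Hub and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str


def format_docs(docs):
    # Same idea as in the notebook: join chunk texts with blank lines between them
    return "\n\n".join(doc.page_content for doc in docs)


# Invented template for illustration; not the actual hub prompt
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt(question, retrieved):
    """Mimic the dict step of the chain, then fill the template."""
    inputs = {"context": format_docs(retrieved), "question": question}
    return TEMPLATE.format(**inputs)


retrieved = [Chunk("First retrieved passage."), Chunk("Second retrieved passage.")]
prompt_text = build_prompt("What is Prohabits?", retrieved)
print(prompt_text)
```

In the real chain, the filled prompt is then sent to the chat model, and StrOutputParser turns the model's reply into a plain string.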

Nikil Prabhakar 5/17/24

Understanding the Costs of Fabric DW Queries: A Deep Dive

Welcome to our deep dive into Microsoft Fabric! If you're navigating the complexities of Microsoft Fabric, one critical aspect you'll want to master is understanding the cost of each activity, especially when it comes to Fabric Data Warehouse (DW) queries.

Tools You'll Need: Fabric Metric App and Query Insights

To gather the necessary information, you'll need to utilize two key tools:

Fabric Metric App: This app provides detailed metrics and insights into your Fabric usage, helping you track and analyze various activities.
Query Insights: This tool offers in-depth information on the performance and cost of your queries, allowing you to pinpoint exactly where your resources are being consumed.

Join Us on This Journey

By following these steps, you'll be well-equipped to figure out the costs associated with your Fabric DW queries. This understanding is crucial for optimizing your use of Microsoft Fabric and ensuring you're getting the most value from your resources.

Watch our detailed video guide here:

Andre Fomin 5/14/24

How to create a stacked chart with Better or Worse indicators

Every inch of your dashboard real estate is precious. That’s why it has to be packed with as much insight as possible. I almost never use stacked charts because they don't make it easy to add useful context for each data point. Well, until now! In this video, I talk about how you can add a Better/Worse indicator to turn your stacked charts into an insight superpower!
