How to generate Business Friendly #PowerBI Data Dictionary using #OpenAI #SemanticLink #msfabric
In this video I will continue to explore how we can incorporate Azure OpenAI, LLMs, GenAI, not even sure which buzzwords are right here 🙂, into the world of Business Intelligence and Microsoft Fabric. I will use Fabric notebooks along with the Semantic Link library to first create a data dictionary of the datasets deployed in the current workspace, listing the datasets, measures and table columns. Then we will use Azure OpenAI to take our DAX measure definitions and derive business-audience-friendly descriptions from them.
First, we use the Semantic Link library to create our tables Datasets, Measures and Columns. (The Columns table also has the corresponding table name, but the Measures and Columns tables do not have a corresponding dataset name in them, so we have to add it ourselves.)
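The snippets below also call a small rename_columns helper that is not part of Semantic Link; it just makes the pandas column names Spark-friendly. Here is a minimal sketch, assuming the helper replaces spaces with underscores and mutates the frame in place (which is why some calls below don't capture its return value):

import pandas as pd

def rename_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Replace spaces with underscores (e.g. "Dataset Name" -> "Dataset_Name")
    # so the names are valid Spark column names. Mutates the frame in place
    # and returns it for convenience.
    df.columns = [c.replace(" ", "_") for c in df.columns]
    return df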
import sempy.fabric as fabric
import pyspark.sql.functions as F

# Datasets: a single overwrite, since list_datasets() returns everything at once
datasets = fabric.list_datasets()
spark_datasets = spark.createDataFrame(rename_columns(datasets))
spark_datasets.write.mode("overwrite").saveAsTable("pbi_dataset")

# Measures: rebuilt per dataset, with the AI-generated description column appended
spark.sql("drop table if exists lh_fabric_demo.pbi_measure")
for index, row in datasets.iterrows():
    dataset_name = row["Dataset_Name"]
    measures = fabric.list_measures(dataset=dataset_name)
    if len(measures) > 0:
        rename_columns(measures)
        spark_measures = spark.createDataFrame(measures)
        spark_measures = spark_measures.withColumn("Dataset_Name", F.lit(dataset_name))
        spark_measures_ai = spark_measures.withColumn(
            "Measure_Definition",
            F.concat_ws(" = ", spark_measures["Measure_Name"], spark_measures["Measure_Expression"])
        ).withColumn(
            "Measure_Description_AI",
            generate_measure_description_from_DAX(F.col("Measure_Definition"))
        )
        spark_measures_ai.write.mode("append").saveAsTable("pbi_measure")
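The generate_measure_description_from_DAX call above is a user-defined function that sends each "Measure = Expression" string to Azure OpenAI and returns a plain-language description. Here is a minimal sketch of such a UDF as a pandas UDF; the endpoint, API key, deployment name and prompt are placeholders you would replace with your own:

import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType
from openai import AzureOpenAI

@pandas_udf(StringType())
def generate_measure_description_from_DAX(definitions: pd.Series) -> pd.Series:
    # Placeholder Azure OpenAI connection details - use your own resource
    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-02-01",
    )

    def describe(dax: str) -> str:
        # One chat completion per measure definition; the prompt is illustrative
        response = client.chat.completions.create(
            model="<your-gpt-deployment>",  # Azure OpenAI deployment name
            messages=[
                {"role": "system", "content": "You are a BI analyst. Explain the following "
                                              "DAX measure in one short, business-friendly sentence."},
                {"role": "user", "content": dax},
            ],
        )
        return response.choices[0].message.content

    return definitions.apply(describe)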
spark.sql("drop table if exists lh_fabric_demo.pbi_table") for index, row in datasets.iterrows(): dataset_name = row["Dataset_Name"] tables = fabric.list_tables(dataset = dataset_name) if len(tables)> 0: rename_columns(tables) spark_tables = spark.createDataFrame(tables).withColumn("Dataset_Name", F.lit(dataset_name)) spark_tables.write.mode("append").saveAsTable("pbi_table") spark.sql("drop table if exists lh_fabric_demo.pbi_column") for index, row in datasets.iterrows(): dataset_name = row["Dataset_Name"] cols = fabric.list_columns(dataset = dataset_name) if len(cols)> 0: rename_columns(cols) spark_cols = spark.createDataFrame(cols).withColumn("Dataset_Name", F.lit(dataset_name)) spark_cols.write.mode("append").saveAsTable("pbi_column")
Now we can just query them using Spark SQL
%%sql
select * from lh_fabric_demo.pbi_measure limit 10
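For example, to spot-check the descriptions the model generated for each measure (these columns come straight from the pbi_measure table we built above):

%%sql
select Dataset_Name, Measure_Name, Measure_Description_AI
from lh_fabric_demo.pbi_measure
order by Dataset_Name, Measure_Name
limit 10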