Fix SQL Analytics Endpoint Sync Issues in Microsoft Fabric – Data Not Showing? Here's the Solution!

If you’ve ever written data to your Lakehouse in Microsoft Fabric, only to find it missing when querying through the SQL Analytics Endpoint—don’t worry, you’re not alone! 😬

This issue has been a common pain point for many users, and I’ve been working through it myself. Thankfully, there’s a straightforward way to fix the sync problem, and I’ve created a detailed video to walk you through it.

📽️ Check out the full explanation in my video here: https://youtu.be/toTKGYwr278

Here is the code that I go over in the video:

import json
import time
import logging
from datetime import datetime
import pytz
import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException

# Setup logging to show only warnings and errors
logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s')

# Helper function to convert UTC time to Eastern Time
def convert_to_eastern(utc_time_str):
    # Eastern Time Zone (ET)
    eastern = pytz.timezone('US/Eastern')

    if utc_time_str is None:
        return 'N/A'
    utc_time = datetime.fromisoformat(utc_time_str.replace("Z", "+00:00"))
    return utc_time.astimezone(eastern).strftime('%Y-%m-%d %H:%M:%S %Z')

# Function to sync SQL endpoint with the Lakehouse
def sync_sql_endpoint(client, workspace_id, lakehouse_id):
    try:
        # Fetch SQL endpoint properties
        lakehouse_info = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
        sql_endpoint_id = lakehouse_info['properties']['sqlEndpointProperties']['id']
        
        # Set URI for the API call
        uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}"
        payload = {"commands": [{"$type": "MetadataRefreshExternalCommand"}]}
        
        # Call REST API to initiate the sync
        response = client.post(uri, json=payload)
        if response.status_code != 200:
            logging.error(f"Error initiating sync: {response.status_code} - {response.text}")
            return
        
        data = json.loads(response.text)

        batch_id = data["batchId"]
        progress_state = data["progressState"]

        # URL for checking the sync status
        status_uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch_id}"
        
        # Poll until the sync is complete; start from the initial response so
        # status_data is defined even if the batch finished immediately
        status_data = data
        while progress_state == 'inProgress':
            time.sleep(1)  # Polling interval
            status_response = client.get(status_uri)
            status_data = status_response.json()
            progress_state = status_data["progressState"]
        
        # Check if the sync completed successfully
        if progress_state == 'success':
            table_details = [
                {
                    'tableName': table['tableName'],
                    'lastSuccessfulUpdate': convert_to_eastern(table.get('lastSuccessfulUpdate')),  # missing key -> None, rendered as 'N/A' by the helper
                    'tableSyncState': table['tableSyncState'],
                    'sqlSyncState': table['sqlSyncState']
                }
                for table in status_data['operationInformation'][0]['progressDetail']['tablesSyncStatus']
            ]
            
            # Print extracted table details
            for detail in table_details:
                print(f"Table: {detail['tableName']}   Last Update: {detail['lastSuccessfulUpdate']}  "
                      f"Table Sync State: {detail['tableSyncState']}  SQL Sync State: {detail['sqlSyncState']}")
            
            #uncomment if you need to see all the details
            #display(status_data)
        
        # Handle failure
        elif progress_state == 'failure':
            logging.error(f"Sync failed: {status_data}")
    
    except FabricHTTPException as fe:
        logging.error(f"Fabric HTTP Exception: {fe}")
    except WorkspaceNotFoundException as we:
        logging.error(f"Workspace not found: {we}")
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

# Main function to execute the sync
def run_sync():
    # Initialize client
    client = fabric.FabricRestClient()
    
    # Fetching environment variables directly from Spark
    workspace_id = spark.conf.get("trident.workspace.id")
    lakehouse_id = spark.conf.get("trident.lakehouse.id")
    
    # Execute the sync process
    sync_sql_endpoint(client, workspace_id, lakehouse_id)

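# Demo from the video: drop and recreate a test table so there is a metadata
# change for the SQL Analytics Endpoint to pick up (LH_AI_LAB and EndPointTest
# are the lakehouse and source table used in the video).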
spark.sql("DROP TABLE IF EXISTS LH_AI_LAB.test1")
spark.sql("CREATE TABLE IF NOT EXISTS test1 AS SELECT * FROM LH_AI_LAB.EndPointTest")

run_sync()    

What's Going On?

Let’s first talk about what’s happening under the hood. The main challenge is that sometimes, even though data is correctly written to the Lakehouse, it isn’t immediately visible to T-SQL queries through the SQL Analytics Endpoint. This can be frustrating, especially if you’re relying on the endpoint for reporting or downstream processing.

The root of this issue often lies in the metadata synchronization between the two engines. Spark inside the Lakehouse sees newly written tables and data immediately, while the SQL Analytics Endpoint relies on an automatic metadata sync that sometimes lags behind.
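
Here’s a minimal way to reproduce the symptom yourself (the table name below is a placeholder, not something from the video): the Spark write and query succeed right away, while a T-SQL query routed through the SQL Analytics Endpoint may not see the table until its metadata sync catches up.

# Minimal repro sketch -- "endpoint_lag_demo" is a placeholder table name
spark.range(10).write.mode("overwrite").format("delta").saveAsTable("endpoint_lag_demo")
print(spark.sql("SELECT COUNT(*) FROM endpoint_lag_demo").first()[0])  # Spark sees it immediately

# A T-SQL query against the SQL Analytics Endpoint, e.g.
#   SELECT COUNT(*) FROM dbo.endpoint_lag_demo;
# may fail with "Invalid object name" (or return stale results) until the
# endpoint's automatic metadata sync has caught up.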

The Solution

To address this, the key is to manually trigger a sync operation using a simple Python script. The script calls the SQL Analytics Endpoint’s REST API to refresh its metadata and force a sync, ensuring the endpoint’s view of the Lakehouse is up to date and queryable via T-SQL.

In my video, I explain the entire workflow, but here’s a high-level breakdown of the steps (a condensed sketch of the core calls follows the list):

  1. Fetch SQL Endpoint properties – Identify the SQL Analytics Endpoint that needs to be refreshed.

  2. Trigger a Metadata Refresh – Using a Python script, send a command to the SQL Endpoint to refresh the metadata.

  3. Check Sync Status – Continuously monitor the sync process to ensure it completes successfully.
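
Stripped of logging and error handling, those three steps boil down to the sketch below. It reuses the same endpoints and payload as the full script above, so nothing here is new API surface; it assumes it runs in a Fabric notebook where spark and a default lakehouse are available.

import time
import sempy.fabric as fabric

client = fabric.FabricRestClient()

# Step 1: resolve the SQL Analytics Endpoint id from the current lakehouse
workspace_id = spark.conf.get("trident.workspace.id")
lakehouse_id = spark.conf.get("trident.lakehouse.id")
lakehouse = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
sql_endpoint_id = lakehouse['properties']['sqlEndpointProperties']['id']

# Step 2: ask the endpoint to refresh its metadata
payload = {"commands": [{"$type": "MetadataRefreshExternalCommand"}]}
batch = client.post(f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}", json=payload).json()

# Step 3: poll the batch until the sync leaves the 'inProgress' state
status_uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch['batchId']}"
while client.get(status_uri).json()["progressState"] == "inProgress":
    time.sleep(1)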

You don’t need to be an expert in Python to implement this, but it helps to understand the structure.
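
Once the script lives in your notebook, the day-to-day pattern is simply: write your data, then call run_sync() in the same session before anything downstream queries the endpoint. A quick hypothetical example (sales_daily is a placeholder table name):

# Hypothetical usage -- "sales_daily" is a placeholder table name
spark.range(100).write.mode("overwrite").format("delta").saveAsTable("sales_daily")

run_sync()  # returns once the SQL Analytics Endpoint reports the sync finished (or failed)

# T-SQL queries against the endpoint (reports, pipelines, etc.) should now
# see the current contents of sales_daily.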

Credits:

Special thanks to Mark Pryce-Maher for his insightful post, which served as the foundation and inspiration for this video: https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d
