Fix SQL Analytics Endpoint Sync Issues in Microsoft Fabric – Data Not Showing? Here's the Solution!
If you’ve ever written data to your Lakehouse in Microsoft Fabric, only to find it missing when querying through the SQL Analytics Endpoint—don’t worry, you’re not alone! 😬
This issue has been a common pain point for many users, and I’ve been working through it myself. Thankfully, there’s a straightforward way to fix the sync problem, and I’ve created a detailed video to walk you through it.
📽️ Check out the full explanation in my video here: https://youtu.be/toTKGYwr278
Here is the code that I go over in the video:
import json
import time
import logging
from datetime import datetime

import pytz
import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException

# Setup logging to show only warnings and errors
logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s')

# Helper function to convert UTC time to Eastern Time
def convert_to_eastern(utc_time_str):
    # Eastern Time Zone (ET)
    eastern = pytz.timezone('US/Eastern')
    if utc_time_str is None:
        return 'N/A'
    utc_time = datetime.fromisoformat(utc_time_str.replace("Z", "+00:00"))
    return utc_time.astimezone(eastern).strftime('%Y-%m-%d %H:%M:%S %Z')

# Function to sync the SQL analytics endpoint with the Lakehouse
def sync_sql_endpoint(client, workspace_id, lakehouse_id):
    try:
        # Fetch SQL endpoint properties from the Lakehouse item
        lakehouse_info = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
        sql_endpoint_id = lakehouse_info['properties']['sqlEndpointProperties']['id']

        # Set URI and payload for the metadata refresh call
        uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}"
        payload = {"commands": [{"$type": "MetadataRefreshExternalCommand"}]}

        # Call the REST API to initiate the sync
        response = client.post(uri, json=payload)
        if response.status_code != 200:
            logging.error(f"Error initiating sync: {response.status_code} - {response.text}")
            return

        data = json.loads(response.text)
        batch_id = data["batchId"]
        progress_state = data["progressState"]

        # URL for checking the sync status
        status_uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch_id}"

        # Poll until the sync is complete
        while progress_state == 'inProgress':
            time.sleep(1)  # Polling interval
            status_response = client.get(status_uri)
            status_data = status_response.json()
            progress_state = status_data["progressState"]

        # Check if the sync completed successfully
        if progress_state == 'success':
            table_details = [
                {
                    'tableName': table['tableName'],
                    # convert_to_eastern already returns 'N/A' when the value is missing
                    'lastSuccessfulUpdate': convert_to_eastern(table.get('lastSuccessfulUpdate')),
                    'tableSyncState': table['tableSyncState'],
                    'sqlSyncState': table['sqlSyncState']
                }
                for table in status_data['operationInformation'][0]['progressDetail']['tablesSyncStatus']
            ]

            # Print extracted table details
            for detail in table_details:
                print(f"Table: {detail['tableName']} Last Update: {detail['lastSuccessfulUpdate']} "
                      f"Table Sync State: {detail['tableSyncState']} SQL Sync State: {detail['sqlSyncState']}")

            # Uncomment if you need to see all the details
            # display(status_data)

        # Handle failure
        elif progress_state == 'failure':
            logging.error(f"Sync failed: {status_data}")

    except FabricHTTPException as fe:
        logging.error(f"Fabric HTTP Exception: {fe}")
    except WorkspaceNotFoundException as we:
        logging.error(f"Workspace not found: {we}")
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

# Main function to execute the sync
def run_sync():
    # Initialize the Fabric REST client
    client = fabric.FabricRestClient()

    # Fetch the workspace and Lakehouse IDs directly from the Spark session
    workspace_id = spark.conf.get("trident.workspace.id")
    lakehouse_id = spark.conf.get("trident.lakehouse.id")

    # Execute the sync process
    sync_sql_endpoint(client, workspace_id, lakehouse_id)

# Demo tables from the video: create something worth syncing, then trigger the sync
spark.sql("DROP TABLE IF EXISTS LH_AI_LAB.test1")
spark.sql("CREATE TABLE IF NOT EXISTS test1 AS SELECT * FROM LH_AI_LAB.EndPointTest")

run_sync()
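A quick note on the script: the LH_AI_LAB.test1 and LH_AI_LAB.EndPointTest tables at the bottom are just the demo objects from the video, so swap in your own Lakehouse and table names (or remove those two spark.sql lines entirely). Also, as far as I can tell, the /v1.0/myorg/lhdatamarts endpoint and the MetadataRefreshExternalCommand payload aren't part of the officially documented Fabric REST API, so treat them as internals that could change.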
What's Going On?
Let’s first talk about what’s happening under the hood. The main challenge is that sometimes, even though data has been written to the Lakehouse correctly, it isn’t immediately visible to T-SQL queries running against the SQL analytics endpoint. This can be frustrating, especially if you’re relying on that endpoint for reporting or downstream processing.
The root of the issue is the metadata synchronization between the two engines. Spark in the Lakehouse reads the Delta tables directly, so it always sees the latest data, while the SQL analytics endpoint relies on an automatic metadata sync that can lag behind.
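To make that concrete, here's a minimal sketch of how the symptom shows up in a Fabric notebook. The Lakehouse and table names are just the examples from my script, so substitute your own:

# Write a Delta table from Spark -- Spark sees it immediately
spark.range(10).toDF("id").write.mode("overwrite").format("delta").saveAsTable("LH_AI_LAB.EndPointTest")
print(spark.sql("SELECT COUNT(*) FROM LH_AI_LAB.EndPointTest").collect()[0][0])  # 10, right away

# But the same table queried through the SQL analytics endpoint, e.g. in T-SQL:
#   SELECT COUNT(*) FROM EndPointTest;
# may come back empty (or not exist yet) until the metadata sync catches up.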
The Solution
To address this, the key is to manually trigger a sync operation using the Python script above. The script calls the SQL analytics endpoint's metadata refresh command and waits for it to finish, ensuring that the Lakehouse data is up to date and queryable via T-SQL.
In my video, I explain the entire workflow, but here’s a high-level breakdown of the steps:
1. Fetch the SQL endpoint properties – Read the Lakehouse item's properties to identify the SQL analytics endpoint that needs to be refreshed.
2. Trigger a metadata refresh – Send the metadata refresh command to that endpoint via the REST API.
3. Check the sync status – Poll the batch status until the sync completes successfully (a slightly hardened version of this polling loop is sketched right after this list).
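On that last step: the script above polls every second for as long as the batch reports inProgress, which is fine for a demo. If you'd rather not risk a notebook cell waiting forever, here's an optional variation of the loop (my own addition, not from the video); wait_for_sync and the timeout values are hypothetical names you can adapt:

import time

# Optional variation: poll the batch status with an upper bound instead of indefinitely
MAX_WAIT_SECONDS = 300      # assumption: five minutes is plenty for most syncs
POLL_INTERVAL_SECONDS = 1

def wait_for_sync(client, status_uri, initial_state):
    progress_state = initial_state
    waited = 0
    while progress_state == 'inProgress' and waited < MAX_WAIT_SECONDS:
        time.sleep(POLL_INTERVAL_SECONDS)
        waited += POLL_INTERVAL_SECONDS
        progress_state = client.get(status_uri).json()["progressState"]
    # Returns 'success', 'failure', or still 'inProgress' if we hit the timeout
    return progress_state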
You don’t need to be an expert in Python to implement this, but it helps to understand the structure.
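In practice, my day-to-day pattern boils down to: write the data with Spark, call run_sync() from the script above, and only then hand things over to T-SQL. Roughly like this (the DataFrame and table name below are placeholders):

# 1. Write (or update) a Delta table in the Lakehouse
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("append").format("delta").saveAsTable("LH_AI_LAB.EndPointTest")

# 2. Force the SQL analytics endpoint to pick up the change (run_sync comes from the script above)
run_sync()

# 3. T-SQL queries against the endpoint should now see the new rows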
Credits:
Special thanks to Mark Pryce-Maher for his insightful post, which served as the foundation and inspiration for this video: https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d