Fix SQL Analytics Endpoint Sync Issues in Microsoft Fabric – Data Not Showing? Here's the Solution!
If you’ve ever written data to your Lakehouse in Microsoft Fabric, only to find it missing when querying through the SQL Analytics Endpoint—don’t worry, you’re not alone! 😬
This issue has been a common pain point for many users, and I’ve been working through it myself. Thankfully, there’s a straightforward way to fix the sync problem, and I’ve created a detailed video to walk you through it.
📽️ Check out the full explanation in my video here: https://youtu.be/toTKGYwr278
Here is the code that I go over in the video:
import json
import time
import logging
from datetime import datetime

import pytz
import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException

# Setup logging to show only warnings and errors
logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s')

# Helper function to convert UTC time to Eastern Time
def convert_to_eastern(utc_time_str):
    # Eastern Time Zone (ET)
    eastern = pytz.timezone('US/Eastern')
    if utc_time_str is None:
        return 'N/A'
    utc_time = datetime.fromisoformat(utc_time_str.replace("Z", "+00:00"))
    return utc_time.astimezone(eastern).strftime('%Y-%m-%d %H:%M:%S %Z')

# Function to sync the SQL endpoint with the Lakehouse
def sync_sql_endpoint(client, workspace_id, lakehouse_id):
    try:
        # Fetch SQL endpoint properties
        lakehouse_info = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
        sql_endpoint_id = lakehouse_info['properties']['sqlEndpointProperties']['id']

        # Set URI and payload for the API call
        uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}"
        payload = {"commands": [{"$type": "MetadataRefreshExternalCommand"}]}

        # Call the REST API to initiate the sync
        response = client.post(uri, json=payload)
        if response.status_code != 200:
            logging.error(f"Error initiating sync: {response.status_code} - {response.text}")
            return

        data = json.loads(response.text)
        batch_id = data["batchId"]
        progress_state = data["progressState"]

        # URL for checking the sync status
        status_uri = f"/v1.0/myorg/lhdatamarts/{sql_endpoint_id}/batches/{batch_id}"

        # Poll until the sync is complete
        while progress_state == 'inProgress':
            time.sleep(1)  # Polling interval
            status_response = client.get(status_uri)
            status_data = status_response.json()
            progress_state = status_data["progressState"]

        # Check if the sync completed successfully
        if progress_state == 'success':
            table_details = [
                {
                    'tableName': table['tableName'],
                    'lastSuccessfulUpdate': convert_to_eastern(table.get('lastSuccessfulUpdate')),
                    'tableSyncState': table['tableSyncState'],
                    'sqlSyncState': table['sqlSyncState']
                }
                for table in status_data['operationInformation'][0]['progressDetail']['tablesSyncStatus']
            ]

            # Print the extracted table details
            for detail in table_details:
                print(f"Table: {detail['tableName']} Last Update: {detail['lastSuccessfulUpdate']} "
                      f"Table Sync State: {detail['tableSyncState']} SQL Sync State: {detail['sqlSyncState']}")

            # Uncomment if you need to see all the details
            # display(status_data)

        # Handle failure
        elif progress_state == 'failure':
            logging.error(f"Sync failed: {status_data}")

    except FabricHTTPException as fe:
        logging.error(f"Fabric HTTP Exception: {fe}")
    except WorkspaceNotFoundException as we:
        logging.error(f"Workspace not found: {we}")
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}")

# Main function to execute the sync
def run_sync():
    # Initialize the Fabric REST client
    client = fabric.FabricRestClient()

    # Fetch the workspace and lakehouse IDs from the Spark session configuration
    workspace_id = spark.conf.get("trident.workspace.id")
    lakehouse_id = spark.conf.get("trident.lakehouse.id")

    # Execute the sync process
    sync_sql_endpoint(client, workspace_id, lakehouse_id)

# Example: write a table to the Lakehouse, then trigger the sync
spark.sql("DROP TABLE IF EXISTS LH_AI_LAB.test1")
spark.sql("CREATE TABLE IF NOT EXISTS test1 AS SELECT * FROM LH_AI_LAB.EndPointTest")

run_sync()
What's Going On?
Let’s first talk about what’s happening under the hood. The main challenge is that sometimes, even though data is correctly written to the Lakehouse, it’s not immediately visible through T-SQL queries in SQL Analytics. This can be frustrating, especially if you’re relying on SQL Analytics for reporting or downstream processing.
The root of the issue lies in the metadata synchronization between the Lakehouse and its SQL Analytics Endpoint. Spark inside the Lakehouse sees the most recent data as soon as it is written, but the endpoint's automatic metadata sync can lag behind, so T-SQL queries temporarily return stale results or miss new tables entirely.
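You can see this for yourself from a Fabric notebook. The minimal sketch below writes a small Delta table with Spark and reuses the LH_AI_LAB lakehouse and EndPointTest table names from the script above (swap in your own lakehouse and table):

# Write (or overwrite) a small Delta table -- the files land in the Lakehouse immediately
spark.createDataFrame([(1, "first row")], ["id", "value"]) \
    .write.mode("overwrite").saveAsTable("LH_AI_LAB.EndPointTest")

# Spark sees the new rows right away
spark.sql("SELECT COUNT(*) FROM LH_AI_LAB.EndPointTest").show()

# A T-SQL query against the SQL Analytics Endpoint (for example from SSMS or the
# Fabric portal) may still return stale results, or not list the table at all,
# until the endpoint's automatic metadata sync has caught up.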
The Solution
To address this, the key is to manually trigger a sync operation using a simple Python script. This script interacts with the SQL Analytics Endpoint to refresh metadata and force a sync, ensuring that the Lakehouse data is up-to-date and queryable via SQL.
In my video, I explain the entire workflow, but here’s a high-level breakdown of the steps:
1. Fetch SQL Endpoint properties – identify the SQL Analytics Endpoint that needs to be refreshed.
2. Trigger a metadata refresh – using the Python script, send a command to the SQL Endpoint to refresh its metadata.
3. Check sync status – continuously poll the sync process until it completes successfully.
You don’t need to be an expert in Python to implement this, but it helps to understand the structure.
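If it helps, here is a stripped-down sketch of that structure with the logging, error handling, and result formatting removed. It follows the same flow as the full script above and uses the same internal lhdatamarts endpoint, which isn't part of Fabric's documented public REST API, so treat it as illustrative rather than guaranteed:

import time
import sempy.fabric as fabric

client = fabric.FabricRestClient()
workspace_id = spark.conf.get("trident.workspace.id")
lakehouse_id = spark.conf.get("trident.lakehouse.id")

# Step 1: fetch the SQL Analytics Endpoint id for the Lakehouse
lakehouse = client.get(f"/v1/workspaces/{workspace_id}/lakehouses/{lakehouse_id}").json()
endpoint_id = lakehouse['properties']['sqlEndpointProperties']['id']

# Step 2: trigger a metadata refresh on that endpoint
batch = client.post(
    f"/v1.0/myorg/lhdatamarts/{endpoint_id}",
    json={"commands": [{"$type": "MetadataRefreshExternalCommand"}]}
).json()

# Step 3: poll the batch until the sync leaves the 'inProgress' state
status = batch
while status["progressState"] == "inProgress":
    time.sleep(1)
    status = client.get(f"/v1.0/myorg/lhdatamarts/{endpoint_id}/batches/{batch['batchId']}").json()

print(f"Sync finished with state: {status['progressState']}")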
Credits:
Special thanks to Mark Pryce-Maher for his insightful post, which served as the foundation and inspiration for this video: https://medium.com/@sqltidy/delays-in-the-automatically-generated-schema-in-the-sql-analytics-endpoint-of-the-lakehouse-b01c7633035d