Azure Synapse – Data warehousing on Steroids
Building end-to-end analytics solutions represents a challenge for any company regardless of the size. IT departments end up managing multiple systems resulting in disconnected experiences, various query languages, and sometimes subpar performance.
Microsoft announced a couple of weeks ago, Azure Synapse Analytics, a limitless analytics service that blends data analytics, data warehousing, and data integration using a unified language and a robust security architecture.
Synapse is advertised as the “next generation of the Azure SQL Datawarehouse” which we can describe as “Data warehousing on steroids” because, at its core, Synapse uses a massively parallel processing database that has some of the following features:
- The UI allows and simplifies the creation of the pipelines required to connect heterogeneous systems that bring the data needed by your users.
- You can integrate data from different data lakes and retrieve results from petabytes of data in seconds. At Ignite, during the demo, it took 9 seconds to run a complex query against a Petabyte of data
- Synapse combines both relational and non-relational data, and it supports up to 10k concurrent users.
- It comes with native integration with Power-BI for users to gain the insights they need to make their business decisions.
Synapse integrates with Azure Data Share that makes internal and external data sharing a breeze. - Apache Spark and ANSI SQL are supported, but the real power comes with the integration between Synapse and Azure Machine Learning, which allow users to write complex models for predictions within the Synapse with a simple SQL query.
- According to GigaOm’s benchmark, Azure synapse is less expensive and faster than its main competitors
Related concepts:
Data Lakes and Data warehouses are terms that nowadays are used interchangeably; however, they have significant differences. Let’s add them to this article as a reference.
A data lake, a term credited to James Dixon describes, in simple words a vast pool of raw data, “a body of water in its natural state that flows from the source to the lake and that users have access to examine when needed and for various purposes.” A data warehouse is a pre-processed, structured, and filter repository of data for specific purposes.
Competitor’s products:
Note: I’m loyal to Microsoft, but I always add this section to understand the offering of the competitors
Google BigQuery
Amazon Redshift