This article looks at the new GA release of the Azure Synapse Analytics platform: where it came from, what it does, and why it offers new possibilities to help you realise the value from your data.
Every so often there comes a development in technology that changes the game, changes the way people work and brings new perspectives to the market that enable a way of working that just wasn’t possible before. It’s the much-clichéd paradigm shift, a phrase often scoffed at as meaningless corporate jargon, but in the world of big data processing a new paradigm is in fact beginning, heralded by the introduction of Synapse Analytics from Microsoft. This recent addition to the Azure platform is a new and exciting technology set that offers a mature, end-to-end modern analytics platform in one comprehensive suite of capabilities. Azure Synapse Analytics consolidates existing technologies into a single service that promises to deliver value from data faster and more effectively. Sound interesting? Read on.
The world of data warehousing at scale on the Microsoft platform has come a long way since Microsoft first acquired DATAllegro in 2008. Like many engineered solutions in the marketplace, it was originally a data warehouse solution that ran on commodity OEM hardware, and it brought Massively Parallel Processing (MPP) to the Microsoft platform. Back in those days (it’s amazing how 12 years seems like an eternity in today’s fast-moving technology landscape), SQL Server 2012 Parallel Data Warehouse (PDW), as it became, was shipped as an appliance: a data warehouse in a box that ran in your data centre.
Fast forward to the cloud era, and this technology, now named Azure SQL Data Warehouse, was made available in Azure as a platform as a service (PaaS). This removed the overhead of managing the hardware and introduced flexible, elastic compute to data warehousing workloads: customers no longer needed to buy massive monolithic appliances and hope that they had estimated their requirements well enough to exploit the technology without hitting its ceiling. With some significant performance upgrades in the second generation of SQL DW, Microsoft had developed a leading product in terms of cost and performance for processing vast quantities of data at speed. Customers could now govern their system’s use by the direct business benefit it drove rather than by the depreciation of physical hardware.
Now the latest generation, rebranded as Azure Synapse Analytics, offers a new approach to data warehousing and to the analytics process in general.
Other notable acquisitions that have bolstered Microsoft’s analytics capability include ProClarity in 2006, which gave us the origins of the decomposition tree visual in Power BI; Revolution Analytics in 2015, which brought us high-performance statistical data modelling in R and Python; and Datazen, also in 2015, which helped develop Microsoft’s mobile BI capabilities.
Released into general availability (GA) today is what is effectively the third generation of Synapse Analytics, and it provides a new end-to-end analytics platform as a single, unified cloud-based service. Microsoft already provided the key components needed for a modern data platform in Azure Data Lake Storage Gen2, Azure Data Factory and the Synapse Analytics data warehouse, but this new release of Synapse Analytics brings everything together, and more.
The most noticeable new element is Synapse Studio, which provides a single user interface to access data sources, pipelines and any code for analytics and transformation, all in one place. It is not all cosmetic, however. Looking a little deeper, there is now a choice of analytics runtimes, SQL and Apache Spark, allowing a greater architectural choice of approaches to solving business challenges. The SQL runtime is available either as dedicated provisioned resources or as a serverless option that is always available for ad hoc queries and unplanned workloads. Lastly, both SQL and Spark interoperate with the data lake, allowing ad hoc exploration and analysis of both structured and unstructured data in Parquet, CSV and JSON file formats. Together, this brings everything needed to develop a modern data platform into a single unified environment.
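To make the ad hoc exploration of data lake files a little more concrete, here is a minimal sketch in Python that builds the kind of T-SQL statement a serverless SQL pool can run against files in the lake using OPENROWSET. The helper name openrowset_query, the storage account, the container and the folder path are all hypothetical and are only there for illustration; the sketch covers Parquet and CSV only and does not connect to anything.

```python
def openrowset_query(url: str, file_format: str, top: int = 10) -> str:
    """Build a T-SQL query that reads data lake files via OPENROWSET.

    Hypothetical helper for illustration; it only assembles text.
    """
    options = {
        "PARQUET": "FORMAT = 'PARQUET'",
        "CSV": "FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE",
    }
    fmt = file_format.upper()
    if fmt not in options:
        raise ValueError(f"unsupported format: {file_format}")
    if "'" in url:
        # Keep the generated statement well-formed.
        raise ValueError("url must not contain single quotes")
    return (
        f"SELECT TOP {int(top)} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    {options[fmt]}\n"
        f") AS result;"
    )


# Hypothetical storage account, container and folder.
sql = openrowset_query(
    "https://examplelake.dfs.core.windows.net/sales/2020/*.parquet", "parquet"
)
print(sql)
```

The resulting statement could be pasted into a SQL script in Synapse Studio, assuming the files exist and the user has access to them.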
Separation of compute and storage
The way Synapse Analytics is architected is another key feature that sets it apart. Compute and storage are functionally separated, meaning that you only use (and therefore pay for) the compute services as and when you need them, and they can be paused when they are not in use. This is especially valuable for analytics, where workload demands are often spiky.
In addition, Synapse makes use of technologies like PolyBase that allow data to be accessed through external tables, greatly increasing the overall scalability of the solution. Storage, when using Azure Data Lake Storage, becomes practically limitless, removing the challenge of investing capex in storage for future demand. This dynamic management of costs will make large-scale analytics much more accessible to many businesses, which is particularly important as the world’s data boom continues.
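A rough back-of-the-envelope sketch shows why pausing compute matters for spiky workloads. The hourly rate below is a made-up figure, not a real Azure price, and the function name monthly_compute_cost is hypothetical; storage is billed separately and is left out of the calculation.

```python
def monthly_compute_cost(hourly_rate: float, active_hours_per_day: float, days: int = 30) -> float:
    """Cost of compute that is only billed while it is running."""
    if not 0 <= active_hours_per_day <= 24:
        raise ValueError("active_hours_per_day must be between 0 and 24")
    return hourly_rate * active_hours_per_day * days


HYPOTHETICAL_RATE = 10.0  # currency units per hour, illustrative only

always_on = monthly_compute_cost(HYPOTHETICAL_RATE, 24)
paused_out_of_hours = monthly_compute_cost(HYPOTHETICAL_RATE, 8)
saving = 1 - paused_out_of_hours / always_on

print(f"always on: {always_on:.0f}")
print(f"running 8 hours a day: {paused_out_of_hours:.0f}")
print(f"compute saving: {saving:.0%}")
```

Under these assumptions, running compute for eight hours a day rather than around the clock cuts the compute bill by two thirds, while the data itself stays in place in storage.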
Simply put, Synapse Analytics is fast, no, very fast at returning queries on your data. One of the main reasons is its Massively Parallel Processing (MPP) engine, where query execution is distributed over many compute nodes that run in parallel, with the results collated by a central control node. The data needed for those calculations can also be distributed, allowing queries to be optimised further and enhancing performance for very large-scale data warehousing workloads. Moreover, the nodes in Synapse Analytics can be scaled up or down to meet changing performance demands. Need higher performance at busy load times? Simply turn up the dial and more power is used to run the queries.
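The pattern described above can be illustrated with a small, self-contained Python sketch. It is a toy model of the idea, not how Synapse is implemented: rows are hash-distributed on a key, each "compute node" aggregates only its own rows, and a "control node" function collates the partial results. All function names are hypothetical, and Python threads stand in for real compute nodes. The figure of 60 distributions matches how dedicated SQL pools spread table data.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_DISTRIBUTIONS = 60  # dedicated SQL pools spread table data over 60 distributions


def distribution_for(key: str) -> int:
    """Deterministic hash so a given key always lands in the same distribution."""
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def distribute(rows):
    """Hash-distribute (customer, amount) rows on the customer column."""
    buckets = defaultdict(list)
    for customer, amount in rows:
        buckets[distribution_for(customer)].append((customer, amount))
    return buckets


def partial_sum(bucket):
    """Work done by one compute node: aggregate only its local rows."""
    totals = defaultdict(float)
    for customer, amount in bucket:
        totals[customer] += amount
    return dict(totals)


def run_query(rows, workers=4):
    """Control-node role: fan the work out, then collate the partial results."""
    buckets = distribute(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, buckets.values()))
    result = {}
    for partial in partials:
        # Safe to merge directly: each customer lives in exactly one distribution.
        result.update(partial)
    return result


rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0), ("bob", 1.0)]
totals = run_query(rows)
print(totals)
```

Because the rows are distributed on the same column that the query groups by, no data has to move between nodes before aggregation, which is the kind of optimisation that distributing data well makes possible.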
As data professionals, we typically build data warehouses to provide a single source of truth for an organisation. This helps provide a consistent and universally understood language to describe business performance and solve business challenges with data. The trend we have seen in recent years is the continuation of this aim, but with the convergence of the data warehouse, data lake and data integration, all now happening in the cloud. Having access to all of your data and analytics from a single platform to support multiple business needs is the ultimate aim, as the insights unlocked by data science initiatives coexist with the analytics that inform business as usual.
Doing this in a single platform is harder than it sounds, as highly curated relational data has to be brought together with a broader array of variable and semi-structured data and joined meaningfully. Over and above this, the data platform typically needs to serve multiple cohorts of data practitioners in an organisation, such as data engineers, analysts, data scientists and data journalists, all with their own skillsets and requirements of that platform. These demands are often quite opposed to each other, and providing a single toolset with the capability to support them all can be very difficult. This is where the Azure Synapse Analytics platform, with its unified interface and flexible delivery options, has really delivered.
At Hitachi Solutions, we’ve been providing modern data platform solutions for our customers for many years. Using the benefits of the cloud to scale up and out for performance whilst controlling costs is central to our modern data platform accelerator, and since Synapse Analytics was announced in late 2019, we have been working to update our delivery framework and tools to the Synapse Analytics platform. This means that we can continue to help our customers get more value and insight from their data, and do it even more quickly, with projects often taking weeks to run rather than months or years. Very often we work alongside our customers’ internal BI teams, helping them to adapt to the new technology or to learn new ways of working. There is definitely a journey here for most Synapse users. You don’t have to be an expert in Python data analytics to use Synapse, and choosing SQL is perfectly valid. What is important is selecting the right technology within the Synapse platform to meet your requirements and then adopting best-practice designs to implement a robust solution that delivers true business value.
We also offer a one-day course that lets attendees get hands-on with the new technology and learn best practices from our team of experts. We’re one of a select group of Partners working with Microsoft to deliver this training on their behalf, so if this sounds like something that your business would benefit from, then please get in touch.