Digital Hive

A comparison of Spark pools in Synapse and Fabric

Updated: Oct 16

In this blog post we compare the main features of Spark pools on each platform, to help you decide which option best fits your needs.


Synapse is a Platform-as-a-Service (PaaS) offering that combines formerly standalone Azure services, including Azure Data Factory (ADF), KQL (Azure Data Explorer), Data Lake, Apache Spark and Azure SQL Data Warehouse. These services are oriented around big data and data warehousing. Synapse provides one experience for ingesting, transforming, managing and serving data to other Azure services such as Power BI and Azure ML.


Microsoft Fabric is Microsoft's latest Software-as-a-Service (SaaS) offering. It encompasses the functionality of Synapse and builds on it, bringing together analytics and management tools such as Power BI, Azure ML, Purview and the latest AI features via Copilot. Fabric also comes with a novel data storage solution, OneLake, which serves as the single source of truth for all Fabric services.


Capacities, SCUs, vCores, SKUs and Fs 

Whatever process you want to run, whether on a Spark pool or on any other form of compute, you need capacity. To compare how capacity is measured on each platform, we need to understand what SCUs, vCores, SKUs and Fs are and how they relate to each other.


SCUs (Synapse Commit Units) are specific to Synapse Analytics. They represent a combination of CPU, memory, and I/O resources that can be purchased and work like credits. 


vCores are used across various Azure services to measure compute power and represent a virtual CPU core. 


SKUs (Stock Keeping Units), called Fs in Fabric, define a specific configuration of compute resources, including vCores and memory. In the end, an F SKU is simply a direct representation of CUs (Capacity Units): an F64, for example, provides 64 CUs. The following table shows how these terms relate to each other.


Fabric 

Reserved 

In Fabric you can either reserve capacity or pay as you go. Reserving capacity saves about 41% on compute costs compared to pay-as-you-go. However, a reservation is billed for every hour whether you use it or not, while a pay-as-you-go capacity can be paused when it is idle. As a result, if you need compute for less than roughly 60% of a month, pay-as-you-go is the cheaper option.
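The break-even point follows directly from the two hourly rates: a reservation is paid for every hour, while a paused pay-as-you-go capacity is paid only while it runs. The sketch below uses the F2 prices from the pricing table and assumes a 730-hour month (our assumption, not from the pricing page); the function names are hypothetical, for illustration only.

```python
"""Break-even utilization between Fabric pay-as-you-go and a reservation (sketch)."""

# Assumption: an average month of 730 hours.
HOURS_PER_MONTH = 730

# F2 prices in EUR per hour, taken from the pricing table.
F2_PAYG_RATE = 0.407
F2_RESERVED_RATE = 0.242


def monthly_cost_payg(hourly_rate: float, utilization: float) -> float:
    """Pay-as-you-go cost when the capacity runs for `utilization` of the month
    and is paused (not billed) the rest of the time."""
    return hourly_rate * HOURS_PER_MONTH * utilization


def monthly_cost_reserved(hourly_rate: float) -> float:
    """A reservation is billed for every hour of the month, used or not."""
    return hourly_rate * HOURS_PER_MONTH


def break_even_utilization(payg_rate: float, reserved_rate: float) -> float:
    """Fraction of the month above which the reservation becomes cheaper."""
    return reserved_rate / payg_rate


threshold = break_even_utilization(F2_PAYG_RATE, F2_RESERVED_RATE)
print(f"Break-even utilization: {threshold:.1%}")

reserved = monthly_cost_reserved(F2_RESERVED_RATE)
for utilization in (0.25, 0.50, 0.75):
    payg = monthly_cost_payg(F2_PAYG_RATE, utilization)
    cheaper = "pay-as-you-go" if payg < reserved else "reservation"
    print(f"{utilization:.0%}: PAYG €{payg:.2f} vs reserved €{reserved:.2f} -> {cheaper}")
```

Because every F SKU in the table has the same ~41% gap between the two rates, the threshold works out to about 59% for all of them, which is where the "roughly 60%" rule of thumb comes from.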

| SKU | Capacity units (CU) | vCores (Power BI v-cores) | Pay-as-you-go | Reservation (~41% savings) |
|---|---|---|---|---|
| F2 | 2 | 0.25 | €0.407/hour | €0.242/hour |
| F4 | 4 | 0.50 | €0.814/hour | €0.484/hour |
| F8 | 8 | 1 | €1.628/hour | €0.968/hour |
| F16 | 16 | 2 | €3.256/hour | €1.936/hour |
| F32 | 32 | 4 | €6.511/hour | €3.872/hour |
| F64 | 64 | 8 | €13.021/hour | €7.743/hour |
| F128 | 128 | 16 | €26.041/hour | €15.485/hour |
| F256 | 256 | 32 | €52.081/hour | €30.970/hour |
| F512 | 512 | 64 | €104.162/hour | €61.939/hour |
| F1024 | 1024 | 128 | €208.323/hour | €123.878/hour |
| F2048 | 2048 | 256 | €416.646/hour | €247.756/hour |

For the most up-to-date prices, see the Fabric pricing page: https://azure.microsoft.com/en-us/pricing/details/microsoft-fabric/ 
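The number in an F SKU name is its CU count, so the SKU can be parsed directly. Note that the vCores column in the table follows Microsoft's Power BI v-core equivalents (one per 8 CUs). For Spark, Microsoft's Fabric documentation (at the time of writing) equates one CU to two Spark vCores, so an F64 provides 128 Spark vCores. The sketch below encodes that 2-per-CU ratio as an assumption you should verify against the current documentation; the helper names are hypothetical.

```python
import re

# Assumption (verify against current Microsoft docs): 1 CU = 2 Spark vCores.
SPARK_VCORES_PER_CU = 2

# Accepts both "F64" and the "F 64" spelling.
_SKU_PATTERN = re.compile(r"F\s*(\d+)", re.IGNORECASE)


def capacity_units(sku: str) -> int:
    """Return the number of capacity units (CUs) encoded in an F SKU name."""
    match = _SKU_PATTERN.fullmatch(sku.strip())
    if match is None:
        raise ValueError(f"not a Fabric F SKU: {sku!r}")
    return int(match.group(1))


def spark_vcores(sku: str) -> int:
    """Spark vCores available on the capacity under the 2-per-CU assumption."""
    return capacity_units(sku) * SPARK_VCORES_PER_CU


for sku in ("F2", "F 64", "F2048"):
    print(f"{sku}: {capacity_units(sku)} CU, {spark_vcores(sku)} Spark vCores")
```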


Starter Pool 

When you run a notebook without a configured Spark pool, it will default to the Spark configuration and runtime environment provided by Fabric. This means you won't be able to customize the Spark version, node size, or other configuration options. 


Additionally, certain features such as automatic pausing, high concurrency, and concurrency limits may be unavailable or function differently without a configured Spark pool. 


Custom Spark Pool 

A custom Spark pool allows users to specify dependencies, choose node sizes, and enable autoscaling, automatic pausing and dynamic allocation of executors based on Spark job requirements. When autoscaling is enabled, the pool acquires new nodes up to the maximum node count specified by the user and retires them after the job has run. Dynamic allocation requests an optimal number of executors based on the data volume, for better performance.
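Autoscaling and the node limits are set on the pool itself, but dynamic allocation is driven by standard Apache Spark properties (`spark.dynamicAllocation.*`). As a minimal sketch, the hypothetical helper below builds those properties and serializes them in the shape of a `%%configure` session cell. We assume your notebook environment accepts a `conf` object in that cell; whatever you request is still capped by the pool's maximum node count.

```python
import json
from typing import Dict, Optional


def dynamic_allocation_conf(min_executors: int, max_executors: int,
                            initial_executors: Optional[int] = None) -> Dict[str, str]:
    """Build Apache Spark dynamic-allocation properties (values as strings)."""
    if initial_executors is None:
        initial_executors = min_executors  # start small and let Spark scale up
    if not 0 <= min_executors <= initial_executors <= max_executors:
        raise ValueError("expected 0 <= min <= initial <= max executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.initialExecutors": str(initial_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


# Body for a %%configure cell (assumed format: {"conf": {...}}).
body = {"conf": dynamic_allocation_conf(min_executors=1, max_executors=4)}
print(json.dumps(body, indent=2))
```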


Synapse 

SCUs 

Synapse also offers a discount on compute when you prepay for it in the form of SCUs, saving up to 28% on compute costs. For example, if you buy 5000 SCUs for €4,346.22, the SCUs are then spent as if they were the currency you would otherwise pay with under pay-as-you-go. Because you acquired the SCUs at a lower rate (a 6% discount for this tier), the same compute ends up cheaper.


Keep in mind that the SCUs expire after 12 months if they are not used. 


SCUs can be used for the following Synapse services:

  • Azure Synapse Analytics Dedicated SQL Pool 

  • Azure Synapse Analytics Managed VNET 

  • Azure Synapse Analytics Pipelines 

  • Azure Synapse Analytics Serverless SQL Pool 

  • Azure Synapse Analytics Serverless Apache Spark Pool 

  • Azure Synapse Analytics Data Flow - Basic 

  • Azure Synapse Analytics Data Flow - Standard 

| Tier (SCUs) | Discount % | Price |
|---|---|---|
| 5000 | 6% | €4,346.218 |
| 10000 | 8% | €8,507.491 |
| 24000 | 11% | €19,752.174 |
| 60000 | 16% | €46,606.252 |
| 150000 | 22% | €108,193.084 |
| 360000 | 28% | €239,689.292 |

For the most up-to-date prices, see: Pricing - Azure Synapse Analytics | Microsoft Azure 
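The tier table can be checked against the idea that SCUs work like currency. Undoing each tier's discount gives the pay-as-you-go value that one SCU covers, and it comes out the same for every tier, about €0.9247. A workload that would cost €100 at pay-as-you-go therefore consumes roughly 108 SCUs, which you paid 6% to 28% less for depending on the tier. The sketch below reproduces that arithmetic from the table's numbers; the helper names are hypothetical.

```python
# (SCUs, discount, price in EUR) for each tier, copied from the table.
SCU_TIERS = [
    (5_000, 0.06, 4_346.218),
    (10_000, 0.08, 8_507.491),
    (24_000, 0.11, 19_752.174),
    (60_000, 0.16, 46_606.252),
    (150_000, 0.22, 108_193.084),
    (360_000, 0.28, 239_689.292),
]


def payg_value_per_scu(scus: int, discount: float, price: float) -> float:
    """Pay-as-you-go spend that one SCU covers: the tier price before discount."""
    return price / (1 - discount) / scus


def cost_with_scus(payg_cost: float, discount: float) -> float:
    """What a workload priced at `payg_cost` really costs when paid with SCUs."""
    return payg_cost * (1 - discount)


for scus, discount, price in SCU_TIERS:
    value = payg_value_per_scu(scus, discount, price)
    print(f"{scus:>7} SCUs: paid €{price / scus:.4f}/SCU, worth €{value:.4f} at pay-as-you-go")

print(f"€100 of pay-as-you-go compute on the 5000 tier costs €{cost_with_scus(100, 0.06):.2f}")
```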


Pay-as-you-go 

Per unit of compute, pay-as-you-go is more expensive than paying with SCUs. However, it can be the better choice in two situations. The first is when you are setting up a new project and are still figuring out how much compute your processes will need. The second is when you know you will consume less than 5000 SCUs' worth of compute a year: since the smallest SCU purchase is 5000, you would end up with unused SCUs, which expire after 12 months.
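The second situation can be made more precise. Buying the smallest tier costs the pay-as-you-go value of 5000 SCUs minus 6%, so pay-as-you-go stays cheaper as long as your yearly usage is below 5000 x (1 - 0.06) = 4700 SCUs' worth of compute. A minimal sketch, using the 5000-SCU tier price from the table and its implied pay-as-you-go value per SCU; the names are hypothetical, and usage above 5000 SCUs (where larger tiers apply) is out of scope.

```python
# Smallest SCU tier, taken from the table.
TIER_SCUS = 5_000
TIER_DISCOUNT = 0.06
TIER_PRICE_EUR = 4_346.218

# Pay-as-you-go value of one SCU implied by the tier price (about EUR 0.9247).
EUR_PER_SCU = TIER_PRICE_EUR / (1 - TIER_DISCOUNT) / TIER_SCUS


def payg_cost(usage_scus: float) -> float:
    """Yearly pay-as-you-go cost for usage expressed in SCU-equivalents."""
    return usage_scus * EUR_PER_SCU


def break_even_usage() -> float:
    """Usage above which buying the 5000-SCU tier beats pay-as-you-go."""
    return TIER_PRICE_EUR / EUR_PER_SCU  # equals TIER_SCUS * (1 - TIER_DISCOUNT)


def cheaper_option(usage_scus: float) -> str:
    """Compare pay-as-you-go with buying the smallest tier for one year of usage."""
    if usage_scus > TIER_SCUS:
        raise ValueError("above 5000 SCUs, compare against the larger tiers")
    return "pay-as-you-go" if payg_cost(usage_scus) < TIER_PRICE_EUR else "5000-SCU tier"


print(f"Break-even usage: {break_even_usage():.0f} SCUs per year")
for usage in (3_000, 4_500, 4_900):
    print(f"{usage} SCUs/year -> {cheaper_option(usage)}")
```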

| Type | Price |
|---|---|
| Memory Optimized | €0.143 per vCore-hour |
| GPU accelerated (public preview) | €0.157 per vCore-hour |

 

Node size options when creating a Spark pool (memory optimized, prices for a single node):

| Node size (Memory Optimized) | Price/hour | Price/month |
|---|---|---|
| Small (4 vCores / 32 GB) | €0.57 | €417.34 |
| Medium (8 vCores / 64 GB) | €1.14 | €834.69 |
| Large (16 vCores / 128 GB) | €2.29 | €1,669.37 |
| XLarge (32 vCores / 256 GB) | €4.57 | €3,338.75 |
| XXLarge (64 vCores / 432 GB) | €9.15 | €6,677.50 |

 

For the most up-to-date prices, see the Synapse pricing page: https://azure.microsoft.com/en-us/pricing/details/synapse-analytics/ 
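The node table prices a single node, while a Synapse Spark pool runs between 3 and 200 nodes (per the feature comparison further down). A minimal sketch of the resulting pool cost, assuming the rounded €0.143 per vCore-hour memory-optimized rate and a 730-hour month (our assumption, which appears consistent with the table's monthly figures). Because the rate is rounded, results can differ from the table by a cent; the names are hypothetical.

```python
# vCores per memory-optimized node size, from the node-size table.
NODE_VCORES = {"Small": 4, "Medium": 8, "Large": 16, "XLarge": 32, "XXLarge": 64}

PRICE_PER_VCORE_HOUR_EUR = 0.143  # memory optimized, pay-as-you-go (rounded)
HOURS_PER_MONTH = 730             # assumption: average month length
MIN_NODES, MAX_NODES = 3, 200     # Synapse pool limits from the comparison table


def node_cost_per_hour(node_size: str) -> float:
    """Hourly price of a single node of the given size."""
    return NODE_VCORES[node_size] * PRICE_PER_VCORE_HOUR_EUR


def pool_cost_per_hour(node_size: str, nodes: int) -> float:
    """Hourly price of a pool running `nodes` nodes of one size."""
    if not MIN_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"a Synapse Spark pool needs {MIN_NODES}-{MAX_NODES} nodes")
    return nodes * node_cost_per_hour(node_size)


hourly = pool_cost_per_hour("Small", 3)
print(f"3 x Small: €{hourly:.3f}/hour, €{hourly * HOURS_PER_MONTH:.2f}/month if never paused")
```

The monthly figure assumes the pool never pauses; in practice auto pause stops billing after the configured idle time, so real costs are usually lower.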

 

The Pools 

Having looked at pricing, we can now zoom in on the features that distinguish Spark pools in Synapse from those in Fabric. Fabric offers both Starter and Custom pools. Synapse Spark pools, on the other hand, are exclusively custom pools, which requires some educated choices about node sizes and scale depending on the jobs. Fabric supports high concurrency, which Synapse does not, while Synapse has a configurable auto pause, whereas Fabric's auto pause duration is fixed. Additionally, Fabric's Spark pools benefit from novel features such as V-Order and Spark autotune, which are not available in Synapse.

| Feature | Azure Synapse Spark | Fabric Spark |
|---|---|---|
| Spark Pool Types | Custom pool | Starter pool, Custom pool |
| Spark Versions (runtime) | 2.4, 3.1, 3.2, 3.3, 3.4 | 3.3, 3.4, 3.5 (experimental) |
| Autoscaling | Yes | Yes |
| Dynamic Allocation of Executors | Yes, up to 200 nodes | Yes, based on capacity |
| Adjustable Node Count | Yes, 3-200 nodes | Yes, from 1 node up to a capacity-based maximum |
| Node Size Family | Memory Optimized, GPU Accelerated | Memory Optimized |
| Node Sizes | Small to XXXLarge | Small to XXLarge |
| Auto Pause | Yes, configurable (minimum 5 minutes) | Yes, fixed at 2 minutes |
| High Concurrency | No | Yes |
| V-Order | No | Yes |
| Spark Autotune | No | Yes |
| Concurrency Limits | Fixed | Variable, based on capacity |
| Multiple Spark Pools | Yes | Yes (environments) |
| Intelligent Cache | Yes | Yes |
| API/SDK Support | Yes | No |
| Primary Storage | ADLS Gen2 | OneLake |
| Notebook Languages | Python, Scala, Spark SQL, R, .NET | Python, Scala, Spark SQL, R |
| Notebook Concurrency | No | Yes |
| Pipeline Activity Support | Yes | Yes |
| Built-in Scheduled Runs | No | Yes |
| Retry Policies | No | Yes |

 

Conclusion 

In the end, choosing between Spark pools in Synapse and Fabric depends on the needs of your project and its cost considerations. Synapse is tailored for users who need a comprehensive, integrated analytics service with flexible compute options via SCUs. Fabric, with its enhanced features and unified storage solution, caters to those looking for an all-encompassing analytics platform that integrates advanced AI and management tools.


Both platforms offer significant advantages, and understanding their distinct features and pricing models can help you make an informed decision that aligns with your organizational goals and budget constraints. Fabric, as the newer platform, is the more expensive option for similar amounts of compute. However, using Fabric leaves you better prepared for the future: Microsoft's focus is now fully on Fabric, and there will be very little new development in Synapse.


Still feeling unsure about your Microsoft setup? We can help! Contact us for advice about a new setup, or to take a look at your existing configuration and cloud costs.

Mathieu Dessers, CRM Business Unit Manager at Digital Hive



Aslan Hattukai

Data Engineer

 
