Wednesday, April 8, 2015

Database Scalability simplified with Microsoft Azure Elastic Scale (Preview) for SQL Azure

Database Multi Tenancy in enterprises is a less known fact, but it’s a must and very familiar topic among SaaS Application developers. It’s also very hard to develop and manage Multi-Tenant application & database infrastructure.
SaaS Developers usually build custom Sharding architectures because of the deep limitation and inflexibility of SQL Server Federation services. Custom Sharding and database scalability architectures are more manual and consists lots of moving parts. One such manual architecture usually suggested Application Architects are creating fixed set of ideal schema/DB and manage a Master Connection string table to manage the mappings between the Sharding ID/Customer ID with the right shard (Refer Illustration below).

As such, there’s nothing wrong in the architecture above, but there are lots of challenges imposed by it such as

1. Infrastructure Challenges

  • a)Maintaining and Managing Shard Meta DB infra-structure
  • b)Splitting 1 noisy customer data from one shard to another shard
  • c)Scaling up a particular shard on need basis
  • d)Querying data from multi shard Databases
  • e)Merging shards to cut down costs

2. OLAP Challenges (Database Analysis & Warehousing)

  • a)Developers will not be able to issue single query to fetch data from 2 different shards
  • b)Conducting Data Analysis is hard

Introducing Azure Elastic Scale

Azure introduced “SQL Azure Elastic Scale SDK” to overcome these challenges. To get started with Azure Elastic scale, you can download the Sample App which will be a good starting point to understand the SDK inside out.  Azure Elastic Scale has 3 Key APIs that makes the Sharding simple, they are
  1. Shard Map Manager (SMM)
  2. Data Dependent Routing (DDR)
  3. Multi Shard Query(MSQ)
  4. Split & Merge

Shard Map Manager

This is the Key part of the SQL Azure Elastic Sharding function. Shard Map Manager is essentially a Master Database to hold the shard mapping details along with shard range key (Customer ID/Dept. ID/Product ID) (Refer the below screenshot).  When you add/remove shard, it creates/removes an entry into the SMM database.

Data Dependent Routing

After the shard has been defined in the Sharding Manager DB, Data Dependent Routing API takes care of routing the customer request to the appropriate shard. Data dependent routing takes the shard ID to identify the right shard of the specific customer. Shards can also span Azure regions which helps us to keep the shard very next to the customer and reduce the network latency as well as compliancy requirement.
DDR also cache the shard details received from SMM db to avoid unwanted round trip on each request. However, this cache will be invalidated as when you change the shard details.

Multi Shard Query

Multi Shard Query is the Key API which allows the querying of data from multiple Shards and helps us in joining the result set. Under the hood, it actually queries the data individually from the different shards and applies join on the received result set to get a unified data in return. Multi Shard Query is only ideal when you have ideal schema and suitable for custom schema in the shards.  Click here to view the complete list of APIs that’s part of the Microsoft.Azure.SqlDatabase.ElasticScale namespace.


public static void ExecuteMultiShardQuery(RangeShardMap shardMap, string credentialsConnectionString)
            // Get the shards to connect to
            IEnumerable shards = shardMap.GetShards();

            // Create the multi-shard connection
            using (MultiShardConnection conn = new MultiShardConnection(shards, credentialsConnectionString))
                // Create a simple command
                using (MultiShardCommand cmd = conn.CreateCommand())
                    // Because this query is grouped by CustomerID, which is sharded,
                    // we will not get duplicate rows.
                    cmd.CommandText = @"
                            c.Name AS CustomerName, 
                            COUNT(o.OrderID) AS OrderCount
                            dbo.Customers AS c INNER JOIN 
                            dbo.Orders AS o
                            ON c.CustomerID = o.CustomerID
                        GROUP BY 
                        ORDER BY 

                    // Append a column with the shard name where the row came from
                    cmd.ExecutionOptions = MultiShardExecutionOptions.IncludeShardNameColumn;

                    // Allow for partial results in case some shards do not respond in time
                    cmd.ExecutionPolicy = MultiShardExecutionPolicy.PartialResults;

                    // Allow the entire command to take up to 30 seconds
                    cmd.CommandTimeout = 30;

                    // Execute the command. 
                    // We do not need to specify retry logic because MultiShardDataReader will internally retry until the CommandTimeout expires.
                    using (MultiShardDataReader reader = cmd.ExecuteReader())
                        // Get the column names
                        TableFormatter formatter = new TableFormatter(GetColumnNames(reader).ToArray());

                        int rows = 0;
                        while (reader.Read())
                            // Read the values using standard DbDataReader methods
                            object[] values = new object[reader.FieldCount];
                            // Extract just database name from the $ShardLocation pseudocolumn to make the output formater cleaner.
                            // Note that the $ShardLocation pseudocolumn is always the last column
                            int shardLocationOrdinal = values.Length - 1;
                            values[shardLocationOrdinal] = ExtractDatabaseName(values[shardLocationOrdinal].ToString());
                            // Add values to output formatter

                        Console.WriteLine("({0} rows returned)", rows);

Download the sample application from here.

Elastic Scale Split & Merge

As the name suggests, it helps the developers to Split or Merge the DB shards based on the needs. There are 2 Key scenarios when you would need Split & Merge functionality. They are
  1. Moving data from heavily growing hotspot database to a new Shard
  2. Merge 2 databases if the size of the provisioned DB is less to reduce the database cost.
Split & Merge is a combination .Net APIs + Web API + Powershell package. Refer the below links for an introduction and Step by Step Implementation Guide.

Step by Step Guide:

Note: This post is originally posted at 8KMiles blog on April 08, 2104.