Stretch Database – Remote Data Archiving

Introduction

One of the many new features of SQL Server 2016 is called a Stretch Database, sometimes referred to by the technical term Remote Data Archiving (RDA). In a nutshell, this feature allows you to copy data to the Microsoft Azure Cloud. It can be useful for data archiving by reducing the amount of data local to the database, while at the same time making that data readily available for querying.

Stretch DB is available for all editions of SQL Server 2016 and requires that you have an active Azure Cloud subscription in order to work. It’s mainly designed for offloading large amounts of “cold data” out of just a few large tables. Auditing and logging tables are a good example, as they can get very large. If you have certain business requirement that prevents you from properly maintaining these tables, Stretch DB may be the appropriate solution for you.

The easiest way to verify if you can use Stretch DB and determine which tables would qualify is to use the Stretch DB Analyzer, which can be accessed by following these steps:

  • Make sure you are on SQL 2016 and have the latest SMSS Installed
  • Right click on your database
  • Tasks
  • Stretch
  • Enable

This will engage the wizard which will walk you through the steps of setting up the Stretch DB as well as run the analyzer and give you a report which tables are eligible for archiving.

Table Requirements

Not all tables will be supported by Stretch DB, in fact you may find the opposite to be true. Again, the main purpose is to archive cold data. As such, there are a handful of limitations of Stretch DB you need to be aware of.  As you go through the list of the tables in the Analyzer, you’ll see some tables which have exclamation marks on them. This will tell you that maybe you have a constraint that will not be enforced, such as Unique or Primary Key. It may also have a red X, which means the table doesn’t qualify.  Here’s an elaborate list of Stretch DB Limitations:

  • Constraint Limitations
    • UNIQUE and Primary Key constraints ARE NOT enforced on the archived data in the Azure cloud.
  • INSERT, UPDATE, DELETE Limitations
    • Rows cannot be UPDATED or DELETED if they’re:
      • In the cloud,
      • Eligible for migration in a table that has Stretch enabled,
      • In a view of such table(s).
    • Cannot INSERT rows in a Stretched table on Linked server.
  • Index Limitations
    • Cannot create an index on a view of stretched tables
    • Filters (filtered indexes) are not applied to remote tables

Limitations above will still allow you to use Stretch albeit with limited functionality. The following set of limitations prevents Stretch from working altogether, so if your tables fall in these categories, you will not be able to enable Stretch on them.

  • Tables with:
    • More than 1023 columns or more than 998 indexes
    • Tables containing FILESTREAM data
    • Replicated tables, or tables using Change Tracking or Change Data Capture
    • Memory-optimized tables
  • Data types:
    • Text, ntext and image
    • Timestamp
    • Sql_variant
    • Xml
    • CLR data types, as well as geometry and geography types
  • Column Types:
    • COLUMN_SET
    • Computed Columns
  • Constraints
    • DEFAULT
    • CHECK
    • FOREIGN KEY
      • Stretch CAN be enabled for child table but not the parent table!
    • Indexes
      • Full Text
      • XML
      • Spatial
      • Indexed Views referencing the table

 

Phew! Alright, that does look like a lot of limitations and almost makes the feature seem unusable. Just remember that the whole purpose of the feature is to reduce the size of your biggest tables that contain cold data, such as audit and log tables. If your business plan allows you to purge these tables instead, that would be the better approach. Otherwise, if you have to keep them, you have an option of using Stretch DB or archiving that data locally in another database. The downside of local archiving is that it may require code changes to grab data from different locations.

Implementation

So let’s say that, after all those limitations you have indeed found tables which are eligible and can benefit from Stretch. How do we set it up? We can use the same wizard as before, since it’s intended to not only analyze the database, but also turn on the Stretch functionality.

Before you start, make sure you have an active Azure subscription, as you’ll need to log in during the setup through the wizard.

In your table list, pick the table(s) for which you wish to enable Stretch. In my example, I’m choosing the [expedite_audit_tbl] which can get very large. Suppose that a business decision requires 2 years of data in this table; however, it’s 30GB. This 30GB table is making our maintenance jobs run longer, and our backups larger. While we can archive it to another database, code in the application would have to change in order to find the archived data. We won’t have this problem with Stretch, as it will handle the linking for us. To the application, this will look like the same old [expedite_audit_tbl] it always expected to see.

stretchdb_tableselection

Notice that the table has an exclamation point next to it. If we click on it, it tells us that [expedite_audit_ident] column is either part of a PK or UQ Index and will not be enforced on Azure end. For our purposes, we’ll say that’s fine. You may want to consider your business needs and decide if this limitation will be a showstopper.

On the next screen, you’ll be asked to log in with your Microsoft account. Once you log in, your subscription(s) will be shown to you and you can select it from the list if you have multiple ones.

You get a choice of either selecting an existing SQL server or creating a new one. Either method is fine. If you prefer to stop here, sign in to your Azure cloud and spin up a new SQL server, you can certainly do that. Then just simply provide the settings of an existing server. If you’d like the wizard to create all that for you, we’ll simply fill in the blanks of the Create New Server fields.

stretchdb_server

Whether you’re creating a server in the wizard or directly on the Azure dashboard, try to pick the location that best matches your physical location. In my case, East US would be the best option as I’m located in Ohio.

After you’ve filled out your desired credentials, the next step would be to either create or unlock a Database Encryption Key (DMK). I’ve already had a key created and protected by a password, which I simply type in. If you have never created a DMK before, you’ll be able to do that now and provide a password.

Next, there are important settings related to the Firewall. Azure cloud has the firewall enabled by default and you’ll need to specify your server’s IP or a range that you want to open up on the Cloud side. You have an option of either using the IP of your SQL server or a range of IPs. An example of that range would be 10.10.131.160 – 10.10.131.235

Lastly you’ll be presented with the summary. Let’s have a look:

stretchdb_summar

I covered up some of the more sensitive information but you can basically see all of your settings, ip ranges, and even estimated pricing. Once you click next, it will take a little while for the server to be spun up, created and Stretch DB to start. If you run into errors during creation of the server, you can always opt for the 2nd method, which is to create the resources directly from your Azure Portal.

The Wizard certainly makes the configuration simple and straight forward. If this is your first time configuring the Stretch DB, I would recommend going this route. If you run into any errors or you prefer to do this manually, see the following article that describes how to enable Stretch DB without the wizard: https://msdn.microsoft.com/library/mt163698.aspx

Stretching Selected Rows

So far, we’ve seen how to stretch the whole table, and configure the Azure Portal using the Stretch DB Wizard. You can also stretch specific rows of the table, instead of the entire table. This works by writing a predicate function which you’ll use to filter the rows. For example, we can do filtering by date.

Let’s look at a function that will do just that:

-- Predicate Function

CREATE FUNCTION dbo.StretchFn (@archive_date datetime)

RETURNS TABLE

WITH SCHEMABINDING

AS

RETURN

       SELECT 1 AS stretched

       WHERE @archive_date < '01/01/2016'

GO

We’re making a table function, we’re also making it schema-bound and we won’t be able to alter it once created. This is one of few limitations of using the predicate filter functions. We’re using a FIXED date value. This is due to another limitation which is that the expressions in the function must be deterministic, or in other words, values must be known at runtime. It would be nice to use something like ‘GETDATE()-180’ to allow us to compute the date, however that’s not allowed here.

If we needed to change the date, we can make a new function and assign it. Be careful though, because doing this also has some limitations. The new function cannot be any more restrictive than the old function and it has to gather all the same data.

Once we have our desired function, we can apply it to the table using the FILTER_PREDICATE.

ALTER TABLE [expedite_audit_tbl]

       SET (REMOTE_DATA_ARCHIVE = ON (

              FILTER_PREDICATE = dbo.StretchFn(updated_dt),

              MIGRATION_STATE = OUTBOUND))

GO

We’re ensuring RDA is ON and we’re referencing the function with the date column, in our case [updated_dt]. One important setting here is the direction of the migration. In this example, we’re setting the direction to be OUTBOUND, or to the cloud. If we wanted to get those rows back at some point, we would change the setting to INBOUND. Normally this is a step you would do right before disabling Stretch DB.

Depending on the number of rows that will become eligible for migration, it may take some time for the migration to complete.

Verifying the status of Stretch

How do you make sure your rows are getting sent to Azure? Well, you can certainly log into the portal and run your query there to check your rows, but you can also directly connect to your server from SSMS. However, neither one of these is recommended as there are tools inside of SSMS to give us that information without having to connect to RDA server.

EXEC sp_spaceused @objname = 'expedite_audit_tbl'

EXEC sp_spaceused @objname = 'expedite_audit_tbl', @mode + 'LOCAL_ONLY'

EXEC sp_spaceused @objname = 'expedite_audit_tbl', @mode + 'REMOTE_ONLY'

We can use the sp_spaceused procedure to identify the number of records in the table. With the @mode variable we can also specify where those rows reside. LOCAL_ONLY will return number of local records and REMOTE_ONLY, you guessed it, will return the rows from Azure.

You can also get a nice generic view of all of your migrated rows by using the dynamic management view called sys.dm_db_rda_migration_status like this:

SELECT * FROM sys.dm_db_rda_migration_status WHERE migrated_rows > 0

This shows a list of objects and their associated databases, as well as migration timestamps, any errors that may have occurred, and of course, the number of rows migrated.

Execution Plans with RDA

After enabling stretch, the way SQL Server will retrieve data will depend on our search predicates. For example, if we simply get all data from our [expedite_audit_tbl] table like this:

SELECT * FROM expedite_audit_tbl

Then our execution plan would include a remote query:

query_plan

However, if we said:

SELECT * FROM expedite_audit_tbl WHERE updated_dt > '01/10/2016'

Then SQL Server would know, using our DETERMINISTIC function that it doesn’t have to go out to Azure and get rows because there just wouldn’t be any.

Disable Stretch

To disable Stretch DB, you first want to change the migration direction to INBOUND, so that you get your rows back.

ALTER TABLE [expedite_audit_tbl]

       SET (REMOTE_DATA_ARCHIVE = ON (

              MIGRATION_STATE = INBOUND))

GO

Give it some time to get the rows back and then you can disable the stretch:

ALTER TABLE [expedite_audit_tbl]

       SET (REMOTE_DATA_ARCHIVE = OFF)

GO

Final Thoughts

Stretch DB is relatively simple to configure and implement on your server. It may be a viable and cost effective option to off-load cold data remotely to minimize maintenance costs, storage costs, etc. Your decision to implement Stretch DB will likely be based on Azure pricing, performance, and your overall

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s