
SQL Server Migration Assistant (SSMA) v7.3 is now available


 

Overview

SQL Server Migration Assistant (SSMA) for Oracle, MySQL, SAP ASE (formerly SAP Sybase ASE), DB2 and Access lets users convert database schemas to Microsoft SQL Server schemas, upload the schema, and migrate data to the target SQL Server (see below for supported versions).

What is new?

  • Improved quality and conversion metrics with targeted fixes based on customer feedback.
  • SSMA extensibility framework exposed via:
    • Export functionality to a SQL Server Data Tools (SSDT) project
      • You can now export schema scripts from SSMA as an SSDT project and use that to make additional schema changes and deploy your database.
    • Libraries that can be consumed by SSMA for performing custom conversions
      • You can now construct code that can handle custom syntax conversions and conversions that weren’t previously handled by SSMA
        • Instructions on how to construct a custom converter can be found here.
        • Sample project for conversion can be downloaded here.

For more information see this post.

 

Ajay Jagannathan (@ajaymsft)

Principal Program Manager


SQL Server vNext Management Pack Demo


We recently released CTP2 for SQL Server vNext Management Pack that can be used to monitor SQL Server vNext versions both on Linux and Windows. We are introducing new monitoring modes in the vNext MP. In addition to the standard agent monitoring where you have a SCOM agent on the server you are monitoring, you can also have agentless monitoring and mixed mode monitoring.

We have prepared a short video for you that goes over the new monitoring modes and demonstrates how to set them up. It also covers the features available in CTP2.

You can see our release announcement at Released: Public Preview for SQL Server vNext Management Pack (CTP2) for details about the release.

We are looking forward to hearing your feedback about the vNext MP, the new monitoring modes, and ideas on what areas you would like to see demos of.

Released: Data Migration Assistant (DMA) v3.1


Overview

Data Migration Assistant (DMA) enables you to upgrade to a modern data platform by detecting compatibility issues that can impact database functionality on your new version of SQL Server. It recommends performance and reliability improvements for your target environment. It also allows you to move not only your schema and data, but also uncontained objects from your source server to your target server.

DMA replaces all previous versions of SQL Server Upgrade Advisor and should be used for upgrades for most SQL Server versions (see below for supported versions).

What is new with this release?

DMA v3.1 is a minor version update and includes the following additions:

  • Improved assessment recommendations for Azure SQL Database around database collations, use of unsupported system stored procedures, and CLR objects.
  • Added assessment guidance for compatibility levels 130, 120, 110 and 100 when migrating to Azure SQL Database.

For more information see this post.

 

Ajay Jagannathan (@ajaymsft)

Principal Program Manager

Columnstore Index – List of blogs on columnstore index published by SQL Server Product Team


The SQL product team has made significant improvements in columnstore index functionality, supportability and performance during SQL Server 2016, based on feedback from customers. The SQL Server product team has created a large set of blogs across multiple columnstore index scenarios. This blog consolidates the links as a one-stop shop to access them. We will keep updating it as new blogs are added. If you identify areas of columnstore that need additional information, please feel free to contact us with suggestions.

Columnstore index

 

Columnstore Index Presentations at conferences

 

Columnstore index Performance

Bulk importing data

JSON in Columnstore Index

Real Time Operational Analytics (HTAP)

 

Thanks,

Sunil Agarwal

SQL Server Tiger Team

 Twitter | LinkedIn

Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

Columnstore Index – How to Estimate Compression Savings


The SQL product team has made significant improvements in columnstore index functionality, supportability and performance during SQL Server 2016, based on feedback from customers. Please refer to List of Blogs for all blogs published by the SQL Tiger Team on columnstore index.

Issue
When the SQL Server team released ROW and PAGE compression in SQL Server 2008, customers could invoke the sp_estimate_data_compression_savings stored procedure to estimate the storage savings for ROW and PAGE compression. Note that the compression savings figure is just an estimate, based on sampling a subset of rows from the source table, loading them into a temporary table, and then measuring the size of this temporary table before and after compression. For most cases, the compression savings estimate was good, except when the data in the source table was skewed. Most customers found it useful, as it was a convenient way to see the storage benefits. However, as some of you have found out, this stored procedure has not been extended to estimate storage savings from a columnstore index. This is something we could consider for the future.
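For reference, this is what the ROW/PAGE estimation looks like for a rowstore table (a minimal sketch; the table name is illustrative):

-- Estimate PAGE compression savings based on a sample of rows
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactResellerSalesXL',
    @index_id = NULL,          -- all indexes
    @partition_number = NULL,  -- all partitions
    @data_compression = 'PAGE';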

Workaround
For now, to estimate compression savings for a columnstore index, we recommend the following steps:

  1. Create a staging table with an identical schema.
  2. Load 2 million rows into the staging table. Note, I have chosen 2 million arbitrarily, but it needs to be at least 1 million.
  3. Use sp_spaceused to find the size of the table.
  4. Now create a columnstore index on the table.
  5. Measure the storage using sp_spaceused.
  6. Compare the numbers in (3) and (5).

The storage savings for the staging table as computed above will be a good estimate to work with. Here is a simple example to showcase this:

use AdventureWorksDW2016CTP3
go
-- create a staging table
select * into ccitest_temp from dbo.FactResellerSalesXL where 1=2
go
-- load 2 million rows
insert into ccitest_temp select top 2000000 * from dbo.FactResellerSalesXL
-- find the spaceused. Note it down
exec sp_spaceused 'ccitest_temp'
-- create clustered columnstore index on the staging table
create clustered columnstore index ccitest_temp_cci on ccitest_temp
-- find the spaceused in the compressed state and then compare
exec sp_spaceused 'ccitest_temp'
-- now you can drop the staging table
drop table ccitest_temp

Nonclustered Columnstore Index (NCCI)
Creating an NCCI does not save storage; in fact, it takes additional storage. If you are interested in finding the size of an NCCI, you can follow the same steps as above, but create an NCCI instead of a CCI and observe the increased size, as sketched below.
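Here is a minimal sketch of that variant, reusing the heap staging table from steps (1) and (2) above, before any columnstore index is created on it (the column list for the NCCI is illustrative and assumes those columns exist in the staging table):

-- measure the size of the staging table before
exec sp_spaceused 'ccitest_temp'
-- create a nonclustered columnstore index (illustrative column list)
create nonclustered columnstore index ccitest_temp_ncci on ccitest_temp (ProductKey, OrderDateKey, SalesAmount)
-- measure again; the difference is roughly the size of the NCCI
exec sp_spaceused 'ccitest_temp'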

Thanks,
Sunil Agarwal
SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

 

Easy way to get statistics histogram programmatically


Statistics are the building blocks on which the Query Optimizer reasons to compile a good enough plan to resolve queries, so it's very common that anyone doing query performance troubleshooting needs to use DBCC SHOW_STATISTICS to understand how statistics are being used, and how accurately they represent the data distribution.

Let’s see how to use this. Take the following simple query:

USE [AdventureworksDW2016CTP3]
GO
SELECT * FROM FactResellerSales
WHERE OrderDate BETWEEN '20110101' AND '20110606'
GO
This is what we get as an execution plan. We can see how many rows were read and estimated to be read (red arrows), plus the estimated and actual rows after the predicate is applied (blue arrows):
[Screenshot: actual execution plan]

Let's say you wanted to understand where the estimation came from; naturally you would look for whatever stats object references that single column (using a single-column predicate for simplicity's sake). If you need to programmatically access this data, then usually you would dump DBCC SHOW_STATISTICS … WITH HISTOGRAM to a table, and then use it from there. That is not ideal.
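For context, that old approach looks something like this (a sketch; <stats_name> is a placeholder for the statistics name you would look up in sys.stats):

-- Dump the histogram of a given statistic into a temp table (old approach)
CREATE TABLE #histogram (
    range_high_key SQL_VARIANT,
    range_rows REAL,
    eq_rows REAL,
    distinct_range_rows BIGINT,
    avg_range_rows REAL);
INSERT INTO #histogram
EXEC ('DBCC SHOW_STATISTICS (''dbo.FactResellerSales'', ''<stats_name>'') WITH HISTOGRAM');
SELECT * FROM #histogram;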

With the latest release of SQL Server 2016 SP1 CU2, we added a new Dynamic Management Function (DMF) sys.dm_db_stats_histogram, which is similar to running the above DBCC statement. Note: this DMF is also available in SQL Server vNext CTP 1.3.

SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('[dbo].[FactResellerSales]'), 2)
[Screenshot: sys.dm_db_stats_histogram output]
This further completes the story we started with sys.dm_db_stats_properties, which has a similar output to running DBCC SHOW_STATISTICS … WITH STATS_HEADER.
SELECT * FROM sys.dm_db_stats_properties(OBJECT_ID('[dbo].[FactResellerSales]'), 2)
[Screenshot: sys.dm_db_stats_properties output]
But to make it more interesting, here's an example of how you can leverage these DMFs inline, to get information on which stats object and respective histogram steps cover my predicate, in the scope of my table and column:
SELECT ss.name, ss.stats_id, shr.steps, shr.rows,
    shr.rows_sampled, shr.modification_counter, shr.last_updated,
    SUM(sh.range_rows+sh.equal_rows) AS predicate_step_rows
FROM sys.stats ss
INNER JOIN sys.stats_columns sc
    ON ss.stats_id = sc.stats_id AND ss.object_id = sc.object_id
INNER JOIN sys.all_columns ac
    ON ac.column_id = sc.column_id AND ac.object_id = sc.object_id
CROSS APPLY sys.dm_db_stats_properties(ss.object_id, ss.stats_id) shr
CROSS APPLY sys.dm_db_stats_histogram(ss.object_id, ss.stats_id) sh
WHERE ss.[object_id] = OBJECT_ID('FactResellerSales')
    AND ac.name = 'OrderDate'
    AND sh.range_high_key BETWEEN CAST('20110101' AS DATE) AND CAST('20110606' AS DATE)
GROUP BY ss.name, ss.stats_id, shr.steps, shr.rows,
    shr.rows_sampled, shr.modification_counter, shr.last_updated
The output below lets me see that my predicate matches 3949.79 rows in the affected histogram steps, which is exactly what I observed in the actual execution plan.
[Screenshot: query output showing predicate_step_rows = 3949.79]

Note: because the column range_high_key is a sql_variant data type, you may need to use CAST for proper comparison as seen above.

Pedro Lopes (@sqlpto) – Senior Program Manager

Understanding data security in cloned databases created using DBCC CLONEDATABASE


The DBCC CLONEDATABASE feature was first introduced in SQL Server 2014 SP2 and was later added to SQL Server 2016 SP1. The primary design goal the SQL product team had in mind for DBCC CLONEDATABASE is to provide a mechanism to create fast, minimally invasive and transactionally consistent database clones, useful for query tuning. Database schema, statistics and Query Store are commonly required data for query tuning, or for troubleshooting suboptimal query plans and plan regressions. To make database cloning fast, minimally invasive and consistent, the copying of metadata objects is performed at the storage engine layer by taking a transient snapshot of the production database.

Database cloning has proved to be significantly useful in reducing troubleshooting time for DBAs, developers and Microsoft CSS, by extracting only the data required for troubleshooting from production databases. In addition, cloning a database also helps minimize the risk of providing access to production databases or sharing business data directly with developers or support teams. Although user table and index data is not copied into the cloned database, user data is still available and exposed in the cloned database via statistics and Query Store. As the primary scenario for DBCC CLONEDATABASE is troubleshooting, the default database clone contains a copy of the schema, statistics and Query Store data from the source database. Query Store data is included only on SQL Server 2016 instances, provided Query Store was turned ON in the source database prior to running DBCC CLONEDATABASE.

Note: To copy the latest runtime statistics as part of Query Store, you need to execute sp_query_store_flush_db to flush the runtime statistics to the query store before executing DBCC CLONEDATABASE.
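For example (a minimal sketch, run in the context of the source database; the database name is illustrative):

USE [AdventureWorks2014]
GO
EXEC sys.sp_query_store_flush_db
GO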

-- Default database clone: the target clone contains a copy of the schema, statistics and query store data from the source database
DBCC CLONEDATABASE (source_database_name, target_database_name)

Another scenario where cloned databases are useful is checking schema into source control and comparing the production database schema with the dev schema. In this scenario, only a copy of the production database schema is desired in the database clone, to compare it with the one in the dev environment. For some businesses, especially in healthcare and finance, data privacy is critical and no user data (including statistics and query store) can be shared with developers, vendors or support teams. For this scenario, the following syntax, introduced in SQL Server 2016 SP1, can be used to create schema-only database clones with no user data.

-- Creates a schema-only database clone with no user data
DBCC CLONEDATABASE (source_database_name, target_database_name) WITH NO_STATISTICS, NO_QUERYSTORE

There are also scenarios where DBAs are required to share schema-and-statistics database clones with developers, vendors or support teams for troubleshooting, but some tables or columns within the source database contain business-sensitive data (e.g. SSN or credit card columns) which cannot be shared with anyone. Currently, DBCC CLONEDATABASE doesn't support selectively including or excluding objects from the source database in the cloned database. If your requirement falls in this category, you can use any of the techniques described below to protect data in cloned databases before they are shared with anyone.

Drop statistics on tables or columns containing sensitive business data in database clone

I have uploaded a T-SQL stored procedure script to our Tiger GitHub repository which can be used to drop all the statistics for a specified table, or for a specified column of that table. You can download and run the script against the cloned database. The stored procedure needs to be executed for each table or column containing sensitive data whose stats you would like to purge. The script purges user as well as index statistics, including indexes on primary key constraints; however, if there are any foreign key references, those should be dropped manually.

If you would like to enhance or improve the script, feel free to send a pull request on GitHub for the benefit of the SQL community.

-- Create a database clone with no query store
DBCC CLONEDATABASE('AdventureWorks2014','AdventureWorks2014_Clone') WITH NO_QUERYSTORE

-- set the cloned database in read write mode
ALTER DATABASE AdventureWorks2014_Clone SET READ_WRITE

-- create the stored procedure usp_DropTableColStatistics in the cloned database
USE AdventureWorks2014_Clone
GO

create procedure usp_DropTableColStatistics -- copy the script body from the Tiger GitHub repository here

-- Drop all the statistics on column CardNumber of table Sales.CreditCard
exec usp_DropTableColStatistics 'Sales.Creditcard','CardNumber'

-- iterate again for other tables
-- If no column name is specified and only a table name is specified, all the statistics on that table are dropped
-- Drop all the statistics on table Sales.CreditCard

exec usp_DropTableColStatistics 'Sales.Creditcard'

-- Backup the database clone with compression
BACKUP DATABASE AdventureWorks2014_Clone TO DISK = 'c:\backup\clonedb.bak' WITH COMPRESSION

-- DROP the cloned database post backup
DROP DATABASE AdventureWorks2014_Clone

Note: The database generated by DBCC CLONEDATABASE is not supported for use as a production database and is primarily intended for troubleshooting and diagnostic purposes. We recommend detaching the cloned database after it is created.

Column-level Encryption

If the columns in the source database are encrypted using column-level encryption, the statistics inside the source database are also encrypted, which ensures that the statistics copied into the cloned database are encrypted as well. The following script validates that behavior:

USE [AdventureWorks2014]
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD='$ecretP@ssw0rd';

CREATE SYMMETRIC KEY TestSymKey
WITH ALGORITHM = TRIPLE_DES
ENCRYPTION BY PASSWORD = '$ecretP@ssw0rd';

OPEN SYMMETRIC KEY TestSymKey DECRYPTION BY PASSWORD = '$ecretP@ssw0rd';

-- add a column to hold the encrypted CardNumber
ALTER TABLE [Sales].[CreditCard] ADD CreditCardNumber varbinary(max)
GO

-- update the new column, encrypting it with the symmetric key
UPDATE [Sales].[CreditCard] SET CreditCardNumber = ENCRYPTBYKEY(KEY_GUID('TestSymKey'),CardNumber)

-- create statistics on the encrypted column
CREATE STATISTICS encryptedcreditcardno ON [Sales].[CreditCard](CreditCardNumber)

-- validate that the statistics are encrypted
DBCC SHOW_STATISTICS('Sales.CreditCard',encryptedcreditcardno)

-- create a database clone with no query store
DBCC CLONEDATABASE('AdventureWorks2014','AdventureWorks2014_Clone') WITH NO_QUERYSTORE

USE [AdventureWorks2014_Clone]
GO
DBCC SHOW_STATISTICS('Sales.CreditCard',encryptedcreditcardno)

In addition to encrypted columns and statistics, if there are other encrypted objects such as stored procedures or functions in the source database, they will be copied into the database clone, but executing them will fail, since executing encrypted objects is not supported in database clones.

Always Encrypted Columns

DBCC CLONEDATABASE currently doesn't support Always Encrypted objects. Thus, if columns in the source database are encrypted using Always Encrypted, DBCC CLONEDATABASE will exclude those objects present in the source database.

Note: There is a known issue where, if the source database contains Always Encrypted objects, running DBCC CLONEDATABASE against the database results in an access violation (AV) that causes the client session to terminate. We will be fixing the issue in upcoming CUs for SQL Server 2016; the fix will avoid the AV while creating the database clone by excluding the metadata and data for Always Encrypted objects.

Transparent Data Encryption (TDE)

If you use TDE to encrypt data at rest on the source database, DBCC CLONEDATABASE supports cloning the source database, but the cloned database is not encrypted by TDE. Thus, the backup of the cloned database will be unencrypted. If you want to encrypt and protect the cloned database backup, you can enable TDE on the cloned database before it is backed up, as shown below.

USE master;
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseStrongPasswordHere>';
GO

CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
GO

BACKUP CERTIFICATE MyServerCert TO FILE = 'MyServerCert'
WITH PRIVATE KEY
(
FILE = 'SQLPrivateKeyFile',
ENCRYPTION BY PASSWORD = '*rt@40(FL&dasl1'
);
GO

-- Create a database clone with no query store

DBCC CLONEDATABASE('AdventureWorks2014','AdventureWorks2014_Clone') WITH NO_QUERYSTORE

-- set the cloned database in read write mode
ALTER DATABASE AdventureWorks2014_Clone SET READ_WRITE

USE AdventureWorks2014_Clone;
GO

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128 ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
GO

ALTER DATABASE AdventureWorks2014_Clone SET ENCRYPTION ON;
GO

BACKUP DATABASE AdventureWorks2014_Clone TO DISK = 'c:\backup\clonedb.bak' WITH STATS=5
GO

-- DROP the database clone
DROP DATABASE AdventureWorks2014_Clone

Hope the above article helps you understand data security in cloned databases and provides guidance on protecting user data in databases cloned with DBCC CLONEDATABASE.

 

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam


Considerations when tuning your queries with columnstore indexes on clone databases


As discussed in my previous blog post, one of the primary scenarios for DBCC CLONEDATABASE is to assist DBAs, developers and support teams in troubleshooting suboptimal query plans by creating fast, minimally invasive and transactionally consistent database clones of their production databases. The database clone created using DBCC CLONEDATABASE contains a copy of the schema and statistics, which allows the optimizer to generate the same query plan as observed on the production database, without the actual data. While this is true for queries involving traditional rowstore indexes, there are some special considerations for queries involving columnstore indexes, due to a difference in the way statistics are generated for these indexes. In this blog post, I will explain this behavior and, at the end, provide you with the scripts required to handle this scenario and generate the same query plan in the database clone as observed on the production database.

Unlike traditional B-tree indexes, when a columnstore index is created, no index statistics are created on the columns of the columnstore index. However, an empty stats object is created with the same name as the columnstore index, and an entry is added to sys.stats at the time of index creation. The stats object is populated on the fly when a query is executed against the columnstore index or when DBCC SHOW_STATISTICS is run against it, but the columnstore index statistics aren't persisted in storage. The index statistics are different from the auto-created statistics on the individual columns of a columnstore index, which are generated on the fly and persisted in the statistics object. Since the index statistics are not persisted in storage, the cloned database will not contain them, leading to inaccurate stats and different query plans when the same query is run against the database clone as opposed to the production database.

Let me illustrate this behavior with a sample script below

set nocount on
go
create database db1
go
use db1
go
create table t1(a varchar(8000));
go
insert t1 values(replicate(newid(), 200));
go 1000
create clustered columnstore index cci on t1
go

-- Initial stats object with name cci, created at the time of index creation
select * from sys.stats where object_id=OBJECT_ID('t1')
go

-- Add 3000 more rows to the table
insert t1 values(replicate(newid(), 200));
go 3000

-- Query with a predicate to generate auto-created statistics on the column
select a from t1 where a = 'aaaaaaaaaaaaaa'
GO

-- Verify that auto-created statistics got added to the table
-- Here you will see 2 statistics objects: one with the index name, and an auto-created statistic on column a due to the earlier query

select * from sys.stats where object_id=OBJECT_ID('t1')
go

-- Turn on Actual Execution Plan and it accurately displays 4000 rows
-- before cloning
select a from t1
GO


-- Run DBCC SHOW_STATISTICS against the index statistics of the columnstore index and it accurately reflects 4000 rows & 6000 data pages (generated on the fly)

dbcc show_statistics('t1','cci') with stats_stream
go


dbcc clonedatabase('db1','db2')
go

use db2
go

-- Both stats objects are copied into the clone
select * from sys.stats where object_id=OBJECT_ID('t1')
go

/* If you turn on Actual Execution Plan, only 1000 rows are displayed in the clone's statistics, which is the same as the number of
rows at the time of creation of cci, never updated. */
select a from db2.dbo.t1
GO


-- Run DBCC SHOW_STATISTICS against the index statistics and it reflects 1000 rows & 0 data pages, as at the time of index creation (not updated)
dbcc show_statistics('t1','cci') with stats_stream
go


-- DROP the databases after tests

use master
go
DROP DATABASE db1
DROP DATABASE db2

This is by-design behavior of columnstore indexes in SQL Server. To handle this behavior and accurately capture the columnstore index statistics in the cloned database, we have created and shared a script in our Tiger GitHub repository which can be used to update the columnstore index statistics on the source database before running DBCC CLONEDATABASE.

The script usp_update_CI_stats_before_cloning.sql should be run on the source production database that needs to be cloned, before running DBCC CLONEDATABASE. The script performs the following steps:

  1. Runs DBCC SHOW_STATISTICS WITH STATS_STREAM against all the columnstore indexes on the source database to capture the up to date stats blob generated on the fly.
  2. Updates the persisted stats object with the most recent stats blob captured in step 1.

Note: The above script needs to be run only if you would like to clone columnstore index statistics, in cases where the query plan on the database clone differs from the query plan on the source database. It is not required otherwise, since the database engine is designed to generate and handle the non-persisted columnstore index statistics for efficient query plans.

 

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

 


CDC functionality may break after upgrading to the latest CU for SQL Server 2012, 2014 and 2016


The Microsoft SQL Server product team has identified a potential issue with the latest servicing releases for SQL Server 2012, 2014 and 2016, wherein Change Data Capture (CDC) functionality might break if:

  • The databases enabled for CDC are part of an Always On availability group, OR
  • SQL Server replication components are not installed on the server

Details

Microsoft introduced a CDC-related fix, KB 3030352, in the releases mentioned below (see the Affected Releases section). As part of the fix, a new column was introduced in the change tables to correctly order the operations within the change table. This schema change is applied to the change tables through the sp_vupgrade_replication stored procedure, which is executed during the CU upgrade.

The following scenarios will cause the change tables NOT to be updated after the CU upgrade:

  • If a CDC enabled database is part of an Always On availability group and users follow the general recommendations of upgrading the secondary replica first, sp_vupgrade_replication will not run in such databases during upgrade because secondary replica databases are not in read/write mode. This is a known behavior and is by design.
  • If the server does not have replication components installed, sp_vupgrade_replication will not be executed at CU upgrade time. Microsoft is working on a potential fix for this situation.

Additionally, this issue may also impact SSIS packages which use the CDC flow components (CDC Source component) to extract changes from the CDC enabled database. Microsoft SQL Server product team is currently investigating the impact on such packages, and will be updating the blog post with the findings and potential fix or workaround.

Workaround

  • As a recommended resolution for the first scenario, users can perform either of the following:
    • After a secondary replica is upgraded, perform a failover to make it the primary, and run sp_vupgrade_replication.
    • Disable automatic failover and perform upgrade at the primary replica. If automatic failover is needed, it can be re-enabled after upgrade. Please note that this approach will result in database unavailability during upgrade.
  • To work around the second scenario, users can run sp_cdc_vupgrade or sp_vupgrade_replication against the database(s) enabled for CDC after the upgrade, as sketched below.
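A minimal sketch of that second workaround (the database name is illustrative):

-- Run in each CDC-enabled database after the CU upgrade
USE [MyCdcDatabase]
GO
EXEC sys.sp_cdc_vupgrade
GO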

Affected Releases

The following SQL Server servicing releases can cause the CDC functionality to break.

  • SQL 2012 SP3 CU8
  • SQL 2014 SP1 CU10
  • SQL 2014 SP2 CU4
  • SQL 2016 RTM CU5
  • SQL 2016 SP1 CU2

DEA 2.0 Technical preview: Release Overview – Database Experimentation Assistant


Overview

Database Experimentation Assistant (DEA) is a new A/B testing solution for SQL Server upgrades. It assists in evaluating a targeted version of SQL Server for a given workload. Customers who are upgrading from previous SQL Server versions (SQL Server 2005 and above) to any newer version of SQL Server can use the analysis metrics provided (such as queries that have compatibility errors, degraded queries, query plans, and other workload comparison data) to build higher confidence for a successful upgrade experience.

What is new?

DEA 2.0 is a major version update and includes the following improvements:

  • Bundled installation of DEA dependencies: Installation is simplified by bundling all dependencies (barring R-Interop and CRAN) with the DEA installer. Note that the DReplay setup is assumed to be available to run a replay.
  • Support for multiple captures and replays from the UI: The DEA UI now supports starting multiple captures and replays. Please refer to How to capture workload using DEA for details.
  • Simplified replay through the DEA UI: The number of steps required to start a replay is reduced from 3 to 1. DEA will also show status from the DReplay controller as well as all the clients. Please refer to How to replay workload using DEA for details.
  • Revamped user interface for analysis: This version includes a more intuitive UI for the tool, and especially for the analysis reports.
  • Bug fixes from DEA 1.0: Many customer-reported bugs are fixed as part of this release, including a fix for an error while capturing on SQL Server 2005 and errors seen during replay and analysis.
  • Feedback UI: Customers can now submit feedback through a simple UI in DEA.
 

Other documents/tutorials?

The following documents provide step-by-step guides to leveraging DEA 2.0 for workload comparison:

Installation

You can install DEA from the Microsoft Download Center. Run DatabaseExperimentationAssistant.exe to install the Database Experimentation Assistant tool.

Existing features

Database Experimentation Assistant (DEA) v1.0

Supported sources and target versions

Source: SQL Server 2005 and above

Target: SQL Server 2005 and above

Analysis: SQL Server 2008 and above

 

Ajay Jagannathan (@ajaymsft)

Principal Program Manager

Troubleshooting High HADR_SYNC_COMMIT wait type with Always On Availability Groups


HADR_SYNC_COMMIT indicates the time between when a transaction is ready to commit on the primary replica and when all synchronous-commit secondary replicas have acknowledged hardening the transaction commit LSN in an AG. It means a transaction on the primary replica cannot commit until the primary replica has received greater hardened LSNs from all synchronous-commit secondary replicas. If transactions on the primary replica are slower than usual, and HADR_SYNC_COMMIT is unusually long, there is a performance issue in at least one primary-secondary replica data movement flow, or at least one secondary replica is slow in log hardening.
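To quickly check how much of this wait an instance has accumulated, you can query the wait statistics DMV (a simple sketch):

-- Cumulative HADR_SYNC_COMMIT waits since startup (or since wait stats were last cleared)
SELECT wait_type, waiting_tasks_count, wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'HADR_SYNC_COMMIT';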

This blog post drills down on all related monitoring metrics including performance counters and XEvents, and provides guidelines on how to utilize them to troubleshoot the root cause of the performance issue.

All the metrics explained below are listed as per the following log block movement sequence:

1. A transaction is initiated on the primary replica.

2. The primary replica captures the transaction log and sends it to a secondary replica.

3. The secondary replica receives and hardens the log block, and eventually sends the new hardened LSN to the primary.

Availability Group Performance Counters

Some of the performance counters included below are applicable to both the primary and secondary, and hence are included twice in the list below.

  • Log Bytes Flushed/sec (Primary, SQL 2012+): Total number of log bytes flushed per second on the primary replica. It represents the volume of log that is available to be captured on the primary replica and sent to the secondary replica(s).
  • Log Pool LogWriter Pushes/sec (Primary, SQL 2016+): Total number of log bytes captured by LogWriter mode. LogWriter mode was introduced in SQL Server 2016; it captures log directly from the in-memory log cache instead of the log file. When LogWriter mode is on, there is almost no lag between log flush and log capture.
  • Bytes Sent to Replica/sec (Primary, SQL 2012+): Total number of bytes sent to the internal message queue of a single secondary replica (for both replica- and database-level data). The internal in-memory message queue holds the captured and formatted data that will be sent to the targeted secondary replica. By default, the data in the internal message queue is compressed for both sync and async secondary replicas in SQL Server 2012 and 2014, but only for async secondary replicas from SQL Server 2016 on. When data compression is enabled, the data volume of this perf counter is less than the corresponding value of Log Bytes Flushed/sec.
  • Flow Control/sec (Primary, SQL 2012+): Number of flow controls initiated in the last second. AG has internal throttling gates to control data flow. When the number of sequential messages that have not been acknowledged by the secondary replica exceeds the gate value, data flow is paused until more acknowledgement messages are received from the secondary replica. More details can be found in the "Flow Control Gates" section in https://msdn.microsoft.com/en-us/library/dn135338(v=sql.120).aspx, with one correction: database-level gate values are 1792 for x64 and 256 for x86 environments.
  • Flow Control Time (ms/sec) (Primary, SQL 2012+): Time in milliseconds that messages waited on flow control in the last second.
  • Bytes Sent to Transport/sec (Primary, SQL 2012+): Total bytes sent to transport (UCS) for the availability replica. It represents the volume of data dequeued from the XmitQueue and sent to the transport layer. Its values should be very close to Bytes Sent to Replica/sec, with a very slight lag.
  • Bytes Received from Replica/sec (Secondary, SQL 2012+): Total bytes received from the primary replica. It represents the volume of data received from transport (UCS) before any processing on the secondary replica. Its values should be very close to Bytes Sent to Replica/sec on the primary, with a lag influenced by the current network latency between this primary-secondary replica pair.
  • Log Bytes Flushed/sec (Secondary, SQL 2012+): Total number of log bytes flushed per second on the secondary replica. When the secondary receives new log packages from the primary, it decompresses and hardens them before sending a progress message to the primary with its new hardened LSN. The trend of this perf counter should align with the same perf counter on the primary replica, with some lag.
  • Bytes Sent to Replica/sec (Secondary, SQL 2012+): Total number of bytes sent to the XmitQueue for delivery to the primary replica. These are all AG control messages, so the data volume is very low.
  • Flow Control/sec (Secondary, SQL 2012+): Number of flow controls initiated in the last second. Although the secondary replica only sends AG control messages to the primary, it may hit transport-level flow control when it hosts hundreds of databases and the primary replica cannot send acknowledgement messages in time.
  • Flow Control Time (ms/sec) (Secondary, SQL 2012+): Time in milliseconds that messages waited on flow control in the last second.
  • Bytes Sent to Transport/sec (Secondary, SQL 2012+): Total bytes sent to transport (UCS) for the availability replica.
  • Bytes Received from Replica/sec (Primary, SQL 2012+): Total bytes received from the secondary replica.

In Perfmon, except for "Log Bytes Flushed/sec" and "Log Pool LogWriter Pushes/sec", which are in the SQLServer:Databases object, all other counters above are in the SQLServer:Availability Replica object.
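Besides Perfmon, the same counters can be sampled from T-SQL (a sketch; note the per-second counters are cumulative in this DMV, so two samples and a delta over the interval are needed to get a rate):

SELECT [object_name], counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%:Availability Replica%'
   OR ([object_name] LIKE '%:Databases%'
       AND counter_name IN ('Log Bytes Flushed/sec', 'Log Pool LogWriter Pushes/sec'));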

Network Performance Counters

There are a few performance counters in the "Network Interface" object that can be used for network performance monitoring:

  • Bytes Received/sec: Shows the rate at which bytes are received over each network adapter. The counted bytes include framing characters. Bytes Received/sec is a subset of Network Interface\Bytes Total/sec.
  • Bytes Sent/sec: Shows the rate at which bytes are sent over each network adapter. The counted bytes include framing characters. Bytes Sent/sec is a subset of Network Interface\Bytes Total/sec.
  • Bytes Total/sec: Shows the rate at which bytes are sent and received on the network interface, including framing characters. Bytes Total/sec is the sum of Network Interface\Bytes Received/sec and Network Interface\Bytes Sent/sec.
  • Current Bandwidth: Shows an estimate of the current bandwidth of the network interface in bits per second (bps). For interfaces that do not vary in bandwidth, or for those where no accurate estimation can be made, this value is the nominal bandwidth.

When a host machine has a network dedicated to SQL Server, the network counter Bytes Sent/sec should share the same trend as the AG counter Bytes Sent to Transport/sec, with slightly larger values, on the primary replica. The same similarity is observed between the network counter Bytes Received/sec and the AG counter Bytes Received from Replica/sec on the secondary replica.

Please note that the Current Bandwidth value is in bits per second (bps). When comparing it to Bytes Total/sec, the Current Bandwidth value needs to be divided by 8 first; for example, a 1 Gbps adapter reports a Current Bandwidth of 1,000,000,000, which corresponds to 125,000,000 bytes per second. When Bytes Total/sec is consistently close or equal to (Current Bandwidth / 8), the network bandwidth can be identified as the performance bottleneck.

Availability Group Extended Events

Although performance counters can present the overall performance of each AG data movement stage, they are not sufficient to find out which stage is slow when there is a performance issue, because the data flow in an AG is like a closed-loop system: the slowest node determines the final throughput of the system. When trying to study the available performance counters after a performance issue has started, all perf counter values may have already slowed down.

It is useful to capture extended events and correlate them by common field(s) to track a single log block's movement through the whole data flow. This provides details for identifying which stage took most of the time, and narrows down the scope of the root cause analysis.

Below is a list of extended events that can be evaluated for investigating performance issues related to high HADR_SYNC_COMMIT.

  • log_flush_start (Primary): Occurs when an asynchronous log write starts. Correlation keys: log_block_id, database_id.
  • log_flush_complete (Primary): Occurs when the log write completes. It can happen after hadr_capture_log_block because of the asynchronous nature of writes. Correlation keys: log_block_id, database_id.
  • hadr_log_block_compression (Primary): Occurs when a log block is ready to be captured and log compression is enabled. (If log compression is not enabled, this XEvent is not logged in SQL Server 2012 and 2014, but it is always logged from SQL Server 2016; for SQL Server 2016, check the Boolean value of the is_compressed property.) Correlation keys: log_block_id, database_id.
  • hadr_capture_log_block (Primary, mode=1): Occurs right after the primary has captured a log block. Correlation keys: log_block_id, database_replica_id.
  • hadr_capture_log_block (Primary, mode=2): Occurs right before the primary enqueues the captured log block to an internal message queue of the DbMgrPartner. A DbMgrPartner maps a database in the remote replica. There is no processing operation between mode 1 and 2. Correlation keys: log_block_id, database_replica_id.
  • hadr_capture_log_block (Primary, mode=3): Occurs after dequeuing a log block from the internal message queue of the DbMgrPartner and before sending it to transport (UCS). Correlation keys: log_block_id, database_replica_id.
  • hadr_capture_log_block (Primary, mode=4): Occurs after the dequeued message reaches the Replica layer and before sending it to transport (UCS). Only message routing actions happen between mode 3 and 4. Correlation keys: log_block_id, database_replica_id, availability_replica_id.
  • hadr_transport_receive_log_block_message (Secondary, mode=1): Occurs when receiving a new log block message from transport (UCS). Correlation keys: log_block_id, database_replica_id.
  • hadr_transport_receive_log_block_message (Secondary, mode=2): Occurs after enqueuing a new RouteMessageTask to the DbMgrPartner for the newly received log block. No processing operations yet. Correlation keys: log_block_id, database_replica_id.
  • hadr_log_block_decompression (Secondary): Occurs after decompressing the log block buffer. The Boolean value of is_compressed indicates whether the incoming log block buffer was compressed. Correlation keys: log_block_id, database_id.
  • log_flush_start (Secondary): Occurs when an asynchronous log write starts. Correlation keys: log_block_id, database_id.
  • log_flush_complete (Secondary): Occurs when the log write completes. Correlation keys: log_block_id, database_id.
  • hadr_apply_log_block (Secondary): Occurs right after the log block is flushed and the new redo target LSN is calculated on the secondary. Correlation keys: log_block_id, database_replica_id.
  • hadr_send_harden_lsn_message (Secondary, mode=1): Occurs when a SyncLogProgressMsg with the new hardened LSN is constructed, before pushing this message to the internal message queue of the database's DbMgrPartner. At this point, the previously received log block has been flushed to disk, but the current log block may not be. Correlation keys: log_block_id (*need to choose the immediate next log_block_id of this database; see note below), hadr_database_id.
  • hadr_send_harden_lsn_message (Secondary, mode=2): Occurs when the SyncLogProgressMsg is dequeued from the internal message queue, before sending it to transport (UCS). Correlation keys: log_block_id (*see note below), hadr_database_id.
  • hadr_send_harden_lsn_message (Secondary, mode=3): Occurs after the dequeued SyncLogProgressMsg reaches the Replica layer, before sending it to transport (UCS). There is no processing operation between mode 2 and 3. Correlation keys: log_block_id (*see note below), hadr_database_id.
  • hadr_receive_harden_lsn_message (Primary, mode=1): Occurs when receiving from transport (UCS) a new SyncLogProgressMsg containing the new hardened LSN of a database on the secondary replica. Correlation keys: log_block_id (*see note below), database_replica_id.
  • hadr_receive_harden_lsn_message (Primary, mode=2): Occurs after enqueuing a new RouteSyncProgressMessageTask to the DbMgrPartner for the newly received SyncLogProgressMsg. No processing operations yet. Correlation keys: log_block_id (*see note below), database_replica_id.
  • hadr_db_commit_mgr_harden (Primary): Occurs after the minimal hardened LSN among the primary replica and all synchronous-commit secondary replicas exceeds the LSN of the expected log block. Correlation keys: wait_log_block, database_id, ag_database_id.
  • hadr_db_commit_mgr_harden_still_waiting (Primary): Occurs when a committed LSN on the primary has not been confirmed as hardened by all synchronous-commit secondary replicas for more than 2 seconds. This XEvent is logged every 2 seconds until hadr_db_commit_mgr_harden is logged. In normal cases, the harden wait time is in the tens-of-milliseconds range and this XEvent is not logged. Correlation keys: wait_log_block, database_id, ag_database_id.

*Regarding the note "need to choose the immediate next log_block_id of this database" for hadr_send_harden_lsn_message and hadr_receive_harden_lsn_message: the ChangeApply side always returns a hardened LSN that is one block less than the currently received log block LSN. To ensure the primary replica received the expected hardened LSN from a synchronous-commit secondary replica, the immediate next log block LSN is needed to trace the complete end-to-end logic flow until eventually reaching hadr_db_commit_mgr_harden.

To trace these, two XEvent queries are needed:

1. Get the immediate next log block LSN, which is greater than the value of the wait_log_block property of XEvent hadr_db_commit_mgr_harden.

Query in the secondary replica (because the max size of a log block is 60 KB and one log block unit represents 512 bytes, the [next log_block_id] will be no more than 120 above the current log_block_id):

log_block_id > [current log_block_id] && log_block_id <= [current log_block_id] + 120

Pick the first log_block_id from the query result as the [next log_block_id].

2. Use the current and next log blocks to get the end-to-end XEvent flow of a log block that contains the expected commit LSN.

Query in primary replica:

(name != hadr_receive_harden_lsn_message && log_block_id = [current log_block_id]) ||

(name == hadr_receive_harden_lsn_message && log_block_id = [next log_block_id]) ||

(name == hadr_db_commit_mgr_harden && wait_log_block = [current log_block_id])

Query in secondary replica:

(name != hadr_send_harden_lsn_message && log_block_id = [current log_block_id]) ||

(name == hadr_send_harden_lsn_message && log_block_id = [next log_block_id])

Screenshot for a sample XEvent flow in Primary replica (highlighted records):

        The log_block_id (3925600606568) is obtained from the wait_log_block property of the hadr_db_commit_mgr_harden XEvent.

Screenshot for a sample XEvent flow in Secondary replica (highlighted records):

Based on the wait_log_block value in a hadr_db_commit_mgr_harden event, the full timeline of that log block's data movement sequence can be tracked back following the table above, by combining the captured XEvents from the primary replica and all synchronous-commit secondary replicas.

Optionally, the database_id and ag_database_id (which is the database replica id in the AG) can be combined with wait_log_block to look up the related XEvent records.

In addition, the mapping among database_id, ag_database_id and replica_id can be found from the output of sys.dm_hadr_database_replica_states.
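For example (a sketch):

-- database_id <-> ag_database_id (group_database_id) <-> replica_id mapping
SELECT database_id,
       group_database_id AS ag_database_id,
       replica_id,
       group_id
FROM sys.dm_hadr_database_replica_states;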

Troubleshooting

There are multiple possible reasons for HADR_SYNC_COMMIT to be unusually long. With the above performance counters and XEvents, it is possible to narrow down the root cause. If none of the resources below shows a performance issue, please involve the Microsoft Customer Support team for further investigation.

Slow Disk IO

With the AG XEvents, the duration between log_flush_start and log_flush_complete for the same log_block_id in the secondary replica should be long when there is a disk IO issue. Another way to look at this is to check the value (in milliseconds) of the duration property of log_flush_complete.
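A minimal sketch of an XEvent session that captures the log-flush pair discussed above (the session name and file target path are illustrative; add the other hadr_* events from the table above as needed):

CREATE EVENT SESSION [TrackLogFlush] ON SERVER
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.log_flush_complete
ADD TARGET package0.event_file (SET filename = N'C:\Temp\TrackLogFlush.xel');
GO
ALTER EVENT SESSION [TrackLogFlush] ON SERVER STATE = START;
GO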

When a secondary replica of an AG has a hardware configuration comparable to the primary replica (which is recommended in MSDN; see the section "Recommendations for Computers That Host Availability Replicas (Windows System)" in https://msdn.microsoft.com/en-us/library/ff878487.aspx), and the host computer of the secondary replica is dedicated to this AG, the secondary replica is not expected to hit a disk IO issue before the primary replica does. One exception is when the secondary replica enables read-only access and receives IO-intensive reporting workloads. Frequent log backups and copy backups can be another potential cause to look at.

High CPU

When a secondary replica of an AG has a hardware configuration comparable to the primary replica, and its host computer serves this AG only, the secondary replica is not expected to hit a CPU issue. But it is possible when there is a heavy reporting workload on a read-only enabled secondary replica.

In addition, log compression on the primary replica and log decompression on the secondary replica can be CPU-heavy operations. If compression is enabled (check whether XEvent hadr_log_block_compression is logged on the primary, or whether the is_compressed property of XEvent hadr_log_block_decompression is true), it can be a possible cause of high CPU on both primary and secondary replicas. When hadr_log_block_compression is logged on the primary replica with is_compressed=true, and the duration between log_flush_start and hadr_log_block_compression is long while disk IO is still fast, data compression can be identified as the cause. Similarly, the duration between hadr_transport_receive_log_block_message (mode=2) and hadr_log_block_decompression can be measured for the same detection.

When compression/decompression is identified as the root cause of high CPU: for SQL Server 2012 and SQL Server 2014, disabling compression with trace flag 1462 is a workaround option, at the cost of some network efficiency (bigger data packages on the network). For SQL Server 2016 and above, disabling parallel compression (trace flag 9591) is another option.

Network Issues

After ruling out disk IO and high CPU as the root cause of long HADR_SYNC_COMMIT waits, network performance needs to be checked.

With the Network Performance Counters mentioned above, the first thing to check is whether Bytes Total/sec is close to Current Bandwidth/8 for the related network adapter on the primary replica and all synchronous-commit secondary replicas. When any replica shows this situation, its network adapter has reached its network throughput capacity.

To check whether the network throughput mainly comes from AG data movement, the values of the network perf counter Bytes Sent/sec and the AG perf counter Bytes Sent to Transport/sec can be compared on the primary replica for a specific secondary replica as the sync partner, and the values of the network perf counter Bytes Received/sec and the AG perf counter Bytes Received from Replica/sec can be compared on the secondary replica. Their values and trends should be very close to each other.

XEvents can be applied here to examine the network latency. For the same log_block_id, the duration between hadr_capture_log_block (mode=4 on the primary, with the availability_replica_id matching the secondary replica) and hadr_transport_receive_log_block_message (mode=1 on the secondary) represents the network latency for a log block to travel from the primary to the secondary replica. The duration between hadr_send_harden_lsn_message (mode=3 on the secondary) and hadr_receive_harden_lsn_message (mode=1 on the primary) represents the network latency for a log block LSN hardening message to travel from the secondary to the primary.
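To post-process captured events, the .xel files can be read from T-SQL (a sketch, assuming the illustrative session sketched earlier; XML shredding of the individual fields is left out for brevity):

SELECT object_name AS event_name,
       CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file(N'C:\Temp\TrackLogFlush*.xel', NULL, NULL, NULL);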

Content Creator – Dong Cao, Principal Software Engineer, Data Platform Group, Microsoft

SQL Saturday 613: Tiger Team sessions


 

Tomorrow there’s an exciting SQL Saturday event at the Microsoft Campus in Redmond, where the community gets together to share insights, and meet new and not so new friends. The Tiger Team is excited to be a part of the event, and will be presenting a few sessions.

At 9:45AM PST, Sunil Agarwal will be talking about “ColumnStore Index in SQL Server 2016 and Azure SQL Database”.

Session Abstract: The updateable clustered columnstore in Microsoft SQL Server 2016 offers a leading solution for your data warehouse workload with order of magnitude better data compression and query performance over traditional B-Tree based schemas. This session assumes that you understand columnstore technology basics. We will do a deep dive into (a) strategies to load data leveraging minimal logging with concurrency (b) identifying/solving performance issues with queries (c) real-world best practices of keeping columnstore humming along.

At 2:45PM PST, join Parikshit Savjani and me for a fast-paced, demo-filled session, "Enhancements that will make your SQL database engine roar!", where we will be talking about the latest enhancements in scale, performance and diagnostics for the SQL Server Database Engine.

Session Abstract: Do you want to make your SQL Server engine roar and make it one of the fastest on the planet? If so, then come to this session to learn about new enhancements and advanced tuning and diagnostic options available in the SQL Server 2016 database engine. As hardware capabilities increase rapidly, knowing when to use these enhancements and configuration options is extremely important to scale SQL Server for the required application workload throughput rates. In the last couple of years, the SQL Tiger Team has worked closely with many customers running Tier-1 mission-critical workloads to scale their SQL Server with several configuration parameters and application patterns. In this session, you will hear directly from experienced members of the Microsoft SQL Server Tiger Team about these specific enhancements, configurations, and approaches for scaling SQL Server on the largest hardware systems available on the planet.

At the same time, our colleague Amit Banerjee will be presenting on “Building 1 million predictions per second using SQL-R”.

Looking forward to meeting everyone tomorrow!

FILESTREAM issues with SQL Server on Windows 10 creators update


If you have SQL Server installed on Windows 10, have enabled the FILESTREAM feature at the instance level, and have created databases that have filestream containers, then after applying the Windows 10 Creators Update [RS2] you will notice that the FILESTREAM feature does not work and you encounter unexpected errors.

In the Windows 10 Creators Update, a change was made in the IO Manager code that deals with NtCreateFile to tighten the ACL checks for a specific file create disposition. The SQL Server engine uses this API to connect to the FILESTREAM file system filter driver, called the RsFx driver. The change to NtCreateFile changed the behavior of FILE_OPEN_IF (create a new file if needed) to prevent users or services using this API from writing to the storage if the calling token doesn't have write permissions on the storage. Because of this change, the SQL Server process (which runs as a service with a virtual service account, e.g. NT Service\MSSQL$SQL2016), if it doesn't have administrator permissions, cannot open a handle to the RsFx driver and fails with a STATUS_ACCESS_DENIED error. This behavioral change leads to unexpected filestream errors and causes a SQL Server filestream database to fail to start if the SQL Server service account doesn't have write permissions on the filestream store.

Following are some of the filestream-related error messages you may encounter when:

  • The Windows 10 Creators Update is applied on an existing installation of SQL Server using the filestream feature, OR
  • A new installation of SQL Server with a database using filestream is created on Windows 10 Creators Update build 15048.

When you restart SQL Server or attempt to bring the database online, you will notice the database does not come online and ends up in Recovery Pending state.

SQL Server error log will show the following information:

2017-04-14 10:39:20.69 spid26s [INFO] HkHostDbCtxt::Initialize(): Database ID: [10] 'Archive'. XTP Engine version is 0.0.
2017-04-14 10:39:20.69 spid26s Starting up database 'Archive'.
2017-04-14 10:39:21.25 spid26s [INFO] HkHostDbCtxt::Initialize(): Database ID: [10] 'Archive'. XTP Engine version is 0.0.
2017-04-14 10:39:21.57 spid26s Error: 5591, Severity: 16, State: 5.
2017-04-14 10:39:21.57 spid26s FILESTREAM feature is disabled.
2017-04-14 10:39:21.57 spid26s Error: 5105, Severity: 16, State: 14.
2017-04-14 10:39:21.57 spid26s A file activation error occurred. The physical file name 'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016\MSSQL\DATA\Archive_fs' may be incorrect. Diagnose and correct additional errors, and retry the operation.

When you attempt to create a database with FileStream container, you will encounter the following error:

CREATE DATABASE Archive
ON PRIMARY ( NAME = Arch1,
FILENAME = 'archdat1.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS
FILESTREAM( NAME = Arch3,
FILENAME = 'filestream1')
LOG ON ( NAME = Archlog1,
FILENAME = 'archlog1.ldf')
GO

Msg 5591, Level 16, State 1, Line 1
FILESTREAM feature is disabled.

 

Even though the error message indicates the feature is disabled, when you look in the service properties in SQL Server Configuration Manager, you will notice the following:

 

When you attempt to restore a backup that contains filestream containers, you will encounter the following message:

When SQL Server starts you will notice the following messages in the SQL Server errorlog:

2017-04-14 10:25:22.34 Server Microsoft SQL Server 2016 (RTM-GDR) (KB3210111) - 13.0.1728.2 (X64)
Dec 13 2016 04:40:28
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows 10 Enterprise 6.3 <X64> (Build 15063: ) (Hypervisor)
2017-04-14 10:25:22.35 Server The service account is 'NT Service\MSSQL$SQL2016'. This is an informational message; no user action is required.
<{7715B5FC-837B-46C9-A28B-A7867FC86023}>RsFxFt.Dll::RsFxNsoInitialize failed: Error 0x80070005 (-2147024891)
<{C580416B-A13E-4ECD-B61B-AAFAE39E5E35}>Failed to initialize the CFsaShareFilter interface
<{1038F43D-3391-45F7-B1B3-BADF26459429}>Failed to initialize CFsaShareFilter: Error 0x80070005 (-2147024891)
2017-04-14 10:25:23.38 spid4s FILESTREAM: effective level = 0, configured level = 2, file system access share name = 'SQL2016'.

Workaround

Following are some of the workarounds identified that will enable you to overcome the above errors on the Windows 10 Creators Update:

  • Change the SQL Server service startup account to built-in account LocalSystem
  • Change the SQL Server service startup account to a domain user account with local admin privileges on the system
  • If you use virtual account [NT SERVICE\MSSQL$InstanceName] as service startup account, please make this account a member of the local administrators group
  • Uninstall Creators Update and fall back to the previous Windows build

The Windows team is working on a fix to prevent this breaking change in the FILE_OPEN_IF API. If you have SQL Server installed on Windows 10 with databases using the filestream feature, we recommend you defer applying the Windows 10 Creators Update until the fix is made available.

We will update this blog post once the fix is made available by the Windows team.

 

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

SQL Server Performance Baselining Reports Unleashed for Enterprise Monitoring !!!


In my previous blog post, I shared how you can leverage SQL Server Performance Dashboard SSRS Reports for monitoring and point-in-time troubleshooting of large deployments of SQL Server. In this post, we will talk about why DBAs should establish SQL Server performance baselines for their environment, and how you can leverage the SQL Server Performance Baselining reporting solution shared in our Tiger Toolbox GitHub repository to achieve that. The solution is already being deployed and leveraged by some of our customers, and further contributed to by SQL community members like SQLAdrian on GitHub.

Why is SQL Server Performance Baselining important?

  • Performance Tuning – DBAs, consultants and support teams often get called in when a business application running on SQL Server is slow. Before troubleshooting, one of the first questions to ask is how performance is measured and what the performance was previously. In the majority of situations, changes in the workload or application code lead to changes in performance, but one can prove that only if the previous state of the SQL Server was captured. In such situations, performance baselining lets you learn from historical data and trends to detect anomalies or changes in patterns on the server which may have caused the performance issues observed.
  • Capacity Planning – For DBAs managing large-scale deployments and mission-critical instances of SQL Server, it is important to proactively keep an eye on resource utilization (CPU, memory, IO and storage) and workload trends over time, to forecast and plan for more capacity as the workload and resource utilization change. For capacity planning, performance baselining reports are key to forecasting and predicting the capacity required in the future by analyzing historical trends.
  • Consolidation Planning – As shadow IT and the number of business applications running SQL Server grow in enterprises, companies can save cost by consolidating some of their databases under a single instance to utilize their hardware and resources efficiently. To plan resources and consolidate SQL Server databases, performance baselining is again required.
  • Virtualization\Migration to Azure – Most enterprises today are looking to reduce their IT capital and operational expenses and overheads by migrating to the cloud. When migrating to the cloud, it is important to identify the right VM size or performance tier to run your SQL Server databases, which is easy when you have performance baselines captured and established.

In general, performance baselining is capturing last-known-good state metrics which can be used as a starting point for comparison. Performance monitoring is the first step of baselining; in a production environment, the goal of performance monitoring should be to be non-invasive, with minimal overhead. Baselining always requires all the metrics to be captured across a time dimension to perform historical analysis. Historical analysis of performance data is useful for anomaly detection or to forecast future performance based on current workload.

In SQL Server, at a very high level, we have 3 types of performance monitoring data available to capture the state of SQL Server:

  • Perfmon counters
  • DMVs (dynamic management views)
  • XEvents (extended events)

In SQL Server, one needs to capture at least the following details over time to successfully establish a comprehensive performance baseline for a SQL Server instance (a minimal collection sketch follows the lists below).

Processor Information

  • Processor(*)\% Processor Time
  • Process(sqlservr)\% Processor Time
  • Processor(*)\% Privileged Time
  • Process(sqlservr)\% Privileged Time

Memory

  • Memory\Available MBytes
  • Memory\Pages/sec
  • Process(sqlservr)\Private Bytes
  • Process(sqlservr)\Working Set
  • SQLServer:Memory Manager\Total Server Memory
  • SQLServer:Memory Manager\Target Server Memory

Physical Disk

  • PhysicalDisk(*)\Avg. Disk sec/Read
  • PhysicalDisk(*)\Avg. Disk sec/Write
  • PhysicalDisk(*)\Avg. Disk Queue Length
  • PhysicalDisk(*)\Disk Bytes/sec
  • PhysicalDisk(*)\Avg. Disk Bytes/Transfer
  • Process(sqlservr)\IO Data Operations/sec

Network

  • Network Interface(*)\Bytes Received/sec
  • Network Interface(*)\Bytes Sent/sec
  • Network Interface(*)\Output Queue Length

SQL Server Workload

  • SQLServer:SQL Statistics\Batch Requests/sec
  • SQLServer:SQL Statistics\SQL Compilations/sec
  • SQLServer:SQL Statistics\SQL Re-Compilations/sec
  • sys.dm_exec_requests
  • sys.dm_exec_sessions
  • sys.dm_exec_connections
  • Max Workers
  • Workers Created
  • Idle Workers
  • Deadlocks

Waits

  • SQLServer:Wait Statistics
  • sys.dm_os_waiting_tasks
  • sys.dm_os_wait_stats
  • Latch Waits > 15 sec
  • Locks > 30 sec
  • IO Latch Timeouts
  • Long IO Status

Database Indexes

  • sys.dm_db_index_physical_stats
  • sys.dm_db_index_operational_stats
  • sys.dm_db_index_usage_stats
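As an illustration of how lightweight such a collection can be, here is a minimal T-SQL sketch that snapshots a few of the counters above into a history table. The table name dbo.PerfCounterHistory is hypothetical; the solution in the Tiger Toolbox GitHub repository ships its own objects in the dba_local database.

-- Hypothetical history table for periodic counter snapshots.
CREATE TABLE dbo.PerfCounterHistory (
    collection_time datetime2(0) NOT NULL DEFAULT (SYSDATETIME()),
    object_name     nvarchar(128) NOT NULL,
    counter_name    nvarchar(128) NOT NULL,
    instance_name   nvarchar(128) NULL,
    cntr_value      bigint NOT NULL
);
GO

-- Snapshot a handful of SQL Server counters; schedule this in a SQL Agent job.
INSERT INTO dbo.PerfCounterHistory (object_name, counter_name, instance_name, cntr_value)
SELECT RTRIM([object_name]), RTRIM(counter_name), RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec', N'Total Server Memory (KB)',
                       N'Target Server Memory (KB)', N'Page life expectancy');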

When monitoring large deployments of SQL Server instances, if you capture all the above data in a central location or central SQL Server database, the central database becomes a bottleneck and a point of contention. To scale the performance baselining solution for large deployments of SQL Server, we use the same architecture as discussed in the SQL Server Performance Dashboard SSRS Reports post, shown below. In this architecture, the performance data is captured locally in a database called dba_local, with performance monitoring SSRS reports hosted on a central monitoring server. The SSRS reports are parameterized and have a dynamic connection string which connects to the SQL Server instance specified in the report parameter. The report queries the dba_local database on the specified SQL Server instance to report performance data captured over time.

 

If you download the SQL Performance Baselining Reporting Solution from GitHub, you will find scripts to create a database and monitoring jobs which need to be run on each instance of SQL Server you wish to monitor. Following are the steps to deploy the SQL Performance Baselining Reporting solution.

Data Collection Steps for each SQL Instance to Monitor

  1. Connect to SQL instance to monitor
  2. Run CREATEDATABASE.sql
  3. Run CREATEOBJECTS.sql
  4. Run CREATECOLLECTIONJOB.sql
  5. Check SQL Agent JOBs History to see if it runs successfully
  6. Repeat for each SQL Instance you want to monitor

Setting up & Deploying Reporting

To deploy the performance baselining SSRS reports, you can leverage the same central monitoring SSRS server that hosts the SQL Server Performance Dashboard reports. As mentioned in the SQL Server Performance Dashboard blog post, you can use the same steps to set up and configure the SSRS server by deploying the reports using SSDT or Report Manager. Alternatively, you can use the PowerShell script created by Aaron (b|t) with some modifications, as mentioned in the previous blog post as well.

Once the reports are deployed, you can use the following reports to establish your performance baseline.

[Screenshot: performance baseline overview report]

The above report is a drillthrough report, so when you click on the chart for any specific data point, it takes you to the daily performance trend report for that day, as shown below.

[Screenshot: daily performance trend report]

In the previous report, you can observe that around 8PM the number of sessions in a suspended state increases, which indicates high blocking causing performance slowness or degradation. Around the same time, most sessions are waiting on LCK_M_SCH_M and disk write latency is high. And later we found out why:

Ohh, our reindexing job runs at around 8PM and we don’t use WITH ONLINE=ON while reindexing, which explains the blocking.

The Memory Utilization report shows you the trend of SQL Server memory and available memory on the server. From the report, you can easily deduce that available memory on the server went down at around 2:30PM, which caused external memory pressure, causing target server memory to fall below total server memory in SQL Server. SQL Server responded to the memory pressure by increasing lazy writer activity to free up pages, which in turn dropped the page life expectancy on the server.

Finally, the database storage report gives you a quick glance at the free space in all the databases hosted on the SQL Server.

Hope you find the reports useful. If you are an SSRS developer, I would encourage you to design some cool reports for performance monitoring using perfmon, DMV or XEvents data, and contribute back to the community by sending a pull request to our solution in the GitHub repository.

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

HADR Virtual Chapter – Monitoring Always On Availability Groups

$
0
0

This past week the SQL Server Tiger Team delivered a webinar session for the HADR PASS virtual chapter. During the session, I talked about some of the enhancements which were introduced in SQL Server 2016 and SQL Server 2016 Service Pack 1. The enhancements covered during the session fall into three broad categories:

  1. Fundamental Changes to Always On Availability Groups
  2. Performance and Usability enhancements
  3. Troubleshooting and Monitoring enhancements.

During the session, I talked about the Automatic Seeding capability with SQL Server 2016 Always On Availability Group and the new latency dashboard which we are adding to SQL Server Management Studio vNext. The latency dashboard is currently in development and is expected to be available sometime in the next semester. The session recording, the slide deck and the scripts used during the demo are included below.

Session Recording

Presentation

Demo Scripts

The Demo Scripts for the session can be downloaded here.

 

Sourabh Agarwal
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam


SQL Server Mysteries: The Case of the Not 100% RESTORE…

$
0
0

I recently visited a customer onsite and presented to them topics on SQL Server 2016. After the talk, I opened up the floor for the audience to ask me questions. One question I got went like this: “I’ve tried to restore a database on SQL Server using the WITH STATS option. When I run the RESTORE, the progress shows 100% but the restore is not complete and takes longer to actually finish. Do you know why?”. I thought about this for a second and my answer was “It is probably taking a long time to run recovery after the RESTORE, but I’m not sure”. I didn’t commit to investigating this, but it bugged me on my plane ride back to the great state of Texas (sorry if you haven’t figured out by now, I love living in Texas).

When I got back to my office, I remembered my colleague from the Tiger Team, Ajay Jagannathan, had blogged about new extended events we introduced in SQL Server 2016 for backup and restore. You can read more about these on his blog post at https://blogs.msdn.microsoft.com/sql_server_team/new-extended-event-to-track-backup-and-restore-progress/. OK, this is very cool. We now have a new way of instrumenting the backup/restore process for any database or transaction log.

I have a problem now to solve my mystery. I don’t know the cause of the problem so I don’t know how to create a scenario to see if these new events will provide the answer.

The path to find the answer is really more about finding possible answers and drawing conclusions. My thought was to set up the extended events trace as Ajay suggests in his article, run a RESTORE DATABASE … WITH STATS, then analyze in the extended event trace the instrumentation points that occur after 100% is reported (which itself is an event), and determine whether they could “take a long time”.

First, I copied the XEvent session script that Ajay documented in his blog, executed it, and started that session.

Next, I decided to use the WideWorldImporters sample database available on GitHub to test out the RESTORE. Here is the script I used to restore the backup I downloaded from GitHub:

restore database wideworldimporters from disk = 'c:\temp\wideworldimporters-full.bak'
with move 'wwi_primary' to 'd:\data\wideworldimporters.mdf',
move 'wwi_userdata' to 'd:\data\wideworldimporters_userdata.ndf',
move 'wwi_log' to 'd:\data\wideworldimporters.ldf',
move 'wwi_inmemory_data_1' to 'd:\data\wideworldimporters_inmemory_data_1',
stats

The resulting stats come through as Messages and looked like this in SSMS

10 percent processed.
20 percent processed.
30 percent processed.
Processed 1464 pages for database 'wideworldimporters', file 'WWI_Primary' on file 1.
Processed 53096 pages for database 'wideworldimporters', file 'WWI_UserData' on file 1.
Processed 33 pages for database 'wideworldimporters', file 'WWI_Log' on file 1.
Processed 3862 pages for database 'wideworldimporters', file 'WWI_InMemory_Data_1' on file 1.
100 percent processed.
RESTORE DATABASE successfully processed 58455 pages in 8.250 seconds (55.354 MB/sec).

I then went to look at the results in my XEvent session file. The results grid looked something like this around the 100% progress reported message. In my example, the RESTORE finished very quickly, in around 6 seconds, and the 100% message was provided back to my SSMS query window just around the time the restore completed:

[Screenshot: XEvent session results around the “100 percent processed” event]

If you look at this output, perhaps the only thing that has the potential to take time after the 100 percent processed message is writing out the backup history to msdb. Hmmmm. While that is possible, how probable is it? I’m not convinced that writing the history tables is the only “work” to be done past posting this “100 percent processed” message.

Based on my knowledge of SQL Server and looking at these events, I then thought “What are the major phases of RESTORE and which ones could take a long time if say the database was fairly large?” I then created these scenarios and looked at the XEvent trace:

1) Create the database files – Making the assumption that most people use Instant File Initialization, I decided not to go down this path.

2) Copying in all the database pages from the backup – Very plausible, but I tested fairly large backups of data (2Gb+) and the 100% message was only displayed very close to when the restore was done. I’ll show you that in a minute.

3) Create the transaction log files – Probably not an issue, but I’m running out of options so I will come back to that.

4) Copying in the transaction log blocks from the backup – This is also a candidate if there were a high number of transactions during a long, large backup. I’m not sure this would amount to a huge chunk of time, but the messages at the end of the restore tell you how many “pages” we had to copy for the log.

5) Running Recovery – This is similar to #4, but if there were many concurrent transactions during the backup and the backup took a long time, there could be a certain amount of recovery to do, so this is a candidate.

If you look at my list of possible reasons, they all seem to be related to the transaction log. Given the needs of today’s database user, I think #3 has possibilities with larger transaction log sizes. Why? Because we can’t use Instant File Initialization for the transaction log, and part of RESTORE is creating the actual transaction log file (see more on why there is no IFI for the tlog in Paul Randal’s blog post). Another possibility here is a large number of Virtual Log Files, but I thought we made advances there in our latest releases. I’ll come back to that one too later.

So any good case can be solved with some experimentation. I decided to create a scenario to test out theory #3 above: a “large” transaction log creation, because I know that since we can’t use IFI, it can simply take some time.

I created a database with a 10Gb data file and a 5Gb transaction log, and populated it with around 13Mb of data. I then backed it up, turned on my XEvent session, and ran RESTORE.
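For reference, the test database can be created with a script along these lines (a sketch; the file paths are hypothetical):

-- 10Gb data file, a small amount of actual data, and a 5Gb transaction log that must be zeroed without IFI.
CREATE DATABASE imabigdb
ON PRIMARY (NAME = imabigdb, FILENAME = 'd:\data\imabigdb.mdf', SIZE = 10GB)
LOG ON (NAME = imabigdb_log, FILENAME = 'd:\data\imabigdb_log.ldf', SIZE = 5GB);
GO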

The output of the RESTORE hit 100% very fast and then just sat there “Executing Query” in the SSMS window for another ~2 minutes, and then came back with the final message that the RESTORE had finished. So in the SSMS Messages window it looked something like this:

15 percent processed.
23 percent processed.
31 percent processed.
47 percent processed.
55 percent processed.
63 percent processed.
71 percent processed.
87 percent processed.
95 percent processed.
100 percent processed.

< Delay for about 2 minutes >

Processed 1608 pages for database 'imabigdb', file 'imabigdb' on file 1.
Processed 2 pages for database 'imabigdb', file 'imabigdb_log' on file 1.
RESTORE DATABASE successfully processed 1610 pages in 143.812 seconds (0.087 MB/sec).

The XEvent results looked like this (I’ve highlighted a few areas):

[Screenshot: XEvent results showing the gap between “Waiting for Log zeroing” and “Log Zeroing is complete”]

Notice the “100 percent…” message has details about “bytes processed”. Since my data is around 13Mb, this tells me that the progress indicators are all about the data-transfer step of RESTORE. Notice the time gap in the messages between “Waiting for Log zeroing…” and “Log Zeroing is complete”. That gap is around 2 minutes – exactly the time between the 100% complete message in the SSMS window and the final restore message.

From this evidence I can conclude that having a transaction log file size that takes a long time to initialize can be a possible cause of seeing the behavior of getting the 100% Complete Message quickly but the overall RESTORE takes longer to complete.

One thing puzzled me though when I looked back at a previous RESTORE I tried where the database “had more data”. In that case the 100% message in SSMS was very close to the time I received the final RESTORE message. There was almost no delay (or at least it was not 2 minutes). But when I went back and looked at my database definition, I was using a 5Gb transaction log there as well. Huh? Why didn’t I see a big delay of 2 minutes as in the previous example? My suspicion was that the copying of data into the database from the backup is done concurrently with the creation of the transaction log file. So if the time it takes to copy in your data is, say, around 2 minutes and the tlog takes 2.5 minutes to create and zero out, the delay between seeing the 100% and the final restore completion would be around 30 seconds. It makes sense if you think about it. Why should the engine wait to create the transaction log file until after copying the data? Why not create it at the same time, because once the data copying is completed, we can copy in the log blocks and run recovery.

I have this new phrase I’ve been using lately: “There’s an XEvent for that”. And indeed there is one to prove our theory…almost<g>. I decided to add in XEvents from the “debug” channel because I need some pretty low-level stuff to figure this out. Here is my new session definition:

CREATE EVENT SESSION [Backup_Restore_Trace] ON SERVER
ADD EVENT sqlos.async_io_completed(
    ACTION(package0.event_sequence,sqlos.task_address,sqlserver.session_id)),
ADD EVENT sqlos.async_io_requested(
    ACTION(package0.event_sequence,sqlos.task_address,sqlserver.session_id)),
ADD EVENT sqlos.task_completed(
    ACTION(package0.event_sequence,sqlserver.session_id)),
ADD EVENT sqlos.task_started(
    ACTION(package0.event_sequence,sqlserver.session_id)),
ADD EVENT sqlserver.backup_restore_progress_trace(
    ACTION(package0.event_sequence,sqlos.task_address,sqlserver.session_id)),
ADD EVENT sqlserver.file_write_completed(SET collect_path=(1)
    ACTION(package0.event_sequence,sqlos.task_address,sqlserver.session_id))
ADD TARGET package0.event_file(SET filename=N'Backup_Restore_Trace')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=5 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)

I’ve added in events for async I/O, tasks and file writes, and added in actions to see session_id, event_sequence and task_address (the file_write_completed event also exposes the file_handle). I’ll show you why these are all important. Using my database where the difference was only about 30 seconds between the 100% processed message and the final restore completion, I see results like this (I’m going to show only pieces of the result because it gets big):

[Screenshot: XEvent results for the RESTORE session (session_id 57)]

This looks complicated, so let me explain the results:

1. The session_id is 57 for all of these events, so this is the RESTORE session.

2. We know file_handle 4832 is for the database file.

3. The task_address is the same for all of these events, so we know this is one task that has finished copying the data and is waiting for the log to be zeroed out.

4. We know file_handle 4268 is for the transaction log file.

5. We can see the time gap between finishing copying the pages and the log zeroing completing, which is 30 seconds. But remember this is a 5Gb tlog, which I had already proved takes ~2 minutes to zero out.

And it looks like my theory of this being done concurrently doesn’t fly because these events are all from the same task.

I went back and ran the RESTORE again, this time looking at sys.dm_os_waiting_tasks.
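The query was along these lines (a sketch; 57 was the session id of my RESTORE session):

-- Inspect what each task of the RESTORE session is waiting on.
SELECT session_id, waiting_task_address, wait_type, wait_duration_ms
FROM sys.dm_os_waiting_tasks
WHERE session_id = 57;

This produced the following results: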

[Screenshot: sys.dm_os_waiting_tasks output showing BACKUPTHREAD, BACKUPIO and PREEMPTIVE_OS_WRITEFILEGATHER waits]

You can clearly see that three tasks are working on this same session with different wait types. But how does this help me line up the XEvents to prove my theory?

The waiting_task_address for the wait_type BACKUPIO is 2138696527016 in decimal, so that matches exactly with the events I showed above. BACKUPIO is the wait type for a thread that is reading from the backup file and writing out data to the database file (and apparently also writes out something to the tlog file). Another waiting_task_address is waiting on PREEMPTIVE_OS_WRITEFILEGATHER. I’ve seen this exact same wait type in auto-grow scenarios for the transaction log. So I believe this separate task is responsible for creating and zeroing out the transaction log file. Taking the waiting_task_address for this task, I searched my XEvent results and found this:

[Screenshot: XEvent results for the task that zeroes the transaction log]

Let’s look at this deeper.

1. The “Preparing Containers” task translates to the same task_address as the one waiting on BACKUPTHREAD. So you can see this is the main “coordinator” of the restore.

2. Soon after this, a new task is started with a task address that matches the task waiting on PREEMPTIVE_OS_WRITEFILEGATHER. Ah – so this is the concurrent operation to zero out the transaction log. The coordinator task creates a “subprocess” to zero out the tlog. (I confirmed this further because the task_started event has a field called entry_point. That is a number in decimal which is the actual function pointer to the code that runs the task. In this specific case, the name of that function pointer, resolved using symbols, is sqlmin!SubprocEntrypoint().)

3. And then you can see that same task complete some ~2 minutes later, which is just about the time it takes to zero out the tlog. One thing that puzzled me is that I never saw any async_io or file_write_completed events for this task. That is because zeroing the log doesn’t use the “normal” SQL Server routines that track async I/O and file writes; we simply call the Windows routine WriteFileGather() in a loop. To ensure we don’t cause any scheduling problems, we do this in “preemptive mode”, hence the wait type PREEMPTIVE_OS_WRITEFILEGATHER. So I know that the parent coordinator task for the RESTORE created a subprocess to zero the log at the same time another subprocess task was copying in the database pages to the database files.

Wow. That was quite a bit just to show you how a RESTORE can take longer to complete “than you expect” and why 100% doesn’t necessarily mean 100% complete with the entire RESTORE process. A few closing thoughts:

1. The % processed messages are only for the copying of database pages from the backup to the database files. You can see their byte progress in more detail in the XEvent trace. I suspect we do this because the copy process is a more predictable process for which we can determine a “% complete” progress.

2. The BACKUPIO subprocess task writes to the tlog, as seen in the XEvent trace after the log is zeroed, because that is where we copy in the log pages from the log portion of the backup. That does use the “normal” I/O routines, so it will show up in async_io and file write events.

3. It is very possible that the other scenarios I mentioned earlier regarding the transaction log (copying in the log blocks, going through VLFs, and running recovery) can make the completion take even longer past the 100% mark, because we wouldn’t attempt these operations until all the database pages are copied to the database files. Now you know how to use XEvents to determine which part of applying the log could be taking extra time.

That is the end of this mystery. I hope you were able to understand how the combination of XEvents, DMVs, and paying attention to the details of how the RESTORE executed helped turn a simple but mysterious question into a case solved.

Bob Ward
Principal Architect
Microsoft Data Group
SEALS Tiger Team

Community driven Enhancements in SQL Server 2017


While SQL Server 2016 runs faster, SQL Server 2017 promises to run faster still and empower customers to run smarter with intelligent database features like the ability to run advanced analytics using Python in a parallelized and highly scalable way, the ability to store and analyze graph data, adaptive query processing and resumable online indexing, while allowing customers to deploy it on the platform of their choice (Windows or Linux). SQL Server is one of the most popular DBMSs in the SQL community and is a preferred choice of RDBMS among customers and ISVs owing to its strong community support. In SQL Server 2017 CTP 2.0, we have released several customer delighters and community-driven enhancements based on learnings and feedback from customers and the community on in-market releases of SQL Server.

Smart Differential Backup – A new column modified_extent_page_count is introduced in sys.dm_db_file_space_usage to track differential changes in each database file of the database. This new column allows DBAs, the SQL community and backup ISVs to build smart backup solutions which perform a differential backup if the percentage of changed pages in the database is below a threshold (say 70-80%), and a full database backup otherwise. With a large number of changes in the database, the cost and time to complete a differential backup is similar to that of a full database backup, so there is no real benefit to taking a differential backup in that case; it can rather increase the restore time of the database. By adding this intelligence to backup solutions, customers can now save on restore and recovery time while using differential backups.

Consider a scenario where you previously had a backup plan to take a full database backup on weekends and differential backups daily. In this case, if the database goes down on Friday, you will need to restore the full backup from Sunday, the differential backup from Thursday and then the T-log backups from Friday. By leveraging modified_extent_page_count in your backup solution, you can take a full database backup on Sunday and, let’s say by Wednesday, if 90% of pages have changed, the backup solution takes a full database backup rather than a differential backup. Now, if the database goes down on Friday, you can restore the full backup from Wednesday, the small differential backup from Thursday and the T-log backups from Friday to restore and recover the database quickly compared to the previous scenario. This feature was requested by customers and the community in connect item 511305. The queries below show how to compute the percentage of changed pages.

USE <database-name>
GO

-- Per-file percentage of extents modified since the last full backup.
-- Note: decimal(5,2) allows values up to 999.99; decimal(2,2) would overflow for any value >= 1.
SELECT file_id,
       CAST(ROUND((modified_extent_page_count*100.0)/allocated_extent_page_count, 2) AS decimal(5,2))
       AS [% Differential Changes since last backup]
FROM sys.dm_db_file_space_usage
GO

-- Database-wide percentage of extents modified since the last full backup.
SELECT CAST(ROUND((SUM(modified_extent_page_count)*100.0)/SUM(allocated_extent_page_count), 2) AS decimal(5,2))
       AS [% Differential Changes since last backup]
FROM sys.dm_db_file_space_usage

Smart Transaction Log Backup – In an upcoming release of SQL Server 2017 CTP, a new DMF sys.dm_db_log_stats(database_id) will be released which exposes a new column log_since_last_log_backup_mb. This column will empower DBAs, the SQL community and backup ISVs to build intelligent T-log backup solutions which take backups based on the transactional activity on the database. This intelligence will ensure the transaction log size doesn’t grow due to a high burst of transactional activity in a short time when the T-log backup frequency is too low. It will also help avoid situations where a scheduled transaction log backup creates too many T-log backup files even when there is no transactional activity on the server, adding to storage, file management and restore overheads. Monitoring solutions and ISVs can also set up alerts based on the transactional activity of the T-log to avoid and avert T-log growth issues. A sketch of such a check follows.
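As a sketch of what such a smart log backup check could look like once the DMF ships (the database name, backup path and the 50 MB threshold are arbitrary examples):

-- Back up the log only if enough log has been generated since the last log backup.
DECLARE @log_mb float;
SELECT @log_mb = log_since_last_log_backup_mb
FROM sys.dm_db_log_stats(DB_ID('AdventureWorks'));

IF @log_mb > 50 -- hypothetical threshold
    BACKUP LOG AdventureWorks TO DISK = 'D:\backup\AdventureWorks_log.trn';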

SELECT INTO … ON FileGroup – One of the most highly voted connect items and a highly requested feature from the SQL community, support for loading tables into a specified filegroup with SELECT INTO, is now available in SQL Server 2017 CTP 2.0. SELECT INTO is commonly used in DW scenarios for creating intermediate staging tables, and the inability to specify a filegroup was one of the major pain points when creating and loading tables into filegroups other than the default filegroup of the user loading the table. Starting with SQL Server 2017 CTP 2.0, the SELECT INTO T-SQL syntax supports loading a table into a filegroup other than the user’s default filegroup using the ON <filegroup name> keyword, as shown below:

ALTER DATABASE [AdventureWorksDW2016] ADD FILEGROUP FG2
GO

SELECT * FROM sys.database_files
GO

ALTER DATABASE [AdventureWorksDW2016]
ADD FILE
(
    NAME = 'FG2_Data',
    FILENAME = '/var/opt/mssql/data/AdventureWorksDW2016_Data1.mdf'
)
TO FILEGROUP FG2;
GO

SELECT * INTO [dbo].[FactResellerSalesXL] ON FG2 FROM [dbo].[FactResellerSales];

Tempdb Setup improvements – One piece of constant feedback from customers, the SQL community and the field after the SQL Server 2016 setup improvements was to lift the 1GB cap on the initial tempdb file size in setup. For SQL Server 2017, setup allows an initial tempdb file size of up to 256 GB (262,144 MB) per file, with a warning to customers if the file size is set to a value greater than 1GB and IFI is not enabled. It is important to understand the implication of not enabling instant file initialization (IFI): setup time can increase exponentially depending on the initial size of the tempdb data files specified. IFI is not applicable to the transaction log, so specifying a large transaction log size will invariably increase the setup time while tempdb is started up during setup, irrespective of the IFI setting for the SQL Server service account.

[Screenshot: SQL Server 2017 setup – tempdb file size configuration]

 

Tempdb Monitoring and Planning – A few months back, the SQL Server Tiger team surveyed the SQL community to identify common challenges experienced by customers with tempdb. Tempdb space planning and monitoring were found to be the top challenges. As a first step to facilitate tempdb space planning and monitoring, a new, performant DMV sys.dm_tran_version_store_space_usage is introduced in SQL Server 2017 to track version store usage per database. This new DMV will be useful for monitoring tempdb version store usage; DBAs can proactively plan tempdb sizing based on the version store usage requirement per database without the performance toll or overhead of running it on production servers. An example query is shown below.
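A minimal sketch of querying the new DMV:

-- Version store space consumed in tempdb, broken down by the database generating it.
SELECT DB_NAME(database_id) AS [database],
       reserved_page_count,
       reserved_space_kb
FROM sys.dm_tran_version_store_space_usage
ORDER BY reserved_space_kb DESC;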

Transaction Log Monitoring and Diagnostics – One of the most highly voted connect items and requested asks from the community is to expose transaction log VLF information in a DMV. T-log space issues, high VLF counts and log shrink issues are some of the common challenges experienced by DBAs. Some of our monitoring ISVs have asked for DMVs exposing VLF information and T-log space usage for monitoring and alerting. A new DMV sys.dm_db_log_info is introduced in SQL Server 2017 CTP 2.0 to expose VLF information similar to DBCC LOGINFO, to monitor, alert on and avert potential T-log issues experienced by customers; a sample query follows.
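For instance, a sketch that counts VLFs per database, a common health check previously done with DBCC LOGINFO loops:

-- VLF count per database, one row per database.
SELECT s.[name] AS [database], COUNT(l.database_id) AS vlf_count
FROM sys.databases AS s
CROSS APPLY sys.dm_db_log_info(s.database_id) AS l
GROUP BY s.[name]
ORDER BY vlf_count DESC;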

In addition to sys.dm_db_log_info, a new DMF sys.dm_db_log_stats(database_id) will be released in an upcoming CTP release of SQL Server 2017, exposing aggregated transaction log information per database. We will share more details on this DMF once it is released.

Improved Backup performance for small databases on high-end servers – After migrating existing in-market releases of SQL Server to high-end servers, customers may experience a dip in backup performance when taking backups of small to medium databases. This happens because we need to iterate the buffer pool to drain the ongoing I/Os, so backup time is not just a function of database size but also a function of the active buffer pool size. In SQL Server 2017, we have optimized the way we drain ongoing I/Os during backup, resulting in dramatic gains in backup performance for small to medium databases. We have seen more than 100x improvement when taking system database backups on a 2TB machine. More extensive performance testing results on various database sizes are shared below. The performance gain shrinks as the database size increases, since the pages to back up and the backup IO take more time relative to iterating the buffer pool. This improvement will help customers hosting many small databases on large, high-end servers with large amounts of memory.

DB Size | Older SQL Server releases | SQL Server 2017 | Improvement
--------|---------------------------|-----------------|------------
8MB     | 107                       | 0.4             | 642x
256MB   | 108                       | 1               | 108x
1GB     | 110                       | 4               | 27.5x
8GB     | 139                       | 24              | 5.79x
16GB    | 168                       | 59              | 2.85x
32GB    | 216                       | 108             | 2.12x
64GB    | 332                       | 200             | 66%
128GB   | 569                       | 469             | 21.32%
256GB   | 1055                      | 953             | 10.70%

 

Processor Information in sys.dm_os_sys_info – Another feature highly requested by customers, ISVs and the SQL community, exposing processor information in sys.dm_os_sys_info, is released in SQL Server 2017 CTP 2.0. The new columns allow you to programmatically query processor information for the servers hosting SQL Server instances, which is useful when managing large deployments of SQL Server. The new columns exposed in the sys.dm_os_sys_info DMV are socket_count, core_count and cores_per_socket, as shown below.
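A quick query sketch:

-- Socket and core layout of the host, straight from the DMV.
SELECT socket_count, cores_per_socket, core_count
FROM sys.dm_os_sys_info;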

Capturing Query Store runtime statistics in DBCC CLONEDATABASE – DBCC CLONEDATABASE has proved extremely useful for exporting Query Store metadata for regression testing and tuning. However, it didn’t capture the runtime statistics, which are flushed every 15 minutes, so customers had to manually execute sp_query_store_flush_db before running DBCC CLONEDATABASE to flush and capture Query Store runtime statistics in the database clone. Starting with SQL Server 2016 SP1 CU2 and SQL Server 2017 CTP 2.0, DBCC CLONEDATABASE flushes runtime statistics while cloning, so they are no longer missing from the clone (see the example below). In addition, DBCC CLONEDATABASE is further enhanced to support and clone full-text indexes. We also fixed several bugs encountered when cloning databases using some of the latest features (Always Encrypted, RLS, Dynamic Data Masking, Temporal) and released the fixes in SQL Server 2016 SP1 CU2 and SQL Server 2017 CTP 2.0.
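The syntax itself is unchanged; a minimal sketch (database names hypothetical):

-- Clone schema, statistics and Query Store data (no user data) into a new database.
DBCC CLONEDATABASE ('ProductionDB', 'ProductionDB_Clone');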

Thank you to all the SQL Community members in sharing your valuable feedback and making SQL Server 2017 smarter and a preferred choice of RDBMS for customers.

 

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

Programmatically find SQL Server TCP ports


There are many instances in SQL Server management where one would like to find all the TCP ports that SQL Server is binding to. Finding all the TCP ports that SQL Server is listening on might be needed for security-related auditing, connectivity troubleshooting, or some form of automation. One such example is used in BPCheck.

In the past, attempting to find what ports SQL Server was listening on was a tedious task. Common methods were to either parse the event log (if it was still available since the last startup) or scan the registry; you can still find these methods in several blogs and other sources. In SQL Server 2008 and SQL Server 2008 R2, you would scan the registry using queries like so:

SELECT MAX(CONVERT(VARCHAR(15), value_data))
FROM sys.dm_server_registry
WHERE registry_key LIKE '%MSSQLServer\SuperSocketNetLib\Tcp\%'
  AND value_name LIKE N'%TcpPort%'
  AND CONVERT(float, value_data) > 0;

SELECT MAX(CONVERT(VARCHAR(15), value_data))
FROM sys.dm_server_registry
WHERE registry_key LIKE '%MSSQLServer\SuperSocketNetLib\Tcp\%'
  AND value_name LIKE N'%TcpDynamicPort%'
  AND CONVERT(float, value_data) > 0;

This is not a very clean way of doing it, but it works for the most part. There is at least one issue with this approach, however: if you make changes (either in the registry or in a configuration tool) without restarting SQL Server, this data will be incorrect. There might also be concerns with querying the registry from T-SQL that make someone unwilling to run this query.

But there is a better way. In SQL Server 2012, a new DMV was added to query the TCP ports that SQL Server is currently using: sys.dm_tcp_listener_states. To get the same information as the queries above (only the IPv4 port, not the DAC), you can simply execute:

SELECT port FROM sys.dm_tcp_listener_states WHERE is_ipv4 = 1 AND [type] = 0 AND ip_address <> '127.0.0.1'

[Screenshot: query output listing the port]
This gets both static and dynamic ports in a single query. If you only allow DAC from local connections, filtering out 127.0.0.1 will hide the DAC-specific ports.
Here’s the full output for the DMV on a different machine:

SELECT * FROM sys.dm_tcp_listener_states

[Screenshot: full sys.dm_tcp_listener_states output]

Content creator: David Barajas – Software Engineer, Microsoft

Pedro Lopes (@sqlpto) – Senior Program Manager

New in SSMS: Query Performance Troubleshooting made easier!


The community already uses tools that can make it easier to read and analyze query plans (including SSMS), but these require significant expertise in understanding query processing and plans in order to actually find and fix root causes.

In the latest version of SSMS, released last week, we debut a new scenario-based issue identification feature for comparison-based and single-plan analysis.

Based on common trends in the query performance troubleshooting space, and on years of our own experience troubleshooting query plan issues, we have been working on functionality that implements some degree of automation in the task of query plan analysis, especially for large and complex plans. The purpose is to make it easier to find common scenarios where plan choice may be inefficient, and to get some recommendations on next steps to take.

In this first release, we added an “Inaccurate Cardinality Estimation” scenario. One of the most important inputs the Query Optimizer uses to choose an optimal execution plan is the estimated number of rows to be retrieved per operator. These estimations model the amount of data to be processed by the query, and therefore drive cost estimation. The models used by the process of estimating the number of rows, called Cardinality Estimation, may have limitations. The accuracy of those models depends on how closely they correspond to the actual data distribution, correlation and chosen parameters, and how closely statistics, the main input for Cardinality Estimation, model all aspects of the actual data.

This scenario helps you find significant inaccuracies in Cardinality Estimation for your actual execution plan, and suggests possible causes for those inaccuracies as well as possible workarounds to improve the estimates. Note that this automation may not identify all possible root causes and workarounds. So while the information displayed here is a tentative mitigation opportunity to resolve an issue identified by this scenario, it should still help in understanding and improving the efficiency of the query plan choice. Please make sure to test any proposed workarounds before applying them to your production system.

Let’s see what this new feature allows us to learn about our query execution plans, using 3 approaches:

  1. Single plan analysis
  2. Plan comparison between two previously saved plans
  3. Using Query Store

1. Single Plan Analysis

Let’s use a plan I captured and used in a previous blog post, and walk through this approach in a few simple steps:

  1. This is what we get when opening the plan in SSMS:
     
    exec sp_executesql N'exec Sales.SalesFromDate @P1',N'@P1 datetime2(0)','2004-7-31 00:00:00'

     

    [Screenshot: actual execution plan opened in SSMS]

     

  2. Now right-click anywhere in a blank area of the plan and you can choose to “Analyze Actual Execution Plan”.
     
    [Screenshot: “Analyze Actual Execution Plan” context-menu option]

     

  3. Notice a new panel opens. Under the Scenarios tab you can see the operators with a significant difference between estimations and actual rows. In this case I’m focusing on the SEEK, and in the Finding Details (right-side) I can see a few possible reasons for that difference.
    For example, in 1) we see that the “(…) predicate for this operator depends on parameter @StartOrderDate. The compile-time value was unknown or different from the runtime value (…)”. Let’s investigate this one.

    [Screenshot: Showplan Analysis panel with the Scenarios tab and Finding Details]

     

  4. Clicking on the root node (SELECT), I can see its properties, namely information about parameters.
     
    [Screenshot: SELECT node properties showing compiled and runtime parameter values]

     
    There it is: compiled and runtime values are indeed different. This is a case of parameter sniffing hurting me, where a previously cached plan that was deemed good enough for the compiled parameter may not be good for other parameter values.

  5. Do they represent that much of a difference in performance? Given that we have both the compiled and runtime parameters, let’s use Plan Comparison to check the differences and similarities.
    • Plans are definitely the same (as expected).

       

      exec sp_executesql N'exec Sales.SalesFromDate @P1',N'@P1 datetime2(0)','2004-7-31 00:00:00'
      exec sp_executesql N'exec Sales.SalesFromDate @P1',N'@P1 datetime2(0)','2004-3-28 00:00:00'

       

      [Screenshot: plan comparison showing identical plan shapes]

       

  • Yes – the plan with the same compiled/runtime value works great (bottom plan), unlike the second execution (top plan). See CpuTime and ElapsedTime below, as well as the different number of Actual Rows:
     
    [Screenshot: plan comparison properties showing CpuTime, ElapsedTime and Actual Rows]

     

  • From this point on, you would troubleshoot this class of issues using some known strategies. Here are a few examples of how to deal with “bad” parameter sniffing by changing the stored procedure:

     

    --Fix 1 - RECOMPILE
    ALTER PROCEDURE Sales.SalesFromDate (@StartOrderdate datetime) AS
    SELECT * FROM Sales.SalesOrderHeaderBulk AS h
    INNER JOIN Sales.SalesOrderDetailBulk AS d ON h.SalesOrderID = d.SalesOrderID
    WHERE (h.OrderDate >= @StartOrderdate)
    OPTION (RECOMPILE)
    GO
    --Fix 2 - OPTIMIZE FOR
    ALTER PROCEDURE Sales.SalesFromDate (@StartOrderdate datetime) AS
    SELECT * FROM Sales.SalesOrderHeaderBulk AS h
    INNER JOIN Sales.SalesOrderDetailBulk AS d ON h.SalesOrderID = d.SalesOrderID
    WHERE (h.OrderDate >= @StartOrderdate)
    --OPTION (OPTIMIZE FOR(@StartOrderDate = 'xxxx'))
    OPTION (OPTIMIZE FOR UNKNOWN)
    GO
    --Fix 3 - local variable
    ALTER PROCEDURE Sales.SalesFromDate (@StartOrderdate datetime) AS
    DECLARE @date datetime
    SELECT @date=@StartOrderDate
    SELECT * FROM Sales.SalesOrderHeaderBulk AS h
    INNER JOIN Sales.SalesOrderDetailBulk AS d ON h.SalesOrderID = d.SalesOrderID
    WHERE (h.OrderDate >= @date)
    GO
  • Or even turn off parameter sniffing at the database level if most of your workload has this class of issues:

    ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;
    GO

2. Plan comparison between two previously saved plans

Now for a scenario where I already have a couple of rather complex query plans to compare. One I know works well, the other has perceived bad performance. Let’s use plan comparison to check the differences and similarities.

  1. Plans are definitely different (zooming out to see the overall plan shape). Notice the position of the highlighted Clustered Index Scan on the PhoneNumberType table in the slow plan (top) versus the fast plan (bottom):

     [Screenshot: side-by-side comparison of the slow (top) and fast (bottom) plans]

     

    Use the Showplan Analysis panel (below) to navigate through the several matching operators and see where they sit in the plan:
    Tip: click on the operator line pattern, not the operator name.

     

    [Screenshot: Showplan Analysis panel highlighting matching operators]

     

    There are other differences as we explore both plans, such as the presence of Table Spools and a series of Nested Loops and Merge Joins in the slow plan, whereas the fast plan uses Hash Joins. Why the difference?

  2. Moving to the Scenarios tab, there’s an entry here, with some interesting information in Details:

     [Screenshot: Scenarios tab with Finding Details for the plan comparison]

     

  3. Now we know a few details that allow us to proceed:
    1) The difference lies primarily in the two plans using different CE versions.
    2) The slow plan uses TF 9481, which forces the CE model of SQL Server 2012 and earlier versions, irrespective of the compatibility level of the database.
    3) Further evidence is that estimations are very skewed in the slow plan (top).
  4. In this case, simply stop using the TF and you’re done (see the sketch below).
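For context, TF 9481 is typically attached per query via the QUERYTRACEON hint; a sketch of how the slow query may have carried it (the query itself is hypothetical – only the hint matters here):

-- TF 9481 forces the legacy (pre-2014) cardinality estimator for this query only.
SELECT c.CustomerID, SUM(o.TotalDue) AS total_due
FROM Sales.Customers AS c
JOIN Sales.Orders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID
OPTION (QUERYTRACEON 9481); -- remove the hint to return to the default CE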

3. Using Query Store

Same experience as above.

  1. For example, using the Top Resource Consuming Queries report, I see that a top consumer (query 15) has a couple of plans.
     

    [Screenshot: Top Resource Consuming Queries report in Query Store]

     

  2. Plan 15 consistently takes longer, plan 69 consistently less. So we can select them both (CTRL + click on each plan) and click on the Compare button.

     

    [Screenshot: selecting both plans and clicking the Compare button]

     

  3. The output is the same as before. Plan 69 is the good plan (now on top) and plan 15 is the bad plan (bottom). Same exercise, where I can see very different query plan shapes.

     

    [Screenshot: comparison of plan 69 (top) and plan 15 (bottom)]

     

  4. Moving to the Scenarios tab, we find the same entry, with the same interesting information in Details as we saw in the previous “saved plans” approach:

    [Screenshot: Scenarios tab for the Query Store plan comparison]

     

More scenarios will come in future SSMS releases, and while we have some ideas on what scenarios will follow, we welcome community feedback on what those scenarios should be. So please share your ideas with us, either by opening a Connect item (so others can also vote on it), using the contact form in the right section of this blog, via Twitter, or any other means you can reach us – feedback is always welcome!

Pedro Lopes (@sqlpto) – Program Manager

Azure Database Migration Service now available for preview


Azure Database Migration Service

Today at //BUILD, Microsoft announced a limited preview of the Azure Database Migration Service, which will streamline the process of migrating on-premises databases to Azure. This new service simplifies the migration of existing on-premises SQL Server, Oracle, and MySQL databases to Azure, whether your target database is Azure SQL Database, Azure SQL Database Managed Instance or Microsoft SQL Server in an Azure virtual machine.

The automated workflow, with assessment reporting, guides you through the necessary changes prior to performing the migration. When you are ready, the service will migrate the source database to Azure. For an opportunity to participate in the limited preview of this service, please sign up.

Compatibility and feature parity assessment, schema conversion and data migration are enabled through the limited preview for the scenarios below.

On-Premises Database | Target Database on Azure
---------------------|-------------------------
SQL Server           | Azure SQL Database; Azure SQL Database Managed Instance; SQL Server on Azure virtual machines
Oracle Database      | Azure SQL Database; Azure SQL Database Managed Instance; SQL Server on Azure virtual machines
MySQL                | Azure SQL Database; Azure SQL Database Managed Instance; SQL Server on Azure virtual machines

 

For more information about all the announcements we made today, get the full scoop in this //BUILD blog. You can also watch videos from the event and other on-demand content at the //BUILD website.

Ajay Jagannathan (@ajaymsft)

Principal Program Manager
