Partition switching needs to be smarter

A lot has been said already on how to do partition switching in SQL Server. I would suggest reading this blog post by Cathrine Wilhelmsen, which is my go-to whenever I forget the syntax for partition switching.

Partition switching needs to be smarter. Or at least, it needs to not give me an error that is a false negative.

See the example below. The SQL Server version is SQL Server 2017 CU15.

DROP TABLE IF EXISTS PartitionSwitchTarget;
DROP TABLE IF EXISTS PartitionSwitchSource;
GO
IF EXISTS ( SELECT 1
              FROM sys.partition_schemes
              WHERE name = N'PartitionSwitchScheme' )
BEGIN
  DROP PARTITION SCHEME PartitionSwitchScheme;
END;
GO
IF EXISTS ( SELECT 1
              FROM sys.partition_functions
              WHERE name = N'PartitionSwitchFunction' )
BEGIN
  DROP PARTITION FUNCTION PartitionSwitchFunction;
END;
GO

CREATE PARTITION FUNCTION PartitionSwitchFunction( bigint )
  AS RANGE RIGHT FOR VALUES( 1,
                             100,
                             1000 );
GO
CREATE PARTITION SCHEME PartitionSwitchScheme
  AS PARTITION PartitionSwitchFunction
  ALL TO ( [PRIMARY] );
GO

CREATE TABLE PartitionSwitchSource
(
  Id bigint
) ON PartitionSwitchScheme (Id);
GO
CREATE TABLE PartitionSwitchTarget
(
  Id bigint
);
GO


ALTER TABLE dbo.PartitionSwitchTarget
ADD CONSTRAINT MyCheckConstraint CHECK( Id >= 1
                                   AND Id <= 99
                                   AND Id IS NOT NULL );

GO
ALTER TABLE dbo.PartitionSwitchSource 
  SWITCH PARTITION 2 
  TO dbo.PartitionSwitchTarget

This is a RANGE RIGHT bigint partition function. This means that the left endpoints are included in each range and the right endpoints are excluded. When I run this on my SQL Server instance, I get this message:
Msg 4972, Level 16, State 1, Line 47
ALTER TABLE SWITCH statement failed. Check constraints or partition function of source table 'TestDb.dbo.PartitionSwitchSource' allows values that are not allowed by check constraints or partition function on target table 'TestDb.dbo.PartitionSwitchTarget'.

Since this is a bigint partition function, the “less than or equal to 99” predicate is logically equivalent to the “strictly less than 100” predicate. In fact, if I change the check constraint above to say “< 100”, the partition switch works without issue.
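
For example, something like this (a sketch against the demo objects above; the target table is still empty because the switch failed) lets the switch go through:

-- Recreate the check constraint with a strictly-less-than predicate
ALTER TABLE dbo.PartitionSwitchTarget
  DROP CONSTRAINT MyCheckConstraint;
GO
ALTER TABLE dbo.PartitionSwitchTarget
ADD CONSTRAINT MyCheckConstraint CHECK( Id >= 1
                                   AND Id < 100
                                   AND Id IS NOT NULL );
GO
-- The switch now succeeds
ALTER TABLE dbo.PartitionSwitchSource
  SWITCH PARTITION 2
  TO dbo.PartitionSwitchTarget;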

The SQL Server optimizer also has problems with these types of filters. This blog post by Paul White has a pretty good write up about it:

“The SQL Server 2008 (and later) optimizers do not quite get the internal logic right when an interval references, but excludes, a boundary value belonging to a different partition. The optimizer incorrectly thinks that multiple partitions will be accessed, and concludes that it cannot use the single-partition optimization for MIN and MAX aggregates.”

Indeed, it goes beyond MIN/MAX aggregates. Look at this query against the same partitioned table from above:

SELECT COUNT( * )
  FROM PartitionSwitchSource
  WHERE Id >= 1
    AND Id < 100;

The optimizer accesses two partitions:

[Screenshot: query plan showing 2 partitions accessed]

If I change the filter to “less than or equal to 99”, only 1 partition is accessed:

SELECT COUNT( * )
  FROM PartitionSwitchSource
  WHERE Id >= 1
    AND Id <= 99;

[Screenshot: query plan showing 1 partition accessed]

If you’re a developer like me and you are trying to write code to return the “appropriate” filter for a table that has partitions, it can be tricky. On the one hand, I apparently must use “strictly less than 100” so that I can do a partition switch later. On the other hand, I have to use “less than or equal to 99” so that SQL Server won’t access the partition to the right when it isn’t necessary to do so.

I have had success writing the filter like this:

  WHERE Id >= 1
    AND ( Id < 100
          OR Id <= 100 - 1 );

This filter in the check constraint allows for partition switching as desired. It also convinces the optimizer that it is OK to only look at 1 partition. For now, this may be the workaround.
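
To show the whole thing in one place, here is a sketch (again using the demo objects above) that puts the combined predicate into the check constraint, performs the switch, and then uses the same predicate as a query filter:

ALTER TABLE dbo.PartitionSwitchTarget
  DROP CONSTRAINT MyCheckConstraint;
GO
-- "Id < 100" satisfies the partition switch validation;
-- "Id <= 100 - 1" keeps the optimizer to a single partition
ALTER TABLE dbo.PartitionSwitchTarget
ADD CONSTRAINT MyCheckConstraint CHECK( Id >= 1
                                   AND ( Id < 100
                                         OR Id <= 100 - 1 )
                                   AND Id IS NOT NULL );
GO
ALTER TABLE dbo.PartitionSwitchSource
  SWITCH PARTITION 2
  TO dbo.PartitionSwitchTarget;
GO
-- The same combined predicate as a WHERE clause still touches only one partition
SELECT COUNT( * )
  FROM PartitionSwitchSource
  WHERE Id >= 1
    AND ( Id < 100
          OR Id <= 100 - 1 );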

Deadlock anatomy – transactionname

I want to have a series of blog posts where I talk about different parts of the deadlock XML report and when they might matter. There is not a lot of documentation about how to deal with the deadlock XML. There definitely are a few great tools out there for visualizing and diagnosing deadlocks. Even so, I don’t think there’s any one tool out there right now that handles every single case. I very frequently find myself looking at the XML when I need to diagnose and fix a deadlock.

Today I’ll talk briefly about the transactionname attribute. This is an attribute of the “process” element. Normally this doesn’t matter too much. Some typical descriptors you see here (that actually describe what’s happening):
DROPOBJ
SELECT
UPDATE
COND WITH QUERY
INSERT

Usually the query that follows in the execution stack uses that construct. This is why it doesn’t usually matter very much: you will understand the deadlock better by looking at the execution stack anyway. The execution stack may reveal that you are dealing with a DROP TABLE statement (transactionname=”DROPOBJ”) or an IF EXISTS ( SELECT […] ) construct (transactionname=”COND WITH QUERY”). The important thing is that the value accurately describes what the process is actually doing.

I have seen some interesting ones:
SQL Diagnostic Manager Collection Service
MdView

In both cases, the transactionname came from another vendor’s product. Once I realized this, the rest of the deadlock report was easy to understand. Be aware of the applications that are running on your system. If you are encountering some interesting deadlocks that don’t seem to make sense or are hard to diagnose, take a look at the transactionname and see if it makes sense in the context of the deadlock report.
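
If you want to scan recent deadlocks for unusual transaction names, here is a sketch that reads deadlock reports from the default system_health Extended Events session (assuming it is running and the deadlock is still in its ring buffer) and lists the transactionname and client application of each process:

;WITH DeadlockReports AS
(
  -- Pull every xml_deadlock_report event out of the system_health ring buffer
  SELECT XEventData.XEvent.query( '(data/value/deadlock)[1]' ) AS DeadlockGraph
    FROM ( SELECT CAST( st.target_data AS xml ) AS TargetData
             FROM sys.dm_xe_session_targets st
               INNER JOIN sys.dm_xe_sessions s
                 ON s.address = st.event_session_address
             WHERE s.name = N'system_health'
               AND st.target_name = N'ring_buffer' ) AS RingBuffer
      CROSS APPLY RingBuffer.TargetData.nodes(
        'RingBufferTarget/event[@name="xml_deadlock_report"]' ) AS XEventData ( XEvent )
)
-- List the transactionname and client application for each process in each deadlock
SELECT ProcessList.Process.value( '@id', 'nvarchar(50)' ) AS process_id,
       ProcessList.Process.value( '@transactionname', 'nvarchar(200)' ) AS transactionname,
       ProcessList.Process.value( '@clientapp', 'nvarchar(200)' ) AS client_app
  FROM DeadlockReports Reports
    CROSS APPLY Reports.DeadlockGraph.nodes( 'deadlock/process-list/process' ) AS ProcessList ( Process );

Depending on your setup, you may prefer to read the system_health event files instead; the idea is the same.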

TABLOCK is not always the table lock you were expecting

To take advantage of minimal logging and parallel insert (into heaps or clustered columnstore indexes), you have probably seen that you need to write your insert statements like this:

INSERT INTO DestTable WITH( TABLOCK )
( [Column list ... ] )
SELECT [...]

If you are moving a lot of data in batches across multiple sessions, you might expect that it is OK to run statements like this concurrently with each other because they will simply serialize: they all need a table-level exclusive (X) lock to run.

That is not the case. SQL Server will attempt to get an IX lock first, then convert that lock to a BU, or “bulk update” lock. This can cause problems when multiple statements targeting the same destination table run at the same time, even though the BU lock mode is compatible with itself.

If you are playing along at home, you can try this.
Create a target table:

DROP TABLE IF EXISTS dbo.BulkUpdateDestinationHeap;
CREATE TABLE dbo.BulkUpdateDestinationHeap
( Id bigint );
GO

In one session, get a lock on the table and hold it. We do this simply so we can set up other sessions to insert into the table.

BEGIN TRANSACTION;
SELECT Id
  FROM dbo.BulkUpdateDestinationHeap WITH ( TABLOCK, HOLDLOCK );

In two separate sessions, try to load some records into the table using the code pattern mentioned above:

INSERT INTO dbo.BulkUpdateDestinationHeap WITH ( TABLOCK )
( Id )
  SELECT TOP 1000000
         ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
    FROM master..spt_values t1
      CROSS JOIN master..spt_values t2;

In a fourth session, you can see what the existing locks and lock requests look like by looking at the sys.dm_tran_locks DMV. Here’s a query that will just return the relevant columns:

SELECT Locks.request_mode,
       Locks.resource_type,
       Tables.name resource_name,
       Locks.request_status,
       Locks.request_session_id
  FROM sys.dm_tran_locks Locks
    INNER JOIN sys.tables Tables
      ON Locks.resource_associated_entity_id = Tables.object_id
  WHERE Locks.resource_type <> 'DATABASE' -- ignore DATABASE lock
  ORDER BY Locks.request_session_id,
           Locks.request_mode;

This is what I see:
[Screenshot: sys.dm_tran_locks output showing the granted S lock and two waiting IX lock requests]

Everything seems OK for now. Since IX locks are not compatible with S locks, both IX lock requests are waiting. However, since an IX lock is compatible with itself, both lock requests are granted as soon as the transaction in the first session commits or rolls back. When that happens, we see these lock requests:
[Screenshot: sys.dm_tran_locks output showing both IX locks granted and two waiting BU conversion requests]

If you wait a few seconds, one of the sessions will eventually be chosen as a deadlock victim by the deadlock monitor and killed. This happens because each session needs to convert its IX lock to a BU lock to continue. BU locks are compatible with one another, but the BU lock mode is not compatible with IX. The BU lock request from session 53 is blocked by the currently held IX lock in session 63. Similarly, the BU lock request from session 63 is blocked by the currently held IX lock in session 53.

The XML deadlock report tells the same story in the resource list:

 <resource-list>
  <objectlock lockPartition="0"
              objid="658101385"
              subresource="FULL"
              dbid="5"
              objectname="LockDemos.dbo.BulkUpdateDestinationHeap"
              id="lock26db1093b00"
              mode="IX"
              associatedObjectId="658101385">
   <owner-list>
    <owner id="process26db55648c8"
           mode="IX" />
    <owner id="process26db55648c8"
           mode="BU"
           requestType="convert" />
   </owner-list>
   <waiter-list>
    <waiter id="process26dc4057468"
            mode="BU"
            requestType="convert" />
   </waiter-list>
  </objectlock>
  <objectlock lockPartition="0"
              objid="658101385"
              subresource="FULL"
              dbid="5"
              objectname="LockDemos.dbo.BulkUpdateDestinationHeap"
              id="lock26db1093b00"
              mode="IX"
              associatedObjectId="658101385">
   <owner-list>
    <owner id="process26dc4057468"
           mode="IX" />
    <owner id="process26dc4057468"
           mode="BU"
           requestType="convert" />
   </owner-list>
   <waiter-list>
    <waiter id="process26db55648c8"
            mode="BU"
            requestType="convert" />
   </waiter-list>
  </objectlock>
 </resource-list>

As for workarounds, depending on your workload you might consider using TABLOCKX instead of TABLOCK. When TABLOCKX is used, SQL Server requests and waits for an X lock on the object instead of an IX lock that it will convert to a BU lock. We don’t encounter the deadlock and both sessions can insert data into the target table. The sessions block each other and are forced to serialize, but there is no deadlock when we use TABLOCKX. You can run through the same demo above using TABLOCKX instead of TABLOCK to confirm.

We still get parallel insert when using TABLOCKX. This can be checked by looking at the query plan.
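
One way to check (a sketch, using the same demo table) is to capture the actual execution plan for the insert and look at whether the Table Insert operator itself runs in parallel:

-- Capture the actual plan for the TABLOCKX insert
SET STATISTICS XML ON;

INSERT INTO dbo.BulkUpdateDestinationHeap WITH ( TABLOCKX )
( Id )
  SELECT TOP 1000000
         ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) )
    FROM master..spt_values t1
      CROSS JOIN master..spt_values t2;

SET STATISTICS XML OFF;

In the plan XML, the RelOp for the Table Insert should have Parallel="true" when the insert itself went parallel; if only the SELECT side of the plan shows parallelism, the insert ran serially. Whether you actually get a parallel plan depends on the usual requirements (costing, the cost threshold for parallelism, and so on).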

I wanted to check how many rows were logged to the transaction log to make sure I was still getting minimal logging. The function fn_dblog is undocumented but I have seen people use it for this purpose. I saw exactly the same number of rows logged whether I was using TABLOCK or TABLOCKX. I get 6280 rows for both queries (most times, it does not seem to be deterministic) when using the simple recovery model and 9191 rows when using the full recovery model. This is evidence to me that we still get minimal logging too. Here’s the code I used to test that:

/*
Compare how many rows are written to the transaction log
when using TABLOCK v. using TABLOCKX
*/

DECLARE @MaxLsn nvarchar(25);

-- truncate the table
TRUNCATE TABLE dbo.BulkUpdateDestinationHeap;

-- get max LSN
SELECT @MaxLsn = N'0x' + MAX( [Current LSN] )
  FROM sys.fn_dblog( NULL, NULL );

-- use tablock to insert rows
INSERT INTO dbo.BulkUpdateDestinationHeap WITH ( TABLOCK )
( Id )
  SELECT TOP 1000000
         ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ))
    FROM master..spt_values t1
      CROSS JOIN master..spt_values t2;

-- find out how many rows that was
SELECT COUNT( * ),
       'Using TABLOCK'
  FROM fn_dblog( @MaxLsn, NULL );

-- truncate again
TRUNCATE TABLE dbo.BulkUpdateDestinationHeap;

-- get max LSN
SELECT @MaxLsn = N'0x' + MAX( [Current LSN] )
  FROM sys.fn_dblog( NULL, NULL );

-- use tablockx to insert rows
INSERT INTO dbo.BulkUpdateDestinationHeap WITH ( TABLOCKX )
( Id )
  SELECT TOP 1000000
         ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ))
    FROM master..spt_values t1
      CROSS JOIN master..spt_values t2;

-- find out how many rows that was
SELECT COUNT( * ),
       'Using TABLOCKX'
  FROM fn_dblog( @MaxLsn, NULL );

The lesson here isn’t to use TABLOCKX instead of TABLOCK. The lesson is that when you specify a “table lock”, you aren’t guaranteeing the lock mode; you are only specifying the resource that the locks will be taken on.