SQL Database

Identifying unused or inefficient indexes in SQL

SQL Server, like other relational databases, utilises indexing as a way to improve query performance. In a nutshell indexing is used to find data fast.

However, it’s important to remember that indexing comes at a cost, and as a database administrator, or developer, understanding how indexing actually works is fundamental to maintaining a performant system. While the benefits of indexing can be described in a simple manner, the way they work is everything but. This post explains one way to manage only one facet of indexing we need to consider.

Each index needs to be maintained by the system. This means that the more indexes you have on a table, the longer it takes to insert, update, or delete records.

Indexing can benefit your system, but also hurt it in terms of performance if used incorrectly.

Identifying unused or inefficient indexes.

As systems and query requirements can change over time, new indexes may have been created and old ones left unused. It’s important to periodically check on usage.

The way I prefer to identify unused or inefficient indexes is through the use of DMV’s (Dynamic management views) based on statistics collected by the SQL Data Engine.

The SQL Data Engine gathers this data from the last point your SQL Server Instance became active. So it’s always important to remember your statistics may be misleading if you’ve had a recent restart. The easiest way to check is through the following command;

-- Run this to see how long ago the DMV usage stats were cleared…
SELECT DATEDIFF (DAY, sd.crdate , GETDATE ()) AS days_history
FROM sys .sysdatabases sd
WHERE sd. [name] = 'tempdb' ;

As a general rule of thumb, the more days the better, but you should know your system well enough to understand when a true reflection of normal usage has been collected.

Once you know you have enough statistics data to make an informed decision run the following query which presents potentially inefficient non-clustered indexes (writes > reads) ;

/*
This sys.dm_db_index_usage_stats query filters by the current database, and only includes non-clustered indexes. It can help you decide whether the cost of
maintaining a particular index outweighs the benefit you are receiving from having it in place.
*/
SELECT OBJECT_NAME (ddius. [object_id]) AS [Table Name] ,
i .name AS [Index Name] ,
i .index_id ,
user_updates AS [Total Writes] ,
user_seeks + user_scans + user_lookups AS [Total Reads] ,
user_updates - ( user_seeks + user_scans + user_lookups )
AS [Difference]
FROM sys .dm_db_index_usage_stats AS ddius WITH ( NOLOCK )
INNER JOIN sys. indexes AS i WITH ( NOLOCK )
ON ddius. [object_id] = i .[object_id]
AND i. index_id = ddius .index_id
WHERE OBJECTPROPERTY (ddius. [object_id], 'IsUserTable') = 1
AND ddius. database_id = DB_ID()
AND user_updates > ( user_seeks + user_scans + user_lookups )
AND i. index_id > 1
ORDER BY [Difference] DESC ,
[Total Writes] DESC ,
[Total Reads] ASC ;

Sample results live system

So here’s a sample of my results. For each potential inefficient index I can see the number of writes (cost) and reads (benefits) for each index since the statistics collection started. This gives me an indicator as to what is potentially an inefficient index, but to decide you really need to understand your own system. I’ll take you through my thought process on those 5 indexes based on the context of my own system;

  • Index_ID 18: Large amount of Total Reads, I know this index serves a query for a popular system report. I will keep this index.
  • Index_ID 21: Small amount of Total Reads vs Total Writes. This figure would come under consideration, could there be a solution such as including a field from this index in another index, and then removing this one? Possibly, but further investigation required. The importance of knowing your own system is highlighted on this one. I know the query that utilises this serves a monthly report to senior management, which could explain the low reads, but the benefits of delivering this report with a large amount of data fast require it to remain, unless I find an alternative solution.
  • Index_ID 19: Keep, it’s obvious.
  • Index_ID 2: This is on a new table as part of a new development that we are about to release. We are already collecting the data, but the only reads will have come from our developers and test user group through testing and doesn’t reflect normal usage once it goes live.
  • Index_ID 20: This one is just costing us with no benefit and as we can see from the standard name (based on creation date which I prefer for these purposes) it’s not used and has existed for a long time. This will be removed to reduce cost on performance and storage.

The most important thing to remember is that although the statistics can display a straightforward right or wrong indicator, often the correct action lies in the fuzzy space in-between, where you must understand your own system and interpret the data.

The adjusted DMV query was taken from the book Performance Tuning with SQL Server Dynamic Management Views by Louis Davidson and Tim Ford. 

Performance tuning with SQL Server Dynamic Management Views

Leave a comment