The conventional wisdom in T-SQL programming often advises against using cursors in SQL Server, citing performance concerns and potential issues. This advice stems from the understanding that cursors, which process data row by row, can lead to less efficient operations compared to set-based approaches. However, there are scenarios where the task at hand necessitates iterating through data one row at a time, such as when executing dynamic SQL statements stored within a table.
SQL Server provides multiple strategies for iterating over data, including:
- Cursors combined with
WHILEloops. WHILEloops with a static dataset.- Recursion, which is particularly suited for hierarchical data and will be discussed in a future post.
This blog post compares the first two methods and provides recipes for generating your own loops to run dynamic SQL. For more comprehensive writings on the debate of which looping method to use, a quick internet search will yield numerous results where this is discussed.
While loops are generally praised for their low overhead and simplicity, potentially offering a performance edge, my experiments yielded some unexpected results.
One technique I use when performing a WHILE loop using a table-based approach involves leveraging the @@ROWCOUNT system variable, which returns the number of records affected by the preceding statement. This variable can be particularly useful in iterations with a static dataset, providing a straightforward mechanism to control loop execution. You can check out my usage of the @@ROWCOUNT in my code below.
I also want to add here that one method I employ when executing dynamic SQL is storing the SQL statement within a table and then utilizing the STRING_AGG and REPLACE functions to assemble the final SQL statement for execution. This approach streamlines the construction of SQL statements, eliminating the complexity of manually concatenating lines and managing variables within the string, which often makes reading the SQL statement difficult. I have also added a code block below to show my preferred method of executing dynamic SQL.
Contrary to common expectations, my testing revealed that cursors performed better in my specific use case. I constructed an example where I created a numbers table and iterated through it to update a column. The operation with the cursor routinely completed in approximately 9 seconds, whereas the WHILE loop using the @@ROWCOUNT with a static dataset took about 17 seconds to finish.
This finding underscores the importance of not dismissing cursors outright in every situation. While it’s true that set-based operations are generally more efficient in SQL Server, certain scenarios, particularly those involving dynamic SQL stored in tables or specific row-by-row manipulations, might benefit from the targeted use of cursors. The key takeaway is to evaluate each case on its own merits, considering the specific requirements and testing various approaches to identify the most effective solution.
Below uses a cursor to loop through the Numbers dataset.
--10,000 records takes 9 seconds
SET NOCOUNT ON;
DROP TABLE IF EXISTS #Numbers;
GO
SELECT VALUE, CAST(NULL AS INTEGER) AS UpdateColumn
INTO #Numbers
FROM GENERATE_SERIES(1,10000);
GO
DECLARE @vCurrentNumber INTEGER;
-- Declare the cursor
DECLARE cur CURSOR FOR SELECT Value FROM #Numbers;
-- Open the cursor
OPEN cur;
-- Fetch the first row
FETCH NEXT FROM cur INTO @vCurrentNumber;
-- Loop through the rows
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE #Numbers
SET UpdateColumn = Value + 1
WHERE Value = @vCurrentNumber;
-- Fetch the next row
FETCH NEXT FROM cur INTO @vCurrentNumber;
END
-- Close and deallocate the cursor
CLOSE cur;
DEALLOCATE cur;
SELECT * FROM #Numbers;
GO
Here is the code to use a WHILE loop using the @@ROWCOUNT system variable.
--10,000 records takes 17 seconds
SET NOCOUNT ON;
GO
DROP TABLE IF EXISTS #Numbers;
GO
SELECT VALUE, CAST(NULL AS INTEGER) AS UpdateColumn
INTO #Numbers
FROM GENERATE_SERIES(1,10000);
--FROM GENERATE_SERIES(1,10);
GO
DECLARE @vCurrentNumber INTEGER;
SELECT TOP 1
@vCurrentNumber = Value
FROM #Numbers;
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE #Numbers
SET UpdateColumn = Value + 1
WHERE Value = @vCurrentNumber;
SELECT TOP 1
@vCurrentNumber = Value
FROM #Numbers
WHERE UpdateColumn IS NULL;
END;
GO
SELECT * FROM #Numbers;
GO
Here is the code I use if I need to run dynamic SQL statements. Feel free to add this code to your SQL recipe book.
SET NOCOUNT ON;
GO
DROP TABLE IF EXISTS #DataProfiling;
DROP TABLE IF EXISTS #DataProfilingSQL;
GO
CREATE TABLE #DataProfilingSQL
(
DataProfilingType INTEGER NOT NULL,
OrderID INTEGER NOT NULL,
SQLLine VARCHAR(100) NOT NULL,
PRIMARY KEY (DataProfilingType, OrderID)
);
GO
-----------------------------------------------------------------
-----------------------------------------------------------------
DECLARE @vSchemaName NVARCHAR(100) = 'Your Schema Name Here';---------------Need to set
DECLARE @vTableName NVARCHAR(100) = 'Your Table Name Here'; ----------------Need to set
----------------------------------------------------------------------------------------
--This SQL statement determines the record count of the column
INSERT INTO #DataProfilingSQL (DataProfilingType, OrderID, SQLLine) VALUES
(1,1,'UPDATE #DataProfiling SET RecordCount ='),
(1,2,'('),
(1,3,'SELECT COUNT([ColumnName])'),
(1,4,'FROM SchemaName.TableName'),
(1,5,')'),
(1,6,'WHERE RowNumber = vRowNumber');
----------------------------------------------------------------------------------------
--This SQL statement determines the count of empty strings in a column
INSERT INTO #DataProfilingSQL (DataProfilingType, OrderID, SQLLine) VALUES
(2,1,'UPDATE #DataProfiling SET RecordCount ='),
(2,2,'('),
(2,3,'SELECT COUNT([ColumnName])'),
(2,4,'FROM SchemaName.TableName'),
(2,5,'WHERE [ColumnName] = '''''),
(2,6,')'),
(2,7,'WHERE RowNumber = vRowNumber');
----------------------------------------------------------------------------------------
--This SQL statement determines the count of NULL markers
INSERT INTO #DataProfilingSQL (DataProfilingType, OrderID, SQLLine) VALUES
(3,1,'UPDATE #DataProfiling SET RecordCount ='),
(3,2,'('),
(3,3,'SELECT COUNT([ColumnName])'),
(3,4,'FROM SchemaName.TableName'),
(3,5,'WHERE [ColumnName] IS NULL'),
(3,6,')'),
(3,7,'WHERE RowNumber = vRowNumber');
----------------------------------------------------------------------------------------
--This SQL statement determines the count of records that have two spaces
INSERT INTO #DataProfilingSQL (DataProfilingType, OrderID, SQLLine) VALUES
(4,1,'UPDATE #DataProfiling SET RecordCount ='),
(4,2,'('),
(4,3,'SELECT COUNT([ColumnName])'),
(4,4,'FROM SchemaName.TableName'),
(4,5,'WHERE [ColumnName] LIKE ''% %'''),
(4,6,')'),
(4,7,'WHERE RowNumber = vRowNumber');
--This SQL statement determines the count of records that have two "demo" or "test" in the string
INSERT INTO #DataProfilingSQL (DataProfilingType, OrderID, SQLLine) VALUES
(5,1,'UPDATE #DataProfiling SET RecordCount ='),
(5,2,'('),
(5,3,'SELECT COUNT([ColumnName])'),
(5,4,'FROM SchemaName.TableName'),
(5,5,'WHERE [ColumnName] LIKE ''%demo%'''),
(5,6,' OR [ColumnName] LIKE ''%test%'''),
(5,7,')'),
(5,8,'WHERE RowNumber = vRowNumber');
--From the above #DataProfilingSQL table, choose which statement to run. The default is 1.
DECLARE @vSQLStatement NVARCHAR(1000) = (SELECT STRING_AGG(SQLLine,' ') FROM #DataProfilingSQL WHERE DataProfilingType = 1); --default is 1
PRINT @vSQLStatement;
----------------------------------------------------------------------------------------
SET @vSQLStatement = REPLACE(@vSQLStatement,'SchemaName',@vSchemaName);
SET @vSQLStatement = REPLACE(@vSQLStatement,'TableName',@vTableName);
SELECT ROW_NUMBER() OVER (ORDER BY t.[Name], c.[Name]) AS RowNumber,
@@SERVERNAME AS ServerName,
s.[Name] AS SchemaName,
t.[Name] AS TableName,
c.[Name] AS ColumnName,
ty.[Name] AS DataType,
REPLACE(@vSQLStatement,'ColumnName',c.[Name]) AS SQLStatement,
CAST(NULL AS BIGINT) AS RecordCount
INTO #DataProfiling
FROM sys.Schemas s LEFT OUTER JOIN
sys.Tables t ON s.Schema_id = t.Schema_id INNER JOIN
sys.Columns c ON t.Object_id = c.Object_id INNER JOIN
sys.Types ty ON ty.User_Type_ID = c.User_Type_ID
WHERE 1=1 AND
s.[Name] = @vSchemaName AND
t.[Name] = @vTableName
AND ty.Name NOT IN ('XML','uniqueidentifier')-----------Modify as needed
ORDER BY 1,2,3,4,5;
GO
-----------------------------------------------------------------
-----------------------------------------------------------------
DECLARE @vRowNumber INTEGER;
DECLARE @vSQLStatement VARCHAR(8000);
DECLARE mycursor CURSOR FOR (SELECT RowNumber, SQLStatement FROM #DataProfiling);
OPEN mycursor;
FETCH NEXT FROM mycursor INTO @vRowNumber, @vSQLStatement;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @vSQLStatement = REPLACE(@vSQLStatement,'vRowNumber',@vRowNumber)
EXEC (@vSQLStatement);
FETCH NEXT FROM mycursor INTO @vRowNumber, @vSQLStatement;
END;
CLOSE mycursor;
DEALLOCATE mycursor;
SELECT * FROM #DataProfiling ORDER BY 1;
Happy coding!
