By: Alex Podlesny
Inspired by an interview question.
Or what to do when you get iced
from all 360 degrees.
Part One - Having Fun
How easy it is to move a condition from WHERE clause to a JOIN clause? The answer is - very easy. Some of us might say that is is esthetically better, some of us would say that semantically join is for joining while WHERE clause invented to introduce predicates. While others even argue that there exist a relational algebra rule that should allows interchangeability of the predicates.
All of those points are right in their own way. Lets try to move predicates around and see what surprising features we can uncover.
As a starting point we will have two tables the Product and the OrderDetails and one important stored procedure GetOrderDetails that returns order details data enriched with a relevant product information.
CREATE TABLE Product (
ProductID INT,
Name VARCHAR(50)
);
GO
CREATE TABLE OrderDetails(
OrderID INT,
ProductID INT,
NumberOfItems INT
);
GO
CREATE PROC GetOrderDetails
@OrderID INT AS
BEGIN
--??? logic goes here
END
GO
We populate our table with following data and start our experiment:
INSERT INTO Product (ProductID, Name)
VALUES
(1,'TV'),
(2,'Tablet'),
(3, 'Phone'),
(4, 'Dress')
INSERT INTO OrderDetails (OrderID, ProductID, NumberOfItems)
VALUES
(1, 1, 1), --TV
(2, 2, 1), --Tablet
(3, 3, 1), --Phone
(4, 4, 1), --Dress
(4, 1, 1) --TV
SELECT 'All Data in Product', * FROM Product
SELECT 'All Data in OrderDetails',* FROM OrderDetails
First four disappointing samples use the inner join.
By default JOIN stays for an INNER JOIN. The "inner" qualifier usually get omitted by developers, this is normal practice but the a better practice is to spell-out the complete join description so that your fellow developer, who is working at 2:00AM in the morning will not have to remember all cool intricacies of T-SQL. Here they are, all four stored procedures:
-- 1 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'INNER - 1 - Product & OrderDetails - WHERE' AS Notes
FROM Product AS p
JOIN OrderDetails AS od
ON p.ProductID = od.ProductID
WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
-- 2 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
FROM Product AS p
JOIN OrderDetails AS od
ON p.ProductID = od.ProductID
AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
-- 3 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'INNER - 2 - OrderDetails & Product - WHERE' AS Notes
FROM OrderDetails AS od
JOIN Product AS p
ON p.ProductID = od.ProductID
WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
-- 4 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'INNER - 2 - OrderDetails & Product - JOIN' AS Notes
FROM OrderDetails AS od
JOIN Product AS p
ON p.ProductID = od.ProductID
AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
all four procedures return the same expected result, it is even getting boring:
Lets sharpen our refactoring skills and use the left outer join.
The "outer" means (as we all know) that all rows from the left table not meeting the join condition are included in the result, set is usually. Use of the "outer" qualifier is optional so that LEFT OUTER JOIN = LEFT JOIN, not going to mention about best practices this time...
So what will happen if we will use LEFT JOIN? Lets try...
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'LEFT JOIN - 1 - Product & OrderDetails - WHERE' AS Notes
FROM Product AS p
LEFT JOIN OrderDetails AS od
ON p.ProductID = od.ProductID
WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
Nothing serious happens at first, we have the same result, thank you to predicate in WHERE clause:
Should we move the predicate? Lets do it:
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'LEFT JOIN - 1 - Product & OrderDetails - JOIN' AS Notes
FROM Product AS p
LEFT JOIN OrderDetails AS od
ON p.ProductID = od.ProductID
AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
Surprised? Don't be, because this query returns all 4 Product records, as intended by LEFT OUTER JOIN and replaces all those non-matching order details data with NULLs.
This result listed below is not what we really want...
Lets go back to predicate in the WHERE clause and flip tables around, for those very smart you can try RIGHT OUTER JOIN instead,
and see how it works:
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'LEFT JOIN - 2 - OrderDetails & Product - WHERE' AS Notes
FROM OrderDetails AS od
LEFT JOIN Product AS p
ON p.ProductID = od.ProductID
WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
With this change we are back to expected two (2) records for Order number 4
Let's have more fun, let's move it again !
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'LEFT JOIN - 2 - OrderDetails & Product - JOIN' AS Notes
FROM OrderDetails AS od
LEFT JOIN Product AS p
ON p.ProductID = od.ProductID
AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
It is not a surprise any more, it is an introduced-unexpected-feature, now we are getting all the details for all orders, while product name is populated only for the Order #4.
This is very expensive feature in terms of performance impact and in terns of data exposure, new code will pull all records from OrderDetails table. Here is the result:
Part Two - Deal with it
As you can see, moving a given predicate from WHERE clause to a JOIN clause can dramatically alter expected result. Visually such a change is very hard to catch, especially when reviewing complex queries. Imagine if code will be also include cumbersome comments that migh lead reviewer to believe that behavior was actually intended. An attempt to run such query against unreliable data in staging environment may not expose an error or expose it inconsistently, reducing testing efficiency and allowing such an ambiguous code to be released to production.
Here is how we can protect the user, help testing team, remove a burden from a reviewer and improve confidence in a code.
Let's cover the code with the Unit Test! Here is the code:
USE DBTestDriven
GO
--************************************************************************************************
EXEC DBTD_DROP_PROC_IF_EXISTS 'UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder'
GO
--************************************************************************************************
CREATE PROC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder
@v_Debug INT = 0 --by default do not return debud info
AS
BEGIN
EXEC DBTD_UNIT_TEST 'OrdersBusinessLogic'
EXEC DBTD_USE_TRANSACTION 'We are going to delete data from Product and
OrderDetails table and need transation to restore everything back'
DELETE FROM Product --this is brutal
DELETE FROM OrderDetails --this is brutal
--add predictable set of data
INSERT INTO Product ( ProductID, Name)
VALUES (1,'TV'), (2,'Tablet'), (3, 'Phone'), (4, 'Dress')
INSERT INTO OrderDetails ( OrderID, ProductID, NumberOfItems)
VALUES
(1, 1, 1), --TV
(2, 2, 1), --Tablet
(3, 3, 1), --Phone
(4, 4, 1), --Dress
(4, 1, 2) --TV
CREATE TABLE #ExpectedResult(
OrderID INT,
ProductID INT,
Name VARCHAR(50),
NumberOfItems INT
)
CREATE TABLE #ActualResult(
OrderID INT,
ProductID INT,
Name VARCHAR(50),
NumberOfItems INT,
Notes VARCHAR(250)
)
--set expectations
INSERT INTO #ExpectedResult
(OrderID, ProductID, Name, NumberOfItems)
VALUES (4, 4, 'Dress', 1),
(4, 1, 'TV', 2)
--Run business logic that we are trying to verity.
INSERT INTO #ActualResult
EXEC GetOrderDetails @OrderID = 4
--Check expectations
IF @v_Debug != 0
BEGIN
--get debug info before running assert
SELECT 'Product', * FROM Product
SELECT 'OrderDetails',* FROM OrderDetails
SELECT '#ActualResult', * FROM #ActualResult
SELECT '#ExpectedResult',* FROM #ExpectedResult
END
EXEC DBTD_ASSERT_ROWSETS_ARE_EQUAL
'','','#ActualResult',
'','','#ExpectedResult',
'USE EXPECTED COLUMNS',
'something is wrong with GetOrderDetails procedure'
END
GO
Run Unit test manually
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
when running our code against last version of the procedure that user LEFT OUTER JOIN and predicate in the join clause we get following error: #ActualResult and #ExpectedResult recordsets should have same number of rows. something is wrong with GetOrderDetails procedure.
you would know why Expected and Actual result is different after examining the table output listed below

Now lets use one of the earlier created versions of the GetOrderDetails procedure
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
SELECT
od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
FROM Product AS p
INNER JOIN OrderDetails AS od
ON p.ProductID = od.ProductID
AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
and rerun the unit test
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
as you can see below - everything is fine, we got our expected records, code is safe
Conclusion
Wake up when you doing code review or answering an interview question, do not assume - read the code.
Or better - cover you code with unit tests.
See Also