Posts From July, 2015

Predicates in WHERE clause or JOIN clause? 

By: Alex Podlesny
Inspired by an interview question.
Or what to do when you get iced
from all 360 degrees.
Part One - Having Fun 
How easy it is to move a condition from WHERE clause to a JOIN clause? The answer is - very easy. Some of us might say that is is esthetically better, some of us would say that semantically join is for joining while WHERE clause invented to introduce predicates. While others even argue that there exist a relational algebra rule that should allows interchangeability of the predicates.
All of those points are right in their own way. Lets try to move predicates around and see what surprising features we can uncover. 
 
As a starting point we will have two tables the Product and the OrderDetails and one important stored procedure GetOrderDetails that returns order details data enriched with a relevant product information.

CREATE TABLE Product (
  ProductID INT,
  Name VARCHAR(50)
);
GO

CREATE TABLE OrderDetails(
  OrderID INT,
  ProductID INT,
  NumberOfItems INT
);
GO

CREATE PROC GetOrderDetails
   @OrderID INT AS
BEGIN
  --??? logic goes here
END
GO

We populate our table with following data and start our experiment:

INSERT INTO Product (ProductID, Name)
VALUES
  (1,'TV'),
  (2,'Tablet'),
  (3, 'Phone'),
  (4, 'Dress')

INSERT INTO OrderDetails (OrderID, ProductID, NumberOfItems)
VALUES
  (1, 1, 1), --TV
  (2, 2, 1), --Tablet
  (3, 3, 1), --Phone
  (4, 4, 1), --Dress
  (4, 1, 1)  --TV

SELECT 'All Data in Product', * FROM Product
SELECT 'All Data in OrderDetails',* FROM OrderDetails

 
First four disappointing samples use the inner join.
By default JOIN stays for an INNER JOIN. The "inner" qualifier usually get omitted by developers, this is normal practice but the a better practice is to spell-out the complete join description so that your fellow developer, who is working at 2:00AM in the morning will not have to remember all cool intricacies of T-SQL. Here they are, all four stored procedures:
 

-- 1 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 1 - Product & OrderDetails - WHERE' AS Notes
  FROM Product AS p
    JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 2 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
  FROM Product AS p
    JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 3 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 2 - OrderDetails & Product - WHERE' AS Notes
  FROM OrderDetails AS od
    JOIN Product AS p
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

-- 4 ***************************************************
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'INNER - 2 - OrderDetails & Product - JOIN' AS Notes
  FROM OrderDetails AS od
    JOIN Product AS p
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO

all four procedures return the same expected result, it is even getting boring:

 
Lets sharpen our refactoring skills and use the left outer join.
The "outer" means (as we all know) that all rows from the left table not meeting the join condition are included in the result,  set is usually. Use of the "outer" qualifier is optional so that LEFT OUTER JOIN = LEFT JOIN, not going to mention about best practices this time...
So what will happen if we will use LEFT JOIN? Lets try...

ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 1 - Product & OrderDetails - WHERE' AS Notes
  FROM Product AS p
    LEFT JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO 

Nothing serious happens at first, we have the same result, thank you to predicate in WHERE clause: 
 
 
Should we move the predicate? Lets do it:
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 1 - Product & OrderDetails - JOIN' AS Notes
  FROM Product AS p
    LEFT JOIN OrderDetails AS od
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
Surprised? Don't be, because this query returns all 4 Product records, as intended by LEFT OUTER JOIN and replaces all those non-matching order details data with NULLs.
This result listed below is not what we really want...

 
Lets go back to predicate in the WHERE clause and flip tables around, for those very smart you can try RIGHT OUTER JOIN instead,
and see how it works:
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 2 - OrderDetails & Product - WHERE' AS Notes
  FROM OrderDetails AS od
    LEFT JOIN Product AS p
     ON p.ProductID = od.ProductID
  WHERE od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
With this change we are back to expected two (2) records for Order number 4

 
Let's have more fun, let's move it again !
 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
  SELECT
    od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
    'LEFT JOIN - 2 - OrderDetails & Product - JOIN' AS Notes
  FROM OrderDetails AS od
    LEFT JOIN Product AS p
     ON p.ProductID = od.ProductID
     AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
It is not a surprise any more, it is an introduced-unexpected-feature, now we are getting all the details for all orders, while product name is populated only for the Order #4.
This is very expensive feature in terms of performance impact and in terns of data exposure, new code will pull all records from OrderDetails table. Here is the result:
 
 
Part Two - Deal with it

As you can see, moving a given predicate from WHERE clause to a JOIN clause can dramatically alter expected result. Visually such a change is very hard to catch, especially when reviewing complex queries. Imagine if code will be also include cumbersome comments that migh lead reviewer to believe that behavior was actually intendedAn attempt to run such query against unreliable data in staging environment may not expose an error or expose it inconsistently, reducing testing efficiency and allowing such an ambiguous code to be released to production.

Here is how we can protect the user, help testing team, remove a burden from a reviewer and improve confidence in a code.

Let's cover the code with the Unit Test! Here is the code: 

USE DBTestDriven
GO
--************************************************************************************************
EXEC DBTD_DROP_PROC_IF_EXISTS 'UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder'
GO
--************************************************************************************************
CREATE PROC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder
   @v_Debug INT = 0 --by default do not return debud info
AS
BEGIN
   EXEC DBTD_UNIT_TEST 'OrdersBusinessLogic'
   EXEC DBTD_USE_TRANSACTION 'We are going to delete data from Product and
                              OrderDetails table and need transation to restore everything back'

   DELETE FROM Product --this is brutal
   DELETE FROM OrderDetails --this is brutal

   --add predictable set of data
   INSERT INTO Product ( ProductID, Name)
     VALUES (1,'TV'), (2,'Tablet'), (3, 'Phone'), (4, 'Dress')

   INSERT INTO OrderDetails ( OrderID, ProductID, NumberOfItems)
     VALUES 
      (1, 1, 1), --TV
      (2, 2, 1), --Tablet
      (3, 3, 1), --Phone
      (4, 4, 1), --Dress
      (4, 1, 2)  --TV

   CREATE TABLE #ExpectedResult(
     OrderID INT,
     ProductID INT,
     Name VARCHAR(50),
     NumberOfItems INT
   )

   CREATE TABLE #ActualResult(
     OrderID INT,
     ProductID INT,
     Name VARCHAR(50),
     NumberOfItems INT,
     Notes VARCHAR(250)
   )

   --set expectations
   INSERT INTO #ExpectedResult
     (OrderID, ProductID, Name, NumberOfItems)
     VALUES (4, 4, 'Dress', 1),
            (4, 1, 'TV',    2)

   --Run business logic that we are trying to verity.
   INSERT INTO #ActualResult
     EXEC GetOrderDetails @OrderID = 4

   --Check expectations
   IF @v_Debug != 0
   BEGIN
     --get debug info before running assert
     SELECT 'Product', * FROM Product
     SELECT 'OrderDetails',* FROM OrderDetails
     SELECT '#ActualResult', * FROM #ActualResult
     SELECT '#ExpectedResult',* FROM #ExpectedResult
   END

   EXEC DBTD_ASSERT_ROWSETS_ARE_EQUAL
     '','','#ActualResult',
     '','','#ExpectedResult',
     'USE EXPECTED COLUMNS',
     'something is wrong with GetOrderDetails procedure'
END
GO 

Run Unit test manually 
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
 
when running our code against last version of the procedure that user LEFT OUTER JOIN and predicate in the join clause we get following error: #ActualResult and #ExpectedResult recordsets should have same number of rows. something is wrong with GetOrderDetails procedure. 

 
you would know why Expected and Actual result is different after examining the table output listed below
 
Now lets use one of the earlier created versions of the GetOrderDetails procedure 
ALTER PROC GetOrderDetails (@OrderID INT) AS
BEGIN
   SELECT
     od.OrderID, od.ProductID, p.Name, od.NumberOfItems,
     'INNER - 1 - Product & OrderDetails - JOIN' AS Notes
   FROM Product AS p
     INNER JOIN OrderDetails AS od 
      ON p.ProductID = od.ProductID
      AND od.OrderID = @OrderID
END
GO
EXEC GetOrderDetails @OrderID = 4
GO
 
and rerun the unit test 
BEGIN TRAN
EXEC UT_OrdersBusinessLogic_ReturnsOrderDetailsForAGivenOrder @v_Debug = 1
ROLLBACK
 
as you can see below - everything is fine, we got our expected records, code is safe

 
 
 
Conclusion 
Wake up when you doing code review or answering an interview question, do not assume - read the code.
Or better - cover you code with unit tests.  
 
 

See Also

Thursday, July 16, 2015 2:24:00 PM
  • RSS
  • Add To My MSN
  • Add To Windows Live
  • Add To My Yahoo
  • Add To Google

Statistics

  • Entries (13)
  • Comments (0)