分享

Correlated Joins Using "Apply"

 waywin 2008-06-18

Correlated Joins Using "Apply"

By Gregory Jackson, 2008/06/16

Correlated Joins Using ‘Apply‘

One of my cohorts came to me a while back with a SQL Query challenge. He was creating a report for our Live Voice call center application in which he needed to list all Live Voice Transactions within a given date range (easy stuff so far).

The business rules, however, dictate that a Live Voice transaction can result in 1 to many Live Voice calls actually being made. The call center rep will continue making phone calls until they have either reached the customer in question or until it is determined that the call cannot be completed (bad phone number, etc).

This report was required to only show the 3 most recent calls made for each transaction and for each of the calls, it was required to display the call‘s disposition (busy, voice mail, no answer, etc). Hmmm.....that‘s a little tougher.

Take the following:

--INNER JOIN
SELECT
lvt.ID,
lvt.PhoneNumber,
lvt.CreateDate,
lvc.CallWindowStart,
lvc.CallWindowEnd,
lvc.LVCallDispositionID
FROM
LVTransaction lvt
INNER JOIN
LVCall lvc on lvc.LVTransactionID = lvt.ID
Order By
Lvt.CreateDate,lvc.CreateDate DESC

This example gives me All LVCall records for each LVTransaction. Some LVTransactions have up to a dozen calls made before a customer is reached or before it‘s determined that the call cannot be successfully completed. So, this query is close, but not yet what we need.

Any attempts to limit the joined records with correlated logic results in errors due to the fact that the outer query columns are not available to the inner query (out of scope).

Correlated Join?:

--Correlated INNER JOIN Attempt
SELECT
lvt.ID,
lvt.PhoneNumber,
lvt.CreateDate,
lvc.CallWindowStart,
lvc.CallWindowEnd,
lvc.LVCallDispositionID
FROM
LVTransaction lvt
INNER JOIN
(SELECT TOP 3 * FROM LVCall WHERE LVTransactionID = lvt.ID ORDER BY CreateDate DESC) lvc on lvc.LVTransactionID = lvt.ID

This frugal attempt yields the following...


Msg 4104, Level 16, State 1, Line 2
The multi-part identifier "lvt.ID" could not be bound.

The plot thickens...

So I muddled around with a few ideas, attempting various flavors of correlated subqueries, using temp tables, using User Defined functions, etc. and I was not able to come up with a solution that I felt adequately met the requirements.

This seemingly simple problem was turning into a fairly painful ordeal.....welcome to my world.

After searching the web for a bit, I finally came up with a workable solution. The APPLY SQL extension introduced with the release of SQL Server 2005. The APPLY statement can be used in your SQL FROM Clause and allows you to call a table returning function for each row of the outer query. Furthermore (and more importantly for our example), it allows you to pass in outer query columns as arguments to the called function.

The apply statement comes in 2 flavors: CROSS APPLY and OUTER APPLY. CROSS APPLY will return All records in the outer query that have a matching record returned by the inner function\query (Similar to an Inner Join). OUTER APPLY will return all records in the outer query whether they have a matching record in the inner function\query or not (Similar to an Outer Join).

In researching the APPLY statement for this article, I came across the following article by Bill Graziano. using CROSS APPLY in SQL Server 2005. This excellent article discusses a very similar scenario to mine and presents the following solution (Using CROSS APPLY):

CROSS APPLY On A UDF:

SELECT C.CustomerID,
O.SalesOrderID,
O.TotalDue
FROM
AdventureWorks.Sales.Customer AS C
CROSS APPLY
AdventureWorks.dbo.fn_GetTopOrders(C.CustomerID, 3) AS O
ORDER BY
CustomerID ASC, TotalDue DESC

Notice that Bill is passing in a "3" to the function to indicate that he wants the 3 TOP orders for each customer (Nearly identical to what we need).

The function used above is defined as follows:

CREATE FUNCTION dbo.fn_GetTopOrders(@custid AS int, @n AS INT)
RETURNS TABLE
AS
RETURN
SELECT TOP(@n) *
FROM Sales.SalesOrderHeader
WHERE CustomerID = @custid
ORDER BY TotalDue DESC
GO

So, although this example gives a great example of using CROSS APPLY, it relies on the use of a UDF, which will work great, but I‘d like to avoid it if possible. When devising my solution, I realized that a subquery behaves very much like a function in that it essentially returns a set of records to the caller (outer query) so I thought to myself that we should be able to use the OUTER APPLY statement on a subquery......like this:

OUTER APPLY against a SubQuery:

--OUTER APPLY
SELECT
lvt.ID,
lvt.PhoneNumber,
lvt.CreateDate,
lvc.CallWindowStart,
lvc.CallWindowEnd,
lvc.LVCallDispositionID
FROM
LVTransaction lvt
OUTER APPLY
(
SELECT top 3 *
FROM LVCall
WHERE lvtransactionID = lvt.ID
Order By CreateDate DESC
) as lvc

VOILA !!!!

Notice, my solution does the same thing that Bill‘s solution does but it doesn‘t require a UDF so it‘s a little easier to implement and to read, etc.

This is the pattern we‘ve now been using at my work to implement these types of queries. It‘s been easy to teach to other engineers on the team, it works well and it doesn‘t require creating separate UDF‘s for implementation.

In reading Bill‘s article, I also noticed a footnote showing another solution that someone else had sent to him that uses ROW_NUMBER() and PARTITIONING. This was actually very interesting to me. I have to admit that I‘d never even heard of these features until coming across the article (I guess we learn something every day).

 

Here‘s the solution to my problem using this technique:

ROW_Number() and Partitioning:
--ROW_NUMBER
SELECT
ID AS LVTransactionID,
PhoneNumber,
CreateDate,
CallWindowStart,
CallWindowEnd,
LVCallDispositionID
FROM
(
SELECT
lvt.ID,
lvt.PhoneNumber,
lvt.CreateDate,
lvc.CallWindowStart,
lvc.CallWindowEnd,
lvc.LVCallDispositionID
FROM
LVTransaction lvt
INNER JOIN
(
SELECT *, ROW_Num = ROW_NUMBER() OVER
(PARTITION BY LVCall.LVTransactionID
ORDER BY LVCall.CreateDate Desc)
FROM LVCall
) lvc on lvc.LVTransactionID = lvt.ID AND lvc.ROW_NUM <= 3
) as result

This interesting solution provides the exact same resultset as Bill‘s original example and as my eventual solution. It seems a little more complicated to me, but I have to admit, the subtree cost displayed in the Estimated Query Plan suggests that it would perform better than my solution. I plan on investigating this approach further when I have time.

I‘m certain that there are other solutions to this problem. My intent however, with this article is to introduce some T-SQL extensions newly introduced in SQL Server 2005 that you may not have yet come across and to provide a real world example on how their use can solve a real business problem. As other solutions to this same problem come up, I would love to see them. I am concerned with performance as well as ease of application due to the fact that the techniques used must be dispersed throughout our organization so that we are consistent in our implementation with these sorts of problems

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多