Thursday, December 27, 2007

T-SQL hacks, getting Nth highest, top N and paging

Its time for T-SQL

Some of the common scenario for data retrieval could be (without using TOP keyword remember oracle doesn't have TOP)
  • Getting Nth highest/lowest value column
  • Getting top Nth highest/lowest value columns
  • For paging purpose, getting X row for page n i.e. getting column 11 to 20)
Here is some of my hecks for obtaining above results. I have used NorthWind orders table for this purpose.

For bullet point 1: Getting Nth highest, here is the t-sql
select * from orders a
where N-1 =
(select count(orderID) from orders b where b.orderid > a.orderid)

trick here is self join. Notice closely here what I am doing.
  1. For each row of inner table (alias b) get count of columns from the same table i.e. self join, who has value higher than current row of outer table.
  2. Use common math like this: for any column, in outer table, to be Nth highest, it must be lower than N-1 columns of inner table.
  3. Select that row. It is your Nth highest row
Apply same logistic and you get N highest/lowest row like this
--TOP N highest
select * from orders a where 10 >
(select count(orderID) from orders b where b.orderid > a.orderid) order by orderid desc

--TOP N lowest
select * from orders a where 10 >
(select count(orderID) from orders b where b.orderid < a.orderid)

The only difference between last two queries are inner query condition. By same methodology mentioned above, first query is looking for number of columns having higher value, second one looks for number of column having lower value.

Well well well....
so what is the practical use of all this crap then? Hmmmm....
May be the answer is bullet point number 3 : For paging purpose, getting X row for page n i.e. getting column 11 to 20)

Now I am applying above logic in two fold to get rows for paging .

My input here would be page size and index of the page (1,2,3 base).
Logic here is to get 11 to 20 row, first get 1 to 20 and then filter out 1 to 10

Here is the query
begin
declare @page_size int, @page_number int

set @page_size = 10
set @page_number = 3

select * from orders a
where @page_size*@page_number >
(select count(orderID) from orders b where b.orderid > a.orderid)
and not exists
(select c.orderid from orders c
where @page_size*(@page_number-1) >
(select count(orderID) from orders d where c.orderid < d.orderid)
and a.orderid = c.orderid
)
order by orderid desc
end

*********************************************************************
Again, I know some of the results above can be obtained easily with sql 2k5 (i haven't spoiled my hand with this yet though) and up,this exercise is to strengthen understanding of how select gets executed. The whole logic is to closely understand execution of query and thereby manipulating it to get desired result. Alternatively it can be helpful as interview question for either side of the table :)

No comments: