Try using a characteristic function for this. Details below (sorry for
any poor formatting).
HTH,
steve
[TeamPS]
Brage Mogstad wrote:
>
> This one is really freaking me out.
> I'm trying to "shirk" a dataset before its returned to a dwo.
>
> My select is a union:
>
> SELECT account,
> sum(USD) as period 1,
> 0.00 as period_2,
> FROM......
> WHERE pdate between '1998-01-01' and '1998-01-31'
> UNION ALL
> SELECT account,
> 0.00 as period 1,
> sum(USD) as period_2,
> FROM......
> WHERE pdate between '1998-02-01' and '1998-02-28'
>
> This gives me result in the dw:
>
> ACCOUNT PERIOD_1 PERIOD_2
>
> 1010 1000,- 0
> 1010 0 2500.-
> 1020 750,- 0
> 1020 0 340.-
> etc,etc.
>
> What I want returned to the dw is this:
>
> 1010 1000,- 2500,-
> 1020 750,- 340,-
> etc,etc
>
> I've got 12 periods.
> The net trafic: row * 12 = no god
>
> Any ideas??
>
> NB
> I can't use a
>
> Select sum(USD) as period_1
> (select sum(USD) from ...
> where pdate between '1998-02-01' and '1998-02-28' ) as period_2
> FROM..
> WHERE..
>
> because this kind of transaction table usually has 40-80 000 rows, and this
> one gets to slow....
Single-Statement SQL Statements for fast conditional data retrieval
Characteristic functions can be used to implement single-statement SQL
statements with conditional logic in the columns retrieved. Other
solutions to problems where characteristic
functions can typically be used are generally inefficient and difficult
to implement as a single SQL statement.
A characteristic function is one which returns one or zero (and
optionally NULL). It can be used in part of a computed expression in an
SQL SELECT statement to conditionally return one
value or another.
Examples of useful characteristic functions are given (using Transact
SQL), along with case studies.
A 'Classic' Example
Suppose you have sales data held in the following table:
Product
Month
Revenue
"Widget"
1
5500
"Widget"
2
7440
�
"Gadget"
1
8900
"Gadget"
2
7700
�
"Gadget"
12
10345
The view your users want to see however is a report of the form:
Product
Jan
Feb
�
Dec
"Widget"
5500
7440
�
"Gadget"
8900
7700
�
10345
This type of transformation of data is called pivoting.
First go
A conventional approach to this problem might be to build a report
table, and update the individual monthly totals in separate updates (see
exhibit A)
INSERT INTO report
SELECT product,
jan_rev = revenue,
feb_rev = 0,
�
dec_rev = 0
FROM sales
WHERE month = 1
UPDATE report
SET feb_rev = revenue
FROM sales
WHERE month = 2
�
UPDATE report
SET dec_rev = revenue
FROM sales
WHERE month = 12
This is not ideal on three counts:
1.This is certainly not a single SQL statement
2.The sales table is likely to be scanned sequentially twelve times
3.In order to ensure recovery and concurrency, transaction management
would need to be included
Second go
Another approach might be to implement a twelve-way join on the sales
table (see exhibit B).
SELECT s1.product,
jan_rev = s1.revenue,
feb_rev = s2.revenue,
�
dec_rev = s12.revenue
FROM sales s1, sales s2,�, sales s12
WHERE s1.product = s2.product
AND s2.product = s3.product
�
AND s11.product = s12.product
AND s1.month = 1
AND s2.month = 2
�
AND s12.month = 12
This is a single SQL statement, and hence concurrency is handled by the
DBMS. Unfortunately, as might be expected, this statement is not at all
efficient.
Third go
What is required is a method of returning one row for each product, and
conditionally calculating the monthly totals as we go. This can be
achieved by using characteristic functions.
Concentrating on the February revenue, we want to catch the revenue
figure for month = 2. Before illustrating the characteristic function
used to achieve this, we need some notation.
Define a characteristic function d[expression] as a function which
returns 1 if the expression is true, 0 if the expression is false (and
NULL if the expression is NULL). Hence in our
example, we want d[month=2] or more generally d[A=B]. This can be
achieved as follows:
d[A=B] = 1-abs(sign(A-B))
In our example this is written:
1-abs(sign(month-2))
Most characteristic functions working on numeric columns use the
built-in functions abs and sign. Sign(x) returns 1 if x is positive, -1
if x is negative and 0 if x is zero. Abs(x) returns the
absolute value of x. The above expression is evaluated as shown in the
following table:
month month-2 sign(month-2) abs(sign(month-2)) 1-abs(sign(month-2))
1 -1 -1 1 0
2 0 0 0 1
3 1 1 1 0
.....
12 10 1 1 0
If this expression is then multiplied by the actual revenue figure, then
we have almost solved the problem. All that is needed now is to add a
GROUP BY clause, to ensure that only one
row is retrieved for each product.
SELECT product,
jan_rev = sum(revenue*(1-abs(sign(month-1)))),
feb_rev = sum(revenue*(1-abs(sign(month-2)))),
�
dec_rev = sum(revenue*(1-abs(sign(month-12))))
FROM sales
GROUP BY product
Observe, this is a single SQL statement, hence concurrency is controlled
by the DBMS. It also only requires one table scan to produce the
results.
Other Characteristic Functions
Below is a list of characteristic expressions for commonly required
expressions for the d[] functions.
Numeric values
d[A=B] = 1-abs(sign(A-B))
d[A!=B] = abs(sign(A-B)) {NB = 1-d[A=B]}
d[A<B] = 1-sign(1+sign(A-B))
d[A<=B] = sign(1-sign(A-B))
d[A>B] = 1-abs(1-sign(A-B))
d[A>=B] = sign(1+sign(A-B))
d[A is NULL] = isnull(0*A,1)
d[A is NOT NULL] = 1-isnull(0*A,1)
d[NOT A] = 1-d[A]
d[A AND B] = d[A]*d[B] {NB where d[A] and d[B] are never NULL}
d[A OR B] = sign(d[A]+d[B]) {NB where d[A] and d[B] are never NULL}
String values
d[A=B] = charindex(A,B)*charindex(B,A)
d[A!=B] = 1-charindex(A,B)*charindex(B,A)
d[A is NULL] = isnull(0*ascii(A),1)
d[A is NOT NULL] = 1-isnull(0* ascii(A),1)
Date values
d[A=B] = 1-abs(sign(datediff(dd,B,A))
d[A!=B] = abs(sign(datediff(dd,B,A))
d[A<B] = 1-sign(1+sign(datediff(dd,B,A)))
d[A<=B] = sign(1-sign(datediff(dd,B,A)))
d[A>B] = 1-abs(1-sign(datediff(dd,B,A)))
d[A>=B] = sign(1+sign(datediff(dd,B,A)))
d[A is NULL] = isnull(0*datepart(yy,A),1)
d[A is NOT NULL] = 1-isnull(0* datepart(yy,A),1)
A Case Study
Deal information was held in the following format:
id
Description
Rejectdate
1
Deal A
2-Jan-1996
�
783
Deal MMH
5-Jan-1996
784
Deal IHJ
8-Jan-1996
The requirement was, given a particular day, display all rejected deals
in terms of the number of working days that have elapsed since that deal
was rejected. Group all deals which have
been rejected for more than 5 days together. Hence the resulting report
for 9-Jan-1996 should look something like this (NB we assume here that
"6 days rejected" can easily be interpreted
as "more than 5 days�"):
Deal
Name
Days Rejected
1
Deal A
5
783
Deal MMH
2
784
Deal IHJ
1
There are four steps to solving this problem.
Step 1
Calculate the number of days rejected (Store in rj)
rj
=
datediff(dw,GetDate(),rejectdate)
Step 2
For over 1 week old (8 days or more), return 8, else return rj (Store in
rj2)
rj2
=
rj*d[8>=rj]-8*d[rj>8]
=
rj*sign(1+sign(8-rj))-8*(1-sign(1-sign(rj-8)))
Step 3
Calculate the day of the week now (Store in wk)
wk = datepart (dw, GetDate())
Step 4
For rj2> wk, subtract 2 (to remove the weekend) else return rj2 (Store
in rj3)
rj3
=
rj2 + 2*d[rj2>=wk]
=
rj2 + 2*sign(1+sign(rj2-wk))
The Final Expression
The final expression is shown below, and can be built up by replacing
the variable declarations above with their corresponding expression from
a previous step.
d[datediff (dw, getdate(), rejectdate)]*sign(1+sign(8-
d[datediff (dw, getdate(), rejectdate)]))-8*(1-sign(1-sign
(d[datediff (dw, getdate(), rejectdate)]-8))) + 2*sign(1+sign
(d[datediff (dw, getdate(), rejectdate)]*sign(1+sign(8-
d[datediff (dw, getdate(), rejectdate)]))-8*(1-sign(1-sign
(d[datediff (dw, getdate(), rejectdate)]-8)))
- datepart (dw, GetDate())))
NULL Values - a warning!
The expression for d[A=B] above where a data value is missing would
return a zero. It would perhaps make more sense in such cases for the
function to return NULL. One way to achieve
this is to use a feature of Sybase (pre System 10) whereby 1/0 is NULL.
In addition if d[A] or d[B] can take NULL values, then the expressions
d[A AND B] and d[A OR B] need some
modification. Hence the characteristic functions become:
d[A=B] = 1/(1-abs(sign(A-B)))
d[A AND B] = isnull(d[A]*d[B],0/(1-isnull(d[A],1)* isnull(d[B],1)))
d[A OR B] = isnull(sign(d[A]+d[B]),1/(isnull(d[A],0) +
isnull(d[B],0)))
In Sybase system 10 and above, 1/0 generates an error, and the syntax
becomes even more daunting!
d[A=B] = 1 +
ascii(char(convert(int,abs(sign(1-abs(sign(A-B)))))-1))
d[A AND B] = isnull(d[A]*d[B], ascii(char(convert(int,abs(sign(
1-isnull(d[A],1)* isnull(d[B],1))))-1)))
d[A OR B] = isnull(sign(d[A]+d[B]),
ascii(char(convert(int,abs(sign( isnull(d[A],0) + isnull(d[B],0))))-1)))
Generally, if one were assuming (pre System 10) that 0/n would return
NULL, this would need to be replaced with the expression:
ascii(char(convert(int,abs(sign(n)))-1))