I love complex sql queries.
Today I was working on generating a report that called for a join between two tables which looked as those below. The fist table (company) contained approx.80K records and the second (company_lookup) contained approx. 20K records.
.--------------------------------.
| Company |
+----+------------------+--------+
| Id | Name | Ticker |
+----+------------------+--------+
| 1 | Dummy company 1 | IBM |
| 2 | Dummy company 2 | 0002 |
| 3 | Dummy company 3 | 46ABD |
| 4 | Dummy company 4 | NAN |
| 5 | Dummy company 5 | BP |
| 6 | Dummy company 6 | BP |
| 7 | Dummy company 7 | MSFT |
| 8 | Dummy company 8 | E |
| 9 | Dummy company 9 | GOOG |
| 10 | Dummy company 10 | SUN |
| 11 | Dummy company 11 | 0046 |
‘----+------------------+--------’
.---------------------------.
| Company_Lookup |
+------------------+--------+
| Name | Ticker |
+------------------+--------+
| Dummy company 1 | IBM |
| Dummy company 2 | 2 |
| Dummy company 3 | 46ABD |
| Dummy company 4 | NAN |
| Dummy company 5 | BP |
| Dummy company 7 | MSFT |
| Dummy company 8 | E |
| Dummy company 9 | GOOG |
| Dummy company 11 | 46 |
‘------------------+--------’
As there can be multiple records with the same ticker, I tried writing a simple join on ticker and name.
select * from company a, company_lookup b
where a.ticker = b.ticker and
a.name = b.name
However, notice the following rows
.--------------------------------.
| Company |
+----+------------------+--------+
| Id | Name | Ticker |
+----+------------------+--------+
| 1 | Dummy company 1 | IBM |
| 2 | Dummy company 2 | 0002 | <-----
| 3 | Dummy company 3 | 46ABD |
| 4 | Dummy company 4 | NAN |
| 5 | Dummy company 5 | BP |
| 6 | Dummy company 6 | BP |
| 7 | Dummy company 7 | MSFT |
| 8 | Dummy company 8 | E |
| 9 | Dummy company 9 | GOOG |
| 10 | Dummy company 10 | SUN |
| 11 | Dummy company 11 | 0046 | <-----
‘----+------------------+--------’
.---------------------------.
| Company_Lookup |
+------------------+--------+
| Name | Ticker |
+------------------+--------+
| Dummy company 1 | IBM |
| Dummy company 2 | 2 | <-----
| Dummy company 3 | 46ABD |
| Dummy company 4 | NAN |
| Dummy company 5 | BP |
| Dummy company 7 | MSFT |
| Dummy company 8 | E |
| Dummy company 9 | GOOG |
| Dummy company 11 | 46 | <-----
‘------------------+--------’
Essentially, the ticker is same but is stored differently in the two tables (although the data type is varchar in both tables). A simple join doesn’t work as it fails to compare, for ex., ’2′ with ’0002′. I can’t blindly convert the column to numeric either since the column also stores alphanumeric values. After a lot of tinkering, I managed to write a rather ugly query using ‘isnumeric’ and ‘case’ clause.
select b.*, a.* from company a, company_lookup b
where
case isnumeric(a.ticker)
when 0 then a.ticker
when 1 then cast(cast(a.ticker as int) as varchar(10)) --stripping leading zeros, if any, by converting to int and then back to varchar
end = b.ticker
and
b.name = a.name
My intention was to do a numeric conversion only if the tickers are numeric. To my dismay, this query didn’t work either and kept giving me a ‘Can’t convert NaN to varchar’ error. Upon debugging further, it turned out that the isNumeric() function treats ‘NaN’, ‘E’, ’31E’ etc. as numeric. However, the cast/convert idn’t able to convert ‘NAN’ to a valid numeric value.
This issue has also been touched upon in the following stackoverflow query: http://stackoverflow.com/questions/570075/sql-isnumeric-returns-true-but-sql-reports-conversion-failed
So, finally I had to modify the query to filter out such values as follows.
select b.*, a.* from company a, company_lookup b
where
case a.ticker
when 'NAN' then 'NAN'
when 'E' then 'E'
else
case isnumeric(a.ticker)
when 0 then a.ticker
when 1 then cast(cast(a.ticker as int) as varchar(10))
end
end = b.ticker
and
b.name = a.name
P.S. – By the way, the above tables have been generated using the Text::ASCIITable perl module. Also, WordPress has a habit of converting double dashes to some special character so I had to edit the HTML of this blog entry to replace all dashes with ampersand hash045;