For those who don’t know what OUTER-JOIN is, here are our much over used friends DEPT and EMP with an illustration.
DEPTNO DNAME ---------- ---------- 1 Dept1 2 Dept2 EMPNO DEPTNO ENAME ---------- ---------- ---------- 111 PlainJane
select d.*,e.empno,e.ename from dept d,emp e where d.deptno=e.deptno; DEPTNO DNAME EMPNO ENAME ---------- ---------- ---------- ---------- 1 Dept1 11 PlainJane
select d.*,e.empno,e.ename from dept d,emp e where d.deptno=e.deptno(+); DEPTNO DNAME EMPNO ENAME ---------- ---------- ---------- ---------- 1 Dept1 11 PlainJane 2 Dept2
Please notice that query1 does not return any information about DEPT2. From the data in the tables, it should be obvious that DEPT2 could not find at least one row to match up with, when it joined across DEPTNO between DEPT and EMP. Because DEPT2 could not complete a full row during execution of this query, it did not return in the result set of the query. This is of course the normal behavior of SQL. But in the business community, what is technically correct is often not the result sought by those who pay the bills. In this case, when a manager commissions a report that shows Departments and the Employees that work in them, they usually want to see departments even when they have no employees. For this OUTER-JOIN was invented.
In query2 we see the "Old Style" Oracle syntax for OUTER-JOIN. It is the PLUS SIGN in parenthesis. This syntax was not actually invented by Oracle but rather some smart guy whose name escapes me at the moment. Oracle saw its value and was an early adopter of OUTER-JOIN but doing so required them to use a syntax before ANSI SQL COMMITTEE settled on one. The new and improved ANSI SQL is fine and all, some say even better, but being an old bird I tend to stick with old and proven ways till something makes me change. I will comment briefly at the end of this discussion on ANSI syntax of OUTER-JOIN.
The results from query2 show the effect of OUTER-JOIN. Analyzing the query a little, we call DEPT the Driving table and EMP the Destination or OUTER table in the join event. The little PLUS SIGN tells the query that for this join element, there may in fact be no hits. OUTER-JOIN thus does the following: Whenever the driving table (DEPT in our example) fails to find a match for a specific join element on the destination table (EMP in our example), make believe there was ONE match and fill in the destination items with nulls. Relative to our example: Any department that has no employees should make believe it has one employee with all empty fields. OUTER-JOIN allows DEPT2 to construct a completed row even though it has no employees, and thereby show up in the result set just with many of the data elements left blank.
There you have it, an effective solution for the need, in a simple idea. Ah, but there is the RUB. For just as the Universe has her rule of Increasing Entropy, so Computer Science has its rule of Increasing Complexity. That which starts simple tends not to stay simple for long but rather, will become more complex as more developers adopt it. OUTER-JOIN is no exception. Indeed, its deceiving simplicity aside, OUTER-JOIN is in fact one of the most error prone constructs available to us in SQL coding. Miscoding of OUTER-JOIN will usually show up as one of three common manifestations:
- the Half Baked Multi-Column Join,
- the Incomplete Join Trail,
- the Join Against a Constant.
Let us take a look at these three miscoding errors.
In the following text I am going to refrain from unnecessary sample data and table describes. If you have stuck with this discussion this far I am going to assume you have at least some understanding of SQL and or Relational Databases, and are therefore capable of constructing the necessary mental pictures in your head without the need for me to commit everything to paper. To this end I will be using meaningless column and table names, and I will present code example for clarification in a table below.
The Half Baked Multi-Column Join as the name suggests is a join between two tables where the join is a two or more column join. The mistake is simply that one of the join elements has the OUTER-JOIN construct and one does not. Because the join event has at least one join element missing the OUTER-JOIN construct, the join event can fail to produce a completed row for the same reason we saw in our DEPT/EMP example above. It will behave as if there was no OUTER-JOIN construct at all. The solution is to code the missing OUTER-JOIN construct.
The Incomplete Join Trail presents in situations where there is a chain of joins as one might see in a hierarchical set of tables. The error is that somewhere down the join chain the coder stopped using OUTER-JOIN. Once OUTER-JOIN begins, it is only good so long as a join chain utilizes OUTER-JOIN all the way down the chain. The solution is the code the missing OUTER-JOIN constructs.
The Join against a Constant involves a join event in which a constant is being compared to one of the columns. The error is that the OUTER-JOIN must also include the OUTER-JOIN construct in the constant test. Without it, the join can fail to complete a row. This is the most pervasive miscoding of OUTER-JOIN because the behavior of a constant test when considered as part of a join event of an OUTER-JOIN is different from its behavior when it stands alone. I have included a sample of this behavior difference with two seemingly semantically equivalent queries that are not equivalent because one uses a true join and one does not. The point being that OUTER-JOIN constructs only give meaning when they are actually being used in a join. The solution is as we have seen, to code the missing OUTER-JOIN constructs.
Miscoding Type | Wrong Coding | Right Coding |
---|---|---|
Multi-Column Join | select * from T1, T2 where T1.C1 = T2.C1(+) and T1.C2 = T2.C2; |
select * from T1, T2 where T1.C1 = T2.C1(+) and T1.C2 = T2.C2(+); |
Incomplete Join Trail | select * from T1, T2, T3, T4 where T1.C1 = T2.C1(+) and T2.C2 = T3.C2(+) and T3.C3 = T4.C3; |
select * from T1, T2, T3, T4 where T1.C1 = T2.C1(+) and T2.C2 = T3.C2(+) and T3.C3 = T4.C3(+); |
Join to a Constant | select * from T1, T2 where T1.C1 = T2.C1(+) and T1.C2 = ‘Y’; |
select * from T1, T2 where T1.C1 = T2.C1(+) and T1.C2(+) = ‘Y’; |
Behavioral Problems? | select * from dept where 3 = deptno(+); no rows selected
|
select * from dept ,(select 3 c1 from dual) x where x.c1 = dept.deptno(+); DEPTNO DNAME C1
------- ------- ---------- 3 |
So, whenever you come across OUTER-JOIN, be skeptical. For the more complex the code, the more likely it is you will see one of these miscoding errors. If you are coding your own OUTER-JOINS, double check you code to make sure you have not made one of these mistakes yourself. When if comes to OUTER-JOIN, its all to easy to make a mistake and the fix is quick if you catch it.
A note of Caution: Should you see one of these coding errors in your PL/SQL travels, stop to ask yourself one additional question: “What was the original intent?”. Remember that when you fix the apparent coding error by adding in the missing OUTER-JOIN constructs, you will be changing the rows returned from the query. Thus it behooves you to find out if in fact these rows should be returned or not. It may be that the code you are "fixing" actually suffers from what we all know as CUT-AND-PASTE syndrome. The original author of the code may have simply cut and pasted a piece of code that is accepted as working, from somewhere else and not even looked at the fact the it had an OUTER-JOIN coding error. The point is, that you should know if an OUTER-JOIN is actually correct or not for the query you are changing. After all, it may be that the query has been running for some time and no one noticed it was missing any rows. So, to simply fix what looks like an error without doing the proper research is just asking to get your self in trouble.
A quick comment on the ANSI version of OUTER-JOIN: Here is an example of an ANSI coded OUTER-JOIN.
select dept.*,emp.* from dept left outer join emp on dept.deptno = emp.deptno DEPTNO DNAME EMPNO ENAME DEPTNO ---------- ---------- ---------- ---------- ---------- 1 Dept1 11 PlainJane 1 2 Dept2
Let us take note of two points:
- if we look closely at the syntax we see that ANSI syntax may be superior with respect to OUTER-JOIN over the more traditional Oracle syntax because the ON clause stipulates the join criteria. Thus it is not possible to code (I don’t think), an OUTER-JOIN that exhibits the coding errors we have discussed without also coding and incorrect join. All the join elements must appear in the ON clause and therefore are inclusive to the OUTER-JOIN operation.
- On the other hand, ANSI syntax of OUTER-JOIN can be more confusing then the much simpler Oracle syntax, as there are multiple ways to code the same thing.
Consider this code snippet:
select dept.*,emp.* from emp right outer join dept on dept.deptno = emp.deptno DEPTNO DNAME EMPNO ENAME DEPTNO ---------- ---------- ---------- ---------- ---------- 1 Dept1 11 PlainJane 1 2 Dept2
Which leads to the question: "What is different between LEFT OUTER JOIN and RIGHT OUTER JOIN?". Well, seems like nothing other than to specify on which side of the expression lives the Driving table. So why then do we need both? The answer may be hiding somewhere in more complex join examples. The existence of the two variations may afford more flexibility in table order in the FROM clause but none the less this is not very intuitive for a language that was originally created as an intuitive language. Ah well, a discussion for another day.
For more information, go Google (or whatever you favorite search engine is these days).
About the Author: Kevin Meade is a 22-year veteran of Relational Databases in general and Oracle in particular. He is employed under many hats: Architect, Designer, Modeler, and Programmer and so is currently a contractor using the title of “Oracle Specialist” in the Finance and Insurance Industries. Married with two girls, likes to swim in his heated pool with his family, is addicted to Strategy Computer games, keeps a Saltwater Reef in his home, and brews his own dark beers (yum).
Outer Joins and ANSI Syntax
Good article. As a fellow Oracle oldie (started out on 5.1B), I can appreciate the difficulty in using the ANSI syntax. A couple of years ago, I spent 9 months on a Sybase IQ project, which forced me to get familiar with the ANSI syntax, and I've never looked back. I force all of my developers to use ANSI syntax now. I used to think that all that talk of "more intuitive" etc. was rubbish, but I'm a convert now.
Also worth noting that ANSI syntax for outer joins offers additional functionality:
- No more "ORA-01417: a table may be outer joined to at most one other table". You can multi-column outer-join to as many tables as you want.
- Full Outer Join
I suspect that Right Outer Join is included for ANSI compliance. It is (perhaps) more relevant to databases where FROM clause order is important to the optimizer.