Join the World

Join the World


March 11, 2001

by David Faour

When accessing data, we often want to answer questions that requires input from multiple tables. While there are many ways to accomplish the task, the use of proper join techniques, depending upon your question, will ensure that you are getting the answers you want.

 The most commonly used join types are inner-joins, and outer-joins. An inner-join may also be called a natural-join or an equi-join.  Inner joins will return only those rows where there is a match in both tables. An outer join will return all of the rows in table1, and only the matching information from table2.

 So, how would we normally use these? Consider the following problem. You are a sales manager, and you want to know which customers have ordered from you in the thirty days, so that you can send a thank you note. The code would look something like the following.

 

Select C.[Field List], O.[Field List] from Orders O join Customers C on O.OrderDate>Today-30 and O.CustomerID=Customers.CustomerID

 

There are several things going on here. First I have given the tables the aliases O and C. To alias a table, the syntax is as follows select alias.field from tablename alias where...and so on. Second, the Orders table is the most limiting factor, I am only interested in customers with matching records in the orders table, and the orders must have been placed in the last 30 days. Because the Orders table is the most limiting, I am putting it first in the table list. I have also placed the O.OrderDate>Today-30 requirement in the join clause. Assume I had written the code as follows

select fields

from O

join C

on O.customerID=C.CustomerID

where O.OrderDate>Today-30       

This would not be efficient. Customers that had placed an order at ANY time would be joined onto the orders table, and then the rows would be filtered for date.

 In the first example, the rows are eliminated for date before joining the customers.

Let's take a look at how an outer join works.  Suppose I want to see a list of all customers along with some ordering information for those customers who have placed orders. The code will resemble the following.                  

            Select C.[Field List], O.[Field List]

from Customers C left outer join Orders O on C.CustomerID=O.CustromerID

 

This query will return all of the customers, even those who haven't placed orders, along with the ordering information for those customers who have placed orders.

 

You might have noticed the words "left outer join." In SQL server, there are left outer and right outer joins. The left/right indicator tells SQL server which is the table that you want all of the rows to come from, and which is the one to include only matched rows. Left outer joins will return all of the rows from the table on the left (the one listed first in the from list)-and return the data from matching rows in the table on the right. Right outer joins will return all of the rows from the table on the right (the one listed second in the from list)-and return data from the matching rows on the left.

 

What about multiple joins?

 

Consider the following query.

 

Select T1.fields, T2.fields, T3.fields

from T1 left outer join T2

on T1.t1ID=T2.t1ID

left outer join T3

on T3.t2ID=T2.t2ID

 

This query will first capture all of the rows in T1 and the matching data from T2. It will then take this intermediate "table" or rowset and add all of the data from the matching rows in T3.

 

Some things to remember to optimize joins.

1.      In SQL server, it's generally faster to put your conditions in the join clause instead of in the where clause

2.      Put your most limiting table first, to minimize the number of rows to be joined

3.      Consider the question you are trying to answer and make sure you have chosen the right type of join.