SQL Tips - Why LEFT JOINS are bad
Published on Feb 13, 2009 by Jamie MunroWhen you do SQL queries do you find yourself always, or almost always, using left joins? If you've answered yes to this question, I would suggest to continue to read and understand some downfalls that you might be getting yourself into.
Let's start by looking at a standard query that has a left join:
SELECT * FROM users u LEFT JOIN countries c ON c.id = u.country_id WHERE u.id = 5
The following query will return all fields from the users table and all fields from the countries table. Now you're thinking, "Yeah, looks good, so what's wrong with this?". Well, I'm glad you asked.
Because we've done a left join, if user with the id of 5 doesn't have a country selected, all the fields from the countries table will be NULL. There are many occassions where you want to use a left join, in the example above it could simply be because selecting a country is not required in your registration form. If we were to take the query above and use an INNER JOIN instead:
SELECT * FROM users u INNER JOIN countries c ON c.id = u.country_id WHERE u.id = 5
If the user with the id of 5 doesn't have a country_id set in the users table, no results would be returned, unlike the left join that returned one row with NULL values for the countries fields.
You should now understand what happens when we do a left join. Because of the results above, there are usually three main reasons why doing a left join when you should do an inner join are bad.
The first reason is simply because you are expecting the results of the joined table to not be NULL.
The second reason is for performance. In our example above, speed would most likely not be a huge issue. However, let's assume we need to build a large report where we are joining 4 or 5 tables together to get specific data.
As I described above, when you left join, all rows from the main table are returned regardless of the results of the join. However, if you inner join it only returns the rows that match. Let me provide some statistics. Assume the users table has 10,000 records in it and we want to get all Canadian users. To accomplish this we are going to join the countries table on the primary key and foreign key. This time we are going to use an INNER JOIN. Let's assume that 3,000 users are Canadian. Our result set is now done to 3,000 from 10,000. We now want to further limit to the province of Ontario. To do this we are going to join the provinces table on it's primary key with the foreign key in the users table. Because our record set is down to 3,000 users, our query now has to only compare 3,000 users in the join instead of 10,000 users.
The third reason is to ensure data accuracy in reports. If your report requires calculations of fields based on certain criteria, the totals returned might be wrong because left joins would return all rows from the main table where an inner join would only return the rows that were successfully joined because of a match. You might not think this is a big issue, just wait until you give your boss a report that has invalid numbers calculated into it!
I hope the above examples clearly describe how calculation errors and slow performing results can occur by using a LEFT JOIN instead of an INNER JOIN. Also, don't think that I'm a left join hater either, I only use them when I need to :)