Follow @endyourif rss Facebook LinkedIn

Why You Should Avoid ORDER BY RAND() OR NEWID() Why You Should Avoid ORDER BY RAND() OR NEWID()

Published on Feb 23, 2009

Quite often we get a complaint from a client that their homepage is too slow.  The first thing we do is load the homepage up and see for ourselves.  Yep, it looks slow, so the next step is too see what is being done.  Quite often you'll see some random data being displayed on the homepage.  Whether it's random news articles, photos, videos, etc...If this is not the case, you may find my article on improving slow loading web pages more useful.

If the homepage is slow because of random data, you can lay a bet with high certainty that those 10 random articles are being pulled with an ORDER BY RAND() OR NEWID() slowing the site to a crawl.  In case you are curious the difference, RAND() is the MySQL way and NEWID() is the SQL Server way.  In this article, I'll show you why you should avoid these functions and several different solutions to help you overcome this issue.

The first thing we'll do is run a standard random query returning 10 rows on a large table and see how it performs.

SELECT * FROM videos ORDER BY RAND() LIMIT 10

The following query should look pretty standard.  I ran this query on a table with approximately 360,000 records on it.  Check out the results, they are shocking!

Showing rows 0 - 9 (10 total, Query took 14.3622 sec)

14.36 seconds, did your mouth just drop?  Now imagine if we were running this on our homepage every time a user visited it?  You know what, scratch that, I don't want to imagine that because I'm sure our web server would be coming down in the matter of minutes with a lot of concurrent users.

Now that we understand how bad the problem is, what are our solutions?  Well we have several.  Each solution provides something different.  Some solutions will not be so dynamic, some won't be so random, but I guarantee you that each solution will run in under 0.5 seconds!

Solution 1:

Solution 1 involves doing some mock randomness.  The first thing to do is a simple query to get the count or max id of our videos:

SELECT COUNT(*) FROM videos

Using this number we can do one of two things:

1. Select 10 random numbers between 1 and the count.  Once we have 10 random numbers we will do straight select queries where id = random number.  I would avoid using an IN clause because it will most likely be slower than 10 simple queries.

The major CON to this solution is, if you delete data, the id might not exist and you may get less than 10 results.  If this is the case, I would put the queries in a loop and loop until you get 10 records.  Even if you have to do 15 or 20 simple select queries it will be faster than the 14.36 seconds query we did originally.

2. Select 1 random number between 1 and the count - 10.  With this number we do one query where id is greater than random number and id is less than or equal to random number + 10.

There are two potential CONs to this solution.  Same as above where you might not get 10 records if you delete data and if your data is entered chronological, you may not get very random results as they are all grouped together.  To solve the first CON you can do a larger range, perhaps 1 to count - 30 and update your where clause to random number + 30 instead of 10 - just remember to LIMIT 10 your results.  There is no solution to CON 2, only use this solution if it is acceptable to have pseudo random data.

Solution 2:

Solution 2 involves creating a CRON job and re-creating the random data on a regular basis.  This solution is a bit more involved, but a better solution all round I find.  On a high traffic site, this solution will perform far less queries than solution 1 bringing down the entire load on the server.

This is a 3 part solution.

Step 1, make a copy of your existing guilty random code and place it in a new file by its self.  This script should simply query the random data using RAND() or NEWID() and save the results to an HTML file or a PHP file (whatever your chosing) that you will include into your homepage in step 3.

Step 2, create a CRON job that runs as often as you like to regenerate your data.  For help with this, search creating CRON jobs in Google.  There are a lot of utilities that provide you exactly what you need.  This CRON job should simply do a WGET on your the script you created in step 1.

Step 3, update your homepage and remove the guilty code and replace it with the inclusion of the file you create in step 1 (the static data that gets saved).

The only CON to this solution is, if a user refreshes your homepage before your CRON job has run to create new random records they will see the same data.  There is no 100% solution for this, but the best one would be to run your CRON job every 1 minute.  My philosophy is that I am content with 1 CRON job calling random records every one minute than 100s or 1000s of users calling random records every second.

I hope my explanations are clear and you have another happy client!

Tags: SQL Tutorials for Beginners, Intermediate and Advanced Users | Optimization | rand() | newid()

About Jamie Munro

Welcome to my blog. I have been compiling articles since 2009! Below is a list of every post I've ever created. Some of my major focuses have been compiling the best source of jQuery tutorials on the Internet today. A recent focus of mine is now focusing on creating the best SQL Tutorials for Beginners, Intermediate and Advanced Users. This is a compiliation of all my experience about how to do common SQL commands to optimize for performance. As you might see from the links around here, I've also written and published 5 books. My first and last book are focused around ASP.NET MVC and Web API tutorials. The last book has a big focus on integrating Knockout.js as the MVVM for the front-end code. O'Reilly Media was so interested in Knockout.js that they actually commissioned me to write that book first as it is one of the only books dedicated entirely to this MVVM library. So if you're looking for the best Knockout.js tutorials, I've compiled the best list. My first book, even though it was self published, was on the PHP framework: CakePHP. I used this framework for years and many of my first ever exampes were all focused on CakePHP tutorials that provide so many fantastic PHP examples. There are so many Javascript tutorials out there, but I noticed that one of the things I work on most with Javascript is arrays. With this I've worked hard on compiling a big list of really advanced Javascript array examples. These array examples are amazing including how to group by, get distinct array elements and so many more. I've also dabbled a little bit with Node.js and during those experiments I definitely ran into some common errors like Can't set headers after they are sent, Solving No Access-Control-Allow-Origin with Node js and Express, and Uncaught ReferenceError: require is not defined. With these common errors I've compiled a list of Node.js tutorials that help solve these problems, but more importantly getting you started with the basics.