This article assumes you already know SQL and want to
optimize queries.
This article assumes you already know SQL and want to optimize queries.
This article is valid for any SQL-92 and up database Queries, it is also helpful for optimizing non-database oriented programs.
This article is valid for any SQL-92 and up database
Queries, it is also helpful for optimizing non-database oriented programs.
The reasons to optimize
Time is money and people don’t like to wait so programs are
expected to be fast.
In Internet time and client/server programming, it’s even
more true because suddenly a lot of people are waiting for the DB to give them
an answer which makes response time even longer.
Even if you use faster servers, this has been proven to be a
small factor compared to the speed of the algorithm used. Therefore, the
solution lies in optimization.
Theory of optimization
There are many ways to optimize Databases and queries. My
method is the following.
Look at the DB Schema and see if it makes sense
Most often, Databases have bad designs and are not
normalized. This can greatly affect the speed of your Database. As a general
case, learn the 3 Normal Forms and apply them at all times. The normal forms
above 3rd Normal Form are often called de-normalization forms but
what this really means is that they break some rules to make the Database faster.
What I suggest is to stick to the 3rd normal form
except if you are a DBA (which means you know subsequent forms and know what
you’re doing). Normalization after the 3rd NF is often done at a
later time, not during design.
Only query what you really need
Filter as much as possible
Your Where
Clause is the most important part for optimization.
Select only the fields you need
Never use “Select *” — Specify
only the fields you need; it will be faster and will use less bandwidth.
Be careful with joins
Joins are expensive in terms of
time. Make sure that you use all the keys that relate the two tables together and
don’t join to unused tables — always try to join on indexed fields. The join type
is important as well (INNER, OUTER,… ).
Optimize queries and stored procedures (Most Run First)
Queries are very fast. Generally, you can retrieve many
records in less than a second, even with joins, sorting and calculations. As a
rule of thumb, if your query is longer than a second, you can probably optimize
it.
Start with the Queries that are most often used as well as the
Queries that take the most time to execute.
Add, remove or modify indexes
If your query does Full Table Scans,
indexes and proper filtering can solve what is normally a very time-consuming process.
All primary keys need indexes because they makes joins faster. This also means that all tables need a primary
key. You can also add indexes on fields you often use for filtering in the Where Clauses.
You especially want to use Indexes on Integers, Booleans, and
Numbers. On the other hand, you probably don’t want to use indexes on Blobs, VarChars and Long
Strings.
Be careful with adding indexes because they need to be
maintained by the database. If you do many updates on that field, maintaining
indexes might take more time than it saves.
In the Internet world, read-only tables are very common.
When a table is read-only, you can add indexes with less negative impact
because indexes don’t need to be maintained (or only rarely need maintenance).
Move Queries to Stored Procedures (SP)
Stored Procedures are usually better and faster than queries
for the following reasons:
- Stored
Procedures are compiled (SQL Code is not), making them faster than SQL code. - SPs
don’t use as much bandwidth because you can do many queries in one SP. SPs also stay on the server until the final results are returned. - Stored
Procedures are run on the server, which is typically faster. - Calculations
in code (VB, Java, C++, …) are not as fast as SP in most cases. - It
keeps your DB access code separate from your presentation layer, which
makes it easier to maintain (3 tiers model).
Remove unneeded Views
Views are a special type of Query — they are not tables. They
are logical and not physical so every time you run select * from MyView, you
run the query that makes the view and your query on the view.
If you always need the same information, views could be
good.
If you have to filter the View, it’s like running a query on
a query — it’s slower.
Tune DB settings
You can tune the DB in many ways. Update statistics used by
the optimizer, run optimization options, make the DB read-only, etc… That takes
a broader knowledge of the DB you work with and is mostly done by the DBA.
Using Query Analysers
In many Databases, there is a tool for running and
optimizing queries. SQL Server has a tool called the Query Analyser, which is
very useful for optimizing. You can write queries, execute them and, more importantly,
see the execution plan. You use the execution to understand what SQL Server
does with your query.
Optimization in Practice
Example 1:
I want to retrieve the name and salary of the employees of
the R&D department.
Original:
Query : Select * From Employees
In Program : Add a filter on Dept or use command : if Dept =
R&D–
Corrected :
Select Name, Salary From Employees Where Dept = R&D–
In the corrected version, the DB filters data because it
filters faster than the program.
Also, you only need the Name and Salary, so only ask for
that.
The data that travels on the network will be much smaller,
and therefore your performances will improve.
Example 2 (Sorting):
Original:
Select Name, Salary
From Employees
Where Dept = ‘R&D’
Order By Salary
Do you need that Order By Clause? Often, people use Order By
in development to make sure returned data are ok; remove it if you don’t need
it.
If you need to sort the data, do it in the query, not in the
program.
Example 3:
Original:
For i = 1 to 2000
Call Query
: Select salary From Employees Where EmpID = Parameter(i)
Corrected:
Select salary From Employees Where EmpID >= 1 and EmpID
The original Query involves a lot of network bandwidth and
will make your whole system slow.
You should do as much as possible in the Query or Stored
Procedure. Going back and forth is plain stupid.
Although this example seems simple, there are more complex
examples on that theme.
Sometimes, the processing is so great that you think it’s
better to do it in the code but it’s probably not.
Sometimes, your Stored Procedure will be better off creating
a temporary table, inserting data in it and returning it than going back and
forth 10,000 times. You might have a slower query that saves time on a greater
number of records or that saves bandwidth.
Example 4 (Weak Joins):
You have two tables Orders and Customers. Customers can have
many orders.
Original:
Select O.ItemPrice, C.Name
From Orders O, Customers C
Corrected:
Select O.ItemPrice, C.Name
From Orders O, Customers C
Where O.CustomerID = C.CustomerID
In that case, the join was not there at all or was not there on all keys.
That would return so many records that your query might take hours. It’s a
common mistake for beginners.
Corrected 2:
Depending on the DB you use, you will need to specify the
Join type you want in different ways.
In SQL Server, the query would need to be corrected to:
Select O.ItemPrice, C.Name
From Orders O INNER JOIN Customers C ON O.CustomerID =
C.CustomerID
Choose the good join type (INNER, OUTER, LEFT, …).
Note that in SQL Server, Microsoft suggests you use the
joins like in the Corrected 2 instead of the joins in the Where Clause because
it will be more optimized.
Example 5 (Weak Filters):
This is a more complicated example, but it illustrates
filtering at its best.
We have two tables — Products (ProductID, DescID, Price) and
Description(DescID, LanguageID, Text). There are 100,000 Products and
unfortunately we need them all.
There are 100 languages (LangID = 1 = English). We only want
the English descriptions for the products.
We are expecting 100 000 Products (ProductName, Price).
First try:
Select D.Text As ProductName, P.Price
From Products P INNER JOIN Description D On P.DescID =
D.DescID
Where D.LangID = 1
That works but it will be really slow because your DB needs
to match 100,000 records with 10,000,000 records and then filter that Where
LangID = 1.
The solution is to filter On LangID = 1 before joining the
tables.
Corrected:
Select D.Text As ProductName, P.Price
From (Select DescID, Text From Description Where D.LangID =
1) D
INNER JOIN Products
P On D.DescID = P.DescID
Now, that will be much faster. You should also make that
query a Stored Procedure to make it faster.
Example 6 (Views):
Create View v_Employees AS
Select * From Employees
Select * From v_Employees
This is just like running Select * From Employees twice.
You should not use the view in that case.
If you were to always use the data for employees of R&D
and would not like to give the rights to everyone on that table because of
salaries being confidential, you could use a view like that:
Create View v_R&DEmployees AS
Select Name, Salary From Employees Where Dept = 1
(Dept 1 is R&D).
You would then give the rights to View v_R&DEmployees to
some people and would restrict the rights to Employees table to the DBA only.
That would be a possibly good use of views.
Conclusion
I hope this will help you make your queries faster and your
databases more optimized. This should make your program look better and can
possibly mean money, especially for high load web applications where it means
your program can serve more transactions per hour and you often get paid by
transaction.
While you can put the above examples to practice in your database of choice, the preceding tips are especially true for major Databases like Oracle or SQL Server.