SQL Structure | DataCraft Analytics

When it comes to structuring SQL code, I’m pretty specific about what I like and why. I tend to follow most normal conventions with a few specific exceptions. The first distinction that I make is a basic classification of the type of statement that I”m writing and for what purpose. These include:

short trivial statements
single-table non-trivial statements
complex statements

While it may seem a bit strange that I differentiate how I structure my SQL based on the ‘type’ of statement, it is simply about how persistent the SQL is likely to be and how complex it is likely to be. For example I get less worried about splitting out the elements so much if my SQL statements are simply for me to check something in a database, or if they are extremely simple and straightforward with very little logic.

Some times, what starts out as a simple, single line statements ends up growing, and as a statement is changed, I tend to improve the formatting to fit the style.

At the basis of formatting SQL is I’m trying to:

make my SQL statement easy to read
make my SQL statement easy to quickly parse
make my SQL statement easy to debug
make my SQL statement easy to comment

Short Trivial Statements

These are the most simple statements. They usually consist of a select * syntax with a single table and no where clause. The rule is, keep them on the same line.


select top 10 * from table

These types of statements are usually not practical and used mainly to preview data and just used quickly. There’s not much to format in these types of statements. Because they are generally temporary and very easy to read, there is little needed in the way of formatting. Of course, it is okay to format them in a similar way to what we format more complex statements below, but generally this isn’t considered necessary.

Single-table non-trivial statements

Quite often we write queries that contain a single table, but have more specific logic that specify a few fields and a simple criteria or filter. Again, these are most often used as temporary queries. Rather than include these types of queries on a single line, we split the statement into a single line for each block of the statement.

select field1,field2
from table
where field = value

Again, these types of statements are usually used in quick simply logic and because they don’t usually contain much logic, are easy to split to a statement block per line. The main reason that we want to split them across multiple lines is to make it easier to read and to limit the amount of horizontal scrolling that we need to undertaken. Consider the comparison:

select field1,field2,field3
from table
where field = value;

select field1,field2,field3 from table where field = value;

It is easier to scan the top statement and see the different fields, tables and filter criteria. The difference is less dramatic because the field names and table names are not particularly long, however in production systems, it is often the case that fields end up having relatively long names:

/* individual lines per block */
select RecordIdentifier,InitialContractFulfilmentDate,ContractPrototypeTerminationDate
from FacilityContractPrototypeInformation
where ContractProttypeTerminationDate < '2017-01-01';

/* a single line .. generally will need horizontal scrolling to read the whole thing */
select RecordIdentifier,InitialContractFulfilmentDate,ContractPrototypeTerminationDate from FacilityContractPrototypeInformation where ContractProttypeTerminationDate < '2017-01-01';

You can now see that as the field names approach a more realistic size, then the statement starts to get long enough that it requires horizontal scrolling.

Complex Statements

These types of statements are the typical SQL used in most stored procedures. They usually contain a lot more logic by way of table joins or where clauses. The general thinking behind these types of statement is to make them long but narrow. By putting each element on its own line and using indenting it is relatively easy to locate the relevant piece of logic you want quickly. Also a benefit is that most modern mouse have a scroll wheel, and lines are generally easy to number more so than columns. This means that navigating a large statement or referring to it is easier if it is narrow and long.

Indenting helps to group blocks of information together, and helps to visually identify pieces of logic that are nested.

select
  field1,
  field2
from
 tableA A
 inner join
 tableB B on A.field1 = B.field1
where
 field1 = value1
 and
 field2 = value2

Consider the below example, where we have a sub-select. This demonstrates how we indent such nested queries. In this case the nested query is very simple, so we include the statement on a single line.

select
  field1,
  field2
from
 (
     select * from tableC where field1 = value1
 ) A
 inner join
 tableB B on A.field1 = B.field1
where
 field1 = value1
 and
 field2 = value2

I can easily scan down the list of fields, and knowing each field starts on a new line, I can locate any field I want easily. The same applied for tables and where conditions, I can scan down easily and do not have to parse each line carefully to find individual elements.

Some key rules for formatting:

Each 'block' should start on a new line, this includes 'select','from','where' and 'group by' blocks.
Each field should be indented one tab from the select block on its own line.
Commas should go on the end of each line. Some people prefer to put them at the start of each line.
Each table should be indented one level from the 'from' block and be on it's own line.
Table joines should be indented at the same level as the tables that they apply to and should be on their own line. This is different to some people who like to add join statements at the end of the table line.
Each evaluation of a where condition should have its own line.
Boolean joins of 'and' and 'or' should be on their own line, indented to the same level as their applicable statements.
If you mix the use of 'or' and 'and' boolean operators you should always use brackets to be explicit about the way you intend to apply the operators.