Thursday, January 5, 2017

Running SQL Queries on Production/live Databases - 5 Things to Remember

Did you ever face a situation where some of your innocuous looking action has cause production issue and that too big time? Well, I hope you have not because it's certainly, not a pleasant experience. One of such innocuous looking action is running SQL queries on production databases. I had that in past very early in my career where I removed some config as duplicates only to find after a week that it stopped publishing messages to one of the downstream. When you work in complex systems which has so many components, millions of lines of code, thousands of configuration and many databases with hundreds of tables, you have to be really careful with anything you do. Often there is on the real way to perform a production-like testing, hence the best bet is to keep your change as much isolated and limited as possible.

Anyway, all this background because today I am going to share some tips to you guys while querying production or live databases to prevent production issues. Since many Java developers are not SQL experts but they do write SQL queries, stored procedures, and interact with both test and production databases, there is a good chance that their innocuous looking action can cause production issue.

Last year at the same time, we had one of such incident where a developer's SELECT query blocked some process in production. The innocuous looking SELECT query holds a lock on one of the tables which were needed by the process trying to update and insert the data into the same table. The developer run the query at the end of the day and forgot about that, only to find next morning that some essential job hasn't completed and they are running from last night. Eventually, DBAs were involved and they kill the connection which is blocking the job and things were restored back.

5 Things to consider while running SQL queries on production databases

Well, this is one of the extreme cases where developer forgets to cancel the query when it was taking a long time but probability of happening something like this is quite high, especially if you have production access and don't know much about locking in the database. The best is to improve your knowledge of locking and isolation level but to be safe than sorry, you can also follow following tips while running SQL queries on production:

1) Always query with NOLOCK option
This can cause production issue if some job is also running and trying to update the same table you are querying. By saying NOLOCK you reduce the risk of blocking and deadlock e.g.

SELECT Id, Name, Address from Employee with (NOLOCK) where Id= 2

When you run your query with NOLOCK hint, it instructs query engine to not issue shared locks and does not honor exclusive locks. When you use with NOLOCK option, it is possible to read an uncommitted transaction or a set of pages that are rolled back in the middle of a read. Dirty reads are possible. Though, worth noting is that this option only applies to the SELECT statement and available for Microsoft SQL Server. See Microsoft SQL Server T-SQL Fundamentals learn more about table and row level locking.

2) Always run your query in backup or secondary server
If possible, always run your query on backup or secondary server instead of running it on the primary server. Only if just cannot use secondary because you feel data is not most up-to-date use primary, but bottom line is to avoid touching the primary or main server on production hours.

3) Test your queries on UAT before running on production
This is the same rule which I might have told you before when explaining about UNIX commands. Similar to UNIX command which you should test on Staging boxes before running on production machines, you should also first run your query on QA or UAT environment before running them in production. This not only give you a good idea of what to expect but also save you from syntax error and accidental mistakes in production.

4) Avoid touching Production database during Market hours
If you are working for a system which has some market hours e.g. stock exchanges which run from morning 9 to 4 PM, then you should avoid touching your production database during that period and only run your queries after market hours. There is a lot of activity goes on DB during market hours and there is always a good chance that your innocuous looking SELECT queries may interfere with them.

5) Have a four eye check on your query
If you are not working alone and have some team members, you can always ask your colleagues to do a four eye check on the query you want to run in production. If you can review your queries from DBAs then it's even better.

Running SQL Queries on Production/live Databases - 5 Things to Remember

That's all about some important tips to remember while querying production database. You should also learn more about how database executes your SQL query e.g. how the index works, how locking works, table scan, row scan, table level locking, or row level locking etc. If you know those basics well, you can predict your queries behavior more deterministically and potentially avoid unpleasant surprises. One book which can help you here is the SQL Performance Explained by Markus Winand, this will give you a good idea of the database from developer's perspective.

What about you? What tips you follow or what caution you take while querying production/live database?

Further Learning
Introduction to SQL
The Complete SQL Bootcamp
SQL for Newbs: Data Analysis for Beginners

Other SQL and best practices articles you may like
  • A List of Frequently asked SQL queries from Interview (list)
  • The Developer's list of must read SQL books (list)
  • Difference between where and having clause in SQL (answer)
  • The difference between union and union all in SQL? (answer)
  • Difference between row_number, rank, and dense_rank in SQL (answer)
  • The difference between self and equijoin in SQL? (answer)
Thank you for reading this article, if you like, please share with your friends and colleagues too. It helps a lot.


Piotr Rodak said...

These are very good points. I also have seen situations where queries were causing problems in production systems. I would also add that strict separation of permissions is also helpful - you should not be running diagnostic queries with your full access account - you may run an update or delete command by mistake and cause huge problems. It is better to have a read-only role to support investigations.

Anonymous said...

One of the hard lesson we learned from Production issue was to be very careful while using explicit transaction in SQL script like begin transaction and end transaction. In one case, the SQL script throws error after begin transaction and it wasn't rolled back. Subsequent run of the script blocked the whole DB which leads chaos all over the place because jobs and other SQL script got stuck. The worst case was that the programmer running the script had no clue. What I learned from this is always use @@TRANCOUNT and SET XACT_ABORT to automatically rollback the transaction in case of any error. And you must test your script in the test environment before running into production. Production issues teach lessons which you will never forgot but it's very costly way of learning. You must avoid it.

Post a Comment