Monday, July 17, 2023

5 Tips for Running SQL Queries on Production Database

Did you ever face a situation where some of your innocuous-looking action has caused production issues and that too big time? Well, I hope you have not because it's certainly, not a pleasant experience. One such innocuous-looking action is running SQL queries on production databases. I had that in the past very early in my career where I removed some config as duplicates only to find after a week that it stopped publishing messages to one of the downstream. When you work in complex systems that have so many components, millions of lines of code, thousands of configurations, and many databases with hundreds of tables, you have to be really careful with anything you do.

Often there is no real way to perform production-like testing, hence the best bet is to keep your change as much isolated and limited as possible.

Anyway, all this background because today I am going to share some tips with you guys while querying production or live databases to prevent production issues. Since many Java developers are not SQL experts but they do write SQL queries, stored procedures, and interact with both test and production databases, there is a good chance that their innocuous-looking action can cause production issues.

Last year at the same time, we had one such incident where a developer's SELECT query blocked some processes in production. The innocuous-looking SELECT query holds a lock on one of the tables which was needed by the process of trying to update and insert the data into the same table.

The developer runs the query at the end of the day and forgot about that, only to find the next morning that some essential job hasn't completed and they are running from last night. Eventually, DBAs were involved and they kill the connection which is blocking the job and things were restored back.




5 Things to consider while running SQL queries on production databases

Well, this is one of the extreme cases where the developer forgets to cancel the query when it was taking a long time but the probability of happening something like this is quite high, especially if you have production access and don't know much about locking in the database.

The best thing to do is to improve your knowledge of locking and isolation level but to be safe than sorry, you can also follow the following tips while running SQL queries on production:


1. Always query with the NOLOCK option

This can cause a production issue if some job is also running and trying to update the same table you are querying. By saying NOLOCK you reduce the risk of blocking and deadlock  like:
SELECT Id, Name, Address from Employee with (NOLOCK) where Id= 2

W
hen you run your query with NOLOCK hint, it instructs the query engine to not issue shared locks and does not honor exclusive locks. When you use with NOLOCK option, it is possible to read an uncommitted transaction or a set of pages that are rolled back in the middle of a read.

Dirty reads are possible. Though, worth noting is that this option only applies to the SELECT statement and available for Microsoft SQL Server.

SQL tips for beginners



2. First run your query in backup or secondary Database Server

If possible, always run your query on a backup or secondary server instead of running it on the primary server. Only if you just cannot use secondary because you feel data is not the most up-to-date use primary, but the bottom line is to avoid touching the primary or main server on production hours.


3. Test your queries on UAT/Test Database before running on the Production

This is the same rule which I might have told you before when explaining UNIX commands. Similar to the UNIX command which you should test on Staging boxes before running on production machines, you should also first run your query on QA or UAT environment before running them in production. This not only gives you a good idea of what to expect but also saves you from syntax errors and accidental mistakes in production.


4. Avoid touching the Production database during Live hours

If you are working for a system that has some market hours like stock exchanges which run from morning 9 to 4 PM, then you should avoid touching your production database during that period and only run your queries after market hours. 

There is a lot of activity that goes on DB during market hours and there is always a good chance that your innocuous-looking SELECT queries may interfere with them.


5. Have a four-eye check on your SQL query

If you are not working alone and have some team members, you can always ask your colleagues to do a four-eye check on the query you want to run in production. If you can review your queries from DBAs then it's even better.

Running SQL Queries on Production/live Databases - 5 Things to Remember


That's all about some important tips to remember while querying the production database. You should also learn more about how the database executes your SQL query like how the index works, how locking works, table scan, row scan, table-level locking, or row-level locking, etc. If you know those basics well, you can predict your queries' behavior more deterministically and potentially avoid unpleasant surprises.


Other SQL and best practices articles you may like
  • 10 Frequently asked SQL queries from Interview (list)
  • The Developer's list of must-read SQL books (list)
  • Difference between where and having a clause in SQL (answer)
  • Top 5 Courses to learn SQL and Database (best courses)
  • The difference between union and union all in SQL? (answer)
  • Top 5 Online Courses to learn MySQL in Depth (best courses)
  • Difference between row_number, rank, and dense_rank in SQL (answer)
  • Top 5 Courses to learn PostgreSQL in-depth (online courses)
  • The difference between self and equijoin in SQL? (answer)
  • 7 Free Courses to learn Database and SQL? (free courses)
  •  Best way to write complex SQL queries (tips)
  • 10 Free Courses to learn Oracle and SQL Server (free courses)
  • 50+ Microsoft Server Interview Questions (answers)
  • 10+ Oracle Interview Questions with Answers (list)


4 comments:

  1. These are very good points. I also have seen situations where queries were causing problems in production systems. I would also add that strict separation of permissions is also helpful - you should not be running diagnostic queries with your full access account - you may run an update or delete command by mistake and cause huge problems. It is better to have a read-only role to support investigations.

    ReplyDelete
  2. One of the hard lesson we learned from Production issue was to be very careful while using explicit transaction in SQL script like begin transaction and end transaction. In one case, the SQL script throws error after begin transaction and it wasn't rolled back. Subsequent run of the script blocked the whole DB which leads chaos all over the place because jobs and other SQL script got stuck. The worst case was that the programmer running the script had no clue. What I learned from this is always use @@TRANCOUNT and SET XACT_ABORT to automatically rollback the transaction in case of any error. And you must test your script in the test environment before running into production. Production issues teach lessons which you will never forgot but it's very costly way of learning. You must avoid it.

    ReplyDelete
  3. #1 is horrible, terrible, really bad advice. DO NOT USE the "NOLOCK" hint (outside of some edge cases) in SQL Server. You can easily get bad reads and incorrect data.

    ReplyDelete
  4. NOLOCK is a disaster. Unfortunately virtually impossible to recreate the problem it causes, otherwise I would happily sue vendors that use it when our users make business decisions on incomplete data.

    >> When you use with NOLOCK option, it is possible to read an uncommitted transaction or a set of pages that are rolled back in the middle of a read.

    The Dirty Reads is not the heart of the problem, and I could live with them as in practice we never ever have a ROLLBACK in production. Actually, no I can't. One vendor provides an API-type table for queuing up inserts/changes. But they read it with NOLOCK (because they have a policy of "NOLOCK pepper-pot everywhere" ... or perhaps I should be more gracious and say "Because they use it so frequently they have no QA for where NOT to use NOLOCK", and juniors get into, or are told by line-manager, "everywhere habit") ... so if my code does a ROLLBACK their API will process the records anyway.

    But the double-results and missing-results, when index block split occurs, are a far more significant problem where my users make a business decision based on the results or, worse, a shortened-results list is then used for a subsequent update.

    How often do they happen? I have no idea, I have never found a way to find out, but I do trap ERROR 601 and that gets tripped several times a week in Vendor code. So on the one hand that is a measure that the problem exists, on the other hand how well did the APP recover each time some benign operation aborted with a (fatal) 601? I very much doubt that the vendors have tested their code for every possible such fail point.

    Our company policy is, now, to require new vendors to state where they use NOLOCK and if they have "pepper-pot" policy we go elsewhere.

    ReplyDelete