There are certain things you don't learn in college or in a training class. You develop that understanding only after a few years of work experience, and then you realize it was very basic and wonder how you missed it all those years. Understanding how a multi-threaded Java program executes is one such thing. You have certainly heard about threads: how to start a thread, how to stop one, definitions such as "an independent path of execution", and all the fancy libraries for inter-thread communication. Yet when it comes to debugging a multi-threaded Java program, you struggle.
At least, that is my personal experience. Debugging is, in my opinion, the real trainer: it is through debugging that you learn subtle concepts and develop an understanding that lasts.
In this article, I am going to talk about three things that matter in the execution of any program, not just a Java one: thread, code, and data.
Once you have a good understanding of how these three work together, it becomes much easier to understand how a program is executing, why a certain bug appears only sometimes, why another appears every time, and why yet another is truly random.
And if you are serious about mastering Java multi-threading and concurrency, I also suggest you take a look at the Java Multithreading, Concurrency, and Performance Optimization course by Michael Pogrebinsky on Udemy. It's an advanced course on multithreading, concurrency, and parallel programming in Java, with a strong emphasis on high performance.
How Do Thread, Code, and Data Work Together in a Program?
What is a program? In short, it's a piece of code that is translated into binary instructions for the CPU. The CPU executes those instructions, for example fetching data from memory, adding data, subtracting data, and so on. In short, what you write is your program: the code.
What varies between different executions of the same program is the data. "Execution" here doesn't just mean restarting the program; it also means one cycle of processing. For an electronic trading application, for example, processing one order is one execution. You can process thousands of orders in one minute, and the data varies with each iteration.
One more thing to note is that you can create threads in code, which then run in parallel and execute the code written inside their run() method. The key thing to remember is that threads can run in parallel.
When a Java program starts, one thread, known as the main thread, is created, and it executes the code written inside the main method. Any threads you create there are created and started by the main thread; once started, they execute the code written in their run() method. See Multithreading and Parallel Computing in Java if you are not familiar with how to create another thread in Java.
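To make the relationship between the main thread and the threads it starts concrete, here is a minimal sketch. The class name MainAndWorkers, the method runWorkers, and the thread names "order-worker-N" are made up for illustration; only Thread, Runnable, and the collection classes come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class MainAndWorkers {

    // Starts `count` worker threads and returns the names of the threads
    // that actually executed the task. All names here are illustrative.
    static Set<String> runWorkers(int count) {
        Set<String> ranOn = ConcurrentHashMap.newKeySet(); // thread-safe set
        List<Thread> workers = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // The lambda becomes the body of run() for this thread.
            Runnable task = () -> ranOn.add(Thread.currentThread().getName());
            Thread worker = new Thread(task, "order-worker-" + i);
            workers.add(worker);
            worker.start(); // start() makes run() execute on the new thread
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // the calling thread waits for the worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println("main() runs on: " + Thread.currentThread().getName());
        System.out.println("run() ran on: " + runWorkers(3));
    }
}
```

Giving each thread a name, as done here, also pays off later: the name shows up in logs and thread dumps, which makes it easier to tell which thread touched which data.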
So if you have 10 threads for processing orders, they will run in parallel. In short, a thread executes code on the data coming in. Now let's look at the three kinds of issues we talked about:
1) Issues that always occur
2) Issues that occur only sometimes, but consistently with the same input
3) Issues that are truly random
The first kind of issue is most likely due to faulty code, also known as a programming error, e.g. accessing an invalid index of an array, or calling a method on an object after setting it to null or before initializing it. These are easy to fix because you know where they occur.
You just need knowledge of the programming language and API to fix this kind of error.
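As a small illustration of a bug that shows up on every run, consider an off-by-one loop over an array. The class and method names below are hypothetical; the point is only that the failure depends on the code alone, not on the data or on thread timing.

```java
class AlwaysFails {

    // Bug: `<=` walks one element past the end of the array, so this method
    // throws ArrayIndexOutOfBoundsException on every call, for any input.
    static double totalBuggy(double[] prices) {
        double total = 0;
        for (int i = 0; i <= prices.length; i++) {
            total += prices[i];
        }
        return total;
    }

    // Fix: iterate over the elements directly, so there is no index to get wrong.
    static double totalFixed(double[] prices) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] prices = {1.5, 2.5};
        System.out.println("fixed total: " + totalFixed(prices));
        try {
            totalBuggy(prices);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("buggy version failed: " + e.getMessage());
        }
    }
}
```

Because the stack trace points at the exact line, this kind of bug is usually the quickest of the three to track down.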
The second kind of issue has more to do with data than with code. A bug that appears only sometimes, but always with the same input, could be caused by incorrect boundary handling or by malformed data, such as an order missing certain fields like price or quantity.
Your program should always be written robustly so that it won't crash when given incorrect data as input. The impact should be limited to that order; the rest of the orders must execute properly.
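One way to limit the impact of a malformed order to that order alone is to validate each order and handle the failure inside the loop. The sketch below assumes a hypothetical Order class with id, price, and quantity fields; it is not taken from any real trading API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrderProcessor {

    // Hypothetical order type; price or quantity may be null for malformed input.
    static final class Order {
        final String id;
        final Double price;
        final Integer quantity;

        Order(String id, Double price, Integer quantity) {
            this.id = id;
            this.price = price;
            this.quantity = quantity;
        }
    }

    // Rejects orders with missing or non-positive fields.
    static void validate(Order order) {
        if (order.price == null || order.price <= 0) {
            throw new IllegalArgumentException("missing or non-positive price");
        }
        if (order.quantity == null || order.quantity <= 0) {
            throw new IllegalArgumentException("missing or non-positive quantity");
        }
    }

    // Processes every order and returns the ids that were accepted.
    // A bad order is logged and skipped; it does not stop the rest.
    static List<String> process(List<Order> orders) {
        List<String> accepted = new ArrayList<>();
        for (Order order : orders) {
            try {
                validate(order);
                accepted.add(order.id); // real processing would happen here
            } catch (IllegalArgumentException e) {
                System.err.println("Rejected order " + order.id + ": " + e.getMessage());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("A", 10.0, 5),
                new Order("B", null, 5),   // malformed: no price
                new Order("C", 12.5, 1));
        System.out.println("Accepted: " + process(orders));
    }
}
```

Logging the order id together with the reason for rejection gives you the input you need to reproduce the issue, which is what makes this class of bug consistent rather than random.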
The third kind of issue most likely comes from multithreading, where the order and interleaving of multiple threads' execution cause race conditions or deadlocks. These issues are random because they appear only when certain random things happen, such as thread 2 getting the CPU before thread 1, or locks being acquired in the wrong order.
Remember, the thread scheduler and the operating system are responsible for allocating CPU to threads. They can pause threads and take the CPU away from them at any time, and all of this can create a unique scenario that exposes multithreading and synchronization issues.
Your code should never depend on the order in which threads run; it must be robust enough to run correctly under all conditions.
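A classic example of such a timing-dependent bug is a shared counter incremented by several threads without synchronization. The following sketch, with the hypothetical class name RaceDemo, contrasts a plain int field with java.util.concurrent.atomic.AtomicInteger. How many updates the plain field loses, if any, depends on the scheduler, so the result can differ from run to run and from machine to machine.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RaceDemo {

    static int unsafeCount = 0;                               // plain shared field
    static final AtomicInteger safeCount = new AtomicInteger(); // atomic counter

    // Runs `threads` threads, each incrementing both counters `perThread` times.
    // Returns {unsafe result, safe result}.
    static int[] run(int threads, int perThread) {
        unsafeCount = 0;
        safeCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    unsafeCount++;               // read-modify-write, not atomic
                    safeCount.incrementAndGet(); // atomic, no lost updates
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] {unsafeCount, safeCount.get()};
    }

    public static void main(String[] args) {
        int[] result = run(4, 100_000);
        System.out.println("expected: " + (4 * 100_000));
        System.out.println("unsafe:   " + result[0] + " (may be lower, varies per run)");
        System.out.println("safe:     " + result[1]);
    }
}
```

The code is identical on every run and so is the data; only the interleaving of the threads changes, which is why this kind of bug can pass a test a hundred times and still fail in production.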
In short, remember that a thread executes code with data given as input. Each thread works with the same code but different data. While debugging issues, pay attention to all three: thread, code, and data.
Further Learning
Multithreading and Parallel Computing in Java
Java Concurrency in Practice - The Book
Applying Concurrency and Multi-threading to Common Java Patterns
Java Concurrency in Practice Bundle by Heinz Kabutz
2 comments :
Great explanation. I think the most important thing when dealing with a thread-related issue is finding out which thread is accessing the variable; that's why naming your threads is very important.
I prefer a thread pool all the time. Why would someone bother creating and stopping threads when the JDK can do it for you? If a parallel stream fulfills your purpose, that's even better; no need to worry about concurrency issues.