All Possible Ways: How to Debug a Java Application End-to-End

Troubleshooting, or “debugging”, a Java-based application is one of the hallmarks of a highly skilled software engineer. A good measure of that skill is how quickly the engineer can find the problem in a software application.

Exceptions logged in the log file are easier to figure out, so I am omitting them from this blog. Once we know the root cause, finding a solution is easy.

Problems in an application can have a “ripple effect” on the system, sometimes so far-reaching that it brings the whole application down. For example, say your application runs in Tomcat and uses a SAN mount (assuming it is a hard mount). If the mount goes away or becomes unresponsive, Tomcat stops serving requests. But why?

Here is the answer:

When the mount becomes unresponsive, the threads reading from or writing to it hang, which leaves the Tomcat threads handling those requests in a busy/waiting state. As more requests that read or write the mount arrive, there comes a point when all of Tomcat's threads are stuck in that busy-wait state.

Hence Tomcat stops serving requests and clients start receiving errors.

Let’s discuss the tools of the trade which could help to identify the problem:

Debugging issues on the host machine:

  1. lsof (list of open file pointers which are still consuming memory)

Running the command below lists the open file pointers. If this list stays the same over time, your application has most probably not closed the stream properly; the operating system is still holding on to it, and the memory available on the machine is reduced because it is never freed.

Ripple effect: if this list keeps growing, it reduces the memory available to processes, which may trigger more GC in the application. In the worst case, the OS can kill the Java process because no memory is available.

$ 
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
XXXXX 3342 tomcat 5w REG 8,1 0 108573 /var/tmp/data/0_00000 ()
java 139971 tomcat 61r REG 8,1 10 8471874 /var/tmp/ABC.csv ()
java 139971 tomcat 69r REG 8,1 148003 8472018 /var/tmp/XYZ.mp3 ()

2. top

With this command you can monitor system resources, processes, their usage and much more. Understanding its output will help you debug a wide variety of issues.

For example, if you have a 4-core machine with hyperthreading enabled, that makes 8 virtual cores. If the load average is 10, the machine is overworked, which causes too much thread context switching and in turn hurts performance.
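The same rule of thumb can be checked from inside the JVM. The sketch below reads the one-minute load average through the standard `OperatingSystemMXBean` and compares it with the number of logical CPUs. The class name `LoadAverageCheck` and the "load greater than CPU count" threshold are illustrative assumptions, not a fixed rule.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: flags a host whose load average exceeds its logical CPU count.
final class LoadAverageCheck {
    private LoadAverageCheck() {}

    /** Returns true when the load average is higher than the number of logical CPUs. */
    static boolean isOverloaded(double loadAverage, int logicalCpus) {
        return loadAverage > logicalCpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // getSystemLoadAverage() returns a negative value where it is unavailable (e.g. Windows).
        double load = os.getSystemLoadAverage();
        int cpus = os.getAvailableProcessors();
        if (load < 0) {
            System.out.println("load average not available on this platform");
        } else {
            System.out.printf("load=%.2f cpus=%d overloaded=%b%n",
                    load, cpus, isOverloaded(load, cpus));
        }
    }
}
```

With the numbers from the example (load average 10 on 8 virtual cores), `isOverloaded(10.0, 8)` returns true.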

The number of threads to use within an application can be estimated with the formula N * (1 + WT / ST), where N is the number of CPUs, WT is the waiting time and ST is the execution (service) time.
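As a rough sketch of that sizing calculation, the snippet below assumes the commonly cited form N * (1 + WT / ST); the formula itself, the class name `ThreadPoolSizing` and the sample timings are assumptions for illustration, so measure WT and ST in your own application before trusting the result.

```java
// Hypothetical helper: estimates a thread count from N * (1 + WT / ST).
final class ThreadPoolSizing {
    private ThreadPoolSizing() {}

    /**
     * @param cpus          N, the number of logical CPUs
     * @param waitTimeMs    WT, average time a task spends waiting (I/O, locks)
     * @param serviceTimeMs ST, average time a task spends executing on the CPU
     */
    static int suggestedThreads(int cpus, double waitTimeMs, double serviceTimeMs) {
        if (cpus <= 0 || waitTimeMs < 0 || serviceTimeMs <= 0) {
            throw new IllegalArgumentException("cpus and serviceTimeMs must be > 0, waitTimeMs >= 0");
        }
        // Round up so that a fractional result never under-provisions the pool.
        return (int) Math.ceil(cpus * (1.0 + waitTimeMs / serviceTimeMs));
    }

    public static void main(String[] args) {
        // 8 CPUs, tasks wait 50 ms for every 5 ms of CPU work: 8 * (1 + 50 / 5) = 88
        System.out.println(suggestedThreads(8, 50, 5));
        // Same calculation for the machine this runs on.
        System.out.println(suggestedThreads(Runtime.getRuntime().availableProcessors(), 50, 5));
    }
}
```

A purely CPU-bound workload (WT = 0) comes out at N threads, which matches the intuition that extra threads only add context switching.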

The I/O wait value is the time the CPU spends waiting for I/O to complete. A high value tells you that too much I/O is happening and that either your hardware or your application needs improvement.

The “nice” value determines the priority of a process. A process with a high “nice” value gets a low priority, and a low “nice” value means high priority.

Detailed description of top command output is 

3)  command / command

This network statistics command is used for monitoring network connections both incoming and outgoing, viewing routing tables, interface statistics etc. It is very useful in terms of network troubleshooting and performance measurement.

Recv-Q and Send-Q: these two values tell us how much data is queued for that socket, waiting to be read (Recv-Q) or sent (Send-Q).

 on  — When the receive buffer fills, TCP closes the receive window, which inhibits the sender from sending. The sender can’t send more data than the receiving TCP advertises in its receive window. This may cause sender application threads to freeze.

 on  — This means the other side TCP implementation at the OS level have been stuck and has stopped sending ACK for my data packets, this is typically an OS Level issue(probability of this is rare though).

A TCP/UDP dump can be used for network packet analysis.

4) dmesg command

The dmesg command prints the kernel's message buffer, which holds many different types of messages and logs from the kernel relating to hard drives, CPU, Ethernet and so on.

It shows issues with non-functioning hardware, erratic behavior, firmware, drivers, timing and the like. These issues sometimes impact the smooth functioning of the application. If you are unable to figure anything out, do look for hardware errors using dmesg.

Other Notable tools and system which can be used to monitor/identify the issue on a specific host are 

Debugging the application on the host machine:

Now let's discuss the various steps to debug the application. Below are some helper utilities that can be used to debug various aspects of a request or the application (you are free to modify and use them as your requirements dictate).

a)

  1. You can use  for collecting query statistics if you are using plain old JDBC. If you are using an ORM, it will have its own statistics reporting for connections and queries. For example, in Hibernate, set the attribute below:
<persistence>
<persistence-unit name="my-persistence-unit">
...
<properties>
<property name="" value="" />
...
</properties>
</persistence-unit>
</persistence>

2. 

This is another way to identify long-running queries and examine the various MySQL thread states. It gives vital information for identifying problematic queries and tables, which can then be looked at in detail.

3) 

For MySQL, set the properties below to log slow queries (on MySQL 5.6 and later the variable is slow_query_log; older versions used log_slow_queries):

set global slow_query_log = 1;
set global slow_query_log_file = '<some file name>';

Alternatively, you can set these options in the my.cnf/my.ini option file (without trailing semicolons):

slow_query_log = 1
slow_query_log_file = <some file name>

b) :

Always add the GC options below to the startup script of your Tomcat application so that garbage collection details are logged.

GCViewer is a good tool for analysing the GC logs.
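Alongside GC logs, collector activity can also be sampled from inside the application. The sketch below uses the standard `GarbageCollectorMXBean` API to report each collector's cumulative count and time; the class name `GcStatsLogger` and the line format are assumptions, and this complements rather than replaces proper GC logging.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one summary line per garbage collector known to this JVM.
final class GcStatsLogger {
    private GcStatsLogger() {}

    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "undefined" for that collector.
            lines.add(String.format("collector=%s count=%d timeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // In a real application this would be called periodically and written to the log.
        snapshot().forEach(System.out::println);
    }
}
```

Logging the difference between two snapshots taken a minute apart gives a quick view of how much time the JVM spent collecting in that interval.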

Many application developers choose G1GC without knowing the intricacies of garbage collection. G1GC is preferable for any JVM that sits in the path of a user's browser, where pause times matter, while ParallelGC suits applications that do more asynchronous processing, where throughput matters more.

Eric Abbott has written a beautiful piece on G1GC for all those who want to understand and fine-tune their G1GC.

c) 

Most load-balanced applications use a mount (NFS, SFTP, etc.) to share file data among their load-balanced instances. These mounts often get stuck or stop responding, which leaves the application threads in a hung state (as explained in the example at the start).

Below is an example of an NFS mount; its connectivity can be checked as follows:

# on server


# on client
mount node1:/mnt/media /mnt/media
#works fine here

# on server


# on client
#shows the list of nfs mounts
/my_enterprise host0026:/data/my_enterprise
Flags: rw,sync,relatime,vers=4.0,rsize=8192,wsize=8192,namlen=255,,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=20.20.20.7,local_lock=none,addr=20.20.20.26
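The application can also protect itself by probing the mount with a time limit before handing work to request threads. The sketch below runs a cheap file-system call on a separate daemon thread and gives up after a timeout. The class name `MountProbe`, the choice of `Files.isDirectory` as the probe and the timeout values are all assumptions. Note that a thread blocked on a hard mount usually cannot be interrupted, so the probe thread itself may stay stuck; the point is only that the caller gets an answer.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: answers "does this mount respond within timeoutMs?".
final class MountProbe {
    private MountProbe() {}

    static boolean isResponsive(Path mountPoint, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "mount-probe");
            t.setDaemon(true); // a hung probe must not block JVM shutdown
            return t;
        });
        // The stat call behind isDirectory is what blocks when the mount hangs.
        Future<Boolean> probe = executor.submit(() -> Files.isDirectory(mountPoint));
        try {
            return probe.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            probe.cancel(true);
            return false; // no answer in time: treat the mount as unresponsive
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Path taken from the example mount above; adjust to your own mount point.
        Path mount = Paths.get(args.length > 0 ? args[0] : "/mnt/media");
        System.out.println(mount + " responsive=" + isResponsive(mount, 2000));
    }
}
```

Creating an executor per call keeps the sketch short; a long-running service would more likely schedule the probe periodically and cache the last result.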

d) Thread dump

Thread dumps are a vital tool for understanding the state of the running threads within the JVM. They help to analyze thread contention and deadlock scenarios.

There are many ways to take a thread dump; the cleanest way, without opening a port on your Tomcat server, is below:

$ tail -f /<path to tomcat logs>/catalina.out >> ~/thread-dump-1 &
$ sudo -u tomcat-user kill -3 <pid>
$ kill %1
$ less ~/thread-dump-1
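If you can add code to the application, a similar dump can be produced in-process with the standard `ThreadMXBean` API, which can also report deadlocked threads directly. This is a minimal sketch; the class name `ThreadDumpHelper` is an assumption, and `ThreadInfo.toString()` truncates deep stack traces, so `kill -3` remains the more complete option.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: in-process thread dump and deadlock check.
final class ThreadDumpHelper {
    private ThreadDumpHelper() {}

    /** Returns a textual dump of all live threads (stack depth is limited by ThreadInfo.toString()). */
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),   // include held monitors if the JVM supports it
                threads.isSynchronizerUsageSupported());   // include held java.util.concurrent locks
        for (ThreadInfo info : infos) {
            out.append(info);
        }
        return out.toString();
    }

    /** Returns the number of threads currently involved in a deadlock, 0 if none. */
    static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
        System.out.println(dumpAllThreads());
    }
}
```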

e)  :

The Tomcat web application manager provides request connector information, which helps in debugging the exhaustion of Tomcat worker threads, memory leaks, etc.

For example, the connector information below, taken at a given point in time, tells us that this Tomcat has enough threads to serve requests. When the current thread busy count equals max threads, users will start receiving 503 Service Unavailable, as there are no Tomcat worker threads available to service the requests.


Max threads: 1000 Current thread count: 163 Current thread busy: 4 Keep alive sockets count: 24
Max processing time: 2290247 ms Processing time: 2051071.2 s Request count: 70904137 Error count: 533477 Bytes received: 102431.25 MB Bytes sent: 72495.65 MB

f) 

Regular expressions sometimes take too much time due to catastrophic backtracking. First monitor the execution time of the regular expression match in your application. If you find it exorbitant, use https://regex101.com to see how the regex match proceeds behind the scenes.

I would not generally recommend it, but sometimes canceling a long-running regular expression is the better choice. Check out the approaches mentioned on Stack Overflow for canceling the operation after a stipulated period.
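One commonly described way to cancel a match is to wrap the input in a `CharSequence` that checks a deadline every time the matcher reads a character. The sketch below follows that idea; the names `TimedRegex`, `DeadlineCharSequence` and `RegexTimeoutException` are hypothetical, and whether a given pattern actually backtracks badly depends on the pattern and the JDK version.

```java
import java.util.regex.Pattern;

// Hypothetical helper: aborts a regex match once a time budget is exceeded.
final class TimedRegex {
    private TimedRegex() {}

    /** Thrown when a match exceeds its time budget. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) { super(message); }
    }

    /** Delegating CharSequence that checks the deadline on every character access. */
    private static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Subtraction form is safe against nanoTime overflow.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /** Full-input match that throws RegexTimeoutException after roughly timeoutMs. */
    static boolean matches(Pattern pattern, CharSequence input, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }

    public static void main(String[] args) {
        Pattern risky = Pattern.compile("(a+)+b"); // nested quantifiers: a classic backtracking risk
        String input = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
        try {
            System.out.println("matched=" + matches(risky, input, 200));
        } catch (RegexTimeoutException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

The check runs on the calling thread, so no extra thread or interrupt handling is needed; the cost is a `System.nanoTime()` call per character read.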

g) 

Always cache what is frequently used and does not change much over time; do not cache everything. Log the cache statistics (number of entries, memory occupied, and so on) at regular intervals so that you know how much is in use; this helps in avoiding OutOfMemoryError.
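As an illustration of a bounded cache that exposes its own statistics, here is a minimal sketch built on `LinkedHashMap` in access order. The class name `StatsLruCache` and the stats format are assumptions; it reports entry counts and hit/miss totals but does not estimate memory, which needs a measurement approach suited to your objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: size-bounded LRU cache that tracks hits and misses.
final class StatsLruCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> entries;
    private long hits;
    private long misses;

    StatsLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true keeps the least recently used entry first, so it is the one evicted.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // refers to the constructor parameter
            }
        };
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    /** One line suitable for periodic logging. */
    synchronized String stats() {
        return String.format("entries=%d/%d hits=%d misses=%d",
                entries.size(), maxEntries, hits, misses);
    }

    public static void main(String[] args) {
        StatsLruCache<String, Integer> cache = new StatsLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // hit, "a" becomes most recently used
        cache.put("c", 3); // evicts "b", the least recently used entry
        cache.get("b");    // miss
        System.out.println(cache.stats()); // entries=2/2 hits=1 misses=1
    }
}
```

A low hit ratio in the logged stats is a hint that the cache is holding data that is not actually reused.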

File encoding:

Windows and Unix use different line endings, and often different default encodings, and these differences frequently cause problems. For example, Windows uses CRLF as the line terminator while Unix uses just LF.


Nothing beats remote debugging (if scale is not an issue). If you are allowed to remote debug the application deployed in the production environment, you can follow the steps mentioned below:

This article is based on years of experience that I have gathered working on Java-based enterprise applications. There may be many more steps and tools for debugging; the compilation above is a small checklist, not a comprehensive list, as every application differs in design: some hog memory, some I/O, some CPU.

I have excluded debugging exceptions that are logged into the log file, as they are easier to figure out (for the developer) based on the line number where the error occurred. I hope it helps.



What am I missing here? Let me know in the comments section and I'll add it in!
