Enterprise Architecture & Integration, SOA, ESB, Web Services & Cloud Integration

Enterprise Architecture & Integration, SOA, ESB, Web Services & Cloud Integration

Tuesday, 3 December 2013

WebLogic stuck threads and java.net.SocketInputStream.socketRead0

If you are a WebLogic developer or an administrator, you would have rarely missed a very common issue - Stuck threads. When a WebLogic thread is unable to finish a task within 10 minutes (which is default), the thread is marked as "STUCK". When there are too many stuck threads in a server instance, then the health of the server instance will go into "Warning" state. It will be tempting for one to increase the stuck thread time from 10 minutes to say 20 minutes to avoid stuck threads. But, it will not solve the issue, only help postponing the issue. Let me share my experience with stuck threads here.

When I was contacted for help by the developers/support folks in my company, I immediately looked at the health of the server. It was in "Warning" state. The reason for "Warning" state was "ThreadPool has stuck threads". This clearly indicated that there were stuck threads for some reasons.

The immediate step I remembered was to take the thread dump of the WebLogic server instance. If you are not aware, the thread dump can be taken in three ways (may be more, I know only 3 :-) ): a) Kill -3 b) jstack or c) WebLogic admin console. After collecting the thread dump, carefully look at the dump - you will be able to see all the threads and respective state. In my case, I found the following stuck thread.



"[STUCK] ExecuteThread: '14' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE native
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
oracle.net.ns.Packet.receive(Packet.java:293)
oracle.net.ns.DataPacket.receive(DataPacket.java:92)
oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:174)
oracle.net.ns.NetInputStream.read(NetInputStream.java:119)
oracle.net.ns.NetInputStream.read(NetInputStream.java:94)
oracle.net.ns.NetInputStream.read(NetInputStream.java:79)
oracle.jdbc.driver.T4CSocketInputStreamWrapper.readNextPacket(T4CSocketInputStreamWrapper.java:122)
oracle.jdbc.driver.T4CSocketInputStreamWrapper.read(T4CSocketInputStreamWrapper.java:78)
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1040)
oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1016)
oracle.jdbc.driver.T4C8TTILob.receiveReply(T4C8TTILob.java:847)
oracle.jdbc.driver.T4C8TTIClob.read(T4C8TTIClob.java:227)
oracle.jdbc.driver.T4CConnection.getChars(T4CConnection.java:2652)
oracle.sql.CLOB.getChars(CLOB.java:288)
oracle.jdbc.driver.OracleClobReader.needChars(OracleClobReader.java:178)
oracle.jdbc.driver.OracleClobReader.read(OracleClobReader.java:141)
oracle.xml.parser.v2.XMLCharReader.fillBuffer(XMLCharReader.java:183)
oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:450)
oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2363)
oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:1087)
oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2922)
oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:269)
oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:312)
oracle.xdb.XMLType.getDOM(XMLType.java:1806)
com.xxxxx.yyy.zzz.getAAAA(BBBBB.java:814)
com.xxxxx.businessdeligate.yyyyyy.getAAAA(BBBBBB.java:337)
sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
com.xxxxx.service.DDDDDD.processRequest(CCCCCC.java:33)
sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:124)
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:82)
org.apache.cxf.jaxws.JAXWSMethodInvoker.invoke(JAXWSMethodInvoker.java:100)
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:68)
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:56)
org.apache.cxf.workqueue.SynchronousExecutor.execute(SynchronousExecutor.java:37)
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:92)
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:207)
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:73)
org.apache.cxf.transport.servlet.ServletDestination.doMessage(ServletDestination.java:79)
org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:256t)
org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160)
org.apache.cxf.transport.servlet.AbstractCXFServlet.invoke(AbstractCXFServlet.java:170)
org.apache.cxf.transport.servlet.AbstractCXFServlet.doPost(AbstractCXFServlet.java:148)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227)
weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:300)
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:183)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.doIt(WebAppServletContext.java:3686)
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3650)
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:121)
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2268)
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2174)
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1446)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
weblogic.work.ExecuteThread.run(ExecuteThread.java:173)


The stuck thread could occur because of several reasons including the following: -
  1. Remote server that you are accessing could be taking more time than expected
  2. Network delay
In an ideal situation, you should talk to your remote service provider to reduce the response time (case 1) or network owner to check the latency issue (case 2). They should fix the issues at their end which will help you in turn to eliminate the stuck threads. For some reasons, the resolution might be time consuming and you may require to put an immediate stop to stuck threads as it would lead to thread exhaustion and higher system resource utilization. It is an unwanted situation. As a workaround, you can try with configuring time outs.

Query time out
In several cases, I have seen "Statement.execute" taking more time or sometime hanging infinitely because the database was not responding properly. You would want to time out these execute calls to save yourself from stuck thread. If you use WebLogic connection pool, you can configure statement time out parameter to a value which is acceptable to your application, say 15 seconds. The another way of doing this is to configure "Statement.setQueryTimeout" in the java code which will provide you fine control over timeout.

Read time out
In my case, after configuring query time out, the number of stuck threads has gone down drastically. But, not fully solved and still few were left. Further investigation shown me that the thread was hanging because of socket read operation. After searching entire Web, I found a useful WebLogic JVM parameter  - oracle.jdbc.ReadTimeout. The socket will wait till x seconds (as configured in the parameter) and then times out.

After configuring these two parameters, now my WebLogic server is free from any stuck threads. However, as I mentioned above, this should be used as a workaround and you should try to fix the real issue - network latency or the response time of the remote operation.

Thank for reading my post and please leave your comments or any further questions here.

3 comments: