Enterprise Architecture & Integration, SOA, ESB, Web Services & Cloud Integration

Friday 31 January 2014

HTTP 500 error and Chunked encoding in OSB/WebLogic

Until recently, my production WebLogic Server showed a strange issue at peak times. I have a number of web services deployed with CXF on WebLogic Server. The message flow is as follows:

External load balancer --> OSB --> External load balancer --> (CXF in) WebLogic


While 95 to 98% of the requests sent to WebLogic were processed successfully, the remaining requests were timed out by WebLogic Server after 30 seconds. In the WebLogic access log, every timed-out request was recorded with HTTP status code 500. It took a while to understand, isolate, and resolve the issue.

I analyzed a thread dump and saw several hogging threads with a stack trace like the one below:


"[ACTIVE] ExecuteThread: '43' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE native
         
        java.net.SocketInputStream.socketRead0(Native Method)
        java.net.SocketInputStream.read(SocketInputStream.java:129)
        weblogic.servlet.internal.PostInputStream.read(PostInputStream.java:142)
        weblogic.utils.http.HttpChunkInputStream.readChunkSize(HttpChunkInputStream.java:115)
        weblogic.utils.http.HttpChunkInputStream.initChunk(HttpChunkInputStream.java:74)
        weblogic.utils.http.HttpChunkInputStream.skip(HttpChunkInputStream.java:203)
        weblogic.utils.http.HttpChunkInputStream.skipAllChunk(HttpChunkInputStream.java:378)
        weblogic.servlet.internal.ServletInputStreamImpl.ensureChunkedConsumed(ServletInputStreamImpl.java:35)
        weblogic.servlet.internal.ServletRequestImpl.skipUnreadBody(ServletRequestImpl.java:194)
        weblogic.servlet.internal.ServletRequestImpl.reset(ServletRequestImpl.java:152)
        weblogic.servlet.internal.MuxableSocketHTTP.requeue(MuxableSocketHTTP.java:195)
        weblogic.servlet.internal.VirtualConnection.requeue(VirtualConnection.java:329)
        weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1538)
        weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1455)
        weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
        weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
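
When a dump contains hundreds of threads, it helps to count how many are parked in the same frame. The following is a minimal sketch, not a WebLogic tool: the class name ChunkReadThreadFinder and the method findThreads are hypothetical, and it assumes that each thread entry begins with a line starting with a double quote (as in the trace above), followed by its stack frames. Blank lines between frames are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkReadThreadFinder {

    /**
     * Returns the header line of every thread whose stack contains the marker.
     * Assumption: a thread entry starts with a line beginning with a double
     * quote; all following lines up to the next such line are its frames.
     */
    public static List<String> findThreads(String dump, String marker) {
        List<String> matches = new ArrayList<>();
        String currentHeader = null;
        boolean alreadyAdded = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // A new thread entry begins here.
                currentHeader = line;
                alreadyAdded = false;
            } else if (currentHeader != null && !alreadyAdded && line.contains(marker)) {
                // Record each matching thread only once.
                matches.add(currentHeader);
                alreadyAdded = true;
            }
        }
        return matches;
    }
}
```

Calling findThreads(dumpText, "HttpChunkInputStream.readChunkSize") on the text of a dump would list the execute threads waiting for a chunk size line, which is the frame of interest in the trace above.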

After further investigation, I found that WebLogic Server was not receiving the complete message within the default Post Timeout (30 seconds) configured on the server. Increasing the timeout to 45 or 60 seconds did not help; it only aggravated the performance issue.
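
To see why a reader can get stuck on a chunked body, here is a minimal sketch of how chunked transfer encoding is consumed. It is not WebLogic's implementation; the class ChunkedBodyReader and its methods are hypothetical names used for illustration. The point is that the body ends only when a zero-size chunk arrives, so the reader has to keep reading until it sees one. On an in-memory stream a missing terminator shows up as an EOFException; on a live socket the same read would simply block until data arrives or a timeout fires.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedBodyReader {

    /** Reads a chunked body up to and including the terminating zero-size chunk. */
    public static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Each chunk starts with its size in hex, optionally followed by ";extensions".
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                // Last chunk: skip optional trailer lines until the empty line.
                while (!readLine(in).isEmpty()) {
                    // trailer header ignored in this sketch
                }
                return body.toByteArray();
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n < 0) {
                    throw new EOFException("stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // the CRLF that follows the chunk data
        }
    }

    /** Reads bytes up to LF and returns the line without its CR/LF terminator. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                String s = new String(line.toByteArray(), StandardCharsets.US_ASCII);
                return s.endsWith("\r") ? s.substring(0, s.length() - 1) : s;
            }
            line.write(b);
        }
        throw new EOFException("stream ended before end of line");
    }
}
```

The frames readChunkSize and skipAllChunk in the thread dump suggest a loop of this general shape: the server keeps asking for the next chunk size until the terminating chunk shows up.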

Let's look at how the issue was resolved. When you define a business service in Oracle Service Bus (OSB), you will have seen a parameter named "Use Chunked Streaming Mode" in the "HTTP Transport Configuration" section. Its default value is "Enabled". Apparently, the load balancer was not able to handle requests with chunked encoding at peak times, so it could not deliver the complete request message to WebLogic Server within the configured Post Timeout. The thread dump above is consistent with this: the execute thread is blocked in a socket read inside HttpChunkInputStream, waiting for the next chunk. After I changed the value from "Enabled" to "Disabled", the 500 errors disappeared and the server has been healthy since.
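
For readers unfamiliar with what this setting changes on the wire, the sketch below frames the same body in both ways. It is an illustration under simplified assumptions, not OSB code: the class BodyFraming and its methods are hypothetical, and only the framing-related header is emitted. With a Content-Length header the receiver knows up front how many bytes to expect; with chunked encoding it must wait for the final zero-size chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BodyFraming {

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }

    /** Frames the body with a Content-Length header (chunked streaming disabled). */
    public static byte[] withContentLength(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, "Content-Length: " + body.length + "\r\n\r\n");
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    /** Frames the body as a series of chunks of at most chunkSize bytes. */
    public static byte[] chunked(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        put(out, "Transfer-Encoding: chunked\r\n\r\n");
        for (int off = 0; off < body.length; off += chunkSize) {
            int n = Math.min(chunkSize, body.length - off);
            put(out, Integer.toHexString(n) + "\r\n"); // chunk size in hex
            out.write(body, off, n);                   // chunk data
            put(out, "\r\n");
        }
        put(out, "0\r\n\r\n"); // terminating zero-size chunk, no trailers
        return out.toByteArray();
    }
}
```

An intermediary that forwards the chunked form slowly, or holds back the final chunk, leaves the receiver waiting, whereas the Content-Length form lets the receiver stop as soon as the announced number of bytes has arrived.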

Thanks for reading this post; I hope you liked the tip. Please feel free to post your comments here.