Saturday 24 April 2010

Pure Genius and mod_jk

I'm currently developing an application for MegaBank which makes heavy use of AJAX. The latest cut was throwing JavaScript errors in UAT but not in development. A bit of investigation with Firebug showed that when an AJAX request returned a 404 (which it does if the user supplies a non-existent cost code, for example), the next AJAX request would be answered with the SSO login page instead of the expected JSON response. Further investigation showed that the second AJAX request had been served by a different node from the one it was addressed to, resulting in the SSO challenge.

Request had a cookie:
jsessionid=ASAD123DFDFS83242SDFASD9234234.node1

Response had a set-cookie header:
jsessionid=SDFSDF234DFSLFSD324234880SDFSDL.node2

i.e. our sticky session was becoming unstuck.
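The check we were doing by eye in Firebug can be written down as a small sketch. It assumes the node name is whatever follows the last dot in the jsessionid value, as in the two cookies above; the function names are hypothetical and this is not code from the application itself.

```python
from typing import Optional


def route_of(session_id: str) -> Optional[str]:
    """Return the node suffix after the last '.', or None if there is none."""
    _, sep, route = session_id.rpartition(".")
    return route if sep and route else None


def session_failed_over(request_id: str, response_id: str) -> bool:
    """True when both ids carry a node suffix and the suffixes differ."""
    sent = route_of(request_id)
    got = route_of(response_id)
    return sent is not None and got is not None and sent != got


# The two cookie values quoted above.
request_cookie = "ASAD123DFDFS83242SDFASD9234234.node1"
response_cookie = "SDFSDF234DFSLFSD324234880SDFSDL.node2"
print(session_failed_over(request_cookie, response_cookie))  # prints True
```

A result of True only says the response came from a different node than the one named in the request cookie; it says nothing about why.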

We reported the incident to our support team, who spanked it straight back over the net with the wonderfully helpful "It's an application config issue". Great. Thanks. Our app doesn't do anything clever with cookies. There is no logout button. We never invalidate the session. We'd already checked that the inbound request had the right jsessionid, and also that the associated response had a set-cookie header from the wrong node. Furthermore, the app server logs showed that the second request didn't even hit the right node. Back to web support with an offer to demonstrate the problem and walk through why we believed it was an issue with the load balancer. Even if it turned out to be an issue with the app, we would need their help to diagnose it.

Offer accepted. And "Pure Genius" arrives (that's not his real name, but it is what was written on his cuff links). The first words out of his mouth were, "We support 100s of applications and the load balancer works fine. It must be your application". It doesn't matter that this is a bank, where 99% of the applications he supports are legacy and have probably never heard of AJAX. So I walk him through the process, show him the HTTP headers, show him the logs, explain that this only seems to happen after a 404. He goes away and helpfully reports...

"It's an application configuration issue."

Grrrrrrrr. Escalate. Now engineering are involved. Helpfully, "Pure Genius" has already told them it's a problem with our app, so they've prioritised it to the bottom of their queue. Grrrrrrrr. Another round of the escalation game follows and we get someone from engineering who knows what they're doing and isn't wearing blinkers. Guess what? There's a bug in mod_jk which causes it to fail a session over to another node if it gets a non-200 status code with no content, so we added the words "Pure Genius" to the response and we have sticky sessions again. I guess he helped after all.
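For anyone wanting to try the same workaround, here is a minimal sketch of the idea: never send a non-200 response with an empty body. It is written as a tiny Python WSGI app purely for illustration, which is an assumption and not the stack the real application runs on; `ensure_body` and `app` are hypothetical names, and the lookup that always fails is a stand-in for the cost code check.

```python
def ensure_body(status: str, body: bytes) -> bytes:
    """Give a non-200 response a short placeholder body if it has none."""
    if not status.startswith("200") and not body:
        return b"Pure Genius"
    return body


def app(environ, start_response):
    # Stand-in for the real lookup: every cost code is treated as unknown,
    # so this sketch always answers 404.
    status = "404 Not Found"
    body = ensure_body(status, b"")
    start_response(status, [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The text of the body is irrelevant; as far as we could tell, all that mattered was that the response had some content.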