Then why was it *only* happening on the thread I had tried to reply to? I could not read that thread or reply to it (it apparently accepted the reply but then went no further), while every other part of the forum worked.
Because the database record was locked, simple... read up on how database row locking works.
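To illustrate why only one thread was affected, here is a toy model in Python. It is not the forum's actual database code: the `row_locks` table, the thread IDs 101 and 102, and the `read_thread` helper are all made up for the example, and an in-memory lock stands in for a real database row lock.

```python
import threading

# Hypothetical stand-in for a table: one lock per row, keyed by forum thread ID.
row_locks = {101: threading.Lock(), 102: threading.Lock()}

def read_thread(thread_id, timeout=0.1):
    """Try to read one row; give up if its lock is not free within `timeout` seconds."""
    lock = row_locks[thread_id]
    if not lock.acquire(timeout=timeout):
        return "timed out"
    try:
        return "ok"
    finally:
        lock.release()

# A slow maintenance task is holding the lock on row 101 only.
row_locks[101].acquire()
result_locked = read_thread(101)  # waits on the locked row, then gives up
result_other = read_thread(102)   # a different row, so it is unaffected

# Once maintenance finishes and releases the lock, the row is readable again.
row_locks[101].release()
result_after = read_thread(101)

print(result_locked, result_other, result_after)  # timed out ok ok
```

Requests touching the locked row stall while everything else carries on, which matches one thread being unreadable while the rest of the forum worked.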
I really don't know why people are challenging the answer to the issue. I have explained exactly what the cause is, explained why the forum behaved how it did, stated that this is all logged and clearly visible on the server, and stated exactly what triggered the fault. I have ample evidence to explain everything that has been reported here, yet it is still being challenged... doesn't make much sense to me.
Simple fact of the matter is, a 502 error comes from the server, not your browser, and they are never (I repeat, NEVER) served with headers that tell your browser to cache the error. The logs clearly show a timeout communicating with the PHP instance the HTTP server had established a connection to, the PHP logs report a loss of connection to the HTTP server when PHP attempted to write its reply, and the database showed queries pending completion due to lock contention. The locks reported by the database were caused by the database maintenance tasks not completing in a timely manner (and as such retaining row locks) due to high load on the server....
The cause of the high load is yet to be identified as it is sporadic and requires further analysis.
In the last instance logged, all the PHP processes were in use, waiting on the database (confirmed without a doubt), and the HTTP server had disconnected from them, wrongly assuming they had failed/crashed/hung. They were gradually consumed until there were none left. This is confirmed by 20-30 access log entries with a 499 result code.
*IP-CENSORED* - - [16/Mar/2017:10:00:47 +1100] "GET /forum/stats/?expand=201605 HTTP/1.1" 499 0 "-" "Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.co
*IP-CENSORED* - - [16/Mar/2017:10:00:48 +1100] "GET /forum/stats/?expand=201309 HTTP/1.1" 499 0 "-" "Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.co
*IP-CENSORED* - - [16/Mar/2017:10:00:48 +1100] "GET /forum/microcontrollers/segger-j-link-edu-or-j-link-clone-which-one-would-you-get/25/ HTTP/1.1" 499 0 "https://www.google.hu/" "Mozilla/5.0 (Windows NT 10.0; WOW64
*IP-CENSORED* - - [16/Mar/2017:10:00:48 +1100] "GET /forum/crowd-funded-projects/asap-connect-the-future-of-usb-cables/?action=dlattach;attach=233543;image;PHPSESSID=*CENSORED* HTTP/1.1" 499 0
*IP-CENSORED* - - [16/Mar/2017:10:00:49 +1100] "GET /forum/metrology/lm399-based-10-v-reference/?action=dlattach;attach=55645;image HTTP/1.1" 499 0 "https://www.eevblog.com/forum/metrology/lm399-based-10-v-referenc
*IP-CENSORED* - - [16/Mar/2017:10:00:49 +1100] "GET /forum/testgear/tektronix-2465b-oscilloscope-teardown/?action=dlattach;attach=299653;image HTTP/1.1" 499 0 "https://www.eevblog.com/forum/testgear/tektronix-2465b-
*IP-CENSORED* - - [16/Mar/2017:10:00:49 +1100] "GET /forum/stats/?expand=201107%3BPHPSESSID%3D*CENSORED* HTTP/1.1" 499 0 "-" "Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.com
HTTP 499 in Nginx means that the client closed the connection before the server answered the request.
This is caused by the client (user or bot, whatever) giving up waiting for the page to load, which occurs when there is something preventing PHP returning in a timely manner. This is further proved by the following errors in the log at the same time.
2017/03/16 10:00:50 [error] 23246#0: *701749465 connect() to unix:/usr/local/php-fpm/run/eevblog.sock failed (11: Resource temporarily unavailable) while connecting to upstream...
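For anyone wanting to check their own access log for the same pattern, a rough sketch in Python follows. It assumes the log uses the same layout as the entries quoted above (request in quotes, then status code, then byte count). The `count_statuses` helper and the regular expression are mine, not part of Nginx, and the sample lines are shortened copies of the entries above.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it, e.g.
#   "GET /forum/stats/?expand=201605 HTTP/1.1" 499 0
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

def count_statuses(lines):
    """Count how many log lines carry each HTTP status code."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines in any other format are skipped
            counts[match.group("status")] += 1
    return counts

# Shortened copies of access log entries quoted in this post.
sample = [
    '*IP-CENSORED* - - [16/Mar/2017:10:00:47 +1100] "GET /forum/stats/?expand=201605 HTTP/1.1" 499 0 "-"',
    '*IP-CENSORED* - - [16/Mar/2017:10:00:48 +1100] "GET /forum/stats/?expand=201309 HTTP/1.1" 499 0 "-"',
    '*IP-CENSORED* - - [16/Mar/2017:10:00:49 +1100] "GET /forum/metrology/lm399-based-10-v-reference/?action=dlattach;attach=55645;image HTTP/1.1" 499 0',
]

counts = count_statuses(sample)
print(counts["499"])  # 3
```

A burst of 499s within a second or two, alongside upstream connect errors in the error log, is the signature described here.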
Upon inspecting the process list at the time of the last incident, there were well over 100 queries pending due to lock contention. This is clearly not a client-side cache/refresh issue; it is a server load issue. Forcing a full refresh (Ctrl+F5 or clearing the cache) has no effect other than putting more load on the server. The HTML for the page you are trying to view is dynamically generated by PHP and is never cached anyway, so a forced refresh just makes your browser re-download every image, script and CSS file even when it has a local cached copy of that static content, turning what would have been one request into tens of requests. Thankfully, since this website uses CloudFlare, most (not all) of this additional pointless load for static content is handled by the CF CDN.
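The whole failure chain (workers stuck on a locked row, the pool draining, new connections being refused) can be sketched as a small simulation. This is only a model under made-up assumptions: `POOL_SIZE` and `REQUESTS` are hypothetical numbers, not the server's real configuration, and a semaphore and a lock stand in for the PHP worker pool and the database row lock.

```python
import threading

POOL_SIZE = 4  # hypothetical; the real PHP pool size is not stated in this post
REQUESTS = 6   # hypothetical number of requests arriving during the incident

workers = threading.BoundedSemaphore(POOL_SIZE)  # free PHP workers
row_lock = threading.Lock()                      # the contended database row
results = {}
results_guard = threading.Lock()
all_attempted = threading.Event()
attempts = 0

def handle_request(request_id):
    global attempts
    # A request needs a free worker; with none left it is refused at once,
    # analogous to connect() failing with "Resource temporarily unavailable".
    got_worker = workers.acquire(blocking=False)
    with results_guard:
        attempts += 1
        if not got_worker:
            results[request_id] = "refused"
        if attempts == REQUESTS:
            all_attempted.set()
    if not got_worker:
        return
    try:
        with row_lock:  # blocks for as long as maintenance holds the row
            with results_guard:
                results[request_id] = "served"
    finally:
        workers.release()

row_lock.acquire()  # the maintenance task holds the row lock
threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(REQUESTS)]
for t in threads:
    t.start()

all_attempted.wait()  # every request has now tried to get a worker
refused_while_locked = sum(1 for r in results.values() if r == "refused")

row_lock.release()  # maintenance finishes; the blocked workers drain
for t in threads:
    t.join()
served = sum(1 for r in results.values() if r == "served")

print(refused_while_locked, served)  # 2 4
```

Nothing the client does changes the outcome in this model; the refused requests fail because every worker is parked on the lock, and extra requests only add to the queue.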