First of all, I already searched Stack Overflow for the 502 error. There are a lot of threads, but the difference this time is that the error appears without a pattern and it’s not on Ubuntu.
Everything works perfectly, but about once a week my site shows: 502 Bad Gateway.
After this first error, every connection starts showing this message. Restarting MySQL + PHP-FPM + Nginx + Varnish doesn’t work.
I have to clone this instance and launch another one to get my site up again (it is hosted on Amazon EC2).
The Nginx log shows this line again and again:
[error] 16773#0: *7034 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 127.0.0.1
There is nothing in the MySQL or Varnish logs, but the PHP-FPM log shows lines like these:
WARNING: [pool www] child 18978, script '/var/www/mysite.com/index.php' (request: "GET /index.php") executing too slow (10.303579 sec), logging
WARNING: [pool www] child 18978, script '/var/www/mysite.com/index.php' (request: "GET /index.php") execution timed out (16.971086 sec), terminating
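To check whether these timeouts really occur without a pattern, one option is to extract the child PID, script, request, event and duration from each warning and tabulate them. The following is a minimal Python sketch. Its regular expression is modelled only on the two sample lines above, so it is an assumption that every relevant warning has exactly this shape. Real log lines may also start with a timestamp, which `search()` skips over.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern modelled on the two PHP-FPM warnings quoted above (an assumption:
# other PHP-FPM versions may word these messages differently).
WARNING_RE = re.compile(
    r"WARNING: \[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+), "
    r"script '(?P<script>[^']+)' \(request: \"(?P<request>[^\"]+)\"\) "
    r"(?P<event>executing too slow|execution timed out) "
    r"\((?P<seconds>[\d.]+) sec\)"
)


@dataclass
class FpmWarning:
    pool: str
    pid: int
    script: str
    request: str
    event: str
    seconds: float


def parse_warning(line: str) -> Optional[FpmWarning]:
    """Return the parsed warning, or None if the line is something else."""
    m = WARNING_RE.search(line)  # search() skips any leading timestamp
    if m is None:
        return None
    return FpmWarning(m["pool"], int(m["pid"]), m["script"],
                      m["request"], m["event"], float(m["seconds"]))


if __name__ == "__main__":
    line = ("WARNING: [pool www] child 18978, script '/var/www/mysite.com/index.php' "
            "(request: \"GET /index.php\") execution timed out (16.971086 sec), terminating")
    print(parse_warning(line))
```

Run every line of the PHP-FPM error log through `parse_warning` and group the results by time of day or by script. That shows whether the "execution timed out" events cluster anywhere.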
The PHP-FPM slow log showed:
[pool www] pid 20401
script_filename = /var/www/mysite.com/index.php
w3_require_once() /var/www/mysite.com/wp-content/plugins/w3-total-cache/inc/define.php:1478
(Line 1478 of “define.php” is: require_once $path;)
I thought the problem was the W3 Total Cache plugin, so I removed it.
About 5 days later it happened again, with this entry in the PHP-FPM slow log:
script_filename = /var/www/mysite.com/index.php
wpcf7_load_modules() /var/www/mysite.com/wp-content/plugins/contact-form-7/includes/functions.php:283
(Line 283 of “functions.php” is: include_once $file;)
Another time, the first error occurred in a different part of the code:
script_filename = /var/www/mysite.com/wp-cron.php
curl_exec() /var/www/mysite.com/wp-includes/class-http.php:1510
And again in a different part of the code:
[pool www] pid 20509
script_filename = /var/www/mysite.com/index.php
mysql_query() /var/www/mysite.com/wp-includes/wp-db.php:1655
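The first frame differs from incident to incident: a require, an include, `curl_exec()`, `mysql_query()`. Tallying that frame over several weeks of slow-log entries may show whether anything recurs. The following is a minimal Python sketch that assumes the layout of the excerpts above: a `script_filename = ...` line followed by stack frames. It also assumes the first frame listed is the one that was executing. The optional `[0x...]` address prefix in the pattern is an allowance for how frames often appear in full slow logs, not something shown in the excerpts.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# A stack frame such as "curl_exec() /path/class-http.php:1510", optionally
# prefixed by an address like "[0x00007f...]" (assumed slow-log layout).
FRAME_RE = re.compile(r"^(?:\[0x[0-9a-fA-F]+\]\s+)?([^\s(]+)\(\)\s+(\S+):(\d+)$")


def top_frames(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (script_filename, first listed function) for each slow-log entry."""
    results: List[Tuple[str, str]] = []
    script: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("script_filename = "):
            script = line.split("=", 1)[1].strip()
            continue
        m = FRAME_RE.match(line)
        if m and script is not None:
            results.append((script, m.group(1)))
            script = None  # ignore the deeper frames of this entry
    return results


def tally(lines: Iterable[str]) -> Counter:
    """Count how often each function appears as the first frame."""
    return Counter(func for _, func in top_frames(lines))
```

Pass an open slow-log file to `tally`. The log's location depends on the `slowlog` setting of the pool. The result is a per-function count across all recorded incidents.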
CPU, RAM… everything is stable when this error occurs (less than 20% usage).
I tried everything, but nothing worked:
- Moved to a better server (CPU and RAM)
- Decreased the timeouts in Nginx, PHP-FPM and MySQL (my pages load quickly, so I decreased the timeouts to kill any outlier processes)
- Changed the number of PHP-FPM spare servers
- Changed a lot of Nginx and PHP-FPM configuration settings
- I know there is a bug with PHP-FPM on Ubuntu that could cause this error, but I don’t think the same bug affects Amazon instances (Red Hat). (And I don’t want to switch PHP-FPM from a TCP port to a Unix socket, because I’ve read that sockets don’t work well under heavy load.)
This has been happening about once a week for the past 5 months. I’m desperate.
I even got to the point of adding Nginx and PHP-FPM to the crontab to restart these services every day, but that didn’t work either.
Does anyone have a suggestion on how I can solve this problem? Anything will help!
Server:
Amazon c3.large (2 cores and 3.75GB RAM)
Linux Amazon Red Hat 4.8.2 64-bit
PHP-FPM:
listen = 127.0.0.1:9000
listen.allowed_clients = 127.0.0.1
listen.mode = 0664
pm = ondemand
pm.max_children = 480
pm.start_servers = 140
pm.min_spare_servers = 140
pm.max_spare_servers = 250
pm.max_requests = 50
request_terminate_timeout = 15s
request_slowlog_timeout = 10s
php_admin_flag[log_errors] = on
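It may be worth checking which of these settings are actually in effect. The comments in PHP-FPM's sample pool configuration say that pm.start_servers, pm.min_spare_servers and pm.max_spare_servers are used only when pm = dynamic. With pm = ondemand, as here, the pool is governed mainly by pm.max_children and pm.process_idle_timeout. The following Python sketch parses simple `key = value` pool lines and reports the spare-server settings the current pm mode ignores. It is a simplified reader with no include or section handling, not PHP-FPM's own parser.

```python
from typing import Dict, List

# Settings documented in PHP-FPM's sample pool config as used only
# when pm = dynamic.
DYNAMIC_ONLY = ("pm.start_servers", "pm.min_spare_servers", "pm.max_spare_servers")


def parse_pool(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines from a PHP-FPM pool section."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, ';' comments and '[pool]' section headers.
        if not line or line.startswith((";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def unused_settings(settings: Dict[str, str]) -> List[str]:
    """List spare-server settings that the configured pm mode ignores."""
    if settings.get("pm") == "dynamic":
        return []
    return [key for key in DYNAMIC_ONLY if key in settings]
```

Fed the pool lines above, `unused_settings` would report all three spare-server settings.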
Nginx:
worker_processes 2;
events {
worker_connections 2048;
multi_accept on;
use epoll;
}
http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
access_log off;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
types_hash_max_size 2048;
server_tokens off;
client_max_body_size 8m;
reset_timedout_connection on;
index index.php index.html index.htm;
keepalive_timeout 1;
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
fastcgi_send_timeout 30s;
fastcgi_read_timeout 30s;
listen 127.0.0.1:8080;
location ~ \.php$ {
try_files $uri =404;
include fastcgi_params;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_keep_conn on;
fastcgi_pass 127.0.0.1:9000;
fastcgi_param HTTP_HOST $host;
}
}
I would start by tuning some configuration parameters.
PHP-FPM
I think your pm values are somewhat off, a bit higher than I’ve normally seen configured on servers with your specs… but you say memory consumption is normal, so that’s kind of weird.
Anyway… for pm.max_children = 480, considering that by default WordPress raises the memory limit to 40MB, you could end up using around 18.75 GB of memory (480 × 40MB = 19,200MB), so you definitely want to lower that. Check the fourth part of this post for more info: http://www.if-not-true-then-false.com/2011/nginx-and-php-fpm-configuration-and-optimizing-tips-and-tricks/
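The sizing rule behind these numbers is max_children ≈ (RAM left for PHP-FPM) / (memory per PHP process). The following Python sketch uses this answer's own estimates: 40MB per process, 3.75GB (3840MB) of RAM, and 512MB reserved for Nginx, MySQL, Varnish and the rest. Those are assumptions. Measuring the actual resident size of your PHP-FPM children would give a better per-process figure.

```python
def worst_case_mb(max_children: int, per_process_mb: int) -> int:
    """Memory used if every PHP-FPM child reaches the per-process limit."""
    return max_children * per_process_mb


def suggested_max_children(total_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """How many children fit in the RAM left after the other services."""
    return (total_mb - reserved_mb) // per_process_mb


if __name__ == "__main__":
    print(worst_case_mb(480, 40))                 # 19200 (MB), about 18.75 GB
    print(suggested_max_children(3840, 512, 40))  # 83
```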
If you’re using… let’s say 512MB for Nginx, MySQL, Varnish and other services, you would have about 3328MB left for PHP-FPM (3840 − 512). Divided by 40MB per process, pm.max_children should be about 80… but even 80 is very high.
It’s probable that you can also lower the values of pm.start_servers, pm.min_spare_servers and pm.max_spare_servers. I prefer to keep them low and only increase them if it’s necessary.
For pm.max_requests, a value around 500 avoids respawning workers too often (PHP-FPM’s own default is 0, which means children are never respawned). I think it’s only advisable to lower it if you suspect memory leaks.
Nginx
Change keepalive_timeout to 60 to make better use of keep-alive. Other than that, I think everything looks normal.
I had this issue on Ubuntu, but request_terminate_timeout on PHP-FPM and fastcgi_send_timeout + fastcgi_read_timeout on Nginx were enough to get rid of it.
I hope you can fix it!
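The relationship between these timeouts can be checked mechanically. In the question's config, request_terminate_timeout (15s) is shorter than fastcgi_read_timeout (30s). PHP-FPM therefore ends a slow request before Nginx stops waiting for it, and Nginx sees the upstream connection drop. One plausible reading is that this matches the "Connection reset by peer" lines in the question. The following is a minimal Python sketch. Its duration parser only handles bare numbers and the s/m/h suffixes, which is a deliberate simplification.

```python
import re

# Simplified duration parser: a bare number (seconds) or an s/m/h suffix,
# optionally followed by nginx's trailing ';'. Other units are not handled.
_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}


def to_seconds(value: str) -> int:
    m = re.fullmatch(r"\s*(\d+)\s*([smh]?)\s*;?\s*", value)
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def fpm_kills_first(request_terminate_timeout: str, fastcgi_read_timeout: str) -> bool:
    """True if PHP-FPM terminates a slow request before Nginx stops waiting."""
    fpm = to_seconds(request_terminate_timeout)
    if fpm == 0:  # 0 turns request_terminate_timeout off in PHP-FPM
        return False
    return fpm < to_seconds(fastcgi_read_timeout)
```

With the question's values, `fpm_kills_first("15s", "30s")` returns `True`.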