Discussion:
[modwsgi] Re: apache defunct processes
Cristiano Coelho
2016-12-31 01:11:11 UTC
Permalink
Sorry for bringing up such an ancient post, but this is the closest thing
to my issue that I have found.

With Apache 2.4, mod_wsgi 3.5 and Python 2.7, I am having a similar
issue, specifically on Apache restarts.

Not all the time, but sometimes, the WSGI processes stay alive after
an Apache restart and need to be manually killed with sudo kill <pid>. The
worst part is that the process keeps running. I know this because
the process, which serves a Django app, starts some background threads
with the app that perform tasks periodically, and when this issue
happens those tasks stack up: duplicated logs appear when
only 1 server and 1 process are supposed to be running.
The Apache process is restarted through AWS Elastic Beanstalk, which
is a managed service, but the logs show that a SIGTERM is attempted and,
after 3 failures, a SIGKILL is sent, yet the process stays alive and keeps
doing its tasks.

Note that all background tasks are either daemon threads or ThreadPool
instances from the multiprocessing library.
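Roughly, the pattern in use looks like this (a minimal sketch, not the actual application code; run_periodic_task is a placeholder name):

import threading
from multiprocessing.pool import ThreadPool

def run_periodic_task():
    # placeholder for the periodic work (email sending, queue polling, etc.)
    pass

worker = threading.Thread(target=run_periodic_task)
worker.daemon = True  # daemon threads should not block interpreter shutdown
worker.start()

pool = ThreadPool(processes=4)  # ThreadPool from the multiprocessing library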
Graham,
Haven't found any evidence of apache crashing, the whole setup has
been running very successfully for the last two years. I usually use
force-reload when changes are made to virtual hosts.
The memory has definitely been increasing due to the orphaned
processes, especially when I get 2 or 3 processes per application
orphaned; however, this takes a few weeks to occur, and using the
mod_wsgi inactivity timeout helps, as these processes appear to drop
down to minimal memory consumption.
I have upgraded to v3 of mod_wsgi so will monitor for a few weeks and
report back if I can't resolve. Thanks again for your assistance.
Paddy
Graham,
Thank you for such a detailed response. As a first step I will update
mod_wsgi to a more recent version!
But can you confirm you are using daemon mode and what the
WSGIDaemonProcess configuration is?
WSGIDaemonProcess designcliq user=django group=django threads=25 \
    display-name=%{GROUP} inactivity-timeout=3600
WSGIProcessGroup designcliq
I usually have to kill them individually to get rid of them and free up the memory.
Technically you can't kill defunct processes, they are actually
already dead, so not sure what you are doing.
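(For what it's worth, truly defunct entries show up with state 'Z' in ps; a sketch of how to list them:

ps -eo pid,ppid,stat,args | awk '$3 ~ /Z/')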
Late night reboot.
Here is a more detailed example of what I am trying to get my head
around.
The following command shows some Django applications twice; for
example, (wsgi:designcliq) appears twice, under parent IDs 10436 and
19648 (top of output).
django 19686 19648 0 19:29 ? 00:00:00 (wsgi:designcliq) -k start
django 14118 10436 0 Jul23 ? 00:00:00 (wsgi:designcliq) -k start
django 443 19648 0 20:43 ? 00:00:00 (wsgi:erinaheight -k start
django 476 19648 0 20:43 ? 00:00:00 (wsgi:simplystyli -k start
django 593 19648 0 20:44 ? 00:00:00 (wsgi:gilliantenn -k start
django 3719 19648 0 21:00 ? 00:00:00 (wsgi:pipair) -k start
django 5548 19648 0 21:10 ? 00:00:00 (wsgi:keyboardkid -k start
django 6779 10436 0 Jul23 ? 00:00:00 (wsgi:funkparty) -k start
django 11371 19648 0 21:42 ? 00:00:00 (wsgi:classicinte -k start
paddy 13613 4428 0 21:55 pts/0 00:00:00 grep -i wsgi
django 16246 10436 0 Jul24 ? 00:00:00 (wsgi:fasttraku) -k start
django 18161 10436 0 Jul24 ? 00:00:00 (wsgi:hostingssl) -k start
django 19651 19648 0 19:29 ? 00:00:00 (wsgi:hostingssl) -k start
django 19700 19648 0 19:29 ? 00:00:00 (wsgi:doorssincer -k start
django 19769 19648 0 19:29 ? 00:00:00 (wsgi:fasttraku) -k start
django 19853 19648 0 19:29 ? 00:00:00 (wsgi:mariatennan -k start
django 19913 19648 0 19:29 ? 00:00:00 (wsgi:talkoftheto -k start
django 23082 10436 0 Jul24 ? 00:00:00 (wsgi:mariatennan -k start
django 30964 19648 0 20:33 ? 00:00:00 (wsgi:funkparty) -k start
If I then stop Apache and run the same command, some applications still
show up running under parent 10436 even though Apache has been
stopped.
* Stopping web server apache2
django 14118 10436 0 Jul23 ? 00:00:00 (wsgi:designcliq) -k start
django 6779 10436 0 Jul23 ? 00:00:00 (wsgi:funkparty) -k start
paddy 14014 4428 0 21:57 pts/0 00:00:00 grep -i wsgi
django 16246 10436 0 Jul24 ? 00:00:00 (wsgi:fasttraku) -k start
django 18161 10436 0 Jul24 ? 00:00:00 (wsgi:hostingssl) -k start
django 23082 10436 0 Jul24 ? 00:00:00 (wsgi:mariatennan -k start
Any ideas?
Have you seen any evidence that Apache itself is crashing?
Alternatively, have you been doing anything like attaching debuggers
directly to Apache?
Events like that can sometimes leave processes around, as can other
things.
The operating system generally has a job to go around and clean up
zombie processes that haven't been reclaimed and which may be orphaned
in some way.
As I pointed out, zombie processes don't actually consume memory; each
is just an entry in the process table. Thus, unless you are seeing
issues such as growing system-wide memory usage as a result, or
Apache no longer serving requests, I wouldn't be overly
concerned.
BTW, when you do Apache restarts, are you doing a 'restart' or a
'graceful restart'? A graceful restart could possibly result in
processes hanging around, as in that case Apache doesn't forcibly kill
them off; so if they don't shut down promptly themselves, and for
some reason Apache doesn't clean them up properly when they do exit,
they could remain in the process table.
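(For reference, the two forms are typically invoked with the stock apachectl commands:

apachectl restart   # children are told to exit immediately
apachectl graceful  # children finish their current requests before exiting

On Debian-style systems, 'service apache2 force-reload' usually maps onto the graceful form.)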
Graham
Graham Dumpleton
2016-12-31 02:16:39 UTC
Permalink
The version of mod_wsgi you are using is over 50 versions behind the latest, and is merely a patch release of a version from over 5 years ago. I can only suggest you upgrade to the latest mod_wsgi version, as yours is not supported unless you can manage to force your operating system vendor to support it.

There is a known orphaned-process issue with mod_wsgi, but it is only known to be a problem with certain versions of Apache 2.2 and has never been seen with Apache 2.4. It also never occurred on an Apache restart, only on internal daemon process restarts, although it was still the result of some bug in Apache which seems to have been resolved around Apache 2.2.18.

I can only speculate that you aren’t using daemon mode, or not always, and requests are running in, or leaking into, Apache child worker processes. On a graceful restart, Apache can let worker processes linger so they can handle keep-alive connections, so if your code was running in embedded mode processes, Apache may well not be shutting them down straight away. Apache could then be losing track of them, as from memory there are cases where it will give up on processes when a graceful restart occurs.

I would ensure you are using daemon mode of mod_wsgi, not embedded mode. Ensure you set at global Apache configuration scope:

WSGIRestrictEmbedded On

so that use of embedded mode of mod_wsgi is prohibited and you will get errors if a WSGI application request is wrongly delegated to an Apache worker process. This will highlight any issue with your mod_wsgi configuration for delegating requests to daemon mode processes.
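A minimal daemon-mode setup along these lines might look like this (a sketch; the process group name and script path are illustrative, not taken from this thread):

# Global scope: refuse to run WSGI code in Apache worker processes.
WSGIRestrictEmbedded On

<VirtualHost *:80>
    WSGIDaemonProcess example processes=1 threads=10 display-name=%{GROUP}
    WSGIProcessGroup example
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias / /opt/app/wsgi.py
</VirtualHost>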

Graham
Cristiano Coelho
2016-12-31 02:57:09 UTC
Permalink
Hello, thanks for the quick response!

This Apache deploy is done automatically by AWS Elastic Beanstalk, so I
don't really have control over the version used; I'm amazed it is using a
5-year-old version.
I know for sure it is already running in daemon mode, since I have looked
at the wsgi config they provide.

At the end is part of the wsgi config file and also the logs of the faulty
restart that caused the process to stay alive. The configuration is pretty
much provided by Amazon, so one would expect it to be ideal.

Also, now that you mention keep-alive settings, is there any chance this
issue is caused by the combination of mpm_event and the 100s keep-alive
setting? Will mod_wsgi/Apache try to wait until the connections are closed,
and is this causing issues? This is really odd because there have been many
successful restarts even under load, and many faulty restarts while the
servers were probably not being used.



-- wsgi.conf (partially)
LoadModule wsgi_module modules/mod_wsgi.so
WSGIPythonHome /opt/python/run/baselinenv
WSGISocketPrefix run/wsgi
WSGIRestrictEmbedded On
WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
    python-path=/opt/python/current/app:/opt/python/run/venv/lib64/python2.7/site-packages:/opt/python/run/venv/lib/python2.7/site-packages \
    user=wsgi group=wsgi \
    home=/opt/python/current/app
WSGIProcessGroup wsgi
</VirtualHost>
-- Restart logs
[Fri Dec 30 18:26:46.825763 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:48.827998 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:50.830264 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:52.832466 2016] [core:error] [pid 24265:tid 140339875915840] AH00046: child process 24396 still did not exit, sending a SIGKILL
[Fri Dec 30 18:26:54.539770 2016] [suexec:notice] [pid 12669:tid 140513528571968] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Dec 30 18:26:54.550651 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module expires_module is already loaded, skipping
[Fri Dec 30 18:26:54.550700 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module deflate_module is already loaded, skipping
[Fri Dec 30 18:26:54.550791 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module wsgi_module is already loaded, skipping
[Fri Dec 30 18:26:54.552750 2016] [auth_digest:notice] [pid 12669:tid 140513528571968] AH01757: generating secret for digest authentication ...
[Fri Dec 30 18:26:54.553328 2016] [lbmethod_heartbeat:notice] [pid 12669:tid 140513528571968] AH02282: No slotmem from mod_heartmonitor
[Fri Dec 30 18:26:54.553663 2016] [:warn] [pid 12669:tid 140513528571968] mod_wsgi: Compiled for Python/2.7.9.
[Fri Dec 30 18:26:54.553671 2016] [:warn] [pid 12669:tid 140513528571968] mod_wsgi: Runtime using Python/2.7.10.
[Fri Dec 30 18:26:54.554100 2016] [core:warn] [pid 12669:tid 140513528571968] AH00098: pid file /var/run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Fri Dec 30 18:26:54.555343 2016] [mpm_event:notice] [pid 12669:tid 140513528571968] AH00489: Apache/2.4.23 (Amazon) mod_wsgi/3.5 Python/2.7.10 configured -- resuming normal operations
Graham Dumpleton
2016-12-31 03:30:40 UTC
Permalink
Post by Cristiano Coelho
Hello, thanks for the quick response!
This Apache deploy is done automatically by AWS Elastic Beanstalk, so I don't really have control over the version used; I'm amazed it is using a 5-year-old version.
I am not surprised. They base off Ubuntu, and Debian/Ubuntu systems are quite bad when it comes to supplying up-to-date versions of packages. People like to think RHEL/CentOS are the worst, but RHEL/CentOS do a much better job than Debian-based systems of supplying up-to-date, supported versions of packages. Debian/Ubuntu are becoming a source of huge pain for open source package maintainers because their users are always on old versions yet still expect support, not realising that package authors will not support such old versions and that Debian/Ubuntu maintainers don't support them either. So you are using effectively unsupported software. Amazon compounds the problem by using older OS versions even longer and not providing more up-to-date versions. Not the brightest idea to be basing a business on unsupported software, but that is what companies do.
Post by Cristiano Coelho
I know for sure it is already running in daemon mode, since I have looked at the wsgi config they provide.
At the end is part of the wsgi config file and also the logs of the faulty restart that caused the process to stay alive. The configuration is pretty much provided by Amazon, so one would expect it to be ideal.
Far from ideal. There are various things wrong with their configuration.
Post by Cristiano Coelho
Also, now that you mention keep-alive settings, is there any chance this issue is caused by the combination of mpm_event and the 100s keep-alive setting? Will mod_wsgi/Apache try to wait until the connections are closed, and is this causing issues? This is really odd because there have been many successful restarts even under load, and many faulty restarts while the servers were probably not being used.
Unrelated. The keep-alive settings only apply to Apache child worker processes, not to mod_wsgi daemon mode processes. Because of how Apache handles the type of child process mod_wsgi uses, you get at most 4 seconds' grace, which is when that SIGKILL arrives after the repeated SIGTERM signals. You can’t block SIGKILL, although a process can become unresponsive if it was blocked in a kernel system call and doesn’t return.

http://stackoverflow.com/questions/8600430/cases-in-which-sigkill-will-not-work

Whether other threads can still run in that scenario I don’t know. Either way, it would imply that some filesystem device I/O around a broken mount would be the cause, which means it is due to Amazon’s infrastructure having issues.
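A process blocked in an uninterruptible kernel call shows state 'D' in ps; a quick way to look for such processes (a sketch, using standard procps options):

ps -eo pid,stat,wchan:30,args | awk '$2 ~ /D/'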
Post by Cristiano Coelho
-- wsgi.conf (partially)
LoadModule wsgi_module modules/mod_wsgi.so
WSGIPythonHome /opt/python/run/baselinenv
WSGISocketPrefix run/wsgi
WSGIRestrictEmbedded On
WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
python-path=/opt/python/current/app:/opt/python/run/venv/lib64/python2.7/site-packages:/opt/python/run/venv/lib/python2.7/site-packages user=wsgi group=wsgi \
home=/opt/python/current/app
They should be using python-home to specify the Python virtual environment, not python-path to point at its site-packages directories. Even then, with their setup of only a single daemon process group, they could just use WSGIPythonHome at global scope.
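A sketch of the suggested change (assuming, from the site-packages paths above, that the virtual environment root is /opt/python/run/venv):

WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
    python-home=/opt/python/run/venv \
    python-path=/opt/python/current/app \
    user=wsgi group=wsgi home=/opt/python/current/app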
Post by Cristiano Coelho
WSGIProcessGroup wsgi
They should also specify:

WSGIApplicationGroup %{GLOBAL}

if only supporting the one WSGI application in the daemon process group. This avoids issues with some third party Python packages with C extensions that will not work in Python sub interpreters.

If they weren’t on such old mod_wsgi version, there is also a whole bunch of timeout values for daemon mode they should set up to better ensure that people’s WSGI applications can recover from stuck processes and backlog.

Overall they have done little to set things up well, and if they don’t provide a way for users to change it, a user wouldn’t be able to tune and optimise the configuration and could well be wasting resources and money.
Post by Cristiano Coelho
</VirtualHost>
-- Restart logs
[Fri Dec 30 18:26:46.825763 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:48.827998 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:50.830264 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
[Fri Dec 30 18:26:52.832466 2016] [core:error] [pid 24265:tid 140339875915840] AH00046: child process 24396 still did not exit, sending a SIGKILL
[Fri Dec 30 18:26:54.539770 2016] [suexec:notice] [pid 12669:tid 140513528571968] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Fri Dec 30 18:26:54.550651 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module expires_module is already loaded, skipping
[Fri Dec 30 18:26:54.550700 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module deflate_module is already loaded, skipping
[Fri Dec 30 18:26:54.550791 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module wsgi_module is already loaded, skipping
These warnings about modules already being loaded show that their configuration must be broken in other ways as well.
Cristiano Coelho
2016-12-31 04:07:08 UTC
Permalink
Hello,

So the configuration might not be ideal, but these small config tweaks
shouldn't really be the source of the issue, right? It's a bit odd, since I
haven't had issues with similar deploys. This project, I guess, uses more
C extension libraries (lxml and postgres), and the background threads/pools
perform a lot of IO (email sending, message queue polling, and some
others), although they are all daemon threads and should finish within
the 4 seconds of grace time.

Would setting WSGIApplicationGroup %{GLOBAL} still allow me to use more
than 1 process in the daemon configuration? Although I don't think it will
change anything at all, since the web servers only listen on port 80 and are
on the same domain, so all requests should always be falling into the same
application group, if I interpreted the docs correctly.

This issue is so random, and since it only happens on cloud deploys, it
gets really difficult to test whether a change helped, and it can take days
to notice. I guess I will keep playing around with settings and try to
gather more info on the stuck processes when it happens.

Thank you so much for all the help!
Graham Dumpleton
2016-12-31 04:27:38 UTC
Permalink
Post by Cristiano Coelho
Hello,
So the configuration might not be ideal, but these small config tweaks shouldn't really be the source of the issue, right? It's a bit odd, since I haven't had issues with similar deploys. This project, I guess, uses more C extension libraries (lxml and postgres), and the background threads/pools perform a lot of IO (email sending, message queue polling, and some others), although they are all daemon threads and should finish within the 4 seconds of grace time.
If you are using lxml then you definitely need to use:

WSGIApplicationGroup %{GLOBAL}

as from memory it is one of the libraries known to have issues when used in Python sub interpreters. The problem arises if a callback function is registered which lxml calls when parsing XML. Because it doesn’t deal with thread locking properly when using a sub interpreter, it can deadlock its own thread. Other threads can still run, but if other request threads do the same, you can eventually exhaust all the request threads and the process hangs. Background threads you create separately could still run, though. Even if this occurs, it shouldn’t stop an Apache restart from killing the process.
Post by Cristiano Coelho
Would setting WSGIApplicationGroup %{GLOBAL} still allow me to use more than 1 process in the daemon configuration? Although I don't think it will change anything at all, since the web servers only listen on port 80 and are on the same domain, so all requests should always be falling into the same application group, if I interpreted the docs correctly.
Application group is the Python interpreter context within each respective process. The value %{GLOBAL} just means the main or first interpreter context of the process. This is the same as if you had run command line Python and behaves the same. Any additional interpreter contexts created in a process are what are referred to as sub interpreter contexts. By default mod_wsgi uses a separate sub interpreter context in each process for each WSGI application delegated to run in the same set of processes.

So there is no restriction on setting the ‘processes’ option of WSGIDaemonProcess to more than one at the same time as setting WSGIApplicationGroup to %{GLOBAL}.
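For example (an illustrative sketch, not a configuration from this thread):

WSGIDaemonProcess wsgi processes=3 threads=10 display-name=%{GROUP}
WSGIProcessGroup wsgi
WSGIApplicationGroup %{GLOBAL}

Each of the three daemon processes then runs the WSGI application in its main interpreter context.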
Post by Cristiano Coelho
This issue is so random, and since it only happens on cloud deploys, it gets really difficult to test whether a change helped, and it can take days to notice. I guess I will keep playing around with settings and try to gather more info on the stuck processes when it happens.
Which sounds even more like an issue with sub interpreters. If the bit of code which triggers the deadlock runs infrequently, then the loss of request threads could be slow. This is where newer mod_wsgi versions at least have various timeout options for triggering daemon process restarts when requests time out or block.

At the least, add:

WSGIApplicationGroup %{GLOBAL}

and see how it goes.

Graham
Cristiano Coelho
2016-12-31 21:56:47 UTC
Permalink
Tried it and no luck; same issue. It seems to happen more often on the
production machine, which is also behind a load balancer; I got the process
stuck there on the first try with the config change. Also, the process
doesn't seem to be in zombie status either; it looks completely like a normal
process (and it probably is, since the background threads stay running) but
isn't receiving requests. I can't really understand how a process can stay
alive like this and keep running normally even after a few SIGTERM and
SIGKILL signals!

What's odd is that the process id that is "stuck" is not really the one
that was attempted to be killed, but I'm not familiar with mod_wsgi/apache
internals, so that's probably fine. Below are both of the stuck processes
from top, and the logs.


PID   USER PR NI VIRT  RES  SHR S %CPU %MEM  TIME+   RUSER COMMAND
5948  wsgi 20 0  1203m 88m  11m S  0.3  4.4  0:03.40 wsgi  httpd
12673 wsgi 20 0  1210m 146m 11m S  0.3  7.3 11:26.47 wsgi  httpd  ---> this one should not be here.


[Sat Dec 31 21:38:43.097424 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
[Sat Dec 31 21:38:45.099655 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
[Sat Dec 31 21:38:47.101924 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
[Sat Dec 31 21:38:49.104142 2016] [core:error] [pid 12669:tid 140513528571968] AH00046: child process 13723 still did not exit, sending a SIGKILL
[Sat Dec 31 21:38:50.812271 2016] [suexec:notice] [pid 5944:tid 140156604848192] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Sat Dec 31 21:38:50.825993 2016] [auth_digest:notice] [pid 5944:tid 140156604848192] AH01757: generating secret for digest authentication ...
[Sat Dec 31 21:38:50.826665 2016] [lbmethod_heartbeat:notice] [pid 5944:tid 140156604848192] AH02282: No slotmem from mod_heartmonitor
[Sat Dec 31 21:38:50.827032 2016] [:warn] [pid 5944:tid 140156604848192] mod_wsgi: Compiled for Python/2.7.9.
[Sat Dec 31 21:38:50.827041 2016] [:warn] [pid 5944:tid 140156604848192] mod_wsgi: Runtime using Python/2.7.10.
[Sat Dec 31 21:38:50.827503 2016] [core:warn] [pid 5944:tid 140156604848192] AH00098: pid file /var/run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Sat Dec 31 21:38:50.828766 2016] [mpm_event:notice] [pid 5944:tid 140156604848192] AH00489: Apache/2.4.23 (Amazon) mod_wsgi/3.5 Python/2.7.10 configured -- resuming normal operations
[Sat Dec 31 21:38:50.828782 2016] [core:notice] [pid 5944:tid 140156604848192] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'


Graham Dumpleton
2016-12-31 22:11:56 UTC
Permalink
Use ‘ps auxwww’ instead of top to look at processes. Because the display-name option is used with WSGIDaemonProcess, the mod_wsgi daemon processes should be named differently, so you can tell them apart from the Apache httpd worker processes and master process.

Also hunt around in the ‘ps’ command options and run it such that it shows the ‘STATUS’ field as well, so you can see what state the process is truly in.
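For example (a sketch; exact option names vary between ps implementations):

ps -eo pid,ppid,user,stat,etime,args | grep -i wsgi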

Also update the Apache httpd configuration so that LogLevel is set to ‘info’ instead of ‘warn’. That will cause mod_wsgi to output logging about when daemon processes are being restarted and why.
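That is, somewhere in the main server configuration:

LogLevel info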

BTW, how much ability do you have to customise the generated Apache configuration file? With the ability to pip install mod_wsgi now, it shouldn’t be that hard to substitute in a newer mod_wsgi version.
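A sketch of that substitution (the pip-installed package provides a command that prints the config lines for loading the newly built module):

pip install mod_wsgi
mod_wsgi-express module-config   # emits the LoadModule line to put into the Apache config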

Graham
Cristiano Coelho
2016-12-31 22:48:32 UTC
Permalink
With ps auxwww I'm not really sure what to look at; I can see the process that should have died, together with another one spawned on the same date, and also the new ones spawned today from the test:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 5944 0.0 0.4 224520 9096 ? S 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
root 5946 0.0 0.1 27304 2372 ? S 21:38 0:00 /usr/sbin/rotatelogs /var/log/httpd/healthd/application.log.%Y-%m-%d-%H 3600
apache 5947 0.0 0.3 224524 7436 ? S 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
wsgi 5948 7.9 4.8 1233468 98820 ? Sl 21:38 3:57 (wsgi:wsgi) -DFOREGROUND
apache 5949 0.0 0.4 1111720 9548 ? Sl 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
apache 5950 0.0 0.4 980432 8484 ? Sl 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
apache 5951 0.0 0.3 980432 7672 ? Sl 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
apache 6075 0.0 0.4 980608 8692 ? Sl 21:38 0:00 /usr/sbin/httpd -DFOREGROUND
ec2-user 6938 0.0 0.1 117204 2464 pts/0 R+ 22:28 0:00 ps auxwww
wsgi 12673 0.6 7.3 1239612 149972 ? Sl Dec30 11:39 (wsgi:wsgi) -DFOREGROUND
root 12873 0.0 0.0 0 0 ? S Dec30 0:00 [kworker/u30:2]

You can see the process seems to be in the exact same state as the one that's fine, so perhaps no attempt was ever made to kill it at all; only the one in the logs was sent SIGKILL, and that didn't kill the actual wsgi one.

About customizing Apache: I can easily add new configuration files through the deployment commands, which also lets me overwrite files, and perhaps delete them as well. This is how I add additional wsgi settings, and other Apache settings like caching, gzip, etc.
I can certainly include mod_wsgi in the requirements.txt file so it is installed through pip, but that would probably cause issues with the mod_wsgi module that already comes installed in the Apache modules folder.
The machine includes all modules inside /usr/lib64/httpd/modules and simply adds a link from /etc/httpd, where all the conf files live.
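For reference, this is done through an .ebextensions config file; a minimal sketch, assuming the standard Elastic Beanstalk "files" key (the file name and content here are purely illustrative):

files:
  "/etc/httpd/conf.d/zz_custom.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      LogLevel info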


On Saturday, 31 December 2016 at 19:12:03 UTC-3, Graham Dumpleton wrote:
Post by Graham Dumpleton
Use ‘ps auxwww’ instead of top to look at processes. Because display-name
option is used with WSGIDaemonProcess, the mod_wsgi daemon processes should
be named differently and so you can tell them apart from Apache httpd
worker processes and master process.
Also hunt around in the ‘ps’ command options and run it such that it shows the ‘STATUS’ field as well, so you can see what state the process is truly in.
Also update the Apache httpd configuration so that LogLevel is set to
‘info’ instead of ‘warn’. That will cause mod_wsgi to output logging about
when daemon processes are being restarted and why.
BTW, how much ability do you have to customise the generated Apache configuration file? With the ability to pip install mod_wsgi now, it shouldn’t be that hard to substitute in a newer mod_wsgi version.
Graham
Graham Dumpleton
2016-12-31 23:22:14 UTC
Permalink
Post by Cristiano Coelho
I can certainly include mod_wsgi in the requirements.txt file so it is installed through pip, but that would probably cause issues with the mod_wsgi module that already comes installed in the Apache modules folder.
When mod_wsgi is installed using pip, the module is not installed into the Apache modules directory, but into the Python virtual environment.

We then need to find a way to suppress:

LoadModule wsgi_module modules/mod_wsgi.so

so that isn’t used, and instead use what is output from running:

mod_wsgi-express module-config
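For illustration, with mod_wsgi pip installed into the application's virtual environment, the output of that command looks something like this (the exact paths are assumptions and will differ per install):

$ mod_wsgi-express module-config
LoadModule wsgi_module "/opt/python/run/venv/lib64/python2.7/site-packages/mod_wsgi/server/mod_wsgi-py27.so"
WSGIPythonHome "/opt/python/run/venv"

Those lines are what you would use in place of the existing LoadModule line.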

A bigger problem is whether mod_wsgi can be installed using pip. If they do not include the httpd-dev package on the system, it will not be possible to compile any additional Apache modules.

Does the program ‘apxs’ or ‘apxs2’ exist anywhere on the system? Usually it would be in /usr/sbin.

Graham
Cristiano Coelho
2017-01-01 00:04:48 UTC
Permalink
"LoadModule wsgi_module modules/mod_wsgi.so" is written in two files created by Amazon, conf.d/wsgi.conf and conf.modules.d/wsgi.conf (why it is there twice is beyond me), so I would need to delete one and overwrite the other to drop this line while still defining all the Apache/mod_wsgi settings such as the virtual host. That might be quite painful, since that file is created automatically by Amazon and uses deployment metadata for some values/paths, so it probably gets overwritten on every deploy.

The httpd-dev package can probably be installed if it is available through "yum install xxx", since a few other packages (such as postgres-devel) are installed that way through the deployment file.

apxs or apxs2 are not there, only apachectl.

So I guess it would really be too much effort to try to use a different mod_wsgi version, and it's probably better to simply wait for Amazon to release a new version of the Linux machine image.

Don't the logs ring any bell? *It almost seems that it is the Apache process that is actually stuck and being shut down improperly, which causes the mod_wsgi processes that rely on it to not be shut down; can that be possible?* I'm really suspecting Apache mpm_event may have some issues, considering this happens far more frequently on production, which sits behind a load balancer (an AWS load balancer which basically keeps long-lived connections to Apache), compared to dev, where it rarely happens but browsers hit Apache directly. Other projects that use the default AWS Apache config (mpm_prefork, ugh) don't seem to have this issue, but mpm_prefork is so bad, with its one process per request eating a lot of memory, that it probably means bigger machines to handle the same amount of traffic, which is the reason mpm_event is used.

I'm unaware of Python 2.7 daemon threads or multiprocessing.pool.ThreadPool having issues that keep processes alive.


Graham Dumpleton
2017-01-01 01:12:44 UTC
Permalink
Post by Cristiano Coelho
So I guess it would really be too much effort to try to use a different mod_wsgi version, and it's probably better to simply wait for Amazon to release a new version of the Linux machine image.
From what I understand they do not update their machine images very often. They are also unlikely to upgrade to newer versions of anything, just patch security issues.
Post by Cristiano Coelho
Don't the logs ring any bell? It almost seems that it is the Apache process that is actually stuck and being shut down improperly, which causes the mod_wsgi processes that rely on it to not be shut down; can that be possible?
Shouldn’t be the case.
Post by Cristiano Coelho
I'm really suspecting Apache mpm_event may have some issues, considering this happens far more frequently on production, which sits behind a load balancer (an AWS load balancer which basically keeps long-lived connections to Apache), compared to dev, where it rarely happens but browsers hit Apache directly. Other projects that use the default AWS Apache config (mpm_prefork, ugh) don't seem to have this issue, but mpm_prefork is so bad, with its one process per request eating a lot of memory, that it probably means bigger machines to handle the same amount of traffic, which is the reason mpm_event is used.
I'm unaware of Python 2.7 daemon threads or multiprocessing.pool.ThreadPool having issues that keep processes alive.
Two things to look at.

1. Since you are creating background threads to do work, ensure that when you create them they are being marked as daemon threads. Eg:

# create the background thread
self.monitor_thread = threading.Thread(target=self.monitor)
# mark it as a daemon thread *before* starting it, so Python
# interpreter shutdown will not block waiting for it to exit
self.monitor_thread.setDaemon(True)
self.monitor_thread.start()

There is a fail safe in mod_wsgi to guard against problems caused by this not being done, in that if Python interpreter destruction hangs on process shutdown, the process should be forcibly killed off. Best still to ensure you are doing it.

2. If you are able to modify what options are supplied to the WSGIDaemonProcess directive, add the option:

inactivity-timeout=60

If the process gets orphaned and is not accepting requests, and thus is idle, this should result in the process being forcibly shut down after 60 seconds.

Just be aware that if the traffic of your web application is so light that it goes for periods of a minute with no traffic, the process will be restarted in that case as well. Generally this shouldn't be an issue, except for one special case in the old version of mod_wsgi you have: if you have long running requests and all request threads block for 60 seconds with no data in or out, the process will be restarted then as well.
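As a sketch, assuming the daemon process group is named 'wsgi' as your ps output suggests, the directive would end up something like:

WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} inactivity-timeout=60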

Graham
Cristiano Coelho
2017-01-01 17:17:31 UTC
Permalink
It seems I'm unable to modify any other param of the WSGIDaemonProcess directive, at least not by normal means through their config file; maybe through some hackish way like overwriting the whole wsgi file, or attempting to change that exact line through a Linux command.
Does this have any default value? If this works it might be the best option until the actual issue is found or fixed through an update.

All threads are started as daemon threads, and looking at the multiprocessing.pool.ThreadPool code, its internal threads are also started as daemon threads, so there's no way they are locking the process. This really is something related to either Apache or mod_wsgi, since there's no way to reproduce the issue locally, for example by doing tests with threads and pools and attempting to block the process.

I will set Apache's logging level to info and play around to see if I can find something else.


Graham Dumpleton
2017-01-01 20:27:59 UTC
Permalink
Post by Cristiano Coelho
It seems I'm unable to modify any other param of the WSGIDaemonProcess directive, at least not by normal means through their config file; maybe through some hackish way like overwriting the whole wsgi file, or attempting to change that exact line through a Linux command.
Do they at least provide a way of inserting extra configuration inside of the generated VirtualHost definition?

If you can’t replace the existing WSGIDaemonProcess definition, then you could create a new WSGIDaemonProcess definition with different name and configuration and then set WSGIProcessGroup to override existing delegation to use it instead. It means there will be a redundant process there, but the WSGI application will never be loaded in it, so memory used would be minimal.
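A sketch of that (the new group name and option values are illustrative):

WSGIDaemonProcess wsgi-custom processes=1 threads=10 display-name=%{GROUP} inactivity-timeout=60
WSGIProcessGroup wsgi-custom

The later WSGIProcessGroup wins, so requests are delegated to the 'wsgi-custom' daemon process group and the original 'wsgi' one sits idle.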
Does this have any default value?
No. Off by default.
If this works it might be the best option until the actual issue is found or fixed through an update.
As I have said, Amazon is not going to do anything to actively fix this or provide an update.

The only way you would get an update is when they shift their base to the next LTS Ubuntu release. That may not help though, as Debian/Ubuntu has got so bad at providing an up to date mod_wsgi version that there is a chance they may still ship the same old version of mod_wsgi, depending on which LTS release they use. Do you know what LTS release they base on now?

Since you appear to have some level of root access to install new system packages, the only other option would be to install httpd-dev, compile mod_wsgi from source code, uninstall the system mod_wsgi package, and then install the mod_wsgi module you compiled yourself, such that the newer mod_wsgi.so is in the Apache modules directory at the same location their generated Apache configuration looks for it.
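Roughly, that route would look like this (the yum package name and mod_wsgi version are assumptions; check what is actually available):

$ sudo yum install httpd24-devel
$ tar xvfz mod_wsgi-4.5.13.tar.gz
$ cd mod_wsgi-4.5.13
$ ./configure --with-apxs=/usr/sbin/apxs --with-python=/usr/bin/python2.7
$ make
$ sudo make install

The 'make install' step copies the new mod_wsgi.so into the Apache modules directory.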

Even then, if this is due to the Apache version, that is still not going to help.

BTW, what Apache and mod_wsgi versions does your development environment use?

Graham
Graham Dumpleton
2017-01-01 20:52:47 UTC
Permalink
Post by Graham Dumpleton
It seems I'm unable to modify any other param of the WSGIDaemonProcess directive, at least not by normal means through their config file, maybe through some hackish way like overwriting the whole wsgi file or attempting to change that exact line through a linux command.
Do they at least provide a way of inserting extra configuration inside of the generated VirtualHost definition?
If you can’t replace the existing WSGIDaemonProcess definition, then you could create a new WSGIDaemonProcess definition with different name and configuration and then set WSGIProcessGroup to override existing delegation to use it instead. It means there will be a redundant process there, but the WSGI application will never be loaded in it, so memory used would be minimal.
There is possibly another easier way to trick it into using your own VirtualHost definition which then allows you to take complete control as to how to set up the site, although ordering is important.

How does it include the conf file including the VirtualHost definition for your WSGI application?

I recollect you saying something like it includes all conf files in a certain directory. Thus something like:

Include sites/*.conf

Can you find what it uses to include that file, and also show me the complete conf file it generates again?

Graham
Cristiano Coelho
2017-01-01 21:24:45 UTC
Permalink
Post by Graham Dumpleton
Do they at least provide a way of inserting extra configuration inside of
the generated VirtualHost definition?
If you can’t replace the existing WSGIDaemonProcess definition, then you
could create a new WSGIDaemonProcess definition with different name and
configuration and then set WSGIProcessGroup to override existing delegation
to use it instead. It means there will be a redundant process there, but
the WSGI application will never be loaded in it, so memory used would be
minimal.
The only way to insert additional configuration into the VirtualHost definition is to overwrite the file completely, basically sending the same file with different content to the exact same directory. This is how I managed to change from mpm_prefork to mpm_event, although that is a different file whose only content is the mpm configuration, so it was quite simple to overwrite. The wsgi file is a bit more difficult, since it contains a lot of configuration settings, some of which are populated automatically from the provided Amazon config file and the auto deploy process.


Post by Graham Dumpleton
Since you appear to have some level of root access to install new system packages, the only other option would be to install httpd-dev, compile mod_wsgi from source code, uninstall the system mod_wsgi package, and then install the mod_wsgi module you compiled yourself, such that the newer mod_wsgi.so is in the Apache modules directory at the same location their generated Apache configuration looks for it.
Even then, if this is due to the Apache version, that is still not going to help.
BTW, what Apache and mod_wsgi versions does your development environment use?
Development and production use the exact same environment; the only difference is that production is deployed behind a load balancer, but they are the exact same machine setup through the Amazon Elastic Beanstalk deploy, so pretty much all versions match.
I believe it is possible to install any additional package as long as it can be installed through 'yum install xxx' and is available in the Ubuntu/AWS repository; I don't know exactly where the packages come from when installed that way, but I have had cases where some packages weren't available. As for the next step, uninstalling mod_wsgi and compiling a new version, I don't think that can be easily done.
Basically you are given a config file that can contain a list of sequential commands executed from the root directory of your app, plus a few key/value settings that are added automatically to the wsgi.conf file (like the WSGIDaemonProcess process and thread values).


Post by Graham Dumpleton
There is possibly another easier way to trick it into using your own VirtualHost definition, which then allows you to take complete control of how the site is set up, although ordering is important.
How does it include the conf file containing the VirtualHost definition for your WSGI application?
I recollect you saying something like it includes all conf files in a certain directory. Thus something like:
Include sites/*.conf
Can you find what it uses to include that file, and also show me the complete conf file it generates again?
I believe what you say about using my own VirtualHost definition would be the same as above: simply send my own version of the wsgi.conf file (which Amazon uses to set the virtual host and all wsgi settings), overwriting the existing one. But I'm not 100% sure it can work, since I don't know how Amazon generates that file from my conf file; the order may be such that it is created after all my commands have run, so my attempt to overwrite it might fail.

The Amazon httpd.conf file indeed uses:
Include conf.modules.d/*.conf
IncludeOptional conf.d/*.conf
Here is the complete conf file it generates:
LoadModule wsgi_module modules/mod_wsgi.so
WSGIPythonHome /opt/python/run/baselinenv
WSGISocketPrefix run/wsgi
WSGIRestrictEmbedded On
<VirtualHost *:80>
Alias /static/ /opt/python/current/app/static/
<Directory /opt/python/current/app/static/>
Order allow,deny
Allow from all
</Directory>
WSGIScriptAlias / /opt/python/current/app/central/wsgi.py
<Directory /opt/python/current/app/>
Require all granted
</Directory>
WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
python-path=/opt/python/current/app:/opt/python/run/venv/lib64/python2.7/site-packages:/opt/python/run/venv/lib/python2.7/site-packages \
user=wsgi group=wsgi \
home=/opt/python/current/app
WSGIProcessGroup wsgi
</VirtualHost>
LogFormat "%h (%{X-Forwarded-For}i) %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
Graham Dumpleton
2017-01-01 21:31:43 UTC
Permalink
Post by Cristiano Coelho
I believe what you say about using my own VirtualHost definition would be the same as above: simply send my own version of the wsgi.conf file (which Amazon uses to set the virtual host and all wsgi settings), overwriting the existing one. But I'm not 100% sure it can work, since I don't know how Amazon generates that file from my conf file; the order may be such that it is created after all my commands have run, so my attempt to overwrite it might fail.
Nope. You wouldn’t be overriding the wsgi.conf file, but adding a new conf file in the same directory. Apache loads files in alphabetical order when using Include with a wildcard. You can therefore name the file such that it is ordered before wsgi.conf, and Apache will load it first. You can then rely on the fact that Apache will use the first VirtualHost it finds when name-based virtual hosts aren’t actually being used, which is the case here as the generated VirtualHost lacks a ServerName directive. You can therefore provide your own separate VirtualHost set up however you need it.
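As a sketch, mirroring the generated configuration you posted (the file name is an assumption; anything that sorts before 'wsgi.conf' works):

# /etc/httpd/conf.d/00-wsgi-custom.conf
<VirtualHost *:80>
Alias /static/ /opt/python/current/app/static/
<Directory /opt/python/current/app/static/>
Order allow,deny
Allow from all
</Directory>
WSGIScriptAlias / /opt/python/current/app/central/wsgi.py
<Directory /opt/python/current/app/>
Require all granted
</Directory>
WSGIDaemonProcess wsgi-custom processes=1 threads=10 display-name=%{GROUP} inactivity-timeout=60 \
python-path=/opt/python/current/app:/opt/python/run/venv/lib64/python2.7/site-packages:/opt/python/run/venv/lib/python2.7/site-packages \
user=wsgi group=wsgi \
home=/opt/python/current/app
WSGIProcessGroup wsgi-custom
</VirtualHost>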
Post by Cristiano Coelho
The Amazon httpd.conf file indeed uses:
Include conf.modules.d/*.conf
IncludeOptional conf.d/*.conf
Which of those two directories is wsgi.conf in?

What else is in those two directories?

Graham
Cristiano Coelho
2017-01-01 21:58:18 UTC
Permalink
Post by Graham Dumpleton
Nope. You wouldn’t be overriding the wsgi.conf file, but adding a new conf file in the same directory. Apache loads files in alphabetical order when using Include with a wildcard. You can therefore name the file such that it is ordered before wsgi.conf, and Apache will load it first. You can then rely on the fact that Apache will use the first VirtualHost it finds when name-based virtual hosts aren’t actually being used, which is the case here as the generated VirtualHost lacks a ServerName directive. You can therefore provide your own separate VirtualHost set up however you need it.
This is good advice; it would certainly allow me to add the timeout setting and any other settings I might need for the WSGIDaemonProcess directive. The only issue is that I need to match the exact settings and file paths the Amazon file uses, and if they change for some reason everything would stop working from one day to the next; but I guess they shouldn't change at all unless the machine version is updated.
I will keep this idea as a last resort if I'm not able to find the exact cause of the process not being killed. I still have a few things to test before giving up on finding the exact cause.


Post by Graham Dumpleton
Which of those two directories is wsgi.conf in?
What else is in those two directories?
wsgi.conf is in conf.d, at least the one with the virtual host and all the WSGI setup.
There's also another wsgi.conf in conf.modules.d, but all it does is load the wsgi module.
Graham Dumpleton
2017-01-01 22:01:37 UTC
Permalink
Post by Cristiano Coelho
I will keep this idea as a last resort if I'm not able to find the exact cause of the process not being killed. I still have a few things to test before giving up on finding the exact cause.
For these processes which you believe hang around: if you send them a SIGTERM or SIGHUP signal, rather than doing a full Apache restart, do they go away? Best if you can get LogLevel set at info when you test that, as mod_wsgi will then log what it is doing.
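Roughly, the test I mean is something like the following. The PID is a placeholder you would read from the ps output, and the log path and conf file name are assumptions for a stock RHEL-style layout:

# Raise the log level so mod_wsgi reports what it does on shutdown.
# LogLevel is a standard Apache directive; the file name here is
# arbitrary, it just needs to be picked up by the conf.d include.
echo 'LogLevel info' | sudo tee /etc/httpd/conf.d/00_loglevel.conf

# Find the stuck daemon process, then signal just that process,
# not Apache itself. 12345 is a placeholder PID from the ps output.
ps -ef | grep '(wsgi' | grep -v grep
sudo kill -TERM 12345    # or: sudo kill -HUP 12345

# Watch what mod_wsgi logs while the process shuts down.
sudo tail -f /var/log/httpd/error_log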
Post by Cristiano Coelho
wsgi.conf is in conf.d, at least the one with the virtual host and all the WSGI setup.
There's also another wsgi.conf in conf.modules.d, but all it does is load the wsgi module.
The one in conf.modules.d is added when you install the system mod_wsgi package.

Graham
Cristiano Coelho
2017-01-01 22:37:58 UTC
Permalink
I'm not really the one sending the Apache restart signals; the Amazon environment is. Even worse, I can't restart Apache manually, because that won't run the additional Amazon commands, which causes a lot of issues, like some environment variables not being set.

I will give it a shot with info logging to see what's going on.
Graham Dumpleton
2017-01-01 22:55:11 UTC
Permalink
Post by Cristiano Coelho
I'm not really the one sending the Apache restart signals; the Amazon environment is. Even worse, I can't restart Apache manually, because that won't run the additional Amazon commands, which causes a lot of issues, like some environment variables not being set.
I will give it a shot with info logging to see what's going on.
What user account do you have access to the environment as?

The daemon processes run as the 'wsgi' user. You would need to be either 'root' or the 'wsgi' user to send the specific daemon process a signal. You don't want to send Apache itself a signal, just the one problem process. If it stops when sent a SIGINT or SIGTERM, then that would indicate the issue is more likely the Apache version or the MPM being used.
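A quick way to confirm the Apache version and which MPM is actually loaded (httpd -V reports the MPM once the config parses):

# Report the Apache version and the MPM in use.
sudo httpd -V | grep -E 'Server version|Server MPM'

# Confirm the daemon processes and the user they run as.
ps -eo pid,user,args | grep '(wsgi' | grep -v grep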

Graham
Cristiano Coelho
2017-01-01 23:18:10 UTC
Permalink
I have root access. I did the test, and the stuck process died fine with a SIGTERM signal.

From this I went back to the old mpm_prefork setup that Amazon uses, rather than mpm_event, and I can't seem to make the issue happen; I'm not sure if I'm just getting lucky or it really is that. It's a shame if I can't use mpm_event. The issue also happened with mpm_worker. I will need to give it some time and some re-deploys to see if the issue still happens.

If this is the case, it really seems Apache is failing to kill the wsgi process when using mpm_event or mpm_worker, perhaps something related to the ancient version of mod_wsgi being used.
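In case it helps anyone else: on this RHEL-style layout the MPM is selected in /etc/httpd/conf.modules.d/00-mpm.conf (at least on my instances; the file name may differ on other images), so forcing prefork is roughly:

# Load exactly one MPM module; comment out the others.
sudo tee /etc/httpd/conf.modules.d/00-mpm.conf > /dev/null <<'EOF'
LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
#LoadModule mpm_worker_module modules/mod_mpm_worker.so
#LoadModule mpm_event_module modules/mod_mpm_event.so
EOF

# Check the config and confirm the MPM before the environment
# restarts Apache through the usual Beanstalk tooling.
sudo httpd -t && sudo httpd -V | grep 'Server MPM'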
Cristiano Coelho
2017-01-02 21:37:18 UTC
Permalink
Well, over a day and no issues; it really seems it is related to mpm_event or mpm_worker, since mpm_prefork doesn't cause the process to get stuck. I guess I will need to try mpm_event again once a new update is released, which will probably take a while.

Thanks for all the help! I hope this can save someone else's time if they're using Amazon.