Subject: Zenoss Core 4.2.5 su: cannot set user id: Resource temporarily unavailable
Author: Ken Jenkins
Posted: 2015-07-14 10:28
Team,
I am looking for some help. I am running Zenoss Core 4.2.5 on CentOS 6.6.
In the past two weeks, Zenoss has run into an issue where the portal was inaccessible and I could not log in as the zenoss user. My first attempt to resolve it was to reboot the server, but this did not last long. Within a day, the issue came back.
I need to know how to isolate and resolve the "Resource temporarily unavailable" error, which is limited to the zenoss login ID.
Below are the troubleshooting steps I ran.
From another ID and using root:
sudo su - zenoss
su: cannot set user id: Resource temporarily unavailable
sudo lsof | grep zenoss | wc -l
3940
ps -U zenoss | wc -l
26
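A note that may help here: on CentOS 6 (kernel 2.6.32), `su` reports "cannot set user id: Resource temporarily unavailable" when setuid() fails with EAGAIN, which on kernels before 3.1 happens when the target user is already at its RLIMIT_NPROC (nproc) limit. That limit counts threads as well as processes, while `ps -U zenoss | wc -l` counts only processes (plus one header line), so the 26 above does not rule it out. A minimal sketch that counts both, assuming procps `ps`; `user_task_counts` is a hypothetical helper name:

```bash
#!/bin/bash
# Count processes and threads (kernel tasks) whose real user is $1.
# RLIMIT_NPROC is checked against the thread total, not the process total.
user_task_counts() {
    local user=$1 procs threads
    # --no-headers drops the header line that "ps ... | wc -l" would include
    procs=$(ps -U "$user" --no-headers | wc -l)
    # -L prints one line per thread (LWP) instead of one line per process
    threads=$(ps -L -U "$user" --no-headers | wc -l)
    echo "processes=$procs threads=$threads"
}
```

For example, `user_task_counts zenoss` run as root; if the thread total is close to the nproc value shown by `ulimit -Su` in a zenoss shell, that limit is the likelier culprit.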
ps -U zenoss auxww | grep java
zenoss 14639 0.4 3.3 6764924 547660 Sl Jul13 4:57 java -server -XX:+HeapDumpOnOutOfMemoryError -DZENOSS_COMMAND=zeneventserver -DZENHOME=/opt/zenoss -Djetty.home=/opt/zenoss -Djetty.logs=/opt/zenoss/log -Dlogback.configurationFile=/opt/zenoss/etc/zeneventserver/logback.xml -DZENOSS_DAEMON=y -jar /opt/zenoss/lib/jetty-start-7.5.3.v20111011.jar --config=/opt/zenoss/etc/zeneventserver/jetty/start.config --ini=/opt/zenoss/etc/zeneventserver/jetty/jetty.ini --pre=etc/zeneventserver/jetty/jetty-logging.xml
zenoss 15296 0.0 0.3 2197956 50088 Sl Jul13 0:59 java -server -Xmx512m -cp ./*:/opt/zenoss/zenjmx-libs/*: com.zenoss.zenpacks.zenjmx.ZenJmxMain --configfile /opt/zenoss/etc/zenjmx.conf -zenjmxjavaport 9988 --configfile /opt/zenoss/etc/zenjmx.conf -v 20
rabbitmqctl -p /zenoss list_queues; rabbitmqctl list_connections
Listing queues ...
celery 0
zenoss.queues.zep.migrated.summary 0
zenoss.queues.zep.migrated.archive 0
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.heartbeats 0
zenoss.queues.zep.zenevents 0
zenoss.queues.zep.signal 0
zenoss.queues.zep.modelchange 0
monitoring02.infra.viasatcloud.com.celeryd.pidbox 0
...done.
Listing connections ...
zenoss ::1 42873 running
zenoss 127.0.0.1 57425 running
zenoss ::1 42874 running
zenoss 127.0.0.1 57415 running
zenoss ::1 42860 running
zenoss ::1 42837 running
zenoss ::1 42857 running
zenoss ::1 42847 running
...done.
source /home/zenoss/.bash_profile
zenoss status
su: cannot set user id: Resource temporarily unavailable
zeneventserver stop
stopping...
su - zenoss (I am now able to log in as zenoss)
[zenoss@~]$ zenup status
Product: zenoss-core-4.2.5 (id = zenoss-core-4.2.5)
Home: /opt/zenoss
Revision: 203
Upgrading: None
Minimum: 203
Updated On: Wed Apr 29 03:28:53 2015
[zenoss@monitoring02 ~]$ zenoss status
Daemon: zeneventserver not running *** (I stopped the zeneventserver) ***
Daemon: zopectl program running; pid=14728
Daemon: zenrrdcached program running; pid=14733
Daemon: zenhub program running; pid=14785
Daemon: zenjobs program running; pid=14827
Daemon: zeneventd program running; pid=14891
Daemon: zenping program running; pid=14949
Daemon: zensyslog program running; pid=15057
Daemon: zenstatus program running; pid=15038
Daemon: zenactiond program running; pid=15078
Daemon: zentrap program running; pid=15170
Daemon: zenmodeler program running; pid=15154
Daemon: zenperfsnmp program running; pid=15192
Daemon: zencommand program running; pid=15222
Daemon: zenprocess program running; pid=15253
Daemon: zredis program running; pid=15256
Daemon: zenjmx program running; pid=15291
Daemon: zenpython program running; pid=15366
After a few minutes, as zenoss, I ran:
[zenoss@~]$ free
-bash: fork: retry: Resource temporarily unavailable
[root@~]# lsof | grep zenoss | wc -l
3724
[root@~]# ps -U zenoss | wc -l
25
It looks like a few zeneventd processes are still running:
ps -ef | grep zenevent
zenoss 14891 1 0 Jul13 00:00:10 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --daemon
zenoss 14904 14891 0 Jul13 00:00:53 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --duallog
zenoss 14905 14891 0 Jul13 00:00:52 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --duallog
root 16761 10996 0 15:12 pts/0 00:00:00 grep zenevent
I killed the above and then, as root, ran:
[root@~]# zenoss status
su: cannot set user id: Resource temporarily unavailable
[root@~]# zeneventd stop
stopping...
already stopped
[root@~]# zenoss status
su: cannot set user id: Resource temporarily unavailable
[root@~]# zensyslog stop
stopping...
[root@~]# zenoss status
Daemon: zeneventserver Java >= 1.6 is required.
Daemon: zopectl program running; pid=14728
Daemon: zenrrdcached program running; pid=14733
Daemon: zenhub program running; pid=14785
Daemon: zenjobs program running; pid=14827
Daemon: zeneventd not running
Daemon: zenping program running; pid=14949
Daemon: zensyslog not running
Daemon: zenstatus program running; pid=15038
Daemon: zenactiond program running; pid=15078
Daemon: zentrap program running; pid=15170
Daemon: zenmodeler program running; pid=15154
Daemon: zenperfsnmp program running; pid=15192
Daemon: zencommand program running; pid=15222
Daemon: zenprocess program running; pid=15253
Daemon: zredis program running; pid=15256
Daemon: zenjmx program running; pid=15291
Daemon: zenpython program running; pid=15366
[root@~]# zeneventserver start
starting...
Waiting for zeneventserver to start.......
[root@~]# zeneventd start
starting...
[zenoss@ zenoss]$ ps -U zenoss
-bash: fork: retry: Resource temporarily unavailable
Then, as root:
zopectl stop
I logged in as zenoss and ran the following to recover:
zenoss stop
zenoss start

Subject: I forgot to mention that the
Author: Ken Jenkins
Posted: 2015-07-14 11:04
I forgot to mention that /etc/security/limits.conf is configured with these soft and hard limits:
zenoss soft nofile 4096
zenoss hard nofile 10240

Subject: Issue is back .. help
Author: Ken Jenkins
Posted: 2015-07-14 15:16
zenoss status
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable

Subject: I suspect java threads may be
Author: Ken Jenkins
Posted: 2015-07-14 16:59
I suspect Java threads may be an issue, but I need help confirming this.
After a zenoss restart, I see:
[zenoss@ 01-Reports]$ ps auxww | grep 9561; ps uH 9561 | wc -l; ps auxww | grep 8850; ps uH 8850 | wc -l;
zenoss 9561 0.1 0.2 2197956 40360 Sl 21:48 0:00 java -server -Xmx512m -cp ./*:/opt/zenoss/zenjmx-libs/*: com.zenoss.zenpacks.zenjmx.ZenJmxMain --configfile /opt/zenoss/etc/zenjmx.conf -zenjmxjavaport 9988 --configfile /opt/zenoss/etc/zenjmx.conf -v 20
zenoss 10071 0.0 0.0 103244 880 pts/0 S+ 21:57 0:00 grep 9561
23
zenoss 8850 7.5 4.3 6770160 704876 pts/0 Sl 21:47 0:46 java -server -XX:+HeapDumpOnOutOfMemoryError -DZENOSS_COMMAND=zeneventserver -DZENHOME=/opt/zenoss -Djetty.home=/opt/zenoss -Djetty.logs=/opt/zenoss/log -Dlogback.configurationFile=/opt/zenoss/etc/zeneventserver/logback.xml -DZENOSS_DAEMON=y -jar /opt/zenoss/lib/jetty-start-7.5.3.v20111011.jar --config=/opt/zenoss/etc/zeneventserver/jetty/start.config --ini=/opt/zenoss/etc/zeneventserver/jetty/jetty.ini --pre=etc/zeneventserver/jetty/jetty-logging.xml
zenoss 10075 0.0 0.0 103244 872 pts/0 R+ 21:57 0:00 grep 8850
55
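Note that `ps uH <pid> | wc -l` also counts the header line, so the 23 and 55 above correspond to 22 and 54 threads. The kernel's own count can be read from /proc instead. A minimal sketch, assuming a Linux /proc filesystem; `thread_count` is a hypothetical helper name:

```bash
#!/bin/bash
# Print the number of threads in process $1, as reported by the kernel.
thread_count() {
    local pid=$1
    # /proc/<pid>/status contains a line such as "Threads:	22"
    awk '/^Threads:/ { print $2 }' "/proc/$pid/status"
}

# Equivalent with procps: ps -o nlwp= -p <pid>
```

For example, `thread_count 8850` for the zeneventserver JVM above.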
What is the Java process started with -XX:+HeapDumpOnOutOfMemoryError used for?
Thanks,
Ken

Subject: > zenoss soft nofile 4096
Author: Jan Garaj
Posted: 2015-07-14 17:39
> zenoss soft nofile 4096
> zenoss hard nofile 10240
These limits can be overridden elsewhere. Check the real limits for the zenoss user:
sudo su - zenoss
ulimit -Hn
ulimit -Sn
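`ulimit` in a fresh `su - zenoss` shell shows only what that new login receives. Limits are inherited across fork() and exec(), and limits.conf is applied by pam_limits when a session opens, so daemons started earlier (or from cron or an init script) keep whatever limits they were started with. On kernels that provide /proc/<pid>/limits (2.6.24 and later, which includes CentOS 6), the values a running process actually has can be read directly. A sketch; `show_limits` is an illustrative name, not a Zenoss tool:

```bash
#!/bin/bash
# Show the open-file and process limits that running processes actually have.
# Run as root to inspect processes belonging to other users.
show_limits() {
    local pid cmd
    for pid in "$@"; do
        [ -r "/proc/$pid/limits" ] || continue   # exited, or not readable
        # cmdline is NUL-separated; show its first 60 characters
        cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline" | cut -c1-60)
        echo "== PID $pid: $cmd"
        grep -E '^Max (open files|processes)' "/proc/$pid/limits"
    done
}
```

For example, `show_limits $(pgrep -u zenoss)` lists the soft and hard "Max open files" and "Max processes" values for every zenoss daemon.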
Devops Monitoring Expert advice:
Dockerize/automate/monitor all the things.
DevOps stack:
Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform /
Elasticsearch

Subject: ulimits are good
Author: Ken Jenkins
Posted: 2015-07-14 18:13
Thanks. I verified; both limits are set to 10240:
$ ulimit -Hn
10240
$ ulimit -Sn
10240
I noticed that there were over 6000 events from syslog. I cleared those events out.
Can you think of other ways to isolate this resource issue?
Thanks Jan!

Subject: I am still experiencing
Author: Ken Jenkins
Posted: 2015-07-15 10:10
I am still experiencing resource issues. Help is appreciated.

Subject: Checking the zredis.log, I
Author: Ken Jenkins
Posted: 2015-07-15 11:47
Checking zredis.log, I found the following. (Note: I have just updated the vm.overcommit_memory setting.)
[30773] 28 Apr 20:30:07 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
In zenhub.log:
2015-07-05 06:35:43,097 INFO zen.ZenHub: Starting new zenhubworker
2015-07-05 06:35:43,126 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:44,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:46,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:50,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:58,128 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: Resource temporarily unavailable
2015-07-05 06:35:58,130 WARNING zen.ZenHub: Worker (24759) exited with status: 254 (Fatal error signal: 126)
2015-07-05 06:35:58,130 INFO zen.ZenHub: Starting new zenhubworker
2015-07-05 06:35:58,164 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:59,165 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:01,167 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:05,168 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:13,169 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: Resource temporarily unavailable
2015-07-05 06:36:13,171 WARNING zen.ZenHub: Worker (24762) exited with status: 254 (Fatal error signal: 126)
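To see when the fork failures started (a later post notes that the problem began around July 4th), the matching log lines can be counted per day. A sketch assuming the log format shown above, with the date in the first field; the default location /opt/zenoss/log/zenhub.log used in the example is an assumption, and `fork_failures_per_day` is a hypothetical name:

```bash
#!/bin/bash
# Count "fork ... Resource temporarily unavailable" lines per day in one or
# more log files whose lines start with a YYYY-MM-DD date, as zenhub.log does.
fork_failures_per_day() {
    # -h: do not prefix matches with file names when several files are given
    grep -h 'fork:.*Resource temporarily unavailable' "$@" \
        | awk '{ print $1 }' \
        | sort | uniq -c
}
```

For example, `fork_failures_per_day /opt/zenoss/log/zenhub.log`; several files (such as rotated copies) can be passed at once.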

Subject: Still need assistance
Author: Ken Jenkins
Posted: 2015-07-15 18:29
I am looking for troubleshooting assistance.
Only the Zenoss user is impacted. Other logins do not report resource errors.
I checked all the obvious OS-related resource issues. I need to know what else to check in Zenoss.
Thank you,
Ken

Subject: More info that may help but I am not sure how to resolve this
Author: Ken Jenkins
Posted: 2015-07-15 18:44
The Zenoss MySQL Server graphs show a rise in aborted, bytes, commands and handlers. Could this play into the resource issue?

Subject: I received a MySQL deadlock
Author: Ken Jenkins
Posted: 2015-07-16 09:59
FYI: might this be related to the resource issue? The alert had a count of 50 from yesterday, after I restarted Zenoss. I came in today and the zenoss login was out of resources again, so I stopped Zenoss and rebooted my virtual instance. Below is the alert, whose count keeps rising. What can I do to resolve this issue?
LATEST DETECTED DEADLOCK
------------------------
2015-07-16 14:52:36 7f46802d4700
*** (1) TRANSACTION:
TRANSACTION 21783845, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock

Subject: Continued issues with resources
Author: Ken Jenkins
Posted: 2015-07-16 14:20
2015-07-16 19:19:45,057 INFO zen.ZenHub: Worker (14186) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable

Subject: Update this morning
Author: Ken Jenkins
Posted: 2015-07-17 09:49
I ran into resource issues again. After stopping Zenoss, I found these rogue processes still running:
zenoss 2927 1 0 Jul16 00:01:12 /opt/zenoss/bin/python /opt/zenoss/ZenPacks/ZenPacks.zenoss.PythonCollector-1.4.0-py2.7.egg/ZenPacks/zenoss/PythonCollector/zenpython.py --configfile /opt/zenoss/etc/zenpython.conf --cycle --daemon
root 20897 19901 0 14:42 pts/0 00:00:00 grep zenoss
zenoss 27796 1 0 Jul16 00:03:59 /opt/zenoss/bin/python /opt/zenoss/ZenPacks/ZenPacks.zenoss.PythonCollector-1.4.0-py2.7.egg/ZenPacks/zenoss/PythonCollector/zenpython.py --configfile /opt/zenoss/etc/zenpython.conf --cycle --daemon
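Two zenpython daemons (PIDs 2927 and 27796) are listed above, which suggests at least one was left behind by an earlier restart. A sketch for spotting leftovers like these after `zenoss stop`: list whatever the user still owns, with parent PID, thread count (`nlwp`) and elapsed time. It uses only standard procps `ps` options; `leftover_processes` is a hypothetical name:

```bash
#!/bin/bash
# After "zenoss stop", list any processes still owned by user $1 with their
# PID, parent PID, thread count (nlwp), elapsed time and command line.
leftover_processes() {
    local user=$1
    # ps exits non-zero when nothing matches, i.e. the stop was clean
    if ! ps -U "$user" -o pid= -o ppid= -o nlwp= -o etime= -o args=; then
        echo "no processes left for $user"
    fi
}
```

For example, as root right after `zenoss stop`: `leftover_processes zenoss`.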

Subject: Jenkins,
Author: Andrew Kirch
Posted: 2015-07-17 10:45
Jenkins,
I believe this will be of help:
https://www.memonic.com/user/pneff/folder/linux/id/1gNgT
This is definitely a maxFD issue.
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5 Coming May 1st from GoVanguard

Subject: maxFD issue follow up
Author: Ken Jenkins
Posted: 2015-07-17 12:12
I think you have me pointed in the right direction now. (Thank you!)
I had already done my due diligence checking the running zenoss processes, memory, disk space, and Java threads in use. Your mention of maxFD helped me narrow down the cause.
What I found out is listed below:
(NOTE: The Zenoss documentation may need to be updated to reflect this.)
I checked the system-wide limit with sysctl as root, and it was set to the following (which seemed adequate):
cat /proc/sys/fs/file-max
1620946
sysctl fs.file-max
fs.file-max = 1620946
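fs.file-max is a system-wide ceiling on open file handles; the per-process nofile limit and the per-user nproc limit from limits.conf are enforced separately. Current system-wide usage can be read from /proc/sys/fs/file-nr, whose three fields are the number of allocated handles, the number of allocated but unused handles, and the maximum (the same value as fs.file-max). A minimal sketch; the function name is illustrative:

```bash
#!/bin/bash
# Report system-wide file handle usage from /proc/sys/fs/file-nr.
file_handle_usage() {
    local allocated free max
    # the three whitespace-separated fields: allocated, unused, maximum
    read -r allocated free max < /proc/sys/fs/file-nr
    echo "allocated=$allocated free=$free max=$max"
}
```

If "allocated" stays far below "max" while the zenoss user still hits errors, the system-wide limit is not the bottleneck.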
The Zenoss instructions, and the recommendations in this thread, suggest setting the soft and hard limits in /etc/security/limits.conf as below, which is what I did:
zenoss soft nofile 4096
zenoss hard nofile 10240
In the zenoss .bash_profile I have this set:
if [ "${USER}" = "zenoss" ]; then
ulimit -n 10240
fi
I verified the limits in effect for the zenoss login:
[zenoss@monitoring02 ~]$ ulimit -Hn; ulimit -Sn
10240
10240
The problem seems to come from the Zenoss-recommended soft limit set in /etc/security/limits.conf.
I ran a check on files in use by zenoss, and the count exceeds the 4096 soft limit:
---------------------
Zenoss files in use check ...
4115
---------------------
To fix this, I raised the soft limit in /etc/security/limits.conf to 10240.
I will monitor the situation for the day to see if this resolves my issue.
I appreciate the quick response.
Thanks,
Ken

Subject: I would strongly suggest
Author: Andrew Kirch
Posted: 2015-07-20 09:29
I would strongly suggest looking at underlying OS/kernel tuning. Scaling issues with the underlying OS are a problem you will run into. I even more strongly suggest documenting the changes you make in case you need to duplicate them later. You may need to increase these numbers much further. I know that Hybrid IRCD (the first time I ran into scaling issues with FDs) wanted hundreds of thousands of FDs. Increasing them further shouldn't hurt anything.
Andrew Kirch

Subject: Thanks for the heads up. I
Author: Ken Jenkins
Posted: 2015-07-20 10:10
Thanks for the heads up. I raised the FD limit to 102400.
I find it strange that when I run lsof for zenoss I see only about 4900 files in use, yet the ID still runs out of resources. Is there something I am missing when running lsof for the zenoss login? I did not try to verify how many files the RabbitMQ or MySQL processes running as root had in use.
I will see whether raising the FD limit for the zenoss login to 102400 helps. It is strange that this problem only started on July 4th; Zenoss Core 4 had been running smoothly for several months before that.
I will monitor the situation.

Subject: FD's go up as the number of
Author: Andrew Kirch
Posted: 2015-07-20 11:42
FD usage goes up as the number of monitored systems goes up.
Andrew Kirch

Subject: Issue still exists ...
Author: Ken Jenkins
Posted: 2015-07-21 19:33
I raised the FD limit to 500000 for the zenoss and root logins. I will let this run overnight and see what happens. Nothing indicates that I ever exceeded 100000 files in use, though. Suggestions on how to isolate the FDs in use are appreciated.

Subject: I get the sense that I FD
Author: Ken Jenkins
Posted: 2015-07-22 16:15
I get the sense that FD limits are not the issue now. I raised my FD limit to 600000, but it looks like the processes' threads keep spawning.

Subject: Still need assistance
Author: Ken Jenkins
Posted: 2015-07-22 20:06
With Zenoss Core 4, I am monitoring only 106 devices.
With the 500000 FD limit Zenoss ran longer, but it still ran out of resources. I raised the ulimit soft and hard limits to 512000.
Zenoss files in use check ...
By User ID ...
--------------
USER = centos Count = 16 ...
USER = dbus Count = 4 ...
USER = memcached Count = 4 ...
USER = mysql Count = 61 ...
USER = ntp Count = 13 ...
USER = rabbitmq Count = 20 ...
USER = root Count = 524 ...
USER = rpc Count = 12 ...
USER = rpcuser Count = 4 ...
USER = smmsp Count = 4 ...
USER = zenoss Count = 4046 ...
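A per-user report like the one above can also be produced without lsof, by counting the entries in each process's /proc/<pid>/fd directory and grouping by owner. Unlike `lsof | grep zenoss`, this counts only open descriptors: lsof also lists entries such as the current directory (cwd), program text (txt) and memory-mapped files (mem), and grep matches any line that merely contains the string. A sketch, assuming it runs as root; the owner is taken from /proc/<pid>, which is normally the process's effective user, and `fds_by_user` is a hypothetical name:

```bash
#!/bin/bash
# Count open file descriptors per process owner by reading /proc directly.
# Run as root; other users' /proc/<pid>/fd directories are not readable.
fds_by_user() {
    local dir user n
    for dir in /proc/[0-9]*; do
        # owner of /proc/<pid> is normally the process's effective user
        user=$(stat -c %U "$dir" 2>/dev/null) || continue
        # one entry per open descriptor; the process may already have exited
        n=$(ls "$dir/fd" 2>/dev/null | wc -l)
        echo "$user $n"
    done | awk '{ total[$1] += $2 } END { for (u in total) print u, total[u] }' | sort
}
```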
Let me know if any tuning fixes come to mind.

Subject: Thread counts ...
Author: Ken Jenkins
Posted: 2015-07-22 20:09
I now plan to monitor thread counts for the Java, MySQL and RabbitMQ processes.
Can someone tell me what the Java HeapDump process is used for?
The thread count output below shows the PID, the process, and its thread count.
---------------------
Zenoss java threads in use ...
13290 java -server -XX:+HeapDumpO 53
15141 java -server -Xmx512m -cp . 22
---------------------
Zenoss mysql threads in use ...
1284 /bin/sh /usr/bin/mysqld_saf 1
---------------------
Zenoss rabbitmq threads in use ...
1504 /bin/sh /usr/sbin/rabbitmq- 1
---------------------
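To see how fast thread counts grow between restarts, the per-process counts could be logged with a timestamp from cron. A sketch using procps `ps`, where the `nlwp` column is the number of threads in a process; the function name, the default of five processes, and the truncation to 40 characters are arbitrary choices:

```bash
#!/bin/bash
# Print a timestamped line for each of user $1's processes with the most
# threads (top $2, default 5), suitable for appending to a log from cron.
top_threads() {
    local user=$1 count=${2:-5} stamp pid nlwp args
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    # nlwp = number of threads; sort numerically on that column, highest first
    ps -U "$user" -o pid= -o nlwp= -o args= \
        | sort -k2,2nr \
        | head -n "$count" \
        | while read -r pid nlwp args; do
            # %.40s truncates the command line to 40 characters
            printf '%s pid=%s threads=%s %.40s\n' "$stamp" "$pid" "$nlwp" "$args"
          done
}

# When run as a script with arguments, e.g. "top_threads.sh zenoss 5"
if [ "$#" -gt 0 ]; then
    top_threads "$@"
fi
```

For example, saved as a hypothetical /usr/local/bin/top_threads.sh, it could run from root's crontab every ten minutes with `*/10 * * * * /usr/local/bin/top_threads.sh zenoss 5 >> /var/log/zenoss_threads.log 2>&1` (both paths are placeholders).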

Subject: Testing access
Author: Ken Jenkins
Posted: 2015-07-23 11:38
I was blocked from sending updates ... this is a test.

Subject: Upgrade to the latest RPS
Author: Andrew Kirch
Posted: 2015-07-23 12:22
Upgrade to the latest RPS with ZenUp. That should help, but it may not fully resolve the issue.
Andrew Kirch

Subject: you can write a simple
Author: Andrew Kirch
Posted: 2015-07-23 16:20
You can write a simple command datasource to read, parse, and return it in Nagios format:
OK|var1=val1 var2=val2 and so on.
Here is an example I wrote last year:
http://wiki.zenoss.org/Newsletter:7/Arduino_and_Zenoss
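Applied to the counts discussed in this thread, a command datasource along these lines could graph the zenoss user's total threads and open descriptors. A minimal sketch, assuming it runs on the Zenoss host with enough privilege to read the user's /proc entries; the script name `user_usage.sh`, the function name, and the datapoint names `threads` and `fds` are all hypothetical and would need matching datapoints in the monitoring template:

```bash
#!/bin/bash
# Sketch of a command datasource: report user $1's total thread and
# open-descriptor counts in Nagios plugin format, "OK|threads=N fds=M".
nagios_user_usage() {
    local user=$1 uid pid n threads=0 fds=0
    if ! uid=$(id -u "$user" 2>/dev/null); then
        echo "UNKNOWN - no such user: $user"
        return 3                      # Nagios UNKNOWN state
    fi
    for pid in $(pgrep -U "$uid"); do
        # Threads: from /proc/<pid>/status; empty if the process already exited
        n=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status" 2>/dev/null)
        threads=$((threads + ${n:-0}))
        # one /proc/<pid>/fd entry per open descriptor (root needed for others)
        fds=$((fds + $(ls "/proc/$pid/fd" 2>/dev/null | wc -l)))
    done
    echo "OK|threads=$threads fds=$fds"
    return 0                          # Nagios OK state
}

# Run as a script: "user_usage.sh zenoss"; the exit status is the Nagios state
if [ "$#" -gt 0 ]; then
    nagios_user_usage "$1"
    exit $?
fi
```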
Andrew Kirch

Subject: Zenoss up for a few days but python threads are an issue
Author: Ken Jenkins
Posted: 2015-07-30 09:28
* After I applied the latest RUP and restarted Zenoss, Zenoss crashed after a few hours.
The files in use and the process/thread counts I found to be high prior to the crash were:
---------------------
Zenoss files in use check ...
By User ID ...
--------------
USER = zenoss Count = 4075 ...
Zenoss threads in use ...
------------------------------
2940 /opt/zenoss/bin/python /opt 3931
My zenoss statistics cron stopped logging after this last count.
I updated the nproc soft limit in my 90-nproc.conf file (/etc/security/limits.d/90-nproc.conf) from 1024 to 4096 for all users:
* soft nproc 4096
* After I tuned the kernel nproc parameters, I rebooted.
Findings:
1. I have a backup cron job that runs nightly and restarts Zenoss.
2. After a restart, the zenoss Python thread counts reset, start low, and keep building up. They will exceed 4096 threads unless Zenoss is restarted before that happens (i.e., when my backup runs and restarts Zenoss).
3. Once Zenoss is restarted, the Python thread counts drop and the cycle repeats.
Problem Identified:
The last RUP included a PythonCollector update, but it does not look like it fixed the continuous thread build-up. The Python threads need to be cleaned up when they finish processing.
Can someone help me isolate this issue and/or recommend another PythonCollector update to fix the thread count issue?
Thanks,
Ken

Subject: Ken,
Author: Andrew Kirch
Posted: 2015-07-30 10:10
Ken,
You may need to submit a bug detailing the problem you're seeing. http://jira.zenoss.com
Andrew Kirch