TECHZEN Zenoss User Community ARCHIVE  

Zenoss Core 4.2.5 su: cannot set user id: Resource temporarily unavailable

Subject: Zenoss Core 4.2.5 su: cannot set user id: Resource temporarily unavailable
Author: Ken Jenkins
Posted: 2015-07-14 10:28

Team,

I am looking for some help. I am running Zenoss Core 4.2.5 on CentOS 6.6.

In the past two weeks, Zenoss ran into an issue where the portal was inaccessible and I could not log in using the zenoss login. My first attempt to resolve it was to reboot the server, but that did not last long; within a day the issue came back.

I need to know how to isolate and resolve the "Resource temporarily unavailable" error, which only affects the zenoss login ID.

Below are the troubleshooting steps I ran.

From another ID and using root:

sudo su - zenoss
su: cannot set user id: Resource temporarily unavailable

sudo lsof | grep zenoss | wc -l
3940

ps -U zenoss | wc -l
26

ps -U zenoss auxww | grep java
zenoss 14639 0.4 3.3 6764924 547660 Sl Jul13 4:57 java -server -XX:+HeapDumpOnOutOfMemoryError -DZENOSS_COMMAND=zeneventserver -DZENHOME=/opt/zenoss -Djetty.home=/opt/zenoss -Djetty.logs=/opt/zenoss/log -Dlogback.configurationFile=/opt/zenoss/etc/zeneventserver/logback.xml -DZENOSS_DAEMON=y -jar /opt/zenoss/lib/jetty-start-7.5.3.v20111011.jar --config=/opt/zenoss/etc/zeneventserver/jetty/start.config --ini=/opt/zenoss/etc/zeneventserver/jetty/jetty.ini --pre=etc/zeneventserver/jetty/jetty-logging.xml
zenoss 15296 0.0 0.3 2197956 50088 Sl Jul13 0:59 java -server -Xmx512m -cp ./*:/opt/zenoss/zenjmx-libs/*: com.zenoss.zenpacks.zenjmx.ZenJmxMain --configfile /opt/zenoss/etc/zenjmx.conf -zenjmxjavaport 9988 --configfile /opt/zenoss/etc/zenjmx.conf -v 20

rabbitmqctl -p /zenoss list_queues; rabbitmqctl list_connections
Listing queues ...
celery 0
zenoss.queues.zep.migrated.summary 0
zenoss.queues.zep.migrated.archive 0
zenoss.queues.zep.rawevents 0
zenoss.queues.zep.heartbeats 0
zenoss.queues.zep.zenevents 0
zenoss.queues.zep.signal 0
zenoss.queues.zep.modelchange 0
monitoring02.infra.viasatcloud.com.celeryd.pidbox 0
...done.
Listing connections ...
zenoss ::1 42873 running
zenoss 127.0.0.1 57425 running
zenoss ::1 42874 running
zenoss 127.0.0.1 57415 running
zenoss ::1 42860 running
zenoss ::1 42837 running
zenoss ::1 42857 running
zenoss ::1 42847 running
...done.

source /home/zenoss/.bash_profile

zenoss status
su: cannot set user id: Resource temporarily unavailable

zeneventserver stop
stopping...

su - zenoss (I am now able to log in using the zenoss login)

[zenoss@~]$ zenup status

Product: zenoss-core-4.2.5 (id = zenoss-core-4.2.5)
Home: /opt/zenoss
Revision: 203
Upgrading: None
Minimum: 203
Updated On: Wed Apr 29 03:28:53 2015

[zenoss@monitoring02 ~]$ zenoss status
Daemon: zeneventserver not running *** (I stopped the zeneventserver) ***
Daemon: zopectl program running; pid=14728
Daemon: zenrrdcached program running; pid=14733
Daemon: zenhub program running; pid=14785
Daemon: zenjobs program running; pid=14827
Daemon: zeneventd program running; pid=14891
Daemon: zenping program running; pid=14949
Daemon: zensyslog program running; pid=15057
Daemon: zenstatus program running; pid=15038
Daemon: zenactiond program running; pid=15078
Daemon: zentrap program running; pid=15170
Daemon: zenmodeler program running; pid=15154
Daemon: zenperfsnmp program running; pid=15192
Daemon: zencommand program running; pid=15222
Daemon: zenprocess program running; pid=15253
Daemon: zredis program running; pid=15256
Daemon: zenjmx program running; pid=15291
Daemon: zenpython program running; pid=15366

After a few minutes, as zenoss, I run:

[zenoss@~]$ free
-bash: fork: retry: Resource temporarily unavailable

[root@~]# lsof | grep zenoss | wc -l
3724

[root@~]# ps -U zenoss | wc -l
25

Looks like there are a few zenevent processes still running ...

ps -ef | grep zenevent
zenoss 14891 1 0 Jul13 00:00:10 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --daemon
zenoss 14904 14891 0 Jul13 00:00:53 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --duallog
zenoss 14905 14891 0 Jul13 00:00:52 /opt/zenoss/bin/python /opt/zenoss/Products/ZenEvents/zeneventd.py --configfile /opt/zenoss/etc/zeneventd.conf --cycle --duallog
root 16761 10996 0 15:12 pts/0 00:00:00 grep zenevent

I killed the above and, as root, ran:
[root@~]# zenoss status
su: cannot set user id: Resource temporarily unavailable

[root@~]# zeneventd stop
stopping...
already stopped

[root@~]# zenoss status
su: cannot set user id: Resource temporarily unavailable

[root@~]# zensyslog stop
stopping...
[root@~]# zenoss status
Daemon: zeneventserver Java >= 1.6 is required.
Daemon: zopectl program running; pid=14728
Daemon: zenrrdcached program running; pid=14733
Daemon: zenhub program running; pid=14785
Daemon: zenjobs program running; pid=14827
Daemon: zeneventd not running
Daemon: zenping program running; pid=14949
Daemon: zensyslog not running
Daemon: zenstatus program running; pid=15038
Daemon: zenactiond program running; pid=15078
Daemon: zentrap program running; pid=15170
Daemon: zenmodeler program running; pid=15154
Daemon: zenperfsnmp program running; pid=15192
Daemon: zencommand program running; pid=15222
Daemon: zenprocess program running; pid=15253
Daemon: zredis program running; pid=15256
Daemon: zenjmx program running; pid=15291
Daemon: zenpython program running; pid=15366

[root@~]# zeneventserver start
starting...
Waiting for zeneventserver to start.......
[root@~]# zeneventd start
starting...

[zenoss@ zenoss]$ ps -U zenoss
-bash: fork: retry: Resource temporarily unavailable

As root:
zopectl stop

Logged in as zenoss and ran the following to recover:

zenoss stop
zenoss start



Subject: I forgot to mention that the
Author: Ken Jenkins
Posted: 2015-07-14 11:04

I forgot to mention that the /etc/security/limits.conf file is configured with the following soft and hard limits:

zenoss soft nofile 4096
zenoss hard nofile 10240



Subject: Issue is back .. help
Author: Ken Jenkins
Posted: 2015-07-14 15:16

zenoss status
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable



Subject: I suspect java threads may be
Author: Ken Jenkins
Posted: 2015-07-14 16:59

I suspect java threads may be the issue, but I need help to discern this ...

After a zenoss restart, I see ...

[zenoss@ 01-Reports]$ ps auxww | grep 9561; ps uH 9561 | wc -l; ps auxww | grep 8850; ps uH 8850 | wc -l;

zenoss 9561 0.1 0.2 2197956 40360 Sl 21:48 0:00 java -server -Xmx512m -cp ./*:/opt/zenoss/zenjmx-libs/*: com.zenoss.zenpacks.zenjmx.ZenJmxMain --configfile /opt/zenoss/etc/zenjmx.conf -zenjmxjavaport 9988 --configfile /opt/zenoss/etc/zenjmx.conf -v 20
zenoss 10071 0.0 0.0 103244 880 pts/0 S+ 21:57 0:00 grep 9561
23

zenoss 8850 7.5 4.3 6770160 704876 pts/0 Sl 21:47 0:46 java -server -XX:+HeapDumpOnOutOfMemoryError -DZENOSS_COMMAND=zeneventserver -DZENHOME=/opt/zenoss -Djetty.home=/opt/zenoss -Djetty.logs=/opt/zenoss/log -Dlogback.configurationFile=/opt/zenoss/etc/zeneventserver/logback.xml -DZENOSS_DAEMON=y -jar /opt/zenoss/lib/jetty-start-7.5.3.v20111011.jar --config=/opt/zenoss/etc/zeneventserver/jetty/start.config --ini=/opt/zenoss/etc/zeneventserver/jetty/jetty.ini --pre=etc/zeneventserver/jetty/jetty-logging.xml
zenoss 10075 0.0 0.0 103244 872 pts/0 R+ 21:57 0:00 grep 8850
55

What is the HeapDumpOnOutOfMemoryError java process used for?

Thanks,
Ken



Subject: > zenoss soft nofile 4096
Author: Jan Garaj
Posted: 2015-07-14 17:39

> zenoss soft nofile 4096
> zenoss hard nofile 10240
Your limits can be overridden. Check real zenoss limits:
sudo su - zenoss
ulimit -Hn
ulimit -Sn
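Also note that running daemons keep the limits they were started with, so checking a fresh shell is not always enough. You can also check a running zenoss process directly, for example (the PID is only an example, use any running zenoss daemon PID):

grep -iE 'open files|processes' /proc/14785/limits   # 14785 = example PID (zenhub above)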

Devops Monitoring Expert advice: Dockerize/automate/monitor all the things.

DevOps stack: Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform / Elasticsearch



Subject: ulimits are good
Author: Ken Jenkins
Posted: 2015-07-14 18:13

Thanks .. I verified, and the limits match what I configured:

$ ulimit -Hn
10240
$ ulimit -Sn
10240

I noticed that there were over 6000 events from syslog. I cleared those events out.

Can you think of other ways to isolate this resource issue?

Thanks Jan!



Subject: I am still experiencing
Author: Ken Jenkins
Posted: 2015-07-15 10:10

I am still experiencing resource issues. Help is appreciated.



Subject: Checking the zredis.log, I
Author: Ken Jenkins
Posted: 2015-07-15 11:47

Checking zredis.log, I found the following. NOTE that I just updated the vm.overcommit_memory setting.

[30773] 28 Apr 20:30:07 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
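
For reference, this is roughly what I ran as root to apply and persist the setting (a sketch matching the warning above):

sysctl -w vm.overcommit_memory=1                      # apply immediately
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf   # persist across reboots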

in zenhub.log ...

2015-07-05 06:35:43,097 INFO zen.ZenHub: Starting new zenhubworker
2015-07-05 06:35:43,126 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:44,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:46,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:50,127 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:58,128 INFO zen.ZenHub: Worker (24759) reports /opt/zenoss/bin/zenfunctions: fork: Resource temporarily unavailable
2015-07-05 06:35:58,130 WARNING zen.ZenHub: Worker (24759) exited with status: 254 (Fatal error signal: 126)
2015-07-05 06:35:58,130 INFO zen.ZenHub: Starting new zenhubworker
2015-07-05 06:35:58,164 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:35:59,165 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:01,167 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:05,168 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable
2015-07-05 06:36:13,169 INFO zen.ZenHub: Worker (24762) reports /opt/zenoss/bin/zenfunctions: fork: Resource temporarily unavailable
2015-07-05 06:36:13,171 WARNING zen.ZenHub: Worker (24762) exited with status: 254 (Fatal error signal: 126)
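
One thing worth noting from these messages: fork() failing with 'Resource temporarily unavailable' (EAGAIN) can be caused by the per-user process/thread cap as well as by open files, so a rough extra check to run as zenoss is:

ulimit -u                                             # max user processes (threads count against this too)
ps -u zenoss -o nlwp= | awk '{s+=$1} END {print s}'   # total threads currently owned by zenoss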



Subject: Still need assistance
Author: Ken Jenkins
Posted: 2015-07-15 18:29

I am looking for troubleshooting assistance.

Only the Zenoss user is impacted. Other logins do not report resource errors.

I checked all the obvious OS-related resource issues. I need to know what else to check in Zenoss.

Thank you,
Ken



Subject: More info that may help but I am not sure how to resolve this
Author: Ken Jenkins
Posted: 2015-07-15 18:44

The Zenoss MySQL Server graphs show a rise in aborted, bytes, commands, and handlers. Would this play into the resource issue?



Subject: I received a MySQL deadlock
Author: Ken Jenkins
Posted: 2015-07-16 09:59

FYI -> might this relate to the resource issue? There is a count of 50 from yesterday, after I restarted Zenoss. I came in today and the zenoss login was out of resources again. I stopped Zenoss and rebooted my virtual instance. Below is the alert I see, which continues to tally. What can I do to resolve this issue?

LATEST DETECTED DEADLOCK
------------------------
2015-07-16 14:52:36 7f46802d4700
*** (1) TRANSACTION:
TRANSACTION 21783845, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 5 lock



Subject: Continued issues with resources
Author: Ken Jenkins
Posted: 2015-07-16 14:20

2015-07-16 19:19:45,057 INFO zen.ZenHub: Worker (14186) reports /opt/zenoss/bin/zenfunctions: fork: retry: Resource temporarily unavailable



Subject: Update this morning
Author: Ken Jenkins
Posted: 2015-07-17 09:49

Ran into resource issues again. I stopped Zenoss and found these rogue processes still running after the stop.

zenoss 2927 1 0 Jul16 00:01:12 /opt/zenoss/bin/python /opt/zenoss/ZenPacks/ZenPacks.zenoss.PythonCollector-1.4.0-py2.7.egg/ZenPacks/zenoss/PythonCollector/zenpython.py --configfile /opt/zenoss/etc/zenpython.conf --cycle --daemon
root 20897 19901 0 14:42 pts/0 00:00:00 grep zenoss
zenoss 27796 1 0 Jul16 00:03:59 /opt/zenoss/bin/python /opt/zenoss/ZenPacks/ZenPacks.zenoss.PythonCollector-1.4.0-py2.7.egg/ZenPacks/zenoss/PythonCollector/zenpython.py --configfile /opt/zenoss/etc/zenpython.conf --cycle --daemon



Subject: Jenkins,
Author: Andrew Kirch
Posted: 2015-07-17 10:45

Jenkins,

I believe this will be of help:
https://www.memonic.com/user/pneff/folder/linux/id/1gNgT

This is definitely a maxFD issue.

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: maxFD issue follow up
Author: Ken Jenkins
Posted: 2015-07-17 12:12

I think you have me pointed in the right direction now. (Thank you!)

I performed due diligence to troubleshoot the zenoss processes running, memory, disk space, and java threads in use. Your mention of maxFD being the issue helped me focus on the cause a bit more.

What I found out is listed below:

(NOTE: The Zenoss documentation may need to be updated to reflect this.)

I checked the system-wide sysctl limit as root, and it seemed adequate:

cat /proc/sys/fs/file-max
1620946

sysctl fs.file-max
fs.file-max = 1620946

The Zenoss instructions, and the recommendations in this post, suggest setting soft and hard limits in the /etc/security/limits.conf file as below, which is what I did:

zenoss soft nofile 4096
zenoss hard nofile 10240

In the zenoss .bash_profile I have this set:

if [ "${USER}" = "zenoss" ]; then
ulimit -n 10240
fi

I verified the limits in effect for the zenoss login:

[zenoss@monitoring02 ~]$ ulimit -Hn; ulimit -Sn
10240
10240

The problem seems to come down to the Zenoss-recommended soft limit set in the /etc/security/limits.conf file.

I ran a check on the files in use by zenoss, and the count exceeds the 4096 soft limit.
---------------------
Zenoss files in use check ...

4115
---------------------
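
For reference, the check above boils down to something like this (a rough sketch; my actual check script differs a bit):

lsof -u zenoss 2>/dev/null | wc -l   # open-file rows for the zenoss user (includes a header line)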

To fix this I upped the soft limit in the /etc/security/limits.conf file to 10240.

I will monitor the situation for the day to see if this resolves my issue.

I appreciate the quick response.

Thanks,
Ken



Subject: I would strongly suggest
Author: Andrew Kirch
Posted: 2015-07-20 09:29

I would strongly suggest looking at underlying OS/kernel tuning. Scaling issues with the underlying OS are a problem you will run into. I even more strongly suggest documenting the changes made in case you need to duplicate them later. You may need to increase these numbers much further. I know that the Hybrid IRCD (the first time I ran into scaling issues with FD's) wanted hundreds of thousands of FD's. Increasing them further shouldn't hurt anything.

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: Thanks for the heads up. I
Author: Ken Jenkins
Posted: 2015-07-20 10:10

Thanks for the heads up. I raised the FD's to 102400.

I find it strange that when I run lsof on zenoss I only see about 4900 files in use, yet the ID still runs out of resources. Is there something I am missing with running lsof on the zenoss login? I did not attempt to verify how many rabbitmq or mysql files were open under root.
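
One caveat that may explain part of the gap: plain lsof rows include cwd, txt and mem (mapped library) entries that do not count against the nofile limit, so the raw count is only approximate. A rough way to count actual descriptors instead (run as root so /proc is readable):

for pid in $(pgrep -u zenoss); do ls /proc/$pid/fd 2>/dev/null; done | wc -l   # real FDs held by zenoss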

I will see if bumping the FD limit associated with the zenoss login to 102400 helps. It is strange that this problem just started on July 4th; Zenoss Core 4 had been running smoothly for several months prior.

I will monitor the situation.



Subject: FD's go up as the number of
Author: Andrew Kirch
Posted: 2015-07-20 11:42

FD's go up as the number of systems monitored goes up.

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: Issue still exists ...
Author: Ken Jenkins
Posted: 2015-07-21 19:33

I upped the FD limit to 500000 for the zenoss and root logins. I will let this run overnight and see what happens. There is nothing indicating that I even exceeded 100000 files in use, though. Suggestions are appreciated as to how I can isolate the FD's in use.
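
In the meantime, here is a rough sketch of how I plan to break the usage down per process (run as root so every /proc/<pid>/fd is readable):

for pid in $(pgrep -u zenoss); do
    echo "$(ls /proc/$pid/fd 2>/dev/null | wc -l) $pid $(ps -o comm= -p $pid)"
done | sort -rn | head   # biggest FD consumers first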



Subject: I get the sense that I FD
Author: Ken Jenkins
Posted: 2015-07-22 16:15

I get the sense that FD limits are not the issue now ... I upped my FD limit to 600000, but it looks like the process threads keep spawning ...



Subject: Still need assistance
Author: Ken Jenkins
Posted: 2015-07-22 20:06

Using Zenoss Core 4 I am only monitoring 106 devices.

With the FD limit at 500000 Zenoss ran longer, but it still ran out of resources. I bumped the ulimit soft and hard counts to 512000.

Zenoss files in use check ...

By User ID ...
--------------
USER = centos Count = 16 ...
USER = dbus Count = 4 ...
USER = memcached Count = 4 ...
USER = mysql Count = 61 ...
USER = ntp Count = 13 ...
USER = rabbitmq Count = 20 ...
USER = root Count = 524 ...
USER = rpc Count = 12 ...
USER = rpcuser Count = 4 ...
USER = smmsp Count = 4 ...
USER = zenoss Count = 4046 ...
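
For reference, the per-user breakdown above comes from something along these lines (a sketch; my script just formats the output differently). Field 3 of the default lsof output is the owning user:

lsof 2>/dev/null | awk 'NR>1 {print $3}' | sort | uniq -c | sort -rn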

Let me know if any tuning fixes come to mind.



Subject: Thread counts ...
Author: Ken Jenkins
Posted: 2015-07-22 20:09

I plan to monitor the thread counts now for the java, mysql and rabbitmq processes.

Can someone tell me what the java HeapDump is used for?

The thread count output below shows the PID, process and thread count.

---------------------
Zenoss java threads in use ...
13290 java -server -XX:+HeapDumpO 53
15141 java -server -Xmx512m -cp . 22
---------------------
Zenoss mysql threads in use ...
1284 /bin/sh /usr/bin/mysqld_saf 1
---------------------
Zenoss rabbitmq threads in use ...
1504 /bin/sh /usr/sbin/rabbitmq- 1
---------------------
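
For reference, the counts come from the NLWP field (number of light-weight processes, i.e. threads). Per PID it is roughly:

ps -o pid,nlwp,cmd -p 13290   # NLWP column = thread count; 13290 is the first java PID above

The same line works against the mysqld_safe and rabbitmq PIDs; my script just truncates the command string and moves the count to the end.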



Subject: Testing access
Author: Ken Jenkins
Posted: 2015-07-23 11:38

I was blocked from sending updates ... this is a test.



Subject: Upgrade to the latest RPS
Author: Andrew Kirch
Posted: 2015-07-23 12:22

Upgrade to the latest RPS with ZenUp; that should help, but it may not fully resolve the issue.

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: you can write a simple
Author: Andrew Kirch
Posted: 2015-07-23 16:20

You can write a simple command datasource to read, parse and return it in Nagios format:
OK|var1=val1 var2=val2 and so on.
Here is an example I wrote last year:
http://wiki.zenoss.org/Newsletter:7/Arduino_and_Zenoss

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: Zenoss up for a few days but python threads are an issue
Author: Ken Jenkins
Posted: 2015-07-30 09:28

* After applying the latest RUP and restarting Zenoss, Zenoss crashed after a few hours.

The files-in-use and process/thread counts I found to be high prior to the crash were:
---------------------
Zenoss files in use check ...

By User ID ...
--------------
USER = zenoss Count = 4075 ...

Zenoss threads in use ...
------------------------------
2940 /opt/zenoss/bin/python /opt 3931

My zenoss statistics cron stopped logging after this last count.

I updated my 90-nproc.conf file from 1024 to 4096 for all users
* soft nproc 4096
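
(On CentOS 6 this file is /etc/security/limits.d/90-nproc.conf, and its 1024 default silently caps processes and threads per user regardless of what limits.conf says.) A quick verification in a fresh zenoss shell:

ulimit -u   # max user processes; should now report 4096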

* After I tuned the nproc limits, I rebooted.

Findings:

1. I have a backup cron job which runs nightly, and it restarts Zenoss.
2. After a restart, the zenoss python thread count starts low and keeps building, and it will exceed 4096 threads unless Zenoss is restarted just before that point (i.e. when my nightly backup runs and Zenoss is restarted); see the sketch after this list for how I track the count.
3. Once Zenoss is restarted, the python thread count drops low and the process repeats all over again.
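
Here is the sketch mentioned above for tracking the build-up between restarts (the zenpython.py match is based on the process list I posted earlier):

for pid in $(pgrep -u zenoss -f zenpython.py); do
    echo "$pid $(ps -o nlwp= -p $pid) threads"
done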

Problem Identified:

The last RUP update provided a python collector update, but it does not look like it fixed the continuous thread count build-up. The python threads need to be cleared when they are done processing.

Can someone help me isolate this issue and/or recommend another python collector update to fix the thread count issue?

Thanks,
Ken



Subject: Ken,
Author: Andrew Kirch
Posted: 2015-07-30 10:10

Ken,

You may need to submit a bug detailing the problem you're seeing. http://jira.zenoss.com

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard


