TECHZEN Zenoss User Community ARCHIVE  

Zenoss runs for a while then quits responding

Subject: Zenoss runs for a while then quits responding
Author: Matt Carter
Posted: 2019-05-09 11:55


I'm experiencing an issue where Zenoss will run and monitor with no issues for a couple of weeks, then it just stops responding. I can't get to the web interface for Zenoss or the control panel for the underlying processes.

I end up having to PuTTY into the server itself and restart services.

When it comes back up, I have no data from whatever point it decided to quit. Without checking on it constantly, I have no idea when it stops responding; I only discover it the next time I need to check on something.

I did not have these kinds of issues in Z4. The fact that it continues to behave in this manner, despite my having thrown tons of resources at the server, is very quickly turning me away from the product altogether.

Matt C

Subject: RE: Zenoss runs for a while then quits responding
Author: Jason Olson
Posted: 2019-05-09 16:24

What version are you running? Zenoss prior to v6.2.1 did have a memory leak in one of the core processes (I'm blanking on which one, as it's been a while since it affected us) which would cause the instance to fail after all memory was consumed on the device.

Also, is the Zenoss instance in a virtual machine or on a physical box? I've seen crashes in various processes happen due to dodgy RAM.

Lastly, does it pull metrics for a while, fail to gather some, then at some point later start collecting again? If so, that could be caused by Zenoss being unable to query disk metrics for unnamed volumes on Windows servers, generally recovery partitions and Cluster volumes. Excluding those types in the zFileSystemMapIgnoreNames configuration property may help solve that issue.
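
Before changing zFileSystemMapIgnoreNames, it can help to try a candidate pattern against the volume names you actually see. The sketch below is only an illustration: it assumes the property is treated as a Python regular expression searched against each filesystem name, and both the pattern and the sample names are hypothetical, so substitute the names from your own modeled devices.

```python
import re

# Hypothetical candidate value for zFileSystemMapIgnoreNames.
# Intended to match unnamed volumes (\\?\Volume{GUID}\) and cluster volumes.
IGNORE_PATTERN = r"^\\\\\?\\Volume\{|ClusterStorage"


def ignored(names, pattern=IGNORE_PATTERN):
    """Return the names the pattern would exclude (assumes search semantics)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# Hypothetical sample filesystem names, not taken from a real device.
samples = [
    "C:\\",
    "D:\\",
    "\\\\?\\Volume{5f3c1a2e-0000-0000-0000-100000000000}\\",
    "C:\\ClusterStorage\\Volume1",
]

if __name__ == "__main__":
    for name in ignored(samples):
        print("would ignore:", name)
```

If the pattern catches a volume you still want monitored, tighten it before applying it to the device class.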

Jason Olson

Subject: RE: Zenoss runs for a while then quits responding
Author: Jay Stanley
Posted: 2019-05-09 19:42

Are you monitoring the instance using the ZenossRM device class and the ControlCenter device class?

I would watch memory usage and disk usage of the pools. Is this a single-host system or multi-host?
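
As a rough stopgap for watching memory and disk, something like the following could run from cron on the host. It is a minimal sketch, not part of Zenoss: it only looks at a mounted filesystem and at /proc/meminfo on Linux, the thresholds and function names are made up for illustration, and it does not inspect the storage pools themselves, which would need their own tooling.

```python
import shutil


def disk_used_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def mem_available_percent(meminfo_text):
    """MemAvailable as a percentage of MemTotal, parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the numeric value
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


def check(disk_pct, mem_avail_pct, disk_limit=85.0, mem_floor=10.0):
    """Return warning strings; the default thresholds are illustrative only."""
    problems = []
    if disk_pct >= disk_limit:
        problems.append("disk %.1f%% used" % disk_pct)
    if mem_avail_pct <= mem_floor:
        problems.append("only %.1f%% memory available" % mem_avail_pct)
    return problems


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:  # Linux only
        mem_pct = mem_available_percent(fh.read())
    for problem in check(disk_used_percent("/"), mem_pct):
        print("WARNING:", problem)
```

Logging the output with a timestamp would also show roughly when resources ran out, even if the instance itself has stopped collecting.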

