One way I do know things are "restarted" as a norm is the "config push" that happens every 6 hours, where ZenHub re-tells every collection container what it is supposed to be doing. If things always worked perfectly, there would never be a need to re-push. Modeling is based on potential change; a config push every 6 hours assumes it is needed for some reason.
Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-22 02:15
I agree with you; I hate "restart to fix". While I don't mind having something in place until a fix is completed, a restart will never be a permanent solution in any of my environments. I even look for those restarts in a SaaS solution.
zenhub does that because the collection daemons cache configurations for speed, so they do not have to pull configs for all devices every cycle. Six hours is the default interval at which those daemons are forced to re-pull their configuration. Normally it is not needed, because of invalidations, but sometimes it is.
zope, zendebug, zenapi, zauth, and zenreports all get restarted every night because, one, zope sucks with memory usage and, two, there is a bug in zope where threads get "deadlocked".
I am not sure why you are restarting metricshipper; I don't see that break.
zenpython has had a rocky history with memory leaks. It's usually from a library used by some custom datasource, or just a poorly written datasource. The Windows ZP comes to mind. I don't restart that one often: I wait until the memory usage is high, get Zenoss involved to find the root cause, then restart. Once they know what the issue is, I restart it as needed. I leave it as a manual process so that it annoys me, which in turn makes me pressure Zenoss for a fix. :)
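To make the caching idea concrete, here is a minimal Python sketch of a config cache that re-pulls either when something invalidates it or when a maximum age has passed. This is not Zenoss code: the class name `CachedConfig`, the `fetch` callable, and the `clock` parameter are all made up for illustration, and only the 6-hour default comes from the discussion above.

```python
import time


class CachedConfig:
    """Hypothetical config cache (illustration only, not the Zenoss implementation)."""

    def __init__(self, fetch, max_age_seconds=6 * 3600, clock=time.monotonic):
        self._fetch = fetch              # callable that pulls a fresh config
        self._max_age = max_age_seconds  # forced re-pull interval (6 hours by default)
        self._clock = clock              # injectable clock, handy for testing
        self._value = None
        self._loaded_at = None           # None means "nothing cached yet"

    def invalidate(self):
        """Mark the cached config as stale, e.g. after a known change."""
        self._loaded_at = None

    def get(self):
        """Return the cached config, re-pulling if it is missing or too old."""
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._max_age
        if stale:
            self._value = self._fetch()
            self._loaded_at = now
        return self._value
```

The age check is the safety net: if an invalidation is ever missed, the cached copy is still replaced after at most `max_age_seconds`.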
------------------------------
jstanley
------------------------------
Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-23 06:35
Thanks for the recap; our thinking is very similar. I restarted ZenPython manually for over half a year until a recurring "task pileup" was fixed.
------------------------------
Luke
------------------------------
Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-23 08:43
Ah, yeah, the Windows tasks. I set up monitoring of the queued and running tasks for that, and then did a controlled, scripted restart every Monday or Friday. Did it with zencommand as well. But I have not seen the issue (task buildup) since moving to 6.x.
I am hoping the rework (to use .NET) and separation of the ZP will fix a few issues.
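As a rough Python sketch of that approach (watch the queued and running task counts, alert on a sustained pileup, restart only on fixed days), something like the following could work. Everything here is hypothetical: the `TaskStats` type, the `plan_action` function, and the thresholds are invented for illustration, and how the counts are collected or how the restart is actually triggered is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TaskStats:
    queued: int   # tasks waiting to run
    running: int  # tasks currently running


def pileup_detected(samples, queued_limit=100, min_consecutive=3):
    """True when the last `min_consecutive` samples all exceed `queued_limit`."""
    if len(samples) < min_consecutive:
        return False
    recent = samples[-min_consecutive:]
    return all(s.queued > queued_limit for s in recent)


def plan_action(samples, today, restart_weekdays=(0, 4)):
    """Return "restart" on scheduled days (Monday=0, Friday=4 by default),
    "alert" when a pileup is seen on any other day, and "none" otherwise."""
    if today.weekday() in restart_weekdays:
        return "restart"
    if pileup_detected(samples):
        return "alert"
    return "none"
```

Requiring several consecutive high samples keeps a single busy cycle from raising an alert; the limit of 100 queued tasks is a placeholder and would need tuning per environment.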
------------------------------
jstanley
------------------------------
Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-23 10:25
Just had it this weekend with ZenPython timing out on a Calculated metric for an SSH device, so monitoring running tasks is still relevant. Not sure whether it's a new-with-6.2 thing or just that we doubled the number of devices that we're calculating on.
I also want to be able to assign separate things to separate ZenPython instances: one for NetApp, one for WinRM, one for WSMan, one for Calculated metrics.
------------------------------
Luke
------------------------------
Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-23 13:10
I think there was a bug with that ZP and it was fixed. What version are you running?
Or maybe I am thinking of the Duration ZP.
------------------------------
jstanley
------------------------------
Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-24 08:06
I'll post back here if support associates it with a fix; mainly I was just pointing out I do need to alert on the metric you listed.
------------------------------
Luke
------------------------------