TECHZEN Zenoss User Community ARCHIVE  

Regular restarts of containers 5/6.x

Subject: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-20 14:32

In the last few weeks we've been advised to restart zauth, zope, zope reports, zope debug, zope api, and metric shipper daily. We've also been advised there is a clear zenpython memory issue (I think it's just the fundamental way Python works) and to restart it (weekly?).

With CC 1.5.1 we now have the option to restart services when they exceed a configurable percentage (1-100%) of the memory we define for them. That seems to be a better way to handle zenpython than just restarting it on a schedule. The others aren't related to a memory issue, so regular restarts seem to be the only approach. 
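
The threshold idea described above can be sketched as a simple comparison. This is only an illustration of the arithmetic, not how Control Center implements it; the function name, parameter names, and the 4 GiB / 90% figures are assumptions made for the example.

```python
def should_restart(used_bytes, committed_bytes, threshold_percent):
    """Return True when usage exceeds threshold_percent of the committed RAM.

    Hypothetical helper: the names and semantics are assumptions, not the
    Control Center API.
    """
    if committed_bytes <= 0:
        raise ValueError("committed_bytes must be positive")
    if not 1 <= threshold_percent <= 100:
        raise ValueError("threshold_percent must be between 1 and 100")
    # Cross-multiply instead of dividing to avoid rounding surprises.
    return used_bytes * 100 > committed_bytes * threshold_percent


GIB = 1024 ** 3

# Assumed numbers: a service defined to use 4 GiB, restart above 90%.
print(should_restart(3 * GIB, 4 * GIB, 90))        # usage is 75% of the definition
print(should_restart(int(3.8 * GIB), 4 * GIB, 90)) # usage is about 95% of the definition
```

A check like this only reacts to memory growth, which is why it fits the zenpython case and not the services that are restarted for other reasons.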

Fundamentally, I find it disappointing to restart things, especially as frequently as once a day. I figure it's a thing that happens in SaaS products without us knowing, but generally speaking it lets issues fester and grow until you can't get around them by restarting...and then you have severe/extended outages. 

But this post is primarily intended to find out if there are other things that people have decided to regularly restart and/or been advised by support to regularly restart. Secondary was to see if anyone else is concerned about that approach to managing things. 

I know there's a trend these days to "shoot the cow" (e.g. don't troubleshoot something broken, just restart it, or if it's a VM with automated builds, just erase it and build a new one). I couldn't find a link to elaborate on this; if someone has one, I would love to have it. I did find this: New and Improved Economic Cow Jokes (WIRED).

Anyway, to recap:

1. What other things do people regularly restart? 

2. Thoughts on the philosophy behind restarting Zenoss things on a schedule? 



------------------------------
Luke Lofgren
Infrastructure Architect
Acxiom Corporation (home based associate)
Waterford PA
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-20 17:05

One way I do know things are "restarted" as a norm is the "config push" that happens every 6 hours, where ZenHub re-tells every collection container what it is supposed to be doing. If things always worked perfectly, there would never be a need to re-push. Modeling is based on potential change; a config push every 6 hours assumes it is needed for some reason. 

What other things in Zenoss "restart" or are "refreshed" on their own?

------------------------------
Luke
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-22 02:15

I agree with you; I hate "restart to fix". And while I don't mind having something in place until a fix is completed, a restart will never be permanent in any of my environments. I even look for those restarts in a SaaS solution.

zenhub does that because collection daemons cache configurations for speed, so they do not have to pull configs for all devices every cycle time. 6 hours is the default interval at which those daemons are forced to re-pull a configuration. Normally it is not needed, because of invalidations, but sometimes it is.
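
The caching behavior described above (serve from cache, drop an entry on invalidation, force a re-pull after 6 hours) can be sketched roughly as follows. This is a generic illustration, not ZenHub or collector daemon code; `ConfigCache`, `fetch`, and the injected `clock` are hypothetical names chosen for the example.

```python
import time

SIX_HOURS = 6 * 60 * 60  # the default forced re-pull interval, in seconds


class ConfigCache:
    """Cache per-device configs; re-pull on invalidation or when too old."""

    def __init__(self, fetch, max_age=SIX_HOURS, clock=time.monotonic):
        self._fetch = fetch      # callable: device_id -> config (assumed)
        self._max_age = max_age
        self._clock = clock      # injectable so the age check can be tested
        self._entries = {}       # device_id -> (config, fetched_at)

    def invalidate(self, device_id):
        """Drop one cached config so the next get() pulls it again."""
        self._entries.pop(device_id, None)

    def get(self, device_id):
        now = self._clock()
        entry = self._entries.get(device_id)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]      # still fresh: no pull needed
        config = self._fetch(device_id)
        self._entries[device_id] = (config, now)
        return config
```

If invalidations were never missed, the age check would never trigger a pull that mattered; the periodic re-pull is the safety net for the cases where they are.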

zope, zendebug, zenapi, zauth, zenreports all get restarted every night because, one, zope sucks with memory usage and, two, there is a bug in zope where threads get "deadlocked".

I am not sure why you are restarting metricshipper, I don't see that break.

zenpython has had a rocky history with memory leaks; it's usually from a library being used by some custom datasource or just a poorly written datasource. Windows ZP comes to mind. I don't restart that one often. I wait until the memory usage is high, get Zenoss involved to find the root cause, then restart. Once they know what the issue is, then I will restart it as needed. I leave it as a manual process to be an annoyance to me, which in turn makes me pressure Zenoss for a fix. :)

------------------------------
jstanley
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-23 06:35

Thanks for the recap; our thinking is very similar...I restarted zenPython manually for over half a year until a recurring "task pileup" was fixed.

------------------------------
Luke
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-23 08:43

Ah, yeah, the Windows tasks. I set up monitoring of the queued and running tasks for that, and then did a controlled scripted restart every Monday or Friday. Did it with zencommand as well. But I have not seen the issue (task buildup) since moving to 6.x.
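
A rough sketch of alerting on a task pileup, as opposed to the fixed Monday/Friday schedule described above: flag the daemon only when the queued-task count stays above a limit for several samples in a row. The class name, the limit of 100, and the sample values are assumptions for illustration; how the counts are collected and what action follows are left out.

```python
from collections import deque


class PileupDetector:
    """Flag a pileup when queued tasks exceed a limit N samples in a row."""

    def __init__(self, max_queued, consecutive):
        if consecutive < 1:
            raise ValueError("consecutive must be at least 1")
        self._max_queued = max_queued
        # Keep only the most recent `consecutive` over-limit flags.
        self._recent = deque(maxlen=consecutive)

    def observe(self, queued):
        """Record one sample; return True if every recent sample was over."""
        self._recent.append(queued > self._max_queued)
        return len(self._recent) == self._recent.maxlen and all(self._recent)


# Assumed samples of the queued-task metric, one per collection cycle.
detector = PileupDetector(max_queued=100, consecutive=3)
for sample in [150, 160, 90, 120, 130, 140]:
    print(sample, detector.observe(sample))
```

Requiring consecutive samples keeps a single busy cycle from triggering an alert or restart.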

I am hoping the rework (to use .net) and separation of the ZP will fix a few issues.

------------------------------
jstanley
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-23 10:25

Just had it this weekend with ZenPython timing out on a Calculated metric for an SSH device. So running tasks is still relevant. Not sure whether it's a new-with-6.2 thing or just that we doubled the number of devices that we're calculating on.

I also want separate things to be able to be assigned to separate ZenPython...one for NetApp, one for WinRM, one for WSMan, one for Calculated metrics. 

------------------------------
Luke
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Jay Stanley
Posted: 2018-07-23 13:10

I think there was a bug with that ZP and it was fixed. What version are you running?

Or maybe I am thinking of Duration ZP

------------------------------
jstanley
------------------------------


Subject: RE: Regular restarts of containers 5/6.x
Author: Luke Lofgren
Posted: 2018-07-24 08:06

I'll post back here if support associates it with a fix; mainly I was just pointing out I do need to alert on the metric you listed.

------------------------------
Luke
------------------------------

