Zenoss ZenTech Community

I have been trying to figure out how to scale Zenoss Core 5 to large numbers of devices. I am running a 5.0.9 instance with 12 CPUs and 96 GB of memory, and I've found that it can stably support 1000 devices. However, once I add any more devices beyond that, I start to get the following two events (in journalctl -u serviced):

serviced[29704]: 2016/02/10 18:46:06 Unsolicited response received on idle HTTP channel starting with "H"; err=

serviced[29704]: 2016/02/10 18:46:06 Failed to connect to 127.0.0.1:2181: dial tcp 127.0.0.1:2181: i/o timeout

These two events cause health check timeouts on a few services, mainly the metric consumer. GUI sessions on the dashboard are also affected: a pop-up box keeps appearing saying that the page needs to be reloaded. Most importantly, this causes false critical "IP down" events on multiple random devices at once.
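A quick way to see whether these two errors track the device count is to count them per minute in the serviced journal and line that up with the times each batch was loaded. Below is a minimal Python sketch, assuming the message text matches the two lines quoted above (other serviced versions may word them differently); the script name serviced_errors.py and the signature labels are only illustrative.

```python
import re
import sys
from collections import Counter

# Signatures of the two serviced errors quoted above. The wording is an
# assumption taken from this post's log lines, not from serviced's source.
PATTERNS = {
    "unsolicited_http": re.compile(r"Unsolicited response received on idle HTTP channel"),
    "zk_timeout_2181": re.compile(r"Failed to connect to 127\.0\.0\.1:2181: .*i/o timeout"),
}

# serviced prefixes its messages with "YYYY/MM/DD HH:MM:SS"; keep the minute.
TIMESTAMP = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2})")


def count_errors(lines):
    """Count each error signature per minute.

    Returns a Counter keyed by (signature_name, "YYYY/MM/DD HH:MM").
    """
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                match = TIMESTAMP.search(line)
                minute = match.group(1) if match else "unknown"
                counts[(name, minute)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: journalctl -u serviced --since today > serviced.log
    #          python3 serviced_errors.py serviced.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (name, minute), n in sorted(count_errors(log).items()):
            print(f"{minute}  {name:<18} {n}")
```

If the per-minute counts jump right after a particular batch finishes loading, that narrows down where the threshold sits.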

At first I thought CPU was my main constraint, since the load averages and %used rose as I went from 250 to 500 to 1000 devices. However, there was no major difference in %used or load average between 1000 and 1200 devices, and certainly nothing that would explain these errors.
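Load averages are easier to compare across device counts when divided by the number of CPUs (12 here). Note that on Linux the load average also counts tasks in uninterruptible sleep, which is usually disk I/O wait, so load can rise from storage contention and not only from CPU. A minimal sketch; the "about 1.0 per core" threshold is a common rule of thumb, not a Zenoss sizing guideline, and the helper name is hypothetical.

```python
import os


def load_per_core(load_averages, cpu_count):
    """Divide the 1/5/15-minute load averages by the number of CPUs.

    Rule of thumb (assumption): values that stay above roughly 1.0 mean
    tasks are queuing, either for CPU or, on Linux, for I/O.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be >= 1")
    return tuple(round(load / cpu_count, 2) for load in load_averages)


if __name__ == "__main__":
    # os.getloadavg() is Unix-only; os.cpu_count() counts logical CPUs.
    print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
```

Recording this value at 1000 and again at 1200 devices gives a like-for-like comparison rather than eyeballing raw load numbers.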

Memory is definitely not part of the issue: adding large increments of devices does not noticeably increase memory usage.

I have been using the zenbatchload script inside zenhub to load large batches of servers, usually 200-500 at a time, and I had no issues with this process until around 1100-1200 devices. That is when those two events show up in journalctl and begin affecting the system as described above.
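To pin down where between 1000 and 1200 devices the errors begin, it may help to load in smaller, fixed-size increments and check the journal after each one. The sketch below splits a device list into numbered batch files. It assumes the simple zenbatchload layout of a device class path line followed by one device per line; `chunk`, `write_batches`, and the `batch_NNN.txt` naming are hypothetical helpers, not part of Zenoss.

```python
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_batches(device_class, device_lines, size, out_dir, prefix="batch"):
    """Write one zenbatchload-style file per chunk (hypothetical helper).

    Assumed layout: the device class path (e.g. /Devices/Server/Linux) on
    the first line, then one device line per row, copied unchanged.
    Returns the list of paths written, in load order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(chunk(device_lines, size), start=1):
        path = out / f"{prefix}_{n:03d}.txt"
        path.write_text("\n".join([device_class, *batch]) + "\n")
        paths.append(path)
    return paths
```

Each file can then be fed to zenbatchload the same way as the existing batches, with a pause after each to check journalctl before loading the next.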

Does this seem to be a resource issue? Would adding a resource host help? I know I'd have to add one with identical resources, but would that actually help? I would prefer to build out collectors to balance the load, but clearly that's not an option yet, and who knows how long it will be until it's developed. (I have looked into building my own version of a collector, but have not been very successful so far.)

Could everyone else share their resources along with their device counts so we can compare? It would be useful to have recommended resources for different numbers of devices in the Administration or Installation guides, rather than just the minimum resources needed to run Zenoss 5.

I'm really not sure where to go from here so if anyone has any suggestions, I'd appreciate them.

Subject:	it might be peaking out on
Author:	Andrew Kirch
Posted:	2016-03-21 08:40

It might be peaking out on bus or storage. You may also want to move MySQL and Hadoop onto separate nodes in a Control Center cluster; that should give you some additional headroom.
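One way to test the storage theory on the master host is an iostat-style %util figure computed from /proc/diskstats, sampled while devices are being added. A minimal Linux-only sketch: it uses the "time spent doing I/Os" counter (the 10th statistic after the device name), and the function names are illustrative. On RAID or SSD arrays a high %util does not necessarily mean the device is saturated, so treat it as a hint rather than proof.

```python
import time


def parse_diskstats(text):
    """Map device name -> io_ticks (ms spent doing I/O) from /proc/diskstats.

    Layout: major minor name, then the statistics; io_ticks is the 10th
    statistic, i.e. index 12 after splitting the line on whitespace.
    """
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            ticks[fields[2]] = int(fields[12])
    return ticks


def util_percent(before, after, interval_s):
    """Approximate %util per device between two samples taken interval_s apart."""
    interval_ms = interval_s * 1000.0
    return {
        dev: round(min(100.0, (after[dev] - before[dev]) / interval_ms * 100.0), 1)
        for dev in after
        if dev in before
    }


def sample(interval_s=5.0):
    """Read /proc/diskstats twice, interval_s apart, and return %util per device."""
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    return util_percent(first, second, interval_s)
```

Calling print(sample()) during a batch load and comparing the disks holding the MySQL and Hadoop data against the rest would show whether one volume sits near 100% while the errors appear.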

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard.

Subject:	Zenoss 5 Scaling
Author:	[Not Specified]
Posted:	2016-02-10 13:35
