Subject: | Metric Shipper Crashing |
Author: | [Not Specified] |
Posted: | 2015-05-19 08:59 |
So everything was going well with my move to Zenoss 5, and then one day I noticed I wasn't getting stats on the graphs.
It turns out MetricShipper is crashing, which in turn causes MetricConsumer to break, so no data is being passed to OpenTSDB.
I thought it might be too little RAM or too many devices (688), but when I removed all devices the same problem persisted.
I have no idea what these two services do or how they run, so I need some help figuring out where to start debugging.
Thanks!!
(This is version 5.0.0)
Subject: | check the log file for metric |
Author: | Andrew Kirch |
Posted: | 2015-05-20 02:16 |
Check the log file for MetricShipper, and if you have questions, pastebin the log file for review.
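For reference, one rough way to pull those logs from the Control Center master host (a sketch only; it assumes the serviced CLI is on your PATH and that the service is named "metricshipper" in your deployment):

    # Sketch only -- run on the Control Center master host.
    serviced service list | grep -i metricshipper   # confirm the service name/ID first
    serviced service logs metricshipper             # dump the service's recent log output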
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard.
Subject: | Looks like one of the health |
Author: | [Not Specified] |
Posted: | 2015-05-20 12:50 |
Looks like one of the health checks is failing. The logs look something like this:
2015/05/20 17:38:08 200 2.088ms POST /api/metrics/store
2015/05/20 17:38:23 200 3.076507ms POST /api/metrics/store
2015/05/20 17:38:29 200 3.259981ms POST /api/metrics/store
2015/05/20 17:38:31.386361 Registrar received 3 events
2015/05/20 17:38:38 200 2.412297ms POST /api/metrics/store
2015/05/20 17:38:53 200 2.681042ms POST /api/metrics/store
2015/05/20 17:38:59 200 1.792652ms POST /api/metrics/store
2015/05/20 17:39:01.386538 Registrar received 3 events
2015/05/20 17:39:08 200 2.459188ms POST /api/metrics/store
W0520 17:39:09.056406 00001 controller.go:777] Health check store_answering failed.
2015/05/20 17:39:23 200 3.111019ms POST /api/metrics/store
2015/05/20 17:39:29 200 3.3729ms POST /api/metrics/store
2015/05/20 17:39:31.385544 Registrar received 3 events
2015/05/20 17:39:38 200 3.229055ms POST /api/metrics/store
2015/05/20 17:39:53 200 2.150656ms POST /api/metrics/store
2015/05/20 17:39:59 200 3.459016ms POST /api/metrics/store
2015/05/20 17:40:01.387814 Registrar received 3 events
2015/05/20 17:40:08 200 11.759851ms POST /api/metrics/store
2015/05/20 17:40:23 200 4.441657ms POST /api/metrics/store
2015/05/20 17:40:29 200 1.924477ms POST /api/metrics/store
2015/05/20 17:40:31.386426 Registrar received 3 events
2015/05/20 17:40:38 200 3.279974ms POST /api/metrics/store
2015/05/20 17:40:53 200 3.618228ms POST /api/metrics/store
2015/05/20 17:40:59 200 2.761641ms POST /api/metrics/store
2015/05/20 17:41:01.385492 Registrar received 3 events
2015/05/20 17:41:08 200 5.060643ms POST /api/metrics/store
2015/05/20 17:41:23 200 3.170224ms POST /api/metrics/store
2015/05/20 17:41:29 200 5.183186ms POST /api/metrics/store
2015/05/20 17:41:31.387764 Registrar received 3 events
2015/05/20 17:41:38 200 1.96729ms POST /api/metrics/store
2015/05/20 17:41:53 200 1.847357ms POST /api/metrics/store
2015/05/20 17:41:59 200 1.786847ms POST /api/metrics/store
2015/05/20 17:42:01.385189 Registrar received 3 events
2015/05/20 17:42:08 200 4.218909ms POST /api/metrics/store
2015/05/20 17:42:23 200 4.361328ms POST /api/metrics/store
2015/05/20 17:42:29 200 2.002849ms POST /api/metrics/store
2015/05/20 17:42:31.387018 Registrar received 3 events
2015/05/20 17:42:38 200 2.797854ms POST /api/metrics/store
2015/05/20 17:42:53 200 2.970742ms POST /api/metrics/store
W0520 17:43:01.280346 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:43:01.388283 Registrar received 3 events
2015/05/20 17:43:08 200 3.180024ms POST /api/metrics/store
2015/05/20 17:43:23 200 2.025702ms POST /api/metrics/store
2015/05/20 17:43:38 200 3.314738ms POST /api/metrics/store
W0520 17:43:41.287998 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:43:53 200 4.358755ms POST /api/metrics/store
2015/05/20 17:44:08 200 3.198304ms POST /api/metrics/store
W0520 17:44:21.295042 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:44:23 200 3.508664ms POST /api/metrics/store
2015/05/20 17:44:38 200 2.893379ms POST /api/metrics/store
2015/05/20 17:44:53 200 4.108687ms POST /api/metrics/store
W0520 17:45:01.304698 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:45:08 200 3.425616ms POST /api/metrics/store
2015/05/20 17:45:23 200 4.164945ms POST /api/metrics/store
2015/05/20 17:45:38 200 3.265217ms POST /api/metrics/store
W0520 17:45:41.312979 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:45:53 200 7.067161ms POST /api/metrics/store
2015/05/20 17:46:08 200 3.196869ms POST /api/metrics/store
W0520 17:46:21.321243 00001 controller.go:785] Health check store_answering timeout.
2015/05/20 17:46:23 200 3.322951ms POST /api/metrics/store
2015/05/20 17:46:38 200 3.825088ms POST /api/metrics/store
2015/05/20 17:46:53 200 3.310886ms POST /api/metrics/store
W0520 17:47:01.330191 00001 controller.go:785] Health check store_answering timeout.
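(For anyone digging further: the command behind the store_answering health check lives in the MetricShipper service definition itself. A rough way to inspect it, assuming the serviced CLI on the Control Center master; the exact output format may vary by release:)

    # Sketch only -- prints the service definition, which includes its health checks.
    serviced service list metricshipper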
Subject: | try restarting it. If you can |
Author: | Andrew Kirch |
Posted: | 2015-05-20 13:23 |
Try restarting it. If you can't, there may be a bug.
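If the Control Center UI won't restart it, a rough CLI equivalent (a sketch; assumes the service is named "metricshipper" in your deployment):

    # Sketch only -- run on the Control Center master host.
    serviced service restart metricshipper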
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard.
Subject: | Hi, |
Author: | [Not Specified] |
Posted: | 2015-05-21 07:40 |
Hi,
Try restarting all the processes from the CLI. I faced the same issue and it was fixed by restarting the services (the sequence is sketched below this list):
Stop Resource Manager.
Stop serviced.
Stop docker.
Check that everything is stopped and nothing is still running, then bring it back up:
Start docker.
Check that docker has started, then
start serviced.
Check that serviced has started successfully, then
start Resource Manager.
Open Resource Manager in the browser and watch the services. Give them some time; they take a while to start.
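A rough command-line sketch of that sequence (assumptions: a systemd-based Control Center master and a top-level application named "Zenoss.resmgr"; substitute "Zenoss.core" or whatever serviced service list shows for your install):

    # Sketch only -- adjust the application name to match your deployment.
    serviced service stop Zenoss.resmgr    # stop the Resource Manager application
    sudo systemctl stop serviced           # then stop serviced itself
    sudo systemctl stop docker             # and finally the docker daemon
    sudo systemctl start docker            # bring docker back up first
    sudo systemctl start serviced          # then serviced (wait for it to come up)
    serviced service start Zenoss.resmgr   # then start the application again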
Subject: | I have what sounds like a |
Author: | [Not Specified] |
Posted: | 2015-05-21 07:46 |
I have what sounds like a similar issue to yours. Does your problem include the hbase, regionserver, central query, and opentsdb services also being down?
I've got a bug open on this topic if it's the same thing:
https://jira.zenoss.com/browse/CC-975?filter=-2
Subject: | So after trying both |
Author: | [Not Specified] |
Posted: | 2015-05-25 08:34 |
So after trying both restarting the service and restarting docker/serviced, the problem still persists. It will be fine for about two hours or less, but then it starts crashing again.
Any ideas on how to fix this? I'm thinking it could be a RAM issue, since the VM is only provisioned with 16GB, which is below the recommended minimum.
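A quick way to check whether the host is actually memory-starved before blaming RAM (a sketch; run on the Zenoss/Control Center host):

    free -m                           # overall host memory and swap usage, in MB
    dmesg | grep -i "out of memory"   # see whether the kernel OOM killer has fired
    docker stats                      # live per-container usage (Ctrl-C to exit)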
Subject: | Check/show your performance |
Author: | Jan Garaj |
Posted: | 2015-05-25 09:17 |
Check/show the performance metric graphs for the MetricShipper service from Control Center, along with its logs.
Devops Monitoring Expert advice:
Dockerize/automate/monitor all the things.
DevOps stack:
Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform /
Elasticsearch
Subject: | I posted the logs a few posts |
Author: | [Not Specified] |
Posted: | 2015-05-25 12:46 |
I posted the logs a few posts up; it looks like it is failing a health check.
Here are the graphs; let me know if the link doesn't work: http://imgur.com/D3nyjcn
Subject: | Peak 40P datapoints is |
Author: | Jan Garaj |
Posted: | 2015-05-25 13:51 |
The peak of 40P datapoints is suspicious. This service uses 40MB of memory plus 20MB for cache, so memory is not the problem. Check the graphs of all the services and look for other suspicious peaks.
Devops Monitoring Expert advice:
Dockerize/automate/monitor all the things.
DevOps stack:
Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform /
Elasticsearch
Subject: | Metric Shipper Health Check Failing |
Author: | [Not Specified] |
Posted: | 2015-07-31 13:04 |
Hello. I'm seeing a similar issue; however, MetricShipper on my test VM failed right away. There was no period where it was working and then stopped.
All of the process graphs look fairly regular, with no spikes except for when the process is restarted. When starting the process I do get WARN and CRIT level log messages, which may be an indication of the issue:
2015-07-31 17:44:52,921 WARN Included extra file "/opt/zenoss/etc/metricshipper/metricshipper_supervisor.conf" during parsing
2015-07-31 17:44:52,986 INFO RPC interface 'supervisor' initialized
2015-07-31 17:44:52,986 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-07-31 17:44:52,986 INFO supervisord started with pid 35
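(If you want to look at the config file that WARN refers to, a rough way in, assuming the serviced CLI on the Control Center master:)

    # Sketch only -- opens a shell inside the running metricshipper container.
    serviced service attach metricshipper
    cat /opt/zenoss/etc/metricshipper/metricshipper_supervisor.conf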