TECHZEN Zenoss User Community ARCHIVE  

[Solved] OpenTSDB Failing Health Checks

Subject: [Solved] OpenTSDB Failing Health Checks
Author: [Not Specified]
Posted: 2016-12-09 16:20

I've recently installed the newest versions of Control Center and Zenoss.Core on a single-host deployment. For the most part everything went smoothly, but OpenTSDB is running into errors and won't pass its health checks. When I look at the OpenTSDB container logs via the Control Center web GUI, it looks like it comes up cleanly, but for some reason OpenTSDB exits after a few seconds. Supervisor then spawns a new instance, which exits again. Any ideas as to why this might be the case? I've included the full logs below:

time="2016-12-09T21:16:52Z" level=info msg="Loaded delegate keys from file" keyfile="/etc/serviced/delegate.keys" location="localkeys.go:320" logger=auth
I1209 21:16:52.191104 00001 controller.go:235] Wrote config file /opt/zenoss/etc/opentsdb/opentsdb.conf
I1209 21:16:52.206857 00001 controller.go:206] Successfully ran command:'&{/usr/bin/chown [chown root:root /opt/zenoss/etc/opentsdb/opentsdb.conf] [] [] 0xc4203efaa0 exit status 0 true [0xc420384128 0xc420384140 0xc420384140] [0xc420384128 0xc420384140] [0xc420384138] [0x53f440] 0xc4203995c0 }' output:
I1209 21:16:52.216449 00001 controller.go:206] Successfully ran command:'&{/usr/bin/chmod [chmod 0644 /opt/zenoss/etc/opentsdb/opentsdb.conf] [] [] 0xc4203efc50 exit status 0 true [0xc420384150 0xc420384168 0xc420384168] [0xc420384150 0xc420384168] [0xc420384160] [0x53f440] 0xc4203996e0 }' output:
I1209 21:16:52.240205 00001 logstash.go:62] Using logstash resourcePath: /usr/local/serviced/resources/logstash
I1209 21:16:52.240394 00001 controller.go:235] Wrote config file /etc/filebeat.conf
I1209 21:16:52.240468 00001 controller.go:392] pushing network stats to: http://localhost:22350/api/metrics/store
I1209 21:16:52.246193 00001 instance.go:87] about to execute: /usr/local/serviced/resources/logstash/filebeat , [-c /etc/filebeat.conf][2]
I1209 21:16:52.247933 00001 controller.go:405] c.zkInfo: {ZkDSN:{"Servers":["10.254.254.145:2181"],"Timeout":15000000000} PoolID:default}
2016/12/09 21:16:52 Connected to 10.254.254.145:2181
2016/12/09 21:16:52 Authenticated: id=97079651105767462, timeout=15000
I1209 21:16:53.138058 00001 vif.go:60] vif subnet is: 10.3.0.0/16
I1209 21:16:53.138095 00001 controller.go:436] command: [/bin/sh -c " export CREATE_TABLES=1; /opt/opentsdb/start-opentsdb.sh localhost:2181"] [1]
I1209 21:16:53.167142 00001 controller.go:899] Got service endpoints for 2f306ir5f3vjk8hc2ufhznqug: map[tcp:443:[{ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.254.254.145 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:5443}] tcp:5042:[{ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.254.254.145 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}] tcp:5043:[{ServiceID:controlplane_logstash_filebeat InstanceID:0 Application:controlplane_logstash_filebeat Purpose: HostID: HostIP:10.254.254.145 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}] tcp:5601:[{ServiceID:controlplane_kibana_tcp InstanceID:0 Application:controlplane_kibana_tcp Purpose: HostID: HostIP:10.254.254.145 HostPort:5601 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5601 Protocol:tcp VirtualAddress: ProxyPort:5601}] tcp:8444:[{ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.254.254.145 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}]]
I1209 21:16:53.167638 00001 controller.go:811] Kicking off health check answering.
I1209 21:16:53.167659 00001 controller.go:812] Setting up health check: wget --timeout=3 --tries=1 -q -O - http://localhost:4242/api/stats
I1209 21:16:54.170033 00001 controller.go:776] Running prereq command: wget -q -O- http://localhost:61000/status/cluster | grep '1 live servers'
I1209 21:17:04.993009 00001 controller.go:785] Passed prereq [HBase Regionservers up].
I1209 21:17:04.993052 00001 controller.go:789] Passed all prereqs.
I1209 21:17:04.994181 00001 controller.go:733] Starting service process for service 2f306ir5f3vjk8hc2ufhznqug
I1209 21:17:04.994221 00001 instance.go:87] about to execute: /bin/sh , [-c exec /bin/sh -c " export CREATE_TABLES=1; /opt/opentsdb/start-opentsdb.sh localhost:2181"][2]
Starting opentsdb with ZK_QUORUM=localhost:2181
2016-12-09 21:17:06,805 CRIT Supervisor running as root (no user in config file)
2016-12-09 21:17:06,924 INFO RPC interface 'supervisor' initialized
2016-12-09 21:17:06,924 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2016-12-09 21:17:06,961 INFO supervisord started with pid 36
2016/12/09 21:17:07 200 14.354126ms POST /api/metrics/store
2016-12-09 21:17:07,972 INFO spawned: 'tsdbwatchdog' with pid 40
2016-12-09 21:17:07,974 INFO spawned: 'opentsdb' with pid 41
2016-12-09 21:17:09,529 INFO success: tsdbwatchdog entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-12-09 21:17:13,280 INFO success: opentsdb entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2016/12/09 21:17:22 200 3.048571ms POST /api/metrics/store
2016/12/09 21:17:37 200 2.499624ms POST /api/metrics/store
2016/12/09 21:17:52 200 2.583561ms POST /api/metrics/store
2016/12/09 21:18:07 200 2.374775ms POST /api/metrics/store
2016/12/09 21:18:22 200 3.289305ms POST /api/metrics/store
2016/12/09 21:18:37 200 2.749067ms POST /api/metrics/store
2016-12-09 21:18:48,544 INFO exited: opentsdb (exit status 1; not expected)
2016-12-09 21:18:49,546 INFO spawned: 'opentsdb' with pid 106
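For anyone hitting the same symptom: the two checks Control Center runs are visible in the log above (controller.go:812 and controller.go:776), and you can reproduce them by hand from inside the OpenTSDB container. This is a sketch based only on those logged commands; the small `check_hbase` wrapper is my own helper, not part of Zenoss.

```shell
#!/bin/sh
# Manual reproduction of the two checks from the Control Center log above:
#   health check (controller.go:812): GET http://localhost:4242/api/stats
#   prereq check (controller.go:776): GET http://localhost:61000/status/cluster

# check_hbase: succeeds when stdin reports exactly one live regionserver,
# mirroring the "HBase Regionservers up" prereq on a single-host deployment.
check_hbase() {
    grep -q '1 live servers'
}

# On a live host you would run (commented out so the sketch is self-contained):
#   wget --timeout=3 --tries=1 -q -O - http://localhost:4242/api/stats
#   wget -q -O- http://localhost:61000/status/cluster | check_hbase \
#       && echo "HBase prereq passing" || echo "HBase prereq failing"
```

If the first wget returns nothing while the second passes, OpenTSDB itself is the component dying, which matches the respawn loop in the log.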



Subject: So I was able to resolve this
Author: [Not Specified]
Posted: 2016-12-09 20:50

So I was able to resolve this by using:

https://support.zenoss.com/hc/en-us/articles/204643769-How-to-Recover-Control-Center-from-Hardware-Failure

Specifically, the "Corruption of Zenoss HBase (/opt/serviced/var/volumes/*/hbase-master) Files" section.

Sorry for the waste of time! Should have looked around a little more thoroughly before posting.
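For the record, the KB section named above deals with the files under /opt/serviced/var/volumes/*/hbase-master. Before following its recovery steps, it can help to locate those directories on your host; this is a hedged sketch, where `find_hbase_masters` is my own helper and the default path comes straight from the section title.

```shell
#!/bin/sh
# find_hbase_masters: list the hbase-master volume directories referenced by
# the KB section "Corruption of Zenoss HBase
# (/opt/serviced/var/volumes/*/hbase-master) Files".
find_hbase_masters() {
    # $1: serviced volumes root; defaults to the path from the KB article
    root="${1:-/opt/serviced/var/volumes}"
    find "$root" -maxdepth 2 -type d -name hbase-master 2>/dev/null
}

# Example (on a Control Center host):
#   find_hbase_masters
```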


