Subject: | Zenoss 5.1.1 upgrade issues |
Author: | [Not Specified] |
Posted: | 2016-03-10 13:06 |
I'm having issues upgrading from Zenoss Core 5.0.10 to 5.1.1. I am running on CentOS 7 with 32 GB of memory and 4 vCPUs. The system has been running and monitoring roughly 250-300 devices for 9-10 months. I have two issues: first, metricshipper will not start; second, I cannot log in to Zenoss after the upgrade.
I followed the upgrade guide. I was able to convert both btrfs partitions to devicemapper and upgrade Control Center from 1.0.10 to 1.1.2. Zenoss was running just fine during each "phase" of the upgrade process until I upgraded Zenoss Core.
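For anyone following the same path, the conversion can be sanity-checked from the master host before moving on. This is just a minimal check, and the exact output wording varies by Docker version:
# Confirm Docker is now using devicemapper rather than btrfs
docker info | grep -i 'storage driver'
# Confirm the Control Center version after the upgrade
serviced version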
I've attempted three upgrades this week and all have failed. The second and third upgrades show the same issues, so I am hoping someone can shed some light on them. Both metricshipper processes fail health checks with this error: "W0310 18:47:43.354601 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)". I don't see any recommendations in the upgrade guide about how to proceed. I can't find a place to upload a log file, so here it is:
I0310 18:47:22.170208 00001 vif.go:58] vif subnet is: 10.3
I0310 18:47:22.170328 00001 lbClient.go:77] ControlPlaneAgent.GetServiceInstance()
I0310 18:47:22.272618 00001 controller.go:296] Allow container to container connections: true
I0310 18:47:22.297938 00001 controller.go:229] Wrote config file /opt/zenoss/etc/metricshipper/metricshipper.yaml
I0310 18:47:22.306807 00001 controller.go:200] Successfully ran command:'&{/usr/bin/chown [chown zenoss:zenoss /opt/zenoss/etc/metricshipper/metricshipper.yaml] [] [] 0xc2080e5c80 exit status 0 true [0xc208036048 0xc2080360b8 0xc2080360b8] [0xc208036048 0xc2080360b8] [0xc2080360a0] [0x53e840] 0xc2080425a0}' output:
I0310 18:47:22.317484 00001 controller.go:200] Successfully ran command:'&{/usr/bin/chmod [chmod 0644 /opt/zenoss/etc/metricshipper/metricshipper.yaml] [] [] 0xc2080e5de0 exit status 0 true [0xc2080360e0 0xc208036100 0xc208036100] [0xc2080360e0 0xc208036100] [0xc2080360f8] [0x53e840] 0xc2080426c0}' output:
I0310 18:47:22.323467 00001 logstash.go:55] Using logstash resourcePath: /usr/local/serviced/resources/logstash
I0310 18:47:22.324824 00001 controller.go:229] Wrote config file /etc/logstash-forwarder.conf
I0310 18:47:22.324984 00001 controller.go:385] pushing network stats to: http://localhost:22350/api/metrics/store
I0310 18:47:22.325101 00001 instance.go:87] about to execute: /usr/local/serviced/resources/logstash/logstash-forwarder , [-idle-flush-time=5s -old-files-hours=26280 -config /etc/logstash-forwarder.conf][4]
I0310 18:47:22.326858 00001 endpoint.go:131] c.zkInfo: {ZkDSN:{"Servers":["10.209.8.12:2181"],"Timeout":15000000000} PoolID:default}
2016/03/10 18:47:22 publisher init
2016/03/10 18:47:22
{
"network": {
"servers": [ "127.0.0.1:5043" ],
"ssl certificate": "/usr/local/serviced/resources/logstash/logstash-forwarder.crt",
"ssl key": "/usr/local/serviced/resources/logstash/logstash-forwarder.key",
"ssl ca": "/usr/local/serviced/resources/logstash/logstash-forwarder.crt",
"timeout": 15
},
"files": [
{
"paths": [ "/opt/zenoss/log/metricshipper.log" ],
"fields": {"instance":"0","service":"7fhb0p5uqjof9stbvitno6f8k","type":"metricshipper"}
}
]
}
2016/03/10 18:47:22.331233 Launching harvester on new file: /opt/zenoss/log/metricshipper.log
2016/03/10 18:47:22.331293 Loading client ssl certificate: /usr/local/serviced/resources/logstash/logstash-forwarder.crt and /usr/local/serviced/resources/logstash/logstash-forwarder.key
2016/03/10 18:47:22.334636 Starting harvester: /opt/zenoss/log/metricshipper.log
2016/03/10 18:47:22.334668 Current file offset: 2061
I0310 18:47:22.337790 00001 endpoint.go:172] getting service state: 7fhb0p5uqjof9stbvitno6f8k 0
I0310 18:47:22.340636 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_redis]: {endpointID:redis instanceID:0 virtualAddress: purpose:import port:6379}
I0310 18:47:22.340684 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_zproxy]: {endpointID:zproxy instanceID:0 virtualAddress: purpose:import port:8080}
I0310 18:47:22.340704 00001 controller.go:398] command: [su - zenoss -c "cd /opt/zenoss && /bin/supervisord -n -c etc/metricshipper/supervisord.conf"] [1]
I0310 18:47:22.392720 00001 controller.go:913] Got service endpoints for 7fhb0p5uqjof9stbvitno6f8k: map[tcp:5043:[{ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}] tcp:8444:[{ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}] tcp:443:[{ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}] tcp:5042:[{ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}]]
I0310 18:47:22.392942 00001 controller.go:925] changing key from tcp:443 to chzv9aasqaqwbifwju30sjm8c_controlplane: {ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}
I0310 18:47:22.393016 00001 controller.go:925] changing key from tcp:5042 to chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp: {ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}
I0310 18:47:22.393047 00001 controller.go:925] changing key from tcp:5043 to chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack: {ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}
I0310 18:47:22.393082 00001 controller.go:925] changing key from tcp:8444 to chzv9aasqaqwbifwju30sjm8c_controlplane_consumer: {ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}
I0310 18:47:22.393140 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane -> {ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}
I0310 18:47:22.393236 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane -> proxy[{controlplane 0 controlplane 10.209.8.12 443 127.0.0.1 443 tcp 443}; &{%!s(*net.netFD=&{{0 0 0} 10 2 1 false tcp4 0xc20818a3f0 {140633420050960}})}]=>[]
I0310 18:47:22.393461 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane]: {endpointID:controlplane instanceID:0 virtualAddress: purpose:import port:443}
I0310 18:47:22.393501 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp -> {ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}
I0310 18:47:22.393578 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp -> proxy[{controlplane_logstash_tcp 0 controlplane_logstash_tcp 10.209.8.12 5042 127.0.0.1 5042 tcp 5042}; &{%!s(*net.netFD=&{{0 0 0} 11 2 1 false tcp4 0xc20818a870 {140633420050768}})}]=>[]
I0310 18:47:22.393706 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp]: {endpointID:controlplane_logstash_tcp instanceID:0 virtualAddress: purpose:import port:5042}
I0310 18:47:22.393745 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack -> {ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}
I0310 18:47:22.393853 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack -> proxy[{controlplane_logstash_lumberjack 0 controlplane_logstash_lumberjack 10.209.8.12 5043 127.0.0.1 5043 tcp 5043}; &{%!s(*net.netFD=&{{0 0 0} 12 2 1 false tcp4 0xc20818ac30 {140633420050576}})}]=>[]
I0310 18:47:22.394652 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack]: {endpointID:controlplane_logstash_lumberjack instanceID:0 virtualAddress: purpose:import port:5043}
I0310 18:47:22.394701 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_consumer -> {ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}
I0310 18:47:22.394791 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_consumer -> proxy[{controlplane_consumer 0 controlplane_consumer 10.209.8.12 8443 127.0.0.1 8443 tcp 8444}; &{%!s(*net.netFD=&{{0 0 0} 13 2 1 false tcp4 0xc20818aff0 {140633420050384}})}]=>[]
I0310 18:47:22.394924 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_consumer]: {endpointID:controlplane_consumer instanceID:0 virtualAddress: purpose:import port:8443}
I0310 18:47:22.395011 00001 controller.go:722] No prereqs to pass.
I0310 18:47:22.416856 00001 endpoint.go:376] Starting watch for tenantEndpointKey chzv9aasqaqwbifwju30sjm8c_zproxy:
I0310 18:47:22.416930 00001 endpoint.go:376] Starting watch for tenantEndpointKey chzv9aasqaqwbifwju30sjm8c_redis:
I0310 18:47:22.425347 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_zproxy -> {ServiceID:chzv9aasqaqwbifwju30sjm8c InstanceID:0 Application:zproxy Purpose:export HostID:d10a0c08 HostIP:10.209.8.12 HostPort:33757 ContainerID:20df2a99b04b2b283984479d6b6ad55db8d1243abd8802164f4ddbf4c7d87127 ContainerIP:172.17.0.35 ContainerPort:8080 Protocol:tcp VirtualAddress: ProxyPort:8080}
I0310 18:47:22.425478 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_zproxy -> proxy[{chzv9aasqaqwbifwju30sjm8c 0 zproxy export d10a0c08 10.209.8.12 33757 20df2a99b04b2b283984479d6b6ad55db8d1243abd8802164f4ddbf4c7d87127 172.17.0.35 8080 tcp 8080}; &{%!s(*net.netFD=&{{0 0 0} 14 2 1 false tcp4 0xc208a85890 {140633420050192}})}]=>[]
I0310 18:47:22.425705 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_redis -> {ServiceID:2s87lenozczborbz01kbs9fke InstanceID:0 Application:redis Purpose:export HostID:d10a0c08 HostIP:10.209.8.12 HostPort:33751 ContainerID:ad6d2631e2399a12afafea1ace676734056c286835f9647fa7733eeda5396f4a ContainerIP:172.17.0.31 ContainerPort:6379 Protocol:tcp VirtualAddress: ProxyPort:6379}
I0310 18:47:22.425801 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_redis -> proxy[{2s87lenozczborbz01kbs9fke 0 redis export d10a0c08 10.209.8.12 33751 ad6d2631e2399a12afafea1ace676734056c286835f9647fa7733eeda5396f4a 172.17.0.31 6379 tcp 6379}; &{%!s(*net.netFD=&{{0 0 0} 15 2 1 false tcp4 0xc208a85bf0 {140633420050000}})}]=>[]
I0310 18:47:22.475228 00001 controller.go:777] Kicking off health check redis_answering.
I0310 18:47:22.475288 00001 controller.go:778] Setting up health check: /opt/zenoss/bin/healthchecks/redis_answering
I0310 18:47:22.475304 00001 controller.go:777] Kicking off health check running.
I0310 18:47:22.475315 00001 controller.go:778] Setting up health check: pgrep -u zenoss metricshipper
I0310 18:47:22.475326 00001 controller.go:777] Kicking off health check store_answering.
I0310 18:47:22.475336 00001 controller.go:778] Setting up health check: /opt/zenoss/bin/healthchecks/MetricShipper/store_answering
I0310 18:47:22.480751 00001 controller.go:664] Starting service process for service 7fhb0p5uqjof9stbvitno6f8k
I0310 18:47:22.480822 00001 instance.go:87] about to execute: /bin/sh , [-c exec su - zenoss -c "cd /opt/zenoss && /bin/supervisord -n -c etc/metricshipper/supervisord.conf"][2]
2016/03/10 18:47:22.694006 Setting trusted CA from file: /usr/local/serviced/resources/logstash/logstash-forwarder.crt
2016/03/10 18:47:22.694497 Connecting to 127.0.0.1:5043 (127.0.0.1)
2016/03/10 18:47:22.789613 Connected to 127.0.0.1
Trying to connect to logstash server... 127.0.0.1:5042
Connected to logstash server.
2016-03-10 18:47:24,766 WARN Included extra file "/opt/zenoss/etc/metricshipper/metricshipper_supervisor.conf" during parsing
2016-03-10 18:47:24,919 INFO RPC interface 'supervisor' initialized
2016-03-10 18:47:24,919 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-03-10 18:47:24,920 INFO supervisord started with pid 37
2016-03-10 18:47:25,924 INFO spawned: 'metricshipper' with pid 40
2016/03/10 18:47:27.345029 Registrar received 3 events
2016-03-10 18:47:31,284 INFO success: metricshipper entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
W0310 18:47:33.154986 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:34.836647 Registrar received 7 events
2016/03/10 18:47:37 200 6.28856ms POST /api/metrics/store
2016/03/10 18:47:42.340250 Registrar received 7 events
W0310 18:47:43.354601 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:49.835728 Registrar received 8 events
2016/03/10 18:47:52 200 3.532288ms POST /api/metrics/store
W0310 18:47:53.450854 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:54.835128 Registrar received 4 events
2016/03/10 18:47:59.839456 Registrar received 5 events
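For reference, the failing check can be re-run by hand inside the MetricShipper container to see its real output. This is only a rough sketch, assuming the standard serviced CLI; the service name may need adjusting to whatever "serviced service list" shows:
# Attach to the running MetricShipper container from the master host
serviced service attach MetricShipper
# Inside the container, run the same script the health check uses and show its exit status
su - zenoss -c /opt/zenoss/bin/healthchecks/MetricShipper/store_answering
echo "exit status: $?"
# The shipper's own log often has more detail than the serviced output above
tail -n 50 /opt/zenoss/log/metricshipper.log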
The other issue is that I can't log in to Zenoss, not even with the two local accounts I created when I built the server. Both of those accounts worked before upgrading Zenoss Core to 5.1.1.
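In case it helps anyone debugging the login side, the accounts can be inspected from zendmd inside the Zope container. Treat this as a hedged sketch: the service name "Zope", the getAllUserSettings() call, and the "userManager" plugin name are assumptions that may differ on a given install.
# Attach to a running Zope container and open zendmd as the zenoss user
serviced service attach Zope
su - zenoss -c zendmd
# Inside zendmd (a Python prompt), list the accounts the ZODB still knows about:
#   for s in dmd.ZenUsers.getAllUserSettings():
#       print s.id
# A commonly posted recovery step, if the accounts exist but the password is not accepted
# (assumes the PAS user folder plugin is really named "userManager"):
#   app.zport.acl_users.userManager.updateUserPassword('admin', '<newpassword>')
#   commit()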
Any help would be appreciated.
Subject: | Another day, another upgrade, |
Author: | [Not Specified] |
Posted: | 2016-03-11 09:49 |
Another day, another upgrade, same failures.
Subject: | http://jira.zenoss.com. |
Author: | Andrew Kirch |
Posted: | 2016-03-11 10:58 |
http://jira.zenoss.com. Submit a P1 S1 bug.
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard
Subject: | I have the exact same problem |
Author: | [Not Specified] |
Posted: | 2016-03-30 10:19 |
I have the exact same problem! I upgraded to fix one bug only to encounter another!
Subject: | I tried to file a bug, but I |
Author: | [Not Specified] |
Posted: | 2016-03-30 12:56 |
I tried to file a bug, but I didn't have the option, so I filed a defect on 3/11. On 3/15 it was moved to "Backlog" and I've heard nothing since. I've deployed another server, fresh with 5.1.1, and it did not have this issue. I'm now using 5.1.1, but I lost months' worth of data along the way.
Subject: | I have around 1300 devices in |
Author: | [Not Specified] |
Posted: | 2016-03-30 12:59 |
I have around 1,300 devices in my instance; there's no way I am rebuilding all that. I will wait on the fix.
Subject: | what's the bug number? |
Author: | Andrew Kirch |
Posted: | 2016-03-30 13:28 |
What's the bug number? "Backlog" means it's in line for a developer to fix it.
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard
Subject: | submit the bugs and let's get |
Author: | Andrew Kirch |
Posted: | 2016-03-30 14:21 |
Submit the bugs and let's get them fixed! We can't fix what we don't know about. Post the bug numbers back to this thread so that people having this problem can track the solutions!
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard
Subject: | In the same boat |
Author: | [Not Specified] |
Posted: | 2016-03-30 16:36 |
I have also just upgraded from 5.0.9 to 5.1.1, following:
I also have metricshipper and login problems.
I did verify that login worked after the Docker upgrade and failed after the Zenoss upgrade script. It is a local account. I have four installations, so I can test the fix, although I too cannot afford to rebuild.
What is the bug number?
Thanks,
Brian
Subject: | I submitted this one. I didn |
Author: | [Not Specified] |
Posted: | 2016-03-30 18:36 |
I submitted this one. I didn't see anything else in the backlog that relates.
https://jira.zenoss.com/browse/ZEN-22758
Subject: | Mine is https://jira.zenoss |
Author: | [Not Specified] |
Posted: | 2016-03-31 07:41 |
Mine is https://jira.zenoss.com/browse/ZEN-22446
I had both my new system and the old system running for two weeks after submitting the bug. Unfortunately, I had to free up resources and removed the old system this past Monday.
Subject: | Just wanted to say that I |
Author: | Chad Cottrill |
Posted: | 2016-03-31 22:55 |
Just wanted to say that I encountered the same issue as Bschimm, but I would like to add that it also lists 5.0.7 as the Zenoss version under Application Templates. Hopefully that helps.
Subject: | Hey all! |
Author: | [Not Specified] |
Posted: | 2016-04-21 11:46 |
Hey all!
You aren't alone. Even in a canned test I've had great difficulty getting the upgrade to complete successfully. I've had much better luck pulling a copy of the MariaDB and OpenTSDB data and swapping it into a freshly built 5.1 system.
If anyone is interested in hearing how to do this let me know!
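Not speaking for the exact procedure above, but the rough shape of the database-copy idea looks something like this. Everything here is an assumption to verify against your own install: the MariaDB service name (a single "mariadb" service on 5.0.x, split into "mariadb-model" and "mariadb-events" on 5.1.x), the database names, and the passwordless root account inside the container.
# On the old instance: dump the model and event databases out of the MariaDB container
serviced service attach mariadb
mysqldump --user=root zodb > /tmp/zodb.sql
mysqldump --user=root zenoss_zep > /tmp/zenoss_zep.sql
# Copy the dump files out of the container (for example with "docker cp" from the host),
# load them into the matching databases on the freshly built 5.1 system, and expect to
# run the normal migration step afterwards. OpenTSDB performance data lives in HBase
# and has to be moved separately.
Using mysqldump keeps the copy portable, which can matter if the old and new images do not ship the same MariaDB build.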
Take care!
ZenMaster Shane William Scott (Hackman238)
CTO
GoVanguard Inc.
sscott@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard