TECHZEN Zenoss User Community ARCHIVE  

control center services failing

Subject: control center services failing
Author: [Not Specified]
Posted: 2015-05-12 14:53

I believe my Control Center is timing out.

When I start the Zenoss.core services, the opentsdb, CentralQuery, HMaster, and RegionServer services fail to start. When I check their logs, they all show a 127.0.0.1 ack i/o timeout. This only starts happening after I've modeled a few devices; until I start adding devices they seem to have no problem. Then I reboot and they never come back.

Does anyone else see this happening, or know why it may be a problem?

Thanks,
Beer



Subject: which version of Zenoss 5 are
Author: Andrew Kirch
Posted: 2015-05-15 10:25

Which version of Zenoss 5 are you running? Please update to 5.0.2 (instructions are in the install guide) and let me know if this persists. Thanks!

Andrew Kirch

akirch@gvit.com

Need Zenoss support, consulting or custom development? Look no further. Email or PM me!

Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard



Subject: Version info below. sadly, i
Author: [Not Specified]
Posted: 2015-05-17 23:11

Version info is below. Sadly, I'm already on version 5.0.2. I appreciate any help, though. At this time the graph data is not showing, which I assume is because it relies on OpenTSDB. The OpenTSDB log states that it's waiting on RegionServer, and RegionServer is getting the ack timeout referenced in my original post.

Zenoss     Zenoss 5.0.2
OS         Linux (x86_64) 3.16.0 (Linux 797b881ba064 3.16.0-37-generic #51~14.04.1-Ubuntu SMP Wed May 6 15:23:14 UTC 2015 x86_64)
Zope       Zope 2.13.13
Python     Python 2.7.5
Database   MySQL 5.5.40 (5.5.40-MariaDB)
Twisted    Twisted 13.2.0
RabbitMQ   RabbitMQ 3.3.5
Erlang     Erlang 5.10.4
NetSnmp    NetSnmp 5.5.2
PyNetSnmp  PyNetSnmp 0.30.10



Subject: As you're on the latest
Author: Andrew Kirch
Posted: 2015-05-20 02:09

As you're on the latest version, file a bug.




Subject: bug has been filed.
Author: [Not Specified]
Posted: 2015-05-20 09:01

A bug has been filed.



Subject: I had this problem over and
Author: [Not Specified]
Posted: 2015-05-20 16:33

I had this problem over and over again. It ultimately led me back to 4.2.5. I never determined the cause of the problem, but it appeared that there was some sort of conflict - perhaps multiple instances of regionserver or something else bound to its port.

I did manage to temporarily clear up these issues by stopping Zenoss.core and serviced, then running the following:
src: https://github.com/control-center/serviced/wiki/Control-Center-Tips-and-...

docker ps -q | xargs docker stop
docker ps -qa | xargs docker rm -fv
docker images |awk '/:5000/{print $1 ":" $2}' | xargs docker rmi

But after a few days I'd log in to find the same problem again.
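
For anyone retrying the cleanup above, here is a slightly more defensive sketch of the same three steps. It is not from the wiki page: the function names registry_images and cleanup_containers are made up for illustration, and it assumes GNU xargs (the -r flag skips the command when there is nothing to act on) and that Zenoss.core and serviced are already stopped.

```shell
# Sketch only; function names are hypothetical, not part of serviced or docker.

# Print "repository:tag" for every image served from a registry on port 5000.
# Reads `docker images` output on stdin so the filter can be tried without docker.
registry_images() {
    awk '/:5000/ {print $1 ":" $2}'
}

# Same three steps as the commands above, but xargs -r (GNU xargs assumed)
# does nothing instead of erroring out when a list is empty.
cleanup_containers() {
    docker ps -q  | xargs -r docker stop
    docker ps -qa | xargs -r docker rm -fv
    docker images | registry_images | xargs -r docker rmi
}
```

Running `docker images | registry_images` on its own first shows which images would be removed before cleanup_containers is called.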



Subject: Sadly i tried these commands.
Author: [Not Specified]
Posted: 2015-05-26 15:46

Sadly, I tried these commands and was unable to run either of them natively. I'm guessing they are meant for a RHEL server, not an Ubuntu server. I'm still experiencing the issue after several days and multiple reboots.



Subject: those should work in Ubuntu
Author: Andrew Kirch
Posted: 2015-05-28 10:04

Those should work in Ubuntu as well. 5.0.3 is now out; please upgrade to that, and file a bug so that we can figure out what's crashing. Include hardware/OS info in the bug.




Subject: is the update process from 5
Author: [Not Specified]
Posted: 2015-05-28 13:13

Is the update process from 5.0.2 to 5.0.3 documented somewhere?

I've got a bug open complete with version information, but let's see if this update resolves the problem.



Subject: Found the update
Author: [Not Specified]
Posted: 2015-05-29 07:48

I found the update documentation, but the update ultimately failed. I'm so shallow in my deployment right now that I had no issues with blowing away the 5.0.2 template and then reloading with the 5.0.3 template. The problem still exists, though. OpenTSDB fails to start because it's waiting on RegionServer, which gets the same i/o timeout as before:

2015/05/29 12:35:00 200 66.770429ms POST /api/metrics/store
2015/05/29 12:35:02.776327 Read error looking for ack: read tcp 127.0.0.1:5043: i/o timeout
2015/05/29 12:35:02.776412 Loading client ssl certificate: /usr/local/serviced/resources/logstash/logstash-forwarder.crt and /usr/local/serviced/resources/logstash/logstash-forwarder.key
2015/05/29 12:35:02.920813 Setting trusted CA from file: /usr/local/serviced/resources/logstash/logstash-forwarder.crt
2015/05/29 12:35:02.921040 Connecting to 127.0.0.1:5043 (127.0.0.1)
2015/05/29 12:35:02.980291 Connected to 127.0.0.1
W0529 12:35:06.688912 00001 controller.go:777] Health check answering failed.
W0529 12:35:07.300602 00001 controller.go:777] Health check cluster_member failed.
2015/05/29 12:35:15 200 60.513729ms POST /api/metrics/store
W0529 12:35:17.401173 00001 controller.go:777] Health check cluster_member failed.

The HMaster and CentralQuery services are also still failing.
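
When comparing logs like the excerpt above across services, it can help to count which health checks are failing. The following is a minimal sketch that only assumes the "Health check NAME failed." wording shown in the excerpt; the function name failed_checks and the file name regionserver.log are hypothetical.

```shell
# Sketch only: summarize failing health checks from a saved log.
# Reads log text on stdin and prints one "count name" line per check,
# based on lines of the form "... Health check NAME failed."
failed_checks() {
    sed -n 's/.*Health check \([^ ]*\) failed\..*/\1/p' \
        | sort | uniq -c | awk '{print $1, $2}'
}
```

For example, `failed_checks < regionserver.log` against a saved copy of the service log.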



Subject: what hardware are you running
Author: Andrew Kirch
Posted: 2015-05-29 10:40

What hardware are you running on?




Subject: Ubuntu 14.04 x64 server via
Author: [Not Specified]
Posted: 2015-05-29 15:00

Ubuntu 14.04 x64 server via ESXi 5.5.



Subject: how many cores/how much ram?
Author: Andrew Kirch
Posted: 2015-05-29 16:03

How many cores/how much RAM?




Subject: I am having this same issue
Author: Benjamin Dronen
Posted: 2015-06-02 09:50

I am having this same issue on an Ubuntu 14.04 server running Zenoss 5.0.3 with 4 cores and 19 GB of RAM.

The hbase-regionserver.log within the RegionServer container shows this error, even though ZooKeeper is shown as running inside of Control Center:

2015-06-02 14:36:06,045 WARN [regionserver60200] zookeeper.RecoverableZooKeeper: Unable to create ZooKeeper Connection
java.net.UnknownHostException: zk1
at java.net.InetAddress.getAllByName0(InetAddress.java:1250)
at java.net.InetAddress.getAllByName(InetAddress.java:1162)
at java.net.InetAddress.getAllByName(InetAddress.java:1098)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:140)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:756)
at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:729)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:865)
at java.lang.Thread.run(Thread.java:745)
2015-06-02 14:36:06,045 WARN [regionserver60200] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=zk1:2181, exception=org.apache.zookeeper.KeeperException$OperationTimeoutException: KeeperErrorCode = OperationTimeout
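
Since the trace points at name resolution (UnknownHostException: zk1) rather than at ZooKeeper itself, one way to narrow it down is to check whether the quorum host names resolve from inside the RegionServer container. This is a rough sketch, assuming getent is available in the container; the function names quorum_hosts and check_hosts are hypothetical.

```shell
# Sketch only; assumes the "quorum=host:port[,host:port...]" form seen in the log.

# Read an HBase log on stdin and print each ZooKeeper quorum host name once.
quorum_hosts() {
    grep -o 'quorum=[^ ]*' \
        | sed 's/^quorum=//' \
        | tr ',' '\n' \
        | sed 's/:.*//' \
        | grep . \
        | sort -u
}

# Read host names on stdin and report whether each one resolves (via getent).
check_hosts() {
    while read -r host; do
        if getent hosts "$host" >/dev/null; then
            echo "$host resolves"
        else
            echo "$host does NOT resolve"
        fi
    done
}
```

Run from a shell inside the container, `quorum_hosts < hbase-regionserver.log | check_hosts` reports which of the configured names the container cannot look up.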



Subject: Warheads: file a bug
Author: Andrew Kirch
Posted: 2015-06-02 09:51

Warheads: file a bug




Subject: Trelane, i've filed two bugs
Author: [Not Specified]
Posted: 2015-06-02 22:35

Trelane, I've filed two bugs as of now regarding this problem: the first under the Control Center section and a second under Zenoss itself. I've provided logs, but from what I can see no one has any idea why this is failing. I tried an alternate route, however: I abandoned my Ubuntu installation in exchange for CentOS. I finished the install Monday morning, scanned my network ranges, and as yet have not been able to reproduce this problem. Perhaps it's OS related. Perhaps I got lucky.

I ran the same battery of tests as on my Ubuntu server to attempt to recreate the issue.

At this time, in CentOS, I've tested the following:

1) Clean power down/restart (shut down Zenoss services, then serviced, then docker) = successfully came back up automatically
2) Dirty power down/restart (issued reboot command without stopping services) = successfully came back up
3) Server crash (powered off the server using the VM controls, i.e. no reboot or shutdown, but a simulation of lost power) = came back up but required a manual restart of Zenoss services
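
The shutdown order in test 1 could be scripted roughly as follows. This is a sketch under assumptions not stated in the post: a systemd host (such as CentOS 7), an application named Zenoss.core, and the hypothetical helper names run and clean_shutdown. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch only: stop things in the order application -> serviced -> docker.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

clean_shutdown() {
    run serviced service stop Zenoss.core   # stop the Zenoss application first
    run systemctl stop serviced             # then Control Center (assumes systemd)
    run systemctl stop docker               # then docker
}
```

The service stop may return before all containers have exited, so it is worth confirming that the application is fully down before stopping serviced.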

In all of the above scenarios, all services appeared to start normally. Interestingly enough, I see the same i/o timeout messages in the logs, but the system keeps running. I cannot take my testing much further, as I'm not a Linux admin and know only enough about the OS to make people uncomfortable. Using the minimal installation and the Zenoss installation guide, I had no problems setting this up, which was not the case in 4.2. If nothing else, this may help or inspire others who are having the same issue to step over to the CentOS world.

If anyone is interested, I'll be writing step-by-step documentation of my own installation procedure (mine's a bit different because I had to change the docker IP address) and would be happy to post my steps. I plan to let the system run through the weekend, and if everything still appears to be running correctly I'll rebuild my server for production use and write up my documentation.



Subject: I had the same issue with
Author: [Not Specified]
Posted: 2015-06-13 11:32

I had the same issue with Zenoss 5.0.3 on Ubuntu (even with a fresh installation). Multiple attempts to install failed due to OpenTSDB/RegionServer/Zookeeper issues on Ubuntu. In the end I set up Zenoss 5 on a CentOS 7 box.



Subject: I'm having this issue with
Author: [Not Specified]
Posted: 2015-07-02 08:26

I'm having this issue with Zenoss 5.0.2 and 5.0.2 on a fresh installation of Ubuntu 14.04. I have done multiple installs and all result in failed services.
The services that fail are the following:

CentralQuery: answering
Hbase
HMaster: cluster_healthy, rest_answering
RegionServer: answering, cluster_member
opentsdb: answering

I was unable to find any existing bug reports, so I created a new one: ZEN-18583 ( https://jira.zenoss.com/browse/ZEN-18583 ).
Please vote for the bug if you have the same issue.



Subject: https://jira.zenoss.com
Author: [Not Specified]
Posted: 2015-07-02 10:28

https://jira.zenoss.com/browse/CC-975?filter=-2

As posted above, it appears to only be an issue with Ubuntu. At least one other person and I have converted to CentOS 7, which resolved the problem. I tried rebuilding multiple times on Ubuntu as well. The sad thing is that there is no longer any kind of support or community for Zenoss like there was in the past. The link above is for my bug; it was looked at initially but has now sat untouched for quite some time. I assume it never will be looked at again.


