Subject: | Removing 400K Events |
Author: | [Not Specified] |
Posted: | 2014-11-14 14:59 |
I think I've roped myself into a corner here somehow, and I need help getting out. About a month ago, I stopped getting new Events into the Event Console. I gracefully stopped all the services & rebooted, but still no dice. I then noticed that when I look at the Event Console, it says, "Displaying entries 1-31 of 390,157". Question: have I reached a threshold where no more new events will be logged? When I try to Acknowledge or Close any single event (versus shift-clicking a range), I always get "ZepConnectionError". What log can I look at to determine why I'm not getting anything into the Event Console anymore? I do know that SNMP queries are still hitting the clients on my network, but the events aren't showing up...
Subject: | There is no Zenoss limit for |
Author: | Jan Garaj |
Posted: | 2014-11-14 16:28 |
There is no Zenoss limit on events (my stats: ~1.7M events = ~12GB event_summary table). It depends only on your DB, and the DB seems to be the problem in your case (ZepConnectionError).
1.) Try to find more info in logs:
grep -rin 'zep' /opt/zenoss/log/*.log | grep ERROR
2.) Check the DB server logs.
3.) Try direct SQL access to the zenoss_zep DB, event_summary table, with your Zenoss ZEP credentials (grep -rin 'zep' /opt/zenoss/etc/zeneventserver.conf).
If you want to drop all events, just truncate the event_summary table. I can't guarantee that action is safe, so BACK UP the table/DB first so you can roll back.
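A minimal sketch of that backup-then-truncate, assuming a default Zenoss 4.x install where the DB is named zenoss_zep and the user is zenoss (use whatever values zeneventserver.conf actually reports):
zeneventserver stop
mysqldump -u zenoss -p zenoss_zep event_summary > /tmp/event_summary.backup.sql
mysql -u zenoss -p zenoss_zep -e 'TRUNCATE TABLE event_summary;'
zeneventserver start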
Devops Monitoring Expert advice:
Dockerize/automate/monitor all the things.
DevOps stack:
Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform /
Elasticsearch
Subject: | Disk util... |
Author: | [Not Specified] |
Posted: | 2014-11-14 16:42 |
I got an error this time that said ZenEventServer could not be contacted, so I bounced it by itself. It came back just fine, and the 390,157 objects in the Event Console suddenly became 402,383. Also, looking at 'df -h', I see that the root filesystem's available space drops by 1MB every 3-4 mins (and it's currently at 98% util!). It looks like I have a queue that's backed up; and as it's being processed, it's running the system out of space. I guess there's a threshold that prevents absorption/processing of new events under a certain amount of free disk space? Two things: where are the favorite places under '/' that Zenoss likes to stick things, and how can I adjust the amount of data that hits the event log? It's logging lots of things that I don't care about. I inherited this Zenoss system, so I didn't configure it... It needs a bit of tuning, obviously. Your help is appreciated! I bet there are just a few key things that need adjusting, so this server won't keep busting its buttons... Thanks again!
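A quick way to see where a stock Zenoss 4.x install eats disk (the paths below are the usual defaults; adjust if yours differ):
du -sh /opt/zenoss/log /opt/zenoss/perf /opt/zenoss/var /var/lib/rabbitmq /var/lib/mysql 2>/dev/null
df -h /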
Subject: | If you don't want to store |
Author: | Jan Garaj |
Posted: | 2014-11-14 19:37 |
If you don't want to store some events, then you must drop those events with an event transform -> search for: zenoss event transform drop.
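For example (assuming Zenoss 4.x, where transforms are Python snippets attached to an event class), a transform on the noisy event class containing just evt._action = 'drop' will discard matching events before they reach the database.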
You can also play with your event life cycle time settings (Advanced -> Settings -> Events).
My "favorite" Zenoss action - it will remove events DB indexes from the file system - they can be huge - they will be rebuilded again:
zeneventserver stop && cd $ZENHOME/var/zeneventserver/index && rm -rf summary && rm -rf archive && zeneventserver start
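Note: run that as the zenoss user; on a default install $ZENHOME is /opt/zenoss, so the directory being cleared is /opt/zenoss/var/zeneventserver/index. The indexes are rebuilt from the MySQL event tables on the next zeneventserver start, which can keep the box busy for a while with ~400K events.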
Subject: | OK, here's what I did this |
Author: | [Not Specified] |
Posted: | 2014-11-17 09:04 |
OK, here's what I did this morning... I found 10GB of index files under the /var/zeneventserver/index/* directory. I stopped Zenoss, deleted them all, then restarted Linux entirely. All 8GB of physical RAM and 2GB of swap were used up. I noticed that when RabbitMQ starts up, it sucks up 2GB of physical RAM right away. I read that this could be because of 'stale' data inside the /var/lib/rabbitmq/mnesia/ directory. I looked, and there's one HUGE directory, /var/lib/rabbitmq/mnesia/rabbit@MyServername/msg_store_persistent/, that has 700 files in it, making up 17GB! What kind of cleanup, if any, can be done in this directory? The server is still busy (7.00 load average) -- rebuilding those indices for the EventServer. I'll let it sit for a while, then log in and see what things look like.
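For what it's worth, msg_store_persistent is RabbitMQ's on-disk store for unconsumed persistent messages, so deleting files in there by hand (especially with the broker running) is unsafe; it only shrinks when the queues are drained or purged. To watch its size:
du -sh /var/lib/rabbitmq/mnesia/rabbit@*/msg_store_persistent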
Subject: | More info.... both ZenEventD |
Author: | [Not Specified] |
Posted: | 2014-11-17 11:37 |
More info... ZenEventD and ZenActionD both complain that the amqp connection was closed unexpectedly. I read that this means rabbitmq died. Sure enough, 'service rabbitmq-server status' showed it was stopped. I started it. 'top' shows it's taking 25% of the machine's available memory and 99% of the CPU (the topic of lots of other threads) - but one thing at a time... I managed to get a few more events to pop up from October 25th (no other events appear since then). There are exactly 14 error messages about "localhost xxxxxxxxx heartbeat failure", where xxxxxxxxx is one of the daemons listed below. After this 'heartbeat failure', no other events are in the Event Console... I've gracefully shut down zenoss & rebooted linux twice since this happened.
zencommand
zeneventd
zeneventlog
zenhub
zenjmx
zenmodeler
zenperfsnmp
zenping
zenprocess
zenstatus
zensyslog
zentrap
zenwin
zenwinperf
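To check whether those missing events are stuck in RabbitMQ, listing the queue depths should show it (assuming the default /zenoss vhost that Zenoss 4.x uses):
rabbitmqctl list_queues -p /zenoss name messages consumers
A large messages count on the raw-events queue with zero consumers would match these symptoms.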
Subject: | You need to find root of your |
Author: | Jan Garaj |
Posted: | 2014-11-17 14:02 |
You need to find the root of your problems; RabbitMQ + MySQL + huge file sizes are only symptoms.
0.) If you have a lot of events, then your MySQL can be overloaded/overfilled.
1.) If you have a problem with MySQL, then zeneventd is not able to "move" events from RabbitMQ to MySQL.
2.) If messages are not consumed from RabbitMQ, then the folder /var/lib/rabbitmq/ can be huge.
3.) If you don't have more than 10% free space on the partition holding /var/lib/rabbitmq/, then RabbitMQ will drop connections.
4.) If RabbitMQ doesn't work, then the Zenoss daemons are not able to communicate -> and you have huge problems.
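Two quick checks for 0.) and 3.) (the DB/user names below assume a default install; adjust as needed):
df -h /var/lib/rabbitmq /var/lib/mysql
mysql -u zenoss -p -e "SELECT table_name, ROUND((data_length+index_length)/1024/1024) AS size_mb FROM information_schema.tables WHERE table_schema='zenoss_zep';"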
I recommend asking directly on the IRC channel #zenoss.
Subject: | I solved the disk-full issue |
Author: | [Not Specified] |
Posted: | 2014-11-17 16:44 |
I solved the disk-full issue by re-building the zenoss indices. I now have 40% free space. However, when Rabbit is running and all the daemons start, all 8GB of physical RAM is taken up, and the system will also take most of its 2GB swap file. Then, I guess logically, some daemons will stop themselves because of low-mem conditions, and then bits and pieces of Zenoss won't work. Also remember that /var/lib/rabbitmq/mnesia is about 17GB in size -- that's where my Events are stuck. In the morning, I'm going to try a graceful shutdown / reboot, because I've been restarting daemons during the day today, and I bet I have a problem with the order in which I've started / stopped them. Disk space is solved, but I'll have to bump up the RAM on this virtual machine from 8 to 12GB so I can get this beast running again. NOTE: This setup 'used to' work just fine & send out email alerts, etc... Then, I think, the disk ran out of space, which caused other cascading problems. I'll work on bumping up to 12GB tomorrow too.
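Before resizing the VM, it may be worth confirming where the RAM actually goes; rabbitmqctl reports the broker's own memory use as part of its status output:
free -m
rabbitmqctl status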
Subject: | /var/lib/rabbitmq/mnesia is |
Author: | Jan Garaj |
Posted: | 2014-11-17 17:30 |
/var/lib/rabbitmq/mnesia is about 17GB => you have a huge amount of messages in your RabbitMQ (1-2M) - that's probably also the reason for the memory shortage. These messages are waiting to be processed - probably most of them are in the rawevents queue, and they will be processed into real DB events (but there is also deduplication).
To be honest, you should ask someone to solve your problem on site, not offline. Someone who knows your setup, zenoss version, monitoring requirements, ....
From my point of view, the best solution is to clean/purge the rawevents/all RabbitMQ queues (but you will lose data/events), and then start all zenoss daemons one by one (zenhub, zeneventd, ...) with daemon log checks + monitoring of your Zenoss server(s).
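A sketch of that purge, assuming a Zenoss 4.x install that ships the zenq helper and the default raw-events queue name (check zenq --help on your box first; anything purged is gone for good):
su - zenoss
zenq purge zenoss.queues.zep.rawevents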
Subject: | It does seem that I need to |
Author: | [Not Specified] |
Posted: | 2014-11-18 07:44 |
It does seem that I need to do a grand reset of some sort; I've got multiple problems... Is that IRC channel available on this website, or do I have to download an IRC chat client to get to #zenoss? I think I tried using IRC about 17 years ago :-)
Subject: | IRC is still there, freenode |
Author: | Andrew Kirch |
Posted: | 2014-11-18 09:31 |
IRC is still there; freenode has a webchat at http://webchat.freenode.net/. Look forward to seeing you!
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development? Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5? Coming May 1st from GoVanguard