TECHZEN Zenoss User Community ARCHIVE  

How to delete old metrics

Subject: How to delete old metrics
Author: Martin
Posted: 2020-03-04 03:25

Hi All,

Unfortunately my Zenoss ran out of disk space and is now in emergency shutdown. I already searched for a way to delete old metrics to free up disk space, but unfortunately couldn't find anything helpful. Can someone please let me know how to free up disk space?

Thanks, Martin (whose network is currently not monitored)

------------------------------
Martin
------------------------------


Subject: RE: How to delete old metrics
Author: Arthur
Posted: 2020-03-04 16:45

Hi Martin

1) Before you start cleaning up any data, make sure you have a valid backup, just in case.

2) Be really sure which disk (volume) ran out of space.
   # serviced volume status
   # docker info |grep Data
   # vgs, lvs

3) Try to get some space back by running serviced-fstrim

4) Delete the HBase data. !!This will delete all performance data!!

https://support.zenoss.com/hc/en-us/articles/204643769-How-to-Recover-Control-Center-from-Hardware-Failure

In that article, see the section "Corruption of Zenoss HBase (/opt/serviced/var/volumes/*/hbase-master) Files", in particular the part "To remove existing data to enable HBase to start, perform the following:".

In those steps, replace Zenoss.resmgr with Zenoss.core.

Regards

------------------------------
Arthur
------------------------------

Subject: RE: How to delete old metrics
Author: Ryan Matte
Posted: 2020-03-05 17:13

You can't selectively delete metrics, but you can change their retention period. To do this, though, the hmaster service in Resource Manager has to be up and running, which means you first need to get past the emergency shutdown. Generally that means adding more storage space to the server, since there aren't really any areas in the serviced volume where you can easily recover space. Once the hmaster service is up and running, you can adjust the retention settings by doing the following...

serviced service attach hmaster
su - hbase # you'll see a warning message after doing this, ignore it
cd /opt/hbase/bin
./hbase shell
list # to display the table names; in the next step you'll want to use the table that's <instance id>-tsdb, e.g. 9o8ar2pwn2duch7b90nfs1pxe-tsdb
describe '9o8ar2pwn2duch7b90nfs1pxe-tsdb'

You'll see output from this, the TTL value is what you're interested in, it'll look like: TTL => '7776000 SECONDS (90 DAYS)'

Then you set a new TTL by doing...

alter '9o8ar2pwn2duch7b90nfs1pxe-tsdb', {NAME=>'t', TTL=>'864000'}

Where 864000 is the new TTL value that you'd like to set (in seconds).
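Since the TTL is given in seconds, a quick shell calculation can help when picking a value. This is only a small sketch; days_to_seconds is a made-up helper name, not something that ships with Zenoss or HBase.

```shell
#!/bin/sh
# Hypothetical helper: convert a retention period in days to seconds
# for use as the HBase TTL value.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 10   # prints 864000  (the value used in the alter example)
days_to_seconds 90   # prints 7776000 (the 90-day value shown by describe)
```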

Keep in mind that the TTL only controls the length of time that the data is kept; it's not a hard limit on data size. If you have limited space, you'll probably want to set it fairly conservatively.

I would recommend restarting the hmaster service after the above has been done, then repeating the steps to describe that table and making sure that the new TTL value is shown there.
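If you save the describe output to a file or paste it into a terminal, something like the following can pull out the TTL number for a quick comparison. It is a rough sketch that assumes the output contains the TTL in the form shown above (TTL => '7776000 SECONDS (90 DAYS)'); ttl_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read 'describe' output on stdin and print the
# TTL in seconds (first match only). Prints nothing if no numeric TTL
# is found.
ttl_seconds() {
  sed -n "s/.*TTL => '\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Example with the TTL fragment in the format shown in this thread:
printf "%s\n" "TTL => '7776000 SECONDS (90 DAYS)'" | ttl_seconds
```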

Another thing to consider in terms of filesystem space is your event aging and retention settings.  You can see these in the Resource Manager UI under Advanced -> Settings -> Events.  You'll probably want to set Don't Age This Severity and Above to "Age All Events" and then set the event aging and archive thresholds to reasonable values as well as the Delete Archived Events Older Than (days) value.

If you're absolutely stuck and can't add any more space then you can choose to blow away all of the hbase data or all of the events data to get that space back.  That involves removing the data directories for either of those from the serviced volume and deleting the hidden .initialized file for that service in the serviced volume as well (to tell it to re-create that directory when the service starts).  I would consider that an absolute last resort though, and only if you're willing to lose that data.  Arthur already linked to a document describing how to do that for the hbase data.

Hope that helps.



------------------------------
Ryan Matte
------------------------------


Subject: RE: How to delete old metrics
Author: Arthur
Posted: 2020-03-06 12:32

Another area to look at is snapshots. They take up space too!

# serviced snapshot list
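If the list shows snapshots you no longer need, they can be removed to reclaim space. The sketch below only prints the removal commands so you can review them first. It assumes the serviced snapshot remove subcommand is available in your Control Center version (check serviced snapshot --help); plan_snapshot_removal is a made-up helper name, and the snapshot IDs are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one removal command per
# snapshot ID given as an argument, so the plan can be reviewed first.
plan_snapshot_removal() {
  for id in "$@"; do
    printf 'serviced snapshot remove %s\n' "$id"
  done
}

# Placeholder IDs; take the real ones from 'serviced snapshot list'.
plan_snapshot_removal SNAPSHOT_ID_1 SNAPSHOT_ID_2
```

Make sure you really don't need a snapshot before running the printed command for it.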



------------------------------
Arthur
------------------------------


