Subject: | SMART monitoring for HDDs and SSDs |
Author: | [Not Specified] |
Posted: | 2015-01-07 14:47 |
Hello,
I am trying to configure zenoss to graph data received from the smartctl -A command. So far I have found a way to parse all of the output into a single line instead of a list.
from this:
smartctl -A /dev/sdb
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1767
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 7
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 1
175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always - 43068228240
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 084 082 000 Old_age Always - 16 (Min/Max 14/20)
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 16
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 2342
226 Load-in_Time 0x0032 100 100 000 Old_age Always - 0
227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 0
228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 1809
232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0
234 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 2342
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 17872
to this:
smartctl -A /dev/sdb | sed -e 's/ //g'
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-34-pve] (local build)Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net=== START OF READ SMART DATA SECTION ===SMART Attributes Data Structure revision number: 1Vendor Specific SMART Attributes with Thresholds:ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1767 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 7170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 1175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always - 43068359312183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always - 0187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0190 Airflow_Temperature_Cel 0x0022 084 082 000 Old_age Always - 16 (Min/Max 14/20)192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 16197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 2342226 Load-in_Time 0x0032 100 100 000 Old_age Always - 0227 Torq-amp_Count 0x0032 100 100 000 Old_age Always - 0228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always - 1811232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 0234 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 2342242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 17872
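For anyone trying to reproduce the single-line form: sed works line by line, so a plain substitution does not remove the newlines by itself. A minimal sketch using paste instead; `join_lines` is a hypothetical helper name, not something from smartmontools or Zenoss:

```shell
#!/bin/bash
# join_lines: hypothetical helper that reads stdin and prints it as one
# line, with the original lines separated by single spaces.
join_lines() {
    paste -s -d ' ' -
}

# Example (needs smartctl and enough privileges to read the disk):
#   smartctl -A /dev/sdb | join_lines
```

Using a space as the separator keeps the last field of one row from running into the first field of the next.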
However, when I execute this within a shell script on Zenoss, I receive the following error:
/etc/smartmon
Exception while performing command for server1
Type:
Has anyone had any luck doing this already, or is there a ZenPack for something like this?
Subject: | IMHO Command is executed with |
Author: | Jan Garaj |
Posted: | 2015-01-07 17:11 |
IMHO the command is executed with the zenoss user's permissions, which are not sufficient for the smartctl command. The RabbitMQ ZenPack has a similar problem: it needs to run the rabbitmqctl command with root permissions.
Mimic the sudo solution from http://wiki.zenoss.org/ZenPack:RabbitMQ for your zCommandUsername user (see the section "Using a Non-Root User") and your problem should be solved.
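A rough sketch of what such a sudo rule could look like for smartctl. The user name `zenoss` and the path `/usr/sbin/smartctl` are assumptions (check the real path with `command -v smartctl`), and `sudoers_rule` is a hypothetical helper, not part of any ZenPack:

```shell
#!/bin/bash
# sudoers_rule: hypothetical helper that prints a sudoers entry allowing
# USER to run exactly one BINARY as root without a password.
sudoers_rule() {
    local user="$1" binary="$2"
    printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$binary"
}

# Example, run as root on the monitored host (user and path are assumptions,
# and this assumes your sudoers file includes /etc/sudoers.d):
#   sudoers_rule zenoss /usr/sbin/smartctl > /etc/sudoers.d/zenoss-smartctl
#   chmod 0440 /etc/sudoers.d/zenoss-smartctl
#   visudo -c -f /etc/sudoers.d/zenoss-smartctl   # syntax check only
```

The monitoring script would then call `sudo smartctl -A /dev/sdb` instead of `smartctl` directly. On some distributions a `requiretty` default in sudoers can also block non-interactive sudo, so that may be worth checking if the rule alone does not help.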
Devops Monitoring Expert advice:
Dockerize/automate/monitor all the things.
DevOps stack:
Docker / Kubernetes / Mesos / Zabbix / Zenoss / Grafana / Puppet / Ansible / Vagrant / Terraform /
Elasticsearch
Subject: | You are also going to have to |
Author: | Andrew Kirch |
Posted: | 2015-01-08 10:23 |
You are also going to have to convert the output into something Zenoss will understand, e.g.:
OK|something1=value somethingelse=value and so on. I've got an upcoming newsletter article on command datasources, and I will post a real-world output example for you:
OK|temp=70 humid=40 fan1=3000 fan2=3000 fan3=3000 fan4=1289
I would also recommend AGAINST monitoring every parameter in SMART. Instead, identify the relevant parameters that warn of failure (monitoring everything tends to chew up resources).
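As an illustration of both points, here is a minimal sketch that keeps only a chosen set of attribute IDs and prints them in the `OK|name=value ...` form. `smart_select` is a hypothetical name, the IDs in the example (5, 187, 197) are picked from the listing earlier in the thread purely for illustration, and it reads the normalized VALUE column (field 4):

```shell
#!/bin/bash
# smart_select: hypothetical filter. Reads `smartctl -A` output on stdin and
# prints one "OK|name=value ..." line containing only the attribute IDs
# listed (space-separated) in the first argument.
smart_select() {
    awk -v ids="$1" '
        BEGIN {
            n = split(ids, list, " ")
            for (i = 1; i <= n; i++) want[list[i]] = 1
            line = "OK|"
        }
        # Attribute rows start with a numeric ID; headers and blanks do not.
        $1 ~ /^[0-9]+$/ && ($1 in want) {
            # "+ 0" turns zero-padded values such as 084 into plain numbers.
            line = line sep $2 "=" ($4 + 0)
            sep = " "
        }
        END { print line }
    '
}

# Example (assumes smartctl can be run by the calling user):
#   smartctl -A /dev/sdb | smart_select "5 187 197"
```

Selecting by ID rather than by name avoids trouble with the several rows that smartctl labels Unknown_Attribute.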
Andrew Kirch
akirch@gvit.com
Need Zenoss support, consulting or custom development Look no further. Email or PM me!
Ready for Distributed Topology (collectors) for Zenoss 5 Coming May 1st from GoVanguard
Subject: | output |
Author: | [Not Specified] |
Posted: | 2015-01-12 14:12 |
OK, I have been working on formatting the output into something Zenoss can read and have come up with the following bash file:
#!/bin/bash
smartctl -A /dev/sdb | awk -F ' ' '{print $2,"=",$4}' | sed -e 's/ //g' > /etc/smartcmd-output.txt
sed '1,7d' /etc/smartcmd-output.txt | tr -d '\n'
rm /etc/smartcmd-output.txt
which gives me this output:
Reallocated_Sector_Ct=100Power_On_Hours=100Power_Cycle_Count=100Unknown_Attribute=100Unknown_Attribute=100Unknown_Attribute=100Unknown_Attribute=100Program_Fail_Count_Chip=100Runtime_Bad_Block=100End-to-End_Error=100Reported_Uncorrect=100Airflow_Temperature_Cel=083Power-Off_Retract_Count=100Temperature_Celsius=100Current_Pending_Sector=100UDMA_CRC_Error_Count=100Load_Cycle_Count=100Load-in_Time=100Torq-amp_Count=100Power-off_Retract_Count=100Available_Reservd_Space=100Media_Wearout_Indicator=100Unknown_Attribute=100Total_LBAs_Written=100Total_LBAs_Read=100=
but I am still running into these errors when trying to execute it on a device within Zenoss:
Preparing Command...
Executing command /etc/smartmon_cmd against device
/etc/smartmon_cmd: line 2: /etc/smartcmd-output.txt: Permission denied
awk: (FILENAME=- FNR=4) warning: error writing standard output (Broken pipe)
sed: can't read /etc/smartcmd-output.txt: No such file or directory
rm: cannot remove `/etc/smartcmd-output.txt': No such file or directory
DONE in 0 seconds
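All three errors trace back to the scratch file under /etc, so one way around them is to skip the file entirely and do everything in a single pipe. A sketch under that assumption; `smart_perfdata` is a hypothetical name, and it reads field 4 (VALUE) like the script above:

```shell
#!/bin/bash
# smart_perfdata: hypothetical replacement for the temp-file pipeline.
# Reads `smartctl -A` output on stdin and prints "OK|name=value ..." on a
# single line, without writing anything to disk.
smart_perfdata() {
    awk '
        BEGIN { printf "OK|" }
        # Only attribute rows begin with a numeric ID, so the banner, the
        # column header and the trailing blank line are skipped without
        # having to count lines.
        $1 ~ /^[0-9]+$/ { printf "%s%s=%s", sep, $2, $4; sep = " " }
        END { printf "\n" }
    '
}

# Example (assumes smartctl can be run by the calling user):
#   smartctl -A /dev/sdb | smart_perfdata
```

Matching on the numeric ID also avoids the stray trailing "=" that the blank last line of the smartctl output produces in the original script.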
Subject: | your output isn't quite right |
Author: | Andrew Kirch |
Posted: | 2015-01-19 08:10 |
Your output isn't quite right; you need something more like:
OK|Reallocated_Sector_Ct=100 Power_On_Hours=100 Power_Cycle_Count=100 ...
Subject: | looks like the user running |
Author: | Andrew Kirch |
Posted: | 2015-01-27 13:39 |
It looks like the user running the script can't write to /etc.
In fact, to give a longer answer: you should NEVER write anywhere other than /tmp unless you're actually doing a remediation action.
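If a scratch file really is needed, a minimal sketch of keeping it under /tmp could look like this. `with_tmpfile` and the `smartcmd.` file prefix are hypothetical names chosen for the example:

```shell
#!/bin/bash
# with_tmpfile: hypothetical helper. Creates a private scratch file under
# /tmp, passes its path to the given command as the last argument, and
# removes the file afterwards whatever the command's exit status was.
with_tmpfile() {
    local tmp status=0
    tmp="$(mktemp /tmp/smartcmd.XXXXXX)" || return 1
    "$@" "$tmp" || status=$?
    rm -f -- "$tmp"
    return "$status"
}

# Example: the callback receives the scratch path as its only argument.
#   dump_smart() { smartctl -A /dev/sdb > "$1" && sed '1,7d' "$1"; }
#   with_tmpfile dump_smart
```

mktemp picks a unique name, so two polling cycles that overlap do not clobber each other's file.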
Subject: | zenoss output |
Author: | [Not Specified] |
Posted: | 2015-02-02 11:45 |
OK, I finally got past the permissions issues, and the command says it is running when executed from Zenoss, but I am not seeing the output that the command produces when it is run on the device's local CLI.
Zenoss output:
Preparing Command...
Executing command /bin/smartcmd against Device1
DONE in 0 seconds
Output from the device's local CLI:
Status OK|Power_On_Hours=100 Power_Cycle_Count=100 Unknown_Attribute=100 Unknown_Attribute=100 Unknown_Attribute=100 Unknown_Attribute=100 Program_Fail_Count_Chip=100 Runtime_Bad_Block=100 End-to-End_Error=100 Reported_Uncorrect=100 Airflow_Temperature_Cel=082 Power-Off_Retract_Count=100 Temperature_Celsius=100 Current_Pending_Sector=100 UDMA_CRC_Error_Count=100 Load_Cycle_Count=100 Load-in_Time=100 Torq-amp_Count=100 Power-off_Retract_Count=100 Available_Reservd_Space=100 Media_Wearout_Indicator=100 Unknown_Attribute=100 Total_LBAs_Written=100 Total_LBAs_Read=100 =
Subject: | Output |
Author: | [Not Specified] |
Posted: | 2015-02-17 14:36 |
Any ideas why my command output does not show up when running the command from the Zenoss interface? The command runs and produces output fine on the device's CLI.
Zenoss output:
Preparing Command...
Executing command /bin/smartcmd against device1
DONE in 0 seconds
Output from device1's CLI:
Status OK|Reallocated_Sector_Ct=100 Power_On_Hours=100 Power_Cycle_Count=100 Unknown_Attribute=100 Unknown_Attribute=100 Unknown_Attribute=100 Unknown_Attribute=100 Program_Fail_Count_Chip=100 Runtime_Bad_Block=100 End-to-End_Error=100 Reported_Uncorrect=100 Airflow_Temperature_Cel=084 Power-Off_Retract_Count=100 Temperature_Celsius=100 Current_Pending_Sector=100 UDMA_CRC_Error_Count=100 Load_Cycle_Count=100 Load-in_Time=100 Torq-amp_Count=100 Power-off_Retract_Count=100 Available_Reservd_Space=100 Media_Wearout_Indicator=100 Unknown_Attribute=100 Total_LBAs_Written=100 Total_LBAs_Read=100
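One way to narrow this down is to check what the command writes to stdout and stderr, and what it exits with, when run as the monitoring user in a non-interactive session. The thread does not establish the cause, so this is only a debugging sketch; `run_report` is a hypothetical helper:

```shell
#!/bin/bash
# run_report: hypothetical debugging helper. Runs a command and reports its
# exit status plus how many bytes it wrote to stdout and to stderr, which
# helps tell "no output" apart from "output went to stderr" or "it failed".
run_report() {
    local out err status=0
    out="$(mktemp)" || return 1
    err="$(mktemp)" || { rm -f -- "$out"; return 1; }
    "$@" >"$out" 2>"$err" || status=$?
    # The arithmetic expansion strips any padding that wc adds.
    printf 'exit=%d stdout_bytes=%d stderr_bytes=%d\n' \
        "$status" "$(( $(wc -c <"$out") ))" "$(( $(wc -c <"$err") ))"
    rm -f -- "$out" "$err"
}

# Example, run as the same user Zenoss logs in with:
#   run_report /bin/smartcmd
```

If stdout_bytes is 0 here but not in an interactive login, differences such as PATH or sudo behaviour between the two sessions would be the first things to compare.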