CentOS/Linux: How to Check a Hard Disk for Bad Sectors

January 8, 2015 by osetc

Category: CentOS, Linux, Redhat

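This article explains how to use disk health tools to check a hard disk for bad sectors. It walks through three tools.
Disk health tools: smartctl, badblocks, hdparm
Installing smartctl
Run the following command: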

[root@qy ~]# yum install smartmontools -y

Enable SMART

# smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda

Check the disk /dev/sda
Run the following command:

smartctl -a /dev/sda

Output:

smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-194.el5PAE] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Vendor:               SEAGATE
Product:              ST3146356SS
Revision:             HS09
User Capacity:        146,815,733,760 bytes [146 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c50004fa837f
Serial number:        3QN0EL91
Device type:          disk
Transport protocol:   SAS
Local Time is:        Fri Oct 31 10:45:58 2014 CST
Device supports SMART and is Enabled
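Temperature Warning Disabled or Not Supported
SMART Health Status: OK   # the wording of this line differs between smartctl versions
Current Drive Temperature:     30 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0  # a non-zero value means bad sectors have developed in use (so-called grown defects)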
Vendor (Seagate) cache information
 Blocks sent to initiator = 3752023409
 Blocks received from initiator = 3916316860
 Blocks read from cache and sent to initiator = 4025399956
 Number of read and write commands whose size <= segment size = 3339079605
 Number of read and write commands whose size > segment size = 2746
Vendor (Seagate/Hitachi) factory information
 number of hours powered up = 34120.02
 number of minutes until next internal SMART test = 1
 Error counter log:
 Errors Corrected by          Total   Correction     Gigabytes    Total
              ECC          rereads/    errors  algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected invocations   [10^9 bytes]  errors
read:  248894024        0         0 248894024   248894024      85241.186           0
write:         0        0         0         0          0     30998.996           0
verify:  340001        0        0    340001     340001        141.757           0
Non-medium error count:       51  # non-medium errors: not a fault of the disk media itself, usually a cable, transport, or checksum issue; can generally be ignored
No self-tests have been logged
Long (extended) Self Test duration: 1740 seconds [29.0 minutes]
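
The grown defect count is the line in this output that matters most for bad sectors, so it can be useful to pull it out on its own. The following is a minimal sketch, assuming the output contains an "Elements in grown defect list:" line as in the SAS example above (ATA disks report this differently). The function name grown_defects is made up for illustration.

```shell
# Print the grown defect count found in `smartctl -a` output read from stdin.
# Prints nothing when the line is absent.
grown_defects() {
    sed -n 's/^Elements in grown defect list:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (needs root and a real disk):
#   count=$(smartctl -a /dev/sda | grown_defects)
#   if [ "${count:-0}" -gt 0 ]; then
#       echo "WARNING: $count grown defects on /dev/sda"
#   fi
```

A count of 0 matches the healthy output shown above; a number that keeps rising is a reason to back up and plan a replacement.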

You can also check the overall health of the disk directly:

[root@qy ~]# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-194.el5PAE] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
SMART Health Status: OK

[root@localhost ~]# smartctl -H /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 === START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

Both OK and PASSED mean the disk is healthy.
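
Because the wording differs between the two outputs above, a script has to accept both. Here is a small sketch that treats either verdict as healthy. The function name smart_health_ok is hypothetical, and the patterns are taken only from the two sample outputs in this article.

```shell
# Read `smartctl -H` output on stdin and succeed (exit status 0) only if it
# contains one of the two healthy verdicts shown above.
smart_health_ok() {
    grep -Eq 'SMART Health Status: OK|self-assessment test result: PASSED'
}

# Usage (needs root):
#   if smartctl -H /dev/sda | smart_health_ok; then
#       echo "/dev/sda looks healthy"
#   else
#       echo "/dev/sda needs attention"
#   fi
```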
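Testing with badblocks (healthy result, no bad blocks reported):
The badblocks command checks a disk device for damaged blocks. You must specify the device to check; the number of blocks on the device can optionally be given as well.
badblocks -s -v /dev/sda1       # -s: show progress, -v: verbose output
badblocks -s -w -v /dev/sda2    # -w: write-mode test (destructive: it overwrites the data on the device)
Note: never run the write-mode test on a disk that is mounted.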
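
The warning about mounted disks can be enforced by a small wrapper. The sketch below is only an illustration: is_mounted and safe_write_test are made-up names, the check looks only at plain device names in /proc/mounts, and it will not notice LVM or device-mapper volumes or active swap that sit on the disk.

```shell
# Succeed if the device, or one of its partitions, appears in a mounts table.
# $2 defaults to /proc/mounts; it is a parameter only so the check can be
# tried against a sample file.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '
        index($1, d) == 1 {
            rest = substr($1, length(d) + 1)
            # Names ending in a digit (nvme0n1) use pN partitions; others use N.
            pat = (d ~ /[0-9]$/) ? "^(p[0-9]+)?$" : "^[0-9]*$"
            if (rest ~ pat) found = 1
        }
        END { exit (found ? 0 : 1) }' "$mounts"
}

# Refuse the destructive write-mode test when the device is mounted.
safe_write_test() {
    if is_mounted "$1" "${2:-/proc/mounts}"; then
        echo "refusing: $1 (or a partition on it) is mounted" >&2
        return 1
    fi
    badblocks -s -w -v "$1"
}

# Usage (destroys all data on the device when it runs):
#   safe_write_test /dev/sdb
```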

[root@qy ~]# badblocks -s -v /dev/sda
Checking blocks 0 to 143374740
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
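
For scripted checks, the number in the final summary line is usually all that is needed. This sketch extracts it, assuming the summary has the "Pass completed, N bad blocks found." form shown above; bad_block_count is a hypothetical name. As far as I know badblocks writes its verbose messages to stderr, hence the redirection in the usage comment.

```shell
# Print the bad block count from badblocks -v output read on stdin.
bad_block_count() {
    sed -n 's/^Pass completed, \([0-9][0-9]*\) bad blocks found.*/\1/p'
}

# Usage (read-only test, needs root):
#   n=$(badblocks -v /dev/sda 2>&1 | bad_block_count)
#   echo "bad blocks: ${n:-unknown}"
```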

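This disk passed the test with no bad sectors (bad blocks), so it is safe to use.
Whatever kind of bad sector you are dealing with, back up your data first: copy the important data elsewhere, then attempt a repair. If important data can no longer be read because the disk is misbehaving, stop using the disk immediately and have a professional recover it.
Testing with hdparm
Measure disk read speed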

# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads:   1918 MB in  2.00 seconds = 959.62 MB/sec
Timing buffered disk reads:  184 MB in  3.00 seconds =  61.26 MB/sec
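
To track read speed over time, the buffered read figure can be pulled out of this output. A minimal sketch, assuming the "Timing buffered disk reads" line ends in "= <number> MB/sec" as in the sample above; the function name buffered_read_mbps is made up.

```shell
# Print the buffered disk read throughput (MB/sec) from hdparm -t or -Tt
# output read on stdin.
buffered_read_mbps() {
    awk '/Timing buffered disk reads/ { print $(NF - 1) }'
}

# Usage (needs root):
#   speed=$(hdparm -t /dev/sda | buffered_read_mbps)
#   echo "buffered reads: ${speed:-unknown} MB/sec"
```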

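hdparm can detect, display, and set the parameters of IDE or SCSI disks.
You can use the sg_vpd command, one of the tools in sg3_utils, to look up the disk's rotation speed.
Download: http://sg.danny.cz/sg/sg3_utils.html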
[root@qy sg3_utils-1.39]# sg_vpd /dev/sda

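Additional smartctl commands:
smartctl -a  Show all SMART information for the device, including whether SMART is enabled.
smartctl -s on  Enable SMART if it is not already enabled.
smartctl -t short  Run the short self-test in the background.
smartctl -t long  Run the long self-test in the background.
smartctl -C -t short  Run the short self-test in the foreground.
smartctl -C -t long  Run the long self-test in the foreground. All of these simply use the disk's built-in SMART self-test routines.
smartctl -X  Abort a self-test that is running in the background.
smartctl -l selftest  Show the self-test log.
smartctl -l error  Show the error log summary.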
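
The self-test options above combine in a regular way, which the following sketch makes explicit. It only prints the command rather than running it. The function name selftest_cmd and its argument order are invented for illustration; the flags are the ones listed above.

```shell
# Print (do not run) the smartctl command for a self-test.
#   $1 = device, $2 = short|long, $3 = background|foreground (default: background)
selftest_cmd() {
    dev="$1"
    kind="$2"
    mode="${3:-background}"
    case "$kind" in
        short|long) ;;
        *) echo "unknown test type: $kind" >&2; return 1 ;;
    esac
    case "$mode" in
        background) echo "smartctl -t $kind $dev" ;;
        foreground) echo "smartctl -C -t $kind $dev" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage:
#   selftest_cmd /dev/sda long              # prints: smartctl -t long /dev/sda
#   selftest_cmd /dev/sda short foreground  # prints: smartctl -C -t short /dev/sda
```

Once a test has finished, smartctl -l selftest on the same device shows its result.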
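First, use dmesg to confirm the disk's device name. For example, an IDE disk attached in the Slave position on the Primary IDE bus is /dev/hdb. The h in hdb stands for IDE (a name such as sdb would indicate SATA or SCSI), and the final letter b means the second disk, that is, the Slave position on the Primary bus. Then confirm whether SMART support is enabled on the disk: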

# smartctl -i /dev/sda
smartctl 5.40 2010-10-16 r3189 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model:     HITACHI HTS543225L9SA00
Serial Number:    090131FB2F32YLG28JEA
Firmware Version: FBEZC48C
User Capacity:    250,059,350,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3f
Local Time is:    Wed May 25 10:10:39 2011 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
// This means SMART support is enabled. If you see "SMART support is: Disabled" instead, SMART is not enabled; run the following command to enable it:
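
The same decision can be made in a script by reading the "SMART support is:" lines. This is a sketch under the assumption that the output uses the wording shown in this article (including the SCSI form "Device supports SMART and is Enabled" from the first example); smart_support_state is a hypothetical name.

```shell
# Read `smartctl -i` output on stdin and print one of:
#   enabled, disabled, unavailable, unknown
smart_support_state() {
    awk '
        /^SMART support is: *Enabled/          { s = "enabled" }
        /^SMART support is: *Disabled/         { s = "disabled" }
        /^SMART support is: *Unavailable/      { s = "unavailable" }
        /^Device supports SMART and is Enabled/ { s = "enabled" }
        END { print (s == "" ? "unknown" : s) }'
}

# Usage (needs root):
#   if [ "$(smartctl -i /dev/sda | smart_support_state)" = "disabled" ]; then
#       smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
#   fi
```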

# smartctl --smart=on --offlineauto=on --saveauto=on /dev/sda
smartctl 5.40 2010-10-16 r3189 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
SMART Attribute Autosave Enabled.
SMART Automatic Offline Testing Enabled every four hours.

SMART is now enabled on the disk. Run the following command to check the disk's health:

# smartctl -H /dev/sda
smartctl 5.40 2010-10-16 r3189 [i386-redhat-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

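Note the value after "result": PASSED means the disk is in good health. If it shows FAILED instead, it is best to replace the disk in the server right away. SMART can only tell you that the disk is no longer healthy; how long it will keep running after the warning is unpredictable. SMART warning thresholds usually leave some margin, so a disk generally does not die the moment it raises a warning and can typically hold on for a while. Some disks have kept running for years after a SMART warning, while others failed within days. Either way, once a warning appears, do not count on luck.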
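
Besides the printed verdict, smartctl also reports problems through its exit status, which its man page documents as a bitmask. The sketch below decodes that bitmask into short messages. The function name smartctl_status_flags is made up and the messages are abbreviated paraphrases of the man page, so check your installed version before relying on them.

```shell
# Decode a smartctl exit status (a bitmask, bits 0 to 7) into one line per
# set bit. Prints nothing for status 0.
smartctl_status_flags() {
    status="$1"
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
    return 0
}

# Usage (needs root):
#   smartctl -H /dev/sda >/dev/null
#   smartctl_status_flags $?
```

A set bit 3 is the case discussed above, where the disk itself reports that it is failing.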
Source: 51cto
