Tomocha Diary (Tomo cha) - Diary of a former university student turned office worker -


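This diary is mostly a place for me to vent about daily events and get the poison out. I am seriously jaded, and I ramble and grumble. Please visit only if you can tolerate that. You are generally free to link to this diary, but please refrain from hotlinking the photos. I make no guarantees whatsoever about the content.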


Sunday, May 5, 2013 [Sunny]

＊ [Comp] When things break this badly, it's almost spectacular

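I wrote in this diary that a SAS disk died last November and that I replaced it. Apparently another one has now gone out in spectacular fashion...

This happened just as I had decided to retire this server and had started migrating off it bit by bit.

I had reworked the configuration and bought a new server, but...

The old server's configuration is:

LSI_1064E *2 cards
SAS HP 74GB (10krpm) *2 - RAID 1
SATA HITACHI 2TB *2, Seagate 2TB *1 - RAID Z2

A simple configuration.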

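It had so many problems that I felt it was cursed, so I decided to migrate to a new server. In the time before I got to that work:

First, two SATA disks died and the RAIDZ2 became degraded. I replaced one of them by around last summer → still degraded.
Then, in early autumn, one of the SAS disks in the RAID1 died, and I replaced it in November.
After the New Year, one SATA disk in the RAIDZ2 died, degrading it again and leaving it one more failure away from disaster.
Today, a SAS disk died and the RAID1 is degraded.
What is it with these failure-prone disks.... There is a limit to how cursed something can be....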

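Going back a bit: feeling the old server was cursed, I bought a new server around last November, but I hit an out-of-the-box defect and had to rebuild it.
Cursed yet again...

Pulling myself together, after the New Year, when I came home by car, I pulled parts out of the old cursed server so I could start migrating to the rebuilt new server.
The part in question was a RAID controller. There are only two controllers, so I took one from the troubled server. Since the degraded RAIDZ2 (3 disks) only had 2 working disks anyway, I changed the configuration to RAID1 + RAIDZ2 (2 disks) and migrated.

Then, while I was working to decommission the problem server completely, one of the disks in the RAID1 passed away...

The current state looks like this...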

root@storage01:/var# raidctl -l
Controller: 2
        Volume:c2t0d0
        Disk: 0.1.0
        Disk: 0.4.0

root@storage01:/var# raidctl -l c2t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c2t0d0                  67.9G   N/A     DEGRADED ON     RAID1
                0.1.0   67.9G           GOOD
                N/A     67.9G           FAILED

root@storage01:/var# zpool status -v
  pool: datapool6
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 2.23G in 0h14m with 0 errors on Mon Dec 17 15:58:34 2012
config:

        NAME        STATE     READ WRITE CKSUM
        datapool6   DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  REMOVED      0     0     0
            c4t4d0  OFFLINE      0     0     0

errors: No known data errors

  pool: rpool1
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool1      ONLINE       0     0     0
          c2t0d0s0  ONLINE       0     0     0

errors: No known data errors
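
Reading the device table by eye gets tedious once several devices are down. The following is a minimal Python sketch, assuming the `zpool status` row layout shown above (name, state, then three numeric counters). It lists every row whose state is not ONLINE. The names `SAMPLE`, `ROW` and `unhealthy_devices` are made up for this illustration, and the regex would need adjusting if the counters are abbreviated (for example with a K suffix).

```python
import re

# Sample text copied from the `zpool status -v` output above (datapool6 only).
SAMPLE = """\
  pool: datapool6
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        datapool6   DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  REMOVED      0     0     0
            c4t4d0  OFFLINE      0     0     0

errors: No known data errors
"""

# A config row: indented name, upper-case state, then three integer counters.
# The header row does not match because READ/WRITE/CKSUM are not digits.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def unhealthy_devices(status_text):
    """Return (name, state) for every config row whose state is not ONLINE."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and m.group(2) != "ONLINE":
            found.append((m.group(1), m.group(2)))
    return found


if __name__ == "__main__":
    for name, state in unhealthy_devices(SAMPLE):
        print(f"{name}: {state}")
```

With the sample above, the pool and the raidz2 vdev are reported as DEGRADED, and the two disks as REMOVED and OFFLINE. Only c2t4d0 is left ONLINE.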

Why so much trouble..... I even deliberately mixed HP and Seagate SAS disks so the lots would differ...

★ With smartctl...:

root@Microknoppix:~# smartctl -a -S on /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

Device: SEAGATE  ST973402SS       Version: 0002
Serial number: 3NP2S2N800XXXXXXX
Device type: disk
Transport protocol: SAS
Local Time is: Sun May  5 20:37:04 2013 JST
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK

Current Drive Temperature:     41 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 3653891794
  Blocks received from initiator = 785771848
  Blocks read from cache and sent to initiator = 2754279779
  Number of read and write commands whose size <= segment size = 618681852
  Number of read and write commands whose size > segment size = 123432
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 29365.82
  number of minutes until next internal SMART test = 1

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   106057393      171         0  106057564   106057564       5261.937           0
write:         0     5493        49      5542       5542   2780282693.144        5495
verify:  4969051        0         0   4969051    4969051        937.761           0

Non-medium error count: 101280456

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -     140                 - [-   -    -]

Long (extended) Self Test duration: 1070 seconds [17.8 minutes]

So I can't see things like the number of bad sectors here.... Even so,

SMART Health Status: OK

What does this actually mean? The error counters show plenty of errors.. Maybe it means something else. Still looking into it.
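
Since the health line and the error counters are reported separately, one option is to read the counters directly instead of trusting the one-line summary. Below is a minimal Python sketch, assuming the eight-column "Error counter log" layout printed above. It pulls out the last column (total uncorrected errors) per operation. `SAMPLE` and `uncorrected_errors` are names invented for this example, and this is not an explanation of how the drive derives its health status.

```python
# Rows copied from the "Error counter log" section of the SAS disk above.
SAMPLE = """\
read:   106057393      171         0  106057564   106057564       5261.937           0
write:         0     5493        49      5542       5542   2780282693.144        5495
verify:  4969051        0         0   4969051    4969051        937.761           0
"""


def uncorrected_errors(log_text):
    """Map operation name -> last column (total uncorrected errors)."""
    result = {}
    for line in log_text.splitlines():
        fields = line.split()
        # A data row is "<op>:" followed by seven numeric columns.
        if len(fields) == 8 and fields[0].endswith(":"):
            result[fields[0].rstrip(":")] = int(fields[-1])
    return result


if __name__ == "__main__":
    for op, count in uncorrected_errors(SAMPLE).items():
        flag = "CHECK" if count > 0 else "ok"
        print(f"{op}: {count} uncorrected ({flag})")
```

For this disk, only the write row is flagged, with 5495 uncorrected errors. Read and verify both show 0.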

A different disk (SATA)

root@Microknoppix:~# smartctl -a -S on /dev/sdd
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    XXXXXXX
Firmware Version: JKAOA3EA
User Capacity:    2,000,398,934,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun May  5 20:46:07 2013 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Attribute Autosave Enabled.

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                           was suspended by an interrupting command from host.
                           Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                           without error or no self-test has ever
                           been run.
Total time to complete Offline
data collection:            (22624) seconds.
Offline data collection
capabilities:               (0x5b) SMART execute Offline immediate.
                           Auto Offline data collection on/off support.
                           Suspend Offline collection upon new
                           command.
                           Offline surface scan supported.
                           Self-test supported.
                           No Conveyance Self-test supported.
                           Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                           power-saving mode.
                           Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                           General Purpose Logging supported.
Short self-test routine
recommended polling time:   (   1) minutes.
Extended self-test routine
recommended polling time:   ( 255) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                           SCT Error Recovery Control supported.
                           SCT Feature Control supported.
                           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   128   128   054    Pre-fail  Offline      -       118
  3 Spin_Up_Time            0x0007   197   197   024    Pre-fail  Always       -       472 (Average 261)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       48
  5 Reallocated_Sector_Ct   0x0033   097   097   005    Pre-fail  Always       -       234
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16963
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       48
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       86
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       86
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Lifetime Min/Max 18/46)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 16962 hours (706 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 51 4a fd 63 e3 09  Error: IDNF 74 sectors at LBA = 0x09e363fd = 165897213

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 0a 83 f7 3d 49 00  38d+18:56:51.833  WRITE DMA EXT
  35 00 0a 15 bc 3d 49 00  38d+18:51:54.016  WRITE DMA EXT
  35 00 0e 01 bc 3d 49 00  38d+18:51:54.015  WRITE DMA EXT
  35 00 0e f1 bb 3d 49 00  38d+18:51:54.012  WRITE DMA EXT
  35 00 08 41 74 f6 48 00  38d+18:51:54.008  WRITE DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So Seek_Error_Rate = 0 and Reallocated_Sector_Ct = 234.. I wonder whether it went to 0 because I powered the machine off?
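
In the attribute table, smartctl treats an attribute as failed when its normalized VALUE falls to or below THRESH. RAW_VALUE is a separate, vendor-defined number. That would be consistent with the PASSED result above: Reallocated_Sector_Ct has a raw count of 234, but its normalized value is 097 against a threshold of 005. The following minimal Python sketch, assuming the ten-column table layout shown above and integer raw values, makes that comparison explicit. `SAMPLE`, `parse_attributes` and `at_or_below_threshold` are hypothetical names for this illustration.

```python
# A few rows copied from the SATA disk's attribute table above.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   097   097   005    Pre-fail  Always       -       234
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""


def parse_attributes(table_text):
    """Parse attribute rows into {name: {value, worst, thresh, raw}}."""
    rows = {}
    for line in table_text.splitlines():
        f = line.split()
        # Data rows start with the numeric attribute ID and have >= 10 columns.
        # Assumption: the first token of RAW_VALUE is a plain integer.
        if len(f) >= 10 and f[0].isdigit():
            rows[f[1]] = {
                "value": int(f[3]),
                "worst": int(f[4]),
                "thresh": int(f[5]),
                "raw": int(f[9]),
            }
    return rows


def at_or_below_threshold(rows):
    """Names whose normalized VALUE has reached THRESH (THRESH 0 is skipped)."""
    return [n for n, r in rows.items()
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    print("at or below threshold:", at_or_below_threshold(attrs))
    print("reallocated sectors (raw):", attrs["Reallocated_Sector_Ct"]["raw"])
```

No attribute in the sample has reached its threshold, so watching the raw reallocated-sector count over time is probably more informative than the pass/fail line alone.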

# anonymous: "It's Tomocha, so I think everyone will just shrug it off with 'no problem!'"


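＊ [Comp][Other] Asahipen Air Compressor AIRBOXY ABX-09

Why I want it: I go through an absurd amount of canned air duster, and it's becoming a problem (sweat).
It also looks like a nicely compact cleaning tool for things like the inside of a computer or the inside of the car.
My concerns are the pressure and the small tank capacity.
A 7L model can be had for a little over 10,000 yen, and compared with that this one is far smaller.
Then again, for jobs a can of air duster can handle, I can accept that I don't need that much capacity.

The price is around 13,000 yen....

The main specs are as follows.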

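Frequency: 50Hz/60Hz
Power consumption (W): 840 (50Hz) / 900 (60Hz)
Air delivery (L/min): at 0.3MPa: 63 (50Hz) / 71 (60Hz); at 0.6MPa: 27 (50Hz) / 31 (60Hz)
Motor stop pressure (MPa): 0.78 [8kgf/cm2]
Motor restart pressure (MPa): 0.59 [6kgf/cm2]
Safety valve operating pressure (MPa): 0.88 [9kgf/cm2]
Air tank capacity: 1.2L
Air outlet: coupler (socket)
Power cord: 1.8m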
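
To get a feel for how small a 1.2L tank is, here is a rough Python sketch. It checks the MPa to kgf/cm2 figures in the spec list and estimates how much free air (at atmospheric pressure) the tank gives up between the motor stop and restart pressures. It assumes an ideal gas at constant temperature, which is only an approximation. The function names are invented for this example.

```python
KGF_CM2_IN_MPA = 0.0980665  # 1 kgf/cm2 expressed in MPa (standard gravity)
ATM_MPA = 0.101325          # one standard atmosphere in MPa


def mpa_to_kgf_cm2(mpa):
    """Convert a pressure in MPa to kgf/cm2."""
    return mpa / KGF_CM2_IN_MPA


def free_air_litres(tank_l, p_high_mpa, p_low_mpa):
    """Free air (at 1 atm) released as the tank falls from p_high to p_low.

    Assumption: ideal gas, constant temperature. Only the pressure
    difference matters, so gauge or absolute values both work.
    """
    return tank_l * (p_high_mpa - p_low_mpa) / ATM_MPA


if __name__ == "__main__":
    for label, mpa in [("motor stop", 0.78), ("motor restart", 0.59),
                       ("safety valve", 0.88)]:
        print(f"{label}: {mpa} MPa = {mpa_to_kgf_cm2(mpa):.2f} kgf/cm2")
    print(f"free air between stop and restart: "
          f"{free_air_litres(1.2, 0.78, 0.59):.2f} L")
```

Under those assumptions, the tank supplies only a couple of litres of free air before the motor kicks back in, so for longer bursts the pump's delivery rate matters more than the tank.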

If you can set aside proper storage space, something like the NAKATOMI oil-less compressor CP-100 would probably do as well. With that one, the air gun is sold separately.

