12c RAC: cluster services fail to start after a disaster recovery drill
Environment
Solaris 11 SPARC (LDOM)
Oracle 12.2.0.1 EE RAC
Symptom
After the OS reboot, GI (Grid Infrastructure) could not start, and osysmond.bin did not start either. Details follow.
EMRRAC1
root@EMRRAC1:~# /oracle/app/grid/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'emrrac1'
CRS-2672: Attempting to start 'ora.cssd' on 'emrrac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'emrrac1'
CRS-2676: Start of 'ora.diskmon' on 'emrrac1' succeeded
CRS-2676: Start of 'ora.crf' on 'emrrac1' succeeded
CRS-2674: Start of 'ora.cssd' on 'emrrac1' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'emrrac1'
CRS-2681: Clean of 'ora.cssd' on 'emrrac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'emrrac1'
CRS-2677: Stop of 'ora.crf' on 'emrrac1' succeeded
CRS-4000: Command Start failed, or completed with errors.
EMRRAC2
root@EMRRAC2:~# /oracle/app/grid/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'emrrac2'
CRS-2672: Attempting to start 'ora.cssd' on 'emrrac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'emrrac2'
CRS-2676: Start of 'ora.diskmon' on 'emrrac2' succeeded
CRS-2676: Start of 'ora.crf' on 'emrrac2' succeeded
CRS-2674: Start of 'ora.cssd' on 'emrrac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'emrrac2'
CRS-2681: Clean of 'ora.cssd' on 'emrrac2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'emrrac2'
CRS-2677: Stop of 'ora.crf' on 'emrrac2' succeeded
CRS-4000: Command Start failed, or completed with errors.
HISRAC2
root@HISRAC2:~# /oracle/app/grid/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'hisrac2'
CRS-2672: Attempting to start 'ora.cssd' on 'hisrac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'hisrac2'
CRS-2676: Start of 'ora.diskmon' on 'hisrac2' succeeded
CRS-2676: Start of 'ora.crf' on 'hisrac2' succeeded
CRS-2674: Start of 'ora.cssd' on 'hisrac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'hisrac2'
CRS-2681: Clean of 'ora.cssd' on 'hisrac2' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'hisrac2'
CRS-2677: Stop of 'ora.crf' on 'hisrac2' succeeded
CRS-4000: Command Start failed, or completed with errors.
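On all three nodes the output above is the same: the only resource that fails is ora.cssd, reported on a CRS-2674 line. When the output is long, a small filter makes that easier to see. The following is a minimal sketch, not part of the original troubleshooting; the function name failed_crs_resources is hypothetical, and it assumes CRS-2674 lines keep the format shown above.

```shell
# Read crsctl output on stdin and print each resource whose start failed.
# Assumes lines of the form:
#   CRS-2674: Start of 'ora.cssd' on 'emrrac1' failed
failed_crs_resources() {
    # Split on single quotes: field 2 is the resource, field 4 is the node.
    awk -F"'" '/^CRS-2674:/ { print $2 " on " $4 }'
}
```

For example, with the output saved to a file of your choosing, `failed_crs_resources < saved_output.txt` would print `ora.cssd on emrrac1` for the first node.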
The OS logs from the three nodes follow. (After the vdisk offline messages, no vdisk online messages appear.)
EMRRAC1
May 4 11:39:17 EMRRAC1 genunix: [ID 390243 kern.info] Creating /etc/devices/devid_cache
May 4 11:39:18 EMRRAC1 hwmgmtd[1051]: [ID 702911 daemon.notice] hwmgmtd version 2.4.2.2 r20727 started.
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@1 is offline
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@2 is offline
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@3 is offline
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@4 is offline
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@5 is offline
May 4 11:39:20 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@6 is offline
May 4 11:39:24 EMRRAC1 oracleoks: [ID 123267 kern.notice] NOTICE: OKSK-00028: In memory kernel log buffer address: 0x304004ee90, size: 10485760
EMRRAC2
May 4 11:39:34 EMRRAC2 root: [ID 702911 user.error] Starting execution of Oracle Clusterware init.ohasd
May 4 11:39:36 EMRRAC2 hwmgmtd[856]: [ID 702911 daemon.notice] hwmgmtd version 2.4.2.2 r20727 started.
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@1 is offline
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@2 is offline
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@3 is offline
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@4 is offline
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@5 is offline
May 4 11:39:39 EMRRAC2 vdc: [ID 990228 kern.info] vdisk@6 is offline
May 4 11:39:42 EMRRAC2 oracleoks: [ID 123267 kern.notice] NOTICE: OKSK-00028: In memory kernel log buffer address: 0x304005d3fd9f0, size: 10485760
May 4 11:39:42 EMRRAC2 oracleoks: [ID 863671 kern.notice] NOTICE: OKSK-00027: Oracle kernel distributed lock manager hash size is 31251
HISRAC2
May 4 11:52:42 HISRAC2 root: [ID 702911 user.error] Starting execution of Oracle Clusterware init.ohasd
May 4 11:52:47 HISRAC2 vdc: [ID 990228 kern.info] vdisk@1 is offline
May 4 11:52:47 HISRAC2 hwmgmtd[1117]: [ID 702911 daemon.notice] hwmgmtd version 2.4.2.2 r20727 started.
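The pattern in these logs (a vdisk goes offline and no online message follows) can be checked mechanically. The sketch below is an illustration only: the function name vdisks_left_offline is hypothetical, and it assumes the vdc message format shown above. It uses POSIX awk syntax, so on Solaris it may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
# Read syslog-style lines for one host on stdin and print every vdisk
# whose most recent vdc message says "offline".
vdisks_left_offline() {
    awk '
        / vdc: .* is (offline|online)/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^vdisk@[0-9]+$/) {
                    state[$i] = $(i + 2)   # the word after "is"
                }
            }
        }
        END {
            for (d in state) {
                if (state[d] == "offline") print d
            }
        }
    ' | sort
}
```

On a node this could be run as `vdisks_left_offline < /var/adm/messages`. Only the last message per vdisk counts, so a disk that went offline and later came back online is not reported.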
Workaround
EMRRAC1: checking /dev/rdsk showed that three of the disks (the voting disks) had lost their ownership and permissions.
root@EMRRAC1# ls -l /devices/virtual-devices\@100/channel-devices\@200 |grep a,raw
crw------- 1 root sys 279, 0 May 4 11:39 disk@0:a,raw
crw-rw---- 1 grid asmadmin 279, 8 Apr 27 18:09 disk@1:a,raw
crw-rw---- 1 grid asmadmin 279, 16 Apr 27 18:09 disk@2:a,raw
crw-rw---- 1 grid asmadmin 279, 24 Apr 27 18:09 disk@3:a,raw
crw------- 1 root sys 279, 32 May 4 11:39 disk@4:a,raw
crw------- 1 root sys 279, 40 May 4 11:39 disk@5:a,raw
crw------- 1 root sys 279, 48 May 4 11:39 disk@6:a,raw
root@EMRRAC1# chown grid:asmadmin /dev/rdsk/c1d4* /dev/rdsk/c1d5* /dev/rdsk/c1d6*
root@EMRRAC1# chmod 0660 /dev/rdsk/c1d4* /dev/rdsk/c1d5* /dev/rdsk/c1d6*
crw------- 1 root sys 279, 0 May 4 11:39 disk@0:a,raw
crw-rw---- 1 grid asmadmin 279, 8 Apr 27 18:09 disk@1:a,raw
crw-rw---- 1 grid asmadmin 279, 16 Apr 27 18:09 disk@2:a,raw
crw-rw---- 1 grid asmadmin 279, 24 Apr 27 18:09 disk@3:a,raw
crw-rw---- 1 grid asmadmin 279, 32 May 4 13:30 disk@4:a,raw
crw-rw---- 1 grid asmadmin 279, 40 May 4 13:30 disk@5:a,raw
crw-rw---- 1 grid asmadmin 279, 48 May 4 13:30 disk@6:a,raw
root@EMRRAC1# /oracle/app/grid/bin/crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'emrrac1'
CRS-2672: Attempting to start 'ora.cssd' on 'emrrac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'emrrac1'
CRS-2676: Start of 'ora.diskmon' on 'emrrac1' succeeded
CRS-2676: Start of 'ora.crf' on 'emrrac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'emrrac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'emrrac1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'emrrac1'
CRS-2676: Start of 'ora.ctssd' on 'emrrac1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'emrrac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'emrrac1'
CRS-2676: Start of 'ora.asm' on 'emrrac1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'emrrac1'
CRS-2676: Start of 'ora.storage' on 'emrrac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'emrrac1'
CRS-2676: Start of 'ora.crsd' on 'emrrac1' succeeded
EMRRAC2: checking /dev/rdsk showed that this node was even worse; every disk had lost its ownership and permissions.
root@EMRRAC2:/devices/virtual-devices@100/channel-devices@200# ls -l *:a,raw
crw------- 1 root sys 279, 0 May 4 11:39 disk@0:a,raw
crw------- 1 root sys 279, 8 May 4 11:39 disk@1:a,raw
crw------- 1 root sys 279, 16 May 4 11:39 disk@2:a,raw
crw------- 1 root sys 279, 24 May 4 11:39 disk@3:a,raw
crw------- 1 root sys 279, 32 May 4 11:39 disk@4:a,raw
crw------- 1 root sys 279, 40 May 4 11:39 disk@5:a,raw
crw------- 1 root sys 279, 48 May 4 11:39 disk@6:a,raw
HISRAC2: checking /dev/rdsk; this node has only one shared disk.
root@HISRAC2:~# ls -l /devices/virtual-devices\@100/channel-devices\@200 |grep a,raw
crw------- 1 root sys 279, 0 May 4 11:52 disk@0:a,raw
crw------- 1 root sys 279, 8 May 4 11:52 disk@1:a,raw
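The same check and fix used on EMRRAC1 (find the raw device nodes that are no longer mode 0660, then chown and chmod them) can be sketched as a dry-run helper. This is an assumption-laden illustration, not the commands actually run on EMRRAC2 or HISRAC2: the function name suggest_asm_perm_fix is hypothetical, it only inspects permission bits (not ownership), and it prints the commands instead of executing them. Note that it would also flag disk@0, which was root-owned on every node in the listings above and which the EMRRAC1 fix left untouched, so review the output before running anything.

```shell
# Print (do not run) chown/chmod commands for every '*:a,raw' node under
# the given directory whose permission bits are not exactly 0660.
# $1 = directory to scan, $2 = owner:group (defaults to grid:asmadmin).
suggest_asm_perm_fix() {
    dir=$1
    owner=${2:-grid:asmadmin}
    find "$dir" -name '*:a,raw' ! -perm 0660 -print | sort |
    while IFS= read -r dev; do
        printf 'chown %s %s\n' "$owner" "$dev"
        printf 'chmod 0660 %s\n' "$dev"
    done
}
```

For example, `suggest_asm_perm_fix /devices/virtual-devices@100/channel-devices@200` would list candidate commands for the device directory shown above.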
After the permissions were corrected on all three nodes, each was rebooted again with reboot -- -r, and the permissions were no longer lost.
root@EMRRAC1:~# dmesg |grep vdisk
May 4 13:39:03 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@3 is online using ldc@9,0
May 4 13:39:04 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@4 is online using ldc@12,0
May 4 13:39:04 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@5 is online using ldc@13,0
May 4 13:39:04 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@6 is online using ldc@14,0
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@1 is offline
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@2 is offline
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@3 is offline
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@4 is offline
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@5 is offline
May 4 13:39:17 EMRRAC1 vdc: [ID 990228 kern.info] vdisk@6 is offline
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@1 is online using ldc@7,0
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@2 is online using ldc@8,0
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@3 is online using ldc@9,0
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@4 is online using ldc@12,0
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@5 is online using ldc@13,0
May 4 13:39:21 EMRRAC1 vdc: [ID 625787 kern.info] vdisk@6 is online using ldc@14,0