Thursday 15 September 2016

How to corrupt a datablock and recover it


For this post, I am using the Linux DEV VM.

Oracle stores data in cells within datafiles referred to as datablocks. By default, those blocks are squared off into 8196 byte partitions. The blocksize usually comes into play on database creation and should be the same as the OS blocksize for improved efficiency. More info on blocks here. On a long enough time line, provided the drives survive long enough, corruption will occur at some point or another within the database. They can be recovered provided corruption is detected shortly after it occurs before it persists the days or weeks of backups. Corruption is usually reported first in the RMAN backup log therefore it is important to view the backup log daily to spot corruption.

To see the block size of an instance I run:
SQL> show parameter db_block_size

NAME                        TYPE        VALUE
--------------------------- ----------- -------
db_block_size               integer     8192

Enough about that, moving onto corrupting a datablock.

Create a new tablespace:

CREATE TABLESPACE blockbreaker 
DATAFILE '/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf' 
SIZE 10M 
LOGGING EXTENT MANAGEMENT LOCAL 
SEGMENT SPACE MANAGEMENT AUTO;

ALTER DATABASE DATAFILE '/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf' AUTOEXTEND OFF;

CREATE TABLE BLOCKMAKER 
(
VALUE VARCHAR(100)

TABLESPACE BLOCKBREAKER;

Now populate the table with data:

begin
  for i in 1..10000 loop
    insert into blockmaker 
values (to_char(sysdate,'DDMMYYYY HH24:MI:SS')||'XXXXXXXXX_'||to_char(i));

commit;
  end loop;
end;
/

alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;

Once created, take a full system backup - in this example I am performing a warm backup considering my database is in archivelog mode.
run {
    crosscheck backup;
    delete noprompt obsolete;
    backup as compressed backupset full database tag WARM_FULL_BACKUP format '/home/oracle/app/oracle/backup/%d_%T_%s_%p_WARM_FULL.bk';
    sql 'alter system archive log current';
    backup tag ARCHIVELOG_BACKUP format '/home/oracle/app/oracle/backup/%d_%T_%s_%p_ARCHIVE.bk' archivelog all delete all input ;
    backup tag CONTROL_BACKUP current controlfile format '/home/oracle/app/oracle/backup/%d_%T_%s_%p_CONTROL.bk';
    delete noprompt obsolete;
    }


Now the fun part. I am using a Linux VM for the following, I have not done something like  this in Windows before but I am sure that there are commands that perform similar functions in a Windows environment.
I am going to
-corrupt a block in my new table
-verify it is corrupted with DBV
-Dump the block out
-Check the corruption view to show the corruption was registered
-Recover the bad blocks


In sqlplus I run the following:

set trimspool on
set heading off
set lines 1000
set pages 1000

column file_name new_val file_name;
column file_id new_val file_id;
column block_id new_val block_id;

select f.name as file_name,
       to_char(f.block_id) as block_id,
       to_char(f.file#) as file_id   
  from (
       select * 
    from v$datafile vd, 
     dba_extents de 
   where lower(vd.name) like '%blockbreaker%' 
 and vd.file# = de.file_id 
     order by dbms_random.random 
   ) f 
  where rownum < 2
/
  
select 'dd of=&file_name bs=8192 conv=notrunc seek=&block_id << EOF' as string from dual
union all
  select 'BLOCKBREAKER IS HERE' as string from dual
union all
  select 'EOF' as string from dual
union all
  select ' ' as string from dual
union all
  select 'dbv FILE='''||name||''' blocksize=8192' as string from (select * from v$datafile where lower(name) like '%blockbreaker%' order by file#)  where rownum < 2
union all
  select 'echo ''alter system dump datafile &file_id block &block_id;'' | sqlplus sys/oracle@cdb1 as sysdba' as string from dual
union all
  select ' ' as string from dual
union all
  select 'cd '||value from v$diag_info where name='Diag Trace'
union all
  select 'ls -trl *ora*.trc' from dual;
/

exit;

The output sould look like this:
dd of=/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf bs=8192 conv=notrunc seek=168 << EOF
BLOCKBREAKER IS HERE
EOF

dbv FILE='/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf' blocksize=8192
echo 'alter system dump datafile 31 block 168;' | sqlplus sys/oracle@cdb1 as sysdba

cd /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace
ls -trl *ora*.trc

I copy the output and run in the shell:

[oracle@localhost backup]$ dd of=/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf bs=8192 conv=notrunc seek=168 << EOF
> BLOCKBREAKER IS HERE
> EOF
0+1 records in
0+1 records out
21 bytes (21 B) copied, 0.000171222 s, 123 kB/s
[oracle@localhost backup]$
[oracle@localhost backup]$ dbv FILE='/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf' blocksize=8192

DBVERIFY: Release 12.1.0.2.0 - Production on Thu Sep 15 09:24:06 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = /home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf
Page 168 is marked corrupt
Corrupt block relative dba: 0x07c000a8 (file 31, block 168)
Bad header found during dbv:
Data in bad block:
 type: 66 format: 4 rdba: 0x4552424b
 last change scn: 0x4920.52454b41 seq: 0x53 flg: 0x20
 spare1: 0x4f spare2: 0x43 spare3: 0x4552
 consistency value in tail: 0x6ac40603
 check value in block header: 0x4548
 block checksum disabled



DBVERIFY - Verification complete

Total Pages Examined         : 1280
Total Pages Processed (Data) : 57
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 133
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 1089
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 7566501 (0.7566501)
[oracle@localhost backup]$ echo 'alter system dump datafile 31 block 168;' | sqlplus sys/oracle@cdb1 as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Sep 15 09:24:06 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL>
System altered.

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
[oracle@localhost backup]$
[oracle@localhost backup]$ cd /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace
[oracle@localhost trace]$ ls -trl *ora*.trc

-rw-r-----. 1 oracle oracle   2205 Sep 15 09:24 cdb1_ora_31184.trc

[oracle@localhost trace]$ 

The commands should place the shell in the trace location. The latest TRC file should contain the contents of the newly corrupted block.

I cat the latest trace file out below:

[oracle@localhost trace]$ cat cdb1_ora_31184.trc

Trace file /home/oracle/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_31184.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
ORACLE_HOME = /home/oracle/app/oracle/product/12.1.0/dbhome_1
System name:    Linux
Node name:      localhost.localdomain
Release:        3.8.13-68.1.3.el7uek.x86_64
Version:        #2 SMP Wed Apr 22 11:51:54 PDT 2015
Machine:        x86_64
Instance name: cdb1
Redo thread mounted by this instance: 1
Oracle process number: 67
Unix process pid: 31184, image: oracle@localhost.localdomain


*** 2016-09-15 09:24:06.877
*** SESSION ID:(44.23378) 2016-09-15 09:24:06.877
*** CLIENT ID:() 2016-09-15 09:24:06.877
*** SERVICE NAME:(cdb1) 2016-09-15 09:24:06.877
*** MODULE NAME:(sqlplus@localhost.localdomain (TNS V1-V3)) 2016-09-15 09:24:06.877
*** CLIENT DRIVER:(SQL*PLUS) 2016-09-15 09:24:06.877
*** ACTION NAME:() 2016-09-15 09:24:06.877
*** CONTAINER ID:(1) 2016-09-15 09:24:06.877

Start dump data blocks tsn: 6 file#:31 minblk 168 maxblk 168
Block dump from cache:
Dump of buffer cache at level 4 for pdb=1 tsn=6 rdba=130023592
BH (0x78bdb158) file#: 31 rdba: 0x07c000a8 (31/168) class: 1 ba: 0x78884000
  set: 3 pool: 3 bsz: 8192 bsi: 0 sflg: 0 pwc: 0,0
  dbwrid: 0 obj: 93452 objn: 93452 tsn: [1/6] afn: 31 hint: f
  hash: [0x8290d588,0x8290d588] lru: [0x78bdb380,0x78bdb100]
  ckptq: [NULL] fileq: [NULL]
  objq: [0x78bdb128,0x78bdd1a8] objaq: [0x78bdd1b8,0x78bdb138]
  st: XCURRENT md: NULL fpin: 'ktspbwh2: ktspfmdb' fscn: 0x0.736877 tch: 1
  flags: block_written_once
  LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [1]
Block dump from disk:
Encrypted block <6, 130023592> content will not be dumped. Dumping header only.
buffer tsn: 6 rdba: 0x4552424b (277/1196619)
scn: 0x4920.52454b41 seq: 0x53 flg: 0x20 tail: 0x6ac40603
frmt: 0x04 chkval: 0x4548 type: 0x42=unknown
Hex dump of corrupt header 4 = CORRUPT
Dump of memory from 0x00007FD7120A1E00 to 0x00007FD7120A1E14
7FD7120A1E00 434F4C42 4552424B 52454B41 20534920  [BLOCKBREAKER IS ]
7FD7120A1E10 45524548                             [HERE]
End dump data blocks tsn: 6 file#: 31 minblk 168 maxblk 168
[oracle@localhost trace]$

The block should be registered in the dynamic corruption view table once discovered by DBV - if it has not been registered, I can discover it myself:

--Flush buffer cache to force the db to retrieve the blocks from disk SQL> alter system flush buffer_cache;

System altered.

SQL> select * from v$database_block_corruption;

no rows selected

SQL> select * from blockmaker;

VALUE
--------------------------------------------------------------------------------
............................
15092016 09:16:08XXXXXXXXX_6098
15092016 09:16:08XXXXXXXXX_6099
15092016 09:16:08XXXXXXXXX_6100
15092016 09:16:08XXXXXXXXX_6101
ERROR:
ORA-01578: ORACLE data block corrupted (file # 31, block # 168)
ORA-01110: data file 31:
'/home/oracle/app/oracle/oradata/cdb1/orcl/blockbreaker01.dbf'

6885 rows selected.

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO     CON_ID
---------- ---------- ---------- ------------------ --------- ----------
        31        168          1                  0 CORRUPT            0

SQL>

Now I will recover the corrupted block. This involves RMAN unpacking the last backup of the corrupted datafile and applying the logs in the backup as well as the logs generated up until this point.

RMAN> blockrecover corruption list;

Starting recover at 15-SEP-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=43 device type=DISK

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00031
channel ORA_DISK_1: reading from backup piece /home/oracle/app/oracle/backup/CDB1_20160915_9_1_WARM_FULL.bk
channel ORA_DISK_1: piece handle=/home/oracle/app/oracle/backup/CDB1_20160915_9_1_WARM_FULL.bk tag=WARM_FULL_BACKUP
channel ORA_DISK_1: restored block(s) from backup piece 1
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:01

starting media recovery
media recovery complete, elapsed time: 00:00:03

Finished recover at 15-SEP-16

RMAN>

the corruption flag has been cleared:

SQL> select * from v$database_block_corruption;

no rows selected

And I am able to now select from my table:

..................
VALUE
--------------------------------------------------------------------------------
15092016 09:16:08XXXXXXXXX_10000

10000 rows selected.


SQL> l
  1* select * from blockmaker
SQL>


No comments:

Post a Comment