Monday, 26 September 2016

Creating a Physical Standby Database in 11.2.0.3

Image courtesy of the movie "the shining". Go watch it, it is very good. The twins represent DR sites in a way in that they are mirrored. They also represent database systems in that when it is time to open DR, the administrator is likely to be scared: he or she is either dealing with auditors or a non recoverable disaster - both quite scary. An Oracle standby database is a database that is synchronized at a data block level, so when the DR system is bust open in an emergency, the system should be identical to the production database it was tethered to - at least up to the last log the DR site consumed. There are no Oracle Streams or Microsoft replication type tethering mechanisms, it is kept in sync using archivelog files shed from the primary database. It is also a one way operation, at least in standard edition.

Below are notes to setup a standby database in Windows 2012.

The steps in this guide assume the the server has the exact same drives as the primary production. This means that I am not going to use any *convert* parameters to handle differences in file names. If the drives do differ, then I would use convert to handle the difference in lettering.

1. Install the database service:
In an administration cmd window, I execute the following:

set ORACLE_SID=<STANDBY_SID>
set ORACLE_HOME=E:\Oracle\Product\11.2.0.3\dbhome_1
set PATH=%ORACLE_HOME%\bin;E:\Oracle\product\11.2.0.3\dbhome_1;%PATH%

E:\Oracle\product\11.2.0.3\dbhome_1\BIN\oradim -DELETE -SID <STANDBY_SID> 

E:\Oracle\product\11.2.0.3\dbhome_1\BIN\oradim -new -SID <STANDBY_SID> -startmode manual -pfile 'E:\Oracle\Product\11.2.0.3\dbhome_1\database\init<STANDBY_SID>.ora'

If there is no pfile, make or copy one across using the primary database.
The next step is to create the folders where the control, data and redolog files will be restored.

2. Create the directories:
In windows command, execute the following:
mkdir f:\Oracle\Oradata\<STANDBY_SID>\
mkdir g:\Oracle\Oradata\<STANDBY_SID>\
mkdir h:\Oracle\Oradata\archive\<STANDBY_SID>
mkdir h:\Oracle\Oradata\<STANDBY_SID>

3. Copy the password file and a pfile:
Copy the following key files from the primary server to the standby database database directory:
prod01:E:\Oracle\prduct\11.2.0.3\dbhome_1\database\PWD<STANDBY_SID>.oraprod01:E:\Oracle\prduct\11.2.0.3\dbhome_1\database\init<STANDBY_SID>.ora
To
standby1:E:\Oracle\prduct\11.2.0.3\dbhome_1\database\

4. Restore the spfile
Now I begin the restoration process. If not done already, take a backup of the primary database. You will also need a standby controlfile from the primary site;

create standby controlfile as 'C:\<FOLDER>\<STANDBY_SID>_standby_control.rman'

Copy the backup pieces from the primary database server into a holding location on the standby database server:
C:\RMAN_Backups\<STANDBY_SID>_2016_04_05\

Execute the following in CMD:
set ORACLE_SID=<STANDBY_SID>
set ORACLE_HOME=E:\Oracle\Product\11.2.0.3\dbhome_1

rman target / nocatalog

Once done, I startup the database with force:

startup force nomount;

OUTPUT
RMAN> startup force nomount;

Oracle instance started

Total System Global Area   12827369472 bytes

Fixed Size                     2267184 bytes
Variable Size               5704255440 bytes
Database Buffers            7113539584 bytes
Redo Buffers                   7307264 bytes

Now I restore the database. I restore the spfile from the backup piece containing the spfile file. I can find the piece holding this file by reviewing the RMAN log file generated during the backup.
The clue is the piece containing the following meta data;
  SPFILE Included: Modification time: 20-SEP-16

I then plug the piece name into the following command and restore:
restore spfile to 'E:\Oracle\product\11.2.0.3\dbhome_1\database\spfile<STANDBY_SID>.ora' from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_
.RMAN';

OUTPUT
RMAN> restore spfile to 'E:\Oracle\product\11.2.0.3\dbhome_1\database\spfile<STANDBY_SID>.ora' from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_
.RMAN';

Starting restore at 26-SEP-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1712 device type=DISK

channel ORA_DISK_1: restoring spfile from AUTOBACKUP C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_1.RMAN
channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete
Finished restore at 26-SEP-16

RMAN>

5. Restore the control files
Now restore the control files. Find the control file backup piece from the log file. It should be the same as the spfile piece used in the previous step. Similarly, the clue is to find the piece with the following metadata identifier:  

Control File Included: Ckp SCN: 19783506609   Ckp time: 25-SEP-16

Load up RMAN
rman target / nocatalog

Execute the following:
restore controlfile from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_1.RMAN';

OUTPUT
RMAN> restore controlfile from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_1.RMAN';

Starting restore at 26-SEP-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1712 device type=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=F:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL01.CTL
output file name=G:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL02.CTL
output file name=H:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL03.CTL
Finished restore at 26-SEP-16

Mount the database once the control files are deployed:

mount database;

OUTPUT
RMAN> mount database;

database mounted
released channel: ORA_DISK_1

6. Catalog the backup pieces
Now to catalog the full backupset:

OUTPUT
RMAN> catalog start with 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\';

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of catalog command at 09/26/2016 15:34:17
RMAN-06189: current DBID 1020978749 does not match target mounted database (1025452237)

The problem here is I need to exit rman and connect again with nocatalog after restoring and mounting the control files:

exit
rman target / nocatalog
catalog start with 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\';

OUTPUT:

RMAN> catalog start with 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\';

searching for all files that match the pattern C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\

List of Files Unknown to the Database

=====================================
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_2016_09_25_backup_list.log
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_20RGMRRN_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_22RGMSGG_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_23RGMSGN_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_24RGMSGV_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_25RGMSH6_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_26RGMSHE_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_27RGMSHM_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_28RGMSHT_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_29RGMSI5_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_2ARGMSI9_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_STANDBY_CONTROL.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_TRACE_CONTROL.RMAN

Do you really want to catalog the above files (enter YES or NO)? YES

cataloging files...
cataloging done

List of Cataloged Files

=======================
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_20RGMRRN_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_21RGMSG7_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_22RGMSGG_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_23RGMSGN_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_24RGMSGV_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_25RGMSH6_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_26RGMSHE_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_27RGMSHM_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_28RGMSHT_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_29RGMSI5_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_2ARGMSI9_1_1.RMAN
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_STANDBY_CONTROL.RMAN

List of Files Which Where Not Cataloged

=======================================
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_2016_09_25_backup_list.log
  RMAN-07517: Reason: The file header is corrupted
File Name: C:\RMAN_RESTORE\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_TRACE_CONTROL.RMAN
  RMAN-07517: Reason: The file header is corrupted

7. Restore the database
Once cataloged, I restore the database files:

restore database;

OUTPUT
RMAN> restore database;

Starting restore at 26-SEP-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1712 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to F:\ORACLE\ORADATA\<STANDBY_SID>\SYSTEM01.DBF
channel ORA_DISK_1: restoring datafile 00002 to F:\ORACLE\ORADATA\<STANDBY_SID>\SYSAUX01.DBF
channel ORA_DISK_1: restoring datafile 00003 to F:\ORACLE\ORADATA\<STANDBY_SID>\UNDOTBS1_01.DBF
channel ORA_DISK_1: restoring datafile 00004 to F:\ORACLE\ORADATA\<STANDBY_SID>\APEX_DATA01.DBF
channel ORA_DISK_1: restoring datafile 00005 to F:\ORACLE\ORADATA\<STANDBY_SID>\AUDIT01.DBF
channel ORA_DISK_1: restoring datafile 00006 to F:\ORACLE\ORADATA\<STANDBY_SID>\USERS01.DBF
channel ORA_DISK_1: restoring datafile 00007 to F:\ORACLE\ORADATA\<STANDBY_SID>\ISSUES01.DBF
channel ORA_DISK_1: restoring datafile 00008 to F:\ORACLE\ORADATA\<STANDBY_SID>\PERF_DATA01.DBF
channel ORA_DISK_1: restoring datafile 00009 to F:\ORACLE\ORADATA\<STANDBY_SID>\UNDOTBS2_01.DBF
channel ORA_DISK_1: restoring datafile 00010 to F:\ORACLE\ORADATA\<STANDBY_SID>\USERS01.DBF
channel ORA_DISK_1: restoring datafile 00011 to F:\ORACLE\ORADATA\<STANDBY_SID>\ISSUE02.DBF
channel ORA_DISK_1: restoring datafile 00012 to F:\ORACLE\ORADATA\<STANDBY_SID>\USERS02.DBF
channel ORA_DISK_1: reading from backup piece H:\RMAN_BACKUPS\<STANDBY_SID>_20RGMRRN_1_1.RMAN


RMAN> exit

Recovery Manager complete.

E:\Oracle\product\11.2.0.3\dbhome_1\BIN>

8. Restore the standby control file
Now I restore the standby control file backup taken from the original backup:

rman target / nocatalog
shutdown immediate
startup nomount
restore controlfile from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_STANDBY_CONTROL.RMAN';
exit

OUTPUT
E:\Oracle\product\11.2.0.3\dbhome_1\BIN>rman target / nocatalog

Recovery Manager: Release 11.2.0.3.0 - Production on Tue Sep 26 

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: <STANDBY_SID> (DBID=1010768491, not open)
using target database control file instead of recovery catalog

RMAN> shutdown immediate

database dismounted
Oracle instance shut down

RMAN> startup nomount

connected to target database (not started)
Oracle instance started

Total System Global Area   12827369472 bytes

Fixed Size                     2293232 bytes
Variable Size               5838471696 bytes
Database Buffers            6979321856 bytes
Redo Buffers                   7282688 bytes

RMAN> restore controlfile from 'C:\RMAN_Restore\<STANDBY_SID>_2016_09_25\<STANDBY_SID>_STANDBY_CONTROL.RMAN';

Starting restore at 26-SEP-16
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=1522 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=F:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL01.CTL
output file name=G:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL02.CTL
output file name=H:\ORACLE\ORADATA\<STANDBY_SID>\CONTROL03.CTL
Finished restore at 26-SEP16

RMAN> exit
Recovery Manager complete.

E:\Oracle\product\11.2.0.3\dbhome_1\BIN>

9. Mount the database as a physcial standby

Startup and mount the database in standby mode using:
alter database mount standby database;

Note that the key words "standby database" are optional according to the documentation, the db will know what it is to be mounted as per the control file.
SQL> startup nomount;
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2267184 bytes
Variable Size            7918847952 bytes
Database Buffers         4898947072 bytes
Redo Buffers                7307264 bytes
SQL> alter database mount standby database;

Database altered.

SQL>
10. Checkout the new database
Check the database status
C:\Users\Administrator>sqlplus /nolog

SQL*Plus: Release 11.2.0.3.0 Production on Tue Sep 26 
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> conn / as sysdba
Connected.
SQL> select open_mode,database_role from v$database;

OPEN_MODE            DATABASE_ROLE
-------------------- ----------------
MOUNTED              PHYSICAL STANDBY

SQL>

11. Apply database logs
Now all I do is load the log files into the archive log dest or sync them using robocopy and periodically run the following through a scheduler:

recover automatic standby database;
cancel
exit

If done correctly, my alert log should have a surge of log files consumed through media recovery, with many lines reading as follows:
Media Recovery Log G:\ORACLE\ORADATA\ARCHIVE\<STANDBY_SID>\<STANDBY_SID>_0922977743_0001_0000001053.ARC
Media Recovery Log G:\ORACLE\ORADATA\ARCHIVE\<STANDBY_SID>\<STANDBY_SID>_0922977743_0001_0000001054.ARC
Media Recovery Log G:\ORACLE\ORADATA\ARCHIVE\<STANDBY_SID>\<STANDBY_SID>_0922977743_0001_0000001055.ARC

Eventually, the alert log should spit out the following error:
Errors with log G:\ORACLE\ORADATA\ARCHIVE\<STANDBY_SID>\<STANDBY_SID>_0922977743_0001_0000001055.ARC
ORA-00308: cannot open archived log 'G:\ORACLE\ORADATA\ARCHIVE\<STANDBY_SID>\<STANDBY_SID>_0922977743_0001_0000001055.ARC'

That is to be expected - the database has eaten all the logs, and it would like more, but it cannot have them until the primary database sheds more logs. So don't be surprised to see Media Recovery Log messages punctuated with error logs - each error depicts the next log the standby database is expecting.

12. Monitoring
Once the system is self reliant and able to copy and apply logs, I will need to monitor the difference in time between the primary and standby. I do this by measuring the difference in hours between the current date and the timestamp of the last applied log. I can use the below script with an email scheduler to help keep an eye on the pair. If the time begins falling behind and the hour difference becomes > 1 (considering the pair in close proximity within the site), I will know there is a problem and I will need to investigate.

conn / as sysdba

define p_log_dir='&1'

set echo off
set verify off
set linesize 1000
set pages 100

spool &1.\standby_hours_behind.log

PROMPT <STANDBY_SID> DATABASE
PROMPT ======================

select trunc(24*(sysdate-max(first_time)),2) as hours_behind,
       max(sequence#) as last_applied_sequence_number,
       max(sequence#)+1 as next_sequence_number
  from v$log_history;

spool off


OUTPUT:
SQL> @H:\scripts\sql\standby_hours_behind.sql h:\scripts
Connected.
<STANDBY_SID> DATABASE
======================

HOURS_BEHIND LAST_APPLIED_SEQUENCE_NUMBER NEXT_SEQUENCE_NUMBER
------------ ---------------------------- --------------------
         .36                         1136                 1137

The system is just over 20 minutes apart - considering it rotates logs every 10 minutes, this is acceptable for the site.

No comments:

Post a Comment