
For the original discovery of the bug, see here.
For the fragility of the cluster, see here.
A recap: every 6 hours the CVU health check spawns the following command, and the cmd.exe session it creates is left open:
C:\Windows\system32\cmd.exe /K E:\OracleGrid\11.2.0.3\bin\cluvfy comp health -_format
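The /K switch is the root of the problem: it tells cmd.exe to stay resident after the command it runs has finished, whereas /C tells it to terminate. A minimal sketch of the difference (the echo is just a placeholder, not part of the CVU check):
REM /C: cmd.exe terminates as soon as the command completes
cmd.exe /C echo this session closes itself
REM /K: cmd.exe stays resident after the command completes,
REM which is what leaves an orphaned session behind every 6 hours
cmd.exe /K echo this session stays open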
A screenshot of the orphaned sessions growing in number can be found here.
For the memory leak fix, I did the following.
I backed up a copy of the bat file: E:\OracleGrid\11.2.0.3\bin\cluvfy.bat
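For example (the .orig suffix is just my own naming choice, not from the original procedure):
copy E:\OracleGrid\11.2.0.3\bin\cluvfy.bat E:\OracleGrid\11.2.0.3\bin\cluvfy.bat.orig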
I then opened the original cluvfy.bat in Notepad and carefully made the following changes.
CHANGE 1
FROM:
if not (%CRSHOME%)==() (
@set "CV_HOME=%CRSHOME%"
)
set CMDPATH=%~dp0
TO:
if not (%CRSHOME%)==() (
@set "CV_HOME=%CRSHOME%"
)
set EXIT_OPTION=
if "%CVU_RESOURCE_OPTIONS%"=="" set EXIT_OPTION=/B
set CMDPATH=%~dp0
CHANGE 2
FROM:
exit /B %errorlevel%
goto done
:ERROR
exit /B 1
TO:
exit %EXIT_OPTION% %errorlevel%
goto done
:ERROR
exit %EXIT_OPTION% 1
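The way I read the two changes: exit /B only leaves the batch script and returns control to the calling cmd.exe, while a plain exit terminates the cmd.exe process itself. CVU_RESOURCE_OPTIONS appears to be set only when the CVU resource invokes the script (that is the assumption the fix relies on), so in that case EXIT_OPTION stays empty and the plain exit tears down the cmd.exe /K session instead of leaving it orphaned, while a manual run from an interactive prompt still gets the gentler exit /B. A minimal standalone sketch of that behaviour (exitdemo.bat is a hypothetical name, not part of the CVU scripts):
@echo off
REM exitdemo.bat - sketch of how EXIT_OPTION changes the exit behaviour
set EXIT_OPTION=
if "%CVU_RESOURCE_OPTIONS%"=="" set EXIT_OPTION=/B
REM interactive run: EXIT_OPTION is /B, so only the script ends and the shell survives
REM CVU resource run: EXIT_OPTION is empty, so the whole cmd.exe /K session is terminated
exit %EXIT_OPTION% %errorlevel%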
Once the change was made, I saved the file and copied it across to node 2 (a sketch of the copy is shown after this step). To avoid creating any unwanted issues while carrying out the above changes, the CVU resource can also be relocated to the other node using:
srvctl relocate cvu -n <node name>
However, considering the batch file is only executed every 6 hours, the chances of a clash are slim.
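Something along these lines works for the copy to node 2 (node2 and the E$ administrative share are assumptions about the environment, not details from this post):
copy E:\OracleGrid\11.2.0.3\bin\cluvfy.bat \\node2\E$\OracleGrid\11.2.0.3\bin\cluvfy.bat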
The idea to carry this out came after looking at the source code of the 11.2.0.4 home, where I noted a bug fix by an Oracle developer. I diff'd the old and new CVU, evaluated the developer's intention, and took only what I needed from his fix to manually patch the CVU myself. I first recreated the problem in the 11.2.0.4 test environment, monitored it for a day, applied my fix, and then monitored it for a few days more. Once satisfied, I took the 'patch' live. This fix gives the in-house DBA and his manager a cluster that no longer crashes once a month. The nightmare is over, and this solution will suffice until we build a new cluster for 12c. Done. Fixed it.