Thursday, May 13, 2010

ORA-02019 : connection description for remote database not found

This error has been appearing in the alert.log file on a random basis, with no apparent reason for it. When I take a look at the details provided in the trace file, the only thing I see is the same meaningless pattern: the connection descriptor was not found.
In this particular case the alert.log file showed the following:

Tue Mar 09 15:45:43 2010

Errors in file ...\trace\orapro_reco_4892.trc:

ORA-02019: connection description for remote database not found

The name of the generated trace file states that the RECO background process is the one responsible for this trace. RECO is used in distributed transactions: when a remote database goes down, RECO keeps trying to resolve the pending transactions according to the two-phase commit procedure.
In this case there used to be a database attached to the main database, but it was gone long ago; that database was deconfigured and all the related files and descriptors were removed. However, some unapplied transactions remained in the transaction queue, making RECO keep on retrying forever.

A look at the DBA_2PC_PENDING view showed a couple of transactions that remained in the queue and were never applied.

SELECT LOCAL_TRAN_ID,GLOBAL_TRAN_ID,STATE,MIXED,COMMIT#
FROM DBA_2PC_PENDING;


LOCAL_TRAN_ID  GLOBAL_TRAN_ID             STATE       MIXED  COMMIT#
-------------  -------------------------  ----------  -----  ----------
9.21.102540    ORCL.ee36125b.9.21.102540  collecting  no     9462810864
8.32.86528     ORCL.ee36125b.8.32.86528   collecting  no     9462814781


In this case, since there is no longer a target database to apply the transactions to, it is enough to get rid of them by means of the PURGE_LOST_DB_ENTRY procedure:

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('8.32.86528');
execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('9.21.102540');
commit;
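
To verify the cleanup, a quick recheck of the pending-transaction view (a minimal verification step, not part of the original session output) should now return no rows:

SELECT LOCAL_TRAN_ID, STATE
FROM DBA_2PC_PENDING;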


That's it, this way the ORA-02019 error is gone.


Wednesday, May 12, 2010

Data Guard ORA-16789 Health Check Warning

After setting up the physical Data Guard environment some errors still showed up. Even though everything looked fine and the standby database was properly receiving and applying the archived logs, the Data Guard broker still showed some errors:

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
Warning: ORA-16608: one or more databases have warnings

The Data Guard broker uses an all-or-nothing evaluation criterion: either the configuration works fine or it doesn't, with no intermediate point. The issue here is that sometimes you have to be patient to find the actual errors. This is an 11g release, and although the solution described here can work on 10g too, be aware that with the DIAGNOSTIC_DEST parameter introduced in 11g the path to the broker log file has changed: ORACLE_BASE/diag/rdbms/ORACLE_SID/ORACLE_SID/trace/drcORACLE_SID.log.
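
If you are not sure where that directory sits on a particular host, the trace path can be obtained from the instance itself; a minimal sketch, assuming an 11g database where the V$DIAG_INFO view is available:

SELECT value
FROM v$diag_info
WHERE name = 'Diag Trace';

The drcORACLE_SID.log broker log lives in that directory, next to the text alert log. The errors displayed there were as follows: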


Error Displayed

RSM Error: LogXptMode value 'ASYNC' of requires this database to have standby redo logs, but they are not configured.
RSM0: HEALTH CHECK WARNING: ORA-16789: standby redo logs not configured
Operation CTL_GET_STATUS continuing with warning, status = ORA-16789


Solution

In this case it is pretty obvious why it is failing ... there are two solutions: either add the standby redo logs required by the configured LogXptMode ASYNC mode, or change LogXptMode to ARCH. I chose the first one:

SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 4
2 'C:\ORACLE\APP\ORADATA\SBDB\SBREDO04.RDO' SIZE 16M;
Database altered.


The command is repeated for groups 5 and 6. The sizes and paths shown in this example are for instructional purposes only; in an actual production environment the standby redo logs should be the same size as the online redo logs.
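
For completeness, the remaining groups would be added along the same lines; this is just a sketch that reuses the naming convention of the example above (adjust paths and sizes to your own environment):

SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 5
2 'C:\ORACLE\APP\ORADATA\SBDB\SBREDO05.RDO' SIZE 16M;

SQL> ALTER DATABASE ADD STANDBY LOGFILE GROUP 6
2 'C:\ORACLE\APP\ORADATA\SBDB\SBREDO06.RDO' SIZE 16M;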


A Second Error Shows Up

The second error refers to the fact that adding the standby redo log files required the recovery process to be stopped, and the Data Guard broker complained about it.

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED


Current status for "SBDBCONFIG":
Warning: ORA-16607: one or more databases have failed


This time the Data Guard broker log file reported the following:
DMON: HEALTH CHECK ERROR: ORA-16766: Redo Apply is stopped
Operation CTL_GET_STATUS canceled during phase 1, error = ORA-16766


In this case it was pretty obvious too: redo apply had been stopped in order to add the standby redo logs while the standby database was open in Active Data Guard mode.

Just restart redo apply and that's it.


Solution

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Database altered.

DGMGRL> show configuration verbose;

Configuration
Name: SBDBCONFIG
Enabled: YES
Protection Mode: MaxPerformance
Databases:
PRIDB - Primary database
SBDB - Physical standby database

Fast-Start Failover: DISABLED

Current status for "SBDBCONFIG":
SUCCESS


It works as it is supposed to.


References

Oracle® Data Guard Broker
11g Release 2 (11.2)

Part Number E10702-01

Thursday, May 6, 2010

Error when upgrading the RMAN repository to 11.1.0.7.0

WARNING.


The following post describes a particular issue I wanted to document and share. The solution found worked for my particular environment, which does not mean it will work with yours. Dealing with internal Oracle structures without Oracle Support Services approval and supervision will render your database unsupported, and it may (and most probably will) compromise its availability.

When performing the upgrade of the recovery catalog database an error showed up. It relates to a known issue that appears when upgrading the recovery catalog to 11.1.0.7.0 after applying the patchset against the database.
The upgrade procedure corrupts the DBMS_RCVCAT package, so it is enough to drop it and have the recovery catalog owner upgrade the catalog from an RMAN prompt, as follows:

1. Connect to SQL*Plus as the recovery catalog owner and issue the following command:
sqlplus rcatOwner/rcatPassword
SQL> drop package DBMS_RCVCAT;

2. From a Recovery Manager prompt issue the following command:
rman catalog rcatOwner/rcatPassword
RMAN> UPGRADE CATALOG;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> UPGRADE CATALOG;

recovery catalog upgraded to version 11.01.00.07
DBMS_RCVMAN package upgraded to version 11.01.00.07
DBMS_RCVCAT package upgraded to version 11.01.00.07

At this point the procedure has successfully upgraded the repository and the DBMS_RCVCAT stored unit has been properly rebuilt.

BUT ... yes, the big BUT. In my case a particular issue arose; instead of the successful message above, this is what showed up:

RMAN> connect catalog rman/rman
connected to recovery catalog database
recovery catalog is partially upgraded to 11.01.00.07; UPGRADE CATALOG again

RMAN> upgrade catalog;
recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;
error creating upgcat_57
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-06004: ORACLE error from recovery catalog database: ORA-02293: cannot validate (RMAN.CDF_C_STATUS) - check constraint violated

I hate it when RMAN complains. And it likes to do so pretty often.

RMAN.CDF_C_STATUS refers to a constraint added to the CDF table. This table stores information about the registered datafile copies. The constraint is defined in the ?/rdbms/admin/recover.bsq file and literally reads as follows:
define upgcat_57
<<<
alter table cdf add constraint cdf_c_status check (status in ('A','U','D','X')) >>>


Notice the clause define upgcat_57 and the error message above, error creating upgcat_57.
Originally the CDF table is created with an inline constraint that can be found in the same file:

CONSTRAINT cdf_c_status CHECK (status in ('A','U','D','X','F')).

But when the upgcat_57 step is executed it magically leaves the 'F' value out.
In my particular case the CDF table had four registered datafile copies. These copies had long been obsoleted, but they were still there with an 'F' flag, so when the ADD CONSTRAINT step ran against the repository table it failed and made the upgrade procedure abort.
The workaround found for this particular case was to update the CDF table and change the 'F' flag in the STATUS column to a 'U' value.
This marks those backups as Unavailable. The flag values are not documented in the file, but as far as I know 'A' means Available, while 'U' stands for Unavailable.
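
For reference, the manual fix amounted to something along these lines, run as the catalog owner (a sketch of what was done in this one-off, unsupported scenario; the CDF table and STATUS column names come straight from recover.bsq):

SQL> SELECT status, COUNT(*) FROM cdf GROUP BY status;
SQL> UPDATE cdf SET status = 'U' WHERE status = 'F';
SQL> COMMIT;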
After manually updating the column value, the repository upgrade was run once again from the RMAN prompt, and this time it completed successfully.
In my particular case these backups were taken a very long time ago and are no longer available, so there was no problem at all in taking this direction.
However, a big issue that comes to my attention is that Oracle either has a bug in this upgrade procedure or forgot to update the constraint declaration with the proper set of values.


Update ...

While trying to find out whether this issue could be reproduced on 11g Release 2, I found the following lines in the recover.bsq file:

define upgcat_166
<<<
alter table cdf drop constraint cdf_c_status >>>

define upgcat_167
<<<
alter table cdf add constraint cdf_c_status
CHECK (status in ('A','U','D','X','F')) >>>


So it means someone at Oracle realized this mistake: the upgrade now drops the current constraint and then adds it back with all the flags included.
I can see that if someone did not hit this bug on 11g, most probably they will never notice it by the time they migrate to 11g Release 2.
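
If you want to check in advance whether an existing catalog carries rows that would have tripped the 11.1.0.7.0 version of the constraint, a quick look as the catalog owner could be something like this (a hypothetical pre-check, not an official procedure):

SQL> SELECT COUNT(*) FROM cdf WHERE status = 'F';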

Monday, May 3, 2010

ORA-42012: error occurred while completing the redefinition

While trying to use the DBMS_REDEFINITION package to redefine a partition online, the error described below showed up.

begin
  dbms_redefinition.finish_redef_table('MADRID', 'a_table', 'an_int_table', 'a_partition');
end;
/

The following error appeared:

begin
*
ERROR at line 1:
ORA-42012: error occurred while completing the redefinition
ORA-04020: deadlock detected while trying to lock object MADRID.TABLE1_INT
ORA-06512: at "SYS.DBMS_REDEFINITION", line 78
ORA-06512: at "SYS.DBMS_REDEFINITION", line 1680
ORA-06512: at line 2


When I tried to drop the interim table in order to restart the online redefinition of the object, the operation ran into a conflict with a couple of objects created on the fly.

SQL> drop table table1_int;
drop table table1_int
*
ERROR at line 1:
ORA-12083: must use DROP MATERIALIZED VIEW to drop "MADRID"."TABLE1_INT"

SQL> DROP MATERIALIZED VIEW "MADRID"."TABLE1_INT";
Materialized view dropped.

SQL> drop table table1_int;
Table dropped.

This error is due to a bug in version 11.2.0 on any operating system platform. The bug is expected to be fixed in future releases; in the meantime, the workaround is to change an instance parameter: set DEFERRED_SEGMENT_CREATION to FALSE and retry the operation.
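
A minimal sketch of that workaround, assuming an 11.2 instance where the parameter can be changed dynamically at the system level:

SQL> ALTER SYSTEM SET deferred_segment_creation = FALSE;

After that, the interim table can be recreated and the redefinition retried.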