Deconfigure/Reconfigure Exadata node from CRS

Problem Description:

A few days ago I started working on an Exadata GI upgrade from 12.1 to 12.2 and ran into a problem upgrading node 1. We had to cancel the upgrade and roll CRS back to 12.1. This is where we ran into the following problem.

First we tried to start the CRS stack after restoring the old Grid home from backup, but it seems the upgrade process had deconfigured the 12.1 CRS home on node 1. We couldn't start the old 12.1 CRS on node 1.

[oracle@node1 bin]$ ./crsctl start crs 
CRS-4047: No Oracle Clusterware components configured. 
CRS-4000: Command Start failed, or completed with errors.

We could still see other nodes in the cluster but not node 1.

[oracle@node2 ~]$ olsnodes -n -t 
node2 2 Unpinned 
node3 3 Unpinned
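
The membership check above can be scripted. The following is a minimal sketch, assuming output in the `olsnodes -n -t` format shown above; `node_in_cluster` is a hypothetical helper name, not an Oracle tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): reads `olsnodes -n -t` style
# output on stdin and succeeds only if the named node is listed.
node_in_cluster() {
    awk -v n="$1" '$1 == n { found = 1 } END { exit (found ? 0 : 1) }'
}

# Sample copied from the node2 output above. On a live cluster, pipe the
# real command instead:  olsnodes -n -t | node_in_cluster node1
sample='node2 2 Unpinned
node3 3 Unpinned'

for n in node1 node2 node3; do
    if printf '%s\n' "$sample" | node_in_cluster "$n"; then
        echo "$n: present"
    else
        echo "$n: MISSING"
    fi
done
```

With the sample above, only node1 is reported as missing, which matches what we saw from node2.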

Solution :

We wanted to roll node 1 back to its previous state so we could try the upgrade again. We solved this problem by configuring 12.1 again.

First, make sure the 12.1 CRS has been deconfigured properly.

/u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -deconfig -force

Then run root.sh from the 12.1 CRS home.

/u01/app/12.1.0.2/grid/root.sh

Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/12.1.0.2/grid
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/install/crsconfig_params
2018/07/20 23:16:17 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.

2018/07/20 23:16:17 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.

2018/07/20 23:16:18 CLSRSC-363: User ignored prerequisites during installation

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2672: Attempting to start 'ora.evmd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2676: Start of 'ora.evmd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'node1'
CRS-2681: Clean of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'node1'
CRS-2676: Start of 'ora.storage' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'node1'
CRS-2676: Start of 'ora.crf' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: node1
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'node2'
CRS-2672: Attempting to start 'ora.net1.network' on 'node1'
CRS-2676: Start of 'ora.net1.network' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'node1'
CRS-2673: Attempting to stop 'ora.node1.vip' on 'node3'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'node2'
CRS-2677: Stop of 'ora.node1.vip' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.node1.vip' on 'node1'
CRS-2677: Stop of 'ora.scan1.vip' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'node1'
CRS-2676: Start of 'ora.ons' on 'node1' succeeded
CRS-2676: Start of 'ora.node1.vip' on 'node1' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'node1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'node1' succeeded
CRS-6016: Resource auto-start has completed for server node1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2018/07/20 23:18:19 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
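
The root.sh output is long, so it is easy to miss a failure in the middle of it. Below is a minimal sketch of a scan for the CLSRSC-343 success message, assuming you saved the root.sh output to a file (for example with tee). The function name `check_rootsh_log` and the temporary file are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scans saved root.sh output for the CLSRSC-343
# success message and for any line that mentions a failure.
check_rootsh_log() {
    local log="$1"
    if grep -qi 'failed' "$log"; then
        echo "FAILURES FOUND"
        return 1
    fi
    if grep -q 'CLSRSC-343' "$log"; then
        echo "STACK STARTED"
        return 0
    fi
    echo "NO SUCCESS MESSAGE"
    return 1
}

# Demo on two lines copied from the output above.
log=$(mktemp)
cat > "$log" <<'EOF'
CRS-4123: Oracle High Availability Services has been started.
2018/07/20 23:18:19 CLSRSC-343: Successfully started Oracle Clusterware stack
EOF
check_rootsh_log "$log"
rm -f "$log"
```

This is only a text check. It does not replace confirming the stack state on the node itself and checking that node 1 is listed again by olsnodes.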

 

How to clear Exadata Storage Alerts

There are times when you need to clear Exadata storage alerts. It is very important that you investigate and resolve the issue before clearing any storage alerts. Additionally, make a note of each storage alert before you clear it. You can follow the steps below to clear storage alerts on one or all storage cells.

Step 1 : Log in to the cellcli utility

[root@cell01 ~]# cellcli
CellCLI: Release 18.1.4.0.0 - Production on Wed Jun 27 19:32:28 EDT 2018

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.


Step 2 : Validate cell configuration 

CellCLI> ALTER CELL VALIDATE CONFIGURATION ;
Cell exceladm01 successfully altered

Step 3 : List Exadata Storage Alerts 

CellCLI> list alerthistory
1 2018-06-13T11:09:48-04:00 critical "ORA-00700: soft internal error, arguments: [main_21], [11], [Not enough open file descriptors], [], [], [], [], [], [], [], [], []"
2 2018-06-13T11:35:06-04:00 critical "RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
3_1 2018-06-25T13:26:17-04:00 critical "Configuration check discovered the following problems: Verify network configuration: 

3_2 2018-06-26T13:25:17-04:00 clear "The configuration check was successful."

Step 4 : Drop all Storage alerts 

CellCLI> drop alerthistory all
Alert 1 successfully dropped
Alert 2 successfully dropped

Step 5 : List storage alerts to validate they are gone

CellCLI> list alerthistory

CellCLI> exit
quitting

Step 6 : Repeat above steps on all storage cells 
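
Since the alerts should be noted before they are dropped, it can help to archive the alert history to a file first. The following is a minimal sketch, assuming the alert text is piped in from `cellcli -e list alerthistory` on the cell. The helper name `save_alerts` and the file naming scheme are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archives alert history text read from stdin into
# <dir>/alerthistory_<cell>_<timestamp>.txt and prints the file path.
save_alerts() {
    local cell="$1" dir="$2"
    local out="$dir/alerthistory_${cell}_$(date +%Y%m%d_%H%M%S).txt"
    mkdir -p "$dir" || return 1
    cat > "$out" || return 1
    echo "$out"
}

# Demo with one alert line copied from Step 3. On a storage cell you would
# run something like:
#   cellcli -e list alerthistory | save_alerts "$(hostname -s)" /root/alert_archive
demo_dir=$(mktemp -d)
saved=$(printf '%s\n' '2 2018-06-13T11:35:06-04:00 critical "RS-700 [No IP found in Exadata config file]"' |
    save_alerts cell01 "$demo_dir")
echo "saved to: $saved"
cat "$saved"
rm -rf "$demo_dir"
```

Keeping one file per cell and timestamp gives you a record to refer back to after `drop alerthistory all` has removed the alerts from the cell.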

Enabling SSH User Equivalency on Exadata Machine

Passwordless SSH configuration is a mandatory installation requirement. SSH is used during installation to configure cluster member nodes, and SSH is used after installation by configuration assistants, Oracle Enterprise Manager, OPatch, and other features.

In the examples that follow, I used the root user, but the same can be done for the oracle or grid user.

Step 1 : Create all_group file

[root@node01 oracle.SupportTools]# pwd
/opt/oracle.SupportTools

[root@node01 oracle.SupportTools]# cat all_group
node01
node02
cell01
cell02
cell03

Step 2 : Generate ssh keys

[root@node01 oracle.SupportTools]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
e1:51:4b:ba:7c:c3:48:e8:e9:5f:2b:f4:3c:11:ea:65 root@node1
The key's randomart image is:
+--[ RSA 2048]----+
| o |
| . + . |
| . = . |
| . = *. |
| o S.+. |
| . o.E. |
| .o =.. |
| .o.+. |
| .... |
+-----------------+

Step 3 : Copy ssh keys to all nodes 

[root@node01 oracle.SupportTools]# dcli -g ./all_group -l root -k -s '-o StrictHostKeyChecking=no'
root@node01's password:
root@node02's password:
root@cell01's password:
root@cell02's password:
root@cell03's password:
node01: ssh key added
node02: ssh key added
cell01: ssh key added
cell02: ssh key added
cell03: ssh key added

 

Step 4 : Validate that passwordless SSH is working

[root@node1 oracle.SupportTools]# dcli -g all_group -l root hostname
node01: XXXXXXX
node02: XXXXXXX
cell01: XXXXXXX
cell02: XXXXXXX
cell03: XXXXXXX
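
With a larger group file it is easy to overlook a host that did not answer. Below is a minimal sketch that compares the hosts in all_group with the "host: output" lines returned by dcli. The function name `missing_hosts` and the sample data are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads dcli output ("host: text" lines) on stdin and
# prints every host from the group file that has no line in that output.
missing_hosts() {
    local group_file="$1" output host
    output=$(cat)
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        if ! printf '%s\n' "$output" |
            awk -v h="$host" -F': ' '$1 == h { ok = 1 } END { exit (ok ? 0 : 1) }'; then
            echo "$host"
        fi
    done < "$group_file"
}

# Demo: cell03 is deliberately missing from the sample dcli output.
# Real usage:  dcli -g all_group -l root hostname | missing_hosts all_group
group=$(mktemp)
printf '%s\n' node01 node02 cell01 cell02 cell03 > "$group"
printf '%s\n' 'node01: a' 'node02: b' 'cell01: c' 'cell02: d' | missing_hosts "$group"
rm -f "$group"
```

An empty result means every host in the group file returned a line.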

 

 

CLSRSC-180: An error occurred while executing the command ‘/bin/rpm -qf /sbin/init’

Hello,

I recently encountered the following error during an Exadata GI home upgrade from 12.1.0.1 to 12.2.0.1. It occurred while running the rootupgrade.sh script on node 1.

2018/06/10 02:59:18 CLSRSC-180: An error occurred while executing the command '/bin/rpm -qf /sbin/init' 
Died at /u01/app/12.2.0.1/grid/crs/install/s_crsutils.pm line 2372. 
The command '/u01/app/12.2.0.1/grid/perl/bin/perl -I/u01/app/12.2.0.1/grid/perl/lib -I/u01/app/12.2.0.1/grid/crs/install /u01/app/12.2.0.1/grid/crs/install/rootcrs.pl -upgrade' execution failed 

I investigated further by running the failing command manually and got the following errors, which were also logged in the installation log file.

[root@dm01dbadm01 ~]# /bin/rpm -qf /sbin/init 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages index using db3 - (-30974) 
error: cannot open Packages database in /var/lib/rpm 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages database in /var/lib/rpm 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages database in /var/lib/rpm 
file /sbin/init is not owned by any package 
You have new mail in /var/spool/mail/root 

 

The issue was caused by corruption of the OS-level RPM database. We can confirm this by running the following command.

# /bin/rpm -qa | more

 

We fixed the RPM database corruption with the following steps so that we could continue our Exadata upgrade.

As the root OS user, run the following:
# rm -f /var/lib/rpm/__* 
# /bin/rpm --rebuilddb 
# echo $?

 

After rebuilding the RPM database, use the following command to validate it.

# /bin/rpm -qa | more
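
Because the fix removes files under /var/lib/rpm, you may want a copy of that directory first. The following is a minimal sketch of such a backup step; it is an extra precaution and was not part of the steps we ran. The function name `backup_rpmdb` and the archive location are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archives an RPM database directory into
# <dest>/rpmdb_<timestamp>.tar.gz and prints the archive path.
backup_rpmdb() {
    local src="${1:-/var/lib/rpm}" dest="${2:-/root}"
    local archive="$dest/rpmdb_$(date +%Y%m%d_%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Dry run against a throwaway directory, so nothing on the real system is
# touched. On the affected node you would run, as root:
#   backup_rpmdb /var/lib/rpm /root
# followed by the rm and rpm --rebuilddb commands shown above.
work=$(mktemp -d)
mkdir -p "$work/rpm" "$work/out"
: > "$work/rpm/Packages"
: > "$work/rpm/__db.001"
archive=$(backup_rpmdb "$work/rpm" "$work/out")
tar -tzf "$archive"
rm -rf "$work"
```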

 

 

 

 

 

ERROR : 192.168.1.1 is responding to ping request

I recently ran into the following error while running the Exadata checkip script during the Exadata deployment process.

Processing section FACTORY
ERROR : 192.168.1.1 is responding to ping request

I checked and found that this IP is in use by another device on the network. The good news is that, per the Oracle Exadata manual, this is a factory-default IP address used by older Exadata machines, so this error can be ignored.

Per the Oracle Exadata manual (2.5 Default IP Addresses): in earlier releases, Oracle Exadata Database Machine had default IP addresses set at the factory, and the range of IP addresses was 192.168.1.1 to 192.168.1.203.

./ggsci: error while loading shared libraries: libnnz11.so

This issue is caused by an environment variable: LD_LIBRARY_PATH does not include the directory that contains libnnz11.so. Fix it using one of the following methods.

[ggate@dbadm01 oradb11]$ ./ggsci
./ggsci: error while loading shared libraries: libnnz11.so: cannot open shared object file: No such file or directory

export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0.4a/dbhome_1/lib

[ggate@dm02dbadm01 oradb11]$ export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0.4a/dbhome_1/lib
[ggate@dm02dbadm01 oradb11]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Dec 12 2015 00:54:38
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (dm02dbadm01.abcfinancial.net) 1>

Alternatively, create a symbolic link to the library in the current directory:

[ggate@dm02dbadm01 oradb11]$ ln -s /u01/app/oracle/product/11.2.0.4a/dbhome_1/lib/libnnz11.so libnnz11.so
[ggate@dm02dbadm01 oradb11]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Dec 12 2015 00:54:38
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (dm02dbadm01.abcfinancial.net) 1>
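
To see every library the loader cannot resolve, not just the first one reported, you can filter the output of `ldd ./ggsci`. Below is a minimal sketch; `missing_libs` is a hypothetical helper name and the sample lines stand in for real ldd output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ldd` output on stdin and prints the name of
# each shared library reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Demo on sample lines in ldd's format. Real usage from the GoldenGate home:
#   ldd ./ggsci | missing_libs
printf '\t%s\n' \
    'libnnz11.so => not found' \
    'libc.so.6 => /lib64/libc.so.6 (0x00007f1234567000)' |
    missing_libs
```

Running the same check after exporting LD_LIBRARY_PATH should print nothing if every dependency now resolves.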

 

ORA-46238: Database user or role does not exist during Upgrade to 12c using dbua

I recently ran into the “ORA-46238: Database user or role does not exist” error while upgrading an Oracle database from 11g to 12c using DBUA.

You will see something like this in your log file

ERROR at line 1: 
ORA-46238: Database user or role '"BETADATASECURE"' does not exist 
ORA-06512: at "SYS.XS_ACL", line 93 
ORA-06512: at "SYS.XS_ADMIN_UTIL", line 53 
ORA-06512: at "SYS.XS_ACL_INT", line 126 
ORA-01403: no data found 
ORA-06512: at "SYS.XS_ACL_INT", line 122 
ORA-06512: at "SYS.XS_ACL_INT", line 493 
ORA-06512: at "SYS.XS_ACL", line 83 
ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 190 
ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 190 
ORA-06512: at line 56 
ORA-06512: at line 104

Reason : – The user has been dropped, but some ACL privileges still reference it. You can find them using the following query.

SQL> SELECT a.object_id ACL_ID, b.principal, b.privilege
     FROM xdb.xdb$acl a,
          xmltable(xmlnamespaces(DEFAULT 'http://xmlns.oracle.com/xdb/acl.xsd'),
                   '/acl/ace' passing a.object_value
                   columns
                     principal VARCHAR2(30) path '/ace/principal',
                     privilege xmltype path '/ace/privilege') b
     WHERE b.principal = 'BETADATASECURE';

ACL_ID PRINCIPAL
-------------------------------- ------------------------------
PRIVILEGE
--------------------------------------------------------------------------------
6013F2CBD4F65F5CE040007F01001457 BETADATASECURE
<privilege xmlns="http://xmlns.oracle.com/xdb/acl.xsd">
<plsql:connect xmlns:p

Drop permission : – 

connect / as sysdba
BEGIN
DBMS_NETWORK_ACL_ADMIN.delete_privilege (
acl => '/sys/acls/qualdatasecure.xml',
principal => 'BETADATASECURE',
is_grant => TRUE,
privilege => 'connect');
COMMIT;
END;
/

You do not have sufficient permissions to access the inventory ‘/u01/app/oraInventory/locks’

Some time ago I got the following error while trying to install Oracle GoldenGate on Exadata. The issue can be resolved by changing the permissions on the locks directory and moving the existing inventory.lock file aside, as shown below.

Error : – 

[ggate@dbadm01 Disk1]$ ./runInstaller
You do not have sufficient permissions to access the inventory ‘/u01/app/oraInventory/locks’. Installation cannot continue. It is required that the primary group of the install user is same as the inventory owner group. Make sure that the install user is part of the inventory owner group and restart the installer.: Permission denied

[ggate@dm02dbadm01 Disk1]$ ls -ltr
total 43
-rwxr-xr-x+ 1 oracle oinstall 918 Oct 22 2016 runInstaller
drwxr-xr-x+ 11 oracle oinstall 21 Oct 22 2016 stage
drwxr-xr-x+ 2 oracle oinstall 3 Oct 22 2016 response
drwxr-xr-x+ 4 oracle oinstall 11 Oct 22 2016 install

Solution : –

chmod 770 locks
Back up the existing inventory.lock file:
mv inventory.lock inventory.lock_<date>

Restarting ./runInstaller with the response file then fixed the issue.

 

[ggate@dm02dbadm01 Disk1]$ ./runInstaller
Starting Oracle Universal Installer…

Checking Temp space: must be greater than 120 MB. Actual 8704 MB Passed
Checking swap space: must be greater than 150 MB. Actual 23584 MB Passed
Checking monitor: must be configured to display at least 256 colors. Actual 16777216 Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2018-05-02_02-16-39PM. Please wait …
[ggate@dbadm01 Disk1]$
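
The error message says that the install user's primary group must match the inventory owner group. The following is a minimal sketch of that comparison, assuming the inventory group is recorded as `inst_group=` in the oraInst.loc file (normally /etc/oraInst.loc on Linux). The function name `inventory_group` and the sample file are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the inst_group value from an oraInst.loc file.
inventory_group() {
    local loc_file="${1:-/etc/oraInst.loc}"
    sed -n 's/^inst_group=//p' "$loc_file"
}

# Demo against a sample file, so the check runs without an Oracle inventory.
# On the real host, call inventory_group with no argument.
loc=$(mktemp)
printf '%s\n' 'inventory_loc=/u01/app/oraInventory' 'inst_group=oinstall' > "$loc"

inv_group=$(inventory_group "$loc")
my_group=$(id -gn)
if [ "$inv_group" = "$my_group" ]; then
    echo "primary group $my_group matches inventory group"
else
    echo "primary group $my_group differs from inventory group $inv_group"
fi
rm -f "$loc"
```

If the groups differ for the install user, fixing the group membership is an alternative to changing permissions on the locks directory.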

 

 

Patch 17030189 is required on your Oracle mining database for trail format RELEASE 12.2 or later.

Locate the prvtlmpg.plb script in the GoldenGate home directory and run it as SYSDBA as a workaround for “Patch 17030189 is required on your Oracle mining database for trail format RELEASE 12.2 or later.”

[oracle@OGGR2-1 ogg]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Tue Sep 20 12:00:42 2016

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @prvtlmpg.plb

Oracle GoldenGate Workaround prvtlmpg

This script provides a temporary workaround for bug 17030189.
It is strongly recommended that you apply the official Oracle
Patch for bug 17030189 from My Oracle Support instead of using
this workaround.

This script must be executed in the mining database of Integrated
Capture. You will be prompted for the username of the mining user.
Use a double quoted identifier if the username is case sensitive
or contains special characters. In a CDB environment, this script
must be executed from the CDB$ROOT container and the mining user
must be a common user.

=========================== WARNING ==========================
You MUST stop all Integrated Captures that belong to this mining
user before proceeding!
================================================================

Enter Integrated Capture mining user: ggs

Installing workaround…
No errors.
No errors.
No errors.
Installation completed.

Flashback Oracle Database on Exadata Machine

There are times when you need to flash back an Oracle database running on an Exadata Machine. Database restore points are commonly used during database upgrades or GoldenGate replication. You can use the following steps to flash back an Oracle database running on an Exadata Machine.

Step 1 : Check database status using srvctl

[oracle@dm02dba01 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node dm02dba01
Instance orcl2 is running on node dm02dba02

Step 2 : Stop database using srvctl

[oracle@dm02dba01 ~]$ srvctl stop database -d orcl

Step 3 : Start only 1 instance in mount mode

[oracle@dm02dba01 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu May 10 12:40:27 2018

Copyright (c) 1982, 2016, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 1.0737E+11 bytes
Fixed Size 29888776 bytes
Variable Size 2.8369E+10 bytes
Database Buffers 7.8920E+10 bytes
Redo Buffers 55226368 bytes
Database mounted.

Step 4 : Check list of existing database restore points

SQL> select name,time from v$restore_point;

NAME
--------------------------------------------------------------------------------
TIME
---------------------------------------------------------------------------
upgrade
06-MAY-18 01.53.29.000000000 PM

Step 5 : Flashback database to target restore point

SQL> flashback database to restore point upgrade;

Flashback complete.

Step 6 : Open database instance with resetlogs

SQL> alter database open resetlogs;

Database altered.

Step 7 : Shut down database instance

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

Step 8 : Start database using srvctl

[oracle@dm02dba01 ~]$ srvctl start database -d orcl
[oracle@dm02dba01 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node dm02dba01
Instance orcl2 is running on node dm02dba02
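
The final status check can be scripted so that a stopped instance is not overlooked on clusters with more nodes. Below is a minimal sketch, assuming the `srvctl status database` line format shown above; `all_instances_running` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `srvctl status database` output on stdin.
# Succeeds only if at least one instance line is present and none of the
# lines says "is not running".
all_instances_running() {
    awk '
        /is not running/ { bad = 1; print "DOWN: " $0 }
        /^Instance /     { n++ }
        END { exit ((bad || n == 0) ? 1 : 0) }
    '
}

# Demo on the output from Step 8. Real usage:
#   srvctl status database -d orcl | all_instances_running
sample='Instance orcl1 is running on node dm02dba01
Instance orcl2 is running on node dm02dba02'

if printf '%s\n' "$sample" | all_instances_running; then
    echo "all instances are running"
else
    echo "at least one instance is down"
fi
```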