Reducing Exadata Active Cores on Compute Nodes

Recently, I had an opportunity to deploy an Eighth Rack Exadata machine. As you might already know, an Eighth Rack requires reducing the active CPU cores on both the compute nodes and the storage nodes. Per the Oracle documentation, this can all be done during Exadata deployment: make sure you reduce the active CPU cores in the Capacity on Demand section of the OEDA process. In my case, the OEDA deployment didn't reduce the active cores on the compute nodes, and I had to reduce them manually on both DB nodes.

Problem Description: As you can clearly see below, the Exadata deployment process skipped the compute nodes and only reduced the CPU cores on the storage nodes.

[root@node1 linux-x64]# ./install.sh -cf Intellitrans-ex.xml -s 2 
Initializing 
Executing Update Nodes for Eighth Rack 

Skip Eighth rack configuration in compute node node1 

running setup on: celadm01 
running setup on: celadm03 
running setup on: celadm02 
cellnode3 total CPU cores set from 20 to 10 
cellnode2 needs total CPU cores set from 20 to 10 
cellnode31 needs total CPU cores set from 20 to 10 

Skip Eighth rack configuration in compute node node2 

Successfully completed execution of step Update Nodes for Eighth Rack [elapsed Time [Elapsed = 36051 mS [0.0 minutes] Fri Jul 13 20:31:36 EDT 2018]]
 
[root@node1 linux-x64]# dbmcli -e LIST DBSERVER attributes coreCount 
24/24 

Solution: alter dbserver pendingCoreCount=10 force (repeat on all DB nodes)

[root@node1 linux-x64]# dbmcli -e alter dbserver pendingCoreCount=10 force

Note: Reboot the Exadata compute nodes for the pending core count to take effect, then verify:

[root@node1 linux-x64]# dbmcli -e LIST DBSERVER attributes coreCount

         10/24
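The coreCount attribute is reported as active/total. A small parsing sketch for scripted verification; the sample string stands in for the real dbmcli output (captured as noted in the comment):

```shell
# Parse dbmcli's coreCount attribute (format: active/total) and extract both
# numbers. The sample value mimics the output shown above; in practice:
#   corecount=$(dbmcli -e "LIST DBSERVER attributes coreCount" | tr -d ' ')
corecount="10/24"
active=${corecount%/*}   # part before the slash
total=${corecount#*/}    # part after the slash
echo "active cores: $active of $total"
```

This makes it easy to script a post-reboot check that the pending core count really became active on every node.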

 

The vm.min_free_kbytes configuration is not set as recommended

I saw the following issue during an Exachk review of one of my Exadata deployments. After working with Oracle Support and the deployment team, it was declared a bug that will be fixed in a future Exachk release. I would still recommend opening an SR with Oracle Support if you see this issue reported in your Exachk report.

Problem Description 
--------------------------------------------------- 
CRITICAL => The vm.min_free_kbytes configuration is not set as recommended 

DATA FROM EXDBADM01 FOR VERIFY THE VM.MIN_FREE_KBYTES CONFIGURATION 

FAILURE: vm.min_free_kbytes is not set as recommended: 
socket count: 1 
minimum size: -1 
in sysctl.conf: 524288 
in active memory: 524288 

Status on node2: 
CRITICAL => The vm.min_free_kbytes configuration is not set as recommended 

DATA FROM node2 FOR VERIFY THE VM.MIN_FREE_KBYTES CONFIGURATION 

FAILURE: vm.min_free_kbytes is not set as recommended: 
socket count: 1 
minimum size: -1 
in sysctl.conf: 524288 
in active memory: 524288 

Error Codes 
--------------------------------------------------- 
FAILURE: vm.min_free_kbytes is not set as recommended:
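For reference, the comparison exachk performs between the configured and active values can be sketched as follows (the sample values reproduce the report above; the real sources are noted in the comments):

```shell
# Compare vm.min_free_kbytes as configured in /etc/sysctl.conf against the
# value active in memory, the way the exachk test does. Sample values
# reproduce the report above; on a real node you would capture them with:
#   active_value=$(cat /proc/sys/vm/min_free_kbytes)
#   conf_value=$(awk -F= '/vm.min_free_kbytes/ {gsub(/ /, ""); print $2}' /etc/sysctl.conf)
conf_value=524288
active_value=524288
if [ "$conf_value" -eq "$active_value" ]; then
  echo "vm.min_free_kbytes consistent: $active_value"
else
  echo "mismatch: sysctl.conf=$conf_value active=$active_value"
fi
```

In the failure above both values match, which is why Oracle Support concluded the check itself was at fault.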

 

Clone Oracle Database Home on Exadata Machine

I was asked to clone a database home during one of my Exadata deployment projects. We wanted an additional database home for patching and isolation purposes, but that is a topic for a different blog post. You can use the following guidelines to clone a database home on an Exadata machine.

Note: These steps need to be performed on all DB nodes.
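On Exadata, the usual way to repeat the same steps on every DB node is dcli with a group file; a minimal loop sketch (the node names are placeholders, and the echoed command stands in for the real remote invocation, e.g. `dcli -g dbs_group -l root '<command>'`):

```shell
# Hypothetical node list; on a real Exadata you would read this from a
# dcli group file such as /opt/oracle.SupportTools/dbs_group.
for node in node01 node02; do
  # Each clone step below would run on $node via dcli or ssh; echoed here
  # as a sketch rather than executed remotely.
  echo "$node: mkdir -p /u01/app/oracle/product/11.2.0.4/dbhome_2"
done
```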

Step 1 : Create a directory or new mount point for the database home. It's best to have a separate mount point for each database home on an Exadata machine.

mkdir -p /u01/app/oracle/product/11.2.0.4/dbhome_2

Step 2 : As the root user, copy all files from the existing home to the new database home (dbhome_2)

[root@exdbadm01 dbhome_1]# cp * -rp /u01/app/oracle/product/11.2.0.4/dbhome_2/

Step 3 : Relink RDS (required only on Exadata machines)

Set the ORACLE_HOME environment variable to the new home, then relink:

export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_2
cd $ORACLE_HOME/rdbms/lib
make -f $ORACLE_HOME/rdbms/lib/ins_rdbms.mk ipc_rds ioracle

Step 4 : Clone and relink the database home using Oracle OUI (runInstaller) in silent mode.


export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_2

cd $ORACLE_HOME/oui/bin

[oracle@node1 bin]$ ./runInstaller -silent -clone ORACLE_BASE="/u01/app/oracle" ORACLE_HOME="/u01/app/oracle/product/11.2.0.4/dbhome_2" ORACLE_HOME_NAME="OraDb11g_home2"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB. Actual 24575 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2018-06-27_05-04-52PM. Please wait ...[oracle@node1 bin]$ Oracle Universal Installer, Version 11.2.0.4.0 Production
Copyright (C) 1999, 2013, Oracle. All rights reserved.

You can find the log of this install session at:
/u01/app/oraInventory/logs/cloneActions2018-06-27_05-04-52PM.log
.................................................................................................... 100% Done.

Installation in progress (Wednesday, June 27, 2018 5:04:57 PM EDT)
............................................................................... 79% Done.
Install successful

Linking in progress (Wednesday, June 27, 2018 5:05:00 PM EDT)
Link successful

Setup in progress (Wednesday, June 27, 2018 5:05:17 PM EDT)
Setup successful

End of install phases.(Wednesday, June 27, 2018 5:05:38 PM EDT)
WARNING:
The following configuration scripts need to be executed as the "root" user.
/u01/app/oracle/product/11.2.0.4/dbhome_2/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts

The cloning of OraDb11g_home2 was successful.
Please check '/u01/app/oraInventory/logs/cloneActions2018-06-27_05-04-52PM.log' for more details.

 

Deconfigure/Reconfigure Exadata node from CRS

Problem Description:

A few days back I started working on an Exadata GI upgrade from 12.1 to 12.2 and ran into a problem upgrading node 1. We had to cancel the upgrade and roll back CRS to 12.1. This is where we ran into the following problem.

First, we tried to start CRS after restoring the old Grid home from backup, but it appears the upgrade process had deconfigured the 12.1 CRS home on node 1. We couldn't start the old 12.1 CRS from node 1.

[oracle@node1 bin]$ ./crsctl start crs 
CRS-4047: No Oracle Clusterware components configured. 
CRS-4000: Command Start failed, or completed with errors.

We could still see other nodes in the cluster but not node 1.

[oracle@node2 ~]$ olsnodes -n -t 
node2 2 Unpinned 
node3 3 Unpinned

Solution :

We wanted to roll back node 1 to its previous state so we could try the upgrade again. We solved this problem by reconfiguring 12.1.

First, make sure the 12.1 CRS has been deconfigured properly:

/u01/app/12.1.0.2/grid/crs/install/rootcrs.pl -deconfig -force

Then run root.sh from the 12.1 CRS home:

/u01/app/12.1.0.2/grid/root.sh

Performing root user operation.

The following environment variables are set as:
ORACLE_OWNER= oracle
ORACLE_HOME= /u01/app/12.1.0.2/grid
Copying dbhome to /usr/local/bin ...
Copying oraenv to /usr/local/bin ...
Copying coraenv to /usr/local/bin ...

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Relinking oracle with rac_on option
Using configuration parameter file: /u01/app/12.1.0.2/grid/crs/install/crsconfig_params
2018/07/20 23:16:17 CLSRSC-4001: Installing Oracle Trace File Analyzer (TFA) Collector.

2018/07/20 23:16:17 CLSRSC-4002: Successfully installed Oracle Trace File Analyzer (TFA) Collector.

2018/07/20 23:16:18 CLSRSC-363: User ignored prerequisites during installation

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'node1'
CRS-2673: Attempting to stop 'ora.evmd' on 'node1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'node1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'node1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'node1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'node1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'node1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'node1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'node1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'node1'
CRS-2677: Stop of 'ora.cssd' on 'node1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'node1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'node1'
CRS-2677: Stop of 'ora.gipcd' on 'node1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'node1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'node1'
CRS-2672: Attempting to start 'ora.evmd' on 'node1'
CRS-2676: Start of 'ora.mdnsd' on 'node1' succeeded
CRS-2676: Start of 'ora.evmd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'node1'
CRS-2676: Start of 'ora.gpnpd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'node1'
CRS-2676: Start of 'ora.gipcd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'node1'
CRS-2676: Start of 'ora.cssdmonitor' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'node1'
CRS-2676: Start of 'ora.diskmon' on 'node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'node1'
CRS-2672: Attempting to start 'ora.ctssd' on 'node1'
CRS-2676: Start of 'ora.ctssd' on 'node1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'node1' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'node1'
CRS-2681: Clean of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'node1'
CRS-2676: Start of 'ora.asm' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'node1'
CRS-2676: Start of 'ora.storage' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'node1'
CRS-2676: Start of 'ora.crf' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'node1'
CRS-2676: Start of 'ora.crsd' on 'node1' succeeded
CRS-6023: Starting Oracle Cluster Ready Services-managed resources
CRS-6017: Processing resource auto-start for servers: node1
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'node2'
CRS-2672: Attempting to start 'ora.net1.network' on 'node1'
CRS-2676: Start of 'ora.net1.network' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.ons' on 'node1'
CRS-2673: Attempting to stop 'ora.node1.vip' on 'node3'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'node2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'node2'
CRS-2677: Stop of 'ora.node1.vip' on 'node3' succeeded
CRS-2672: Attempting to start 'ora.node1.vip' on 'node1'
CRS-2677: Stop of 'ora.scan1.vip' on 'node2' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'node1'
CRS-2676: Start of 'ora.ons' on 'node1' succeeded
CRS-2676: Start of 'ora.node1.vip' on 'node1' succeeded
CRS-2676: Start of 'ora.scan1.vip' on 'node1' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'node1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'node1' succeeded
CRS-6016: Resource auto-start has completed for server node1
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
2018/07/20 23:18:19 CLSRSC-343: Successfully started Oracle Clusterware stack

clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

 

How to clear Exadata Storage Alerts

There are times when you need to clear Exadata storage alerts. It's very important that you investigate and resolve the underlying issue before clearing any storage alerts. Additionally, make a note of each alert before you clear it. You can follow the steps below to clear storage alerts on one or all storage cells.

Step 1 : Login to cellcli utility 

[root@cell01 ~]# cellcli
CellCLI: Release 18.1.4.0.0 - Production on Wed Jun 27 19:32:28 EDT 2018

Copyright (c) 2007, 2016, Oracle and/or its affiliates. All rights reserved.


Step 2 : Validate cell configuration 

CellCLI> ALTER CELL VALIDATE CONFIGURATION ;
Cell exceladm01 successfully altered

Step 3 : List Exadata Storage Alerts 

CellCLI> list alerthistory
1 2018-06-13T11:09:48-04:00 critical "ORA-00700: soft internal error, arguments: [main_21], [11], [Not enough open file descriptors], [], [], [], [], [], [], [], [], []"
2 2018-06-13T11:35:06-04:00 critical "RS-700 [No IP found in Exadata config file] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
3_1 2018-06-25T13:26:17-04:00 critical "Configuration check discovered the following problems: Verify network configuration: 

3_2 2018-06-26T13:25:17-04:00 clear "The configuration check was successful."
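Before dropping anything, it helps to count how many critical alerts you are about to clear. A sketch against the sample listing above (the here-doc stands in for live cellcli output):

```shell
# Count critical entries in LIST ALERTHISTORY output; the here-doc
# reproduces the sample listing above. Against a live cell you would pipe
# the real output instead:
#   cellcli -e list alerthistory | grep -c critical
critical=$(grep -c critical <<'EOF'
1 2018-06-13T11:09:48-04:00 critical "ORA-00700: soft internal error ..."
2 2018-06-13T11:35:06-04:00 critical "RS-700 [No IP found in Exadata config file] ..."
3_2 2018-06-26T13:25:17-04:00 clear "The configuration check was successful."
EOF
)
echo "critical alerts to review before dropping: $critical"
```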

Step 4 : Drop all Storage alerts 

CellCLI> drop alerthistory all
Alert 1 successfully dropped
Alert 2 successfully dropped

Step 5 : List storage alerts to validate they are gone

CellCLI> list alerthistory

CellCLI> exit
quitting

Step 6 : Repeat the above steps on all storage cells 

Enabling SSH User Equivalency on Exadata Machine

Passwordless SSH configuration is a mandatory installation requirement. SSH is used during installation to configure cluster member nodes, and SSH is used after installation by configuration assistants, Oracle Enterprise Manager, OPatch, and other features.

In the examples that follow, I used the root user, but the same can be done for the oracle or grid user.

Step 1 : Create all_group file

[root@node01 oracle.SupportTools]# pwd
/opt/oracle.SupportTools

[root@node01 oracle.SupportTools]# cat all_group
node01
node02
cell01
cell02
cell03

Step 2 : Generate ssh keys

[root@node01 oracle.SupportTools]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
e1:51:4b:ba:7c:c3:48:e8:e9:5f:2b:f4:3c:11:ea:65 
root@node1
The key's randomart image is:
+--[ RSA 2048]----+
| o |
| . + . |
| . = . |
| . = *. |
| o S.+. |
| . o.E. |
| .o =.. |
| .o.+. |
| .... |
+-----------------+

Step 3 : Copy ssh keys to all nodes 

[root@node01 oracle.SupportTools]# dcli -g ./all_group -l root -k -s '-o StrictHostKeyChecking=no'
root@node01's password:
root@node02's password:
root@cell01's password:
root@cell02's password:
root@cell03's password:
node01: ssh key added
node02: ssh key added
cell01: ssh key added
cell02: ssh key added
cell03: ssh key added

 

Step 4 : Validate that passwordless SSH is working 

[root@node1 oracle.SupportTools]# dcli -g all_group -l root hostname
node01: XXXXXXX
node02: XXXXXXX
cell01: XXXXXXX
cell02: XXXXXXX
cell03: XXXXXXX

 

 

CLSRSC-180: An error occurred while executing the command '/bin/rpm -qf /sbin/init'

Hello,

I recently encountered the following error during an Exadata GI home upgrade from 12.1.0.1 to 12.2.0.1. The error occurred during execution of the rootupgrade.sh script on node 1.

2018/06/10 02:59:18 CLSRSC-180: An error occurred while executing the command '/bin/rpm -qf /sbin/init' 
Died at /u01/app/12.2.0.1/grid/crs/install/s_crsutils.pm line 2372. 
The command '/u01/app/12.2.0.1/grid/perl/bin/perl -I/u01/app/12.2.0.1/grid/perl/lib -I/u01/app/12.2.0.1/grid/crs/install /u01/app/12.2.0.1/grid/crs/install/rootcrs.pl -upgrade' execution failed 

I investigated the issue further by running the target command manually, and I got the following errors. These errors were also logged in the installation logfile.

[root@dm01dbadm01 ~]# /bin/rpm -qf /sbin/init 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages index using db3 - (-30974) 
error: cannot open Packages database in /var/lib/rpm 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages database in /var/lib/rpm 
rpmdb: Thread/process 261710/140405403039488 failed: Thread died in Berkeley DB library 
error: db3 error(-30974) from dbenv->failchk: DB_RUNRECOVERY: Fatal error, run database recovery 
error: cannot open Packages database in /var/lib/rpm 
file /sbin/init is not owned by any package 
You have new mail in /var/spool/mail/root 

 

The issue was related to corruption of the OS-level RPM database. You can confirm the corruption by running the following command.

# /bin/rpm -qa | more

 

We fixed the RPM database corruption using the following steps so we could continue our Exadata upgrade.

As root OS user run the following: 
# rm -f /var/lib/rpm/__* 
# /bin/rpm --rebuilddb 
# echo $?

 

After rebuilding the RPM database, use the following command to validate it.

# /bin/rpm -qa | more

ERROR : 192.168.1.1 is responding to ping request

I recently ran into the following error while running the Exadata checkip script during the deployment process.

Processing section FACTORY
ERROR : 192.168.1.1 is responding to ping request

I checked and realized the above IP was in use by another device on the network. Good news! Per the Oracle Exadata manual, this is a factory-default IP used by older Exadata machines, and we can ignore this error.

Per the Oracle Exadata manual (2.5 Default IP Addresses), in earlier releases Oracle Exadata Database Machine had default IP addresses set at the factory, and the range of IP addresses was 192.168.1.1 to 192.168.1.203.
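A quick sketch of that range check, deciding whether an IP flagged by checkip can be ignored as a factory default (the sample IP is the one from the error above):

```shell
# Decide whether an IP reported by checkip falls inside the old
# factory-default range 192.168.1.1-192.168.1.203, in which case the
# "responding to ping request" error can be ignored per the Exadata manual.
ip="192.168.1.1"
last_octet=${ip##*.}
case "$ip" in
  192.168.1.*)
    if [ "$last_octet" -ge 1 ] && [ "$last_octet" -le 203 ]; then
      verdict="factory-default range: safe to ignore"
    else
      verdict="outside factory-default range: investigate"
    fi ;;
  *) verdict="outside factory-default range: investigate" ;;
esac
echo "$ip -> $verdict"
```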

./ggsci: error while loading shared libraries: libnnz11.so

This issue is caused by a missing library path in the environment. You can fix it using either of the following approaches.

[ggate@dbadm01 oradb11]$ ./ggsci
./ggsci: error while loading shared libraries: libnnz11.so: cannot open shared object file: No such file or directory

export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0.4a/dbhome_1/lib

[ggate@dm02dbadm01 oradb11]$ export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0.4a/dbhome_1/lib
[ggate@dm02dbadm01 oradb11]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Dec 12 2015 00:54:38
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (dm02dbadm01.abcfinancial.net) 1>

or

[ggate@dm02dbadm01 oradb11]$ ln -s /u01/app/oracle/product/11.2.0.4a/dbhome_1/lib/libnnz11.so libnnz11.so
[ggate@dm02dbadm01 oradb11]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 12.2.0.1.1 OGGCORE_12.2.0.1.0_PLATFORMS_151211.1401_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Dec 12 2015 00:54:38
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2015, Oracle and/or its affiliates. All rights reserved.

GGSCI (dm02dbadm01.abcfinancial.net) 1>
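The export fix can be made idempotent by appending the home's lib directory only when it is missing from LD_LIBRARY_PATH; a sketch using the path from this deployment:

```shell
# Append the database home's lib directory to LD_LIBRARY_PATH only if it
# is not already present; the path is the one from the deployment above.
libdir=/u01/app/oracle/product/11.2.0.4a/dbhome_1/lib
case ":${LD_LIBRARY_PATH:-}:" in
  *":$libdir:"*)
    echo "lib directory already on LD_LIBRARY_PATH" ;;
  *)
    export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "lib directory appended" ;;
esac
```

This belongs in the GoldenGate user's profile so ggsci works in every new session without duplicating the entry.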

 

ORA-46238: Database user or role does not exist during Upgrade to 12c using dbua

I recently ran into the "ORA-46238: Database user or role does not exist" error while trying to upgrade an Oracle database from 11g to 12c using DBUA.

You will see something like this in your logfile: 

ERROR at line 1: 
ORA-46238: Database user or role '"BETADATASECURE"' does not exist 
ORA-06512: at "SYS.XS_ACL", line 93 
ORA-06512: at "SYS.XS_ADMIN_UTIL", line 53 
ORA-06512: at "SYS.XS_ACL_INT", line 126 
ORA-01403: no data found 
ORA-06512: at "SYS.XS_ACL_INT", line 122 
ORA-06512: at "SYS.XS_ACL_INT", line 493 
ORA-06512: at "SYS.XS_ACL", line 83 
ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 190 
ORA-06512: at "SYS.XS_OBJECT_MIGRATION", line 190 
ORA-06512: at line 56 
ORA-06512: at line 104

Reason: The user was dropped, but some ACL privileges for that user are still lingering. You can identify them using the following query.

SQL> SELECT a.object_id ACL_ID, b.principal, b.privilege
  2  FROM xdb.xdb$acl a,
  3       xmltable(xmlnamespaces(DEFAULT 'http://xmlns.oracle.com/xdb/acl.xsd'),
  4                '/acl/ace' passing a.object_value
  5                columns
  6                  principal VARCHAR2(30) path '/ace/principal',
  7                  privilege xmltype path '/ace/privilege') b
  8  WHERE b.principal = 'BETADATASECURE';

ACL_ID PRINCIPAL
-------------------------------- ------------------------------
PRIVILEGE
--------------------------------------------------------------------------------
6013F2CBD4F65F5CE040007F01001457 BETADATASECURE
<privilege xmlns="http://xmlns.oracle.com/xdb/acl.xsd">
<plsql:connect xmlns:p

Drop the lingering privilege: 

connect / as sysdba
BEGIN
DBMS_NETWORK_ACL_ADMIN.delete_privilege (
acl => '/sys/acls/qualdatasecure.xml',
principal => 'BETADATASECURE',
is_grant => TRUE,
privilege => 'connect');
COMMIT;
END;
/