How to Safely Reboot an Exadata Machine

There are times when you need to reboot your Exadata machine. Because an Exadata machine has many components, such as compute nodes, storage nodes and network devices, rebooting it is a little different from rebooting any other database host. It is important to follow the sequence presented in this blog and to have proper approvals before rebooting an Exadata machine. It is also important to understand that you will lose all of your storage indexes and flash cache, and it will take some time for the Exadata machine to recreate them once it is back online. If your application depends heavily on storage indexes and flash cache, you might experience some performance issues for the next few hours after the reboot.

Plan

  • Take a snapshot of all the services running on the Exadata machine
  • Review the /etc/oratab file and make sure all the instances are defined properly
  • Review storage cell services
  • Check the status of ASM disk groups and disks
  • Make sure you have an approved change request to reboot the Exadata machine
  • Alert owners and users of the Exadata machine about this reboot beforehand
  • Black out OEM monitoring for the target Exadata machine
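
The checks in the plan above can be captured to files so that the post-reboot state can be compared against them. The following is only a minimal sketch, not a definitive procedure: the function name `snapshot_state`, the output file names and the `CRSCTL`, `DCLI`, `ASMCMD`, `CELL_GROUP` and `ORATAB` variables are hypothetical, and it assumes `crsctl`, `dcli` and `asmcmd` are on the PATH of the user running it, with the environment already set for the ASM instance.

```shell
#!/usr/bin/env bash
# Sketch: save a pre-reboot snapshot of cluster, cell and ASM state.
# All variable and file names below are illustrative assumptions.

CRSCTL="${CRSCTL:-crsctl}"             # or the full path under your Grid home
DCLI="${DCLI:-dcli}"
ASMCMD="${ASMCMD:-asmcmd}"             # needs the ASM environment (ORACLE_SID, ORACLE_HOME)
CELL_GROUP="${CELL_GROUP:-cell_group}" # file listing the storage cells
ORATAB="${ORATAB:-/etc/oratab}"

snapshot_state() {
    local out_dir="$1"
    mkdir -p "$out_dir" || return 1

    # Cluster resources (databases, listeners, ASM, services)
    "$CRSCTL" stat res -t > "$out_dir/crs_resources.txt" 2>&1

    # Instance definitions on this node
    if [ -r "$ORATAB" ]; then
        cp "$ORATAB" "$out_dir/oratab.txt"
    fi

    # Storage cell services on every cell in the group
    "$DCLI" -l root -g "$CELL_GROUP" \
        "su - celladmin -c \"cellcli -e list cell detail\"" \
        > "$out_dir/cell_detail.txt" 2>&1

    # ASM disk groups and disks
    "$ASMCMD" lsdg  > "$out_dir/asm_diskgroups.txt" 2>&1
    "$ASMCMD" lsdsk > "$out_dir/asm_disks.txt" 2>&1

    echo "Snapshot written to $out_dir"
}
```

A call such as `snapshot_state /tmp/pre_reboot_$(date +%Y%m%d)` before the reboot leaves a set of text files to compare against the same commands afterwards.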

Stop all database instances

  • Stop all the databases using the srvctl command, once per database
srvctl stop database -d <XYZ>
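
When many databases are registered with the cluster, the stop command can be run in a loop. This is a sketch under assumptions: it relies on `srvctl config database` (run with no arguments) printing one registered database name per line, and the function name `stop_all_databases` and the `SRVCTL` variable are hypothetical. If databases run from several Oracle homes, each one may need the `srvctl` from its own home.

```shell
#!/usr/bin/env bash
# Sketch: stop every database registered with Clusterware.
# SRVCTL may be overridden with a full path; the default assumes it is on PATH.

SRVCTL="${SRVCTL:-srvctl}"

stop_all_databases() {
    local dbs db rc=0

    # List the registered databases, one name per line
    dbs=$("$SRVCTL" config database) || return 1

    for db in $dbs; do
        echo "Stopping database: $db"
        if ! "$SRVCTL" stop database -d "$db"; then
            echo "Failed to stop database: $db" >&2
            rc=1
        fi
    done

    # Non-zero if any database failed to stop
    return "$rc"
}
```

Run `stop_all_databases` as the database software owner and check its exit status before moving on to stop CRS.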

Stop CRS service

  • Stop Oracle Clusterware (CRS) on all nodes using the following command as the root user
GRID_HOME/grid/bin/crsctl stop cluster -all
  • Check that all the services stopped gracefully; otherwise use the -f option to stop them forcefully
GRID_HOME/grid/bin/crsctl stop cluster -all -f

Reboot Storage Cells

  • Once CRS and the databases are down, you can reboot the storage cells using the following command as the root user from any compute node
dcli -l root -g cell_group shutdown -r -y now

Note: The above command requires root user equivalence to be set up between all nodes and the cell_group file to exist. Otherwise, log in to each storage node as the root user and execute the following command

shutdown -r -y now
  • Verify that all the storage cells are back up successfully using the following commands
dcli -l root -g cell_group uptime

dcli -l root -g cell_group "su - celladmin -c \"cellcli -e list cell detail\"" | grep Status
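
Reading the status lines by eye is error-prone on a full rack, so the output can be filtered for services that are not running. This sketch assumes the `list cell detail` output contains the attributes `cellsrvStatus`, `msStatus` and `rsStatus`, that a healthy value is `running`, and that dcli prefixes each line with the cell name. The function name `check_cell_services` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read "cellcli -e list cell detail" output (via dcli) on stdin,
# print every service status line that is not "running", and return
# non-zero if any such line exists or if no status lines were seen at all.

check_cell_services() {
    awk '
        /(cellsrvStatus|msStatus|rsStatus):/ {
            seen++
            if ($NF != "running") {
                print
                bad++
            }
        }
        END {
            if (bad > 0 || seen == 0) exit 1
        }
    '
}
```

For example, `dcli -l root -g cell_group "su - celladmin -c \"cellcli -e list cell detail\"" | check_cell_services` prints nothing and returns 0 when every cell reports all three services as running.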

Reboot Compute Nodes

  • Reboot the compute nodes using the following command as the root user from any compute node
dcli -l root -g dbs_group shutdown -r -y now

Note: The above command requires root user equivalence to be set up between all nodes and the dbs_group file to exist. Otherwise, log in to each compute node as the root user and execute the following command

shutdown -r -y now
  • Verify that all the compute nodes are back up successfully using the following command
dcli -l root -g dbs_group uptime

Verify CRS and databases

  • Verify CRS services using the following command as root user
GRID_HOME/grid/bin/crsctl stat res -t
  • Verify that all the database instances came back online
dcli -l root -g dbs_group ps -ef | grep smon
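
To confirm that no instance was missed, the smon listing can be checked against the instance names you expect. This is a sketch, assuming database background processes are named `ora_smon_<SID>` (ASM instances use an `asm_smon_` prefix instead and are not covered here). The function name `missing_instances` and the SIDs in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: given a file holding "ps -ef" output and a list of expected SIDs,
# print each SID that has no ora_smon_<SID> process and return non-zero
# if any are missing.

missing_instances() {
    local ps_file="$1" sid rc=0
    shift

    for sid in "$@"; do
        # Match the SID exactly, so DB1 does not match ora_smon_DB11
        if ! grep -Eq "ora_smon_${sid}([[:space:]]|\$)" "$ps_file"; then
            echo "$sid"
            rc=1
        fi
    done

    return "$rc"
}
```

For example, save the listing with `dcli -l root -g dbs_group ps -ef | grep smon > /tmp/smon.txt`, then run `missing_instances /tmp/smon.txt DB11 DB12`, replacing the SIDs with the ones defined in your /etc/oratab files.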