How to Safely Reboot Exadata Machine

There are times when you need to reboot your Exadata machine. Since an Exadata machine has many components, such as compute nodes, storage cells, and network devices, rebooting it is a little different from rebooting any other database host. It is important to follow the sequence presented in this blog and to make sure you have the proper approvals before rebooting the Exadata machine. It is also important to understand that you will lose all of your storage indexes and flash cache contents, and it will take some time for the Exadata machine to rebuild them once it is back online. If your application depends heavily on storage indexes and the flash cache, you might see some performance degradation for the first few hours after the reboot.
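If you are not sure how dependent your workload is on storage indexes, the cumulative savings are exposed in v$sysstat and can be checked before the outage. A minimal sketch, run as SYSDBA from any compute node (the GB conversion is just for readability):

sqlplus -s / as sysdba <<'EOF'
-- cumulative physical I/O avoided thanks to storage indexes since instance startup
SELECT name, ROUND(value/1024/1024/1024, 2) AS gb_saved
FROM   v$sysstat
WHERE  name = 'cell physical IO bytes saved by storage index';
EOF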

Plan

  • Take a snapshot of all the services running on the Exadata machine (a minimal snapshot script is sketched after this list)
  • Review the /etc/oratab file and make sure all the instances are defined properly
  • Review the storage cell services
  • Check the ASM disk group and disk statuses
  • Make sure you have an approved change request to reboot the Exadata machine
  • Alert the owners and users of the Exadata machine about this reboot beforehand
  • Blackout OEM monitoring for the target Exadata machine
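A minimal sketch of the pre-reboot snapshot, run as the root user from one compute node (the output directory name is an assumption; dbs_group and cell_group are the same dcli group files used later in this post):

mkdir -p /root/pre_reboot_snapshot && cd /root/pre_reboot_snapshot
# clusterware resource state across the cluster
GRID_HOME/grid/bin/crsctl stat res -t > crs_resources.txt
# running database instances on every compute node
dcli -l root -g dbs_group "ps -ef | grep [p]mon" > running_instances.txt
# storage cell service status on every cell
dcli -l root -g cell_group "cellcli -e list cell detail" > cell_status.txt
# copy of /etc/oratab from every compute node
dcli -l root -g dbs_group "cat /etc/oratab" > oratab.txt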

Stop all database instances

  • Stop all the databases using the srvctl command (a loop over all registered databases is sketched below)
srvctl stop database -d <XYZ>
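If several databases are registered with the clusterware, a simple loop saves typing. A minimal sketch, run as the database software owner with the environment set for the database home:

# stop every database registered with the cluster
for db in $(srvctl config database); do
  echo "Stopping ${db} ..."
  srvctl stop database -d "${db}"
done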

Stop CRS service

  • Stop Oracle Clusterware (CRS) on all nodes using the following command as the root user
GRID_HOME/grid/bin/crsctl stop cluster -all
  • Check that all the services stopped gracefully (a quick check is sketched after this list); otherwise use the -f option to stop them forcefully
GRID_HOME/grid/bin/crsctl stop cluster -all -f
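A minimal sketch of a quick post-shutdown check, run as root from the same node (uses the dbs_group file referenced later in this post); any output means something is still running on that node:

# look for leftover clusterware daemons or database background processes
dcli -l root -g dbs_group "ps -ef | egrep '[c]rsd\.bin|[o]cssd\.bin|[p]mon'"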

Reboot Storage Cells

  • Once CRS and the databases are down, you can reboot the storage cells using the following command as the root user from any compute node
dcli -l root -g cell_group shutdown -r -y now

Note: The above command requires root user SSH equivalence to be set up between the compute node and all storage cells, and the cell_group file to be created. Otherwise, log in to each storage cell as the root user and execute the following command:

shutdown -r -y now
  • Verify all the storage cells came back up successfully using the following commands (a check of the individual cell services is sketched below)
dcli -l root -g cell_group uptime

dcli -l root -g cell_group "su - celladmin -c \"cellcli -e list cell detail\"" | grep Status
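To confirm that the individual cell services (RS, MS and CELLSRV) are running, rather than only the overall cell status, the service attributes can be queried directly. A minimal sketch, run as root from a compute node:

dcli -l root -g cell_group "cellcli -e list cell attributes name,rsStatus,msStatus,cellsrvStatus"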

Reboot Compute Nodes

  • Reboot the compute nodes using the following command as the root user from any compute node (note that the node you run it from will reboot as well, ending your session)
dcli -l root -g dbs_group shutdown -r -y now

Note: The above command requires root user SSH equivalence to be set up between all compute nodes and the dbs_group file to be created. Otherwise, log in to each compute node as the root user and execute the following command:

shutdown -r -y now
  • Verify all the compute nodes came back up successfully using the following command (a clusterware startup check is sketched below)
dcli -l root -g dbs_group uptime
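Once the compute nodes respond, the clusterware stack should start automatically (assuming CRS autostart is enabled, which is the default). A minimal sketch of a quick check, run as root from one node:

# each node should report CRS, CSS and the Event Manager as online
GRID_HOME/grid/bin/crsctl check cluster -all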

Verify CRS and databases

  • Verify the CRS resources using the following command as the root user
GRID_HOME/grid/bin/crsctl stat res -t
  • Verify all the database instances came back online (a srvctl-based check is sketched below)
dcli -l root -g dbs_group "ps -ef | grep [s]mon"
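For a database-level view rather than a process-level one, srvctl reports the state of every instance. A minimal sketch, run as the database software owner (the database and node names in the sample output are hypothetical):

# report the status of every registered database and its instances
for db in $(srvctl config database); do
  srvctl status database -d "${db}"
done
# sample output line: Instance ORCL1 is running on node exadb01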

 

Oracle ZFS Storage Pool Data Profile Best Practices

Hello everyone, recently I was part of an Oracle ZFS storage pool design discussion, mostly focused on data profile types and Oracle best practices. Oracle recommends the mirrored data profile for many ZFS storage use cases, including traditional RMAN backups and image backups, for the best performance and availability. I strongly recommend using mirrored pools for production systems. For non-production systems where performance is not a major concern, you can use the double parity or triple parity, wide stripes profiles instead. Believing that a picture says a thousand words, please see the chart below representing the availability, performance, and capacity of a 70 GB storage pool. As you can see from the chart, the striped data profile provides the most capacity but no availability, which can lead to data loss, while the mirrored data profile provides both performance and availability.

Note: The above chart is based on a 70 GB storage pool capacity.

Please see below a detailed description of all the available data profiles:

Double parity: Each array stripe contains two parity disks, yielding high availability while increasing capacity over mirrored configurations. Double parity striping is recommended for workloads requiring little or no random access, such as backup/restore.

Mirrored: Duplicate copies of data yield fast and reliable storage by dividing access and redundancy evenly between two sets of disks. Mirroring is intended for workloads favoring high performance and availability over capacity, such as databases. When storage space is ample, consider triple mirroring for increased throughput and data protection at the cost of one-third total capacity.

Single parity, narrow stripes: Each narrow stripe assigns one parity disk for each set of three data disks, offering better random read performance than double parity stripes and larger capacity than mirrored configurations. Narrow stripes can be effective for configurations that are neither heavily random nor heavily sequential, as they offer a compromise between the two access patterns.

Striped: Data is distributed evenly across all disks without redundancy, maximizing performance and capacity, but providing no protection from disk failure whatsoever. Striping is recommended only for workloads in which data loss is an acceptable tradeoff for marginal gains in throughput and storage space.

Triple mirrored: Three redundant copies of data yield a very fast and highly reliable storage system. Triple mirroring is recommended for workloads requiring both maximum performance and availability, such as critical databases. Compared to standard mirroring, triple mirrored storage offers increased throughput and an added level of protection against disk failure at the expense of capacity.

Triple parity, wide stripes: Each wide stripe has three disks for parity and allocates more data disks to maximize capacity. Triple parity is not generally recommended due to its limiting effect on I/O operations and low random access performance; however, these effects can be mitigated with cache.
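For readers who think in general-purpose ZFS terms rather than appliance profiles, the profiles above map roughly onto standard zpool vdev layouts. The appliance itself is configured through its BUI/CLI, not with zpool, so the following is only a conceptual sketch; the pool and device names are hypothetical, and each command is an alternative layout for the same set of disks:

# Mirrored: two-way copies, roughly half the raw capacity
zpool create mpool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

# Triple mirrored: three-way copies, roughly one third of the raw capacity
zpool create tmpool mirror c1t0d0 c1t1d0 c1t2d0

# Single parity, narrow stripes: RAIDZ1 with one parity disk per three data disks
zpool create npool raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0

# Double parity: RAIDZ2 stripes with two parity disks each
zpool create dpool raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# Triple parity, wide stripes: RAIDZ3 with more data disks per stripe
zpool create tpool raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

# Striped: no redundancy, maximum capacity and throughput, no protection
zpool create spool c1t0d0 c1t1d0 c1t2d0 c1t3d0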