Tuesday 1 April 2014

CRS-0215: Could not start resource 'ora.rac2.ons'


After a node addition, ONS was not able to start.

CRS-0215: Could not start resource 'ora.rac2.ons'.

Problem :-


onsctl  start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = node1, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host nodename:6251
onscfg[1]
   {node = node2, port = 6251}
Adding remote host sai_db2:6251
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
   {node = node3, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host nodename:6251
onscfg[1]
   {node = node4, port = 6251}
Adding remote host nodename:6251
onsctl: ons failed to start


 $CRS_HOME/bin/onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = NODE1, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host SGRAC1:6251
onscfg[1]
{node = NODE2, port = 6251}
Adding remote host SGRAC2:6251
ons is not running …

Troubleshooting :-

 
 
Check the log file "/u01/app/oracle/product/10.2.0/crs/log/rac2/racg/ora.rac2.ons.log"


There can be multiple issues

Issue 1.

Failed to get IP for localhost (4)
Failed to get IP for localhost (4)
onsctl: ons failed to start




Issue 2

Adding remote host node_name:6251
onsctl: ons failed to start
 

Solution:-

 
Issue 1.
 
Add the localhost entry "127.0.0.1 localhost" manually to /etc/hosts.

Then try starting ONS again; it should work.
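
Before editing the file, it can help to check whether the entry is really missing. The following is a minimal sketch, not an Oracle tool: the function name has_localhost_entry is hypothetical, and it only looks for an uncommented 127.0.0.1 line that names localhost.

```shell
# Hypothetical helper: succeed when a hosts file maps 127.0.0.1 to localhost.
# The file path is an argument so the check can be tried on a copy first.
has_localhost_entry() {
    hosts_file="${1:-/etc/hosts}"
    # Uncommented line starting with 127.0.0.1 that lists "localhost" as a name
    grep -Eq '^[[:space:]]*127\.0\.0\.1[[:space:]]+(.*[[:space:]])?localhost([[:space:]]|$)' "$hosts_file"
}

if has_localhost_entry /etc/hosts; then
    echo "localhost entry present"
else
    echo "localhost entry missing: add '127.0.0.1 localhost' as root"
fi
```

If the entry is reported missing, add it as root and then start ONS again.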
 
 
Issue 2.
 
Add a new configuration entry with the full host name of each node:

# racgons add_config node1:6251
# racgons add_config node2:6251
# racgons add_config node3:6251
# racgons add_config node4:6251


---------------------------------------------------------------------------------------------------------------------------------

Investigating Further


Cause 1:
The host name returned by the "hostname" command does not match the host name registered in OCR. For example, the OS hostname returns the fully qualified name (with domain name) while OCR holds the short name.

Eg:
[oracle@node1 ~]$ hostname
node1.oracle.com

[oracle@node2 ~]$ hostname
node2

ocrdump shows
[DATABASE.ONS_HOSTS.node1]
ORATEXT : node1

[DATABASE.ONS_HOSTS.node2]
ORATEXT : node2

For ONS to work, the OS hostname must equal the hostname in OCR. This explains why ONS on node2 comes up fine while node1 has a problem.
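
The comparison can be scripted roughly as follows. This is only a sketch: names_match is a hypothetical helper, and the OCR value is assumed to be copied by hand from the ocrdump output rather than read automatically.

```shell
# Hypothetical helper (not an Oracle tool): succeeds when the two names are identical.
names_match() {
    [ "$1" = "$2" ]
}

os_name="$(hostname 2>/dev/null)"   # what the OS reports
ocr_name="node1"                    # assumption: ORATEXT value copied by hand from ocrdump
short_name="${os_name%%.*}"         # OS name with any domain part stripped

if names_match "$os_name" "$ocr_name"; then
    echo "OK: OS hostname and OCR entry are both '$ocr_name'"
elif names_match "$short_name" "$ocr_name"; then
    echo "Mismatch: OS reports '$os_name' but OCR holds the short name '$ocr_name'"
else
    echo "Mismatch: OS reports '$os_name', OCR holds '$ocr_name'"
fi
```

The second branch is the situation described for node1: only the domain part differs.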

Cause 2:
The local or remote port defined for ONS is in use by another process. To confirm:

netstat -a | grep <port>
or
as root user:
lsof | grep <port>
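
A plain grep for the port number also matches longer numbers such as 62510. A slightly stricter filter might look like the sketch below; port_lines is a hypothetical name, and port 6251 is taken from the onsctl output earlier in this post.

```shell
# Hypothetical filter: keep only netstat lines whose address ends in the given port.
port_lines() {
    port="$1"
    # ":6251 " (Linux) or ".6251 " (some Unix variants); 62510 is not matched
    grep -E "[:.]${port}[[:space:]]"
}

netstat -an 2>/dev/null | port_lines 6251 || echo "no socket found on port 6251"
```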

Cause 3:
$ORA_CRS_HOME/opmn/conf/ons.config has the wrong ownership or permissions. It should be owned by the CRS owner, not the root user. For example:
-rw-r--r-- 1 oracle oinstall 63 2011-05-30 12:15 ons.config

Cause 4:
Mismatch between the remote port entry DATABASE.ONS_HOSTS.<host>.PORT for ONS in OCR and the ons config file on the node.
1. To find the ONS port in OCR:
<CRS_HOME>/bin/ocrdump -stdout -keyname DATABASE.ONS_HOSTS
2. To find the ONS port in the ons config:
Check <CRS_HOME>/opmn/conf/ons.config
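
The second check can be sketched in shell. This assumes ons.config uses simple key=value lines such as remoteport=6251; ons_config_value is a hypothetical helper, the default path is the CRS home from the log file location above, and the OCR port is assumed to be copied by hand from the ocrdump output.

```shell
# Hypothetical helper: print the numeric value of a key from an ons.config-style file.
ons_config_value() {
    key="$1"
    file="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$file" | head -n 1
}

cfg="${ORA_CRS_HOME:-/u01/app/oracle/product/10.2.0/crs}/opmn/conf/ons.config"
ocr_port=6251   # assumption: copied by hand from the ocrdump output

if [ -r "$cfg" ]; then
    file_port="$(ons_config_value remoteport "$cfg")"
    if [ "$file_port" = "$ocr_port" ]; then
        echo "remoteport matches OCR ($ocr_port)"
    else
        echo "mismatch: ons.config has '$file_port', OCR has '$ocr_port'"
    fi
else
    echo "cannot read $cfg"
fi
```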

Solution

Solution 1:
Change the hostname so that the "hostname" output matches the ONS_HOSTS entry registered in OCR,
e.g. on Linux platform:
1. Change the hostname on node1 to the short name format:
a. Edit /etc/sysconfig/network as root user and
change the HOSTNAME field to the short name

b. As root, type
# hostname node1
# hostname               --- to verify it is now showing short name

2. Start ONS again:
onsctl start
onsctl should now report that ONS started.

For other platforms, please consult your system admin to make the change.
Or change the hostname in the OCR, via:
<CRS_HOME>/bin/racgons remove_config <bad_hostname>
<CRS_HOME>/bin/racgons add_config hostname1:6251

Solution 2:
Either stop the conflicting processes (for example, a leftover ons process) or change the ONS local or remote port to avoid the conflict.

Solution 3:
Change the ons.config file to have the correct ownership and permissions, e.g.:
-rw-r--r-- 1 oracle oinstall 63 2011-05-30 12:15 ons.config

Solution 4:
Modify the ONS port either in OCR or in the ons.config file so that they match, and ensure the port is not used elsewhere.
To modify the port in OCR:
<CRS_HOME>/bin/racgons remove_config <hostname>:<wrong_remote_port>
<CRS_HOME>/bin/racgons add_config <hostname>:<correct_remote_port>
To modify the port in ons.config, edit localport or remoteport in <CRS_HOME>/opmn/conf/ons.config

 

Manually Changing rebalance power in Oracle ASM


 

Manually changing rebalance power on an ongoing ASM rebalance operation     

 
 
A couple of times we removed ASM disks from a diskgroup and found that the rebalance operation was going to take too long.

This is not a problem when you do not want to disturb online users: the rebalance process locks only 1 MB of data for writes at a time, so letting the rebalance run slowly has no performance impact.

My problem was that I needed to free the disks as fast as possible in order to finish some IO tests within a limited timeframe.

The procedure to change the rebalance power of an ongoing operation is very simple:

alter diskgroup DATADG rebalance power 11;
(Power can be set from 0-11)

After that, the rebalance operation restarts with the newly set power.
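
To avoid typing an out-of-range value, the statement can be generated by a small wrapper. This is a sketch only: rebalance_sql is a hypothetical helper that enforces the 0-11 range mentioned above and also prints a query on v$asm_operation to follow the progress. Piping its output into a SQL*Plus session connected to the ASM instance is an assumption about how you would run it.

```shell
# Hypothetical helper: print the SQL to change rebalance power and to monitor progress.
rebalance_sql() {
    dg="$1"
    power="$2"
    case "$power" in
        ''|*[!0-9]*) echo "power must be a whole number" >&2; return 1 ;;
    esac
    if [ "$power" -gt 11 ]; then
        echo "power must be between 0 and 11" >&2
        return 1
    fi
    printf 'alter diskgroup %s rebalance power %s;\n' "$dg" "$power"
    printf 'select operation, state, power, actual, est_minutes from v$asm_operation;\n'
}

rebalance_sql DATADG 11
```

The EST_MINUTES column gives a rough idea of how much time is left at the current power.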

addNode.sh fails with Exception in thread "Thread-29" java.lang.OutOfMemoryError: Java heap space



 
 
I encountered this issue while adding a node at the database layer.
 
The node addition at the cluster layer had completed successfully.
 
 

PROBLEM:-

SEVERE: Abnormal program termination. An internal error has occured. Please provide the following files to Oracle Support :

"/home/oracle/oraInventory/logs/addNodeActions2014-04-01_06-18-55PM.log"
"/home/oracle/oraInventory/logs/oraInstall2014-04-01_06-18-55PM.err"
"/home/oracle/oraInventory/logs/oraInstall2014-04-01_06-18-55PM.out"

SOLUTION:-

 
Change the value of the variable JRE_MEMORY_OPTIONS to " -mx1024m" in the file oraparam.ini in the $ORACLE_HOME/oui directory and run addNode.sh again.
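
The edit can also be scripted. The sketch below assumes the setting sits on a single line that starts with JRE_MEMORY_OPTIONS= in oraparam.ini; set_jre_memory is a hypothetical helper that keeps a .bak copy of the original file.

```shell
# Hypothetical helper: rewrite the JRE_MEMORY_OPTIONS line, keeping a .bak copy.
set_jre_memory() {
    ini="$1"    # path to oraparam.ini
    size="$2"   # heap size, e.g. 1024m
    cp "$ini" "$ini.bak" || return 1
    # Replace the whole line; the leading space inside the quotes is kept as in the post
    sed "s/^JRE_MEMORY_OPTIONS=.*/JRE_MEMORY_OPTIONS=\" -mx${size}\"/" "$ini.bak" > "$ini"
}

# Usage (run as the Oracle software owner):
# set_jre_memory "$ORACLE_HOME/oui/oraparam.ini" 1024m
```

If something goes wrong, the original file is still available as oraparam.ini.bak.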
 
 
Cheers !!