After node addition ONS was not able to start.
CRS-0215: Could not start resource 'ora.rac2.ons'.
Problem :-
onsctl start
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = node1, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host nodename:6251
onscfg[1]
{node = node2, port = 6251}
Adding remote host sai_db2:6251
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = node3, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host nodename:6251
onscfg[1]
{node = node4, port = 6251}
Adding remote host nodename:6251
onsctl: ons failed to start
$CRS_HOME/bin/onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = NODE1, port = 6251}
Setting remote port from OCR repository to 6251
Adding remote host SGRAC1:6251
onscfg[1]
{node = NODE2, port = 6251}
Adding remote host SGRAC2:6251
ons is not running …
Troubleshooting :-
Check the log file "/u01/app/oracle/product/10.2.0/crs/log/rac2/racg/ora.rac2.ons.log".
There can be multiple issues:
Issue 1.
Failed to get IP for localhost (4)
Failed to get IP for localhost (4)
onsctl: ons failed to start
Issue 2
Adding remote host node_name:6251
onsctl: ons failed to start
Solution:-
Issue 1.
Manually add the localhost IP "127.0.0.1" to /etc/hosts.
Then retry starting ONS; it should work.
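To confirm the fix for Issue 1, the hosts file can be checked for the loopback entry. This is only a sketch: has_localhost_entry is a hypothetical helper name, and it assumes the entry should map 127.0.0.1 to the name localhost.

```shell
# Hypothetical helper: succeeds if the given hosts file (default /etc/hosts)
# maps 127.0.0.1 to the name "localhost". Comment lines are ignored.
has_localhost_entry() {
    hosts_file="${1:-/etc/hosts}"
    awk '
        /^[[:space:]]*#/ { next }            # skip full-line comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # stop at an inline comment
                if ($i == "localhost") found = 1
            }
        }
        END { exit !found }                  # exit 0 only when the entry exists
    ' "$hosts_file"
}
```

Run has_localhost_entry with no argument to check /etc/hosts; a non-zero exit status means the entry is missing.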
Issue 2.
Add a new configuration with the full host name:
#racgons add_config node1:6251
#racgons add_config node2:6251
#racgons add_config node3:6251
#racgons add_config node4:6251
---------------------------------------------------------------------------------------------------------------------------------
Investigating Further
Cause 1: The host name returned by the "hostname" command does not match the host name registered in OCR. The OS hostname returns the fully qualified name (with domain name) while OCR shows the short name.
E.g.:
[oracle@node1 ~]$ hostname
node1.oracle.com
[oracle@node2 ~]$ hostname
node2
ocrdump shows:
[DATABASE.ONS_HOSTS.node1]
ORATEXT : node1
[DATABASE.ONS_HOSTS.node2]
ORATEXT : node2
For ONS to work, the OS hostname must match the hostname in OCR. This explains why ONS on node 2 is up and fine while node 1 has the problem.
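The comparison for Cause 1 can be sketched in shell. This is an illustration only: check_ons_hostname is a hypothetical helper, and the OCR name is passed in by hand (for example, copied from the ocrdump output) rather than read from OCR.

```shell
# Hypothetical helper: compare the OS host name with the name registered in OCR.
# Usage: check_ons_hostname <ocr_name> [os_name]
# If os_name is omitted, the output of `hostname` is used.
check_ons_hostname() {
    ocr_name="$1"
    os_name="${2:-$(hostname)}"
    if [ "$os_name" = "$ocr_name" ]; then
        echo "OK: hostname '$os_name' matches OCR"
    elif [ "${os_name%%.*}" = "$ocr_name" ]; then
        # Same short name, but the OS returns the fully qualified form.
        echo "MISMATCH: hostname '$os_name' is fully qualified, OCR has '$ocr_name'"
        return 1
    else
        echo "MISMATCH: hostname '$os_name' differs from OCR '$ocr_name'"
        return 1
    fi
}
```

On node1 from the example, check_ons_hostname node1 would report the fully qualified mismatch, while check_ons_hostname node2 on node2 would report OK.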
Cause 2: The local or remote port defined for ONS is in use by another process. To confirm:
netstat -a | grep <port>
or, as root user:
lsof | grep <port>
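A plain grep on the port number also matches longer numbers that contain it (6251 matches 16251, for instance). The sketch below filters more strictly. It assumes Linux-style "netstat -an" output, where the fourth column is the local address, and port_lines is a hypothetical helper name.

```shell
# Hypothetical helper: read `netstat -an` style output on stdin and print only
# the lines whose local address (4th column, Linux layout) ends in the port.
# Usage: netstat -an | port_lines 6251
port_lines() {
    port="$1"
    awk -v p="$port" '{
        n = split($4, parts, "[:.]")    # split address on ":" or "."
        if (n > 0 && parts[n] == p) print
    }'
}
```

Only the local address is inspected here; other platforms print the columns differently, so the column number may need adjusting.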
Cause 3: $ORA_CRS_HOME/opmn/conf/ons.config has the wrong ownership or permissions. It should be owned by the CRS owner, not the root user. For example:
-rw-r--r-- 1 oracle oinstall 63 2011-05-30 12:15 ons.config
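The ownership check for Cause 3 can be scripted roughly as follows. This is a sketch under assumptions: ons_config_ok is a hypothetical helper, and it treats the -rw-r--r-- mode from the example listing above as the expected permission.

```shell
# Hypothetical helper: succeeds when the file is owned by the expected user
# and its mode string is -rw-r--r-- (the mode shown in the example listing).
# Usage: ons_config_ok <file> <expected_owner>
ons_config_ok() {
    file="$1"
    want_owner="$2"
    # Columns 1 and 3 of `ls -l` are the mode string and the owner; only the
    # first 10 characters of the mode are kept to drop any ACL/SELinux marker.
    info=$(ls -ld "$file" | awk '{ print substr($1, 1, 10), $3 }')
    [ "$info" = "-rw-r--r-- $want_owner" ]
}
```

For example, ons_config_ok $ORA_CRS_HOME/opmn/conf/ons.config oracle returns non-zero if the file is owned by root or has a different mode.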
Cause 4: Mismatch between the remote port entry DATABASE.ONS_HOSTS.<host>.PORT for ONS in OCR and the ONS config file on the node.
1. To find the ONS port in OCR:
<CRS_HOME>/bin/ocrdump -stdout -keyname DATABASE.ONS_HOSTS
2. To find the ONS port in the ONS config:
Check <CRS_HOME>/opmn/conf/ons.config
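To compare the two values for Cause 4, the port can be pulled out of ons.config with a small helper. This sketch assumes the file holds plain key=value lines (such as remoteport=6251) with no spaces around the equals sign; ons_config_value is a hypothetical name, and the OCR port is still read by hand from the ocrdump output.

```shell
# Hypothetical helper: print the value of a key from a key=value file.
# Usage: ons_config_value <key> <file>
ons_config_value() {
    key="$1"
    file="$2"
    # Print the value of the first line whose key matches exactly.
    awk -F= -v k="$key" '$1 == k { print $2; exit }' "$file"
}

# Succeeds when the remoteport in the config file equals the port seen in OCR.
# Usage: ons_port_matches <file> <ocr_port>
ons_port_matches() {
    [ "$(ons_config_value remoteport "$1")" = "$2" ]
}
```

For instance, ons_port_matches <CRS_HOME>/opmn/conf/ons.config 6251 returns non-zero when the config file uses a different remote port than the one registered in OCR.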
Solution
Solution 1:
Modify the hostname so that the "hostname" output is the same as the ONS host name registered in OCR,
e.g. on the Linux platform:
1. Change the hostname on node1 to use the short name format:
a. Edit /etc/sysconfig/network as root user and change the HOSTNAME field to the short name.
b. As root, type:
# hostname node1
# hostname
The second command verifies that it now shows the short name.
2. Restart ONS:
onsctl start
Now onsctl shows that ONS started.
For other platforms, please consult your system admin to make the change. Or change the hostname in the OCR via:
<CRS_HOME>/bin/racgons remove_config <bad_hostname>
<CRS_HOME>/bin/racgons add_config hostname1:6251
Solution 2:
Either stop the conflicting processes (for example, a leftover ons process) or change the ONS local or remote port to avoid the conflict.
Solution 3:
Change the ons.config file to have the correct ownership and permissions, e.g.:
-rw-r--r-- 1 oracle oinstall 63 2011-05-30 12:15 ons.config
Solution 4:
Modify the ONS port either in OCR or in the ons.config file so that they match, and ensure the port is not used elsewhere. To modify the port in OCR:
<CRS_HOME>/bin/racgons remove_config <hostname>:<wrong_remote_port>
<CRS_HOME>/bin/racgons add_config <hostname>:<correct_remote_port>
To modify the port in ons.config, edit localport or remoteport in <CRS_HOME>/opmn/conf/ons.config