IBM OSD Components for Oracle OPS on Windows NT 4.0
Version 1.1

IBM Netfinity Cluster Enabler README File

This README file contains the latest hints and tips to enhance the
reliability and performance of your Netfinity Cluster. Refer to the
"IBM Netfinity Cluster Enabler Hardware and Software Installation Guide
for Oracle Parallel Server" for complete installation and configuration
instructions.


MAJOR CHANGES FROM LAST RELEASE
_______________________________

o  This version of the IBM Netfinity Cluster Enabler Software supports
   Oracle Version 8.0.5 on Windows NT Service Pack 4.

o  The IBMGSCFG.exe configuration utility now allows input of a database
   name other than the default name of "OPS". The database name
   specified when configuring Oracle (e.g., using the OPSCONF.exe
   utility) must match the database name specified when using the
   IBMGSCFG.exe utility.


CONTENTS
________

1.0 Tips and Troubleshooting Hints for Installing and Configuring the
    Netfinity Cluster Enabler Software
2.0 How to Obtain the Oracle Patch Set
3.0 Trademarks and Notices


1.0 Tips and Troubleshooting Hints for Installing and Configuring the
    Netfinity Cluster Enabler Software
______________________________________________________________________

o  Whenever Oracle is reinstalled on a node, you must reinstall the IBM
   Netfinity Cluster Enabler software. This ensures that the
   dependencies between the IBM and Oracle services are set correctly.

o  Before updating or reconfiguring the IBM Netfinity Cluster Enabler
   software, stop the IBMCoreClusterService service on all nodes.

o  Symptom: The IBM Netfinity Cluster Enabler Software has just been
   installed and configured for the first time, and IBMCoreClusterService
   fails to start. The %installation_dir%\config directory contains files
   named cscomputer.cfg.0 through cscomputer.cfg.n-1, where n is the
   number of nodes.

   Explanation: During the configuration step, the configuration files
   were not properly distributed to each of the nodes.

   Action: Ensure that all nodes are connected to the interconnect and
   can ping each other. Ensure that at least one free drive letter
   exists on the node from which the IBMGSCFG.exe configuration utility
   is run.

o  Symptom: "net stop ibmcoreclusterservice" indicates that
   OracleServiceOPSn is not started, and IBMCoreClusterService is not
   stopped.

   Explanation: When IBMCoreClusterService is installed, it makes itself
   a dependency of OraclePGMSService. OraclePGMSService is in turn a
   dependency of OracleServiceOPSn. When IBMCoreClusterService is
   stopped from a command line, the user is prompted that the two
   dependent Oracle services will be stopped first, in this order:

      1. OraclePGMSService
      2. OracleServiceOPSn

   As a byproduct of step 1, OracleServiceOPSn is already stopped. When
   step 2 is attempted, the message that OracleServiceOPSn is not
   started is displayed. This terminates the "net stop" command, so
   IBMCoreClusterService is not stopped. This is normal Windows NT
   behavior.

   Action: Reissue the "net stop ibmcoreclusterservice" command.
   Alternatively, stop the services one at a time in the following
   order:

      1. OracleServiceOPSn
      2. OraclePGMSService
      3. IBMCoreClusterService

   Alternatively, use the Windows NT Services window to stop
   IBMCoreClusterService.

o  Symptom: OPSCONF does not create a Net8 configuration that supports
   an OPS cluster with more than one public network card. The user
   cannot select instances to start or stop from the Oracle Enterprise
   Manager Console.

   Explanation: Oracle Enterprise Manager Version 1 does not support
   multiple network cards on the agent machine. This can affect some
   operations in the Oracle Enterprise Manager Console, the Oracle
   Intelligent Agent, and the OPSCONF utility. Oracle plans to address
   this in the next version of these programs. Check with Oracle for
   details on the availability of the next version.

   Action: Use only one public network card on the agent machine.
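The cscomputer.cfg item above describes IBMCoreClusterService failing to
start after the first configuration. That symptom can be checked with a
short script. The following Python 3 sketch is illustrative only. The
function names are hypothetical. The only assumption it makes about the
Cluster Enabler is the cscomputer.cfg.0 through cscomputer.cfg.n-1 file
naming given in that item. Matching ignores case because Windows NT file
names are not case-sensitive.

```python
import re
from pathlib import Path

# Leftover file names described in this README: cscomputer.cfg.0 through
# cscomputer.cfg.n-1, where n is the number of nodes.
CFG_NAME = re.compile(r"^cscomputer\.cfg\.(\d+)$", re.IGNORECASE)


def leftover_cfg_indices(config_dir):
    """Return the sorted N values of cscomputer.cfg.N files in config_dir."""
    indices = []
    for entry in Path(config_dir).iterdir():
        match = CFG_NAME.match(entry.name)
        if match and entry.is_file():
            indices.append(int(match.group(1)))
    return sorted(indices)


def config_not_distributed(config_dir, node_count):
    """True if every file from .0 to .(node_count - 1) is still present.

    Hypothetical check: a True result matches the symptom above, meaning
    the per-node configuration files were not distributed.
    """
    if node_count <= 0:
        return False
    return set(range(node_count)) <= set(leftover_cfg_indices(config_dir))
```

Take a four-node cluster as an example. Pass the directory that
%installation_dir%\config resolves to on the configuring node. If
config_not_distributed(that_directory, 4) returns True, files .0 through
.3 are still there and the Action above applies.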
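The "net stop ibmcoreclusterservice" item above gives a stop order that
follows from the service dependencies: a service is stopped before the
service it depends on. The following Python 3.9+ sketch derives that
order from a dependency map, using the standard graphlib module. The map
uses instance number 1 (OracleServiceOPS1) as an example, which is an
assumption. The script only prints the commands and does not run them.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, as described in this README.
# Instance number 1 is an example; use your own OracleServiceOPSn name.
DEPENDS_ON = {
    "IBMCoreClusterService": [],
    "OraclePGMSService": ["IBMCoreClusterService"],
    "OracleServiceOPS1": ["OraclePGMSService"],
}


def start_order(depends_on):
    """Dependencies first: an order in which the services can be started."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Dependents first: the reverse of the start order."""
    return list(reversed(start_order(depends_on)))


def net_stop_commands(depends_on):
    """One "net stop" command per service, in a safe stop order."""
    return [f"net stop {name}" for name in stop_order(depends_on)]


if __name__ == "__main__":
    # Print only; the commands are meant to be run by an administrator.
    for command in net_stop_commands(DEPENDS_ON):
        print(command)
```

When run as a script, it prints "net stop OracleServiceOPS1", then "net stop
OraclePGMSService", then "net stop IBMCoreClusterService". This matches
the one-at-a-time order in the Action for that item. Reversing a
topological start order gives a safe stop order for any dependency chain,
not only this three-service one.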

o  Symptom: The Net8 Assistant program does not start on the Oracle
   Enterprise Manager Console.

   Explanation: When Net8 Assistant is selected from the Windows NT
   Start-Programs menu, the program may fail to start.

   Action: Ensure that JRE 1.1.6 or later is installed. There are
   specific instructions for installing the JRE with Oracle. Contact
   Oracle for instructions on acquiring and installing the JRE for use
   with Net8.

o  Symptom: After installation of Oracle and the IBM Netfinity Cluster
   Enabler Software, a service or a database cannot be started.

   Explanation: The symbolic links for the shared disk partitions may
   not be set up correctly. The symbolic links are set up using the
   SETLINKS program, as described in the Oracle Parallel Server
   "Getting Started" guide (page 5-3). If the links have not been set
   up correctly, the problem could be in the input .tbl file for the
   SETLINKS program.

   Action: Ensure that there is a carriage-return character after the
   last line of the .tbl file used with SETLINKS.

o  Symptom: The Oracle Installer program reports an incorrect amount of
   available disk storage on the installation drive.

   Explanation: The reported value causes no functional problem. The
   actual amount of available disk storage can be checked using Windows
   NT commands.

   Action: None.

o  Symptom: Nodes are unable to communicate with each other, or clients
   are unable to connect to a node. PING and/or TNSPING80 report
   different IP addresses, or fail, when pinging a node.

   Explanation: PING and/or TNSPING80 against the local node may return
   a different IP address than a PING or TNSPING80 from a remote node.
   This is due to how Windows NT resolves host names and IP addresses.
   As a result, two or more nodes may be unable to communicate. When a
   node pings itself, the returned IP address is that of the first
   network adapter card in the Windows NT list. When a node pings a
   remote node, the returned IP address is that of the public network.
   If the public network is not connected to the lowest-numbered
   network adapter card, the results of the two pings can differ.

   Action: Ensure that the lowest-numbered network adapter card in the
   machine is connected to the public network. The private network
   should be connected to a higher-numbered network adapter card. After
   installation, the easiest way to correct this problem is to switch
   the adapter cables and IP addresses of the installed network adapter
   cards. When you do, also switch the network adapter properties
   (e.g., duplex, data rates) to match.

o  Symptom: "SELECT * FROM v$active_instances;" returns invalid
   information. The response may include an incorrect list of
   instances, a message that no rows were found, or random characters.

   Explanation: This SELECT statement returns valid results only when
   the database instances are in a stable state. If a database instance
   is in the process of being shut down, the response may be invalid.

   Action: Reissue the statement after the database instance shutdown
   has completed and the remaining database instances are stable.

o  Symptom: The OraclePGMSService fails to start with error 1067.

   Explanation: This error normally indicates an error in the software
   configuration.

   Action: Ensure that IBMCoreClusterService has been started. Ensure
   that simple computer names, rather than Fully Qualified Domain
   Names, were specified when using the IBMGSCFG.exe configuration
   utility. For example, use a name such as "ops1" rather than
   "ops1.yourcompany.com".

o  Symptom: The OraclePGMSService service terminates when an attempt is
   made to start a database instance. The messages "ORA-29702: Error
   occurred in Group Membership Services operation" or "ORA-03113:
   end-of-file on communication channel" might be seen.

   Explanation: When a node joins or leaves the cluster, or when a
   database instance is started or stopped, Oracle must perform
   additional processing to complete the startup or shutdown of that
   node or database instance. During this processing, further
   membership changes may not complete successfully. This is especially
   likely after a node failure, when the database must perform recovery
   actions.

   Action: After a cluster membership change, allow time for the
   database state to stabilize before initiating another change. For
   example, suppose you start OraclePGMSService on one node. The start
   is reported when the message "The OraclePGMSService service was
   started successfully" appears or the service status changes to
   "Started" in the Windows NT Services panel. Wait at least 30 seconds
   after that before attempting to start OraclePGMSService on another
   node. Similarly, when stopping OraclePGMSService, wait at least 30
   seconds after the service has stopped before attempting to start or
   stop the service on another node. While 30 seconds is usually
   sufficient, the time can vary depending on the database load on the
   other nodes that have already joined the cluster. When starting or
   stopping a database instance, it may be necessary to wait several
   minutes before performing a similar action on another node. These
   times may be longer if a service or database instance was stopped
   because of a failure on one of the nodes.

o  Symptom: Oracle Enterprise Manager is used to start or stop all
   database instances together (as opposed to selecting an individual
   database instance), and the operation does not complete successfully.

   Explanation: It is recommended that Oracle services and database
   instances not be started simultaneously on different nodes.

   Action: Select only individual instances when starting or stopping
   databases on different nodes.
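Two items above give related advice. The OraclePGMSService item asks for
one membership change at a time, at least 30 seconds apart. The Oracle
Enterprise Manager item recommends against simultaneous starts on
different nodes. Together they amount to a simple sequential loop. The
following Python sketch shows that loop. start_on_node is a hypothetical
callable you would supply, since this README does not describe a
scripting interface for remote service control. The sleep function is a
parameter so that the loop can be exercised without waiting.

```python
import time

# The usual minimum wait between cluster membership changes given in this
# README. Instance starts/stops or recovery after a failure may need much
# longer (several minutes).
SETTLE_SECONDS = 30


def start_one_node_at_a_time(nodes, start_on_node,
                             settle_seconds=SETTLE_SECONDS,
                             sleep=time.sleep):
    """Start a service on each node in turn, never on two nodes at once.

    start_on_node(node) is a hypothetical callable supplied by the caller.
    It is assumed to return only after the service on that node has been
    reported as started, and to raise an exception if the start fails,
    which ends the sequence before the next node is touched.
    Returns the nodes that were started, in order.
    """
    nodes = list(nodes)
    started = []
    for position, node in enumerate(nodes):
        start_on_node(node)
        started.append(node)
        if position < len(nodes) - 1:
            # Give the cluster time to stabilize before the next change.
            sleep(settle_seconds)
    return started
```

The 30-second default is only the usual minimum stated above. For
database instance starts and stops, or after a node failure, pass a
larger settle_seconds.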
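For the error 1067 item above, you can screen the names you plan to
enter in IBMGSCFG.exe for domain suffixes before you run the
configuration. This Python sketch is hypothetical and works on plain
strings only. It flags any name that contains a dot, so use it for
computer names and not for IP addresses.

```python
def simple_computer_name(name):
    """Strip any domain suffix: "ops1.yourcompany.com" -> "ops1"."""
    return name.strip().split(".", 1)[0]


def fully_qualified_names(names):
    """Return (entered name, suggested simple name) for each dotted name."""
    return [(name, simple_computer_name(name))
            for name in names if "." in name.strip()]
```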

o  Symptom: The Oracle "shutdown immediate" command does not complete
   within 15 minutes.

   Explanation: After the "shutdown immediate" command is issued, it is
   recommended that the OracleServiceOPSn service also be stopped. In
   some cases, "shutdown immediate" may take several minutes to
   complete.

   Action: Use the Windows NT Services window to stop OracleServiceOPSn,
   or enter "net stop OracleServiceOPSn" from a command prompt, where n
   is the OPS instance number. If "shutdown immediate" reports that the
   database was closed and dismounted, OracleServiceOPSn may be stopped
   to free the resources of that database. If no message indicates that
   the database was closed and dismounted, stopping OracleServiceOPSn
   may result in the loss of uncommitted changes but will not affect
   the integrity of committed data.

o  Symptom: The manual startup of a service fails when performed
   immediately after starting a node and logging on. This occurs with
   one of the following services: IBMCoreClusterService,
   OraclePGMSService, or OracleServiceOPSn.

   Explanation: The system is still performing startup tasks when the
   attempt is made to start IBMCoreClusterService. This may slow the
   startup of this service to the point where it times out and stops.
   Because the Oracle services depend on IBMCoreClusterService, they
   also do not start.

   Action: Take any of the following actions:

   - Wait a minute and retry the command to start the service.
   - After logging on to a system that is still starting up, wait a
     minute before attempting to start these services.
   - Set these services to "automatic" startup. This allows the system
     startup processes to complete before the services are started.
     This is the default setting for OraclePGMSService when Oracle is
     installed.

o  Symptom: Excessive shared drive activity on Mondays or Tuesdays.

   Explanation: By default, the Symplicity Storage Manager software used
   to manage the shared storage runs an automatic parity check of all
   LUNs on the shared storage every Sunday night. This can take a
   considerable amount of time. Symplicity Storage Manager must be
   installed on each node, and each node's copy schedules its own
   parity check. As a result, the parity check runs once from every
   node every Sunday night (six times in a six-node cluster).

   Action: Start the Symplicity Storage Manager Maintenance and Tuning
   application on one of the nodes. Open the Options menu and select
   Auto Parity Settings. Clear the Automatic Parity Check/Repair check
   box. Repeat this process on all but one node (the parity check needs
   to run from only one node).

o  Symptom: A node runs very slowly, with the Oracle80.exe process
   consuming most of the processing time. Another symptom may be an
   ORA-00600 error in an Oracle instance's instanceLCK0.trc file. The
   specific error in that file might be:

      ORA-00600: internal error code, arguments: [ksires_1], [KJUSERSTAT_SHUTDOWN], [], [], [], [], [], []

   Explanation: The database is thrashing on that node because the redo
   logs or rollback segments are too small or too few. This is more
   likely to occur after a failure that causes database recovery
   operations to run.

   Action: Tune the system by increasing the number of redo logs,
   increasing the size of the redo logs, and increasing the initial
   size of the rollback segments. As an example, in a test
   configuration this problem was resolved by increasing the number of
   redo logs per thread from 2 to 4 and their size from 20MB to 100MB.
   The initial size of the rollback segments was increased from 2MB to
   20MB, with 2MB increments for the extents.
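The redo log sizing in the item above works out to 2 x 20MB = 40MB of
redo per thread before tuning and 4 x 100MB = 400MB after. The following
Python sketch does that arithmetic. It also builds one "ALTER DATABASE
ADD LOGFILE THREAD ... GROUP ... SIZE" statement per new group. The
file-name pattern and the starting group number are assumptions. Replace
them with the symbolic link names defined in your SETLINKS .tbl file and
with group numbers that are not already in use. The statements are
returned as strings for review and are not executed.

```python
# Hypothetical file-name pattern for raw partitions reached through
# SETLINKS symbolic links; the real names come from your .tbl file.
DEFAULT_PATTERN = r"\\.\ops_redo{thread}_{index}"


def redo_capacity_mb(groups_per_thread, size_mb):
    """Total redo space per thread, in MB."""
    return groups_per_thread * size_mb


def add_logfile_statements(threads, groups_per_thread, size_mb,
                           first_group, pattern=DEFAULT_PATTERN):
    """Build one ALTER DATABASE ... ADD LOGFILE statement per new group.

    Group numbers are unique across all threads of the database, so the
    numbers first_group, first_group + 1, ... must not already be in use.
    """
    statements = []
    group = first_group
    for thread in range(1, threads + 1):
        for index in range(1, groups_per_thread + 1):
            path = pattern.format(thread=thread, index=index)
            statements.append(
                f"ALTER DATABASE ADD LOGFILE THREAD {thread} "
                f"GROUP {group} ('{path}') SIZE {size_mb}M;"
            )
            group += 1
    return statements
```

For example, add_logfile_statements(2, 4, 100, first_group=5) yields
eight statements for a two-instance cluster, groups 5 through 12. These
statements only add groups and do not resize the existing 20MB groups.
To replace the old groups, switch logs until they are inactive and then
drop them. That is a separate step covered in the Oracle documentation.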

2.0 How to Obtain the Oracle Patch Set
_______________________________________

- Go to the Oracle Web site (http://www.oracle.com).
- Click "Support".
- If you have already registered for a Metalink ID, click "Visit
  Metalink". Otherwise, click the link to register for a Metalink ID.
- Click "Download".
- From the "product" pull-down, select Parallel Server Option. From the
  "platform" pull-down, select MS Windows NT.
- Download the Patch Set for OPS Version 8.0.5.1.a.


3.0 Trademarks and Notices
___________________________

The following terms are trademarks of the IBM Corporation in the United
States or other countries or both:

   IBM
   Netfinity

Windows NT is a trademark or registered trademark of Microsoft
Corporation.

Oracle and Oracle OPS are trademarks or registered trademarks of Oracle
Corporation.

Any other company, product, and service names may be trademarks or
service marks of others.

THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. IBM
DISCLAIMS ALL WARRANTIES, WHETHER EXPRESS OR IMPLIED, INCLUDING WITHOUT
LIMITATION, THE IMPLIED WARRANTIES OF FITNESS FOR PARTICULAR PURPOSE AND
MERCHANTABILITY WITH RESPECT TO THE INFORMATION IN THIS DOCUMENT. BY
FURNISHING THIS DOCUMENT, IBM GRANTS NO LICENSES TO ANY PATENTS OR
COPYRIGHTS.

Copyright (C) 1998, 1999 IBM Corporation. All rights reserved.

Note to U.S. Government Users -- Documentation related to restricted
rights -- Use, duplication or disclosure is subject to restrictions set
forth in GSA ADP Schedule Contract with IBM Corp.