Running db2haicu to configure a cluster in a DB2 HADR environment often produces errors. Most of them are caused by incorrect values in the XML configuration file or by a problem in the environment.

This post covers one of them: an error related to the DB2 monitoring scripts that RSCT uses.

1. Diagnostic log (db2diag.log) error messages

2011-10-06-13.29.04.814926+540 E522666E818         LEVEL: Error
PID     : 21846                TID  : 47753254935648  PROC : db2haicu
INSTANCE: hadr97               NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlhaAddResource, probe:1600
MESSAGE : ECF=0x90000542=-1879046846=ECF_SQLHA_CREATE_GROUP_FAILED
          Create group failed
DATA #1 : String, 35 bytes
Error during vendor call invocation
DATA #2 : unsigned integer, 4 bytes
37
DATA #3 : String, 36 bytes
db2_hadr97_cluster2.localdomain_0-rg
DATA #4 : unsigned integer, 8 bytes
1
DATA #5 : signed integer, 4 bytes
98343
DATA #6 : String, 186 bytes
Line # : 6719---cluster2.localdomain: 2661-011
The command specified for attribute MonitorCommand is NULL, not a absolute path, does not exist or has insufficient permissions to be run.

2011-10-06-13.29.04.815548+540 E523485E365         LEVEL: Error
PID     : 21846                TID  : 47753254935648  PROC : db2haicu
INSTANCE: hadr97               NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlhaCreateDB2Partition, probe:80
RETCODE : ECF=0x90000542=-1879046846=ECF_SQLHA_CREATE_GROUP_FAILED
          Create group failed

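2. Cause

   When db2haicu configures the cluster, it uses the DB2 monitoring script (the MonitorCommand named in message 2661-011 above). The script does not exist at the expected location, so the error above is raised.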

3. Solution

    Check whether the scripts exist. If they do not, run db2cptsa to create them in that path.

Command
#> ls -al /usr/sbin/rsct/sapolicies/db2

Result


Command
#> DB2_install_path/install/tsamp/db2cptsa

Verify
#> ls -al /usr/sbin/rsct/sapolicies/db2

Result

-r-xr-xr-x 1 root root  3183 Apr  1  2011 db2V97_monitor.ksh
-r-xr-xr-x 1 root root  6227 Apr  1  2011 db2V97_start.ksh
-r-xr-xr-x 1 root root  4566 Apr  1  2011 db2V97_stop.ksh
-r-xr-xr-x 1 root root  1114 Apr  1  2011 forceAllApps
-r-xr-xr-x 1 root root 13980 Apr  1  2011 hadrV97_monitor.ksh
-r-xr-xr-x 1 root root  5517 Apr  1  2011 hadrV97_start.ksh
-r-xr-xr-x 1 root root  4255 Apr  1  2011 hadrV97_stop.ksh
-r-xr-xr-x 1 root root  7121 Apr  1  2011 mountV97_monitor.ksh
-r-xr-xr-x 1 root root  7087 Apr  1  2011 mountV97_start.ksh
-r-xr-xr-x 1 root root  7948 Apr  1  2011 mountV97_stop.ksh
-r-xr-xr-x 1 root root  7101 Apr  1  2011 nfsserverctrl-server
-r-xr-xr-x 1 root root  5554 Apr  1  2011 rovingV97_failover.ksh
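
The check before and after db2cptsa can be sketched as a small shell function. This is only a sketch under assumptions: the function name missing_policy_scripts is hypothetical, and the caller decides which script names to check (the example uses DB2 9.7 names from the listing above).

```shell
#!/bin/sh
# Hypothetical helper (not part of DB2 or RSCT): print every expected
# policy script that is missing or not executable.
# Usage: missing_policy_scripts DIR SCRIPT...
missing_policy_scripts() {
    dir=$1
    shift
    for name in "$@"; do
        # The cluster manager needs an existing, executable file here.
        if [ ! -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

# Example (script names taken from the listing above):
#   missing_policy_scripts /usr/sbin/rsct/sapolicies/db2 \
#       db2V97_monitor.ksh db2V97_start.ksh db2V97_stop.ksh
# If anything is printed, run db2cptsa and check again.
```

An empty result means every listed script is in place and executable.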

1. Error while running db2haicu

(Command)
db2haicu -f hadr.xml

(Output)

Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu).

You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create.

For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center.

db2haicu determined the current DB2 database manager instance is hadr97. The cluster configuration that follows will apply to this instance.

db2haicu is collecting information on your current setup. This step may take some time as db2haicu will need to activate all databases for the instance to discover all paths ...
Creating domain db2hadr in the cluster ...
Creating domain db2hadr in the cluster was successful.
Configuring quorum device for domain db2hadr ...
Configuring quorum device for domain db2hadr was successful.
Adding network interface card eth0 on cluster node cluster1 to the network db2_public_network_0 ...
Adding network interface card eth0 on cluster node cluster1 to the network db2_public_network_0 was successful.
Adding network interface card eth0 on cluster node cluster2 to the network db2_public_network_0 ...
Adding network interface card eth0 on cluster node cluster2 to the network db2_public_network_0 was successful.
Adding network interface card eth1 on cluster node cluster1 to the network db2_private_network_0 ...
Adding network interface card eth1 on cluster node cluster1 to the network db2_private_network_0 was successful.
Adding network interface card eth1 on cluster node cluster2 to the network db2_private_network_0 ...
Adding network interface card eth1 on cluster node cluster2 to the network db2_private_network_0 was successful.
Adding DB2 database partition 0 to the cluster ...    <-------- error
There was an error with one of the issued cluster manager commands. Refer to db2diag.log and the DB2 Information Center for details.
There was an internal db2haicu error. Refer to db2diag.log and the DB2 Information Center for details.
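
Since db2haicu only points to db2diag.log, it helps to pull out just the relevant records. The following is a minimal sketch, not an official tool: the function name db2haicu_errors is hypothetical, and it assumes records are separated by blank lines, as in the excerpts in this post (a record that contains a blank line would be split).

```shell
#!/bin/sh
# Hypothetical helper: print the Error and Severe records that db2haicu
# wrote to a db2diag.log file.
# Usage: db2haicu_errors LOGFILE
db2haicu_errors() {
    # RS="" puts awk in paragraph mode: each blank-line-separated block
    # is handled as one record.
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         /LEVEL: (Error|Severe)/ && /PROC *: db2haicu/' "$1"
}

# Example (the path is an assumption; use the diagnostic path of your instance):
#   db2haicu_errors ~/sqllib/db2dump/db2diag.log
```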

2. Diagnostic log error messages

2011-10-06-11.03.01.142906+540 I510055E380         LEVEL: Warning
PID     : 6699                 TID  : 47063466929248  PROC : db2haicu
INSTANCE: hadr97               NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlhaGetPolicyTypeFromSysFile, probe:400
MESSAGE : No matching instance record ... setting policy to none
DATA #1 : unsigned integer, 4 bytes


2011-10-06-11.04.03.207153+540 E514374E546         LEVEL: Severe
PID     : 6699                 TID  : 47063466929248  PROC : db2haicu
INSTANCE: hadr97               NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, handleEndElement, probe:1450
MESSAGE : Failed to create cluster resources for database partition in cluster
DATA #1 : unsigned integer, 4 bytes
0
DATA #2 : String, 6 bytes
hadr97
DATA #3 : String, 20 bytes
cluster2.localdomain
DATA #4 : unsigned integer, 4 bytes
1
DATA #5 : Pointer, 8 bytes
0x000000000854aeb0


2011-10-06-11.31.24.990724+540 E519870E520         LEVEL: Error
PID     : 11458                TID  : 47790888316000  PROC : db2haicu
INSTANCE: hadr97               NODE : 000
FUNCTION: DB2 Common, SQLHA APIs for DB2 HA Infrastructure, sqlhaUICreatePartition, probe:200
RETCODE : ECF=0x90000557=-1879046825=ECF_SQLHA_CLUSTER_ERROR
          Error reported from Cluster
DATA #1 : String, 6 bytes
hadr97
DATA #2 : String, 20 bytes
cluster2.localdomain
DATA #3 : signed integer, 4 bytes
1
DATA #4 : signed integer, 4 bytes
0

 

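3. Cause

     The host names in hadr.xml were wrong: the file uses the short names cluster1 and cluster2, while the hostname command on these nodes returns the fully qualified name (for example, cluster1.localdomain).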

(Command) vi /etc/hosts
192.168.137.241  cluster1.localdomain    cluster1
192.168.137.242  cluster2.localdomain    cluster2

 

(Command) vi hadr.xml

<?xml version="1.0" encoding="UTF-8"?>
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="db2ha.xsd" clusterManagerName="TSA" version="1.0">
  <ClusterDomain domainName="db2hadr">
     <Quorum quorumDeviceProtocol="network" quorumDeviceName="192.168.137.1"/>

     <PhysicalNetwork physicalNetworkName="db2_public_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth0" clusterNodeName="cluster1">
        <IPAddress baseAddress="192.168.137.241" subnetMask="255.255.255.0" networkName="db2_public_network_0"/>
      </Interface>
      <Interface interfaceName="eth0" clusterNodeName="cluster2">
        <IPAddress baseAddress="192.168.137.242" subnetMask="255.255.255.0" networkName="db2_public_network_0"/>
      </Interface>
      </PhysicalNetwork>

      <PhysicalNetwork physicalNetworkName="db2_private_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth1" clusterNodeName="cluster1">
        <IPAddress baseAddress="10.10.10.1" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>             
      <Interface interfaceName="eth1" clusterNodeName="cluster2">
        <IPAddress baseAddress="10.10.10.2" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>             
     </PhysicalNetwork>

     <ClusterNode clusterNodeName="cluster1"/>
     <ClusterNode clusterNodeName="cluster2"/>
  </ClusterDomain>

 

4. Solution

   In the XML file, set each host name to the value that the hostname command returns on that node.

(Command) hostname
(Output)  cluster1.localdomain
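
The comparison between the XML file and the hostname output can be automated with a short shell function. This is a sketch under assumptions: the function name unknown_node_names is hypothetical, it reads clusterNodeName attributes with grep -o (widely available, but not required by POSIX) rather than a real XML parser, and the caller supplies the expected host names.

```shell
#!/bin/sh
# Hypothetical helper: list the clusterNodeName values in FILE that do not
# match any of the given host names.
# Usage: unknown_node_names FILE HOSTNAME...
unknown_node_names() {
    file=$1
    shift
    # Extract each clusterNodeName="..." value, one per line, without duplicates.
    grep -o 'clusterNodeName="[^"]*"' "$file" | cut -d'"' -f2 | sort -u |
    while IFS= read -r node; do
        found=no
        for host in "$@"; do
            [ "$node" = "$host" ] && found=yes
        done
        [ "$found" = yes ] || printf '%s\n' "$node"
    done
}

# Example: pass the hostname output of every node in the cluster.
#   unknown_node_names hadr.xml cluster1.localdomain cluster2.localdomain
```

Any name it prints appears in the XML file but is not one of the names reported by hostname.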

Corrected XML file

<?xml version="1.0" encoding="UTF-8"?>
<DB2Cluster xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="db2ha.xsd" clusterManagerName="TSA" version="1.0">
  <ClusterDomain domainName="db2hadr">
     <Quorum quorumDeviceProtocol="network" quorumDeviceName="192.168.137.1"/>

     <PhysicalNetwork physicalNetworkName="db2_public_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth0" clusterNodeName="cluster1.localdomain">
        <IPAddress baseAddress="192.168.137.241" subnetMask="255.255.255.0" networkName="db2_public_network_0"/>
      </Interface>
      <Interface interfaceName="eth0" clusterNodeName="cluster2.localdomain">
        <IPAddress baseAddress="192.168.137.242" subnetMask="255.255.255.0" networkName="db2_public_network_0"/>
      </Interface>
      </PhysicalNetwork>

      <PhysicalNetwork physicalNetworkName="db2_private_network_0" physicalNetworkProtocol="ip">
      <Interface interfaceName="eth1" clusterNodeName="cluster1.localdomain">
        <IPAddress baseAddress="10.10.10.1" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>             
      <Interface interfaceName="eth1" clusterNodeName="cluster2.localdomain">
        <IPAddress baseAddress="10.10.10.2" subnetMask="255.255.255.0" networkName="db2_private_network_0"/>
      </Interface>             
     </PhysicalNetwork>

     <ClusterNode clusterNodeName="cluster1.localdomain"/>
     <ClusterNode clusterNodeName="cluster2.localdomain"/>
  </ClusterDomain>
