[Serusers] Heartbeat with ser and asterisk

aespinoza at vivophone.com
Thu Mar 15 19:48:29 CET 2007


Hello people,
I have a heartbeat cluster that manages ser 0.9.3 running on one
machine and asterisk 1.2.3 running on another, in an active/active
configuration with two IP addresses and failover support. Both machines
run SUSE 10.1. My ha.cf files look like this:
###############################################
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 10
warntime 10
initdead 20
udpport 694
baud    19200
bcast   eth1
ping    xxx.xxx.x.x
auto_failback on
node    linux-xczz
node    prueba2
respawn hacluster /usr/local/lib/heartbeat/ipfail
#################################################
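
To make the timing relationships in this ha.cf explicit, here is a minimal Python sketch that parses the directives and checks them. The rules it encodes are assumptions: two come from the comments in heartbeat's sample ha.cf (deadtime larger than keepalive, initdead at least twice deadtime), and the third is a rule of thumb that warntime should fall strictly between keepalive and deadtime, since a "late heartbeat" warning that fires no earlier than the node is declared dead adds nothing. parse_ha_cf and timing_warnings are hypothetical helper names, not part of heartbeat.

```python
# Minimal sketch (not part of heartbeat): parse ha.cf-style "directive value"
# lines and flag timing settings that break assumed rules of thumb.

HA_CF = """\
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 10
warntime 10
initdead 20
udpport 694
baud    19200
bcast   eth1
ping    xxx.xxx.x.x
auto_failback on
node    linux-xczz
node    prueba2
respawn hacluster /usr/local/lib/heartbeat/ipfail
"""


def parse_ha_cf(text):
    """Return {directive: [value, ...]}; directives such as 'node' may repeat."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        conf.setdefault(parts[0], []).append(value)
    return conf


def timing_warnings(conf):
    """Check keepalive/warntime/deadtime/initdead against assumed rules of thumb."""
    t = {k: float(conf[k][0]) for k in ("keepalive", "warntime", "deadtime", "initdead")}
    warnings = []
    if t["deadtime"] <= t["keepalive"]:
        warnings.append("deadtime should be larger than keepalive")
    if not (t["keepalive"] < t["warntime"] < t["deadtime"]):
        warnings.append("warntime should lie between keepalive and deadtime")
    if t["initdead"] < 2 * t["deadtime"]:
        warnings.append("initdead should be at least twice deadtime")
    return warnings


conf = parse_ha_cf(HA_CF)
print(conf["node"])           # ['linux-xczz', 'prueba2']
print(timing_warnings(conf))  # ['warntime should lie between keepalive and deadtime']
```

With the values above only the warntime rule fires, because warntime equals deadtime.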


My haresources files look like this:
########################################
prueba2 xxx.xxx.x.125/24 safe_asterisk
linux-xczz xxx.xxx.x.124/24 serctl
########################################
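
Each haresources line names the node that should normally own a resource group, followed by the resources in that group. The logs further down show the ordering heartbeat applies: on acquisition IPaddr runs before serctl, and on release serctl stop runs before IPaddr stop. Below is a minimal Python sketch of that reading. It assumes that a bare address is shorthand for the IPaddr resource (as the IPaddr lines in the logs suggest) and ignores backslash continuation lines; the 192.0.2.x addresses are hypothetical stand-ins for the redacted ones, and the helper names are illustrative only.

```python
import re

# Hypothetical stand-ins for the redacted xxx.xxx.x.124/125 addresses.
HARESOURCES = """\
prueba2 192.0.2.125/24 safe_asterisk
linux-xczz 192.0.2.124/24 serctl
"""

# Assumption: a bare dotted quad (optionally /prefix) means IPaddr::<address>.
IP_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}(?:/\d{1,2})?$")


def parse_haresources(text):
    """Return {preferred_node: [resource, ...]} in the order listed."""
    groups = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        node, *resources = line.split()
        groups[node] = ["IPaddr::" + r if IP_RE.match(r) else r for r in resources]
    return groups


def start_order(group):
    return list(group)  # acquired left to right


def stop_order(group):
    return list(reversed(group))  # released right to left


def undeclared_owners(groups, nodes):
    """Preferred nodes that do not appear as 'node' lines in ha.cf."""
    return set(groups) - set(nodes)


groups = parse_haresources(HARESOURCES)
print(groups["linux-xczz"])              # ['IPaddr::192.0.2.124/24', 'serctl']
print(stop_order(groups["linux-xczz"]))  # ['serctl', 'IPaddr::192.0.2.124/24']
print(undeclared_owners(groups, ["linux-xczz", "prueba2"]))  # set()
```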


So linux-xczz is the master for ser, and prueba2 is the master for
asterisk.

The authkeys files are also identical on both machines and have the
correct permissions (mode 600). The /etc/hosts files look like this:

##########################
10.10.10.1      linux-xczz
10.10.10.2      prueba2
##########################
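
Two of the preconditions mentioned here can be checked mechanically. Below is a minimal Python sketch, assuming heartbeat's requirement that each node entry in ha.cf matches the machine's own `uname -n` output, and that, as this setup intends, every node name should also resolve through /etc/hosts. The authkeys path is passed in rather than hard-coded because the logs show heartbeat using /etc/ha.d on prueba2 but /usr/local/etc/ha.d on linux-xczz. All function names are hypothetical.

```python
import os
import platform
import stat

HOSTS = """\
10.10.10.1      linux-xczz
10.10.10.2      prueba2
"""


def authkeys_mode_ok(path):
    """True if the authkeys file is readable/writable by its owner only (mode 600)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


def host_names(text):
    """Map every hostname/alias in an /etc/hosts-style text to its address."""
    names = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        for name in fields[1:]:
            names[name] = fields[0]
    return names


def node_problems(nodes, hosts_text, local_name=None):
    """List mismatches between ha.cf 'node' names, /etc/hosts and this host's name."""
    local_name = local_name or platform.node()  # same idea as `uname -n`
    names = host_names(hosts_text)
    problems = [f"{n} is missing from /etc/hosts" for n in nodes if n not in names]
    if local_name not in nodes:
        problems.append(f"local name {local_name!r} is not a 'node' in ha.cf")
    return problems


print(node_problems(["linux-xczz", "prueba2"], HOSTS, local_name="prueba2"))  # []
```

On each machine you would call authkeys_mode_ok with that node's own ha.d path and node_problems without local_name, so that the real hostname is used.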


The first time I start heartbeat on one machine (prueba2) with
/etc/init.d/heartbeat start, both services run fine. But when I then
start heartbeat on the other machine (linux-xczz) so that it takes
over the ser service, the cluster goes haywire. Once linux-xczz takes
over ser, prueba2 also gives up its other resource (asterisk), which it
should not do. Asterisk appears on linux-xczz, only to disappear
seconds later along with ser, leaving the cluster a complete wreck
with no service running on either machine. The error log I get from
prueba2 is this:

heartbeat[12609]: 2007/03/15_11:41:18 info: Link linux-xczz:eth1 up.
heartbeat[12609]: 2007/03/15_11:41:18 info: Status update for node  
linux-xczz: status init
heartbeat[12609]: 2007/03/15_11:41:18 info: Status update for node  
linux-xczz: status up
harc[13730]:    2007/03/15_11:41:18 info: Running /etc/ha.d/rc.d/status status
harc[13741]:    2007/03/15_11:41:18 info: Running /etc/ha.d/rc.d/status status
heartbeat[12609]: 2007/03/15_11:41:19 info: Status update for node  
linux-xczz: status active
harc[13754]:    2007/03/15_11:41:19 info: Running /etc/ha.d/rc.d/status status
heartbeat[12609]: 2007/03/15_11:41:19 info: remote resource transition  
completed.
heartbeat[12609]: 2007/03/15_11:41:19 info: prueba2 wants to go  
standby [foreign]
heartbeat[12609]: 2007/03/15_11:41:20 info: standby: linux-xczz can  
take our foreign resources
heartbeat[13767]: 2007/03/15_11:41:20 info: give up foreign HA  
resources (standby).
ResourceManager[13777]: 2007/03/15_11:41:20 info: Releasing resource  
group: linux-xczz xxx.xxx.x.124/24 serctl
ResourceManager[13777]: 2007/03/15_11:41:20 info: Running  
/etc/init.d/serctl  stop
ResourceManager[13777]: 2007/03/15_11:41:20 info: Running  
/etc/ha.d/resource.d/IPaddr xxx.xxx.x.124/24 stop
IPaddr[13915]:  2007/03/15_11:41:20 INFO: /sbin/route -n del -host  
xxx.xxx.x.124
IPaddr[13915]:  2007/03/15_11:41:20 INFO: /sbin/ifconfig eth0:0  
xxx.xxx.x.124 down
IPaddr[13915]:  2007/03/15_11:41:20 INFO: IP Address xxx.xxx.x.124 released
IPaddr[13836]:  2007/03/15_11:41:20 INFO: IPaddr Success
heartbeat[13767]: 2007/03/15_11:41:20 info: foreign HA resource  
release completed (standby).
heartbeat[12609]: 2007/03/15_11:41:20 info: Local standby process  
completed [foreign].
heartbeat[12609]: 2007/03/15_11:41:23 WARN: 1 lost packet(s) for  
[linux-xczz] [13:15]
heartbeat[12609]: 2007/03/15_11:41:23 info: remote resource transition  
completed.
heartbeat[12609]: 2007/03/15_11:41:23 info: No pkts missing from linux-xczz!
heartbeat[12609]: 2007/03/15_11:41:23 info: Other node completed  
standby takeover of foreign resources.
heartbeat[12609]: 2007/03/15_11:41:35 info: linux-xczz wants to go  
standby [foreign]
heartbeat[12609]: 2007/03/15_11:41:36 info: standby: acquire [foreign]  
resources from linux-xczz
heartbeat[14011]: 2007/03/15_11:41:36 info: acquire local HA resources  
(standby).
ResourceManager[14021]: 2007/03/15_11:41:36 info: Acquiring resource  
group: prueba2 xxx.xxx.x.125/24 asterisk-rosa
IPaddr[14048]:  2007/03/15_11:41:36 INFO: IPaddr Running OK
ResourceManager[14021]: 2007/03/15_11:41:36 info: Running  
/etc/init.d/safe_asterisk  start
ResourceManager[14021]: 2007/03/15_11:41:36 ERROR: Return code 1 from  
/etc/init.d/safe_asterisk
ResourceManager[14021]: 2007/03/15_11:41:36 CRIT: Giving up resources  
due to failure of safe_asterisk
ResourceManager[14021]: 2007/03/15_11:41:36 info: Releasing resource  
group: prueba2 xxx.xxx.x.125/24 asterisk-rosa
ResourceManager[14021]: 2007/03/15_11:41:36 info: Running  
/etc/init.d/safe_asterisk  stop
ResourceManager[14021]: 2007/03/15_11:41:37 info: Running  
/etc/ha.d/resource.d/IPaddr xxx.xxx.x.125/24 stop
IPaddr[14310]:  2007/03/15_11:41:37 INFO: /sbin/route -n del -host  
xxx.xxx.x.125
IPaddr[14310]:  2007/03/15_11:41:37 INFO: /sbin/ifconfig eth0:2  
xxx.xxx.x.125 down
IPaddr[14310]:  2007/03/15_11:41:37 INFO: IP Address xxx.xxx.x.125 released
IPaddr[14231]:  2007/03/15_11:41:37 INFO: IPaddr Success
heartbeat[14011]: 2007/03/15_11:41:37 info: local HA resource  
acquisition completed (standby).
heartbeat[12609]: 2007/03/15_11:41:37 info: Standby resource  
acquisition done [foreign].
heartbeat[12609]: 2007/03/15_11:41:37 info: remote resource transition  
completed.
heartbeat[12609]: 2007/03/15_11:41:38 WARN: G_CH_dispatch_int:  
Dispatch function for read child took too long to execute: 520 ms (>  
50 ms)
(GSource: 0x80fbe00)
hb_standby[14375]:      2007/03/15_11:42:07 Going standby [foreign].
heartbeat[12609]: 2007/03/15_11:42:07 info: prueba2 wants to go  
standby [foreign]
heartbeat[12609]: 2007/03/15_11:42:08 info: standby: linux-xczz can  
take our foreign resources
heartbeat[14385]: 2007/03/15_11:42:08 info: give up foreign HA  
resources (standby).
ResourceManager[14395]: 2007/03/15_11:42:08 info: Releasing resource  
group: linux-xczz xxx.xxx.x.124/24 serctl
ResourceManager[14395]: 2007/03/15_11:42:08 info: Running  
/etc/init.d/serctl  stop
ResourceManager[14395]: 2007/03/15_11:42:08 ERROR: Return code 1 from  
/etc/init.d/serctl
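
The prueba2 log above follows a visible pattern: ResourceManager brings a group up left to right, and as soon as one script exits nonzero ("Return code 1 from /etc/init.d/safe_asterisk") it logs "Giving up resources" and stops the whole group, IP address included, right to left. The following Python toy model reproduces that sequence. It is not heartbeat's code, it skips the status check that produced "IPaddr Running OK", and fake_script, acquire_group and release_group are hypothetical names.

```python
from typing import Callable, Dict, List

Script = Callable[[str], int]  # takes "start" or "stop", returns an exit code


def release_group(group: List[str], scripts: Dict[str, Script], log: List[str]) -> None:
    """Stop every resource in the group, right to left, logging nonzero exits."""
    for res in reversed(group):
        log.append(f"Running {res} stop")
        rc = scripts[res]("stop")
        if rc != 0:
            log.append(f"ERROR: Return code {rc} from {res}")


def acquire_group(group: List[str], scripts: Dict[str, Script], log: List[str]) -> bool:
    """Start resources left to right; one failure rolls back the whole group."""
    for res in group:
        log.append(f"Running {res} start")
        rc = scripts[res]("start")
        if rc != 0:
            log.append(f"ERROR: Return code {rc} from {res}")
            log.append(f"CRIT: Giving up resources due to failure of {res}")
            release_group(group, scripts, log)
            return False
    return True


def fake_script(start_rc: int, stop_rc: int = 0) -> Script:
    """Hypothetical stand-in for an init script with fixed exit codes."""
    return lambda action: start_rc if action == "start" else stop_rc


group = ["IPaddr::192.0.2.125/24", "safe_asterisk"]  # hypothetical address
scripts = {group[0]: fake_script(0), group[1]: fake_script(1)}
log: List[str] = []
print(acquire_group(group, scripts, log))  # False
print("\n".join(log))
```

In this model a single failing start script takes its IP address down with it, which matches the release of xxx.xxx.x.125 logged above.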

The error log I get from linux-xczz when I run heartbeat is this:

heartbeat[1063]: 2007/03/15_14:26:12 WARN: Core dumps could be lost if  
multiple dumps occur
heartbeat[1063]: 2007/03/15_14:26:12 WARN: Consider setting  
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum  
supportability
heartbeat[1063]: 2007/03/15_14:26:12 WARN: Logging daemon is disabled  
--enabling logging daemon is recommended
heartbeat[1063]: 2007/03/15_14:26:12 info: **************************
heartbeat[1063]: 2007/03/15_14:26:12 info: Configuration validated.  
Starting heartbeat 2.0.7
heartbeat[1064]: 2007/03/15_14:26:12 info: heartbeat: version 2.0.7
heartbeat[1064]: 2007/03/15_14:26:12 info: Heartbeat generation: 130
heartbeat[1064]: 2007/03/15_14:26:12 info: G_main_add_TriggerHandler:  
Added signal manual handler
heartbeat[1064]: 2007/03/15_14:26:12 info: G_main_add_TriggerHandler:  
Added signal manual handler
heartbeat[1064]: 2007/03/15_14:26:12 info: Removing  
/usr/local/var/run/heartbeat/rsctmp failed, recreating.
heartbeat[1064]: 2007/03/15_14:26:12 info: glib: UDP Broadcast  
heartbeat started on port 694 (694) interface eth1
heartbeat[1064]: 2007/03/15_14:26:12 info: glib: UDP Broadcast  
heartbeat closed on port 694 interface eth1 - Status: 1
heartbeat[1064]: 2007/03/15_14:26:12 info: glib: ping heartbeat started.
heartbeat[1064]: 2007/03/15_14:26:12 info: G_main_add_SignalHandler:  
Added signal handler for signal 17
heartbeat[1064]: 2007/03/15_14:26:12 info: Local status now set to: 'up'
heartbeat[1064]: 2007/03/15_14:26:13 info: Link linux-xczz:eth1 up.
heartbeat[1064]: 2007/03/15_14:26:13 info: Link prueba2:eth1 up.
heartbeat[1064]: 2007/03/15_14:26:13 info: Status update for node  
prueba2: status active
heartbeat[1064]: 2007/03/15_14:26:13 info: Link xxx.xxx.x.x:xxx.xxx.x.x up.
heartbeat[1064]: 2007/03/15_14:26:13 info: Status update for node  
xxx.xxx.x.x: status ping
harc[1073]:	2007/03/15_14:26:13 info: Running  
/usr/local/etc/ha.d/rc.d/status status
heartbeat[1064]: 2007/03/15_14:26:14 info: Comm_now_up(): updating  
status to active
heartbeat[1064]: 2007/03/15_14:26:14 info: Local status now set to: 'active'
heartbeat[1064]: 2007/03/15_14:26:14 info: Starting child client  
"/usr/local/lib/heartbeat/ipfail" (1001,100)
heartbeat[1084]: 2007/03/15_14:26:14 info: Starting  
"/usr/local/lib/heartbeat/ipfail" as uid 1001  gid 100 (pid 1084)
heartbeat[1064]: 2007/03/15_14:26:14 info: remote resource transition  
completed.
heartbeat[1064]: 2007/03/15_14:26:14 info: remote resource transition  
completed.
heartbeat[1064]: 2007/03/15_14:26:14 info: Local Resource acquisition  
completed. (none)
heartbeat[1064]: 2007/03/15_14:26:15 info: prueba2 wants to go standby  
[foreign]
heartbeat[1064]: 2007/03/15_14:26:15 info: standby: acquire [foreign]  
resources from prueba2
heartbeat[1088]: 2007/03/15_14:26:15 info: acquire local HA resources  
(standby).
ResourceManager[1098]:	2007/03/15_14:26:15 info: Acquiring resource  
group: linux-xczz xxx.xxx.x.124/24 serctl
IPaddr[1122]:	2007/03/15_14:26:16 INFO: IPaddr Resource is stopped
ResourceManager[1098]:	2007/03/15_14:26:16 info: Running  
/usr/local/etc/ha.d/resource.d/IPaddr 192.168.1.124/24 start
IPaddr[1321]:	2007/03/15_14:26:16 INFO: eval /sbin/ifconfig eth0:0  
xxx.xxx.x.124 netmask 255.255.255.0 broadcast xxx.xxx.x.255
IPaddr[1321]:	2007/03/15_14:26:16 INFO: Sending Gratuitous Arp for  
xxx.xxx.x.124 on eth0:0 [eth0]
IPaddr[1321]:	2007/03/15_14:26:16 INFO:  
/usr/local/lib/heartbeat/send_arp -i 500 -r 10 -p  
/usr/local/var/run/heartbeat/rsctmp/send_arp/send_arp-xxx.xxx.x.124  
eth0 xxx.xxx.x.124 auto xxx.xxx.x.124 ffffffffffff
IPaddr[1241]:	2007/03/15_14:26:16 INFO: IPaddr Success
ResourceManager[1098]:	2007/03/15_14:26:16 info: Running  
/etc/init.d/serctl  start
heartbeat[1088]: 2007/03/15_14:26:17 info: local HA resource  
acquisition completed (standby).
heartbeat[1064]: 2007/03/15_14:26:17 info: Standby resource  
acquisition done [foreign].
heartbeat[1064]: 2007/03/15_14:26:17 info: Initial resource  
acquisition complete (auto_failback)
heartbeat[1064]: 2007/03/15_14:26:23 info: remote resource transition  
completed.
heartbeat[1064]: 2007/03/15_14:26:28 info: linux-xczz wants to go  
standby [foreign]
heartbeat[1064]: 2007/03/15_14:26:28 info: standby: prueba2 can take  
our foreign resources
heartbeat[1492]: 2007/03/15_14:26:28 info: give up foreign HA  
resources (standby).
ResourceManager[1502]:	2007/03/15_14:26:28 info: Releasing resource  
group: prueba2 xxx.xxx.x.125/24 safe_asterisk
ResourceManager[1502]:	2007/03/15_14:26:28 info: Running  
/etc/init.d/safe_asterisk  stop
ResourceManager[1502]:	2007/03/15_14:26:28 info: Running  
/usr/local/etc/ha.d/resource.d/IPaddr xxx.xxx.x.125/24 stop
IPaddr[1561]:	2007/03/15_14:26:29 INFO: IPaddr Success
heartbeat[1492]: 2007/03/15_14:26:29 info: foreign HA resource release  
completed (standby).
heartbeat[1064]: 2007/03/15_14:26:29 info: Local standby process  
completed [foreign].
heartbeat[1064]: 2007/03/15_14:26:30 WARN: 1 lost packet(s) for  
[prueba2] [68:70]
heartbeat[1064]: 2007/03/15_14:26:30 info: remote resource transition  
completed.
heartbeat[1064]: 2007/03/15_14:26:30 info: No pkts missing from prueba2!
heartbeat[1064]: 2007/03/15_14:26:30 info: Other node completed  
standby takeover of foreign resources.
heartbeat[1064]: 2007/03/15_14:27:00 info: prueba2 wants to go standby  
[foreign]
heartbeat[1064]: 2007/03/15_14:27:11 info: standby: acquire [foreign]  
resources from prueba2
heartbeat[1784]: 2007/03/15_14:27:11 info: acquire local HA resources  
(standby).
ResourceManager[1794]:	2007/03/15_14:27:11 info: Acquiring resource  
group: linux-xczz xxx.xxx.x.124/24 serctl
IPaddr[1818]:	2007/03/15_14:27:11 INFO: IPaddr Running OK
ResourceManager[1794]:	2007/03/15_14:27:11 info: Running  
/etc/init.d/serctl  start
ResourceManager[1794]:	2007/03/15_14:27:11 ERROR: Return code 1 from  
/etc/init.d/serctl
ResourceManager[1794]:	2007/03/15_14:27:11 CRIT: Giving up resources  
due to failure of serctl
ResourceManager[1794]:	2007/03/15_14:27:11 info: Releasing resource  
group: linux-xczz xxx.xxx.x.124/24 serctl
ResourceManager[1794]:	2007/03/15_14:27:11 info: Running  
/etc/init.d/serctl  stop
ResourceManager[1794]:	2007/03/15_14:27:11 info: Running  
/usr/local/etc/ha.d/resource.d/IPaddr xxx.xxx.x.124/24 stop
IPaddr[2090]:	2007/03/15_14:27:12 INFO: /sbin/route -n del -host xxx.xxx.x.124
IPaddr[2090]:	2007/03/15_14:27:12 INFO: /sbin/ifconfig eth0:0  
xxx.xxx.x.124 down
IPaddr[2090]:	2007/03/15_14:27:12 INFO: IP Address xxx.xxx.x.124 released
IPaddr[2006]:	2007/03/15_14:27:12 INFO: IPaddr Success
heartbeat[1784]: 2007/03/15_14:27:12 info: local HA resource  
acquisition completed (standby).
heartbeat[1064]: 2007/03/15_14:27:12 info: Standby resource  
acquisition done [foreign].
heartbeat[1064]: 2007/03/15_14:27:12 info: remote resource transition  
completed.
hb_standby[2228]:	2007/03/15_14:27:42 Going standby [foreign].
heartbeat[1064]: 2007/03/15_14:27:42 info: linux-xczz wants to go  
standby [foreign]
heartbeat[1064]: 2007/03/15_14:27:42 info: standby: prueba2 can take  
our foreign resources
heartbeat[2238]: 2007/03/15_14:27:42 info: give up foreign HA  
resources (standby).
ResourceManager[2248]:	2007/03/15_14:27:43 info: Releasing resource  
group: prueba2 xxx.xxx.x.125/24 safe_asterisk
ResourceManager[2248]:	2007/03/15_14:27:43 info: Running  
/etc/init.d/safe_asterisk  stop
ResourceManager[2248]:	2007/03/15_14:27:43 info: Running  
/usr/local/etc/ha.d/resource.d/IPaddr xxx.xxx.x.125/24 stop
IPaddr[2310]:	2007/03/15_14:27:43 INFO: IPaddr Success
heartbeat[2238]: 2007/03/15_14:27:43 info: foreign HA resource release  
completed (standby).
heartbeat[1064]: 2007/03/15_14:27:43 info: Local standby process  
completed [foreign].
heartbeat[1064]: 2007/03/15_14:27:44 WARN: 1 lost packet(s) for  
[prueba2] [114:116]
heartbeat[1064]: 2007/03/15_14:27:44 info: remote resource transition  
completed.
heartbeat[1064]: 2007/03/15_14:27:44 info: No pkts missing from prueba2!
heartbeat[1064]: 2007/03/15_14:27:44 info: Other node completed  
standby takeover of foreign resources.
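
Because the mail client wrapped the log lines, searching them for failures is awkward. Below is a minimal Python sketch that rejoins wrapped lines, assuming every real record starts with a name[pid]: prefix as all records above do, and then pulls out failing scripts and standby requests. SAMPLE is a handful of lines from the linux-xczz log above (tabs replaced by spaces); unwrap and summarize are hypothetical helper names.

```python
import re
from collections import Counter

# A few lines from the linux-xczz log above, wrapped the way the mail client wrapped them.
SAMPLE = """\
heartbeat[1064]: 2007/03/15_14:27:00 info: prueba2 wants to go standby
[foreign]
ResourceManager[1794]: 2007/03/15_14:27:11 info: Running
/etc/init.d/serctl  start
ResourceManager[1794]: 2007/03/15_14:27:11 ERROR: Return code 1 from
/etc/init.d/serctl
ResourceManager[1794]: 2007/03/15_14:27:11 CRIT: Giving up resources
due to failure of serctl
hb_standby[2228]: 2007/03/15_14:27:42 Going standby [foreign].
heartbeat[1064]: 2007/03/15_14:27:42 info: linux-xczz wants to go
standby [foreign]
"""

RECORD_START = re.compile(r"^[\w-]+\[\d+\]:")  # e.g. "ResourceManager[1794]:"
RETURN_CODE = re.compile(r"Return code (\d+) from (\S+)")
STANDBY = re.compile(r"info: (\S+) wants to go standby")


def unwrap(lines):
    """Join continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        if RECORD_START.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] += " " + line.strip()
    return records


def summarize(text):
    """Return ([(script, exit_code), ...], Counter of nodes requesting standby)."""
    failures, standby = [], Counter()
    for record in unwrap(text.splitlines()):
        if m := RETURN_CODE.search(record):
            failures.append((m.group(2), int(m.group(1))))
        if m := STANDBY.search(record):
            standby[m.group(1)] += 1
    return failures, standby


failures, standby = summarize(SAMPLE)
print(failures)       # [('/etc/init.d/serctl', 1)]
print(dict(standby))  # {'prueba2': 1, 'linux-xczz': 1}
```

Run over both full logs, the same summary shows at a glance which scripts returned nonzero and how often each node asked to go standby.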



What I want is ser running on linux-xczz and asterisk running on
prueba2, with failover configured on both machines. Apparently the
failover breaks down, and I lose both services whenever heartbeat is
running on both machines. Any idea why this happens or what I'm doing
wrong?

Can ser and asterisk be run by heartbeat with failover support?

Thanks in advance



