Hi,
I recently upgraded ConfD from version 7.8.1 to 8.0.16.
Since then, at random times, I am seeing ConfD crash soon after it loads the initial .xml files in phase0:
<INFO> 6-Feb-2025::17:01:59.577 localhost confd[2805]: - CDB load: processing file: /var/confd/cdb/0001_arcos_init.xml
<INFO> 6-Feb-2025::17:01:59.938 localhost confd[2805]: - ConfD phase0 started
<INFO> 6-Feb-2025::17:02:24.009 localhost confd[2805]: - Stopping to listen for Internal IPC on 127.0.0.1:4565
<CRIT> 6-Feb-2025::17:02:24.498 localhost confd[2805]: - Internal error: Supervision terminated
The full stack trace is below. It appears to involve only ConfD-internal processes (cdb_db, confd_cfg_server, capi_server, cdb_sup).
Since ConfD boots fine most of the time, I don’t suspect the backend daemons or the init.xml file. All backend applications connect to ConfD for validation and data callpoints only after phase0.
Has anyone encountered a similar issue, or is there a recommended way to debug this further?
=ERROR REPORT==== 6-Feb-2025::17:01:57.680055 ===
confd_rcmd,1609,
{noproc,{gen_server,call,[capi_server,get_info,infinity]}},
[{gen_server,call,3,[{file,"gen_server.erl"},{line,234}]},
{confd_rcmd,fmt_c_points,0,[{file,"confd_rcmd.erl"},{line,1574}]},
{confd_rcmd,send_status,2,[{file,"confd_rcmd.erl"},{line,1237}]},
{confd_rcmd,handle_tcp_data,1,
[{file,"confd_rcmd.erl"},{line,841}]},
{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,234}]}]}
=ERROR REPORT==== 6-Feb-2025::17:02:19.853951 ===
** Generic server cdb_db terminating
** Last message in was {status,50}
** When Server state == {state,
{config,"/var/confd/cdb",
["/var/confd/cdb"],
ramdisk,
[{file_save_log_fun,#Fun<cdb_db.2.51010220>},
{progressf,#Fun<cdb_db.3.51010220>}],
running,true,sync,false},
0,init,
{cdb_init_sess,init,true,undefined,4,<0.146.0>,
undefined,false,[],
{[],[],[],[]},
[],
["/var/confd/cdb"],
{tts_cursor,#Ref<0.3183527211.2701262849.99366>},
undefined,false,undefined},
3,undefined,undefined,normal,undefined,noreply,
{0,0,0},
[],
{xds_ramdisk,
{xds_ram,
{otts,#Ref<0.3183527211.2701262849.99511>,0,
#Ref<0.3183527211.2701131777.99512>},
140694351143280,0,[],[],[],[],140694351143280,
undefined,undefined,undefined},
read,ram_and_wal,disabled,undefined,undefined,
"/var/confd/cdb/A.cdb",raw,0,
{compact_after,50,50},
undefined,4,#Fun<cdb_db.2.51010220>,0,
{xds_wal,"/var/confd/cdb/A.cdb",
{file,
{file_descriptor,raw_file_io_delayed,
#{buffer => #Ref<0.3183527211.2701262849.99518>,
delay_size => 65536,owner => <0.137.0>,
pid => <0.151.0>}}},
raw,[],-1,none,-1},
[]},
[],undefined,undefined,
{subs,[],[],0,[],undefined},
{subs,[],[],0,[],undefined},
notab,[],undefined,undefined,undefined,undefined}
** Reason for termination ==
** {{timeout,{gen_server,call,
[confd_cfg_server,{get,[dbDir,cdb,confdConfig]},50]}},
[{gen_server,call,3,[{file,"gen_server.erl"},{line,234}]},
{confd_cfg_server,do_get,2,[{file,"confd_cfg_server.erl"},{line,141}]},
{cdb_config,get_db_dir,2,[{file,"cdb_config.erl"},{line,31}]},
{cdb_config,cdb_conf_file,1,[{file,"cdb_config.erl"},{line,55}]},
{cdb_db,stat,2,[{file,"cdb_db.erl"},{line,5365}]},
{cdb_db,handle_call,3,[{file,"cdb_db.erl"},{line,1948}]},
{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,677}]},
{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,706}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
** Client <0.270.0> is dead
=CRASH REPORT==== 6-Feb-2025::17:02:20.126601 ===
crasher:
initial call: cdb_db:init/1
pid: <0.137.0>
registered_name: cdb_db
exception exit: {timeout,
{gen_server,call,
[confd_cfg_server,
{get,[dbDir,cdb,confdConfig]},
50]}}
in function gen_server:call/3 (gen_server.erl, line 234)
in call from confd_cfg_server:do_get/2 (confd_cfg_server.erl, line 141)
in call from cdb_config:get_db_dir/2 (cdb_config.erl, line 31)
in call from cdb_config:cdb_conf_file/1 (cdb_config.erl, line 55)
in call from cdb_db:stat/2 (cdb_db.erl, line 5365)
in call from cdb_db:handle_call/3 (cdb_db.erl, line 1948)
in call from gen_server:try_handle_call/4 (gen_server.erl, line 677)
in call from gen_server:handle_msg/6 (gen_server.erl, line 706)
ancestors: [cdb_sup,<0.126.0>]
message_queue_len: 1
messages: [{#Ref<0.3183527211.2701131777.100949>,
{ok,<<"/var/confd/cdb">>}}]
links: [<0.138.0>,<0.146.0>,<0.128.0>]
dictionary: [{config_cache,false}]
trap_exit: true
status: running
heap_size: 6772
stack_size: 27
reductions: 1897662
neighbours:
neighbour:
pid: <0.138.0>
registered_name: cdb_subid_alloc
initial call: cdb_subid:'-start/0-fun-0-'/0
current_function: {cdb_subid,subid_allocator,1}
ancestors: [cdb_db,cdb_sup,<0.126.0>]
message_queue_len: 0
links: [<0.137.0>]
trap_exit: false
status: waiting
heap_size: 233
stack_size: 6
reductions: 19
current_stacktrace: [{cdb_subid,subid_allocator,1,
[{file,"cdb_subid.erl"},{line,40}]},
{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,234}]}]
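Reading the reports above, cdb_db seems to have died because its gen_server:call to confd_cfg_server used a 50 ms timeout ({get,[dbDir,cdb,confdConfig]},50), while the crash report's message queue already holds {ok,<<"/var/confd/cdb">>}, which looks like the reply to that same call arriving just too late. If that reading is right, the crash would depend on how quickly confd_cfg_server answers during boot, which could explain why it only happens sometimes. The following is a loose Python analogy of that race, not ConfD or OTP code: `slow_server` and `call` are hypothetical stand-ins, and the 200 ms server delay is arbitrary.

```python
import queue
import threading
import time


def slow_server(requests: queue.Queue, delay: float) -> None:
    """Hypothetical config server: answers one request, but only after `delay` s."""
    reply_to, payload = requests.get()
    time.sleep(delay)
    reply_to.put(("ok", payload))


def call(requests: queue.Queue, mailbox: queue.Queue, payload: str, timeout: float):
    """Send a request and wait at most `timeout` s for the reply,
    loosely mimicking gen_server:call/3 with a finite timeout."""
    requests.put((mailbox, payload))
    try:
        return mailbox.get(timeout=timeout)
    except queue.Empty:
        # An Erlang caller would exit with {timeout, ...} here and crash.
        return "timeout"


requests: queue.Queue = queue.Queue()
mailbox: queue.Queue = queue.Queue()

# The server takes 200 ms; the caller waits only 50 ms (as in the log).
server = threading.Thread(target=slow_server, args=(requests, 0.2))
server.start()
result = call(requests, mailbox, "/var/confd/cdb", timeout=0.05)
server.join()  # wait until the late reply has been delivered

pending = mailbox.qsize()  # the reply is queued, but nobody is waiting for it
late_reply = mailbox.get_nowait()

print(result)
print(pending)
print(late_reply)
```

Run as-is, it prints `timeout`, then `1`, then `('ok', '/var/confd/cdb')`: the caller has already given up by the time the answer lands in its mailbox, similar to what the crash report shows.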
=SUPERVISOR REPORT==== 6-Feb-2025::17:02:20.978715 ===
supervisor: {local,cdb_sup}
errorContext: child_terminated
reason: {timeout,{gen_server,call,
[confd_cfg_server,
{get,[dbDir,cdb,confdConfig]},
50]}}
offender: [{pid,<0.137.0>},
{id,cdb_db},
{mfargs,{cdb_db,start_link,[]}},
{restart_type,permanent},
{shutdown,3000},
{child_type,worker}]
=SUPERVISOR REPORT==== 6-Feb-2025::17:02:21.014888 ===
supervisor: {local,cdb_sup}
errorContext: shutdown
reason: reached_max_restart_intensity
offender: [{pid,<0.137.0>},
{id,cdb_db},
{mfargs,{cdb_db,start_link,[]}},
{restart_type,permanent},
{shutdown,3000},
{child_type,worker}]
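The two supervisor reports show cdb_sup giving up with reached_max_restart_intensity, after which the whole cdb application shuts down. OTP supervisors stop once more than a configured number of restarts (intensity) happen within a time window (period, in seconds). cdb_sup's real limits are not visible in this log, so the sketch below uses OTP's documented defaults (intensity 1, period 5) purely as placeholders; `RestartBudget` is a hypothetical model of the rule, not supervisor source code.

```python
from collections import deque


class RestartBudget:
    """Sliding-window restart limit modeled on OTP supervisor semantics:
    if more than `intensity` restarts occur within `period` seconds,
    the supervisor terminates its children and then itself."""

    def __init__(self, intensity: int = 1, period: float = 5.0) -> None:
        self.intensity = intensity
        self.period = period
        self.restarts: deque = deque()

    def child_crashed(self, now: float) -> bool:
        """Record a crash at time `now`; return True if a restart is allowed."""
        self.restarts.append(now)
        # Forget restarts older than the window (the newest entry always stays).
        while now - self.restarts[0] > self.period:
            self.restarts.popleft()
        return len(self.restarts) <= self.intensity


budget = RestartBudget()  # placeholder limits, not cdb_sup's actual ones
print(budget.child_crashed(0.0))  # first crash: restart allowed
print(budget.child_crashed(1.0))  # second crash within 5 s: give up
```

With these placeholder values a single crash is tolerated, but a second one inside the window exhausts the budget, so one slow reply during boot could be enough to take ConfD down.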
=ERROR REPORT==== 6-Feb-2025::17:02:21.138051 ===
cdb_capi:1298: handle_client_data/3 failed: exit: {noproc,
{gen_server,call,
[cdb_db,get_init_sess,
infinity]}}
[{gen_server,call,3,[{file,"gen_server.erl"},{line,234}]},
{cdb_capi,do_get_phase,2,[{file,"cdb_capi.erl"},{line,4115}]},
{cdb_capi,handle_setup,3,[{file,"cdb_capi.erl"},{line,1349}]},
{cdb_capi,handle_client_data,3,[{file,"cdb_capi.erl"},{line,1267}]},
{cdb_capi,handle_info,2,[{file,"cdb_capi.erl"},{line,542}]},
{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,653}]},
{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,727}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]
=INFO REPORT==== 6-Feb-2025::17:02:22.096885 ===
application: cdb
exited: shutdown
type: permanent
=ERROR REPORT==== 6-Feb-2025::17:02:22.620281 ===
confd_ia:574: Server capi_server, which registered 3, seems to be down?!
[{confd_ia,'-handle_connection/7-fun-0-',0,[{file,"confd_ia.erl"},{line,575}]},
{confd_ia,handle_connection,7,[{file,"confd_ia.erl"},{line,575}]},
{confd_ia,acceptor,5,[{file,"confd_ia.erl"},{line,527}]},
{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,234}]}]
"Internal error: Supervision terminated\n"
=ERROR REPORT==== 6-Feb-2025::17:02:25.547474 ===
init:boot_msg: "Internal error: Supervision terminated\n"
[IFMGR] INTERNAL ERROR: confd_internal.c(2979): Failed to decode data
[MACSEC] INTERNAL ERROR: confd_internal.c(2979): Failed to decode data
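To check whether every failed boot shows the same pattern, one option is to save the error log from several crashes and count the failed gen_server calls in each. A rough Python sketch, assuming the reports are saved as plain text in the format shown above; the script name and regular expression are my own (not a ConfD tool), and the pattern only covers the timeout and noproc shapes that appear in this log.

```python
import re
import sys
from collections import Counter

# Matches failed-call exit reasons as they appear in the reports above, e.g.
#   {timeout,{gen_server,call,[confd_cfg_server,{get,[...]},50]}}
#   {noproc,{gen_server,call,[cdb_db,get_init_sess,infinity]}}
# Terms may be wrapped over several lines, hence re.DOTALL and \s*.
CALL_FAILURE = re.compile(
    r"\{(timeout|noproc),\s*\{gen_server,call,\s*"
    r"\[(\w+),\s*(.*?),\s*(\d+|infinity)\]\}\}",
    re.DOTALL,
)


def scan(text: str) -> Counter:
    """Count failed gen_server calls by (reason, target server, timeout)."""
    hits: Counter = Counter()
    for reason, server, _request, timeout in CALL_FAILURE.findall(text):
        hits[(reason, server, timeout)] += 1
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_calls.py saved_error.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (reason, server, timeout), n in scan(f.read()).most_common():
            print(f"{n:3d}  {reason:8s} {server:20s} timeout={timeout}")
```

One crash is reported several times (error report, crash report, supervisor report), so on the log above the confd_cfg_server timeout would be counted three times; comparing the counts across boots is more telling than the absolute numbers.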