Receiving a Validation Timeout Before Reaching Configured Timeout Time

Hello!

We’re having a problem with a YANG validation: a timeout is raised before the configured timeout has elapsed.

The YANG snippet which calls the validation looks like this:

    augment /router-mpls:mpls/l2-vpn:l2vpn-config/l2-vpn:l2vpn {
        container vpn-common-validation {
            tailf:hidden full;
            tailf:cli-drop-node-name;
            tailf:cli-suppress-validation-warning-prompt;
            tailf:validate vpn_app_validation {
                tailf:opaque "vlm_memb_valp";
                tailf:dependency "/dot1q:dot1q/dot1q:vlan/dot1q:interface/dot1q:interface-name";
            }
        }
    }

The validation function opens a socket and calls maapi_connect():

    socket_ = socket(PF_INET, SOCK_STREAM, 0);
    ...
    maapi_connect(socket_, (struct sockaddr *)&addr_, sizeof(struct sockaddr_in));

After that, the validation itself runs.

To reproduce the behavior, I put the following code inside the validation, just to simulate a long-running validation and to prove that the validation logic itself isn’t the issue:

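    /* Extend the timeout for this callback to 250 seconds... */
    confd_data_set_timeout(tctx, 250);
    /* ...then block for 100 seconds, well under that limit. */
    for (uint16_t i = 0; i < 100; i++) {
        sleep(1);
    }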

This validation should complete successfully, but after 60 to 100 seconds a timeout occurs.

The failure is not deterministic, but it is far more frequent when the equipment is receiving lots of SNMP requests. With SNMP load applied to increase the reproduction rate, it usually happens within 3 to 6 tries.

Some logs we collected:

var/log/confd.log:
  <CRIT> 7-Jan-2025::12:11:28.826 confd[5374]: - Daemon config_validationd timed out
  
var/log/confd_devel.log:
  <DEBUG> 7-Jan-2025::12:11:23.098 confd[<0.124.0>]: devel-c close_usess db request daemon id: 77
  <ERR> 7-Jan-2025::12:11:28.824 confd[<0.124.0>]: devel-c Control socket request timed out daemon config_validationd id 77

If the validation function takes longer than the configured timeout, the timeout is raised correctly:

    confd_data_set_timeout(tctx, 280);
    for (uint16_t i = 0; i < 300; i++) {
        sleep(1);
    }

So, is there a way to prevent a timeout from firing before the configured timeout has elapsed? Are we using the validation as intended?

We are using ConfD version 8.0.6.

Thanks in advance!

confd_data_set_timeout() sets the timeout for the daemon to respond to a worker socket query, not a control socket request.

From the confd_lib_dp(3) man page:

*Note*
All the callbacks that are invoked via these sockets are subject to timeouts configured
in confd.conf, see confd.conf(5). The callbacks invoked via the control socket must
generate a reply back to ConfD within the time configured for
/confdConfig/capi/newSessionTimeout, the callbacks invoked via a worker socket within
the time configured for /confdConfig/capi/queryTimeout. If either timeout is exceeded,
the daemon will be considered dead, and ConfD will disconnect it by closing the control
and worker sockets.

Since the developer log reports that the control socket request timed out, what’s your confd.conf /confdConfig/capi/newSessionTimeout setting?

From the confd.conf(5) man page:

/confdConfig/capi/newSessionTimeout (xs:duration) [PT30S]
Timeout for a daemon to respond to a control socket request, see confd_lib_dp(3). If the daemon fails
to respond within the given time, it will be disconnected.
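
For reference, a confd.conf fragment that sets both capi timeouts explicitly might look roughly like the following. The durations are illustrative assumptions, not recommendations; pick values that fit how long your daemon can be busy before it answers on each socket.

```xml
<confdConfig xmlns="http://tail-f.com/ns/confd_cfg/1.0">
  <capi>
    <!-- Limit for replies to control socket requests (example value) -->
    <newSessionTimeout>PT120S</newSessionTimeout>
    <!-- Limit for replies to worker socket queries (example value) -->
    <queryTimeout>PT120S</queryTimeout>
  </capi>
</confdConfig>
```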

You nailed it! Thank you so much!
The newSessionTimeout parameter wasn’t configured in confd.conf.
One question, though: any idea why I was getting the timeout 80 or 90 seconds after the validation started, if the default newSessionTimeout is 30 seconds?

The developer log, with the log level set to “trace”, will likely provide insight into why the timeout occurs when it does.
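
One possible explanation, offered as a guess and not something the logs above confirm: if the daemon serves the control and worker sockets from a single thread, a control socket request that arrives while the validation callback is blocked cannot be answered until the callback returns. In that case newSessionTimeout would count from the arrival of that request, not from the start of the validation. The toy model below is plain C with no ConfD calls; first_timeout_s() and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Toy model (NOT ConfD API): returns the second at which the daemon would be
 * declared dead, or -1 if the callback finishes without any timeout.
 *
 * callback_duration_s   - how long the worker callback blocks
 * worker_timeout_s      - value passed to confd_data_set_timeout()
 * new_session_timeout_s - /confdConfig/capi/newSessionTimeout in seconds
 * control_request_at_s  - when a control socket request arrives, or -1 if none
 *
 * Assumption: one thread serves both sockets, so a control request cannot be
 * answered while the worker callback is still running.
 */
int first_timeout_s(int callback_duration_s, int worker_timeout_s,
                    int new_session_timeout_s, int control_request_at_s)
{
    assert(callback_duration_s >= 0);
    assert(worker_timeout_s >= 0 && new_session_timeout_s >= 0);

    int deadline = -1;

    /* Worker socket: fires only if the callback outlasts its own limit. */
    if (callback_duration_s > worker_timeout_s)
        deadline = worker_timeout_s;

    /* Control socket: the clock starts when the request arrives. */
    if (control_request_at_s >= 0 &&
        control_request_at_s < callback_duration_s) {
        int control_deadline = control_request_at_s + new_session_timeout_s;

        /* It only fires if the callback is still blocking at that moment. */
        if (control_deadline < callback_duration_s &&
            (deadline < 0 || control_deadline < deadline))
            deadline = control_deadline;
    }

    return deadline;
}
```

With the numbers from this thread, first_timeout_s(100, 250, 30, 55) gives 85: a control request arriving 55 seconds into the 100-second loop would time out at 85 seconds, which would fit the observed 60 to 100 second range. Without a control request the model returns -1 (no timeout), and the 300-second loop with a 280-second limit returns 280.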