OpenLDAP Logging for Troubleshooting

Pushpalanka Jayawardhana
Mar 4, 2024

Recently I had the opportunity to take part in a root cause analysis of an LDAP unavailability issue. We could see that the LDAP server was using all of its available threads. This caused slowness, then timeouts, and finally made the server unavailable for a while.
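To make this failure mode concrete, here is a toy model in Python. It assumes that slapd serves requests from a fixed-size worker pool, sized by the `threads` setting (`olcThreads` in cn=config), and that work is handed out first come, first served. The `Op` class, the `simulate_pool` function and all the numbers are hypothetical. They only illustrate the shape of the problem and do not reproduce slapd's actual scheduler.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    arrival: float   # seconds since start of the window
    duration: float  # seconds of worker time the operation needs


def simulate_pool(ops: List[Op], workers: int, timeout: float) -> Tuple[List[float], int]:
    """Hypothetical toy model of a fixed-size worker pool served first come, first served.

    Returns the queueing delay of every operation and how many of them waited
    longer than `timeout` before a worker picked them up. The model keeps
    running operations whose client has already given up.
    """
    free_at = [0.0] * workers  # min-heap: time at which each worker becomes free
    heapq.heapify(free_at)
    waits = []
    for op in sorted(ops, key=lambda o: o.arrival):
        earliest = heapq.heappop(free_at)   # worker that frees up first
        start = max(earliest, op.arrival)   # queue if no worker is free yet
        waits.append(start - op.arrival)
        heapq.heappush(free_at, start + op.duration)
    timed_out = sum(1 for w in waits if w > timeout)
    return waits, timed_out


# Steady traffic: one short operation every 100 ms.
fast = [Op(arrival=i * 0.1, duration=0.05) for i in range(40)]
# Four slow operations (e.g. a slow client or an unindexed search) arriving at t=0.
slow = [Op(arrival=0.0, duration=30.0) for _ in range(4)]

_, timeouts_normal = simulate_pool(fast, workers=4, timeout=5.0)
_, timeouts_blocked = simulate_pool(fast + slow, workers=4, timeout=5.0)
print(timeouts_normal, timeouts_blocked)  # 0 39
```

When the pool is free, every short operation starts immediately. Once four long-running operations occupy all four workers, every later request queues behind them, even though the request rate has not changed. This shows one way steady traffic can still exhaust the threads.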

  • LDAP metrics showed that the numbers of bind, unbind, search and extended operations were steady and had not increased.
  • The audit log showed a normal level of modifications on the LDAP.
  • A few cron jobs were running on the machine, but none of them coincided with the time the unavailability started.
  • Machine CPU, RAM and network usage had been normal.
  • A slow client keeping the threads occupied could be a cause, but we had no evidence of it. A missing index in LDAP could also be the cause.
  • The LDAP node is configured to sync with a replica node, so this sync operation could have consumed some LDAP resources.
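Taken together, these observations point in one direction. If operation counts stayed flat while every thread was busy, each operation must have been holding a thread for longer. Little's law (L = λW: average busy threads = throughput × average time each operation holds a thread) makes this explicit. The sketch below is a rough back-of-the-envelope check. The 16-thread pool and the 200 ops/s figure are hypothetical, and the relation is only approximate while a queue is building up.

```python
def implied_hold_time(avg_busy_threads: float, ops_per_second: float) -> float:
    """Little's law L = lambda * W solved for W: average seconds an operation holds a thread."""
    if ops_per_second <= 0:
        raise ValueError("throughput must be positive")
    return avg_busy_threads / ops_per_second


# Hypothetical numbers: the same 200 ops/s before and during the incident.
before = implied_hold_time(avg_busy_threads=2, ops_per_second=200)
during = implied_hold_time(avg_busy_threads=16, ops_per_second=200)  # whole 16-thread pool busy
print(f"{before:.3f}s -> {during:.3f}s per operation")  # 0.010s -> 0.080s per operation
```

Both of the hypotheses above fit this picture: a slow client and a missing index each make operations hold a thread longer without raising the operation count.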

At this point we realized that we needed more insight into what was going on, in case it ever happened again.

Metrics

One option was to capture more metrics from the metrics endpoint and keep their history. The endpoint provides the details below, which can be a gold mine when understanding…
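Keeping that history does not take much machinery. Below is a minimal sketch. It assumes the endpoint exposes cumulative counters; for OpenLDAP this could be the monitor backend under `cn=Monitor`, read with whatever client you already use. The `fetch` callable, the counter names and the `CounterHistory` class are hypothetical. The class keeps a bounded window of samples and turns cumulative counters into per-second rates, which can then be lined up against the time an incident started.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Sample = Tuple[float, Dict[str, int]]  # (timestamp, counter name -> cumulative value)


class CounterHistory:
    """Bounded history of cumulative counters with per-second rates between samples."""

    def __init__(self, fetch: Callable[[], Dict[str, int]], maxlen: int = 1440,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch  # hypothetical: reads counters from the metrics endpoint
        self._clock = clock
        self._samples: Deque[Sample] = deque(maxlen=maxlen)  # oldest samples drop off

    def sample(self) -> None:
        self._samples.append((self._clock(), dict(self._fetch())))

    def rates(self) -> Dict[str, float]:
        """Per-second rate of each counter between the two most recent samples."""
        if len(self._samples) < 2:
            return {}
        (t0, prev), (t1, cur) = self._samples[-2], self._samples[-1]
        dt = t1 - t0
        if dt <= 0:
            return {}
        rates = {}
        for name, value in cur.items():
            delta = value - prev.get(name, 0)
            if delta < 0:  # counter went backwards: assume the server restarted
                delta = value
            rates[name] = delta / dt
        return rates


# Usage with canned values instead of a live server (names are hypothetical).
readings = iter([{"search_completed": 1000, "bind_completed": 50},
                 {"search_completed": 1600, "bind_completed": 110}])
ticks = iter([0.0, 60.0])
history = CounterHistory(fetch=lambda: next(readings), clock=lambda: next(ticks))
history.sample()
history.sample()
print(history.rates())  # {'search_completed': 10.0, 'bind_completed': 1.0}
```

If `sample()` is called every minute with `maxlen=1440`, this keeps roughly a day of history in memory. For longer retention, the samples would be better shipped to a time-series store.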
