View Issue Details

IDProjectCategoryView StatusLast Update
0003211SOGoBackend Generalpublic2015-11-03 19:47
Reporterwimmer Assigned Toludovic  
PriorityhighSeveritycrashReproducibilityalways
Status closedResolutionfixed 
Platform[Server] LinuxOSDebianOS Version8 (Jessie)
Fixed in Version2.3.3 
Summary0003211: "[ERROR] descriptor is not a socket descriptor" by using gnustep-base1.24
Description

sogod child processes lose their "port 20000" listening on Debian 8 with gnustep-base1.24. They stay running, but they are unusable. After while when sogod master process has no child listening on port 20000, SOGo goes to endless loop and fills log with lot of error messages:

May 08 17:20:41 sogod [14360]: [ERROR] <0x0x7ff1727c5cd0[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7ff17281f830> NAME:NGCouldNotAcc
eptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)

It is the same when I use sogo-2.2.9+git20141017-1 from jessie repository, sogo-2.2.17a-1 from testing repository or when I compile sogo from source code.
I tried it on two different servers.
When I downgrade gnustep-base to version 1.22.1 from wheezy and recompile sogo, problem goes away.

TagsNo tags attached.

Activities

wimmer

wimmer

2015-05-09 13:18

reporter   ~0008458

I found that this issue has no relation to gnustep-base1.24, but to libldap-2.4-2.
When I downgraded gnustep-base to 1.22, libldap-2.4-2 was downgraded due to dependencies to 2.4.31-2 together with it - and in fact this change eliminated issue.
Today I tried upgrade libldap-2.4-2 back to jessie version 2.4.40+dfsg-1 and issue came back again. When I upgraded gnustep-base to jessie version 1.24 and used downgraded libldap from wheezy 2.4.31-2, sogo works fine.

Summary - when I use all packages from jessie except libldap and I compile sogo from source code, I have no problem.
The question is if this issue has some relation to version of backend LDAP server. I will ask what version we use.

inty

inty

2015-06-09 17:10

reporter   ~0008607

Confirmed here... Just upgraded to SOGo 2.3.0 and Debian Jessie and ran into this issue. Fixed it by downgrading libldap just like wimmer did.

My LDAP server is slapd 2.4.31 (running on wheezy) if that makes any difference.

wimmer

wimmer

2015-06-18 18:49

reporter   ~0008654

This issue has no relation to version of remote LDAP server. I tried to use LDAP servers running on both Wheezy and Jessie with the same results.

It looks it has some relation to SSL.
Here is my setup - I'm running SOGo on Debian jessie with no downgraded packages - so system includes ldap packages version 2.4.40 .

1) when I define LDAP server in SOGoUserSources as:
type = ldap;
hostname = ldaps://server:637;

I have mentioned problem

2) when I defince LDAP server without SSL:
type = ldap;
hostname = ldap://localhost:387;

and I use stunnel for encrypted connection localhost:387 -> server:637
the problem is not there.
Need to say I have no problem by using ldapsearch to ldaps://server:637 from the SOGo server.

libldap-2.4.so.2 from wheezy uses libgnutls.so.26
while libldap-2.4.so.2 from jessie uses libgnutls-deb0.so.28

Phillip Leder

Phillip Leder

2015-06-23 09:52

reporter   ~0008662

As of today I also have this problem. In addtion to SOGo crashing the logs are getting spammed with
Jun 23 11:20:48 sogod [11738]: [ERROR] <0x0x7f834ce60820[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f8323e3dee0> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)
resulting in a logsize over 1GB over the course of a few hours.

If anyone could look into this problem it would be much appreciated.

pabelenda

pabelenda

2015-07-10 14:24

reporter   ~0008722

Last edited: 2015-07-10 14:26

I have experienced the same issue.

  • Operative System: Debian Jessie 64 bits
  • SOGo version: 2.2.9+git20141017-1
  • libldap: libldap-2.4-2 (2.4.40+dfsg-1)

I have followed this instructions [https://lists.inverse.ca/sogo/arc/users/2015-06/msg00176.html] in order to keep things working. Since then I have not experienced any problems. Before this, I have made some captures that may be usefull for debugging. The user activity by the moment I start debbuging was very low, only me loggin into the sogo web. Thanks in advance for checking this.

I have just catch sogo when it was starting to go crazy and generated a backtrace with gdb:

{{{
Thread 1 (Thread 0x7f6ab9825740 (LWP 3241)):
#0 0x00007f6ab60254f0 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:81
0000001 0x00007f6ab6dad11e in poll (timeout=895, nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
0000002 -[GSRunLoopCtxt pollUntil:within:] (self=<optimized out>, _cmd=0x7f6ab7192900 <_OBJC_SELECTOR_TABLE+1216>, milliseconds=895, contexts=<optimized out>) at GSRunLoopCtxt.m:430
0000003 0x00007f6ab6ceb840 in -[NSRunLoop acceptInputForMode:beforeDate:] (self=0x7f6abba7c0a0, _cmd=0x2, mode=0x7f6ab7193770 <_OBJC_INSTANCE_2>, limit_date=0x7f6abba86380) at NSRunLoop.m:1217
0000004 0x00007f6ab6cebcf2 in -[NSRunLoop runMode:beforeDate:] (self=0x7f6abba7c0a0, _cmd=0x7f6abb9ed830, mode=0x7f6ab7193770 <_OBJC_INSTANCE_2>, date=0x7f6ae05af420) at NSRunLoop.m:1288
0000005 0x00007f6ab84a7b8e in -[WOCoreApplication run] (self=0x7f6abba60a20, _cmd=<optimized out>) at WOCoreApplication.m:584
0000006 0x00007f6ab98505de in _start ()
}}}

Also attached strace

{{{
[...]
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4119) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
accept(3, 0x7f6ae4528850, [16]) = -1 ENOTSOCK (Socket operation on non-socket)
write(2, "Jul 10 15:13:22 sogod [3241]: [E"..., 241) = 241
write(2, "\n", 1) = 1
sendto(6, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(6, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4118) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
accept(3, 0x7f6ae4528e00, [16]) = -1 ENOTSOCK (Socket operation on non-socket)
write(2, "Jul 10 15:13:22 sogod [3241]: [E"..., 241) = 241
write(2, "\n", 1) = 1
sendto(6, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(6, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4117) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
accept(3, 0x7f6ae45293b0, [16]) = -1 ENOTSOCK (Socket operation on non-socket)
write(2, "Jul 10 15:13:22 sogod [3241]: [E"..., 241) = 241
write(2, "\n", 1) = 1
sendto(6, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(6, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4115) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
accept(3, 0x7f6ae4529960, [16]) = -1 ENOTSOCK (Socket operation on non-socket)
write(2, "Jul 10 15:13:22 sogod [3241]: [E"..., 241) = 241
write(2, "\n", 1) = 1
sendto(6, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(6, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4114) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
accept(3, 0x7f6ae4529f10, [16]) = -1 ENOTSOCK (Socket operation on non-socket)
write(2, "Jul 10 15:13:22 sogod [3241]: [E"..., 241) = 241
write(2, "\n", 1) = 1
sendto(6, "\0\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
sendto(6, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4
poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}], 2, 4113) = 1 ([{fd=6, revents=POLLIN}])
recvfrom(6, "\0\0\0\0", 4, 0, NULL, NULL) = 4
[...]
}}}

The log file was full of:

{{{
Jul 10 15:08:58 sogod [3231]: [ERROR] <0x0x7f6abbaad020[WOWatchDog]> No child available to handle incoming request!
Jul 10 15:08:58 sogod [3241]: [ERROR] <0x0x7f6abba0cf70[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f6abbdf7590> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)
Jul 10 15:08:58 sogod [3231]: [ERROR] <0x0x7f6abbaad020[WOWatchDog]> No child available to handle incoming request!
Jul 10 15:08:58 sogod [3241]: [ERROR] <0x0x7f6abba0cf70[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f6abbdf7810> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)
Jul 10 15:08:58 sogod [3231]: [ERROR] <0x0x7f6abbaad020[WOWatchDog]> No child available to handle incoming request!
Jul 10 15:08:58 sogod [3241]: [ERROR] <0x0x7f6abba0cf70[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f6abbdf7400> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)
Jul 10 15:08:58 sogod [3231]: [ERROR] <0x0x7f6abbaad020[WOWatchDog]> No child available to handle incoming request!
Jul 10 15:08:58 sogod [3241]: [ERROR] <0x0x7f6abba0cf70[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f6abbdf70c0> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null)
Jul 10 15:08:58 sogod [3231]: [ERROR] <0x0x7f6abbaad020[WOWatchDog]> No child available to handle incoming request!
Jul 10 15:08:58 sogod [3241]: [ERROR] <0x0x7f6abba0cf70[WOHttpAdaptor]> http server caught: <NGCouldNotAcceptException: 0x7f6abbdf67d0> NAME:NGCouldNotAcceptException REASON:Could not accept: descriptor is not a socket descriptor INFO:(null
}}}

In few minutes the log size almost reach 1 GB.

ludovic

ludovic

2015-07-22 16:46

administrator   ~0008756

I suspect ABI issues between librairies. Our chroots were updated and all integration tests work. Retry with the upcoming 2.3.1 version.

wimmer

wimmer

2015-07-22 17:15

reporter   ~0008757

I used current sogo and sope git sources just now and I see the same problem.

ludovic

ludovic

2015-07-22 17:43

administrator   ~0008758

If you DO NOT use SSL for LDAP, does it work?

wimmer

wimmer

2015-07-22 18:13

reporter   ~0008759

Yes.
I tried configuration mentioned on http://www.sogo.nu/bugs/view.php?id=3211#c8654 just now and it works without problem.

TRS-80

TRS-80

2015-07-27 11:45

reporter   ~0008775

We get the same error, plus it uses up all the RAM in the server. Versions:

ii libldap-2.4-2:i386 2.4.31+really2.4.4 i386 OpenLDAP libraries
ii libsope-ldap4.9 4.9.r1664.20150723 i386 SOPE libraries for LDAP access
ii sogo:i386 2.3.1-1 i386 a modern and scalable groupware
ii libgnutls-deb0-28:i386 3.3.8-6~bpo70+1 i386 GNU TLS library - main runtime library

mattronix

mattronix

2015-09-09 08:57

reporter   ~0008896

Last edited: 2015-09-09 09:11

so i can confirm that pinning it at a lower version 2.3.1 fixed our initial issues (LDAP login causing sogo to crash) until we started doing lots of requests to the address book then we changed sogo to use non SSL LDAP.

I can confirm all issues disappear after not using LDAPS.

ludovic

ludovic

2015-10-23 12:50

administrator   ~0009021

Here is the problem. Upon start of the sogod deamon, we have:

ls -l /proc/4193/fd

total 0
l-wx------ 1 sogo sogo 64 Oct 23 14:04 0 -> /dev/null
lrwx------ 1 sogo sogo 64 Oct 23 14:04 1 -> /dev/pts/1
lrwx------ 1 sogo sogo 64 Oct 23 14:04 2 -> /dev/pts/1
lr-x------ 1 sogo sogo 64 Oct 23 14:04 3 -> pipe:[96893]
l-wx------ 1 sogo sogo 64 Oct 23 14:04 4 -> pipe:[96893]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 5 -> socket:[97193]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 6 -> socket:[96903]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 7 -> socket:[97200]

GNUstep's NSThread instance creates a pipe for its runloop (with file descriptor 3) so that other threads can signal it.

When SOGo binds to the LDAP server, the file descriptor 3 gets closed:

#0 close () at ../sysdeps/unix/syscall-template.S:81
0000001 0x00007ffff3e70ee3 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28
0000002 0x00007ffff3e70f06 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28
0000003 0x00007ffff3dd0c56 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28
0000004 0x00007ffff3de1d4f in gnutls_global_set_mutex () from /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28
0000005 0x00007ffff3735c06 in tls_init (impl=0x7ffff394e420 <ldap_int_tls_impl>) at tls2.c:170
0000006 0x00007ffff3736ce3 in ldap_int_tls_start (ld=ld@entry=0x555555a9ff40, conn=conn@entry=0x555555aa0210, srv=srv@entry=0x555555aa0060) at tls2.c:838
0000007 0x00007ffff3710fbd in ldap_int_open_connection (ld=ld@entry=0x555555a9ff40, conn=conn@entry=0x555555aa0210, srv=0x555555aa0060, async=async@entry=0)
at open.c:448
0000008 0x00007ffff3723fad in ldap_new_connection (ld=ld@entry=0x555555a9ff40, srvlist=srvlist@entry=0x555555aa0368, use_ldsb=use_ldsb@entry=1,
connect=connect@entry=1, bind=bind@entry=0x0, m_req=m_req@entry=0, m_res=0) at request.c:487
0000009 0x00007ffff371067a in ldap_open_defconn (ld=ld@entry=0x555555a9ff40) at open.c:41
0000010 0x00007ffff3725458 in ldap_send_initial_request (ld=ld@entry=0x555555a9ff40, msgtype=msgtype@entry=96,
dn=dn@entry=0x555555c4b5a8 "uid=sogo-tests1,ou=sogo-build-tests,dc=inverse", ber=ber@entry=0x555555aa0660, msgid=msgid@entry=1) at request.c:130
0000011 0x00007ffff371a005 in ldap_sasl_bind (ld=0x555555a9ff40, dn=0x555555c4b5a8 "uid=sogo-tests1,ou=sogo-build-tests,dc=inverse", mechanism=<optimized out>,
cred=0x7fffffffbbf0, sctrls=0x0, cctrls=<optimized out>, msgidp=0x7fffffffbc0c) at sasl.c:148
0000012 0x00007ffff659ba42 in -[NGLdapConnection bindWithMethod:binddn:credentials:] (self=0x5555559052a0, _cmd=0x7ffff7dbe180 <_OBJC_SELECTOR_TABLE+640>,
_method=0x7ffff7dbc780 <_OBJC_INSTANCE_54>, _login=0x555555aa01c0, _cred=0x555555885f70) at NGLdapConnection.m:303
0000013 0x00007ffff7b53500 in -[LDAPSource checkLogin:password:perr:expire:grace:] (self=0x555555adeec0, _cmd=0x7ffff7db9f40 <_OBJC_SELECTOR_TABLE+704>,
_login=0x555555885f30, _pwd=0x555555885f70, _perr=0x7fffffffbfbc, _expire=0x7fffffffbfb8, _grace=0x7fffffffbfb4) at LDAPSource.m:563

So the file descriptor gets bogus - as it gets replaced by /dev/urandom for TLS handling:

ls -l /proc/4193/fd

total 0
l-wx------ 1 sogo sogo 64 Oct 23 14:04 0 -> /dev/null
lrwx------ 1 sogo sogo 64 Oct 23 14:04 1 -> /dev/pts/1
lrwx------ 1 sogo sogo 64 Oct 23 14:21 10 -> socket:[97266]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 2 -> /dev/pts/1
lr-x------ 1 sogo sogo 64 Oct 23 14:04 3 -> /dev/urandom
l-wx------ 1 sogo sogo 64 Oct 23 14:04 4 -> pipe:[96893]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 5 -> socket:[97193]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 6 -> socket:[96903]
lrwx------ 1 sogo sogo 64 Oct 23 14:04 7 -> socket:[97260]
lrwx------ 1 sogo sogo 64 Oct 23 14:21 8 -> socket:[97262]
lrwx------ 1 sogo sogo 64 Oct 23 14:21 9 -> socket:[97264]

This makes GNUstep enter into an infinite loop - since /dev/urandom is non-blocking and provides a continuous stream of byte.

The culprit here is either libgnutls or libldap passing wrong data to libgnutls. Latest versions of both (coming from Stretch) - 2.4.42+dfsg-2 for libldap and 3.3.18-1 for libgnutls also expose the issue.

We will continue our investigation a bit, but a work around in SOPE/SOGo is almost impossible for this bug.

ludovic

ludovic

2015-11-03 19:47

administrator   ~0009070

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=803197

We reverted a patch in SOPE but the real issue should be fixed in libldap.

Try the nightly builds and report issues, if any.

Issue History

Date Modified Username Field Change
2015-05-08 19:55 wimmer New Issue
2015-05-09 13:18 wimmer Note Added: 0008458
2015-06-09 17:10 inty Note Added: 0008607
2015-06-18 18:49 wimmer Note Added: 0008654
2015-06-23 09:52 Phillip Leder Note Added: 0008662
2015-07-10 14:24 pabelenda Note Added: 0008722
2015-07-10 14:25 pabelenda Note Edited: 0008722
2015-07-10 14:26 pabelenda Note Edited: 0008722
2015-07-10 14:26 pabelenda Note Edited: 0008722
2015-07-22 16:46 ludovic Note Added: 0008756
2015-07-22 16:46 ludovic Status new => resolved
2015-07-22 16:46 ludovic Fixed in Version => 2.3.1
2015-07-22 16:46 ludovic Resolution open => fixed
2015-07-22 16:46 ludovic Assigned To => ludovic
2015-07-22 17:15 wimmer Note Added: 0008757
2015-07-22 17:15 wimmer Status resolved => feedback
2015-07-22 17:15 wimmer Resolution fixed => reopened
2015-07-22 17:43 ludovic Note Added: 0008758
2015-07-22 18:13 wimmer Note Added: 0008759
2015-07-22 18:13 wimmer Status feedback => assigned
2015-07-27 11:45 TRS-80 Note Added: 0008775
2015-09-09 08:57 mattronix Note Added: 0008896
2015-09-09 09:09 mattronix Note Edited: 0008896
2015-09-09 09:11 mattronix Note Edited: 0008896
2015-10-23 12:50 ludovic Note Added: 0009021
2015-11-03 19:47 ludovic Note Added: 0009070
2015-11-03 19:47 ludovic Status assigned => closed
2015-11-03 19:47 ludovic Resolution reopened => fixed
2015-11-03 19:47 ludovic Fixed in Version 2.3.1 => 2.3.3