View Issue Details

ID: 0003446
Project: SOGo
Category: SOPE
View Status: public
Last Update: 2018-06-12 11:55
Reporter: lpouzenc
Assigned To: ludovic
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Platform: [Server] Linux
OS: Debian
OS Version: 8 (Jessie)
Product Version: 2.2.9
Fixed in Version: 4.0.1
Summary: 0003446: Stuck sogod processes, high CPU, no syscalls: infinite loop while reading SSL socket
Description

Hi,

We installed sogo with apt-get on Debian Jessie, configured it with LDAP + CAS, and put it into production. We now see sogod processes from the pool stuck at high CPU usage. This happens many times per hour with 3500 potential users (but not many parallel requests seen in the logs).

On those processes, strace -p <pid> shows no syscalls.

gdbserver / ddd shows an infinite loop in sope-2.2.9/sope-core/NGStreams/NGByteBuffer.m at line 247, in a curiously named function la() with some "//TODO" comments in it, copyrighted from 2000 to 2005.

The infinite loop, with some parts elided:

    readStream = YES;
    while (readStream) {
      desiredBytes = 738;
      cntReadBytes = self->readBytes(... , desiredBytes); // readBytes always returns 0
      if (cntReadBytes == NGStreamError) {
        break; // Never reached
      } else {
        if (cntReadBytes == desiredBytes) {
          readStream = NO; // Never reached
        } else {
          while (cntReadBytes > 0) {
            // [...] // Never reached
          }
        }
      }
    }

The readBytes(...) function calls GnuTLS almost directly:

    - (unsigned)readBytes:(void *)_buf count:(unsigned)_len {
      // [...]
      ret = gnutls_record_recv((gnutls_session_t) self->session, _buf, _len);
      if (ret < 0)
        return NGStreamError;
      else
        return ret;
    }
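
To illustrate why a return value of 0 matters here, the following is a minimal sketch in plain C of a read loop that stops on EOF instead of spinning. It is not SOPE's code and not the fix that was eventually applied. Every name in it (sketch_read_up_to, sketch_read_fn, SKETCH_STREAM_ERROR, fake_stream, fake_read) is hypothetical and exists only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not SOPE's actual definitions. */
#define SKETCH_STREAM_ERROR (-1L)
typedef long (*sketch_read_fn)(void *ctx, void *buf, size_t len);

/*
 * Reads until `want` bytes have arrived, the reader reports an error
 * (negative return), or the reader reports EOF (return of 0).
 * Returns the number of bytes read, or SKETCH_STREAM_ERROR on error.
 */
long sketch_read_up_to(sketch_read_fn rd, void *ctx,
                       unsigned char *buf, size_t want) {
    size_t got = 0;
    while (got < want) {
        long n = rd(ctx, buf + got, want - got);
        if (n < 0)
            return SKETCH_STREAM_ERROR;
        if (n == 0)
            break;              /* EOF: looping again would spin forever */
        got += (size_t)n;
    }
    return (long)got;
}

/* Fake stream for the example: yields up to `chunk` bytes per call, then 0. */
struct fake_stream { size_t remaining; size_t chunk; unsigned calls; };

long fake_read(void *ctx, void *buf, size_t len) {
    struct fake_stream *s = (struct fake_stream *)ctx;
    size_t n = s->remaining < s->chunk ? s->remaining : s->chunk;
    if (n > len)
        n = len;
    memset(buf, 'x', n);
    s->remaining -= n;
    s->calls++;
    return (long)n;
}
```

Calling sketch_read_up_to(fake_read, &s, buf, 738) on a fake_stream that has already reached EOF returns 0 after a single read call, whereas the loop quoted above would keep calling readBytes indefinitely.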

Steps To Reproduce

I'm unsure about the particular context that causes gnutls_record_recv() to constantly return 0, but whenever that happens, the infinite loop in NGByteBuffer.m is always reproducible.

Could it be a lost connection with a user on WiFi or similar?

Additional Information

gnutls may have slightly changed its behavior across versions regarding null reads / EOF?

The gnutls_record_recv man page from Jessie's gnutls-doc (3.3.8-6+deb8u3) says:

RETURNS
The number of bytes received and zero on EOF (for stream connections). A negative error code is returned in case of an error. The number of bytes received might be less than the requested data_size.
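
The three cases in that RETURNS text can be kept apart explicitly. The sketch below is plain C and only classifies a return value in the way the man page describes. The names recv_outcome, RECV_DATA, RECV_EOF, RECV_ERROR and classify_recv are hypothetical and belong neither to GnuTLS nor to SOPE.

```c
/* Hypothetical names for illustration; not part of GnuTLS or SOPE. */
enum recv_outcome { RECV_DATA, RECV_EOF, RECV_ERROR };

/*
 * Maps a gnutls_record_recv()-style return value to the three cases
 * described in the man page: bytes received, EOF, or a negative error code.
 */
enum recv_outcome classify_recv(long ret) {
    if (ret > 0)
        return RECV_DATA;   /* may be fewer bytes than requested */
    if (ret == 0)
        return RECV_EOF;    /* stream ended; not an error */
    return RECV_ERROR;      /* negative error code */
}
```

Keeping EOF as its own outcome lets a caller stop reading without also going through its error handling.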

Tags: No tags attached.

Activities

lpouzenc

2016-01-21 05:45

reporter  

fix-sogo-infinite-loops.patch (573 bytes)   
Fix : NGActiveSSLSocket.m causes infinite loop on some sockets in readBytes()
Index: sope-2.2.9/sope-core/NGStreams/NGActiveSSLSocket.m
===================================================================
--- sope-2.2.9.orig/sope-core/NGStreams/NGActiveSSLSocket.m	2014-09-26 20:38:11.000000000 +0200
+++ sope-2.2.9/sope-core/NGStreams/NGActiveSSLSocket.m	2016-01-20 09:19:06.059619399 +0100
@@ -92,7 +92,7 @@
 
 
   ret = gnutls_record_recv((gnutls_session_t) self->session, _buf, _len);
-  if (ret < 0)
+  if (ret <= 0)
     return NGStreamError;
   else
     return ret;
lpouzenc

2016-01-21 05:53

reporter   ~0009308

Please find attached a patch that eradicates the stuck processes on my production setup. It treats the "read after EOF" case as NGStreamError.

No side effects found in the last 2 days, but consider it highly experimental.

Hoping for the best,
Ludovic

ludovic

2016-04-08 12:50

administrator   ~0009944

I don't think it's the right fix.

gnutls_record_recv() returns 0 because EOF is reached, not because there's an error (< 0). So that code leads to unwanted code paths.

ludovic

2018-06-12 11:55

administrator   ~0012915

I fixed a similar issue in SOPE for OpenSSL (https://github.com/inverse-inc/sope/commit/2f26952009f622f97a43921a6cfdafb79b8f46f6), where SSL_read clearly gives a return value of 0 a special error meaning.

Issue History

Date Modified Username Field Change
2016-01-19 12:07 lpouzenc New Issue
2016-01-21 05:45 lpouzenc File Added: fix-sogo-infinite-loops.patch
2016-01-21 05:53 lpouzenc Note Added: 0009308
2016-04-08 12:50 ludovic Note Added: 0009944
2016-04-08 12:50 ludovic Severity major => minor
2018-06-12 11:55 ludovic Note Added: 0012915
2018-06-12 11:55 ludovic Status new => closed
2018-06-12 11:55 ludovic Assigned To => ludovic
2018-06-12 11:55 ludovic Resolution open => fixed
2018-06-12 11:55 ludovic Fixed in Version => 4.0.1