Solvedicinga2 2.11 RC1: Nessus Scan crash the Windows-Client.

Describe the bug

We tested the 2.11 RC1 against the Nessus Scan again. So it's a litte bit a follow up from #6559: the Linux Client survived on CentOS 7 - Great! But not on our Windows System. Here the service is stopped after the scan. 2.11 RC1 is installed on a Windows Server 2012 R2 Standard.

If you Need something more pls tell us.

To Reproduce

Scan with a security scanner against a Windows system

Expected behavior

The Windows Client should also not Crash like the Linux Client.

Your Environment

Include as many relevant details about the environment you experienced the problem in

  • Version used (icinga2 --version): 2.11 RC1
  • Operating System and version: Windows Server 2012 R2 Standard with .NET Framework 4.8
19 Answers

✔️Accepted Answer

TL;DR - 2 weeks with emotional ups and downs and a lot of technical background learned. Icinga now survives a Nessus scan.

Screen Shot 2019-09-09 at 17 46 50

Analysis

Setup

Scanner Setup

Both Nessus and OpenVAS work. Thanks to Stefan & Philipp for providing a setup documentation for Nessus Essential. Won't be provided for security reasons here.

Wireshark TCP Dump

https://www.netways.de/blog/2018/07/26/ssl-geknackt-naja-fast/

Change the cipher_list, add the node's private key for following the TLS traffic, extract the sent malformed requests.
This worked to mitigate the request which caused the immediate crash. Unfortunately this isn't always the same, and we found another way
to trigger the crash as well.

Request Analysis

Only failing requests would lead to stack buffer underrun or memory corruption errors.

Memory Analysis

  • ucrtbase.dll is the Windows Universal C Runtime. Something like libstdc++ on Linux, where all memory allocations etc. are provided.
  • A crash within new() or free() typically means that the application does not crash from a current problem, but memory is already corrupted

Origin

Boost Beast HTTP Parser

http::async_read_header may throw exceptions leading into crashes. In order not to populate the exception
stack, it sounds reasonable to move to error_code handling anyways.

Boost ASIO Socket IO

Same strand in m_IoStrand as with other operations. No special influence here.

boostorg/asio#154
https://www.boost.org/doc/libs/1_47_0/doc/html/boost_asio/overview/core/strands.html

http://www.crazygaze.com/blog/2016/03/17/how-strands-work-and-why-you-should-use-them/

Boost Coroutines

ASIO wraps them and makes them easier to use.

https://stackoverflow.com/questions/30557582/what-does-boostasiospawn-do

Coroutine Stack Unwinding on Complete

https://stackoverflow.com/questions/9286550/is-there-any-method-that-causes-whole-stack-frame-unwinding-in-c-except-usin

However, I've found some critical flaw. Any user code that "swallow exception" as below will completely break the assumption of cooperative scheduling.

try {
    ...
} catch(...) { // ContextClosingException 
    // do nothing, just swallow exception.
}

So I need to find other way to invoke stack unwinding (or any other way to destruct stack object in the continuation). Standard conformance way would be nice - but the continuation implementation itself is dependent on platform specific API, so non-portable way would be acceptable. (I'm using win32)

Swallowing exceptions is evil with Boost Coroutines.

Root Cause

Coroutines work in the way that they allocate their own stack. Whenever an exception is thrown,
it is expected to be caught or to terminate the coroutine.

Exceptions are asynchronous and put onto the stack. This stack is also used by the Coroutine to
store the function pointer when yielding the operation.

At some point, the thrown exceptions corrupt the stack. Only on Windows, only with Visual Studio as compiler.

In order to trigger this problem, either fire many Nessus Scan Attacks against the HTTP API.
Or you'll go with something via curl/debug console which always triggers an exception.

The easiest way to reproduce this is to use the debug console and evaluate a script expression
which is incorrect. At first glance, the generated ScriptError exception somehow works. Later iterations
always lead into a crash.

C:\Program Files\ICINGA2\sbin> .\icinga2.exe console --connect 'https://root:icinga@localhost:5665/' --eval 'get_host("bla)'

Note

Rewriting the code parts which invoke std::invalid_argument do not fix this, the
memory is still corrupted. Also, EnsureValidHeaders() is passed without throwing
any exception.

We really need to work on the exceptions themselves.

Coroutines and Context Stack

Our (exception) stack is foobar at some point. Either Beast exceptions throw, our own ScriptError exceptions throw and something asio related itself.

One can manipulate the coroutine stack size allocated by Boost.Context. This defaults to 64KB.

Default constructor using boost::context::default_stacksize(), does unwind the stack after coroutine/generator is complete.

https://gist.github.com/chfast/bfc1e1af4176960b4722#file-context-cpp-L85

typedef ctx::simple_stack_allocator<
    8 * 1024 * 1024, // 8MB
    64 * 1024, // 64kB
    8 * 1024 // 8kB
>       stack_allocator;

https://emcl-gitlab.iwr.uni-heidelberg.de/hiflow3.org/hiflow3/blob/c7a3ea7216a9cb15a638112dfb17732e932cc265/contrib/boost_libraries/libs/context/performance/winfiber/performance.cpp

Sets the default stack size to 8 MB. 64MB. Does not immediately crash, but later.

https://www.boost.org/doc/libs/1_68_0/libs/context/doc/html/context/stack.html

Exceptions

http://boost.2283326.n4.nabble.com/corountine2-context-forced-unwind-exception-on-Windows-with-TDM-GCC-td4706583.html

forced_unwind is part of the internal plumbing of Boost.Context and can 
be thrown from any yield point when the coroutine is destroyed while 
suspended.  It's required that this exception is not swallowed and the 
whole call stack between the yield point and the "outside" of the 
coroutine is not noexcept. 

raptorjit/raptorjit#110

https://asio-users.narkive.com/QP9g9dj9/can-t-catch-boost-asio-exceptions-inside-coroutines

Are you linking to pre-built boost DLL's provided by MXE? If so, the
DLL is probably not seeing the exception handler created by your
program, probably because the compiler is not creating the exception
handler in the way expected by the DLL. If that is the case, there
might be some compiler command line options that would fix it, or
possibly the compiler was not built correctly to support exceptions
coming from DLL's. Linking to a static boost library instead of a DLL
might fix it though either way.

http://boost.2283326.n4.nabble.com/context-Crash-in-case-of-exception-from-a-context-td4672953.html

[context] Crash in case of exception from a context

I use boost::context to implement cooperative multitasking. At the moment I use boost 1.57 on Windows 7 with VisualStudio 2013. 

Everything works fine until I throw an exception from a context created by "boost::context::make_fcontext(.......)".  I have seen
someone had the same problem and it was caused by the Windows OS, that checks the exception hierarchy in the stack and if it finds
that the starting point is not the one that should come from the OS then it terminates the process.

It suspects some malitious code that corrupted the stack. 

https://www.acodersjourney.com/top-15-c-exception-handling-mistakes-avoid/

Beyond knowing how to use the try/catch syntax, one of the fundamental concepts to know regarding C++ exception handling is the concept
of Stack Unwinding.

When an exception is thrown and control passes from a try block to a handler, the C++ run time calls destructors for all automatic objects
constructed since the beginning of the try block. This process is called stack unwinding. The automatic objects are destroyed in reverse order
of their construction. If an exception is thrown during construction of an object consisting of sub-objects or array elements, destructors are
only called for those sub-objects or array elements successfully constructed before the exception was thrown.

Why should you know this ? Because this'll help you understand the exception handling tips and tricks for making your code robust and efficient.
A full discussion of the Stack Unwinding process is beyond the scope of this article  but here is an excellent reference from msdn : 
https://msdn.microsoft.com/en-us/library/hh254939.aspx.

https://docs.microsoft.com/en-us/cpp/cpp/exceptions-and-stack-unwinding-in-cpp?view=vs-2019

Mistake # 15: Swallowing Exceptions
Swallowing critical exceptions will cause your program to do either of two thingsto fail in unexpected ways downstream or prevent the program from fulfilling its purpose. Sometimes programmers will catch any exception via catch() and then swallow them . This is usually done for exceptions that the programmer did not foresee happening. However, this can lead to downstream failuresometimes with no obvious reason for the failure since the stacktrace disappears with the swallowed exception.

If you must swallow exceptions, make sure that you Log the exception as well as document them in code and in your documentation. 

https://groups.google.com/forum/#!topic/boost-list/6oHRdLmuQ5k

The documentation for boost context contains the following warning: "Do not jump from inside a catch block and then re-throw the exception in another continuation. "
I can attempt to answer "why" from a high-level point of view: I think is that it could be summarized as: the exception machinery is not reentrant. Throwing and catching exceptions performs a lot of messy manipulations with the data on the stack. During exception unwinding, you have to traverse the stack (a single-linked list) during which 1) a check is made at each level for whether the current stack frame has an exception handler, and if not 2) call destructors for local objects. Stack traversal is implemented by the runtime and even possibly relies on the OS' data structures (in case of Windows) and *might* use also thread-local storage (e.g., exception_ptr).


Switching the stack aborts this process without informing the run-time exception machinery and.. what happens when you resume it? What if another exceptions has been thrown and handled on the same thread before resuming a suspended exception handler? A handler which is written with the assumption that only a single active invocation can exist on any particular thread?


There are other concerns too: on Windows you can get asynchronous callbacks from the kernel (akin to unix signals, just nicer) and if you switch the stack when called from the kernel, you can really hose your process.

Incompatibility between std, boost and ScriptError exceptions?

I've found a library which wraps the coroutine exceptions from their different types.

https://github.com/niekbouman/commelec-api/blob/master/commelec-api/coroutine-exception.hpp#L35

The implementation is safe and good to use - since I've now learned that coroutines only support
Boost exceptions and nothing else.

Compiler Optimization

The Visual Studio Compiler optimizes several bits in a way we as developers do not expect it.
Keep in mind, the underlaying code works 100% on Linux/Unix when compiled with GCC.

In terms of Boost, we need to disable certain flags already:

# Disable optimization for Boost::context
# https://www.boost.org/doc/libs/1_69_0/libs/context/doc/html/context/overview.html
# https://docs.microsoft.com/en-us/cpp/build/reference/gl-whole-program-optimization?view=vs-2017
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /bigobj /GL- /EHs")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /bigobj /GL- /EHs")

# detect if 32-bit target
if(CMAKE_VS_PLATFORM_NAME STREQUAL "Win32")
  # SAFESEH is not supported in Boost on Windows x86
  # maybe it is when Boost is compiled with it...
  # https://lists.boost.org/Archives/boost/2013/10/206720.php
  # https://docs.microsoft.com/en-us/cpp/build/reference/safeseh-image-has-safe-exception-handlers?view=vs-2017
  set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} /SAFESEH:NO")
endif()

These optimizations don't have much of influence here.

Instead, I've found some compiler optimizations going on to reduce the binary sizes.
This includes improving the exception handling and stack unwinding operations.

https://habr.com/en/company/microsoft/blog/442996/
https://developercommunity.visualstudio.com/content/problem/273518/incorrect-stack-unwinding-after-throwing-exception.html

https://www.boost.org/doc/libs/1_68_0/libs/context/doc/html/context/ff/implementations__fcontext_t__ucontext_t_and_winfiber.html

Because the TIB (thread information block on Windows) is not fully described in the MSDN, it might be possible that not all required TIB-parts are swapped. Using WinFiber implementation migh be an alternative.

ASIO IO Threads manipulating memory?

https://stackoverflow.com/questions/14981291/boostasio-internal-threads

On Windows, defining BOOST_ASIO_DISABLE_IOCP will cause Asio to use the select-based implementation, which is slower but should not spawn additional threads. 

Tested this, but it did not fix the crash. Won't apply the patch on Windows therefore.

Windows Stack

Well, turns out that the default allocated stack size for a coroutine is 64 KB. Whenever it has exceptions inside, this stack may be too low.

https://stackoverflow.com/questions/51110650/minimal-possible-stack-size-on-windows-when-using-c-exceptions-using-boost-co

Raising the default coroutine stack on Windows actually fixes the problem.

https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2019
https://devblogs.microsoft.com/cppblog/making-cpp-exception-handling-smaller-x64/

Final Analysis

Statement: The crash only happens when an error is raised.

Whenever valid requests are made, everything is fine.

Statement: When something doesn't work, it throws an exception.

Boost Beast http::async_read_header can throw an exception. We can re-build this into error codes. Doesn't change.
The debug console triggers a ScriptError raised from the config compiler.
Wrong header reading triggers std::invalid_argument()

Not all of them are boost compatible exceptions.

Statement: Whenever an exception happens, the coroutine returns where called and unwinds the stack.

This includes a stack deallocate() call which calls std::free(). This crashes with pointing to memory which is corrupted.

Statement: Throwing std::invalid_argument() on Windows corrupts the exception stack.

Yes. Rewriting EnsureValidHeaders() to not use these exceptions allows the code to complete.

No fucking idea what the Visual Studio compiler renders from this code bit. It may originate from static INLINE too.

Statement: ScriptError still causes the application to crash.

The stack size with 64 KB is too low. Why? Look into what's stored inside the exception. The whole DebugInfo struct is massively bloated.

It may also be the case that exceptions inside coroutines cause an overhead which is
timing related to many requests exceeding the limit.

Statement: Swallowing exceptions with catch(...) is evil.

Yes, Boost Coroutine doesn't work with the unwind exception.

Statement: Defer-Disconnect with exceptions swalled in the Destructor don't work.

Yes, these two patterns are incompatible within the new network stack. Avoid them at all cost.

Statement: It is unclear which coroutine stack size is needed for Windows and MSVC.

Tests have proven something between 8 and 64 MB. This still is too high, but mitigates the issue.
With all the error handling and code alignments we are even more safe, also in regard to Linux/Unix.
A windows agent typically doesn't expose much HTTP traffic (only scanners), JSON-RPC also is low (2 parent satellites).

Fixes

Ensure that Coroutine Stack unwinding is fully supported within the API.

  • HttpServerConnection: Prefer error codes over Boost exceptions #7477
  • JsonRpcConnection: Don't swallow exceptions in Boost.Coroutine #7483
  • Rewrite error handling in HttpServerConnection#EnsureValidHeaders() #7486
  • Avoid the Defer-Disconnect destructor pattern with Boost.Coroutines #7485
  • Re-throw boost::coroutines::detail::forced_unwind #7351

Ensure that HTTP Disconnects don't fire I/O.

  • API: Avoid I/O on shutdown #7487

Wrap thrown exceptions into Boost compatible exceptions in boost::asio::spawn().
Windows coroutine stack size: Increase stack size to prevent crashes with MSVC.

  • Introduce IoEngine::SpawnCoroutine wrapping asio::spawn and Boost exceptions #7491

Side Fixes

  • Quality: Replace deprecated Boost IO service code #7490
  • Dev Docs: Update Windows to Visual Studio 2019 #7480

References and URLs

Everything I have had opened and shed some light into finding the cause.

https://geidav.wordpress.com/2013/03/21/anatomy-of-dynamic-stack-allocations/
https://docs.microsoft.com/en-us/windows/uwp/cpp-and-winrt-apis/concurrency-2
https://www.boost.org/doc/libs/1_62_0/libs/fiber/doc/html/fiber/stack.html
https://stackoverflow.com/a/45027110/1821348
https://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio/overview/core/spawn.html
https://docs.microsoft.com/de-de/cpp/cpp/exception-specifications-throw-cpp?view=vs-2019
https://stackoverflow.com/questions/9286550/is-there-any-method-that-causes-whole-stack-frame-unwinding-in-c-except-usin
https://gist.github.com/chfast/bfc1e1af4176960b4722#file-context-cpp-L85
boostorg/beast#1055
https://stackoverflow.com/questions/41030285/boostproperty-treeread-xml-segfaults-in-an-asio-handler-spawned-using-boost
https://www.boost.org/doc/libs/1_66_0/libs/coroutine/doc/html/coroutine/attributes.html
boostorg/beast#1209
https://www.boost.org/doc/libs/1_68_0/libs/context/doc/html/context/stack.html
boostorg/coroutine2#19
http://www.mikemarcin.com/post/coroutine_a_million_stacks/
https://docs.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases#memory_limits
mxe/mxe#1559
http://www.blinnov.com/en/2013/04/11/english-boostcontext-and-seh/
http://boost.2283326.n4.nabble.com/context-Crash-in-case-of-exception-from-a-context-td4672953.html
https://news.ycombinator.com/item?id=11781201
https://rethinkdb.com/blog/improving-a-large-c-project-with-coroutines/
https://github.com/facebook/folly/tree/master/folly/fibers
https://stackoverflow.com/questions/46738872/weird-after-exit-error-on-coroutine-test-module-on-windows-x64
https://stackoverflow.com/questions/50429345/fast-fibers-coroutines-under-x64-windows
https://www.boost.org/doc/libs/1_65_1/libs/fiber/doc/html/fiber/performance.html
https://paoloseverini.wordpress.com/tag/fibers/
https://news.ycombinator.com/item?id=11781201
https://stackoverflow.com/questions/49036542/visual-c-2017-bug-compiler-optimizes-away-expression?noredirect=1#comment85079616_49036542
https://docs.microsoft.com/de-de/cpp/build/reference/f-set-stack-size?view=vs-2019
https://developercommunity.visualstudio.com/content/problem/345025/heap-corruption-in-stack-unwinding-when-inlining-f.html
https://www-user.tu-chemnitz.de/~heha/viewchm.php/hs/Win32SEH.chm/

Libraries

https://github.com/owt5008137/libcopp
https://github.com/yyzybb537/libgo

Google Searches

Newest first.

6.9.2019 

boost coroutine std::exception problem
boost thread default stack size windows
boost fibers coroutine stack size
windows c++ disable exception hierarchy in the stack
BOOST_ASIO_DISABLE_THREADS
boost coroutine exceptions windows
windows export linked libraries of binary
boost coroutine windows visual studio std::invalid_argument exceptions stack corruption site:stackoverflow.com
boost coroutine windows visual studio std::invalid_argument exceptions stack corruption
windows visual studio std::invalid_argument exceptions stack corruption
boost coroutine stack with exception templates windows
boost coroutine stack with exception templates
boost coroutine STATIC_POOL_SIZE
BOOST_PROPERTY_TREE_RAPIDXML_STATIC_POOL_SIZE
visual studio threadsanitizer
windows visual studio use clang++ cmake
boost context windows visual studio stack corruption
boost context windows stack corruption
windows visual studio stackful coroutines
detached coroutine boost asio
boost asio socket cancel io strand
boost asio spawn avoid exceptions windows visual studio
boost asio spawn avoid exceptions windows
coroutine do not unwind stack windows misaligned
coroutine do not unwind stack windows
boost coroutines stack unwind option
boost windows fibers stack size performance
windows visual studio optimizer couroutines stack size
windows visual studio compiler stack guards optimization
windows visual studio compiler stack optimization
cmake windows disable stack guard
fno stack protector
windows exception stack guards
windows allocate stack guards
boost::context::default_stacksize
boost::context::default_stacksize windows
boost::coroutines::attributes
boost::asio::spawn default coroutines stack size
boost coroutine exceptions windows
boost beast exception coroutine windows stack
boost asio coroutine exception stack windows
boost asio coroutine exception stack
boost asio io strand spawn
boost asio io strand spawn windows exception stack broken
visual studio optimization boost asio spawn
visual studio c++ lambda context optimized unused
visual studio lambda context optimized unused
windows crash in register_onexit_function
error C2491: 'boost::unit_test::unit_test_main': definition of dllimport function not allowed
boost::uuids BCryptOpenAlgorithmProvider
windows boost Boost_USE_STATIC_LIBS
windows boost context static lib crash
Og compiler flag
visual studio disable seh exceptions boost asio
visual studio disable seh exceptions
boost asio spawn exception memory violation windows
visual studio c++ exception types
visual studio exception types
visual studio set exception handling boost context asio
boost coroutine exceptions windows
boost coroutines basic_standard_stack_allocator deallocate memory violation
CMAKE_CXX_FLAGS cmake windows wrong cache
cmake visual studio compiler flags
cmake visual studio compile /GL-
windows msvc turn off global program optimization (/GL)
boost context stack unwound windows
visual studio boost context /Og
boost context increase stack size windows
windows c++ set stack size
windows exception execution_context_v2 in Boost Context
boost binaries context safeseh
disable compiler optimization visual studio 2017 c++
boost context visual studio 2017 crash
cmake windows override /EHsc
basic_standard_stack_allocator boost windows crash
compile boost context windows
visual studio 2017 memory profiler
windows c++ print memory usage from self
windows boost asio context out of memory
boost asio spawn shared_from_this
C++ exception: boost::wrapexcept<boost::system::system_error> at memory location
visual studio exception thrown at memory location
endless http::async_read_header
boost beast async_read_header utf8 problem
 Microsoft C++ exception: boost::system::system_error at memory location
boost asio error code
boost::asio::error::operation_aborted
BOOST_ASIO_DISABLE_IOCP
windows boost asio bad_address
windows bad_address EFAULT
boost::system::error_code bad_address windows
boost::system::error_code access violation windows
boost beast win_iocp
boost beast http::async_read_header ec null pointer
boost beast http::async_read exception
boost asio http::async_read exception windows invalid memory access
Microsoft C++ exception: boost::system::system_error at memory location

28.8.2019

lowest_layer().cancel()
Der E/A-Vorgang wurde wegen eines Threadendes oder einer Anwendungsanforderung abgebrochen
boost http::async_read_header ucrtbase crash windows
windows analyse wer dump file visual studio 2017
reset nessus You've now scanned over 16 different IP addresses over time, and Nessus will not let you scan any additi... 

Other Answers:

Both Stefan and Philipp are doing a marvelous job here - they provided me with a full self-written PDF on how to install and configure Nessus Essentials to trigger the full scan.

Previously I tested with a macOS install, but with a wrong subnet Nessus locked me out of the trial (yeah, enterprise software which always wants you to buy something). Now I am using a CentOS 7 Vagrant box.

I can see that Icinga stops running when such a scan happens. This is good news, so I can dig deeper.

For security reasons, I am not sharing any details about Nessus and its configuration nor the exact logs for triggering things until I have found and implemented a fix.

More Issues: