When C++ meets SAL annotations

SAL annotations introduced by Microsoft is an interesting attempt to improve security and reliability of C/C++ code. They fill the gap between human-readable documentation and documentation that compiler is able to understand. Understanding actually means that compiler can apply static analysis and check code against common mistakes like null pointer dereferences, buffer overflows, etc. To be honest, compiler is also able to analyze code without SAL – but accuracy of such analysis is questionable at best. SAL annotations allows compiler to make analysis more accurate without tons of false positives. I had recently an occasion to check myself how code analysis works in VS2012 for real world C code. It seem that it performs pretty good – for example it was able to detect deadlock in similar (more complicated) code:

BOOL func()
{
	EnterCriticalSection(&cs);
	if(var)
	{
		return FALSE;
	}
	LeaveCriticalSection(&cs);
	return TRUE;
}

Code analysis tool knows that EnterCriticalSection acquires lock so it will give me warning: C26115: Failing to release lock ‘cs’ in function ‘func’.

But this is oldschool C coding style. Now we have 21st century and we are using C++11 in brave new world of information technology. Let’s see how VS2012 code analysis works with new C++ standard.

Deadlock detection
In first test I will modify previous code to use std::mutex instead of CRITICAL_SECTION. Pretty innocent change.

BOOL func()
{
	mtx.lock();
	if(var)
	{
		return false;
	}
	mtx.unlock();
	return true;
}

After running code analysis, VS2012 is not able to detect any flaws. It seems that compiler does not understand C++ locking primitives, even if it uses WinApi under the hood. Fortunately std::mutex implementation performs runtime checks that can detect mistake. But trading compilation error for runtime error is not good deal (especially in production code).

NULL dereference
Another simple example is NULL pointer dereference. VS2012 code analysis is able to check this kind of bug even without SAL annotations (if pointer is local variable):

	char* p = 0;
	if(var) p = (char*)malloc(10);
	strcpy_s(p, 10, "hello");

As expected, compiler shows warning: C6387: ‘p’ could be ‘0’.
This code is of course in poor taste provided that we are coding in C++11. I will use std::unique_ptr instead of raw pointers, which will help to avoid memory leaks:

	std::unique_ptr<char[]>  p = 0;
	if(var) p.reset(new (std::nothrow) char[10]);
	strcpy_s(p.get(), 10, "hello");

After running code analysis, again no flaws detected. std::unique_ptr does not use SAL annotations so compiler cannot determine if p.get() can return NULL. To avoid false positives this situation is just ignored.
But what if get() method would be declared with _Maybenull_ annotation? Well, in this case you will get false positives all the time, even if p will be correct non-null pointer (yeah I’ve checked it). So this is definitely not a solution.

New overload
It seems that VS2012 is at least able to deal correctly with new operator:

  • dereference of memory allocated by new (std::nothrows) will raise warning
  • dereference of memory allocated by default new will pass

But what if we have to use overloaded new that does not throw? I know this is not very good idea to overload new in this way (it will cause problem with STL containers), but unfortunately, sometimes in legacy code it is a must. So let’s check how overloaded new will be treated:

_Maybenull_ void* operator new(size_t size)
{
	return malloc(size);
}

int _tmain(int argc, _TCHAR* argv[])
{
	char* p = (char*)new char;
	*p = 1;
	return 0;
}

Guess what? No warning at all. Visual Studio code analysis assumes that new always returns correct pointer. It totally ignores overloaded new definition.

C++ Paradox
Some wise computer scientist said that all problems in computer science can be solved by another level of indirection. Unfortunately in case of static code analysis this is not true – another level of indirection introduced by C++ abstractions (like smart pointers, portable mutexes, etc.) makes static analysis much harder. There is paradox in here: you use C++ standard library to make code more safe, only to realize that now your code analysis tools are useless. Maybe in some distant future Visual Studio will be able to verify C++ code more or less correctly. But currently all what you can do is to choose lesser evil: resign from automatic code analysis or resign from C++ standard library classes.

Windbg logging

Lets assume we have application and we would like to print logs from it’s execution (like arguments and return code of functions). One way to do it is to hard-code prints into source and recompile it (in Polish language we call it “dupa-debugging”, which pretty nicely evaluates this approach). In Java there is a elegant way to do it – use Aspect Oriented Programming to inject tracing without messing with the existing code (or JVMTI if you would like to do it in runtime). But what if our application is written in native language like C that does not support AOP paradigm or maybe we have no source code, just plain binary? There are several possible solutions like using LD_PRELOAD on Linux or DLL injection on Windows to dynamically replace existing functions. Other solution is to use debugger that will automatically print useful information for us.

For example lets trace all executions of malloc and free (this can be useful e.g. for memory leak debugging). I will use Windbg as example but it can be done in gdb as well. To trace execution of function we will set breakpoint with action that will be executed during breakpoint hit. The simplest version of command is:

bp MSVCR80D!free ".printf \"free\\n\";g"

Command tells to debugger to print “free” after breakpoint  and let code execute further (g command).  It is not really useful because it does not print addresses of memory that will be freed. So lets fix it:

bp MSVCR80D!free ".printf \"%08x free\\n\", dwo(esp+4);g"

Expression is little bit more complicated so we will decompose it:
.printf \”%08x free\\n\” – prints “X free” where X will be first argument passed to function
dwo(esp+4) – computes first argument passed to function. dwo is an expression which returns double word from specified address. During breakpoint we are at the point after return address was pushed on stack but before full stack frame was created.  It means that first argument lies at esp+4 memory address (in case of x86 cdecl calling convention – which is our case).

Now we will print addresses of allocated memory returned by malloc. We will need to set breakpoint at the end of function when code jumps to return address, then we will print value of eax which is used to pass return value in x86 cdecl convention. Return address is on the stack and can be retrieved in Windbg by reading pseudo register $ra:

bp MSVCR80D!malloc "bp /1 @$ra \".printf \\\"%08x malloc\\\\n\\\",eax;g\";g"

In command we are setting two breakpoints: first will be executed every time malloc is called. Action for this breakpoint is to set another one-time breakpoint (/1 switch) – at return address on stack. At second breakpoint we will print returned value – which is address of allocated memory. Why second breakpoint? This is because at first breakpoint, value of pseudo register $ra is updated by Windbg to correct value which points at the return address of current function. Before execution of malloc we don’t know what return address will be.

0:000:x86> bp MSVCR80D!free ".printf \"%08x free\\n\", dwo(esp+4);g"
0:000:x86> bp MSVCR80D!malloc "bp /1 @$ra \".printf \\\"%08x malloc\\\\n\\\",eax;g\";g"
0:000:x86> g
...
008d3f80 malloc
008d1190 malloc
008d11d8 malloc
008d1220 malloc
008d1268 malloc
008d3f80 free
008d1190 free
008d11d8 free
008d1220 free
ntdll!NtTerminateProcess+0xa:
00000000`777ef97a c3 ret

As you can see there were called 5 malloc’s and 4 free’s so memory at address 008d1220 leaked. To trace source of leakage you can modify Windbg commands to print additionally stack trace during malloc:

bp MSVCR80D!malloc "bp /1 @$ra \".printf \\\"%08x malloc\\\\n\\\",eax;k;g\";g"

After analyzing Windbg log I can see that leak occurred in function leaky!main+0x40:

0018ff30 00411a16 leaky!main+0x40 [c:\users\lastsector\documents\visual studio 2005\projects\leaky\leaky\leaky.c @ 13]
0018ff80 0041185d leaky!__tmainCRTStartup+0x1a6 [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 597]
0018ff88 755d3677 leaky!mainCRTStartup+0xd [f:\sp\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 414]
0018ff94 779b9f42 kernel32!BaseThreadInitThunk+0xe
0018ffd4 779b9f15 ntdll32!__RtlUserThreadStart+0x70
0018ffec 00000000 ntdll32!_RtlUserThreadStart+0x1b

Code for debugged application is presented on listing:

#include <stdlib.h>

int main(int argc, char* argv[])
{
 void* arr[5];
 int i;

for(i = 0; i < _countof(arr); ++i)
 arr[i] = malloc(10);

for(i = 0; i < _countof(arr)-1; ++i)
 free(arr[i]);

return 0;
}

If you would like to do the same but in x64 bits you just need to follow x64 calling convention (e.g. address returned by malloc will be in rax register and address passed to free will be in rcx register).