Buffer overflows using NT 3.1

05 Aug 2023

Recently I attempted to use NT 3.1 as a development system, performing editing, compiling and debugging on an NT 3.1 host. This revealed some bugs in the software I was writing, but also uncovered some new (to me) bugs in the operating system itself.

For context, Yori has a debug mode memory allocator that places all allocations to end at a page boundary and marks the next page as no access. This causes any buffer overflows to immediately crash. Running on NT 3.1 exercised that path unexpectedly.

First bug: GetTempPathW

The following is a short snippet of code from GetTempPathW:

77884691   call      778a68e0                      ; Call underlying function which does most of the work
77884696   mov       ecx,eax                       ; save the length of the string, in bytes
77884698   shr       ecx,1                         ; right shift to convert bytes to characters
7788469a   or        eax,eax                       ; Check if zero
7788469c   jz        778846eb                      ; If zero, leave
7788469e   cmp       eax,esi                       ; Check if returned size is within buffer size
778846a0   jnb       778846eb                      ; If not, leave
778846a2   lea       edx,dword ptr [edi+ecx*2]     ; Get a pointer to the buffer after the end of the string
778846a5   cmp       word ptr [edx-02],005c        ; Check for trailing backslash
778846aa   jz        778846e0                      ; If it's there, leave
778846ac   lea       ecx,dword ptr [eax+02]        ; Calculate number of bytes with an additional WCHAR
778846af   cmp       esi,ecx                       ; Check if it fits in buffer
778846b1   jbe       778846d0                      ; If not, leave
778846b3   mov       word ptr [edx],005c           ; Write trailing backslash to end of buffer
778846b8   mov       word ptr [edi+eax*2+02],0000  ; Write NULL to buffer start plus BYTE count
                                                   ; multiplied by two, plus two

The final instruction clearly intended to write to the character index but instead wrote to the byte index, thereby writing the NULL terminator well off the end of the buffer. Once the bug is known the instructions above look rather silly, because the "correct" final instruction would have been a relatively simple:

mov word ptr [edx+02], 0000

...which is smaller and simpler than the contortion the compiler needed to generate the bug. I worked around this by supplying an excessively long buffer to GetTempPath, and manually NULL terminating the result.

Obviously this bug is potentially serious in that it will stomp a NULL over random memory. But one mitigating factor is that the ANSI functions, such as GetTempPathA, end up using a scratch buffer for Unicode strings and converting the result back into the caller's buffer. For any program using the ANSI function where the temporary path was less than 128 characters - which would almost always be the case - this NULL is written into the scratch buffer with no ill-effects.

Second bug: All of user32?

The second piece of code seems to be common to a lot of string manipulation in User32:

76a0a595   lea       edx,dword ptr [00000005+eax*2] ; Convert characters to bytes and add 5 (huh?)
76a0a59c   and       edx,fffffffc                   ; Strip any trailing bytes to form a 32 bit multiple
76a0a59f   mov       edi,dword ptr [esp+0c]
76a0a5a3   lea       eax,dword ptr [edx+edi]
76a0a5a6   cmp       dword ptr [esp+10],eax
76a0a5aa   jb        76a0a5c6
76a0a5ac   cmp       edi,eax
76a0a5ae   jnb       76a0a5c6
76a0a5b0   mov       ecx,edx                        ; Load the number of bytes to copy into ecx
76a0a5b2   shr       ecx,02                         ; Shift right so it's the number of 32 bit copy operations
76a0a5b5   rep movsd                                ; Perform 32 bit copies
76a0a5b7   mov       ecx,edx                        ; Load the number of bytes to copy
76a0a5b9   and       ecx,00000003                   ; Remove anything that was 32 bit aligned above
76a0a5bc   rep movsb                                ; Copy the remaining bytes

This is a bit odd because the code at the bottom of the function clearly intends to round down to the number of 32 bit aligned values, copy those, then perform a byte-by-byte copy of the remainder. But the code at the top is adding 5 presumably to round up to the next whole 32 bit value. With a debug allocator, this code routinely copies one WCHAR off the end of the buffer and crashes. I "fixed" this by ensuring there's always one more WCHAR than required. Fortunately Yori doesn't use User32 very much, since it's primarily a set of command line tools.

Both of these bugs seemed a little odd because with a debug memory allocator they're trivial to find. But on the other hand, the reason I use a debug memory allocator is because of hard-learned experience that they find bugs, and that experience came into existence after seeing this class of bug in early Windows development. A lot of Windows code was written before learning this lesson.