TALOS-2016-0036
Matroska libebml EbmlUnicodeString Heap Information Leak
January 28, 2016
Description
A specially crafted Unicode string can cause an off-by-few read on the heap in the Unicode string parsing code in libebml. This issue can potentially be used to leak information.
Tested Versions
libebml master branch
Product URLs
Details
An off-by-few read on the heap occurs when parsing Unicode strings in EbmlUnicodeString.cpp:UTFstring::UpdateFromUTF8. The string is parsed in a for loop, but in the case of a four-byte character no check is made that the last bytes accessed fall inside the allocated buffer. The vulnerable code is:
```
for (j=0, i=0; i<UTF8string.length(); j++) {
  uint8 lead = static_cast<uint8>(UTF8string[i]);
  if (lead < 0x80) {
    _Data[j] = lead;
    i++;
  } else if ((lead >> 5) == 0x6) {
    _Data[j] = ((lead & 0x1F) << 6) + (UTF8string[i+1] & 0x3F);
    i += 2;
  } else if ((lead >> 4) == 0xe) {
    _Data[j] = ((lead & 0x0F) << 12) + ((UTF8string[i+1] & 0x3F) << 6) + (UTF8string[i+2] & 0x3F);
    i += 3;
  } else if ((lead >> 3) == 0x1e) {
    printf("i is now %d and the highest accessed byte is %d\n",i,i+3 );
    _Data[j] = ((lead & 0x07) << 18) + ((UTF8string[i+1] & 0x3F) << 12) + ((UTF8string[i+2] & 0x3F) << 6) + (UTF8string[i+3] & 0x3F);
    i += 4;
  } else
    // Invalid char?
    break;
}
```
If the last byte in the string being parsed satisfies the else if ((lead >> 3) == 0x1e) condition, for example 0xf2, the loop reads the bytes at i+1 through i+3, which lie up to 3 bytes past the end of the buffer, thereby causing an out-of-bounds read on the heap.
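One possible mitigation, shown here only as a minimal sketch and not as the actual libebml patch, is to derive the sequence length from the lead byte and compare it against the number of bytes remaining before any continuation byte is read. The names SequenceLength and DecodeUtf8Checked are hypothetical and do not exist in libebml. The sketch writes into a std::vector rather than the _Data buffer of the original. Like the original loop, it does not check that continuation bytes have the form 10xxxxxx.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of libebml): length in bytes of the UTF-8
// sequence introduced by `lead`, or 0 if `lead` cannot start a sequence.
// The conditions mirror the branches of the loop quoted above.
static std::size_t SequenceLength(std::uint8_t lead) {
  if (lead < 0x80) return 1;
  if ((lead >> 5) == 0x6) return 2;
  if ((lead >> 4) == 0xe) return 3;
  if ((lead >> 3) == 0x1e) return 4;
  return 0;
}

// Hypothetical bounds-checked decoder. Returns false as soon as it meets an
// invalid lead byte or a sequence that would extend past the end of `in`;
// `out` then holds the code points decoded up to that point.
static bool DecodeUtf8Checked(const std::string &in,
                              std::vector<std::uint32_t> &out) {
  out.clear();
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
    const std::size_t len = SequenceLength(lead);
    // The check missing from the original loop: never read beyond the
    // `in.size() - i` bytes that remain in the input.
    if (len == 0 || len > in.size() - i) return false;

    // Keep only the payload bits of the lead byte.
    std::uint32_t cp;
    switch (len) {
      case 1:  cp = lead;        break;
      case 2:  cp = lead & 0x1F; break;
      case 3:  cp = lead & 0x0F; break;
      default: cp = lead & 0x07; break;
    }
    // Append six payload bits from each continuation byte.
    for (std::size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<std::uint8_t>(in[i + k]) & 0x3F);
    }
    out.push_back(cp);
    i += len;
  }
  return true;
}
```

With this check in place, an input ending in a lone 0xf2 byte is rejected before the bytes at i+1 through i+3 are touched.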
Credit
Discovered by Richard Johnson and Aleksandar Nikolic of Cisco Talos.