Detecting Where Heap Vulnerabilities Are Triggered Using Dynamic Instrumentation

Binary Instrumentation

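Binary instrumentation means inserting new code at arbitrary points in an existing binary in order to observe or modify the binary's behavior. The location where new code is added is called the instrumentation point, and the added code is called the instrumentation code. There are two main approaches: static binary instrumentation (SBI) and dynamic binary instrumentation (DBI). This article uses dynamic binary instrumentation. The most commonly used DBI frameworks are Intel Pin and DynamoRIO; this article uses Intel Pin. The underlying concepts and principles of instrumentation are not covered in detail here.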
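As a loose analogy only (this is not how Pin works internally), the relationship between an instrumentation point and instrumentation code can be sketched in Python by wrapping a function so that a hook runs at its entry and exit. The names `instrument`, `events`, and `allocate` are hypothetical and exist only for this illustration.

```python
import functools

events = []  # collected observations, analogous to a trace log


def instrument(func):
    """Wrap func so that hooks run at its entry and exit (the instrumentation points)."""
    @functools.wraps(func)
    def wrapper(*args):
        events.append(("enter", func.__name__, args))   # instrumentation code before the call
        result = func(*args)
        events.append(("exit", func.__name__, result))  # instrumentation code after the call
        return result
    return wrapper


@instrument
def allocate(size):
    # Stand-in for a real routine; returns a zero-filled buffer of the given size.
    return bytearray(size)


buf = allocate(8)
for event in events:
    print(event)
```

The wrapped function behaves as before; only the recorded events are new, which is the same property a DBI tool aims for when it observes a binary.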

Instrumenting with Intel Pin

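The version used in this article is 3.27 98718. Intel Pin comes with a fairly complete user manual. Download: Intel Pin Download; user manual: Pin 3.27 98718 Pin Manual
To detect where a heap vulnerability is triggered, this article instruments both instructions and heap operations, as follows: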

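  1. The first step is instruction-level instrumentation, the finest granularity. It records each instruction's address, its disassembly, and the call stack. Because the instrumentation code runs before the instruction executes, instructions that jump indirectly through the rip register must be skipped when the call stack is collected.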
VOID RecordIns(VOID* address, const CONTEXT* ctxt)
{
    insCount += 1;
    // rfind(prefix, 0) != 0 means "does not start with prefix":
    // rip-relative indirect jumps get no backtrace record.
    if (disassembleCode[address].rfind("jmp qword ptr [rip+", 0) != 0) {
        void *buf[128];
        PIN_LockClient();
        // PIN_Backtrace returns the number of frames written; only those
        // entries of buf are initialized.
        int depth = PIN_Backtrace(ctxt, buf, sizeof(buf) / sizeof(buf[0]));
        *backtrace << insCount << "\t" << "[";
        for (int i = 0; i < depth; i++) {
            if (buf[i] == 0) {
                break;
            }
            if (i != 0) {
                *backtrace << ",";
            }
            *backtrace << (VOID*) VoidStar2Addrint(buf[i]);
        }
        *backtrace << "]" << endl;
        PIN_UnlockClient();
    }
    *trace << insCount << "\t" << address << ": " << disassembleCode[address] << endl;
}
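  2. Next, because the target is heap memory, heap allocations and frees have to be monitored. This article instruments the three functions malloc, calloc and free, recording their arguments and return values whenever they are called.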
VOID RecordArg1(CHAR* name, ADDRINT arg)
{
    // The previous malloc/calloc record never received its return value:
    // close that line with a placeholder before starting a new record.
    if (flag) {
        *heapTrace << "0x0" << endl;
    }

    if (!strcmp(name, FREE)) {
        flag = false;
        freeCount++;
        count++;
        *heapTrace << insCount << "\t" << count << "\t" << name << " [" << (VOID*)arg << "] 0x0" << endl;
    } else if (!strcmp(name, MALLOC)) {
        flag = true;  // the return value is appended later by RecordRet
        mallocCount++;
        count++;
        targetSize = arg;
        *heapTrace << insCount << "\t" << count << "\t" << name << " [" << arg << "] ";
    }
}

VOID RecordArg2(CHAR* name, ADDRINT arg1, ADDRINT arg2)
{
    // Same placeholder handling as in RecordArg1.
    if (flag) {
        *heapTrace << "0x0" << endl;
    }

    flag = true;
    count++;
    callocCount++;
    targetSize = arg1 * arg2;
    *heapTrace << insCount << "\t" << count << "\t" << name << " [" << arg1 << "," << arg2 << "] ";
}

RecordArg1 records the argument of malloc and free, and RecordArg2 records the arguments of calloc.

VOID RecordRet(ADDRINT ret)
{
    // Finish the pending malloc/calloc record with the returned address.
    flag = false;
    *heapTrace << (VOID*)ret << endl;

    // Track the lowest heap address seen so far.
    if (minAddress != 0) {
        if (ret < minAddress) {
            minAddress = ret;
        }
    } else {
        minAddress = ret;
    }

    // Track the highest heap address (end of the block) seen so far.
    if (maxAddress != 0) {
        if (ret + targetSize > maxAddress) {
            maxAddress = ret + targetSize;
        }
    } else {
        maxAddress = ret + targetSize;
    }
}

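RecordRet records the return value of malloc and calloc. It also tracks the lowest and highest heap addresses seen so far, which are used later to reduce the amount of memory read/write data that is saved.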
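To make the role of `flag` easier to follow, here is a small Python port of the logging in RecordArg1 and RecordRet (counters other than `count`, and `targetSize`, are omitted). The class name `HeapLogger` and the addresses are made up for illustration. The scenario shows what is written when a second malloc is logged before any return value was recorded for the first one.

```python
class HeapLogger:
    """Illustrative Python port of the heapTrace logging logic."""

    def __init__(self):
        self.out = ""       # stands in for the heapTrace output stream
        self.flag = False   # True while a malloc line is waiting for its return value
        self.count = 0

    def record_arg1(self, ins_count, name, arg):
        if self.flag:
            self.out += "0x0\n"   # close the unfinished line with a placeholder
        if name == "free":
            self.flag = False
            self.count += 1
            self.out += f"{ins_count}\t{self.count}\t{name} [{hex(arg)}] 0x0\n"
        elif name == "malloc":
            self.flag = True
            self.count += 1
            self.out += f"{ins_count}\t{self.count}\t{name} [{arg}] "

    def record_ret(self, ret):
        self.flag = False
        self.out += hex(ret) + "\n"


log = HeapLogger()
log.record_arg1(10, "malloc", 24)      # logged, but no return value follows
log.record_arg1(12, "malloc", 24)      # next call arrives first
log.record_ret(0x55550000)
log.record_arg1(40, "free", 0x55550000)
print(log.out, end="")
```

The first line of the output ends in the placeholder `0x0`; records like this are the kind of noise the parsing stage has to filter out.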

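  3. Finally, memory reads and writes have to be monitored. To reduce the amount of data, only instructions that read or write heap memory are saved.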
VOID RecordMemRead(VOID* address, VOID* targetAddress, UINT32 size)
{
    // Keep only accesses that fall inside the observed heap address window.
    if ((UINT64)targetAddress >= minAddress && (UINT64)targetAddress <= maxAddress) {
        readCount += 1;
        count++;
        *memoryTrace << insCount << "\t" << count << "\t" << address << " R " << targetAddress << " " << size << endl;
    }
}

VOID RecordMemWrite(VOID* address, VOID* targetAddress, UINT32 size)
{
    // Keep only accesses that fall inside the observed heap address window.
    if ((UINT64)targetAddress >= minAddress && (UINT64)targetAddress <= maxAddress) {
        writeCount += 1;
        count++;
        *memoryTrace << insCount << "\t" << count << "\t" << address << " W " << targetAddress << " " << size << endl;
    }
}
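The address-window filter shared by RecordRet, RecordMemRead and RecordMemWrite can be modeled in a few lines of Python. This is a sketch of the same comparison logic, not part of the tool; the class name `HeapRange` and the sample addresses are hypothetical.

```python
class HeapRange:
    """Tracks the lowest and highest heap addresses returned by the allocator."""

    def __init__(self):
        self.min_address = 0
        self.max_address = 0

    def update(self, ret, size):
        # Same logic as RecordRet: widen the window to cover [ret, ret + size].
        if self.min_address == 0 or ret < self.min_address:
            self.min_address = ret
        if self.max_address == 0 or ret + size > self.max_address:
            self.max_address = ret + size

    def contains(self, address):
        # Same test as RecordMemRead / RecordMemWrite (both bounds inclusive).
        return self.min_address <= address <= self.max_address


heap = HeapRange()
heap.update(0x1000, 0x20)
heap.update(0x3000, 0x40)

print(heap.contains(0x1010))      # inside the first block
print(heap.contains(0x2000))      # in the gap between blocks, still inside the window
print(heap.contains(0x7fff0000))  # far outside the window
```

Note that the window is coarse: an address in a gap between two blocks still passes the filter, so the filter only reduces the data volume and does not by itself decide whether an access touches a live chunk.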

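Parsing the Instrumentation Data

Parsing the instrumentation data is likewise split into three parts: instruction parsing, heap allocation/free parsing, and memory operation parsing.
Instruction parsing involves two files, trace.log and backtrace.log, which hold the program's execution trace and its call stacks respectively. The parsed data is stored in classes: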

import ast


class Instruction(dict):
    def __init__(self, id: int, address: int, ins: str) -> None:
        super().__init__({
            "id": id,
            "address": address,
            "ins": ins
        })

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        id = self["id"]
        address = self["address"]
        ins = self["ins"]

        return f"{id}\t{hex(address)}: {ins}"


class Trace(list):
    def __init__(self, filename: str) -> None:
        super().__init__([])
        self._parseFromFile(filename)

    def _parseFromFile(self, filename: str) -> None:
        # Each line of trace.log has the form "<id>\t<address>: <instruction>".
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue

                idStr, rest = line.split("\t", 1)
                addressStr, ins = rest.split(": ", 1)
                self.append(Instruction(int(idStr), int(addressStr, 16), ins))

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        return "\n".join(str(ins) for ins in self)


class Backtrace(dict):
    def __init__(self, filename: str) -> None:
        super().__init__({})
        self._parseFromFile(filename)

    def _parseFromFile(self, filename: str) -> None:
        # Each line of backtrace.log has the form "<id>\t[<addr>,<addr>,...]".
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue

                idStr, frames = line.split("\t", 1)
                # literal_eval parses the address list without executing arbitrary code.
                self[int(idStr)] = ast.literal_eval(frames)

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        return "\n".join(f"{key}\t{value}" for key, value in self.items())
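For reference, the two line formats written by RecordIns can be exercised without any log files. The helper names `parse_trace_line` and `parse_backtrace_line` are hypothetical, and the sample lines are made up to match the format; they are not taken from a real run.

```python
import ast


def parse_trace_line(line):
    """Split '<id>\\t<address>: <instruction>' into (id, address, instruction)."""
    id_str, rest = line.strip().split("\t", 1)
    address_str, ins = rest.split(": ", 1)
    return int(id_str), int(address_str, 16), ins


def parse_backtrace_line(line):
    """Split '<id>\\t[<addr>,<addr>,...]' into (id, list of addresses)."""
    id_str, frames = line.strip().split("\t", 1)
    return int(id_str), ast.literal_eval(frames)


# Made-up sample lines in the format written by RecordIns.
trace_line = "7\t0x401136: mov rax, qword ptr [rbp-0x8]\n"
backtrace_line = "7\t[0x401136,0x401189,0x7ffff7c29d90]\n"

print(parse_trace_line(trace_line))
print(parse_backtrace_line(backtrace_line))
```

Both records share the same id (the instruction counter), which is what later allows a reported memory operation to be mapped back to its call stack.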

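Heap allocation and free parsing mainly processes heapTrace.log. The file contains some interfering records that also have to be handled. The result is again stored in classes.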

import ast


class HeapOp(dict):
    def __init__(self, id: int, opId: int, func: str, args: list, retAddress: int) -> None:
        super().__init__({
            "id": id,
            "opId": opId,
            "func": func,
            "args": args,
            "retAddress": retAddress
        })

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        id = self["id"]
        opId = self["opId"]
        func = self["func"]
        args = self["args"]
        retAddress = self["retAddress"]

        argsStr = ", ".join([hex(arg) for arg in args])

        return f"{id}\t{opId}\t{hex(retAddress)} = {func}({argsStr})"


class HeapTrace(list):
    def __init__(self, filename: str) -> None:
        super().__init__([])
        self._parseFromFile(filename)

    def _parseFromFile(self, filename: str) -> None:
        # Each line has the form "<id>\t<opId>\t<func> [<args>] <retAddress>".
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue

                id = int(line.split("\t")[0])
                # Skip records logged before the first instruction was counted.
                if id == 0:
                    continue
                opId = int(line.split("\t")[1])
                fields = line.split("\t")[2].split(" ")
                fields[1] = ast.literal_eval(fields[1])  # argument list, e.g. [24]
                fields[2] = int(fields[2], 16)           # return address
                # Skip free(NULL).
                if fields[0] == "free" and fields[1] == [0]:
                    continue
                # Interfering record: the previous entry has no return address and
                # was logged immediately before this one, so overwrite it.
                if len(self) > 0 and opId - self[-1]["opId"] <= 1 and self[-1]["retAddress"] == 0:
                    self[-1] = HeapOp(id, opId, *fields)
                else:
                    self.append(HeapOp(id, opId, *fields))
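The filtering rules can be tried on a few hand-written records. The function below is a compact, self-contained restatement of the same rules using plain dictionaries; `parse_heap_trace` is a hypothetical name and the sample lines are invented for illustration.

```python
import ast


def parse_heap_trace(lines):
    """Parse heapTrace.log-style lines, dropping interfering records."""
    ops = []
    for line in lines:
        id_str, op_id_str, rest = line.strip().split("\t")
        if int(id_str) == 0:
            continue  # logged before the first instruction was counted
        func, args, ret = rest.split(" ")
        op = {
            "id": int(id_str),
            "opId": int(op_id_str),
            "func": func,
            "args": ast.literal_eval(args),
            "retAddress": int(ret, 16),
        }
        if op["func"] == "free" and op["args"] == [0]:
            continue  # free(NULL)
        previous = ops[-1] if ops else None
        if previous and op["opId"] - previous["opId"] <= 1 and previous["retAddress"] == 0:
            ops[-1] = op      # overwrite the record that never got a return value
        else:
            ops.append(op)
    return ops


sample = [
    "10\t1\tmalloc [24] 0x0",          # interfering record: no return value
    "12\t2\tmalloc [24] 0x55550000",
    "40\t9\tfree [0x55550000] 0x0",
]
parsed = parse_heap_trace(sample)
for op in parsed:
    print(op)
```

Only two operations survive: the malloc that actually returned an address and the later free. The free is kept because its opId is not adjacent to the previous record's.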

The last part is memory operation parsing, which mainly processes memoryTrace.log.

class MemoryOp(dict):
    def __init__(self, id: int, opId: int, address: int, op: str, targetAddress: int, size: int) -> None:
        super().__init__({
            "id": id,
            "opId": opId,
            "address": address,
            "op": op,
            "targetAddress": targetAddress,
            "size": size
        })

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        id = self["id"]
        opId = self["opId"]
        address = self["address"]
        op = self["op"]
        targetAddress = self["targetAddress"]
        size = self["size"]

        return f"{id}\t{opId}\t{hex(address)} {op} {hex(targetAddress)} {size}"


class MemoryTrace(list):
    def __init__(self, filename: str) -> None:
        super().__init__([])
        self._parseFromFile(filename)

    def _parseFromFile(self, filename: str) -> None:
        # Each line has the form "<id>\t<opId>\t<address> <R|W> <targetAddress> <size>".
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue

                id = int(line.split("\t")[0])
                # Skip records logged before the first instruction was counted.
                if id == 0:
                    continue
                opId = int(line.split("\t")[1])
                fields = line.split("\t")[2].split(" ")
                fields[0] = int(fields[0], 16)  # address of the instruction
                fields[2] = int(fields[2], 16)  # address being read or written
                fields[3] = int(fields[3])      # access size in bytes
                self.append(MemoryOp(id, opId, *fields))

    def __str__(self) -> str:
        return self.__repr__()

    def __repr__(self) -> str:
        return "\n".join(str(memOp) for memOp in self)

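Heap Vulnerability Detection Principle

The detection principle is fairly simple: the heap allocation/free records and the memory operations are replayed to reconstruct what the program did to each heap chunk. Because the size of each chunk is known, it is possible to check whether a free is legitimate and whether a read or write goes past the chunk's size. The vulnerabilities that can currently be detected are double free, use after free, and out-of-bounds access. The main implementation is as follows: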

def checkVulnerability(info: Info, trace: Trace, backtrace: Backtrace, allTrace: list):
    memoryInfo = {}  # 16-byte-aligned address -> 16 per-byte flags (1 = allocated, 0 = freed)
    heapInfo = {}    # chunk base (user pointer - 0x10) -> tracked chunk size
    lastId = 0

    for op in allTrace:
        if isinstance(op, HeapOp):
            lastId = op["id"]
            if op["func"] == "free":
                address = op["args"][0]
                if (address - 0x10) not in heapInfo:
                    print(f"[-] {hex(address)} not malloc!")
                    print(op)
                    exit(0)
                size = heapInfo[address - 0x10]
                for i in range(size // 16):
                    # A block that is already marked as freed means a second free.
                    if memoryInfo[address - 0x10 + 0x10 * i] == [0] * 16:
                        print(f"[!] {hex(address)} Double Free!")
                        print(op)
                        print("[+] backtrace:")
                        getBacktraceStr(info, trace, backtrace, op)
                        print()
                        break
                    memoryInfo[address - 0x10 + 0x10 * i] = [0] * 16
            else:
                if op["func"] == "malloc":
                    size = op["args"][0]
                elif op["func"] == "calloc":
                    size = op["args"][0] * op["args"][1]
                else:
                    continue

                newSize = calculateSize(size)
                if (size % 0x10) != 0 and (size % 0x10) <= 8:
                    newSize += 8

                # Mark every 16-byte block of the chunk as allocated.
                address = op["retAddress"]
                heapInfo[address - 0x10] = newSize
                for i in range(newSize // 16):
                    memoryInfo[address - 0x10 + 0x10 * i] = [1] * 16
                if newSize % 16 != 0:
                    memory = memoryInfo.get(address - 0x10 + 0x10 * (newSize // 16), [0]*16)
                    memory[:8] = [1] * 8
                    memoryInfo[address - 0x10 + 0x10 * (newSize // 16)] = memory
        elif isinstance(op, MemoryOp):
            # Ignore accesses recorded under the same instruction id as the last heap call.
            if op["id"] == lastId:
                continue

            address = op["targetAddress"]
            size = op["size"]
            if (address - address % 16) not in memoryInfo:
                continue

            # The accessed block belongs to a chunk that has been freed.
            if memoryInfo[address - address % 16] == [0] * 16:
                print(f"[!] {hex(address)} Use After Free!")
                print(op)
                print("[+] backtrace:")
                getBacktraceStr(info, trace, backtrace, op)
                print()
                continue

            # Find the chunk containing the address and check that the access fits.
            for baseAddress in heapInfo:
                targetSize = heapInfo[baseAddress]
                if baseAddress <= address < baseAddress + targetSize:
                    if (baseAddress + targetSize - address) < size:
                        print(f"[!] {hex(address)} Out Of Bound!")
                        print(hex(baseAddress), targetSize)
                        print(f"offset: {address - baseAddress}")
                        print(op)
                        print("[+] backtrace:")
                        getBacktraceStr(info, trace, backtrace, op)
                        print()
                    break
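The three checks can also be illustrated on a toy event list. The sketch below is a deliberately simplified model, not the tool's algorithm: it tracks chunks by their requested size in two dictionaries instead of the 16-byte shadow map used by checkVulnerability, and it ignores the allocator's header and alignment padding. The function name `detect` and the event tuples are hypothetical.

```python
def detect(events):
    """Replay (kind, ...) events and return a list of (finding, address) tuples."""
    live = {}    # base address -> size of chunks currently allocated
    freed = {}   # base address -> size of chunks that have been freed
    findings = []

    def find(chunks, address):
        for base, size in chunks.items():
            if base <= address < base + size:
                return base, size
        return None

    for event in events:
        kind = event[0]
        if kind == "malloc":
            _, size, base = event
            freed.pop(base, None)          # the allocator may reuse a freed chunk
            live[base] = size
        elif kind == "free":
            _, base = event
            if base in live:
                freed[base] = live.pop(base)
            elif base in freed:
                findings.append(("double free", base))
        elif kind in ("read", "write"):
            _, address, length = event
            hit = find(live, address)
            if hit is not None:
                base, size = hit
                if address + length > base + size:
                    findings.append(("out of bounds", address))
            elif find(freed, address) is not None:
                findings.append(("use after free", address))
    return findings


events = [
    ("malloc", 32, 0x1000),
    ("write", 0x1010, 32),     # runs 16 bytes past the end of the 32-byte chunk
    ("free", 0x1000),
    ("read", 0x1008, 8),       # chunk was already freed
    ("free", 0x1000),          # freed a second time
]
results = detect(events)
for finding in results:
    print(finding)
```

The out-of-bounds test is the same comparison as in the script above: an access is flagged when the bytes remaining in the chunk from the target address are fewer than the access size.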

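Experiment

The tool was tested against CVE-2018-18557, a heap overflow in libtiff. The bug is located in the JBIGDecode function in tif_jbig.c: the function calls _TIFFmemcpy to copy data without checking the size, which results in an out-of-bounds write.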

Tracer

pin -t ../Tracer/obj-intel64/Tracer.so -- ./poc ./image.tif

The output is as follows:

===============================================
This application is instrumented by Tracer
Info File: ./output/info.log
Trace File: ./output/trace.log
Backtrace File: ./output/backtrace.log
Memory Trace File: ./output/memoryTrace.log
Heap Trace File: ./output/heapTrace.log
===============================================
TIFFReadDirectoryCheckOrder: Warning, Invalid TIFF directory; tags are not sorted in ascending order.
it will crash,because heap space has been overflow:

free(): invalid next size (normal)

===============================================
Tracer analysis results:
FileName: <PATH>/tiff/poc
Loaded address: 0x7f6341f49000
Code address: 0x55b30079b000
Number of instructions: 33269
Number of read memory instructions: 381781
Number of write memory instructions: 314752
Number of malloc: 31
Number of calloc: 0
Number of free: 25
===============================================
[1]    147880 IOT instruction (core dumped)  pin -t ../../Tracer/obj-intel64/Tracer.so -- ./poc ./image.tif

Analysis

python3 ../Tracer/script/Analysis.py -d ./output

Result:

[!] 0x5599f45bbe80 Out Of Bound!
0x5599f45bbe40 2872
offset: 64
33215	684343	0x7f93b29a0c4a W 0x5599f45bbe80 5991
[+] backtrace:
0x000000000002222e: JBIGDecode at /root/tiff/tiff-4.0.9/libtiff/tif_jbig.c:102
0x000000000001227d: TIFFReadEncodedStrip at /root/tiff/tiff-4.0.9/libtiff/tif_read.c:539
0x00000000000037bd: main at <PATH>/tiff/poc.c:18 (discriminator 3)
0x000029f9bfe90d90: ?? ??:0
0x000029f9bfe90e40: ?? ??:0
0x0000000000003655: _start at ??:?

The result of the analysis script matches the root cause of the vulnerability.
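The numbers in the report can be checked by hand. The short calculation below uses only values printed in the analysis output (chunk base, tracked size, target address, and write size); the variable names are chosen here for readability.

```python
base = 0x5599f45bbe40      # chunk base reported by the script
chunk_size = 2872          # tracked chunk size in bytes
target = 0x5599f45bbe80    # address written by the instruction
write_size = 5991          # number of bytes written

offset = target - base             # where the write starts inside the chunk
remaining = chunk_size - offset    # bytes left in the chunk from that point
overflow = write_size - remaining  # bytes written past the end of the chunk

print(f"offset into chunk: {offset}")
print(f"bytes left in chunk: {remaining}")
print(f"bytes written past the end: {overflow}")
```

The write starts 64 bytes into the chunk with 2808 bytes left, so a 5991-byte copy runs 3183 bytes past the end of the tracked chunk, which is why the script flags it as out of bounds.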

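Summary

Working on this problem provided hands-on experience with a dynamic instrumentation tool and with writing instrumentation code, and the resulting tool is able to detect specific vulnerabilities. It still has several shortcomings, however:

  • The false-positive rate is too high; it often reports vulnerabilities that do not exist;
  • It can only detect overflows of a whole heap chunk, not overflows between fields inside a chunk;
  • The analysis script performs poorly and the code is rather crude.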

Source code: https://github.com/Cirno9-dev/Tracer


Title: Detecting Where Heap Vulnerabilities Are Triggered Using Dynamic Instrumentation
Author: Cirno9dev
URL: https://blog-ljx.site/articles/2023/06/20/1687267924835.html