The Road to PWN Recovery (Part 1): Basic Assembly and the Function Call Stack
And just let time go on.
Above All
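A senior student once told me that after a year in the final grade of high school he had forgotten all of his technical skills. I joked, "Set back two years or so, I guess." He stared at me: "Two years back would be the middle of my first year of high school. If my skills could get back to that level, I would be thankful." At the time I thought he was joking. Now, after half a year as a soon-to-be final-year student, I want to pick CTF back up, and I find that I remember almost nothing, whether the big picture or the details. So I am recording my PWN recovery on this blog, partly to keep myself accountable.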
Basic Assembly
Docs / Tools / Websites
godbolt.org
An online compiler that compiles C code to assembly code with the selected version of gcc.
rappel
A tool that evaluates assembly code interactively, in real time.
Even in the latest version of the program, though, it still can't handle stack functionality.
https://github.com/yrp604/rappel
Intel Assembly Manual
The most authoritative official manual for Intel assembly: Intel® 64 and IA-32 Architectures Software Developer's Manual, Instruction Set Reference, A-Z.
gcc
    gcc -nostdlib -static tmp.S -o tmp
AT&T vs. Intel (Assembly Syntax)
Written in July, before I left for the Tencent Spark youth program.
Use
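The GNU Assembler (GAS) and many other GNU tools, such as gcc and gdb, use AT&T syntax.
Assemblers that use Intel syntax include the Microsoft Macro Assembler (MASM), Borland's Turbo Assembler (TASM), and the Netwide Assembler (NASM).
Most of what follows is translated from this article.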
The Basic Format
The structure of a program in AT&T syntax is similar to that of any other assembler syntax: it consists of a series of directives, labels, and instructions, where each instruction is a mnemonic followed by at most three operands. The most prominent difference in AT&T syntax is the ordering of the operands.
For example, the general format of a basic data movement instruction in Intel syntax is
    mnemonic destination, source
whereas in AT&T syntax the general format is
    mnemonic source, destination
Registers
All register names of the IA-32 architecture must be prefixed by a '%' sign, e.g. %al, %bx, %ds, %cr0.
    mov %ax, %bx
The above example is a mov instruction that copies the value of the 16-bit register AX into the 16-bit register BX.
Literal Values
All literal values must be prefixed by a '$' sign. For example,
    mov $100, %ax
    mov $'A', %al
The first instruction moves the value 100 into the register AX, and the second moves the numerical value of ASCII 'A' into the AL register. To make this clearer, note that the following is not a valid instruction,
    mov %bx, $100
because it tries to move the value in register BX into a literal value, which makes no sense.
Memory Addressing
In AT&T syntax, memory is referenced in the following way,
    segment-override:signed-offset(base,index,scale)
parts of which can be omitted depending on the address you want.
    %es:100(%eax,%ebx,2)
Please note that the offset and the scale must not be prefixed by '$'. A few more examples, with their equivalent NASM syntax, should make things clearer.
1 | GAS memory operand NASM memory operand |
    mov %ax, 100
    mov %eax, -100(%eax)
The first instruction moves the value in register AX to offset 100 in the data segment (DS, the default segment), and the second moves the value in register EAX to [eax-100].
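The arithmetic behind an operand of the form disp(base,index,scale) can be sketched in a few lines of Python. This is only a model, not something the assembler provides: the function name `effective_address` and the register values are made up for illustration, the segment override is ignored, and addresses are assumed to wrap at 32 bits.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Offset selected by the AT&T operand disp(base,index,scale).

    Hypothetical helper: models only the arithmetic, not segmentation.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # base + index*scale + displacement, wrapped to a 32-bit address
    return (base + index * scale + disp) & 0xFFFFFFFF


# 100(%eax,%ebx,2) with the assumed values eax = 0x1000, ebx = 0x10
print(hex(effective_address(disp=100, base=0x1000, index=0x10, scale=2)))  # 0x1084

# -100(%eax) with the assumed value eax = 0x2000, i.e. [eax-100] in NASM
print(hex(effective_address(disp=-100, base=0x2000)))  # 0x1f9c
```

Any part left out of the operand simply takes its default here (zero, or a scale of one), which mirrors how parts of the AT&T form can be omitted.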
Operand Sizes
At times, especially when moving literal values to memory, it becomes necessary to specify the size of the transfer, i.e. the operand size. For example, the instruction
    mov $10, 100
only specifies that the value 10 is to be moved to memory offset 100, but not the transfer size. In NASM this is done by adding the casting keyword byte/word/dword etc. to any of the operands. In AT&T syntax, it is done by adding a suffix (b/w/l) to the instruction. For example,
    movb $10, %es:(%eax)
moves the byte value 10 to the memory location [es:eax], whereas
    movl $10, %es:(%eax)
moves the long (dword) value 10 to the same place.
A few more examples,
    movl $100, %ebx
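To see why the suffix matters, here is a rough Python model of what a store with each suffix does to memory. The helper `store` and the table `SUFFIX_BYTES` are illustrative names, memory is modelled as a plain bytearray, and x86's little-endian byte order is assumed.

```python
# Number of bytes written for each AT&T size suffix
SUFFIX_BYTES = {"b": 1, "w": 2, "l": 4}


def store(memory, address, value, suffix):
    """Write `value` at `address`, using the width chosen by the suffix."""
    size = SUFFIX_BYTES[suffix]
    truncated = value & ((1 << (8 * size)) - 1)  # keep only `size` bytes
    memory[address:address + size] = truncated.to_bytes(size, "little")


mem_b = bytearray(b"\xff" * 8)
store(mem_b, 0, 10, "b")   # like movb $10, (%eax): touches one byte
print(mem_b.hex(" "))      # 0a ff ff ff ff ff ff ff

mem_l = bytearray(b"\xff" * 8)
store(mem_l, 0, 10, "l")   # like movl $10, (%eax): touches four bytes
print(mem_l.hex(" "))      # 0a 00 00 00 ff ff ff ff
```

The same literal 10 overwrites one byte or four depending on the suffix, which is exactly the information a bare `mov $10, 100` leaves out.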
Control Transfer Instructions
The jmp, call, ret, etc. instructions transfer control from one part of a program to another. They can be classified as control transfers within the same code segment (near) or to a different code segment (far). The possible types of branch addressing are: relative offset (label), register, memory operand, and segment-offset pointer.
Relative offsets are specified using labels, as shown below.
    label1:
Branch addressing using registers or memory operands must be prefixed by a '*'. To specify a "far" control transfer, an 'l' must be prefixed to the mnemonic, as in 'ljmp', 'lcall', etc. For example,
1 | GAS syntax NASM syntax |
Segment-offset pointers are specified using the following format:
    jmp $segment, $offset
For example:
    jmp $0x10, $0x100000
Basis
Registers
Most of this and the following sections are translated from Guide to x86-64[1] and x86 Assembly Guide[2].
64-bit | 32-bit | 16-bit | 8-bit | Comment |
---|---|---|---|---|
rax | eax | ax | al | reg with return value |
rbx | ebx | bx | bl | Caller-owned |
rcx | ecx | cx | cl | 4th argument |
rdx | edx | dx | dl | 3rd argument |
rsi | esi | si | sil | 2nd argument |
rdi | edi | di | dil | 1st argument |
rbp | ebp | bp | bpl | Caller-owned |
rsp | esp | sp | spl | Stack pointer |
r8 | r8d | r8w | r8b | 5th argument |
r9 | r9d | r9w | r9b | 6th argument |
r10 | r10d | r10w | r10b | Callee-owned |
r11 | r11d | r11w | r11b | Callee-owned |
r12 | r12d | r12w | r12b | Caller-owned |
r13 | r13d | r13w | r13b | Caller-owned |
r14 | r14d | r14w | r14b | Caller-owned |
r15 | r15d | r15w | r15b | Caller-owned |
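For the '%' prefix used in AT&T syntax, see the previous section. In this table, caller-owned registers must be preserved by a called function (they are callee-saved), while callee-owned registers may be freely overwritten by it.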
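The columns of the table are views of the same register, not separate storage. The Python sketch below models that aliasing for rax/eax/ax/al; the class `Reg64` is a made-up name. It relies on one x86-64 rule not shown in the table: writing a 32-bit sub-register zeroes the upper 32 bits, while 16-bit and 8-bit writes leave the remaining bits untouched.

```python
MASK64 = (1 << 64) - 1


class Reg64:
    """Toy model of one 64-bit register and its narrower views."""

    def __init__(self, value=0):
        self.value = value & MASK64

    def read(self, bits):
        # rax / eax / ax / al are the low 64 / 32 / 16 / 8 bits
        return self.value & ((1 << bits) - 1)

    def write(self, bits, new):
        low = (1 << bits) - 1
        if bits in (64, 32):
            # a 32-bit write zero-extends into the full 64-bit register
            self.value = new & low
        else:
            # 16-bit and 8-bit writes preserve the upper bits
            self.value = (self.value & ~low & MASK64) | (new & low)


rax = Reg64(0x1122334455667788)
print(hex(rax.read(8)))     # al  -> 0x88
rax.write(8, 0xFF)          # like mov al, 0xFF
print(hex(rax.value))       # 0x11223344556677ff
rax.write(32, 1)            # like mov eax, 1
print(hex(rax.value))       # 0x1
```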
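Data Declaration
Most assemblers support the .DATA directive for starting a data declaration section. This resembles variable declarations in a high-level language, but it follows lower-level design principles: variables declared next to each other are also adjacent in memory, and they can be accessed through memory addressing.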
1 |
|
A low-level language like assembly has no arrays, so the only way to declare one is to list its values. The DUP directive declares a repeated value: 4 DUP(2) is equivalent to 2, 2, 2, 2.
    Z DD 1, 2, 3 ; Declare three 4-byte values, initialized to 1, 2, and 3. The value of location Z + 8 will be 3.
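The "Z + 8" claim can be checked with a short Python sketch that lays the three values out the way the declaration would. It assumes 4-byte little-endian values as on x86, and the variable names are only for illustration.

```python
import struct

# Z DD 1, 2, 3: three consecutive 4-byte little-endian values
Z = struct.pack("<3I", 1, 2, 3)
print(len(Z))  # 12 bytes in total

# Reading a 4-byte value at byte offset 8 (location Z + 8) gives the third one
value_at_z_plus_8 = struct.unpack_from("<I", Z, 8)[0]
print(value_at_z_plus_8)  # 3

# 4 DUP(2) just repeats the value, like a list repetition in Python
dup = [2] * 4
print(dup)  # [2, 2, 2, 2]
```

Because each element is 4 bytes wide, element i sits at byte offset 4 * i, which is where the scale factor in memory operands comes in handy.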
Prefix in Intel
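Just as AT&T syntax uses the suffixes b, w, and l to mark a transfer of a byte, a word, or a long (dword), Intel syntax places a size specifier such as BYTE PTR in front of the memory operand to indicate the size of the transfer.
    mov BYTE PTR [ebx], 2 ; Move 2 into the single byte at the address stored in EBX.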
Instructions
For the complete x86-64 instruction reference, read the Intel manual.
Data Movement Instructions
We can use rappel to evaluate assembly instructions interactively.
mov
    mov eax, ebx ; copy the value in ebx into eax
push
    push eax ; push eax on the stack
pop
    pop edi ; pop the top element of the stack into EDI
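What push and pop do to the stack pointer can be modelled in a few lines of Python. This is a sketch under assumptions, not an emulator: the class `Stack32` is a made-up name, operands are 32 bits wide, and the stack grows toward lower addresses as it does on x86.

```python
class Stack32:
    """Toy 32-bit stack: push decrements esp, pop increments it."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # an empty stack starts at the highest address

    def push(self, value):
        self.esp -= 4  # make room first...
        self.mem[self.esp:self.esp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def pop(self):
        value = int.from_bytes(self.mem[self.esp:self.esp + 4], "little")
        self.esp += 4  # ...and release it after reading
        return value


stack = Stack32()
stack.push(0x11111111)   # like push eax
stack.push(0x22222222)   # like push ebx
print(stack.esp)         # 56: two 4-byte slots below the initial 64
edi = stack.pop()        # like pop edi
print(hex(edi))          # 0x22222222, last in, first out
```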
lea
See: What's the difference between lea and mov in assembly? (linuxquestions.org)
    lea edi, [ebx+4*esi] ; the quantity EBX+4*ESI is placed in EDI
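The difference between lea and mov with the same bracketed operand is whether memory is actually read. A minimal Python sketch, with made-up helper names and memory modelled as a dictionary from address to value:

```python
def lea(ebx, esi):
    """lea edi, [ebx+4*esi]: compute the address only, no memory access."""
    return (ebx + 4 * esi) & 0xFFFFFFFF


def mov_load(memory, ebx, esi):
    """mov edi, [ebx+4*esi]: compute the same address, then read from it."""
    return memory[lea(ebx, esi)]


# Assumed state: a single value stored at address 0x100C
memory = {0x100C: 0xDEADBEEF}
ebx, esi = 0x1000, 3

print(hex(lea(ebx, esi)))               # 0x100c, the address itself
print(hex(mov_load(memory, ebx, esi)))  # 0xdeadbeef, the value stored there
```

Since lea never touches memory, compilers also use it as a cheap way to do arithmetic of the form base + index*scale.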
Arithmetic and Logic Instructions
    add src, dst # dst += src
Control Flow Instructions
Labels
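A label is a marker of the form label: placed in an assembly program to name a position in the code.
    .global _start
In this example, _start and dest are both labels, but _start is a special one: it marks the memory location where the program begins executing.[3]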
jmp
About the jmp label (short, near & far):
Short jumps (and near calls) are jumps whose target is in the same module (i.e. they are intramodular, however it is possible to get intermodular variants from certain hacks). They are most commonly up to 127 bytes of relative displacement (they change the flow of execution forward or backward from the address of the instruction), however there are 16bit variants offering 32k bytes. [4]
This means that in general we can just use jmp; more details are at JMP - x86 Instruction Set Reference.
jmp if …
    jmp target # unconditional jump
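Conditional jumps decide whether to branch by looking at the flags left behind by an earlier instruction, typically cmp. The Python sketch below models that for a handful of common conditions; `cmp_flags` and `jump_taken` are illustrative names, the operands are assumed to be 32 bits wide, and the flags are computed for lhs - rhs (in AT&T operand order, `cmp src, dst` computes dst - src).

```python
def cmp_flags(lhs, rhs, bits=32):
    """Flags produced by comparing lhs with rhs, i.e. computing lhs - rhs."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    lhs &= mask
    rhs &= mask
    result = (lhs - rhs) & mask
    return {
        "ZF": result == 0,                               # operands are equal
        "SF": bool(result & sign),                       # result looks negative
        "CF": lhs < rhs,                                 # unsigned borrow
        "OF": bool((lhs ^ rhs) & (lhs ^ result) & sign), # signed overflow
    }


def jump_taken(mnemonic, f):
    """Whether a conditional jump would branch, given the flags in f."""
    conditions = {
        "je": f["ZF"],
        "jne": not f["ZF"],
        "jl": f["SF"] != f["OF"],                    # signed less than
        "jg": not f["ZF"] and f["SF"] == f["OF"],    # signed greater than
        "jb": f["CF"],                               # unsigned below
        "ja": not f["CF"] and not f["ZF"],           # unsigned above
    }
    return conditions[mnemonic]


flags = cmp_flags(-1, 1)          # -1 is stored as 0xFFFFFFFF
print(jump_taken("jl", flags))    # True: as signed numbers, -1 < 1
print(jump_taken("jb", flags))    # False: as unsigned, 0xFFFFFFFF > 1
```

The same bit pattern gives opposite answers for jl and jb, which is why signed and unsigned comparisons use different jump mnemonics.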
Quick Lookup Sheet
Practice
pwn.college assembly refresher
Function Call Stack
Let's look at the compiled assembly code of a typical Linux program.
    .LC0:
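As a companion to the assembly listing, the stack bookkeeping of a function call can be modelled in Python. This is a simplified sketch of the conventional x86-64 frame-pointer sequence (call, push rbp, mov rbp rsp, sub rsp, leave, ret); the class `Machine` and every address in it are made up for illustration, and real compilers may omit the frame pointer entirely.

```python
class Machine:
    """Toy x86-64 state: a sparse stack plus rsp, rbp, and rip."""

    def __init__(self):
        self.stack = {}          # address -> 8-byte value
        self.rsp = 0x7FFF0000    # assumed initial stack pointer
        self.rbp = 0
        self.rip = 0

    def push(self, value):
        self.rsp -= 8
        self.stack[self.rsp] = value

    def pop(self):
        value = self.stack.pop(self.rsp)
        self.rsp += 8
        return value

    def call(self, target, return_address):
        self.push(return_address)    # call saves where to come back to
        self.rip = target

    def prologue(self, local_bytes):
        self.push(self.rbp)          # push rbp
        self.rbp = self.rsp          # mov rbp, rsp
        self.rsp -= local_bytes      # sub rsp, N (space for locals)

    def leave(self):
        self.rsp = self.rbp          # mov rsp, rbp
        self.rbp = self.pop()        # pop rbp

    def ret(self):
        self.rip = self.pop()        # jump back to the saved address


m = Machine()
m.rbp = 0x7FFF0100                   # the caller's frame pointer (assumed)
m.call(target=0x401000, return_address=0x401234)
m.prologue(local_bytes=16)
print(hex(m.stack[m.rbp]))           # 0x7fff0100: saved rbp of the caller
print(hex(m.stack[m.rbp + 8]))       # 0x401234: return address, just above it
m.leave()
m.ret()
print(hex(m.rip))                    # 0x401234: execution resumes in the caller
```

In this model the return address always sits 8 bytes above the saved rbp, with the function's local variables below it.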
References
- 1. Guide to x86-64
- 2. x86 Assembly Guide
- 3. _start is a label which is equivalent to the memory location of the first instruction in the program. CDOT Wiki
- 4. What does short jump mean in assembly language? - Stack Overflow