The Road Back to PWN (Part 1): Basic Assembly and the Function Call Stack

This post has 3,010 words and takes about 15 minutes to read.
License: Creative Commons Attribution-ShareAlike 3.0 | CC BY-SA 3.0 CN

And just let time go on.

Above All

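A senior schoolmate once told me that after a year of Grade 12 he had forgotten all of his technical skills. I joked, "So you've gone back two-plus years?" His eyes went wide: "Two-plus years back would be the middle of Grade 10. If my skills could get back to where they were then, I'd thank heaven." At the time I thought he was just clowning around. Now I'm half a year into my pre-Grade-12 year and want to pick CTF back up, and I find I remember almost nothing, neither the big picture nor the details. So I'm recording my PWN rehab on this blog, which also keeps me on track.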

Basic Assembly

Docs / Tools / Websites

godbolt.org

An online compiler (Compiler Explorer) that compiles C code to assembly with whichever version of gcc you select.

https://godbolt.org/

rappel

A tool that evaluates assembly code interactively, showing the effect of each instruction immediately.

However, even the latest version still can't handle stack operations.

https://github.com/yrp604/rappel

Intel Assembly Manual

The most authoritative official manual for Intel assembly: Intel® 64 and IA-32 Architectures Software Developer’s Manual, Instruction Set Reference, A-Z

Online Edition

gcc

Assemble and link tmp.S into a standalone static executable that does not use the C standard library:

gcc -nostdlib -static tmp.S -o tmp

AT&T vs. Intel (Assembly Syntax)

Written in July, before I went to Tencent's Xinghuo Youth Program (星火少年计划).

Use

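The GNU assembler (Gas) and many other GNU tools, such as gcc and gdb, use AT&T syntax.

Assemblers that use Intel syntax include the Microsoft Macro Assembler (MASM), Borland's Turbo Assembler (TASM) and the Netwide Assembler (NASM).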

Most of this section is translated from this article.

The Basic Format

The structure of a program in AT&T syntax is similar to that of any other assembler syntax. It consists of a series of directives, labels and instructions, where each instruction is a mnemonic followed by at most three operands. The most prominent difference in AT&T syntax is the order of the operands.

For example, the general format of a basic data-movement instruction in Intel syntax is

mnemonic	destination, source

whereas in AT&T syntax the general format is

mnemonic	source, destination

Registers

All register names of the IA-32 architecture must be prefixed by a ‘%’ sign, e.g. %al, %bx, %ds, %cr0.

mov	%ax, %bx

The example above is a mov instruction that copies the value in the 16-bit register AX into the 16-bit register BX.

Literal Values

All literal (immediate) values must be prefixed by a ‘$’ sign. For example,

mov	$100, %bx
mov	$'A', %al

The first instruction moves the value 100 into the register BX, and the second moves the numerical value of the ASCII character A (65, or 0x41) into the AL register. To make this clearer, note that the following is not a valid instruction,

mov	%bx,	$100

because it tries to move the value in register BX into a literal value, which makes no sense.

Memory Addressing

In AT&T syntax, memory is referenced in the following way,

segment-override:signed-offset(base,index,scale)

Parts of it can be omitted, depending on the address you want. For example,

%es:100(%eax,%ebx,2)

Note that the offset and the scale must not be prefixed by ‘$’. A few more examples with their NASM-syntax equivalents should make things clearer,

GAS memory operand      NASM memory operand
------------------      -------------------
100                     [100]
%es:100                 [es:100]
(%eax)                  [eax]
(%eax,%ebx)             [eax+ebx]
(%ecx,%ebx,2)           [ecx+ebx*2]
(,%ebx,2)               [ebx*2]
-10(%eax)               [eax-10]
%ds:-10(%ebp)           [ds:ebp-10]

mov	%ax,	100
mov	%eax,	-100(%eax)

The first instruction moves the value in register AX to offset 100 in the data segment (DS, the default segment), and the second moves the value in register EAX to [eax-100].
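To make the disp(base,index,scale) rule concrete, here is a small C model of the offset computation. It is a sketch and not from the original article. The function name effective_address is made up for illustration, and segment overrides are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Offset part of an AT&T memory operand disp(base,index,scale),
   i.e. Intel [base + index*scale + disp]. Segment overrides are not modelled.
   Omitted parts are passed as 0; an omitted scale counts as 1. */
uint32_t effective_address(int32_t disp, uint32_t base, uint32_t index,
                           uint32_t scale)
{
    /* The instruction encoding only has room for scale 1, 2, 4 or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned 32-bit arithmetic wraps modulo 2^32, as 32-bit addressing does;
       converting a negative disp to uint32_t is well defined (modulo 2^32). */
    return base + index * scale + (uint32_t)disp;
}
```

For instance, 100(%eax,%ebx,2) with EAX = 0x1000 and EBX = 3 corresponds to effective_address(100, 0x1000, 3, 2), which gives 0x106A.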

Operand Sizes

At times, especially when moving literal values to memory, it becomes necessary to specify the size of the transfer, i.e. the operand size. For example, the instruction

mov	$10,	100

only specifies that the value 10 is to be moved to memory offset 100, but not the transfer size. In NASM this is done by adding a size keyword such as byte/word/dword to one of the operands. In AT&T syntax it is done by adding a suffix (b/w/l) to the instruction. For example,

movb	$10,	%es:(%eax)

moves the byte value 10 to the memory location [es:eax], whereas

movl	$10,	%es:(%eax)

moves the long (dword) value 10 to the same place.

A few more examples:

movl	$100, %ebx
pushl	%eax
popw	%ax
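The effect of the size suffix can be modelled in C by how many bytes a store writes. This is an illustrative sketch: movb_mem, movw_mem and movl_mem are hypothetical names, and memcpy stands in for the store. On x86, which is little-endian, movl_mem(buf, 10) leaves the bytes 0A 00 00 00 in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* movb $imm, (mem): writes exactly 1 byte. */
void movb_mem(unsigned char *mem, uint8_t imm)  { memcpy(mem, &imm, sizeof imm); }

/* movw $imm, (mem): writes 2 bytes. */
void movw_mem(unsigned char *mem, uint16_t imm) { memcpy(mem, &imm, sizeof imm); }

/* movl $imm, (mem): writes 4 bytes. */
void movl_mem(unsigned char *mem, uint32_t imm) { memcpy(mem, &imm, sizeof imm); }
```

Bytes beyond the operand size are left untouched. That is why the size must be stated when neither operand is a register that implies it.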

Control Transfer Instructions

The jmp, call, ret, etc. instructions transfer control from one part of a program to another. They can be classified as control transfers within the same code segment (near) or to a different code segment (far). The possible types of branch addressing are relative offset (label), register, memory operand, and segment-offset pointer.

Relative offsets are specified using labels, as shown below.

label1:
# ...
jmp label1

Branch addressing using registers or memory operands must be prefixed by a ‘*’. To specify a far control transfer, prefix the mnemonic with ‘l’, as in ljmp, lcall, etc. For example,

GAS syntax          NASM syntax
==========          ===========
jmp *100            jmp near [100]
call *100           call near [100]
jmp *%eax           jmp near eax
call *%ecx          call near ecx
jmp *(%eax)         jmp near [eax]
call *(%ebx)        call near [ebx]
ljmp *100           jmp far [100]
lcall *100          call far [100]
ljmp *(%eax)        jmp far [eax]
lcall *(%ebx)       call far [ebx]
ret                 retn
lret                retf
lret $0x100         retf 0x100
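The indirect forms (the ‘*’ forms above) are what a C call through a function pointer typically compiles to, for example call *%rax or call *(%rbx). The exact instruction depends on the compiler and optimization level. Below is a minimal sketch in which add_one, twice, handlers and dispatch are hypothetical names.

```c
#include <assert.h>

/* Hypothetical handlers, made up for this illustration. */
static int add_one(int x) { return x + 1; }
static int twice(int x)   { return x * 2; }

/* A table of code addresses stored in memory. */
static int (*const handlers[])(int) = { add_one, twice };

int dispatch(unsigned which, int x)
{
    assert(which < sizeof handlers / sizeof handlers[0]);
    int (*target)(int) = handlers[which]; /* fetch the target address from memory */
    return target(x);                     /* transfer control through it */
}
```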

Segment-offset pointers are specified using the following format:

jmp	$segment, $offset

For example:

jmp	$0x10, $0x100000

Basics

Registers

Most of this part and the following parts are translated from Guide to x86-64[1] and x86 Assembly Guide[2].

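64-bit  32-bit  16-bit  8-bit  Comment
rax     eax     ax      al     Return value
rbx     ebx     bx      bl     Callee-owned
rcx     ecx     cx      cl     4th argument
rdx     edx     dx      dl     3rd argument
rsi     esi     si      sil    2nd argument
rdi     edi     di      dil    1st argument
rbp     ebp     bp      bpl    Callee-owned
rsp     esp     sp      spl    Stack pointer
r8      r8d     r8w     r8b    5th argument
r9      r9d     r9w     r9b    6th argument
r10     r10d    r10w    r10b   Caller-owned
r11     r11d    r11w    r11b   Caller-owned
r12     r12d    r12w    r12b   Callee-owned
r13     r13d    r13w    r13b   Callee-owned
r14     r14d    r14w    r14b   Callee-owned
r15     r15d    r15w    r15b   Callee-owned

Callee-owned (callee-saved): a function that uses one of these registers must restore its original value before returning. Caller-owned (caller-saved): a called function may overwrite it, so the caller must save it if it still needs the value afterwards.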
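The table does not show how a write to a smaller name affects the full 64-bit register. On x86-64, writing a 32-bit register zero-extends into the upper 32 bits. Writing a 16-bit or 8-bit register leaves the other bits untouched. Here is a small C model of that behaviour, with made-up function names. The same rule applies to r8d/r8w/r8b and the others.

```c
#include <assert.h>
#include <stdint.h>

/* Result of "mov eax, value": a 32-bit write zero-extends into all of rax. */
uint64_t after_write_eax(uint64_t rax, uint32_t value)
{
    (void)rax;                 /* the old upper 32 bits are discarded */
    return (uint64_t)value;
}

/* Result of "mov ax, value": bits 16..63 of rax are left unchanged. */
uint64_t after_write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~(uint64_t)0xFFFF) | value;
}

/* Result of "mov al, value": bits 8..63 of rax are left unchanged. */
uint64_t after_write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF) | value;
}
```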

32-bit Registers

Registers in 32-bit mode.

For the ‘%’ prefix used in AT&T syntax, see above.

Data Declaration

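Most assemblers support the .DATA directive to start a data declaration section. This is much like variable declaration in a high-level language, but it follows lower-level rules. For example, variables declared next to each other are also adjacent in memory and can be accessed through memory addressing.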

.DATA
var     DB 64      ; Declare a byte, referred to as location var, containing the value 64.
var2    DB ?       ; Declare an uninitialized byte, referred to as location var2.
        DB 10      ; Declare a byte with no label, containing the value 10. Its location is var2 + 1.
X       DW ?       ; Declare a 2-byte uninitialized value, referred to as location X.
Y       DD 30000   ; Declare a 4-byte value, referred to as location Y, initialized to 30000.

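A low-level language like assembly has no array type, so an array is declared simply by listing its values. The DUP directive declares several copies of the same value. For example, 4 DUP(2) is equivalent to 2, 2, 2, 2.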

Z       DD 1, 2, 3          ; Declare three 4-byte values, initialized to 1, 2, and 3. The value of location Z + 8 will be 3.
bytes   DB 10 DUP(?)        ; Declare 10 uninitialized bytes starting at location bytes.
arr     DD 100 DUP(0)       ; Declare 100 4-byte words starting at location arr, all initialized to 0.
str     DB 'hello',0        ; Declare 6 bytes starting at the address str, initialized to the ASCII character values for hello and the null (0) byte.
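Declarations are laid out back to back, so they map naturally onto C arrays. The sketch below is an assumed C counterpart of Z, arr and str above; it was not generated from the MASM code. read_dword_at is a hypothetical helper that reads the 4 bytes at Z + off, and it shows that Z + 8 holds 3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

uint32_t Z[3] = {1, 2, 3};  /* Z   DD 1, 2, 3: three 4-byte values back to back */
uint32_t arr[100];          /* arr DD 100 DUP(0): static storage starts zeroed */
char str[] = "hello";       /* str DB 'hello',0: 6 bytes including the 0 */

/* Read the 4-byte value at byte offset off from Z, like [Z + off]. */
uint32_t read_dword_at(size_t off)
{
    uint32_t v;
    assert(off + sizeof v <= sizeof Z);
    memcpy(&v, (const unsigned char *)Z + off, sizeof v);
    return v;
}
```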

Size Prefixes in Intel

AT&T syntax uses the suffixes b, w and l to give the size of a move (byte, word, long/dword). In the same way, Intel syntax puts a size keyword in front of the memory operand.

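mov BYTE PTR [ebx], 2    ; Move 2 into the single byte at the address stored in EBX.
mov WORD PTR [ebx], 2    ; Move the 16-bit integer representation of 2 into the 2 bytes starting at the address in EBX.
mov DWORD PTR [ebx], 2   ; Move the 32-bit integer representation of 2 into the 4 bytes starting at the address in EBX.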

Instructions

For the complete x86-64 instruction reference, read the Intel manual.

Data Movement Instructions

We can use rappel to try these instructions interactively.

mov

mov eax, ebx              ; copy the value in ebx into eax
mov byte ptr [var], 5     ; store the value 5 into the byte at location var

push

push eax                  ; push eax on the stack
push [var]                ; push the 4 bytes at address var onto the stack

pop

pop edi                   ; pop the top element of the stack into EDI
pop [ebx]                 ; pop the top element of the stack into memory at the four bytes starting at location EBX
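push and pop can be pictured with a toy 32-bit stack in C. This is only a model under stated assumptions: memory is a 64-byte array, esp is an index into it, and the names toy_stack, stack_init, push32 and pop32 are made up. It shows the two rules that matter. The stack grows toward lower addresses, and push decrements ESP before storing while pop loads before incrementing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 64 };

struct toy_stack {
    unsigned char mem[STACK_SIZE]; /* stands in for memory */
    uint32_t esp;                  /* index of the current top of stack */
};

void stack_init(struct toy_stack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_SIZE;           /* empty stack: ESP just past the highest byte */
}

/* push: decrement ESP by 4 first, then store the value at [ESP]. */
void push32(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* pop: load the value at [ESP] first, then increment ESP by 4. */
uint32_t pop32(struct toy_stack *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_SIZE);
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```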

lea

What’s the difference between lea and mov in assembly? (linuxquestions.org)

lea edi, [ebx+4*esi]      ; the quantity EBX+4*ESI is placed in EDI
lea eax, [var]            ; the address var (not the value stored there) is placed in EAX
lea eax, [val]            ; the constant val is placed in EAX
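The difference between lea and mov shows up directly in C. lea computes an address, or any base + index*scale + disp arithmetic, without touching memory. mov with a memory operand loads from memory. Below is a minimal sketch with hypothetical function names. Compilers commonly emit lea for the first two functions and a memory load for the third, though the exact output depends on the compiler and flags.

```c
#include <assert.h>
#include <stdint.h>

/* lea edi, [ebx+4*esi]: pure address arithmetic, memory is never touched. */
uint32_t lea_ebx_4esi(uint32_t ebx, uint32_t esi) { return ebx + 4 * esi; }

/* &base[i] is the kind of value lea produces: an address... */
const int32_t *element_address(const int32_t *base, uint32_t i) { return &base[i]; }

/* ...while base[i] is what mov with a memory operand produces: a load. */
int32_t element_value(const int32_t *base, uint32_t i) { return base[i]; }
```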

Arithmetic and Logic Instructions

# AT&T operand order: source first, destination last
add src, dst        # dst += src
sub src, dst        # dst -= src
imul src, dst       # dst *= src
neg dst             # dst = -dst (arithmetic inverse)

and src, dst        # dst &= src
or src, dst         # dst |= src
xor src, dst        # dst ^= src
not dst             # dst = ~dst (bitwise inverse)

shl count, dst      # dst <<= count (left shift dst by count positions), synonym sal
sar count, dst      # dst >>= count (arithmetic right shift dst by count positions)
shr count, dst      # dst >>= count (logical right shift dst by count positions)

# some instructions have special-case variants with a different number of operands
imul src            # single-operand imul assumes the other operand is in %rax,
                    # computes a 128-bit result, stores the high 64 bits in %rdx, the low 64 bits in %rax
shl dst             # dst <<= 1 (no count => assume 1, same for sar, shr, sal)
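A small C model makes two points checkable: the difference between sar and shr, and the register pair used by one-operand imul. It rests on two assumptions. First, C's >> on a negative signed value is implementation-defined, and this sketch relies on GCC/Clang treating it as an arithmetic shift. Second, the 128-bit product uses the GCC/Clang __int128 extension. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* sar: arithmetic right shift, copies the sign bit into the vacated bits. */
int64_t sar64(int64_t x, unsigned count)
{
    assert(count < 64);   /* the CPU masks the count to 6 bits; in C a count >= 64 is undefined */
    return x >> count;    /* arithmetic shift on GCC/Clang for negative x */
}

/* shr: logical right shift, fills the vacated bits with zeros. */
uint64_t shr64(uint64_t x, unsigned count)
{
    assert(count < 64);
    return x >> count;
}

/* One-operand signed imul: %rdx:%rax = %rax * src as a 128-bit product. */
void imul_1op(int64_t rax_in, int64_t src, uint64_t *rdx, uint64_t *rax)
{
    __int128 product = (__int128)rax_in * src;
    *rax = (uint64_t)product;         /* low 64 bits */
    *rdx = (uint64_t)(product >> 64); /* high 64 bits */
}
```

For example, -8 shifted right by 1 is -4 with sar, but 0x7FFFFFFFFFFFFFFC with shr on the same bit pattern.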

Control Flow Instructions

Labels

In an ordinary assembly program, a marker of the form label: that names a position in the code is called a label.

.global _start
_start:
.intel_syntax noprefix
jmp short dest
.rept 0x51
nop
.endr
dest:
pop rdi
mov rax, 0x403000
jmp rax

In this example, both _start and dest are labels, but _start is a special one: it marks the memory location where the program starts executing.[3]

jmp

About jmp variants (short, near and far):

Short jumps (and near calls) are jumps whose target is in the same module (i.e. they are intramodular, however it is possible to get intermodular variants from certain hacks). They are most commonly up to 127 bytes of relative displacement (they change the flow of execution forward or backward from the address of the instruction), however there are 16bit variants offering 32k bytes. [4]

In practice this means we can usually just write jmp and let the assembler choose the encoding. More details at JMP - x86 Instruction Set Reference.
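The short-jump range can be checked by hand. A short jmp is two bytes: the opcode EB plus a signed 8-bit displacement. The displacement is measured from the end of the jmp instruction, i.e. from the address of the next instruction. Here is a sketch with hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { SHORT_JMP_LEN = 2 };   /* opcode EB + one displacement byte */

/* Displacement a short jmp at jmp_addr needs in order to reach target. */
int64_t short_jmp_disp(int64_t jmp_addr, int64_t target)
{
    return target - (jmp_addr + SHORT_JMP_LEN);
}

/* A short jump can only encode a signed 8-bit displacement. */
bool fits_rel8(int64_t disp)
{
    return disp >= INT8_MIN && disp <= INT8_MAX;
}
```

Apply this to the label example above. The .intel_syntax directive emits no bytes, so jmp short dest sits at offset 0 of _start. dest is 2 + 0x51 bytes later, so the displacement is 0x51, which fits in a signed byte (EB 51). A jmp to itself needs a displacement of -2 (EB FE).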

jmp if …

jmp target    # unconditional jump
je target     # jump equal, synonym jz jump zero (ZF=1)
jne target    # jump not equal, synonym jnz jump non zero (ZF=0)
js target     # jump signed (SF=1)
jns target    # jump not signed (SF=0)
jg target     # jump greater than, synonym jnle jump not less or equal (ZF=0 and SF=OF)
jge target    # jump greater or equal, synonym jnl jump not less (SF=OF)
jl target     # jump less than, synonym jnge jump not greater or equal (SF!=OF)
jle target    # jump less or equal, synonym jng jump not greater (ZF=1 or SF!=OF)
ja target     # jump above, synonym jnbe jump not below or equal (CF=0 and ZF=0)
jae target    # jump above or equal (CF=0)
jb target     # jump below, synonym jnae jump not above or equal (CF=1)
jbe target    # jump below or equal (CF=1 or ZF=1)
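Conditional jumps read flags that an earlier instruction has set, typically cmp, which is not covered above. The C sketch below models cmp a, b in Intel order (cmp b, a in AT&T), that is, the flags of a - b on 32-bit values. It then evaluates a few conditions from the table. The struct and function names are hypothetical. The model shows why jl (signed, SF != OF) and jb (unsigned, CF = 1) can disagree on the same operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags after computing a - b on 32-bit operands, as cmp does. */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                          /* wraps modulo 2^32 like the ALU */
    struct flags f;
    f.cf = a < b;                                /* borrow out of the unsigned subtraction */
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1u;                       /* top bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;     /* signs of a and b differ, and r's sign differs from a's */
    return f;
}

bool jl(struct flags f) { return f.sf != f.of; }           /* signed a < b */
bool jg(struct flags f) { return !f.zf && f.sf == f.of; }  /* signed a > b */
bool jb(struct flags f) { return f.cf; }                   /* unsigned a < b */
bool ja(struct flags f) { return !f.cf && !f.zf; }         /* unsigned a > b */
```

For a = -1 and b = 1, jl is taken because -1 < 1 as signed integers. ja is also taken, because 0xFFFFFFFF > 1 as unsigned integers.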

Quick Lookup Sheet

CS107 x86-64 Reference Sheet

Practice

pwn.college assembly refresher

Function Call Stack

Let's look at a typical Linux program and the assembly code it compiles to.

.LC0:
        .string "Hello , world"
main:
        push    rbp
        mov     rbp, rsp
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        pop     rbp
        ret
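Here is what each part of the listing does:

- mov edi, OFFSET FLAT:.LC0 passes the string's address as the first argument.
- mov eax, 0 before call printf follows a System V AMD64 ABI rule for variadic functions such as printf: AL holds an upper bound on the number of vector registers used for arguments, which is none here.
- push rbp / mov rbp, rsp set up the stack frame, and pop rbp tears it down.
- The final mov eax, 0 is main's return value.

The sketch below uses a hypothetical function, sum7, to show where that ABI places each integer argument when there are more than six.

```c
#include <assert.h>

/* System V AMD64 ABI (Linux x86-64): the first six integer/pointer arguments
   are passed in registers, the rest on the stack; the result returns in rax. */
long sum7(long a,  /* rdi */
          long b,  /* rsi */
          long c,  /* rdx */
          long d,  /* rcx */
          long e,  /* r8  */
          long f,  /* r9  */
          long g)  /* stack: at [rsp+8] on entry, above the return address */
{
    return a + b + c + d + e + f + g;  /* result is returned in rax */
}
```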

References


  1. Guide to x86-64
  2. x86 Assembly Guide
  3. _start is a label which is equivalent to the memory location of the first instruction in the program. CDOT Wiki
  4. What does short jump mean in assembly language? - Stack Overflow