My code works, I don’t know why.

The king has donkey ears

On Linux, who calls main in a C program?


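A long time ago I heard that when Linux runs an executable, the real starting point is not main. I had also noticed that the link step quietly pulls in a few object files I had never seen before. So this time, let's find out what is really going on.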

Test environment

Because I was feeling pretentious and wanted an excuse to touch some ARM assembly, this test runs ARM Debian under QEMU.

$ lsb_release -a
No LSB modules are available.
Distributor ID:   Debian
Description:  Debian GNU/Linux 8.0 (jessie)
Release:  8.0
Codename: jessie

$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=571db48d9c9e4625b7da206e748e41c237f2b202, stripped

The test source is the familiar Hello World:

hello1.c
#include <stdio.h>

int main()
{
  printf("Hello World\n");

  return 0;
}

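You may remember from an earlier post that an executable has a .text section, which holds the machine code to be executed. Let's first see where the hello1 executable starts running.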

$ readelf -h hello1
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x102f0
  Start of program headers:          52 (bytes into file)
...
Section header string table index: 33

readelf shows that the entry point is 0x102f0. So where is 0x102f0? Looking at the symbol table, it turns out to be exactly the start of .text.

$ objdump -t hello1

hello1:     file format elf32-littlearm

SYMBOL TABLE:
00010134 l    d  .interp  00000000              .interp
...
000102f0 l    d  .text    00000000              .text

OK, so what code sits at the start of .text?

Disassembly of section .text:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

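Interesting: no main() in sight, but there is a _start. So what is _start? Remember linker scripts? They have an ENTRY command that specifies where the program begins executing. Let's check whether the default ENTRY is also _start.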

$ ld --verbose | grep ENTRY
ENTRY(_start)

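So far we only know that the executable starts at _start rather than main, which means someone has clearly added extra code to your executable for _start to show up in it. The laziest way to find out who is to search the binaries on the system for that symbol.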

user@host:/usr/lib$ find -name "*.[ao]" -exec nm -A {} \;  2> /dev/null | grep " _start$"
./arm-linux-gnueabi/crt1.o:00000000 T _start
./arm-linux-gnueabi/gcrt1.o:00000000 T _start
./arm-linux-gnueabi/Scrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/crt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/gcrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/Scrt1.o:00000000 T _start

OK, there really are object files that contain _start. Next, let's confirm whether these files get linked in at compile time.

$ gcc -v hello1.c
Using built-in specs.
COLLECT_GCC=gcc
...
COLLECT_GCC_OPTIONS='-v' '-march=armv4t' '-mfloat-abi=soft'
...
-X --hash-style=gnu -m armelf_linux_eabi
...
/usr/lib/gcc/arm-linux-gnueabi/4.9/../../../arm-linux-gnueabi/crt1.o
...

_start calls the external function __libc_start_main. Let's use LD_DEBUG to watch it being resolved.

$ LD_DEBUG=all ./hello1 2>&1 |grep __libc_start_main
       890:  symbol=__libc_start_main;  lookup in file=./hello1 [0]
       890:  symbol=__libc_start_main;  lookup in file=/lib/arm-linux-gnueabi/libc.so.6 [0]
       890:  binding file ./hello1 [0] to /lib/arm-linux-gnueabi/libc.so.6 [0]: normal symbol `__libc_start_main' [GLIBC_2.4]

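As you can see, the lookup for __libc_start_main starts in ./hello1, ends up in libc.so.6, and resolves to the address of __libc_start_main in libc.so.6 (that is, the binding). The prototype of __libc_start_main is as follows: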

int __libc_start_main(int (*main) (int, char **, char **), int argc, char ** ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (*stack_end));

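See anything interesting? I do:

  • The main function is passed in as a function pointer
  • The arguments for main (argc and ubp_av)
  • Some other function pointers I can't make heads or tails of
    • init
    • fini
    • rtld_fini

From this I would guess that this function simply calls a series of callback functions, namely the ones listed above.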

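According to the manual, __libc_start_main() initializes the execution environment, calls main with its arguments, and handles the return value once main finishes. The detailed behavior the manual gives as an example includes:

  • Checking privileges for security
  • Initializing the thread subsystem (don't ask me what the thread subsystem is)
  • Registering rtld_fini as a cleanup callback, used to release resources when the shared object exits
  • Calling the init callback function
  • Calling the main callback function with its arguments
  • When the main callback returns, calling exit with the return value as its argument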

Now let's go back and look at the assembly for _start again:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

The interesting part is these three addresses:

   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

These three addresses turn out to be:

  • 10320: 000104b4 .word 0x000104b4
    • __libc_csu_fini
  • 10324: 00010420 .word 0x00010420
    • main
  • 10328: 00010448 .word 0x00010448
    • __libc_csu_init

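In other words, main and __libc_csu_init are passed to __libc_start_main as the first and fourth arguments (in r0 and r3), while __libc_csu_fini is pushed onto the stack and passed to __libc_start_main that way.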

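Conclusion

On Linux, program execution does not start at main but at _start, which is supplied by crt1.o, one of the binaries that come with glibc. _start mainly hands your main, along with a few hook functions, to __libc_start_main. __libc_start_main in libc then gets everything set up before it actually runs your main, and it also cleans up after main returns.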

Full disassembly

hello1.dis
$ cat hello1.dis

hello1:     file format elf32-littlearm


Disassembly of section .init:

0001029c <_init>:
   1029c:    e92d4008    push    {r3, lr}
   102a0:    eb000021    bl  1032c <call_weak_fn>
   102a4:    e8bd4008    pop {r3, lr}
   102a8:    e12fff1e    bx  lr

Disassembly of section .plt:

000102ac <puts@plt-0x14>:
   102ac:    e52de004    push    {lr}      ; (str lr, [sp, #-4]!)
   102b0:    e59fe004    ldr lr, [pc, #4] ; 102bc <_init+0x20>
   102b4:    e08fe00e    add lr, pc, lr
   102b8:    e5bef008    ldr pc, [lr, #8]!
   102bc:    00010318   .word   0x00010318

000102c0 <puts@plt>:
   102c0:    e28fc600    add ip, pc, #0, 12
   102c4:    e28cca10    add ip, ip, #16, 20   ; 0x10000
   102c8:    e5bcf318    ldr pc, [ip, #792]!  ; 0x318

000102cc <__libc_start_main@plt>:
   102cc:    e28fc600    add ip, pc, #0, 12
   102d0:    e28cca10    add ip, ip, #16, 20   ; 0x10000
   102d4:    e5bcf310    ldr pc, [ip, #784]!  ; 0x310

000102d8 <__gmon_start__@plt>:
   102d8:    e28fc600    add ip, pc, #0, 12
   102dc:    e28cca10    add ip, ip, #16, 20   ; 0x10000
   102e0:    e5bcf308    ldr pc, [ip, #776]!  ; 0x308

000102e4 <abort@plt>:
   102e4:    e28fc600    add ip, pc, #0, 12
   102e8:    e28cca10    add ip, ip, #16, 20   ; 0x10000
   102ec:    e5bcf300    ldr pc, [ip, #768]!  ; 0x300

Disassembly of section .text:

000102f0 <_start>:
   102f0:    e3a0b000    mov fp, #0
   102f4:    e3a0e000    mov lr, #0
   102f8:    e49d1004    pop {r1}      ; (ldr r1, [sp], #4)
   102fc:    e1a0200d    mov r2, sp
   10300:    e52d2004    push    {r2}      ; (str r2, [sp, #-4]!)
   10304:    e52d0004    push    {r0}      ; (str r0, [sp, #-4]!)
   10308:    e59fc010    ldr ip, [pc, #16]    ; 10320 <_start+0x30>
   1030c:    e52dc004    push    {ip}      ; (str ip, [sp, #-4]!)
   10310:    e59f000c    ldr r0, [pc, #12]    ; 10324 <_start+0x34>
   10314:    e59f300c    ldr r3, [pc, #12]    ; 10328 <_start+0x38>
   10318:    ebffffeb    bl  102cc <__libc_start_main@plt>
   1031c:    ebfffff0    bl  102e4 <abort@plt>
   10320:    000104b4   .word   0x000104b4
   10324:    00010420   .word   0x00010420
   10328:    00010448   .word   0x00010448

0001032c <call_weak_fn>:
   1032c:    e59f3014    ldr r3, [pc, #20]    ; 10348 <call_weak_fn+0x1c>
   10330:    e59f2014    ldr r2, [pc, #20]    ; 1034c <call_weak_fn+0x20>
   10334:    e08f3003    add r3, pc, r3
   10338:    e7932002    ldr r2, [r3, r2]
   1033c:    e3520000    cmp r2, #0
   10340:    012fff1e   bxeq    lr
   10344:    eaffffe3    b   102d8 <__gmon_start__@plt>
   10348:    00010298   .word   0x00010298
   1034c:    0000001c   .word   0x0000001c

00010350 <deregister_tm_clones>:
   10350:    e59f301c    ldr r3, [pc, #28]    ; 10374 <deregister_tm_clones+0x24>
   10354:    e59f001c    ldr r0, [pc, #28]    ; 10378 <deregister_tm_clones+0x28>
   10358:    e0603003    rsb r3, r0, r3
   1035c:    e3530006    cmp r3, #6
   10360:    912fff1e   bxls    lr
   10364:    e59f3010    ldr r3, [pc, #16]    ; 1037c <deregister_tm_clones+0x2c>
   10368:    e3530000    cmp r3, #0
   1036c:    012fff1e   bxeq    lr
   10370:    e12fff13    bx  r3
   10374:    000205ff   .word   0x000205ff
   10378:    000205fc   .word   0x000205fc
   1037c:    00000000   .word   0x00000000

00010380 <register_tm_clones>:
   10380:    e59f1024    ldr r1, [pc, #36]    ; 103ac <register_tm_clones+0x2c>
   10384:    e59f0024    ldr r0, [pc, #36]    ; 103b0 <register_tm_clones+0x30>
   10388:    e0601001    rsb r1, r0, r1
   1038c:    e1a01141    asr r1, r1, #2
   10390:    e0811fa1    add r1, r1, r1, lsr #31
   10394:    e1b010c1    asrs    r1, r1, #1
   10398:    012fff1e   bxeq    lr
   1039c:    e59f3010    ldr r3, [pc, #16]    ; 103b4 <register_tm_clones+0x34>
   103a0:    e3530000    cmp r3, #0
   103a4:    012fff1e   bxeq    lr
   103a8:    e12fff13    bx  r3
   103ac:    000205fc   .word   0x000205fc
   103b0:    000205fc   .word   0x000205fc
   103b4:    00000000   .word   0x00000000

000103b8 <__do_global_dtors_aux>:
   103b8:    e92d4010    push    {r4, lr}
   103bc:    e59f401c    ldr r4, [pc, #28]    ; 103e0 <__do_global_dtors_aux+0x28>
   103c0:    e5d43000    ldrb    r3, [r4]
   103c4:    e3530000    cmp r3, #0
   103c8:    1a000002   bne 103d8 <__do_global_dtors_aux+0x20>
   103cc:    ebffffdf    bl  10350 <deregister_tm_clones>
   103d0:    e3a03001    mov r3, #1
   103d4:    e5c43000    strb    r3, [r4]
   103d8:    e8bd4010    pop {r4, lr}
   103dc:    e12fff1e    bx  lr
   103e0:    000205fc   .word   0x000205fc

000103e4 <frame_dummy>:
   103e4:    e92d4008    push    {r3, lr}
   103e8:    e59f0028    ldr r0, [pc, #40]    ; 10418 <frame_dummy+0x34>
   103ec:    e5903000    ldr r3, [r0]
   103f0:    e3530000    cmp r3, #0
   103f4:    1a000001   bne 10400 <frame_dummy+0x1c>
   103f8:    e8bd4008    pop {r3, lr}
   103fc:    eaffffdf    b   10380 <register_tm_clones>
   10400:    e59f3014    ldr r3, [pc, #20]    ; 1041c <frame_dummy+0x38>
   10404:    e3530000    cmp r3, #0
   10408:    0afffffa   beq 103f8 <frame_dummy+0x14>
   1040c:    e1a0e00f    mov lr, pc
   10410:    e12fff13    bx  r3
   10414:    eafffff7    b   103f8 <frame_dummy+0x14>
   10418:    000204e8   .word   0x000204e8
   1041c:    00000000   .word   0x00000000

00010420 <main>:
   10420:    e92d4800    push    {fp, lr}
   10424:    e28db004    add fp, sp, #4
   10428:    e59f0014    ldr r0, [pc, #20]    ; 10444 <main+0x24>
   1042c:    ebffffa3    bl  102c0 <puts@plt>
   10430:    e3a03000    mov r3, #0
   10434:    e1a00003    mov r0, r3
   10438:    e24bd004    sub sp, fp, #4
   1043c:    e8bd4800    pop {fp, lr}
   10440:    e12fff1e    bx  lr
   10444:    000104c8   .word   0x000104c8

00010448 <__libc_csu_init>:
   10448:    e92d43f8    push    {r3, r4, r5, r6, r7, r8, r9, lr}
   1044c:    e59f6058    ldr r6, [pc, #88]    ; 104ac <__libc_csu_init+0x64>
   10450:    e59f5058    ldr r5, [pc, #88]    ; 104b0 <__libc_csu_init+0x68>
   10454:    e08f6006    add r6, pc, r6
   10458:    e08f5005    add r5, pc, r5
   1045c:    e0656006    rsb r6, r5, r6
   10460:    e1a07000    mov r7, r0
   10464:    e1a08001    mov r8, r1
   10468:    e1a09002    mov r9, r2
   1046c:    ebffff8a    bl  1029c <_init>
   10470:    e1b06146    asrs    r6, r6, #2
   10474:    0a00000a   beq 104a4 <__libc_csu_init+0x5c>
   10478:    e2455004    sub r5, r5, #4
   1047c:    e3a04000    mov r4, #0
   10480:    e2844001    add r4, r4, #1
   10484:    e5b53004    ldr r3, [r5, #4]!
   10488:    e1a00007    mov r0, r7
   1048c:    e1a01008    mov r1, r8
   10490:    e1a02009    mov r2, r9
   10494:    e1a0e00f    mov lr, pc
   10498:    e12fff13    bx  r3
   1049c:    e1540006    cmp r4, r6
   104a0:    1afffff6   bne 10480 <__libc_csu_init+0x38>
   104a4:    e8bd43f8    pop {r3, r4, r5, r6, r7, r8, r9, lr}
   104a8:    e12fff1e    bx  lr
   104ac:    00010088   .word   0x00010088
   104b0:    00010080   .word   0x00010080

000104b4 <__libc_csu_fini>:
   104b4:    e12fff1e    bx  lr

Disassembly of section .fini:

000104b8 <_fini>:
   104b8:    e92d4008    push    {r3, lr}
   104bc:    e8bd4008    pop {r3, lr}
   104c0:    e12fff1e    bx  lr
