Compiled C Lab
This post explores some of the gcc compiler options and how they affect the assembly code that gcc creates.
Files Used
Later in this post I refer to three simple files. Here is the original C source:

    //hello.c
    #include <stdio.h>
    int main(){
        printf("Hello Everybody\n");
    }

    //hello2.c
    #include <stdio.h>
    int main(){
        printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
    }

    //hello3.c
    #include <stdio.h>
    void output(){
        printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
    }
    int main(){
        output();
    }
Objdump
I will use these three objdump commands to analyze the files that gcc produces.
- objdump -f [filename]
  This shows the file header information, such as the file format, the architecture it was compiled for, the flags and the start address.
- objdump -s [filename]
  This shows the full contents of every section as a hex dump.
- objdump --source [filename]
  This disassembles the machine code and, when debug information is available, shows the original source lines alongside the generated assembly instructions.
Compiling and analyzing
We will compile the source code with a number of different options and analyze each result with the objdump options shown above.
First compile: gcc hello.c -g -O0 -fno-builtin -o origin
Using objdump -f gives us the header information. This information changes very little between builds on one computer, but it changes a lot once you move to different computers and architectures. This is the output:

    origin:     file format elf64-x86-64
    architecture: i386:x86-64, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x0000000000400400
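The flags value 0x00000112 is a bit mask, and the three names printed after it are the bits that are set. A minimal sketch of that decoding follows. The bit values are my reading of the definitions in binutils' bfd.h and should be treated as an assumption, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Flag bits, assumed to match the definitions in binutils' bfd.h. */
#define EXEC_P   0x002u  /* file is directly executable */
#define HAS_SYMS 0x010u  /* file has a symbol table */
#define D_PAGED  0x100u  /* file is demand-paged */

/* Hypothetical helper: returns 1 if every bit in `mask` is set in `flags`. */
int has_flag(unsigned flags, unsigned mask)
{
    return (flags & mask) == mask;
}

/* Hypothetical helper: prints the names of the flags above that are set. */
void print_flags(unsigned flags)
{
    if (has_flag(flags, EXEC_P))   printf("EXEC_P ");
    if (has_flag(flags, HAS_SYMS)) printf("HAS_SYMS ");
    if (has_flag(flags, D_PAGED))  printf("D_PAGED ");
    printf("\n");
}
```

Under those assumed values, 0x002 + 0x010 + 0x100 adds up to the 0x112 that objdump reported.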
Using objdump -s, the .rodata section shows where the string from the source code is stored:

    Contents of section .rodata:
     4005d8 01000200 48656c6c 6f204576 65727962  ....Hello Everyb
     4005e8 6f64790a 00                          ody..
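To connect the hex dump to the program, here is a small sketch that copies the .rodata bytes into an array and looks up the string by address. The helper name string_at is hypothetical. The string starts 4 bytes into the section, at 0x4005dc, which is the address that main loads into %edi before calling printf.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes copied from the .rodata dump, starting at address 0x4005d8. */
const uint8_t rodata[] = {
    0x01, 0x00, 0x02, 0x00,
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x45, 0x76,
    0x65, 0x72, 0x79, 0x62, 0x6f, 0x64, 0x79, 0x0a, 0x00
};

#define RODATA_BASE 0x4005d8u

/* Hypothetical helper: map a virtual address inside the dump to its bytes.
 * Returns NULL when the address falls outside the dumped range. */
const char *string_at(uint32_t address)
{
    size_t offset = address - RODATA_BASE;
    if (offset >= sizeof rodata)
        return NULL;
    return (const char *)&rodata[offset];
}
```

Calling string_at(0x4005dc) yields the NUL-terminated text "Hello Everybody\n", where the 0x0a byte is the newline.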
Using objdump --source, this is the main function with the source lines interleaved:

    000000000040050c <main>:
    #include <stdio.h>
    int main(){
      40050c:  55                    push   %rbp
      40050d:  48 89 e5              mov    %rsp,%rbp
        printf("Hello Everybody\n");
      400510:  bf dc 05 40 00        mov    $0x4005dc,%edi
      400515:  b8 00 00 00 00        mov    $0x0,%eax
      40051a:  e8 c1 fe ff ff        callq  4003e0 <printf@plt>
    }
      40051f:  5d                    pop    %rbp
      400520:  c3                    retq
      400521:  90                    nop
      400522:  90                    nop
      400523:  90                    nop
      400524:  90                    nop
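The bytes of the call instruction, e8 c1 fe ff ff, do not contain the address 4003e0 directly. They hold a signed 32-bit displacement that is added to the address of the next instruction. A minimal sketch of that calculation, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: compute the target of a 5-byte `call rel32`
 * instruction (opcode 0xe8) located at `insn_addr`. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    /* The 32-bit displacement is stored little-endian after the opcode. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* Reinterpret as signed: 0xfffffec1 is -0x13f. */
    int64_t disp = (int64_t)(int32_t)raw;

    /* The displacement is relative to the next instruction (addr + 5). */
    return insn_addr + 5 + (uint64_t)disp;
}
```

For the call at 0x40051a, the next instruction is at 0x40051f and the displacement is -0x13f, so the target is 0x40051f - 0x13f = 0x4003e0, matching what objdump prints.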
Second compile: gcc hello.c -g -O0 -fno-builtin -static -o step1
Adding -static. The section dump and the disassembly are both much larger than the original, because -static copies the C library code that the program depends on into the executable instead of linking to it at run time.
As the file sizes below show, after writing the objdump results to files, the --source output is about 750 times larger and the section output is about 250 times larger than the original.
       9399 Feb 25 12:11 origin-src.txt
      10865 Feb 25 12:11 origin-s.txt
        142 Feb 25 12:14 step1-f.txt
    7103611 Feb 25 12:15 step1-src.txt
    2723639 Feb 25 12:14 step1-s.txt
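The growth factors can be checked directly from the byte counts in the listing. This is only a sketch of the arithmetic, with a hypothetical helper name:

```c
/* Hypothetical helper: how many times larger `bigger` is than `smaller`. */
double size_ratio(long bigger, long smaller)
{
    return (double)bigger / (double)smaller;
}
```

size_ratio(7103611, 9399) comes out at roughly 756 for the --source output, and size_ratio(2723639, 10865) at roughly 251 for the section output.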
Third compile: gcc hello.c -g -O0 -o step2
Removing -fno-builtin. The call site is now simpler: the mov $0x0,%eax that came before the call in the original build is gone.

    000000000040050c <main>:
    #include <stdio.h>
    int main(){
      40050c:  55                    push   %rbp
      40050d:  48 89 e5              mov    %rsp,%rbp
        printf("Hello Everybody\n");
      400510:  bf cc 05 40 00        mov    $0x4005cc,%edi
      400515:  e8 c6 fe ff ff        callq  4003e0
    }
      40051a:  5d                    pop    %rbp
      40051b:  c3                    retq
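One likely explanation for the missing mov $0x0,%eax: when builtins are enabled, gcc is known to replace a printf call whose format is a plain string ending in a newline, and whose return value is unused, with a call to puts. puts is not a variadic function, so %eax does not need to be set before the call. The listing does not show the name of the called symbol, so treat this as an assumption. The sketch below puts the two calls side by side, with hypothetical function names:

```c
#include <stdio.h>

/* What the source says: a printf call with no format specifiers. */
int greet_with_printf(void)
{
    return printf("Hello Everybody\n");
}

/* What the compiler may emit instead: puts appends the newline itself. */
int greet_with_puts(void)
{
    return puts("Hello Everybody");
}
```

Both functions write the same 16 characters to standard output. They differ only in their return values, which is why the substitution is only safe when the result of printf is ignored, as it is in hello.c.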
Fourth compile: gcc hello.c -O0 -fno-builtin -o step3
Removing -g. Comparing the objdump -s output of the two builds side by side, the .got.plt, .data and .comment sections are identical. The .debug_* sections, marked with > below, appear only in the build that was compiled with -g.

    Contents of section .got.plt:
     6008c8 e0066000 00000000 00000000 00000000  ..`.............
     6008d8 00000000 00000000 e6034000 00000000  ..........@.....
     6008e8 f6034000 00000000                    ..@.....
    Contents of section .data:
     6008f0 00000000 00000000 00000000 00000000  ................
    Contents of section .comment:
     0000 4743433a 20284465 6269616e 20342e37  GCC: (Debian 4.7
     0010 2e322d35 2920342e 372e3200 4743433a  .2-5) 4.7.2.GCC:
     0020 20284465 6269616e 20342e34 2e372d33   (Debian 4.4.7-3
     0030 2920342e 342e3700                    ) 4.4.7.
    > Contents of section .debug_aranges:
    >  0000 2c000000 02000000 00000800 00000000  ,...............
    >  0010 0c054000 00000000 15000000 00000000  ..@.............
    >  0020 00000000 00000000 00000000 00000000  ................
    > Contents of section .debug_info:
    >  0000 91000000 02000000 00000801 53000000  ............S...
    >  0010 01690000 00120000 000c0540 00000000  .i.........@....
    >  0020 00210540 00000000 00000000 00020807  .!.@............
    >  0030 00000000 02010871 00000002 02074000  .......q......@.
    >  0040 00000204 07050000 00020106 73000000  ............s...
    >  0050 0202055f 00000003 0405696e 74000208  ..._......int...
    >  0060 057f0000 00020807 88000000 0201067a  ...............z
    >  0070 00000004 01910000 00010357 0000000c  ...........W....
    >  0080 05400000 00000021 05400000 00000000  .@.....!.@......
    >  0090 00000001 00                          .....
    > Contents of section .debug_abbrev:
    >  0000 01110125 0e130b03 0e1b0e11 01120110  ...%............
    >  0010 06000002 24000b0b 3e0b030e 00000324  ....$...>......$
Fifth compile: gcc hello2.c -g -O0 -fno-builtin -o step4
Changing the C file to hello2.c, which passes ten additional arguments to printf.

    000000000040050c <main>:
    #include <stdio.h>
    int main(){
      40050c:  55                    push   %rbp
      40050d:  48 89 e5              mov    %rsp,%rbp
      400510:  48 83 ec 30           sub    $0x30,%rsp
        printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
      400514:  c7 44 24 20 0a 00 00  movl   $0xa,0x20(%rsp)
      40051b:  00
      40051c:  c7 44 24 18 09 00 00  movl   $0x9,0x18(%rsp)
      400523:  00
      400524:  c7 44 24 10 08 00 00  movl   $0x8,0x10(%rsp)
      40052b:  00
      40052c:  c7 44 24 08 07 00 00  movl   $0x7,0x8(%rsp)
      400533:  00
      400534:  c7 04 24 06 00 00 00  movl   $0x6,(%rsp)
      40053b:  41 b9 05 00 00 00     mov    $0x5,%r9d
      400541:  41 b8 04 00 00 00     mov    $0x4,%r8d
      400547:  b9 03 00 00 00        mov    $0x3,%ecx
      40054c:  ba 02 00 00 00        mov    $0x2,%edx
      400551:  be 01 00 00 00        mov    $0x1,%esi
      400556:  bf 20 06 40 00        mov    $0x400620,%edi
      40055b:  b8 00 00 00 00        mov    $0x0,%eax
      400560:  e8 7b fe ff ff        callq  4003e0 <printf@plt>
    }
      400565:  c9                    leaveq
      400566:  c3                    retq
      400567:  90                    nop
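On x86-64 Linux the first six integer arguments of a call are passed in registers, so the format string and the values 1 to 5 are loaded into %edi, %esi, %edx, %ecx, %r8d and %r9d. The values 6 to 10 do not fit in registers and are stored on the stack instead, for example:

    movl $0x7,0x8(%rsp)

This stores the value 7 in the second 8-byte slot above the stack pointer.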
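The register-or-stack choice can be sketched as a small lookup, assuming the System V AMD64 calling convention used on x86-64 Linux and counting only integer arguments. The helper names are hypothetical, and the index is zero-based, so index 0 is the format string.

```c
#include <stddef.h>

/* Integer argument registers, in System V AMD64 order. */
static const char *const INT_ARG_REGS[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: register name for the integer argument at
 * zero-based `index`, or NULL when that argument goes on the stack. */
const char *arg_register(size_t index)
{
    return index < 6 ? INT_ARG_REGS[index] : NULL;
}

/* Hypothetical helper: offset from %rsp (at the call) of a stack-passed
 * argument. Each one takes an 8-byte slot. Returns -1 for register args. */
long arg_stack_offset(size_t index)
{
    return index < 6 ? -1 : (long)(index - 6) * 8;
}
```

In the hello2.c call, the value 7 is argument index 7, so arg_stack_offset(7) gives 8, matching 0x8(%rsp). The value 10 is index 10 and lands at offset 0x20.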
Sixth compile: gcc hello3.c -g -O0 -fno-builtin -o step5
We are now using hello3.c, which moves the printf call into a separate function named output. The code is mostly the same, but as expected all the instructions related to printf move out of main into the new function, and main now only calls output.

    000000000040050c <output>:
    #include <stdio.h>
    void output(){
      40050c:  55                    push   %rbp
      40050d:  48 89 e5              mov    %rsp,%rbp
      400510:  48 83 ec 30           sub    $0x30,%rsp
        printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
      400514:  c7 44 24 20 0a 00 00  movl   $0xa,0x20(%rsp)
      40051b:  00
      40051c:  c7 44 24 18 09 00 00  movl   $0x9,0x18(%rsp)
      400523:  00
      400524:  c7 44 24 10 08 00 00  movl   $0x8,0x10(%rsp)
      40052b:  00
      40052c:  c7 44 24 08 07 00 00  movl   $0x7,0x8(%rsp)
      400533:  00
      400534:  c7 04 24 06 00 00 00  movl   $0x6,(%rsp)
      40053b:  41 b9 05 00 00 00     mov    $0x5,%r9d
      400541:  41 b8 04 00 00 00     mov    $0x4,%r8d
      400547:  b9 03 00 00 00        mov    $0x3,%ecx
      40054c:  ba 02 00 00 00        mov    $0x2,%edx
      400551:  be 01 00 00 00        mov    $0x1,%esi
      400556:  bf 30 06 40 00        mov    $0x400630,%edi
      40055b:  b8 00 00 00 00        mov    $0x0,%eax
      400560:  e8 7b fe ff ff        callq  4003e0 <printf@plt>
    }
      400565:  c9                    leaveq
      400566:  c3                    retq

    0000000000400567 <main>:
    int main(){
      400567:  55                    push   %rbp
      400568:  48 89 e5              mov    %rsp,%rbp
        output();
      40056b:  b8 00 00 00 00        mov    $0x0,%eax
      400570:  e8 97 ff ff ff        callq  40050c <output>
    }
      400575:  5d                    pop    %rbp
      400576:  c3                    retq
      400577:  90                    nop
Seventh compile: gcc hello3.c -g -O3 -fno-builtin -o step6
We are changing the optimization level from 0 to 3. This changes the code noticeably: output no longer sets up a frame pointer with push %rbp and mov %rsp,%rbp, the register loads and stack stores are interleaved instead of following the argument order, the stack is released with add instead of leaveq, and the function reserves 0x38 bytes of stack instead of 0x30.

    0000000000400520 <output>:
    #include <stdio.h>
    void output(){
      400520:  48 83 ec 38           sub    $0x38,%rsp
        printf("Hello Everybody , %d, %d, %d, %d, %d, %d, %d, %d, %d, %d \n",1,2,3,4,5,6,7,8,9,10);
      400524:  41 b9 05 00 00 00     mov    $0x5,%r9d
      40052a:  41 b8 04 00 00 00     mov    $0x4,%r8d
      400530:  c7 44 24 20 0a 00 00  movl   $0xa,0x20(%rsp)
      400537:  00
      400538:  c7 44 24 18 09 00 00  movl   $0x9,0x18(%rsp)
      40053f:  00
      400540:  b9 03 00 00 00        mov    $0x3,%ecx
      400545:  c7 44 24 10 08 00 00  movl   $0x8,0x10(%rsp)
      40054c:  00
      40054d:  c7 44 24 08 07 00 00  movl   $0x7,0x8(%rsp)
      400554:  00
      400555:  ba 02 00 00 00        mov    $0x2,%edx
      40055a:  c7 04 24 06 00 00 00  movl   $0x6,(%rsp)
      400561:  be 01 00 00 00        mov    $0x1,%esi
      400566:  bf 30 06 40 00        mov    $0x400630,%edi
      40056b:  31 c0                 xor    %eax,%eax
      40056d:  e8 6e fe ff ff        callq  4003e0 <printf@plt>
    }
      400572:  48 83 c4 38           add    $0x38,%rsp
      400576:  c3                    retq
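A plausible reason the -O3 build reserves 0x38 bytes where the -O0 build reserved 0x30 is stack alignment. The System V AMD64 ABI requires %rsp to be a multiple of 16 at each call instruction. On entry to a function the 8-byte return address has just been pushed, so the function has to move %rsp by a further 8 modulo 16. The sketch below checks both builds. The helper name is hypothetical, and the reasoning about gcc's choice is my assumption rather than something the listings prove.

```c
/* Hypothetical helper: is %rsp 16-byte aligned at the call instruction?
 * On entry the return address (8 bytes) is already on the stack, so
 * everything the function pushes or reserves is counted on top of that. */
int aligned_at_call(unsigned pushed_bytes, unsigned reserved_bytes)
{
    return (8u + pushed_bytes + reserved_bytes) % 16u == 0;
}
```

At -O0 the function pushes %rbp (8 bytes) and reserves 0x30, giving 8 + 8 + 48 = 64. At -O3 nothing is pushed and 0x38 is reserved, giving 8 + 56 = 64. Both are multiples of 16, whereas reserving only 0x30 without the push would not be.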
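Another difference is how %eax is set to zero before the call:

    xor %eax,%eax    ; -O3
    mov $0x0,%eax    ; -O0

Note: when you XOR any value with itself the result is 0, so both instructions leave 0 in %eax. The xor form is encoded in 2 bytes (31 c0) instead of 5 (b8 00 00 00 00).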
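The same identity can be checked in C, together with the size of the two encodings as they appear in the listings. The names below are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Encodings copied from the two listings. */
const uint8_t MOV_ZERO_EAX[] = { 0xb8, 0x00, 0x00, 0x00, 0x00 }; /* mov $0x0,%eax (-O0) */
const uint8_t XOR_EAX_EAX[]  = { 0x31, 0xc0 };                   /* xor %eax,%eax (-O3) */

/* XOR of a value with itself clears every bit, whatever the value was. */
uint32_t xor_with_self(uint32_t value)
{
    return value ^ value;
}
```

xor_with_self returns 0 for any input, and sizeof shows the xor encoding is 3 bytes shorter than the mov.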
Conclusions
gcc has a huge number of compiler options, and this post only scratches the surface of them. Even so, it gives some insight into how much assembly code is generated for even the most basic C programs, and how quickly that would grow for larger C code. It also offers a glimpse behind the scenes of C.
Thanks for reading.