Tuesday 7 February 2017

Lab 4: Experimenting with Compiled C Optimizations

Today we are going to dive into how the compiler works by trying out some of the available options and looking at the results. The lab asks us to write a simple C program, compile it, and inspect the resulting ELF (Executable and Linkable Format) file.

We start with the most basic program there is and create a hello.c file:


#include <stdio.h>

int main() {
    printf("Hello World!\n");
}

We'll use the following compiler options as well:


-g               # enable debugging information
-O0              # do not optimize (that's a capital letter and then the digit zero)
-fno-builtin     # do not use builtin function optimizations

The resulting command line is:

gcc -g -O0 -fno-builtin -o hello hello.c

The -o hello option names the output file 'hello' (an executable rather than just an object file, since we compile and link in one step). Once that is built, we run objdump on the file and look into its contents. There are a few options we can include in the objdump command:


-f          # display header information for the entire file
-s          # display per-section summary information
-d          # disassemble sections containing code
--source    # (implies -d) show source code, if available, along with disassembly


We'll skip the -d since --source already implies it. Running objdump with the other options produces a very large amount of output. The lab asks us to answer the following questions:

Using objdump, find the answers to these questions: (i) Which section contains the code you wrote? (ii) Which section contains the string to be printed?

There is a lot of seemingly scrambled data in the output, but these parts stood out to me:

Contents of section .rodata:
 400590 01000200 00000000 00000000 00000000  ................
 4005a0 48656c6c 6f20576f 726c6421 0a00      Hello World!..


00000000004004f6 <main>:
#include <stdio.h>

int main() {
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
            printf("Hello World!\n");
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       b8 00 00 00 00          mov    $0x0,%eax
  400504:       e8 e7 fe ff ff          callq  4003f0 <printf@plt>
  400509:       b8 00 00 00 00          mov    $0x0,%eax
}
  40050e:       5d                      pop    %rbp
  40050f:       c3                      retq

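So, section .rodata (read-only data) contains the string to be printed: it sits at address 0x4005a0, which is exactly the value main loads into %edi before the call. The code that I wrote is the <main> function, which objdump lists in its disassembly of the .text section.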

The next task is to learn to understand what the compiled code is doing. We are going to make some changes to the code and take a look at the results and some of their differences. We will probably focus mainly on how big the objdump of each variation is, the ultimate file size, and how the <main> looks in different cases.

FIRST VARIATION: Add the compiler option -static
Note and explain the change in size, section headers, and the function call.

gcc -g -O0 -fno-builtin -o hello_original hello.c
objdump -f -s --source hello_original > hello_original_display       (redirected the output into a file of its own)

gcc -g -O0 -fno-builtin -static -o hello_static hello.c
objdump -f -s --source hello_static > hello_static_display

OKAY... when I tried to display the hello_static_display file, it was a seemingly endless stream of data. That didn't look right, so I wanted to see if I had done something wrong in the compilation. I deleted the hello_static files and started over again.

I adjusted the command to test something:

gcc -static -o hello_static hello.c           (removed all compiler options EXCEPT for -static)
The result was still an endless supply of data...

So now I know that my command isn't wrong per se, but -static does something very big. As I look back at the output still streaming by, I can start to see assembly instructions like mov, nopw, jle, etc.

Let's check the file data:

[wawilliams@xerxes lab4]$ ls -l
total 12776
-rw-rw-r--. 1 wawilliams wawilliams       66 Feb  7 00:42 hello.c
-rwxrwxr-x. 1 wawilliams wawilliams    10984 Feb  7 01:08 hello_original
-rw-rw-r--. 1 wawilliams wawilliams    23762 Feb  7 01:10 hello_original_display
-rwxrwxr-x. 1 wawilliams wawilliams   917680 Feb  7 01:39 hello_static
-rw-rw-r--. 1 wawilliams wawilliams 12119908 Feb  7 01:39 hello_static_display

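Wow!! That new display file is HUGE, about 510x bigger than the original one (12119908 vs. 23762 bytes), and the executable itself is about 84x bigger (917680 vs. 10984 bytes). Okay, let's check what -static actually does. The closest-sounding thing I found at first was:

 -fkeep-static-functions

           Emit "static" functions into the object file, even if the function is never used.

But that option is about functions declared with the C keyword static, and it has nothing to do with the size change. The -static option is a link option: on systems that support dynamic linking, it prevents linking with the shared libraries. Instead of leaving printf to be found in the shared C library at run time, the linker copies printf and the library code it depends on into the executable itself. That certainly explains it!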

The beginning of hello_static_display is mainly hexadecimal numbers with seemingly random symbols trailing each line (screenshot not available). Then comes the disassembled code (screenshot not available).

By the way, displaying the file took 5:50 to complete! There's no way I can find the <main> function by scrolling, so let's try something like:  grep -A20 -P '<main>' hello_static_display to view where <main> is and the next 20 lines (screenshot not available).

As we can see, a huge amount of library code has been added to the executable, and for our purposes it is all very lengthy and rather useless information. MOVING ON!

SECOND VARIATION: Remove the compiler option -fno-builtin
Note and explain the change in the function call.

I made similar display files, so let's compare the original (first listing) to the one built without -fno-builtin (second listing):

00000000004004f6 <main>:
#include <stdio.h>

int main() {
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
            printf("Hello World!\n");
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       b8 00 00 00 00          mov    $0x0,%eax
  400504:       e8 e7 fe ff ff          callq  4003f0 <printf@plt>   -> calls printf; the second listing calls puts instead
  400509:       b8 00 00 00 00          mov    $0x0,%eax              -> second mov $0x0,%eax (the first one is just before the call)
}
  40050e:       5d                      pop    %rbp
  40050f:       c3                      retq

00000000004004f6 <main>:
#include <stdio.h>

int main() {
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
            printf("Hello World!\n");
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       e8 ec fe ff ff          callq  4003f0 <puts@plt>
  400504:       b8 00 00 00 00          mov    $0x0,%eax
}
  400509:       5d                      pop    %rbp
  40050a:       c3                      retq   
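  40050b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)  -> padding after retq, never executed

Without -fno-builtin, gcc recognizes that this printf call only prints a constant string that ends in a newline and has no format specifiers, so it replaces it with the simpler puts call. Since puts is not a variadic function, the mov $0x0,%eax that sits before the printf call (it tells a variadic function how many vector registers carry arguments) is no longer needed, which makes main one instruction shorter. The nopl at the end is only padding after retq and never runs. In larger programs, builtin optimizations like this will most likely make more of a difference.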

THIRD VARIATION: Remove the compiler option -g. Note and explain the change in size, section headers, and disassembly output.

First thing, let's check the size difference:

[wawilliams@xerxes lab4]$ ls -l
total 68
-rw-rw-r--. 1 wawilliams wawilliams    66 Feb  7 00:42 hello.c
-rwxrwxr-x. 1 wawilliams wawilliams  8528 Feb  7 02:58 hello_g
-rw-rw-r--. 1 wawilliams wawilliams 16080 Feb  7 02:58 hello_g_display
-rwxrwxr-x. 1 wawilliams wawilliams 10984 Feb  7 01:08 hello_original
-rw-rw-r--. 1 wawilliams wawilliams 23762 Feb  7 01:10 hello_original_display

As we can see, removing -g creates smaller files: the executable shrinks from 10984 to 8528 bytes.
And for the disassembly of <main>:

00000000004004f6 <main>:
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       b8 00 00 00 00          mov    $0x0,%eax
  400504:       e8 e7 fe ff ff          callq  4003f0 <printf@plt>
  400509:       b8 00 00 00 00          mov    $0x0,%eax
  40050e:       5d                      pop    %rbp
  40050f:       c3                      retq   

It's missing the interleaved source code, like

#include <stdio.h>

int main() {
printf("Hello World!\n");
}

which is part of why the display file is smaller. The disassembled instructions themselves show no apparent differences: I did a 'diff hello_g_display hello_original_display' and nothing concerning the instructions appeared. The executable compiled without -g is also smaller, because the debugging information that lets objdump match instructions to source lines is no longer stored in it. What is definitely missing is a whole bunch of section data; this is the part of the diff around section .comment:

Contents of section .comment:
 0000 4743433a 2028474e 55292036 2e322e31  GCC: (GNU) 6.2.1
 0010 20323031 36303931 36202852 65642048   20160916 (Red H
 0020 61742036 2e322e31 2d322900 4743433a  at 6.2.1-2).GCC:
 0030 2028474e 55292036 2e332e31 20323031   (GNU) 6.3.1 201
 0040 36313232 31202852 65642048 61742036  61221 (Red Hat 6
 0050 2e332e31 2d312900                    .3.1-1).      

FOURTH VARIATION: Add additional arguments to the printf() function in your program. Note which register each argument is placed in. (Tip: Use sequential integer arguments after the first string argument. Go up to 10 arguments and note the pattern).

Here I'm going to add several extra arguments in the form of numbers and see what happens. Because I did this part with my group, I'll just skip to the end parts and show the differences:

00000000004004f6 <main>:
#include <stdio.h>

int main() {
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
            printf("Hello World!\n, %d%d%d%d%d%d%d%d%d,%d", 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  4004fa:       48 83 ec 08             sub    $0x8,%rsp
  4004fe:       6a 0a                   pushq  $0xa
  400500:       6a 09                   pushq  $0x9
  400502:       6a 08                   pushq  $0x8
  400504:       6a 07                   pushq  $0x7
  400506:       6a 06                   pushq  $0x6
  400508:       41 b9 05 00 00 00       mov    $0x5,%r9d
  40050e:       41 b8 04 00 00 00       mov    $0x4,%r8d
  400514:       b9 03 00 00 00          mov    $0x3,%ecx
  400519:       ba 02 00 00 00          mov    $0x2,%edx
  40051e:       be 01 00 00 00          mov    $0x1,%esi
  400523:       bf d0 05 40 00          mov    $0x4005d0,%edi
  400528:       b8 00 00 00 00          mov    $0x0,%eax
  40052d:       e8 be fe ff ff          callq  4003f0 <printf@plt>
  400532:       48 83 c4 30             add    $0x30,%rsp
  400536:       b8 00 00 00 00          mov    $0x0,%eax
}
  40053b:       c9                      leaveq
  40053c:       c3                      retq
  40053d:       0f 1f 00                nopl   (%rax)


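As we worked out, the sub $0x8,%rsp at the top adjusts the stack pointer before the arguments are set up. The first six arguments (the format string plus the integers 1 to 5) are passed in registers: %edi, %esi, %edx, %ecx, %r8d and %r9d, in that order. Everything else (6 to 10) is pushed onto the stack, last argument first. The add $0x30,%rsp after the call releases those 48 bytes again: the 8 bytes from the sub plus five 8-byte pushes.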

FIFTH VARIATION: Move the printf() call to a separate function named output(), and call that function from main(). Explain the changes in the object code.

The modified source code looks like:

#include <stdio.h>

void output() {
  printf("Hello World!\n");
}

int main() {
      output();
}

As you can see, output() is defined outside of the main() block of code. Now look at the objdump version:

000000000040050c <main>:

int main() {
  40050c:       55                      push   %rbp
  40050d:       48 89 e5                mov    %rsp,%rbp
            output();
  400510:       b8 00 00 00 00          mov    $0x0,%eax
  400515:       e8 dc ff ff ff          callq  4004f6 <output>
  40051a:       b8 00 00 00 00          mov    $0x0,%eax
}
  40051f:       5d                      pop    %rbp
  400520:       c3                      retq
  400521:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  400528:       00 00 00
  40052b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)


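You will find that main() no longer contains the printf call at all!! Instead it calls our new function with callq 4004f6 <output>, where the label is simply the name of the function. output() is compiled as its own block of code at 0x4004f6, just before main at 0x40050c, and the printf call now lives there. Also note that this call does not go through an @plt entry the way printf does, since output() is part of our own program and not a shared library.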

SIXTH VARIATION: Remove -O0 and add -O3 to the gcc options. Note and explain the difference in the compiled code.

First we'll look at Optimization Level 0 and then Optimization Level 3:


[wawilliams@xerxes lab4]$ ls -l
total 112
-rw-rw-r--. 1 wawilliams wawilliams    66 Feb  7 00:42 hello.c
-rwxrwxr-x. 1 wawilliams wawilliams 10984 Feb  7 04:41 hello_opt0
-rw-rw-r--. 1 wawilliams wawilliams 23758 Feb  7 04:41 hello_opt0_display
-rwxrwxr-x. 1 wawilliams wawilliams 11208 Feb  7 04:41 hello_opt3
-rw-rw-r--. 1 wawilliams wawilliams 24327 Feb  7 04:42 hello_opt3_display
-rwxrwxr-x. 1 wawilliams wawilliams 10984 Feb  7 01:08 hello_original
-rw-rw-r--. 1 wawilliams wawilliams 23762 Feb  7 01:10 hello_original_display

So optimization level 3 makes the executable a bit bigger here (11208 vs. 10984 bytes), even though the optimization level is higher. Looking at the gcc manual explanation:

-O1 Optimize.  Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

-O2 Optimize even more.  GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff.  As compared to -O, this option increases both compilation time and the performance of the generated code.

-O3 Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on [extra flags]



So, this concludes the research on optimization for our lab!! Hopefully, this has been informative regarding various optimization techniques and how useful they are.