Thursday 2 March 2017

Lab 3: Working with X86_64 Registers and Aarch64 Registers

For this lab, we start looking at how the code we write appears once it has been translated into machine language.

Take your typical 'Hello World' basic C program [ hello.c ]:

/* Hello World in traditional C using printf() */

#include <stdio.h>

int main() {
        printf("Hello World!\n");
}

When I compile the code into an executable file, I can then view its assembler code by using these commands:

gcc -o hello hello.c          -> compile the hello.c file into the hello executable file
./hello                               -> execute the file (run it on the command line)
objdump -d hello | less     -> disassemble the hello executable and page through the output


It produces a lot of startup and support code in addition to ours, but the section that produces the result we asked for in our program looks like this:

00000000004004f6 <main>:
  4004f6:       55                             push   %rbp
  4004f7:       48 89 e5                   mov    %rsp,%rbp
  4004fa:       bf a0 05 40 00          mov    $0x4005a0,%edi
  4004ff:       e8 ec fe ff ff              callq  4003f0 <puts@plt>
  400504:       b8 00 00 00 00        mov    $0x0,%eax
  400509:       5d                            pop    %rbp
  40050a:       c3                            retq   
  40050b:       0f 1f 44 00 00         nopl   0x0(%rax,%rax,1)

This is basically how you would disassemble a compiled C program.
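
One detail worth noticing in the disassembly above: the source calls printf(), but the compiled code calls puts. GCC commonly makes this substitution when the format string has no conversions, ends in a newline, and the return value is not used. Here is a minimal sketch of the two forms; the function names hello_printf and hello_puts are made up for illustration and are not part of the lab.

```c
#include <stdio.h>

/* What the source says: printf() with a plain string ending in '\n'.
   Returns the number of characters printed (13 for this string). */
int hello_printf(void) {
    return printf("Hello World!\n");
}

/* What the disassembly suggests the compiler emitted instead:
   puts() appends the newline itself.
   Returns a non-negative value on success. */
int hello_puts(void) {
    return puts("Hello World!");
}
```

Both functions print the same line. Because hello_printf uses the return value, the compiler should keep a real printf call there rather than swapping in puts.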

For our lab, we are going to disassemble similar programs written in x86_64 and AArch64 assembler. For example, here is the hello-gas.s file (the source code, not an executable file yet):

/* 
   This is a 'hello world' program in x86_64 assembler using the 
   GNU assembler (gas) syntax. Note that this program runs in 64-bit
   mode.

   CTyler, Seneca College, 2014-01-20
   Licensed under GNU GPL v2+
*/

.text
.globl  _start

_start:
        movq    $len,%rdx                       /* message length */
        movq    $msg,%rsi                       /* message location */
        movq    $1,%rdi                         /* file descriptor stdout */
        movq    $1,%rax                         /* syscall sys_write */
        syscall

        movq    $0,%rdi                         /* exit status */
        movq    $60,%rax                        /* syscall sys_exit */
        syscall

.section   .rodata

msg:    .ascii      "Hello, world!\n"
            len = . - msg
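
To relate the registers above to something more familiar, here is a rough C sketch of the same two system calls, assuming Linux and glibc's syscall() wrapper. The function names hello_write and hello_exit are hypothetical. The arguments appear in the same order as %rdi, %rsi and %rdx, with the syscall number taking the place of %rax.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

static const char msg[] = "Hello, world!\n";

/* write(1, msg, len): the syscall number plays the role of %rax,
   then 1 -> %rdi, msg -> %rsi, len -> %rdx.
   Returns the number of bytes written, or -1 on error. */
long hello_write(void) {
    /* sizeof msg - 1 drops the terminating '\0', like len = . - msg */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}

/* exit(0): status 0 -> %rdi. This call does not return. */
void hello_exit(void) {
    syscall(SYS_exit, 0);
}
```

With this message, hello_write() should return 14, the same value that len = . - msg works out to.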

We need to build this into an executable file by using the following commands, which I took from the Makefile that describes how to build the examples:

as -o hello-gas.o hello-gas.s              -> assemble the source into an intermediate object file
ld -o hello-gas hello-gas.o                 -> link the object file into an executable file

Then we can jump into what the assembler code looks like in the executable file:

objdump -d hello-gas

Which will give the result:

hello-gas:     file format elf64-x86-64

Disassembly of section .text:

0000000000400078 <_start>:
  400078:       48 c7 c2 0e 00 00 00      mov    $0xe,%rdx
  40007f:       48 c7 c6 a6 00 40 00      mov    $0x4000a6,%rsi
  400086:       48 c7 c7 01 00 00 00     mov    $0x1,%rdi
  40008d:       48 c7 c0 01 00 00 00     mov    $0x1,%rax
  400094:       0f 05                              syscall 
  400096:       48 c7 c7 00 00 00 00     mov    $0x0,%rdi
  40009d:       48 c7 c0 3c 00 00 00     mov    $0x3c,%rax
  4000a4:       0f 05                              syscall 

I repeated the same steps to build all the x86_64 files and view their disassembly, then I moved on to the AArch64 files. One thing worth noting before moving on: the disassembly of the x86_64 assembler program contained only the instructions needed to perform the task the program was designed for. The C program that I disassembled earlier had a LOT more assembler code that had nothing to do with the actual function of the program (which is simply to print a hello message). As far as I can tell, that extra code is the C runtime startup code and the stubs for calling library functions, which the compiler and linker add around main().

For AArch64, the hello.s file looks like:

.text
.globl _start
_start:
        mov    x0, 1              /* file descriptor: 1 is stdout */
        adr      x1, msg         /* message location (memory address) */
        mov    x2, len           /* message length (bytes) */

        mov     x8, 64          /* write is syscall #64 */
        svc       0                  /* invoke syscall */
        mov      x0, 0           /* status -> 0 */
        mov     x8, 93          /* exit is syscall #93 */
        svc       0                  /* invoke syscall */
.data
msg:    .ascii      "Hello, world!\n"
len=    . - msg

I followed a similar process, starting with:      as -o hello.o hello.s

However, right away I got this response:

hello.s: Assembler messages:
hello.s:5: Error: too many memory references for `mov'
hello.s:6: Error: no such instruction: `adr x1,msg'
hello.s:7: Error: too many memory references for `mov'
hello.s:9: Error: too many memory references for `mov'
hello.s:10: Error: no such instruction: `svc 0'
hello.s:12: Error: too many memory references for `mov'
hello.s:13: Error: too many memory references for `mov'
hello.s:14: Error: no such instruction: `svc 0'

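I wasn't sure at first why I got that. Looking at the messages, "no such instruction" for adr and svc and "too many memory references for `mov'" are what the x86_64 assembler reports when it is given AArch64 code, so most likely I ran as on the x86_64 machine. This file needs to be assembled on an AArch64 system (or with an AArch64 cross-assembler). For now I will continue with the rest of the lab and come back to this. The next part of the lab is a loop with some variations. The original code looks like: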

.text
.globl    _start

start = 0                       /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 10                        /* loop exits when the index hits this number (loop condition is i<max) */

_start:
    mov     $start,%r15         /* loop index */

loop:
    /* ... body of the loop ... do something useful here ... */

    inc     %r15                /* increment index */
    cmp     $max,%r15           /* see if we're done */
    jne     loop                /* loop if we're not */

    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall sys_exit */
    syscall

By itself, the code prints nothing, but I need to adjust it so that it prints:

Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop
Loop

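So I'm going to look for where I can make a change that produces that result.
Looking at our previous examples, I will need a message definition like: msg:    .ascii      "Hello, world!\n"
So I'll change it to: msg:    .ascii      "Loop\n"
The loop body will then also need a sys_write syscall, like the one in hello-gas.s, to actually print the message on each pass.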
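
Before writing the assembler version, it can help to sketch the intended control flow in C. This is only an outline under assumptions, not the finished lab: print_loop is a hypothetical name, and write() stands in for the sys_write syscall that the loop body would need. In the assembler version, %rax, %rdi, %rsi and %rdx would have to be set up on every pass, since syscall overwrites %rax with its return value (and also clobbers %rcx and %r11). The index in %r15 is preserved across the syscall.

```c
#include <unistd.h>

enum { START = 0, MAX = 10 };          /* same values as start and max */

static const char msg[] = "Loop\n";

/* Prints "Loop" once per pass and returns the number of passes,
   or -1 if a write fails. */
int print_loop(void) {
    int i = START;                     /* mov $start,%r15 */
    int count = 0;
    do {
        /* loop body: write(1, msg, len), as in hello-gas.s */
        if (write(1, msg, sizeof msg - 1) < 0)
            return -1;
        count++;
        i++;                           /* inc %r15 */
    } while (i != MAX);                /* cmp $max,%r15 ; jne loop */
    return count;
}
```

The do/while shape mirrors the assembler loop, which tests the condition at the bottom, so the body always runs at least once.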

[ NEED TO FINISH THE LAB ]
