
ARM64 — Assembling with a listing


To see more of what the assembler actually does, we can ask for a listing with the -a option:

user@rpi4:~/work $ as -a hello1.s -o hello1.o

The result is this listing — and some more we will not look at:

 1                          .text
 2                          .global _start
 3                  _start:
 4                  /* Print a message on stdout, x0=1 */
 5 0000 200080D2            mov x0,1
 6 0004 E1000010            adr x1,message
 7 0008 820180D2            mov x2,12
 8 000c 080880D2            mov x8,0x40
 9 0010 010000D4            svc 0
10                  /* And exit the program */
11 0014 000080D2            mov x0,0
12 0018 A80B80D2            mov x8, 93
13 001c 010000D4            svc 0
14                  
15 0020 48656C6C    message: .ascii "Hello World\n"
15      6F20576F 
15      726C640A
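
The data bytes at the end of the listing can be checked with a few lines of Python. This is only a sketch that uses Python as a calculator; it is not part of the toolchain used on this page. It joins the three hex groups from listing line 15 and decodes them as ASCII, which should give back the string and the length 12 that is loaded into x2.

```python
# Hex groups copied from line 15 of the listing, in memory order.
groups = ["48656C6C", "6F20576F", "726C640A"]

data = bytes.fromhex("".join(groups))
text = data.decode("ascii")

print(repr(text))  # the string stored at offset 0x20
print(len(data))   # the byte count passed to the write call in x2
```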

The mov-instruction

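Look at the first instruction: mov x0,1. It assembled to four bytes, which the listing shows in memory order as 200080D2. AArch64 instructions are stored little-endian, so the 32-bit instruction word itself is 0xD2800020. We will use an online service to disassemble these bytes, https://shell-storm.org/online/Online-Assembler-and-Disassembler/ , and we get: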

20 00 80 D2    movz x0, #0x1

We can see register x0, and the value 1, written in hex with a #-sign. But it is no longer a mov, but a movz.

In the GNU assembler for AArch64 the # is optional when writing an immediate value.

But the movz is more interesting, so let us read about it on https://stackoverflow.com/questions/53268118/ (I should find a better reference).

It turns out that the immediate value you can encode in a move instruction is only 16 bits wide, but the register holds 64 bits. movz to the rescue: it moves the 16-bit value into the least significant 16 bits of the register and zeros the rest. So we get a correct 64-bit value for the number 1.
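
To see where the 16-bit value sits inside the instruction, here is a minimal Python sketch that picks a 64-bit movz apart. The function name decode_movz is made up for this page. The field positions (register in bits 0-4, immediate in bits 5-20, shift selector in bits 21-22) are the standard A64 layout for movz.

```python
def decode_movz(listing_hex):
    """Decode a 64-bit MOVZ from the 8 hex digits shown in the listing.

    Returns (register number, 16-bit immediate, shift in bits).
    """
    # The listing shows bytes in memory order; instructions are little-endian.
    word = int.from_bytes(bytes.fromhex(listing_hex), "little")
    # Bits 31..23 of a 64-bit MOVZ are always 1 10 100101.
    if (word >> 23) != 0b110100101:
        raise ValueError("not a 64-bit MOVZ")
    hw = (word >> 21) & 0x3       # selects the 16-bit field: shift = hw * 16
    imm16 = (word >> 5) & 0xFFFF  # the 16-bit immediate
    rd = word & 0x1F              # destination register number
    return rd, imm16, hw * 16


print(decode_movz("200080D2"))  # mov x0,1
print(decode_movz("080880D2"))  # mov x8,0x40
```

The same function can be used on the other mov lines of the listing, for example A80B80D2 for mov x8, 93.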

We need not think more about this as long as we just need small positive numbers, but there are two interesting things:

We can address all four 16-bit fields of the register, not just the least significant:

movz x0,1
movz x0,2,lsl #16
movz x0,3,lsl #32
movz x0,4,lsl #48

Here lsl stands for logical shift left. All 4 variants above will zero out all the other 48 bits, so in this example only the last instruction wins. We get (in hex) 0004000000000000.

There is a similar instruction that keeps the values of the other 48 bits, called movk, k for keep. We could do this:

movz x0,1
movk x0,2,lsl #16
movk x0,3,lsl #32
movk x0,4,lsl #48

and here we get (in hex): 0004000300020001.
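
Both results can be checked with a small Python model of the two instructions. This is a sketch of the register arithmetic only; movz and movk here are ordinary Python functions written for this page, not anything provided by the assembler.

```python
MASK64 = (1 << 64) - 1  # keep results within a 64-bit register


def movz(imm16, shift=0):
    """MOVZ: place imm16 at the given shift and zero every other bit."""
    return (imm16 << shift) & MASK64


def movk(reg, imm16, shift=0):
    """MOVK: replace one 16-bit field of reg and keep the other 48 bits."""
    cleared = reg & ~(0xFFFF << shift) & MASK64
    return cleared | (imm16 << shift)


# Four movz in a row: each one overwrites the whole register.
only_movz = movz(1)
only_movz = movz(2, 16)
only_movz = movz(3, 32)
only_movz = movz(4, 48)
print(f"{only_movz:016x}")

# One movz followed by three movk: the fields accumulate.
with_movk = movz(1)
with_movk = movk(with_movk, 2, 16)
with_movk = movk(with_movk, 3, 32)
with_movk = movk(with_movk, 4, 48)
print(f"{with_movk:016x}")
```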

We will later see more examples of this.

The adr-instruction

We need the address for the string, and since addresses are also 64-bit, we have a problem similar to mov.

The value (in hex) E1000010 is disassembled to adr x1, #0x1c, and it means that the address we need is computed from 0x1c added to the address of the current instruction — the value of the program counter.

Using our hex calculator we get: 0x04 + 0x1c = 0x20. Here 0x04 is the offset of the adr instruction itself, which is where the program counter points when it executes, and 0x20 is the offset of the string in the listing.
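
The offset can also be pulled out of the instruction bytes by hand. The following Python sketch does that; decode_adr is a name made up for this page. It relies on the A64 layout of adr, where the 21-bit signed offset is split into a low 2-bit part (bits 29-30) and a high 19-bit part (bits 5-23).

```python
def decode_adr(listing_hex):
    """Decode an ADR from the 8 hex digits shown in the listing.

    Returns (register number, signed byte offset from the instruction).
    """
    # The listing shows bytes in memory order; instructions are little-endian.
    word = int.from_bytes(bytes.fromhex(listing_hex), "little")
    # ADR has bit 31 = 0 and bits 28..24 = 10000.
    if (word >> 31) != 0 or ((word >> 24) & 0x1F) != 0b10000:
        raise ValueError("not an ADR instruction")
    immlo = (word >> 29) & 0x3      # low 2 bits of the offset
    immhi = (word >> 5) & 0x7FFFF   # high 19 bits of the offset
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):             # sign-extend the 21-bit value
        imm -= 1 << 21
    rd = word & 0x1F
    return rd, imm


rd, offset = decode_adr("E1000010")
adr_address = 0x04  # offset of the adr instruction in the listing
print(f"x{rd}", hex(offset), hex(adr_address + offset))
```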

We can only get addresses fairly close to where we are (adr reaches about 1 MB in either direction), but we will soon see what is done with addresses far away.

When the program is actually executed it is placed somewhere in memory, and the actual addresses can be anything.

Now change something

Move the line with the string and label message: up just before the _start:-label. Your program should still work, but the adr-instruction is now (in hex): 81FFFF10, which encodes a negative offset from the adr instruction, because the string now sits before it.
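
To check the new bytes we can go the other way and build the adr encoding from an offset. The sketch below is Python, with encode_adr as a made-up helper name. It assumes that after the move the 12-byte string occupies offsets 0x00 to 0x0b, so mov x0,1 lands at 0x0c and the adr at 0x10; a listing of your changed file will show whether that is the case.

```python
def encode_adr(rd, offset):
    """Return the listing-style hex bytes for an ADR with the given offset."""
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("offset outside the ADR range of about 1 MB")
    imm = offset & 0x1FFFFF  # 21-bit two's complement
    immlo = imm & 0x3        # low 2 bits go to bits 29..30
    immhi = imm >> 2         # high 19 bits go to bits 5..23
    word = (immlo << 29) | (0b10000 << 24) | (immhi << 5) | rd
    # Emit the bytes in memory order, as the listing does.
    return word.to_bytes(4, "little").hex().upper()


print(encode_adr(1, 0x20 - 0x04))  # original layout: string after the code
print(encode_adr(1, 0x00 - 0x10))  # assumed layout after moving the string
```

The second offset is 0x00 - 0x10 = -0x10, and the negative value is what fills the middle bytes with FF.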


2025-06-16