Wednesday, January 3, 2024

Figuring out the Issues

So we find ourselves in a situation where the program runs in the emulator sometimes and not at other times. Why not?

Annoyingly, the fact that it works when tracing is enabled, and not when it isn't, means that we can't use tracing to find the problem.

But fortunately, we have determined that we can use the debugger. So let's try that again.

Continuing with our theme of optimization, let's add a new script, debug.sh, which contains the relevant commands to debug in the emulator. Recall that this requires the emulator to be started with "-s -S", and we can pass that to make through the GDB variable as follows:
make "GDB=-s -S" run
We then need (again in a separate tab or window) to run the debugger, which we will put in a script gdb.sh:
gdb-multiarch -iex "file asm/kernel8.elf" -iex "target remote :1234"
where -iex is an option which specifies that the argument should be run as a command once gdb has started.

Sadly, when we try this, we are reminded that we switched to a release build to avoid some compilation/linking errors:
Reading symbols from asm/kernel8.elf...
(No debugging symbols found in asm/kernel8.elf)
Remote debugging using :1234
0x0000000000000000 in ?? ()
(gdb) 
So, we need to go back to trying to make a debug build work. A few judicious edits to our scripts, and debug.sh can now build and link a debug version of our executable. When we do this, it shows the following errors:
aarch64-linux-gnu-ld -nostdlib boot.o ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib -T linker.ld -o kernel8.elf
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib(homer_rust-c9f0f32593953886.3qdmjppfu6iwhc43.rcgu.o): in function `homer_rust::lfb_init':
/home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:133: undefined reference to `memset'
aarch64-linux-gnu-ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:136: undefined reference to `core::panicking::panic'
aarch64-linux-gnu-ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:195: undefined reference to `core::panicking::panic'
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib(homer_rust-c9f0f32593953886.3qdmjppfu6iwhc43.rcgu.o): in function `homer_rust::show_homer':
/home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:213: undefined reference to `core::panicking::panic'
aarch64-linux-gnu-ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:214: undefined reference to `core::panicking::panic'
aarch64-linux-gnu-ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:222: undefined reference to `core::panicking::panic'
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib(homer_rust-c9f0f32593953886.3qdmjppfu6iwhc43.rcgu.o):/home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:222: more undefined references to `core::panicking::panic' follow
Boiled down, this comes to two things:
  • memset, which is defined in the C standard library, is undefined.
  • core::panicking::panic is undefined.
The latter is actually quite easy to fix given that we have already fixed the array bounds panic: we simply find the relevant (mangled) symbol and define it in boot.S.
.globl _ZN4core9panicking5panic17h8f06a2df29fa4962E
_ZN4core9panicking5panic17h8f06a2df29fa4962E:
    b halt
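memset is slightly trickier, since we actually need to implement it. But it's not that difficult an implementation, provided we do the right magic to make it look like a C function rather than a Rust function. That includes matching C's signature: the fill value arrives as an int and the destination pointer is returned.
#[no_mangle]
pub extern "C" fn memset(buf: *mut u8, val: i32, cnt: usize) -> *mut u8 {
    let mut i = 0;
    while i < cnt {
        unsafe {
            // As in C, only the low byte of val is stored.
            *buf.add(i) = val as u8;
        }
        i += 1;
    }
    // Callers of the C function expect the original pointer back.
    buf
}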
Now we can use gdb to set a breakpoint. Let's put one in at lfb_init and another at mbox_send:
Reading symbols from asm/kernel8.elf...
Remote debugging using :1234
0x0000000000000000 in ?? ()
(gdb) b lfb_init 
Breakpoint 1 at 0x80360: file src/lib.rs, line 145.
(gdb) b mbox_send 
Breakpoint 2 at 0x80e48: file src/lib.rs, line 249.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, homer_rust::lfb_init (fb=0x7ffd8) at src/lib.rs:145
145            let mut buf: [u32;36] = [0; 36];
(gdb) n
148            buf[0] = 35 * 4; // the buffer has 35 4-byte words
(gdb) n
149            buf[1] = 0; // we indicate we are sending a MBOX_REQUEST as 0
(gdb) n
154            buf[2] = 0x48003;
(gdb) n
155            buf[3] = 8; // the number of bytes in the request value
(gdb) p/x buf[2]
$1 = 0x48003
So far, so good. We are going through the code and we seem to be correctly setting up the buffer. Let's carry on and see how mbox_send gets on.
(gdb) c
Continuing.
Thread 1 hit Breakpoint 2, homer_rust::mbox_send (ch=8, buf=0x7fedc) at src/lib.rs:249
249            while mmio_read(MBOX_STATUS) & MBOX_BUSY != 0 {
(gdb) n
253            let volbuf = buf as *const u32;
(gdb) n
256            let ptr:u32 = volbuf as u32;
(gdb) n
259            let addr = (ptr & !0x0F) | ((ch as u32) & 0x0f);
(gdb) p/x ptr
$2 = 0x7fedc
Wait a minute! This code is predicated on ptr being aligned to a 16-byte boundary, but it clearly isn't. Let's just go one line further and check what happens:
(gdb) n
262            mmio_write(MBOX_WRITE, addr);
(gdb) p/x addr
$3 = 0x7fed8
(gdb) 
Yeah, that's not going to end well. The low four bits of that value carry the channel, so the buffer address the mailbox is given is really 0x7fed0: it is going to read and write a buffer that starts 12 bytes before the one we set up. What exactly happens will obviously depend on what it sees, but it's not going to be what we wanted (I guess we already knew that).
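To make the arithmetic concrete, here is a small sketch that can be run as an ordinary desktop program rather than in the kernel. The helper mailbox_word is a name made up for illustration; it just repeats the expression from line 259 of lib.rs, fed with the values gdb printed above.

```rust
/// Hypothetical helper repeating the expression in mbox_send:
/// the low four bits are cleared from the pointer and replaced by the channel.
fn mailbox_word(ptr: u32, ch: u8) -> u32 {
    (ptr & !0x0F) | ((ch as u32) & 0x0F)
}

fn main() {
    let ptr: u32 = 0x7fedc; // the value of ptr shown by gdb
    let word = mailbox_word(ptr, 8);
    // Stripping the channel bits again gives the address the word implies.
    let implied_addr = word & !0x0F;
    println!("word written to the mailbox: {:#x}", word);
    println!("buffer address it implies:   {:#x}", implied_addr);
    println!("bytes lost to the mask:      {}", ptr - implied_addr);
}
```

With these inputs it reports 0x7fed8, 0x7fed0 and 12. Had ptr been a multiple of 16, the last figure would be 0 and the mask would be harmless.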

OK, so the debugger helped us see what the problem is, but why is the buffer misaligned? And how do we fix it?

It would seem that the C compiler has different alignment rules to the Rust compiler. I'm not entirely sure what either set is, and I'm too lazy to check, but I suspect that in C, arrays whose size is a multiple of 16 bytes are aligned to 16-byte boundaries (which is why the array is declared as 36 words when we only use 35). In Rust, this does not happen: an array of u32 is only guaranteed the 4-byte alignment of u32 itself, however long it is.
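The Rust side of this, at least, is easy to check on a desktop machine. A minimal sketch, assuming a normal hosted build (so std::mem rather than the core::mem we would need in the kernel):

```rust
use std::mem::{align_of, size_of};

fn main() {
    // An array takes the alignment of its element type, whatever its length.
    println!("align_of u32       = {}", align_of::<u32>());
    println!("align_of [u32; 36] = {}", align_of::<[u32; 36]>());
    // 36 words of 4 bytes each.
    println!("size_of  [u32; 36] = {}", size_of::<[u32; 36]>());
}
```

On the usual targets this prints 4, 4 and 144. The array is exactly nine 16-byte blocks long, but nothing forces it to start on a 16-byte boundary.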

We therefore need to manually force the alignment. I considered a number of ways of doing this. One is to allocate an array too big, and then to figure out an offset into the array which is aligned; another is to "do the job properly" and introduce memory allocation and have a method which returns a value with arbitrary alignment; and the third is to see what compatibility features Rust has. I decided that the last was probably the best compromise although, as you'll see, I fully accept I could be wrong.

Rust offers an attribute, #[repr(align(16))], which aligns a type to the given boundary. This seems to be just what we need. However, applying it to the array gives an error:
error[E0517]: attribute should be applied to a struct, enum, function, associated function, or union
   --> src/lib.rs:145:12
    |
145 |     #[repr(align(16))]
    |            ^^^^^^^^^
146 |     let mut buf: [u32;36] = [0; 36];
    |     -------------------------------- not a struct, enum, function, associated function, or union
It doesn't come much clearer than that. You can only align some things, and arrays are not one of them. So let's package the array in a struct and align that.
#[repr(align(16))]
struct Message {
    pub buf: [u32; 36]
}
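A quick way to convince ourselves that the attribute does something is to ask the compiler, again as a desktop program. This is only a sketch: the struct is the one above, and everything in main is illustration.

```rust
use std::mem::{align_of, size_of};

#[repr(align(16))]
struct Message {
    pub buf: [u32; 36],
}

fn main() {
    let msg = Message { buf: [0; 36] };
    // The address of the struct itself must respect the requested alignment.
    let addr = &msg as *const Message as usize;
    println!("align_of Message = {}", align_of::<Message>());
    println!("size_of  Message = {}", size_of::<Message>());
    println!("low four bits of the address = {:#x}", addr & 0x0F);
    println!("first word of buf = {}", msg.buf[0]);
}
```

The alignment comes out as 16 and the low four bits of the struct's address as 0. Note that the attribute aligns the struct, not the field; with a single field the two addresses coincide in practice, and adding C to the repr (#[repr(C, align(16))]) would make that a guarantee.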
We can then fill in our buffer as before, wrap it in a Message, and send that (aligned) buffer to the mailbox:
    let mut msg = Message { buf: buf };
    mbox_send(8, &mut msg.buf);

    let volbuf: *mut u32 = &mut msg.buf as *mut u32;
Sadly, compiling this throws up another linker error:
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib(homer_rust-c9f0f32593953886.3qdmjppfu6iwhc43.rcgu.o): in function `homer_rust::lfb_init':
/home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:216: undefined reference to `memcpy'
aarch64-linux-gnu-ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/lib.rs:216: undefined reference to `memcpy'
But fortunately, we are starting to get good at this:
#[no_mangle]
pub extern "C" fn memcpy(dest: *mut u8, src: *const u8, cnt: usize) -> *mut u8 {
    let mut i = 0;
    while i < cnt {
        unsafe {
            *dest.add(i) = *src.add(i);
        }
        i += 1;
    }
    // Like C's memcpy, hand back the destination pointer.
    dest
}
And now it works! Great. Let's check that in before it stops working again. It's tagged as RUST_BARE_METAL_FIXED_EMULATOR.
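As an aside, the memcpy presumably comes from copying the finished array into the Message. I have not checked that against the generated code, but one possible alternative is to create the Message first and fill its buf in place. A sketch, where fill_request is a hypothetical helper and only the four words seen in the gdb session are set:

```rust
#[repr(align(16))]
struct Message {
    pub buf: [u32; 36],
}

/// Hypothetical helper: writes the start of the request straight into
/// the aligned buffer, so there is no separate array to copy afterwards.
fn fill_request(msg: &mut Message) {
    msg.buf[0] = 35 * 4; // the buffer has 35 4-byte words
    msg.buf[1] = 0; // MBOX_REQUEST
    msg.buf[2] = 0x48003;
    msg.buf[3] = 8; // the number of bytes in the request value
}

fn main() {
    let mut msg = Message { buf: [0; 36] };
    fill_request(&mut msg);
    println!("buf[0] = {}, buf[2] = {:#x}", msg.buf[0], msg.buf[2]);
}
```

Whether this actually removes the memcpy call in a debug build is an assumption that would need checking with the linker; the zero initialisation may well still go through memset.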

Conclusion

So, this is getting messier and messier, but at least it now works in the emulator somewhat reliably - with or without tracing turned on.

We learnt that alignment is different between C and Rust, and that we can (somewhat) hack around that by using a struct with the repr attribute in Rust.

This also (coincidentally) seems to fix one of our problems on the physical box: Homer now appears. Sadly, he is once again the wrong color (blue this time). Looks like we still have work to do.
