Wednesday, January 3, 2024

UART and Real Hardware


One thing I have been trying to avoid is to go down the road of connecting a serial cable to the Pi and sending signals to a USB port on a real computer. It just seems too hard (I'm a software guy, not a hardware guy). However, at this point I need to admit I misjudged how hard it is to get the HDMI console working, so I'm backing off and trying something else.

I ordered an FTDI Chip 1m USB to UART Cable in Black from Radio Spares (RS) in the UK.

Just wiring the cable up reminds me of why I hate hardware so much. On the PC end, of course, you just shove the USB connector into your laptop and you're done. On the Pi end, however ...

First off, the cable comes with three wires which we will call black, orange and yellow. One of the things about serial connections that confuses me is the crossover: you connect "Rx" to "Tx" and "Tx" to "Rx" and then you have to remember which "Rx" and "Tx" you are thinking about. The Pi has a set of GPIO pins that (if you have the case the same way around I do) run down the left hand side of the box from the front to near the back. The pins are confusingly numbered twice: once by their physical location and once by their logical location. For now (we will come back to them), I'm going to ignore the GPIO numbers and just go with the physical location: the pins are numbered starting at "1" from the front, and each pair of pins has the lower, odd, pin on the right and the higher, even, pin on the left. Thus the first row has "2" and "1", the second "4" and "3", the third "6" and "5" and so on. We want to wire up the "black" lead to pin "6", the "yellow" lead to pin "8" and the "orange" lead to pin "10". That's the third, fourth and fifth pins from the front on the extreme left hand side of the box.

Doing this is not made easier by the fact that (for my cable at least), the orange and yellow leads "wanted" to be the other way around.

That's the clearest description I've been able to come up with - the one which would have helped me to get it set up before I started. The link to the website above gives the more technical description of how the cable itself is wired. This link explains in detail how the Pi is connected together.

Configuring the UART

Now that we have physically connected the two ends together, we need to set up the software on both ends. You need some terminal software to run on the PC end of things. I'm using Linux and after looking at what other people had done, decided to use picocom as a terminal. It's fairly simple to install and use:
$ sudo apt-get install picocom
$ sudo picocom --baud 115200 /dev/ttyUSB0
picocom v3.1

port is        : /dev/ttyUSB0
flowcontrol    : none
baudrate is    : 115200
parity is      : none
databits are   : 8
stopbits are   : 1
It comes back to you with all the settings it uses. Apart from the baud rate (which we specified), it is using no parity, no flow control, 8 data bits and 1 stop bit. I'm fairly optimistic that these are the settings that the code I'm copying from also used. To check, I installed the original code onto the SD card and tried it: it worked as a teletype echo, so I know the wiring and the terminal settings are correct.

So now I need to go back and port that code to set up the port correctly.

Here is the original C code:
void uart_init()
{
    register unsigned int r;

    /* initialize UART */
    *UART0_CR = 0;         // turn off UART0

    /* set up clock for consistent divisor values */
    mbox[0] = 9*4;
    mbox[1] = MBOX_REQUEST;
    mbox[2] = MBOX_TAG_SETCLKRATE; // set clock rate
    mbox[3] = 12;
    mbox[4] = 8;
    mbox[5] = 2;           // UART clock
    mbox[6] = 4000000;     // 4Mhz
    mbox[7] = 0;           // clear turbo
    mbox[8] = MBOX_TAG_LAST;
    mbox_call(MBOX_CH_PROP);

    /* map UART0 to GPIO pins */
    r=*GPFSEL1;
    r&=~((7<<12)|(7<<15)); // gpio14, gpio15
    r|=(4<<12)|(4<<15);    // alt0
    *GPFSEL1 = r;
    *GPPUD = 0;            // enable pins 14 and 15
    wait_cycles(150);
    *GPPUDCLK0 = (1<<14)|(1<<15);
    wait_cycles(150);
    *GPPUDCLK0 = 0;        // flush GPIO setup

    *UART0_ICR = 0x7FF;    // clear interrupts
    *UART0_IBRD = 2;       // 115200 baud
    *UART0_FBRD = 0xB;
    *UART0_LCRH = 0x7<<4;  // 8n1, enable FIFOs
    *UART0_CR = 0x301;     // enable Tx, Rx, UART
}
All of this looks pretty hairy and none of it is completely transparent. (This, of course, is why I have avoided having anything to do with it - it feels as complicated as getting the main display to work.)

But let's take it slowly and see what we can get to work in our own Rust code.

First off, let's add a call to uart_init in kernel_main:
#[no_mangle]
pub extern fn kernel_main() {
    avoid_emulator_segv();
    uart_init();
In order to port this code, I've decided to do a limited amount of refactoring and cleaning up of the code. For example, we are going to reuse mbox_send to set the clock, so I've moved the code that was (wrongly) in there to check the response about the video buffer out to the lfb_init method. I've also bundled up the piece of code that is responsible for ensuring that the emulator doesn't SEGV into its own method (avoid_emulator_segv) and called that up front.

So what does the rest of this code do? Well, the first line claims (presumably correctly) to disable the UART by writing 0 to the UART0_CR. We can do that too:
const UART_CR: u32 = 0x3F201030;
...
fn uart_init() {
    // Turn off UART0 while we configure it
    mmio_write(UART_CR, 0);
The next block of code sets the UART clock. Setting the clocks in this way is described in the section of the wiki that deals with tagged mailbox messages.
    // Now, set the UART clock (yes, the Raspberry Pi seems 
    // to have about 10 separate clocks) to 4MHz.
    let mut buf: [u32;36] = [0; 36];

    buf[0] = 9 * 4; // this message has 9 4-byte words
    buf[1] = 0;
    buf[2] = 0x38002; // set one of the clock rates
    buf[3] = 12; // request has three words of data
    buf[4] = 0;  // space for response length, but is zero for request
    buf[5] = 2;  // 2 selects the "UART" clock
    buf[6] = 4000000; // set it to 4MHz
    buf[7] = 0;  // avoid setting "turbo" mode
    buf[8] = 0;

    let mut msg = Message { buf: buf };
    mbox_send(8, &mut msg.buf);
Note that this reuses the (refactored) mbox_send that we used previously to configure the display.
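To make the layout of that message easier to see, here is a minimal sketch that builds the same nine words in one place. The helper name set_clock_rate_message and the two constant names are my own inventions for illustration, not part of the code in this post; the values are the ones used above. The caller would still need to copy the words into the (suitably aligned) buffer that mbox_send expects.

```rust
/// Tag identifier for "set clock rate", the value written to buf[2] above.
const TAG_SET_CLOCK_RATE: u32 = 0x38002;
/// Clock id 2 selects the UART clock, as in buf[5] above.
const CLOCK_ID_UART: u32 = 2;

/// Build the nine-word "set clock rate" property message.
/// Hypothetical helper: the name and signature are assumptions for this sketch.
fn set_clock_rate_message(clock_id: u32, rate_hz: u32) -> [u32; 9] {
    [
        9 * 4,              // total message size in bytes (9 words of 4 bytes)
        0,                  // request code: 0 marks this as a request
        TAG_SET_CLOCK_RATE, // tag identifier
        12,                 // size of the tag's value buffer in bytes (3 words)
        0,                  // response length field, zero in a request
        clock_id,           // value word 1: which clock to set
        rate_hz,            // value word 2: the rate in Hz
        0,                  // value word 3: zero avoids setting "turbo" mode
        0,                  // end tag
    ]
}
```

Calling set_clock_rate_message(CLOCK_ID_UART, 4_000_000) gives the same nine words that are written into buf one at a time above.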

Now, to get to the rest of it, we need to understand the GPIO configuration registers. In the ARM peripherals guide, chapter 6 (p89) describes the GPIO pins and the following page (p90) has a table with all the registers and their alleged addresses (again, these are in the right order, but with the wrong offset, for the actual Raspberry Pi boards). This is somewhat confusing, in part because the first row is duplicated.

Then Table 6-1 explains how the registers are used. The first six registers (GPFSEL0 to GPFSEL5) are used to control the meaning of the 54 GPIO pins (not all of which are brought out to the forty-pin strip down the side of the board). For each pin, three bits are used, giving eight possible options for the pin. 000 means this pin is used as an input, 001 means this pin is used as an output, and the other six options identify special-purpose "alternative" functions. These alternative functions are specified in Table 6-31 in Section 6.2 on pp102-103.
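Since each pin takes three bits, each 32-bit select register holds the fields for ten pins (using 30 of its 32 bits). A small sketch of the arithmetic for finding a pin's field follows; the function name gpfsel_field is hypothetical and only here for illustration.

```rust
/// For a logical GPIO number, return (select register index, bit shift)
/// of its three-bit function field.
/// Hypothetical helper, not part of the code in this post.
fn gpfsel_field(gpio: u32) -> (u32, u32) {
    assert!(gpio < 54, "GPIO numbers run from 0 to 53");
    let register = gpio / 10;   // ten pins per select register
    let shift = (gpio % 10) * 3; // three bits per pin
    (register, shift)
}
```

For example, gpfsel_field(14) is (1, 12) and gpfsel_field(15) is (1, 15), which is where the shifts of 12 and 15 on GPFSEL1 in the C code come from.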

Remember that earlier I said that the pins were numbered twice, once for their physical location and once for their "logical" location? Well, this time we use the logical location. Looking at the pinout for the GPIO header you can see that pin 8 is described as GPIO 14 (TXD) and pin 10 is described as GPIO 15 (RXD). Comparing this to Table 6-31, you can see that in the rows GPIO14 and GPIO15 the alternate functions in column "ALT0" are "TXD0" and "RXD0" respectively.

So what we need to do is to set the relevant bits of the second select register (GPFSEL1) to choose ALT0 without damaging any of the other bits. This is what the code does. My question is whether, in porting it, we can make it a little clearer? And whether this is the right time to introduce some unit tests?

I wrote this test:
#[cfg(tests)]
mod tests {
    use super::*;

    #[test]
    fn test_set_4_in_1_from_0() {
        let mut val = 0;
        gpf_select(&mut val, 1, ALT0);
        assert_eq!(val, 0b100000);
    }
}
and tried to run it using cargo test, but it comes back and says 0 tests found. One likely culprit is the attribute: the standard form is #[cfg(test)], singular, and with #[cfg(tests)] the whole module is never compiled. For now, I'm going to put that on my list of things that aren't working in my environment and that I need to get working, and assume that I can write this function without help. So I end up with these three functions:
fn gpfsel_read(reg: u32) -> u32 {
    let addr = PERIPHERAL_BASE + GPIO_BASE + (reg*4);
    mmio_read(addr)
}

fn gpf_select(flags: &mut u32, pos: u32, fun: u32) {
    let lsb = pos * 3;
    *flags = *flags & !(7 << lsb); // clear these bits
    *flags = *flags | (fun << lsb);  // set these bits
}

fn gpfsel_write(reg: u32, value: u32) {
    let addr = PERIPHERAL_BASE + GPIO_BASE + (reg*4);
    mmio_write(addr, value);
}
and I can wire them up as follows inside uart_init:
    let mut fs1 = gpfsel_read(1);
    gpf_select(&mut fs1, 4, ALT0);
    gpf_select(&mut fs1, 5, ALT0);
    gpfsel_write(1, fs1);
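As a sanity check that can run on the PC rather than the Pi, here is a sketch comparing the two gpf_select calls with the mask arithmetic from the C code. It assumes ALT0 is 4 (0b100), which is what the C code writes; the names c_style and rust_style are made up for this comparison.

```rust
/// Function code for ALT0; assumed to be 4, matching the C code.
const ALT0: u32 = 0b100;

// Same logic as gpf_select above, repeated so this sketch stands alone.
fn gpf_select(flags: &mut u32, pos: u32, fun: u32) {
    let lsb = pos * 3;
    *flags &= !(7 << lsb); // clear the three bits for this position
    *flags |= fun << lsb;  // set the requested function code
}

/// The C version's mask arithmetic for GPIO 14 and 15, transcribed directly.
fn c_style(mut r: u32) -> u32 {
    r &= !((7 << 12) | (7 << 15));
    r |= (4 << 12) | (4 << 15);
    r
}

/// The two-call version: positions 4 and 5 within GPFSEL1.
fn rust_style(mut r: u32) -> u32 {
    gpf_select(&mut r, 4, ALT0);
    gpf_select(&mut r, 5, ALT0);
    r
}
```

Both functions should return the same word for any starting value of the register, and other bits in the register are left alone.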
The next part of the code seems something between arcane and bizarre, but definitely matches the description given on p101 of the Broadcom peripherals guide. It appears that the above code sets the values we want in memory, but does not propagate our choices to the hardware. To achieve that, we need to go through a cycle of telling the chip to make the changes.

In both places, the "magic number" of 150 cycles is specified as being the amount of time that is needed for the change to take effect. I have to imagine that this means "at least" 150 cycles because, apart from anything else, you can't really be sure that any code you write will not be subject to interrupts. And the code that I am copying - unless it is unrolled - would seem to me to use 150 cycles in executing the nop operation, along with at least twice as many in handling the control loop. So I am going to assume that as long as we wait for at least 150 cycles, we will be fine.
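For reference, a delay loop along these lines might look like the sketch below. This is only an assumption about what a wait_a_while could look like, not necessarily the version used in this project, and it is not a calibrated delay: it simply runs the loop body the requested number of times and relies on black_box to stop the compiler from deleting the loop.

```rust
use core::hint::{black_box, spin_loop};

/// Busy-wait for `count` loop iterations.
/// Sketch only: the real time taken depends on the CPU and on how the
/// loop is compiled, so treat `count` as a rough minimum.
fn wait_a_while(count: u32) {
    for i in 0..count {
        // black_box keeps the optimizer from removing the loop entirely.
        black_box(i);
        // spin_loop emits a CPU hint instruction where one is available.
        spin_loop();
    }
}
```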

It has to be said that while I think I understand what this code is trying to achieve, I don't understand why it works the way it does, and, specifically, I don't understand what is meant by "pull-up" and "pull-down" and why that has anything to do with selecting the function associated with the GPIO pins.

So what this description says (and what the code seems to do) is:
  • Write 0 to the register GPPUD to remove the current pull-up/down setting;
  • Wait for the system to recognize the change;
  • Write a word with bits 14 & 15 set to GPPUDCLK0;
  • Wait for the system to process the change;
  • Clean up by clearing GPPUD and GPPUDCLK0 - in our case, GPPUD is already 0, so we just need to write 0 to GPPUDCLK0.
    mmio_write(GPPUD, 0);
    wait_a_while(150);
    mmio_write(GPPUDCLK0, (1<<14) | (1<<15));
    wait_a_while(150);
    mmio_write(GPPUDCLK0, 0);
And, finally, we need to configure the UART itself. The documentation for the UART starts on p175, and the section on the registers begins on p177.

I am somewhat confused by the first line of C code, which is supposed to clear the interrupts, because it seems to contradict what the documentation says on how the register is to be used. The C code writes a 1 into each of the low 11 bits of the ICR (0x7FF), but it seems to me that the definition in the documentation expects that 0s will be written to clear the bits. For what it's worth, the ARM documentation for the PL011, on which this UART is based, describes this register as write-1-to-clear, which would agree with the C code.

Given that the code works, and I don't have a great deal of faith in any of the documentation, I am going to copy the code rather than the documentation but I'm not sure I will be able to tell if this is "correct" or "just happens to work".
    mmio_write(UART_ICR, 0x7ff);
Now onto setting the baud rate. We want to set the rate to 115,200 baud based on a clock speed of 4MHz: to set this, we need to provide the "divisor": basically a "wavelength". This is very poorly explained in the Broadcom documentation, so I searched the web for other documentation and found this which may in fact be a good source of documentation in general.

It's fairly obvious that the integer part needs to be an unsigned integer between 1 and 65535, while the fraction part is more obscure. It turns out that the value of the fractional part is a numerator where the denominator is always 64, so FBRD is the number of 64ths. In trying to repeat the calculation that I am copying, the clock speed of 4,000,000 is divided by the baud rate of 115,200 giving 34.7222...; according to the documentation, this needs to be further divided by 16 which gives me a divisor of 2.170138...; the integer portion is clearly 2, and the fractional portion (0.170138 x 64 = 10.89) is just under 11/64. So I want to write 2 and 11 into the IBRD and FBRD registers respectively.
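The same calculation can be written as a few lines of integer arithmetic. This is a sketch under the assumption that the fractional part is rounded to the nearest 64th; the function name baud_divisors is hypothetical.

```rust
/// Compute (IBRD, FBRD) for a given UART clock and baud rate.
/// Hypothetical helper; assumes the fraction is rounded to the nearest 1/64.
fn baud_divisors(uart_clock_hz: u32, baud: u32) -> (u32, u32) {
    // divisor = clock / (16 * baud). Multiplying by 64 to count 64ths gives
    // clock * 4 / baud; adding baud / 2 first rounds to the nearest integer.
    let clock = u64::from(uart_clock_hz);
    let baud = u64::from(baud);
    let scaled = (clock * 4 + baud / 2) / baud;
    let integer_part = (scaled / 64) as u32; // goes in IBRD
    let fraction_64ths = (scaled % 64) as u32; // goes in FBRD
    (integer_part, fraction_64ths)
}
```

With a 4MHz clock and 115,200 baud, the scaled value is 139, which splits into 2 whole units and 11 sixty-fourths.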

As luck would have it, those are exactly the numbers in the code I am copying, but I am going to write them in decimal rather than hex for stylistic reasons: I think hex tends to suggest something which derives from bitwise operations.
    mmio_write(UART_IBRD, 2);
    mmio_write(UART_FBRD, 11);
OK, we're nearly there now. The Line Control Register sets the rest of the transmission parameters, and has eight significant bits. We want to turn most of these bits off, but we want to select 8 bit transmission, which involves setting bits 5 and 6, and enable the FIFO mode (so that we can transmit a buffer in one go and then have the UART do all the hard work), which is in bit 4. So we want to set the register to 0x70.
    mmio_write(UART_LCRH, 0x70);
And finally, we re-enable the UART. The control register is described on pp185-187, and the relevant bits we need to set are 0, 8 and 9, which has the hex mask 0x301.
    mmio_write(UART_CR, 0x301); // Enable UART with Rx and Tx
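If the raw values 0x70 and 0x301 feel too magic, they can be built up from named bits. The constant names below are my own choice for illustration (loosely following the field names in the documentation) and do not appear in the code for this post.

```rust
// Line control register (LCRH) fields.
const LCRH_FEN: u32 = 1 << 4;          // bit 4: enable the FIFOs
const LCRH_WLEN_8BIT: u32 = 0b11 << 5; // bits 5 and 6: 8-bit words

// Control register (CR) fields.
const CR_UARTEN: u32 = 1 << 0; // bit 0: enable the UART
const CR_TXE: u32 = 1 << 8;    // bit 8: enable transmit
const CR_RXE: u32 = 1 << 9;    // bit 9: enable receive

/// 8 data bits with FIFOs on; parity and the second stop bit stay off.
const LCRH_8N1_FIFO: u32 = LCRH_WLEN_8BIT | LCRH_FEN;
/// UART enabled with both directions switched on.
const CR_ENABLE_TX_RX: u32 = CR_UARTEN | CR_TXE | CR_RXE;
```

LCRH_8N1_FIFO works out to 0x70 and CR_ENABLE_TX_RX to 0x301, the same values written above.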
And that all works in the emulator. But on the real hardware, not so much.

This is kind of what I was afraid of. In order to be able to debug things, I need to be able to have some means of knowing what is going on. My first approach was to think that I could write to the console. My second approach was to say, well, if I can't do that, can I at least write to the UART? Apparently, I can't do that either.

What's really interesting is that when I try and simplify this by eliminating some of the code, it still doesn't work. Even if I comment out the whole of the body of uart_init, the Pi starts up and does nothing. If I comment out the calls to avoid_emulator_segv and uart_init, it goes back to almost working (showing Homer in the wrong colors). What seems obvious to me is that something, probably either memory related or timing related, is going wrong in a way that is affected by the size of the kernel_main function. But I cannot guess what that could be, nor why it would happen on real hardware and not on the emulator. Interestingly, it does seem to be consistent: if I make one random change and then undo it, it goes back to the behaviour that it had before.

Having spent some time thinking about this away from the computer, I'm becoming increasingly convinced that it must be a timing thing and that it is something I did not copy correctly from the sample program.

Looking back at the C version of the program, I found these lines at the end of mbox_call, which has become mbox_send in my code:
    while(1) {
        /* is there a response? */
        do{asm volatile("nop");}while(*MBOX_STATUS & MBOX_EMPTY);
        /* is it a response to our message? */
        if(r == *MBOX_READ)
            /* is it a valid successful response? */
            return mbox[1]==MBOX_RESPONSE;
    }
What this is doing is checking that the data available to read in the mailbox is actually the answer to our message (rather than to some other message). I suspect that, because we are not doing this, we appear to have a response to the second mailbox message when what we have actually read is the response to the previous one (setting up the UART). So I'm now going to copy this across.
    loop {
        let rb = mmio_read(MBOX_READ);
        if rb == addr {
            break;
        }
    }
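To convince myself of the logic away from the hardware, the loop can be exercised against a fake mailbox. This is a sketch with several assumptions: the RegisterRead trait, wait_for_response and FakeMailbox are invented for this example, the register addresses assume the same 0x3F000000 peripheral base as UART_CR, and it also includes the "is the mailbox empty?" status check from the C version.

```rust
const MBOX_READ: u32 = 0x3F00_B880;   // assumed address of the read register
const MBOX_STATUS: u32 = 0x3F00_B898; // assumed address of the status register
const MBOX_EMPTY: u32 = 0x4000_0000;  // status bit: nothing to read

/// Minimal read access to registers, so the loop can run off-target.
trait RegisterRead {
    fn read(&mut self, addr: u32) -> u32;
}

/// Spin until the mailbox returns the word we sent (buffer address | channel).
/// Returns how many unrelated responses were discarded on the way.
fn wait_for_response<R: RegisterRead>(regs: &mut R, expected: u32) -> u32 {
    let mut skipped = 0;
    loop {
        // Wait until there is something to read.
        while (regs.read(MBOX_STATUS) & MBOX_EMPTY) != 0 {
            core::hint::spin_loop();
        }
        // Only accept the response that matches our own request.
        if regs.read(MBOX_READ) == expected {
            return skipped;
        }
        skipped += 1;
    }
}

/// Fake mailbox holding a queue of responses, oldest first.
struct FakeMailbox {
    pending: Vec<u32>,
}

impl RegisterRead for FakeMailbox {
    fn read(&mut self, addr: u32) -> u32 {
        match addr {
            MBOX_STATUS if self.pending.is_empty() => MBOX_EMPTY,
            MBOX_STATUS => 0,
            MBOX_READ => self.pending.remove(0),
            _ => 0,
        }
    }
}
```

With a stale response queued ahead of the one we want, wait_for_response discards the stale one and returns only when the matching word turns up, which is the behaviour that was missing from my port.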
Now, this is back to working and I am getting messages coming across on the console. Faffing with this, it appears to continue working, which is the most important thing to me. On the other hand, Homer is still blue in the face, but I think I know why.

This is all checked in as RUST_BARE_METAL_INIT_UART.

Conclusion

With a few twists and turns, we managed to successfully wire up and set up the UART to communicate with the host PC. In doing this, we now have access to more standard tracing. I'm still a little nervous about how stable everything is, but I'm increasingly convincing myself that the problems are all me doing things wrong and not random instability.

Let's stop Homer being so blue.
