S01C02 Write a Hello World program that can run standalone without an OS

In the previous chapter we wrote a “Hello, World!” program, cross-compiled it with GCC and ran it successfully with the QEMU emulator. However, we also found that the program was several hundred KiB in size, which means it must contain a lot of code that we didn’t write. In addition, it relies on an operating system to run, and an operating system, as you know, is itself a very large program. In this chapter, we will write a “Hello, World!” program that runs on its own, without an operating system or libraries (such a program is called a bare-metal program). We will then cross-compile it with GCC and try to run it with QEMU. Bare-metal programs show us what a complete program is made of, and how it is executed when the machine is powered on.

1. What is a bare-metal program?
2. How to start a bare-metal program
3. Creating a bare-metal version of “Hello, World!”
- 3.1 How to access the hardware directly?
- 3.2 Sending characters through the serial port

1. What is a bare-metal program?

A bare-metal program is a program that runs in a “non-OS” environment. That may sound amazing at first: “How can a program run on a machine without an operating system? How do I type in the name of the program without a terminal?” But if you think about it, the operating system’s bootloader (e.g., GRUB) and the operating system kernel both run without an existing operating system. So a program can certainly run without an OS; it just might use some magic that we don’t know about yet.

However, a bare-metal program doesn’t involve any magic. In fact, it’s not much different from an ordinary program: its arithmetic operations, flow control and overall structure are the same. The difference is that for privileged operations and for I/O operations on hardware devices, the program has to do the work itself and cannot ask the operating system for help. In addition, bare-metal programs are not started by a filename, because there is no file system in the environment. Instead, a bare-metal program usually exists in binary form in ROM or at a fixed location on disk.

If you want a bare-metal application to play music and access the internet, you need to write not only the application but also the drivers for the sound card and the network interface. If you want two applications to run at the same time, you need to write a process scheduler. If you want to save a file to the hard disk (or to an SSD), you need to write a filesystem module. And if you want applications to run safely, isolated from one another, you need to implement a virtual memory manager. As you can see, the more features you need, the closer the bare-metal program gets to an operating system. In fact, an operating system kernel is a typical bare-metal program.

Of course, a bare-metal program can be extremely simple if you just want the machine to do a simple task, such as printing “Hello, World!”.

In general, a CPU executes instructions in one of at least two privilege modes: supervisor mode and user mode. RISC-V, for example, also has an even more privileged machine mode. The kernel runs in supervisor mode, where it can execute most instructions and access any hardware resource. Applications run in user mode, where they can execute only a limited set of instructions, such as arithmetic operations and memory loads and stores. In other words, an application in user mode cannot directly access peripherals such as keyboards, mice, monitors and network interfaces. An application in this mode seems almost useless, so how do applications provide such rich functionality? The answer is that the CPU provides a trap instruction that can be executed in user mode (on RISC-V it is ecall), and the operating system builds its system calls (syscalls) on top of it. The application sends requests to the operating system kernel through syscalls, and the kernel accesses privileged modules and hardware resources on the application’s behalf. It is worth mentioning that syscalls are not the only way for user-mode applications to communicate with the outside world. For example, the Linux kernel creates virtual filesystems such as /dev and /sys, whose files user applications can read or write to access kernel modules or hardware. In addition, some hardware peripherals map their interfaces to certain addresses in memory (this is called memory-mapped I/O, or MMIO). Once the kernel maps such a region into an application’s address space, the application’s reads and writes to that memory are turned into accesses to the hardware.
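As a small illustration of the /dev route, here is a sketch for a Linux host. This is an assumption on our part: it will not work in a bare-metal environment. It reads bytes from /dev/zero with ordinary open/read calls. /dev/zero is a pseudo-device rather than real hardware, but the mechanism is the same one used for hardware device nodes: each read() is a syscall, and a kernel driver produces the data. The helper name read_device is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with len bytes read from a device node
 * such as /dev/zero. Every read() below is a syscall, so the kernel (and
 * its driver for the device) does the actual work.
 * Returns 0 on success, -1 on failure. */
int read_device(const char *path, unsigned char *buf, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0) {          /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}
```

For example, read_device("/dev/zero", buf, 16) should fill buf with 16 zero bytes and return 0.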

Programs that run on microcontrollers (MCUs), often called firmware, are usually bare-metal programs too, because many MCUs have too few resources to run a general-purpose operating system.

There is also a kind of software called a Real-Time Operating System (RTOS). An RTOS usually comes in the form of a library that is linked together with the application, which makes it a very different concept from the operating systems we usually talk about.

2. How to start a bare-metal program

After the machine is powered on or reset, the CPU starts executing its first instruction at a specified memory address. That address usually holds a loader or hardware initialization program stored in a ROM chip (e.g. a BIOS/UEFI program, sometimes called firmware). This loader then tries to load the system bootloader from a specified location and jumps to its first instruction (a.k.a. its entry point). The bootloader in turn loads the kernel and jumps to the kernel’s entry point. In other words, booting a machine means running several programs one after another, with each one loading and starting the next.

(Figure: The Linux boot process on the x86_64 platform)

It is important to note that the boot process is not the same for all machines. Some platforms have only one or two steps, while others have more. But one thing is certain: the location (memory address) and the entry point of each program are predefined. So the easiest way to get the machine to run our bare-metal program is to place it at the location reserved for the bootloader or the kernel, so that it can “pretend” to be one of them. The machine will then execute our program when it is powered on (perhaps after a number of earlier steps).
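The idea of “pretending” to be the next stage can be modelled with a toy simulation in ordinary C. This is not how real firmware works internally. It only illustrates that each stage jumps to a fixed address without knowing what is stored there. The addresses, stage names and functions (power_on, jump_to and so on) are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a boot chain; all addresses and names are hypothetical. */
typedef void (*entry_fn)(void);

/* One region of "memory": the image currently installed at a fixed address. */
struct slot {
    uint64_t addr;
    entry_fn entry;
};

#define ROM_ADDR        0x00001000u
#define BOOTLOADER_ADDR 0x80000000u
#define KERNEL_ADDR     0x80200000u

static struct slot memory[3];
static char boot_log[128];   /* records which stages ran, in order */

static void log_stage(const char *name) {
    strncat(boot_log, name, sizeof boot_log - strlen(boot_log) - 1);
    strncat(boot_log, " ", sizeof boot_log - strlen(boot_log) - 1);
}

/* "Jump" to whatever image is installed at addr. The caller knows only
 * the address, not what the image actually is. */
static void jump_to(uint64_t addr) {
    for (size_t i = 0; i < sizeof memory / sizeof memory[0]; i++) {
        if (memory[i].addr == addr && memory[i].entry != NULL) {
            memory[i].entry();
            return;
        }
    }
}

static void rom_entry(void)        { log_stage("rom");        jump_to(BOOTLOADER_ADDR); }
static void bootloader_entry(void) { log_stage("bootloader"); jump_to(KERNEL_ADDR); }
void kernel_entry(void)            { log_stage("kernel"); }
void hello_entry(void)             { log_stage("hello"); }  /* our program */

/* Power-on: install the images, then start at the fixed reset address. */
void power_on(entry_fn image_at_kernel_addr) {
    boot_log[0] = '\0';
    memory[0] = (struct slot){ROM_ADDR, rom_entry};
    memory[1] = (struct slot){BOOTLOADER_ADDR, bootloader_entry};
    memory[2] = (struct slot){KERNEL_ADDR, image_at_kernel_addr};
    jump_to(ROM_ADDR);
}
```

Calling power_on(kernel_entry) runs the stages rom, bootloader and kernel, while power_on(hello_entry) runs rom, bootloader and hello. The earlier stages are unchanged; they simply run whatever sits at the kernel’s address.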

We are going to implement a bare-metal version of the “Hello, World!” program, which will run in the full system mode of the QEMU RISC-V 64-bit Virt virtual machine.

3. Creating a bare-metal version of “Hello, World!”

Let’s look at the source code of a traditional “Hello, World!” program:

#include <stdio.h>

int main(void) {
    printf("Hello, World!\n");
    return 0;
}

The above program is very simple, with only one key statement: it outputs the “Hello, World!” string to the screen by calling the printf function. If you step through it in a debugger, or read the libc source code, you’ll see that printf (and puts as well) goes through a series of functions and ends with the write syscall. This shows that the program needs an operating system to run properly, and therefore cannot be a bare-metal program.
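To see where the chain ends, here is a sketch that skips printf and calls write() directly. It assumes a POSIX system such as Linux. In glibc, write() is a thin wrapper that issues the write syscall. The program therefore still depends on the kernel; it has only removed the formatting layers above the syscall. The function name hello_without_printf is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical example: print the greeting without printf by going straight
 * to the last step of the chain. write() traps into the kernel with the
 * write syscall on file descriptor 1 (standard output).
 * Returns the number of bytes written, or -1 on error. */
ssize_t hello_without_printf(void) {
    static const char msg[] = "Hello, World!\n";
    return write(STDOUT_FILENO, msg, sizeof msg - 1);  /* exclude the NUL */
}
```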

To implement a bare-metal version of the “Hello, World!” program, you have to write your own functions that output characters directly to the hardware.

3.1 How to access the hardware directly?

Thankfully, interacting with hardware is not too complicated. Computer hardware consists of digital circuits, and from a programming point of view these circuits contain many “little switches”. Some of these switches are used to change the state of the circuit’s components so that they perform specific functions. Others are switched on or off by the circuit itself, like small light bulbs, to indicate the state of the circuit.

These “switches” actually correspond to registers in the digital circuits, whose inputs or outputs are mapped to specified addresses in the memory space. So by writing bits (0 or 1) to one of these memory locations, we can set the state of the corresponding register. Conversely, by reading from these memory locations, we can get the state of the corresponding register.

So interacting with the hardware boils down to writing data to, or reading data from, specified memory addresses.
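In C, “writing data to a specified address” is usually done through a pointer to volatile, so that the compiler does not remove, merge or reorder these accesses. Below is a minimal sketch of such helpers; the names reg_write8, reg_set_bit and so on are hypothetical. In a bare-metal program, addr would be a device address such as 0x10000000. On a hosted OS, dereferencing such an address would simply crash, so any experiment there should point the helpers at an ordinary variable instead.

```c
#include <stdint.h>

/* Hypothetical helpers for 8-bit memory-mapped registers. The volatile
 * qualifier forces every read and write to really happen, in order,
 * because the "memory" may actually be a device register. */
static inline void reg_write8(uintptr_t addr, uint8_t value) {
    *(volatile uint8_t *)addr = value;
}

static inline uint8_t reg_read8(uintptr_t addr) {
    return *(volatile uint8_t *)addr;
}

/* Turn one "switch" (bit) on, leaving the other bits unchanged. */
static inline void reg_set_bit(uintptr_t addr, unsigned bit) {
    reg_write8(addr, (uint8_t)(reg_read8(addr) | (1u << bit)));
}

/* Turn one "switch" (bit) off, leaving the other bits unchanged. */
static inline void reg_clear_bit(uintptr_t addr, unsigned bit) {
    reg_write8(addr, (uint8_t)(reg_read8(addr) & ~(1u << bit)));
}

/* Read the state of one "light bulb" (bit): returns 0 or 1. */
static inline int reg_test_bit(uintptr_t addr, unsigned bit) {
    return (reg_read8(addr) >> bit) & 1u;
}
```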

3.2 Sending characters through the serial port

The QEMU RISC-V 64-bit Virt virtual machine contains a virtual NS16550 chip, which implements UART (Universal Asynchronous Receiver-Transmitter) communication, commonly known as serial communication. When data is written to this chip, it is sent through an RS-232 interface (on modern computers this has largely been replaced by USB) and a cable to a device on the other end. In the QEMU RISC-V 64-bit Virt machine, that device is the virtual terminal in which the QEMU program is running.
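For a feel of what a UART does with each byte, here is a sketch of the common “8N1” frame format. Each frame has a start bit (0), eight data bits sent least significant bit first, no parity bit, and one stop bit (1), and the line idles at 1 between frames. The NS16550 performs this serialization in hardware according to its Line Control Register settings. QEMU’s emulated chip, as far as we know, hands bytes straight to the terminal without modelling individual bits, so our program never builds frames itself. The function uart_frame_8n1 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: the sequence of line levels a UART sends for
 * one byte in 8N1 format (1 start bit, 8 data bits LSB first, no parity,
 * 1 stop bit). Writes 10 levels into bits[] and returns how many. */
size_t uart_frame_8n1(uint8_t byte, uint8_t bits[10]) {
    size_t n = 0;
    bits[n++] = 0;                        /* start bit pulls the idle line low */
    for (int i = 0; i < 8; i++)
        bits[n++] = (byte >> i) & 1u;     /* data bits, least significant first */
    bits[n++] = 1;                        /* stop bit returns the line high */
    return n;
}
```

For 'A' (0x41), the sketch produces the levels 0 1 0 0 0 0 0 1 0 1.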

By reading the NS16550 datasheet, we can see that the chip has 13 registers. They are used to configure its operating conditions and to write or read communication data. Each register is named after its function. For example, the first register of the NS16550 is THR (Transmitter Holding Register), which holds the data to be sent. In addition, each register has its own size, and in the NS16550 every register is exactly 8 bits wide. According to the datasheet, the NS16550’s registers are arranged together in a block of 8 bytes. This block is mapped into memory, so it has a starting address, and the program can locate each register by adding the register’s offset to that address.

The register space of the NS16550 is not 13 × 1 byte = 13 bytes because some registers share the same location. For example, RHR (Receiver Holding Register) and THR are both at offset 0: reading offset 0 accesses RHR, while writing to it accesses THR.
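The shared locations are easiest to see as a table of offsets. The enum below is a sketch that uses register names from common 16550 datasheets (ISR is called IIR in some of them). Names and the exact register list vary slightly between datasheets and chip variants, so check the one you are working from. DLL and DLM, the divisor latch used to set the baud rate, also share offsets 0 and 1. They are selected by the DLAB bit (bit 7) of LCR.

```c
#include <stdint.h>

/* Sketch of 16550 register offsets from the UART base address. Several
 * names share one offset: which register you reach depends on whether you
 * read or write, or on the DLAB bit (bit 7) of LCR. */
enum uart_reg {
    UART_RHR = 0, /* Receiver Holding Register     (read)          */
    UART_THR = 0, /* Transmitter Holding Register  (write)         */
    UART_DLL = 0, /* Divisor Latch, low byte       (when DLAB = 1) */
    UART_IER = 1, /* Interrupt Enable Register                     */
    UART_DLM = 1, /* Divisor Latch, high byte      (when DLAB = 1) */
    UART_ISR = 2, /* Interrupt Status Register     (read)          */
    UART_FCR = 2, /* FIFO Control Register         (write)         */
    UART_LCR = 3, /* Line Control Register                         */
    UART_MCR = 4, /* Modem Control Register                        */
    UART_LSR = 5, /* Line Status Register                          */
    UART_MSR = 6, /* Modem Status Register                         */
    UART_SPR = 7, /* Scratch Pad Register                          */
};

#define UART_REG_SPACE 8  /* bytes: offsets 0 through 7 */
```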

The QEMU RISC-V Virt source code lists the memory-mapped addresses of peripherals:

static const MemMapEntry virt_memmap[] = {
    ...
    [VIRT_MROM] =         {     0x1000,        0xf000 },
    [VIRT_UART0] =        { 0x10000000,         0x100 },
    [VIRT_FLASH] =        { 0x20000000,     0x4000000 },
    [VIRT_DRAM] =         { 0x80000000,           0x0 },
    ...
};

Here VIRT_UART0 is the serial device. From the list above, we know that the NS16550 serial device of this VM is mapped at memory address 0x10000000 (the starting address of a peripheral is generally called its base address). According to the datasheet, the address of register THR is therefore 0x10000000 + 0 = 0x10000000, the address of register LSR (Line Status Register) is 0x10000000 + 0x5 = 0x10000005, and so on.
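The base-plus-offset arithmetic can be written as a pair of macros. This is a sketch: the macro names are hypothetical, and only the base address 0x10000000 and the offsets 0 and 5 come from the text above.

```c
#include <stdint.h>

#define UART0_BASE 0x10000000UL  /* QEMU virt: virt_memmap[VIRT_UART0] */
#define UART_THR   0x0           /* Transmitter Holding Register offset */
#define UART_LSR   0x5           /* Line Status Register offset */

/* Address of a UART register = base address + register offset. */
#define UART_REG_ADDR(off) ((uintptr_t)(UART0_BASE + (off)))

/* The register itself, as an 8-bit volatile lvalue. Dereferencing it only
 * makes sense inside the QEMU virtual machine, not on a hosted OS. */
#define UART_REG(off) (*(volatile uint8_t *)UART_REG_ADDR(off))
```

UART_REG_ADDR(UART_LSR) evaluates to 0x10000005. An assignment such as UART_REG(UART_THR) = 'H'; would send a character, but only inside the virtual machine.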

To send a character through the serial port, simply write its ASCII value (an 8-bit unsigned integer, uint8_t in C) to address 0x10000000 (i.e. register THR). The character will then appear in the virtual terminal in which the QEMU program is running.
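Putting the pieces together, here is a sketch of output routines that a bare-metal “Hello, World!” could use in place of printf. One assumption goes beyond the text. Before each write, the code polls bit 5 of LSR (THRE, “transmitter holding register empty”), which is what a driver for real 16550 hardware should do. QEMU’s emulated UART normally reports this bit as set, so we expect the loop to exit immediately there. The function names are hypothetical. The startup code and linker script needed to actually boot this code are not shown.

```c
#include <stdint.h>

#define UART0_BASE 0x10000000UL  /* QEMU virt: virt_memmap[VIRT_UART0] */
#define UART_THR   0             /* Transmitter Holding Register (write) */
#define UART_LSR   5             /* Line Status Register */
#define LSR_THRE   (1u << 5)     /* LSR bit 5: THR empty, ready for data */

/* Base pointer of the UART register block. In the bare-metal program it
 * points at the real device address. Keeping it in a variable also lets
 * the same code be exercised against a fake register block on a hosted OS. */
static volatile uint8_t *uart = (volatile uint8_t *)UART0_BASE;

/* Send one character: wait until the transmitter can accept a byte, then
 * write the character's ASCII code to THR. */
void uart_putc(char c) {
    while ((uart[UART_LSR] & LSR_THRE) == 0) {
        /* busy-wait until THRE is set */
    }
    uart[UART_THR] = (uint8_t)c;
}

/* Send a NUL-terminated string one character at a time. */
void uart_puts(const char *s) {
    while (*s != '\0')
        uart_putc(*s++);
}

/* Hypothetical bare-metal entry logic: what the program would call instead
 * of printf once startup code has set up a stack and jumped here. */
void hello_bare_metal(void) {
    uart_puts("Hello, World!\n");
}
```

Because the base pointer lives in a variable, the routines can be tried on a hosted OS. Point uart at an ordinary 8-byte array with bit 5 of its LSR byte set. After uart_puts("Hi"), the THR byte holds 'i', the last character written.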

TODO::