Archive for April, 2005

From Grub to protected mode (2)

Sunday, April 24th, 2005

In this second post I’ll try to explain how to produce the binary image to use with Grub.

As we saw in the last post, we have chosen an ELF Multiboot binary. The layout of the binary will be:

  • The Multiboot header at the beginning of the binary.
  • Our whole binary will be loaded at 1 MB of physical memory.
  • With the ELF headers (using the linker) we will specify that our entry point is the start function.
  • The first stage of our binary, the one that executes before paging is enabled, will be linked starting at physical address 1 MB. A second part of our binary will be linked at the logical address 0xC0000000, which will correspond, after enabling paging, to the physical address 1 MB.
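The relation between the two link addresses can be sketched in C. This is only an illustration of the arithmetic, assuming the second stage is mapped one-to-one from 0xC0000000 onto the load address; the macro and function names are hypothetical and do not come from the project.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_PHYS_BASE 0x00100000u  /* 1 MB: where Grub loads the image */
#define KERNEL_VIRT_BASE 0xC0000000u  /* where the second stage is linked */

/* Physical address behind a logical address of the second stage. */
uint32_t kernel_virt_to_phys(uint32_t virt)
{
    assert(virt >= KERNEL_VIRT_BASE);
    return virt - KERNEL_VIRT_BASE + KERNEL_PHYS_BASE;
}

/* Logical address at which a loaded physical address becomes visible. */
uint32_t kernel_phys_to_virt(uint32_t phys)
{
    assert(phys >= KERNEL_PHYS_BASE);
    return phys - KERNEL_PHYS_BASE + KERNEL_VIRT_BASE;
}
```

With these definitions, kernel_virt_to_phys(0xC0000000) gives 0x00100000, the 1 MB load address.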

The Multiboot header in our case is contained in the loader.asm file, and consists only of the MAGIC, FLAGS and CHECKSUM fields. What follows is the declaration of the Multiboot header in nasm format.

ALIGN_VAR equ 0x01                 ; align loaded modules on page boundaries
MEMINFO   equ 0x02                 ; provide memory map
FLAGS     equ ALIGN_VAR | MEMINFO  ; this is the Multiboot 'flag' field
MAGIC     equ 0x1BADB002           ; 'magic number' lets bootloader find the header
CHECKSUM  equ -(MAGIC + FLAGS)     ; checksum required

;; Align to 32 bits, as the Multiboot header needs it, http://www.nilo.org/multiboot.html
align 4

;; Multiboot header
dd MAGIC
dd FLAGS
dd CHECKSUM
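The checksum rule used in the header (magic + flags + checksum must wrap to zero as a 32-bit sum) can be checked with a few lines of C. This is a sketch to verify the arithmetic, not part of loader.asm; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MULTIBOOT_MAGIC 0x1BADB002u

/* Checksum that makes magic + flags + checksum equal 0 modulo 2^32. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)(0u - (MULTIBOOT_MAGIC + flags));
}

/* Fills the three required header words, in the order they are emitted. */
void multiboot_header_fill(uint32_t header[3], uint32_t flags)
{
    header[0] = MULTIBOOT_MAGIC;
    header[1] = flags;
    header[2] = multiboot_checksum(flags);
    assert((uint32_t)(header[0] + header[1] + header[2]) == 0u);
}
```

For FLAGS = 0x3 (ALIGN_VAR | MEMINFO) the checksum works out to 0xE4524FFB.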

After this our kernel needs to set up its own descriptor table, since the Multiboot specification says that the descriptors left by a Multiboot-compliant bootloader are not guaranteed to remain valid after boot. So the first thing our kernel has to do is set up three segment descriptors: a null descriptor, a code descriptor and a data descriptor. Then it loads the base and limit of the table into the GDTR and reloads the segment registers (CS through a far jump), so that the processor actually uses the new descriptors.

Now we have our segment registers loaded with selectors for our own segment descriptors. We can now initialize the paging structures. Once the paging structures have been created we can safely activate paging and jump to the main function linked at 0xC0000000.

From Grub to protected mode (1)

Saturday, April 23rd, 2005

This is part of the project, though not in its final form; these are just some notes on how to progress from Grub to protected mode.

First of all we need a bootable floppy image with Grub installed, with our kernel copied onto it. Our kernel must be Multiboot compliant. I’ve chosen to use an ELF file with the Multiboot information inside, and we will see how to produce such a binary. The Multiboot header must be contained in the executable, and its structure is the following:

Offset  Type  Field Name     Note
0       u32   magic          required
4       u32   flags          required
8       u32   checksum       required
12      u32   header_addr    if flags[16] is set
16      u32   load_addr      if flags[16] is set
20      u32   load_end_addr  if flags[16] is set
24      u32   bss_end_addr   if flags[16] is set
28      u32   entry_addr     if flags[16] is set
32      u32   mode_type      if flags[2] is set
36      u32   width          if flags[2] is set
40      u32   height         if flags[2] is set
44      u32   depth          if flags[2] is set

As we will be producing an ELF image, the address information does not need to be contained in the Multiboot header, so bit 16 won’t be set. In this first example we do not need to provide the video information either, as text mode is fine for now, so bit 2 will be cleared.

The idea of having an OS image loaded by Grub is the following: when the computer boots, Grub is the default bootloader and prompts the user to choose which boot image to load. Once our ELF is selected, Grub loads it and jumps to its entry point; from this moment our OS image is running.

The state of the machine just after Grub has jumped to the entry point of our image is the following (the information was extracted using Bochs):

eax:0x2badb002
ebx:0x2ca20
ecx:0x1f600
edx:0x0
ebp:0x67ee4
esi:0x2cb3f
edi:0x2cb40
esp:0x67ed4
eflags:0x46
eip:0x110098
cs:s=0x8, dl=0xffff, dh=0xcf9a00, valid=1
ss:s=0x10, dl=0xffff, dh=0xcf9300, valid=7
ds:s=0x10, dl=0xffff, dh=0xcf9300, valid=7
es:s=0x10, dl=0xffff, dh=0xcf9300, valid=1
fs:s=0x10, dl=0xffff, dh=0xcf9300, valid=1
gs:s=0x10, dl=0xffff, dh=0xcf9300, valid=1
ldtr:s=0x0, dl=0x0, dh=0x0, valid=0
tr:s=0x0, dl=0x0, dh=0x0, valid=0
gdtr:base=0x8f5c, limit=0x27
idtr:base=0x0, limit=0x3ff
dr0:0x0
dr1:0x0
dr2:0x0
dr3:0x0
dr6:0xffff0ff0
dr7:0x400
tr3:0x0
tr4:0x0
tr5:0x0
tr6:0x0
tr7:0x0
cr0:0x60000011
cr1:0x0
cr2:0x0
cr3:0x0
cr4:0x0
inhibit_mask:0
done

If we analyze this information a little, we can draw the following conclusions:

  • cr0:0x60000011:
    • bit 0 == 1: PE, Protection Enable. We are in protected mode.
    • bit 31 == 0: PG, Paging. Paging is disabled.
  • GDTR: We will have a look at the descriptor table.
    Null Segment Descriptor (SR=0):
    0x00008f5c : 0x00000000
    0x00008f60 : 0x00000000

    Code Segment Descriptor (SR=0x08)
    0x00008f64 : 0x0000ffff
    0x00008f68 : 0x00cf9a00

    The rest of the segments (SR = 0x10):
    0x00008f6c : 0x0000ffff
    0x00008f70 : 0x00cf9300

From the previous information we can conclude that, when our code starts executing, the A20 gate is enabled, the processor is running in protected mode with paging disabled, and all the segments are 4 GB in size with base 0, so we can access all of our memory linearly.
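As a cross-check of these conclusions, the dumped values can be decoded in C. This is a minimal sketch assuming the standard IA-32 descriptor layout; the structure and function names are hypothetical, and low/high are the two 32-bit words of a descriptor in the order Bochs printed them.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_PG (1u << 31)  /* paging */

struct segment_info {
    uint32_t base;        /* linear base address of the segment */
    uint32_t limit;       /* raw 20-bit limit field */
    int      granular;    /* G flag: limit counted in 4 KB units */
    uint64_t size_bytes;  /* effective size of the segment */
};

/* Decodes base, limit and granularity from the two descriptor words. */
struct segment_info decode_descriptor(uint32_t low, uint32_t high)
{
    struct segment_info s;
    s.base = (low >> 16) | ((high & 0xFFu) << 16) | (high & 0xFF000000u);
    s.limit = (low & 0xFFFFu) | (high & 0x000F0000u);
    s.granular = (int)((high >> 23) & 1u);
    s.size_bytes = ((uint64_t)s.limit + 1u) << (s.granular ? 12 : 0);
    assert(s.limit <= 0xFFFFFu);
    return s;
}
```

Decoding the code descriptor (0x0000ffff, 0x00cf9a00) gives base 0, limit 0xFFFFF and G set, that is, a 4 GB segment.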

In the following post I’ll explain how to progress from here to a paged memory model.

Protected Mode

Monday, April 18th, 2005

Protected Mode

This mode has been provided on the Intel x86 family since the 80286. It is the main operating mode, and most features are only accessible from it.
It provides two memory management mechanisms, segmentation and paging. Segmentation is used to isolate and protect the different regions
of memory (i.e. code, data, stack, ...) of different tasks. Paging can be used as a protection facility and/or as a translation layer that maps physical memory into
a virtual address space. Segmentation is always turned on, but we can choose whether or not to use paging. Later we will see a trick to effectively turn off segmentation.

Segmentation Theory

Segmentation provides us with a method for separating different ranges of addresses into groups: we can have one segment for the data, one for the code and
another for the stack. The processor provides some protection between them, so that, for example, data will not be written into the code or the stack. In a multitasking system,
where multiple processes are running, we can give each process its own set of segments, and the processor will protect us against one process writing to
an address that does not belong to its segments.

[Figure: Segmentation]

As the segmentation figure shows, the logical address consists of a segment selector and an offset. The segment selector is a unique identifier for
each segment; it points to an entry of a descriptor table, called a segment descriptor, and there is a segment descriptor for every segment we have defined.
Each segment descriptor specifies the size of the segment, the access rights and privilege level for the segment, the segment type, and the location of the first byte
of the segment in the linear address space (called the base address of the segment). To form the linear address, the base address of the segment is added
to the offset provided in the logical address.
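The translation just described can be sketched as a small C function. It models a segment only by its base and its highest valid offset; the names are hypothetical, and a real processor also checks type and privilege, which are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct segment {
    uint32_t base;   /* linear address of the first byte of the segment */
    uint32_t limit;  /* highest valid offset inside the segment */
};

/* Returns 0 and stores the linear address, or -1 if offset is past the limit. */
int logical_to_linear(const struct segment *seg, uint32_t offset, uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset > seg->limit)
        return -1;                 /* the processor would raise an exception */
    *linear = seg->base + offset;  /* wraps modulo 2^32 */
    return 0;
}
```

With a flat segment (base 0, limit 0xFFFFFFFF) the linear address is simply the offset.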

Paging Theory

Normally operating systems need a way of “virtualizing” the memory, for example to give a different address space to each process; paging allows us to
accomplish this. Paging lets us map different logical address regions to physical memory. A page is defined as the minimum amount of contiguous physical memory
that we can allocate. Virtual addresses are mapped to physical ones at page granularity, so each page is contiguous in both virtual and physical memory.

When we try to access a page that is not in memory, the processor raises an exception which the operating system can handle
by getting the page from disk and putting it back in memory. This is called demand paging, and it allows us to run processes that would not otherwise fit
in memory.

[Figure: Paging]

In the paging figure we can see a simple paging scheme. The logical address consists of a directory entry pointer, a table offset and a byte offset.
The directory entry pointer points to a page directory entry (PDE), which contains a set of attributes and the base address of a page table.
The table base address is added to the table offset (from the logical address) to obtain a page table entry, which also contains a set of
attributes and a physical base address, which is the base of the page. The byte offset is added to the base of the page to obtain the physical address.
This same model can be extended with more levels of tables, but the idea is the same.

Memory Model under Protected Mode

The memory model that we will be using for the operating system under protected mode is paging without segmentation. As has been said before, it is not
possible to turn off segmentation; the trick is to use a basic flat-model segmentation scheme. In the basic flat model the
whole address space is available as one contiguous space, which is accomplished by giving the data, code and stack segments a base of 0 and the whole 4 GB of address space. This way
the segmentation unit will never report a limit violation.

On top of this segmentation scheme we will be adding paging. In practice it will be as if we only had paging.

In protected mode the logical address space provided is 2^32 bytes (4 GB); this is the flat segment of memory we will be logically addressing from
our processor. This logical space will be mapped onto different regions of physical memory, which do not have to be contiguous. If a bigger physical address space is needed,
processors since the Pentium Pro can address 64 GB (2^36 bytes) of physical memory. This extension is enabled by setting the PAE bit in CR4 (not in EFLAGS);
later processors also offer the 36-bit page size extension. We will not be using these extensions though.

To reach a physical address the system always goes through two stages. Even when using the aforementioned flat model, a logical address consists of
a 16-bit segment selector and a 32-bit offset, where the segment selector identifies the segment that the data belongs to. The first stage
translates the logical address into a linear address: the segment selector is used to get the segment base, and this base is added to the offset. The
second stage transforms this linear address into a physical address: if paging is not enabled, the linear address is directly the physical address;
otherwise the paging tables are used to get the physical address.

Segment Selectors

Segment selectors are 16-bit identifiers that point to the segment descriptors held in the global or local descriptor tables (GDT or LDT).
They tell the processor which segment to use but, instead of directly holding the base of the segment as in real-address mode, they act as
an index into a descriptor table.

Each segment selector contains:

  • Index: (Bits 3 through 15) Selects one of the 8192 descriptors in the GDT or LDT. The processor multiplies the index value by 8
    (the number of bytes in a segment descriptor) and adds the result to the base address of the GDT or LDT (from the GDTR or LDTR register, respectively).
  • TI: (Bit 2) Table indicator, specifies which table to use: the GDT (0) or the LDT (1).
  • RPL: (Bits 0,1) Requested privilege level of the selector. The privilege level can range from 0 to 3, with 0 being the most privileged level.
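The three fields can be extracted with shifts and masks. The following C sketch uses hypothetical names; descriptor_address applies the "index times 8 plus table base" rule described in the list.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* bits 3..15: descriptor number in the table */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 0..1: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192u && ti < 2u && rpl < 4u);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

/* Linear address of the descriptor a selector refers to. */
uint32_t descriptor_address(uint32_t table_base, uint16_t sel)
{
    return table_base + (uint32_t)(sel >> 3) * 8u;
}
```

For instance, selector 0x08 decodes to index 1 in the GDT with RPL 0, and with a GDT base of 0x8f5c its descriptor lives at 0x8f64.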

Segment selectors are loaded into the segment registers; each register holds a segment selector for a specific kind of access. The segment registers are
the following:

  • CS: Code segment, instructions are fetched from this segment.
  • DS: Data segment, data instructions get their content from this segment.
  • SS: Stack segment, the stack is held in this segment.
  • ES, FS, GS: The processor also provides these three additional data-segment registers, which can be used to make additional data
    segments available to the currently executing program (or task).

Only 6 segments can be loaded at any time. However, we can have up to 8192 segments defined in a table; each has to be loaded into a segment register before
it can be used. When the segment tables are modified it is our job to reload the segment registers, as the processor caches part of the descriptor
every time a segment register is loaded.

Segment Descriptor

The segment descriptor is the structure held in the segment table (LDT or GDT) that holds the information about a segment.
Our segment descriptors will be really simple and will give us the illusion of not having segments at all.

Each segment descriptor is 8 bytes long and has the following information:

[Figure: Segment Descriptor]

  • Base Address: A 32-bit value specifying the base address of the segment.
  • Limit: A 20-bit value which specifies the size of the segment. It is interpreted in one of two ways depending on the G (granularity) bit, which
    is also contained in the segment descriptor. If G is clear the segment can be from 1 byte to 1 MB (2^20 bytes) in size. If G is set, the size grows in increments of 4 KB
    (2^12 bytes), so the available range is from 4 KB to 4 GB (2^32 bytes).
  • Type: Indicates the kind of segment or gate and specifies the access rights.
  • S: Flag that specifies whether the descriptor is for a system segment (S=0) or a code or data segment (S=1).
  • DPL: Descriptor privilege level, from 0 to 3.
  • P: Segment present flag. If the flag is not set the processor generates an exception when there is an access to this segment.
  • D/B: Its function depends on the kind of segment; we will see its values later in each case.
  • G: As we have seen, it determines the granularity of the segment limit.
  • Reserved Bits: One reserved bit that should always be 0, and another bit that is available for our own purposes.

Segment Descriptor Tables

A segment descriptor table is an array of segment descriptors, starting at some base address. We can have two kinds of tables, the global descriptor table (GDT)
and the local descriptor table (LDT).

The GDT is per system, and it is the only one that must be defined. The GDT is used for all programs, and we can optionally have one or more LDTs, which
can be used by different tasks. The GDT resides in the linear address space; its base and size are loaded into the GDTR register, and the base
of the GDT should be 8-byte aligned.

The first descriptor of the GDT is the null descriptor and is not used as a regular segment. Loading its selector into
one of the DS, ES, FS or GS segment registers does not generate an exception, but any memory access made through it does.

An LDT is contained in a system segment (S=0) of the LDT type; to use an LDT, the selector of its segment is loaded into the LDTR. We will not explain LDTs any further, as we will
not be using them for the operating system.

Paging

Now that we have seen all the segmentation options, let’s have a look at the paging configurations. Paging is activated by turning on the PG flag (bit 31) in the CR0 register.
Two more flags, both in CR4, affect paging: PSE, which enables large pages (4 MB, or 2 MB when PAE is also in use), and PAE. If PSE is not enabled the page size is 4 KB. PAE
extends the physical address to 36 bits; this extension can only be used together with paging.

The structures used to translate linear to physical addresses when paging is on are the following (with PAE and PSE both disabled):

  • Page Directory: An array of 32-bit page-directory entries (PDEs) contained in a 4-KByte page. Up to 1024 page-directory entries can be held in a page directory.
  • Page table: An array of 32-bit page-table entries (PTEs) contained in a 4-KByte page. Up to 1024 page-table entries can be held in a page table.
  • The Page: 4 KB of flat, contiguous physical memory.

[Figure: Linear Address Translation (4 KB pages)]

Register CR3 (PDBR) points to the base of the page directory. To translate a linear address, the processor first selects a page directory entry,
using the directory index (bits 31 through 22 of the linear address) as an index into the page directory. The directory entry provides the base of a page table,
which is indexed with the table index (bits 21 through 12 of the linear address) to obtain the page table entry. This entry provides the upper 20 bits of a physical address,
which is the base of the desired page. Once the physical base of the page has been obtained, the offset (bits 11 through 0 of the linear address) is added
to obtain the physical address of the byte.
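The walk can be simulated in C. This is a sketch under simplifying assumptions: physical memory is modelled as an array of 32-bit words starting at address 0, only the present bit is checked, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_PRESENT 0x1u
#define FRAME_MASK   0xFFFFF000u

struct linear_parts {
    uint32_t dir_index;    /* bits 31..22 */
    uint32_t table_index;  /* bits 21..12 */
    uint32_t offset;       /* bits 11..0  */
};

struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}

/* Walks the two-level tables stored in mem. Returns 0 on success and
   -1 when a directory or table entry is not present. */
int translate(const uint32_t *mem, uint32_t cr3, uint32_t linear, uint32_t *phys)
{
    struct linear_parts p = split_linear(linear);
    uint32_t pde, pte;
    assert(mem != NULL && phys != NULL);
    pde = mem[(cr3 & FRAME_MASK) / 4u + p.dir_index];
    if (!(pde & PAGE_PRESENT))
        return -1;
    pte = mem[(pde & FRAME_MASK) / 4u + p.table_index];
    if (!(pte & PAGE_PRESENT))
        return -1;
    *phys = (pte & FRAME_MASK) | p.offset;
    return 0;
}
```

The linear address 0xC0000000 splits into directory index 768, table index 0 and offset 0, so mapping it to physical 1 MB takes one directory entry and one table entry.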

As we have seen, CR3 (PDBR) contains the base address of the page directory. It has to be loaded with a valid value before enabling paging, and the page directory must
remain in memory while the task that needs it is active.

Page Table Entries

Page directory and page table entries have a similar format, and each is 32 bits wide. The tables that hold them must be page aligned and are always the size of a page.
The fields of an entry have the following meaning:

  • Base Address: (bits 31 through 12) Specifies the 4 KB page that contains the page table or the physical page, depending on the kind of entry.
  • Present Flag: Tells whether or not the page is in memory.
  • R/W Flag: Read/write privileges for a page or a group of pages.
  • U/S Flag: User or supervisor privileges for a page or a group of pages.
  • PWT Flag: Page-level write-through, controls the caching policy for a page or a set of pages.
  • PCD Flag: Page-level cache disable, controls whether a page should be cached or not.
  • A Flag: Accessed, indicates that the page has been written or read. The software must clear this flag.
  • D Flag: Dirty, indicates whether the page has been written to or not. The software must clear this flag.
  • PS Flag: Page size, indicates the size of the page. For example 4 KB or 4 MB.
  • PAT Flag: Selects the PAT (Page Attribute Table) entry for the page.
  • G Flag: Global, when a page is marked as global and CR4 also has the PGE bit set, its TLB entry is not invalidated when CR3 changes. This is used for
    pages that do not change on context switches.
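The flags occupy the low 12 bits of an entry and the base address the high 20, so an entry can be built with a bitwise OR. A C sketch with hypothetical names, assuming 4 KB pages with PAE and PSE disabled:

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u  /* P:   page is in memory         */
#define PTE_WRITABLE 0x002u  /* R/W: writes allowed            */
#define PTE_USER     0x004u  /* U/S: accessible from user mode */
#define PTE_PWT      0x008u  /* page-level write-through       */
#define PTE_PCD      0x010u  /* page-level cache disable       */
#define PTE_ACCESSED 0x020u  /* A: set by the processor        */
#define PTE_DIRTY    0x040u  /* D: set by the processor        */
#define PTE_GLOBAL   0x100u  /* G: kept in the TLB across CR3 loads */

/* Builds an entry from a page-aligned physical address and flag bits. */
uint32_t make_pte(uint32_t frame_phys, uint32_t flags)
{
    assert((frame_phys & 0xFFFu) == 0u);    /* must be page aligned  */
    assert((flags & 0xFFFFF000u) == 0u);    /* flags fit in 12 bits  */
    return frame_phys | flags;
}

/* Physical base address stored in an entry. */
uint32_t pte_frame(uint32_t entry)
{
    return entry & 0xFFFFF000u;
}
```

An entry mapping the page at 1 MB as present and writable, for supervisor use only, is make_pte(0x100000, PTE_PRESENT | PTE_WRITABLE), which gives 0x00100003.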

Real-address Mode

Saturday, April 2nd, 2005

This is an explanation of the real-address mode of Intel’s x86 processors. It is a subsection of the target architecture chapter.

Real-address Mode

This mode is provided on the Intel x86 family to emulate the behavior of an 8086 computer; it exists for backward compatibility with older operating systems.
When the system is started the CPU starts in real-address mode and behaves like an 8086 system with a fast processor.
The instruction set is that of an 8086 system with extensions, which means that software designed for the 8086 keeps working.

It is the job of the operating system, or the bootloader, to take the CPU to further modes. In this mode we have a large set of limitations, which are typical
of 8086 systems. Some of the most important features of this mode are the following (this data is taken from the Intel datasheet for Pentium processors, Volume 3; these features do not include the extended set and are native to 8086 systems):

  • The processor supports up to 1 MB of physical memory (20-bit addresses). The address space is divided into 64 KB segments; the base of a segment is selected
    with a 16-bit segment selector, which is shifted left by 4 bits to form a 20-bit base address. Operands within a segment are addressed with a 16-bit
    offset from the base of the segment. Physical addresses are formed by adding the 16-bit offset to the 20-bit segment base.
  • Native operands are 8 or 16 bits.
  • Eight 16-bit general-purpose registers are provided: AX, BX, CX, DX, SP, BP, SI, and DI.
  • Four segment registers are provided: CS, DS, SS, and ES. The CS register contains the segment selector for the code segment; the DS and ES
    registers contain segment selectors for data segments; and the SS register contains the segment selector for the stack segment.
  • The 8086 16-bit instruction pointer (IP) is mapped to the lower 16-bits of the EIP register. Note this register is a 32-bit
    register and unintentional address wrapping may occur.
  • The 16-bit FLAGS register contains status and control flags.
  • All of the Intel 8086 instructions are supported.
  • A single, 16-bit-wide stack is provided for handling procedure calls and invocations of
    interrupt and exception handlers. This stack is contained in the stack segment identified
    with the SS register. The SP (stack pointer) register contains an offset into the stack
    segment. The stack grows down (toward lower segment offsets) from the stack pointer.
    The BP (base pointer) register also contains an offset into the stack segment that can be
    used as a pointer to a parameter list. When a CALL instruction is executed, the processor
    pushes the current instruction pointer (the 16 least-significant bits of the EIP register and,
    on far calls, the current value of the CS register) onto the stack. On a return, initiated with
    a RET instruction, the processor pops the saved instruction pointer from the stack into the
    EIP register (and CS register on far returns). When an implicit call to an interrupt or
    exception handler is executed, the processor pushes the EIP, CS, and EFLAGS (low-order
    16-bits only) registers onto the stack. On a return from an interrupt or exception handler,
    initiated with an IRET instruction, the processor pops the saved instruction pointer and
    EFLAGS image from the stack into the EIP, CS, and EFLAGS registers.
  • A single interrupt table, called the interrupt vector table or interrupt table, is provided
    for handling interrupts and exceptions. The interrupt table (which has 4-byte entries) takes the place of the interrupt descriptor table
    (IDT, with 8-byte entries) used when handling protected-mode interrupts and exceptions. Interrupt and exception vector numbers provide an index to
    entries in the interrupt table. Each entry provides a pointer (called a vector) to an interrupt- or exception-handling procedure.
  • The FPU is available in this mode, and programs written for the math coprocessor of 8086 systems can be run without modification.

Segments

The way the 8086 does segmentation differs from typical segmentation systems, as the segment selector does not refer to an entry in a segment table;
instead, it is the base of the segment itself.

As an example, let’s imagine we are working with the data segment register (DS). This is a 16-bit register; to form the base address of the segment, its value is
shifted left by 4 bits. For example, DS=0x07C0 gives the base address 0x7C00, and every data access made through DS is added to that base.
So if we access the 16-bit offset 0x7 we are accessing the physical address 0x7C07. This is how segmentation works in real-address mode.

A segment and offset pair is often written as Segment:Offset; our example would be written 07C0:0007, which, as we have just seen, refers to address 0x7C07.
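The Segment:Offset arithmetic is short enough to write out in C. The function names are hypothetical; the second function goes the other way and picks one of the many possible pairs for an address, the one with the smallest offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address formed by a real-mode Segment:Offset pair. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Splits an address below 1 MB into a pair whose offset is below 16. */
void canonical_pair(uint32_t address, uint16_t *segment, uint16_t *offset)
{
    assert(address <= 0xFFFFFu);
    assert(segment != NULL && offset != NULL);
    *segment = (uint16_t)(address >> 4);
    *offset = (uint16_t)(address & 0xFu);
}
```

Note that different pairs can name the same byte: both 07C0:0007 and 0000:7C07 give 0x7C07.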

A20 Line

The segment and offset structure allows us to form a 21-bit address. As an example take the address FFFF:FFFF: this is base FFFF0 plus offset FFFF, which
gives the physical address 10FFEF, with the 21st bit (bit 20) set to 1. On the 8086 and 8088 the address bus was only 20 bits wide, so this 21st bit was ignored and the address
was treated as if it were address 0FFEF; software had to be aware of this address wrapping.

When the Intel 80286 came out it had 24 address lines and support for both real and protected modes. In real-address mode, due to a bug in the processor,
the 21st line was not zeroed, which made the first 64 KB (less 16 bytes) above 1 MB accessible from real-address mode. This address range (100000h-10FFEFh) is
called the high memory area (HMA).

To ensure compatibility with old 8086 programs, in the AT specification IBM used a spare pin on the keyboard controller to control this address line,
the A20 line (21st address line). Through the keyboard controller, software can turn this line on and off, disabling or enabling the wraparound.

Later on, the handling of the A20 line was left to the BIOS. This is one of the many design quirks that persist in the Intel processor family,
though it makes compatibility possible.

Care should be taken that the A20 line is handled correctly before entering protected mode. If the A20 line is held at zero, bit 20 of every address is forced to 0, so we will only be able to
access half of our address space: only the even-numbered megabytes will be reachable.
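The effect of the A20 line can be modelled in C by masking address bits. This is only a sketch of the behavior described in this section; the function names and the a20_enabled parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address as seen on the bus: with A20 disabled the address
   wraps at 1 MB, as it did on the 8086. */
uint32_t real_mode_bus_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t address = ((uint32_t)segment << 4) + offset;
    assert(address <= 0x10FFEFu);   /* largest value FFFF:FFFF can form */
    if (!a20_enabled)
        address &= 0xFFFFFu;
    return address;
}

/* Protected-mode physical address with bit 20 forced to zero when the
   A20 line is disabled: odd megabytes alias the even ones below them. */
uint32_t apply_a20(uint32_t physical, int a20_enabled)
{
    return a20_enabled ? physical : (physical & ~(1u << 20));
}
```

With A20 disabled, FFFF:FFFF wraps to 0x0FFEF, and the protected-mode address 0x00300000 lands on 0x00200000.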

Interrupt and Exception Handling

In real-address mode the software must provide its own interrupt handling facilities, separate from those of protected mode. When the processor receives an interrupt or
exception it uses the number of the interrupt vector as an index into an interrupt table, called the IVT
(interrupt vector table). The IVT provides a pointer to the handler function for each vector; each pointer is a segment and offset pair, 4 bytes in total.
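Locating a handler through the IVT is simple arithmetic, sketched here in C with hypothetical names. It assumes the table sits at physical address 0, its location after reset, and that each entry stores the offset first and the segment second.

```c
#include <assert.h>
#include <stdint.h>

struct ivt_entry {
    uint16_t offset;   /* handler offset, stored at the lower address */
    uint16_t segment;  /* handler segment */
};

/* Physical address of the IVT entry for a vector, with the IVT at 0. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}

/* Address of the handler an entry points to. */
uint32_t ivt_handler_address(struct ivt_entry entry)
{
    assert(sizeof(struct ivt_entry) == 4u);
    return ((uint32_t)entry.segment << 4) + entry.offset;
}
```

The last of the 256 entries starts at 0x3FC and ends at 0x3FF, which matches the idtr limit of 0x3ff seen in the Bochs dump.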

The processor does the following to access the handling function:

  • The interrupt is received by the processor.
  • Stores the current values of the CS and IP registers onto the stack.
  • Pushes the low-order 16 bits of the EFLAGS register onto the stack.
  • Clears the IF flag in the EFLAGS register to disable interrupts.
  • Clears the TF, RF, and AC flags in the EFLAGS register.
  • Transfers program control to the location specified in the interrupt vector table.

To return from an interrupt the IRET instruction is used, which does the inverse of what has just been described.

Following reset, the base of the interrupt vector table is located at physical address 0 and its limit is set to
3FFH. In the Intel 8086 processor, the base address and limit of the interrupt vector table cannot
be changed. In the later IA-32 processors, the base address and limit of the interrupt vector table
are contained in the IDTR register and can be changed using the LIDT instruction.