21 Commits

Author SHA1 Message Date
Justin C. Miller
cf22ed57a2 [docs] Update the README with roadmap info 2021-02-17 00:47:12 -08:00
Justin C. Miller
b6772ac2ea [kernel] Fix #DF when building with -O3
I had failed to specify in inline asm that an input variable was the
same as the output variable.
2021-02-17 00:22:22 -08:00
Justin C. Miller
f0025dbc47 [kernel] Schedule threads on other CPUs
Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.

Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
  just gathers relevant info from the ACPI tables. Each CPU creates its
  own local APIC object. This also spurred the APIC timer calibration to
  become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
  its ready list, however the current task is never on the ready list
  (except the idle task was first set up as both current and ready).
  This was causing the lists to get into bad states. Now a task can only
  ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
  processes, rather than trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
  objects.
2021-02-15 12:56:22 -08:00
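The per-CPU ready lists and work stealing described above can be sketched in miniature. This is a hypothetical standalone model, not jsix's actual scheduler types (the real one also keeps blocked lists, priorities, and runs lock-free where it can):

```cpp
#include <array>
#include <deque>
#include <mutex>
#include <optional>

struct task { int id; };

struct cpu_queue {
    std::mutex lock;
    std::deque<task> ready;
};

template <size_t N>
struct scheduler {
    std::array<cpu_queue, N> cpus;

    // Pop from this CPU's own queue; if it is empty, steal from the
    // first other CPU that has work, taking from the opposite end of
    // the victim's deque to reduce contention with its owner.
    std::optional<task> next(size_t cpu) {
        {
            std::scoped_lock g(cpus[cpu].lock);
            if (!cpus[cpu].ready.empty()) {
                task t = cpus[cpu].ready.front();
                cpus[cpu].ready.pop_front();
                return t;
            }
        }
        for (size_t v = 0; v < N; ++v) {
            if (v == cpu) continue;
            std::scoped_lock g(cpus[v].lock);
            if (!cpus[v].ready.empty()) {
                task t = cpus[v].ready.back();  // steal from the far end
                cpus[v].ready.pop_back();
                return t;
            }
        }
        return std::nullopt;  // nothing runnable anywhere: idle
    }
};
```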
Justin C. Miller
2a347942bc [kernel] Fix SMP boot on KVM
KVM didn't like setting all the CR4 bits we wanted at once. I suspect
that means real hardware won't either. Delay the setting of the rest of
CR4 until after the CPU is in long mode - only set PAE and PGE from real
mode.
2021-02-13 01:45:17 -08:00
Justin C. Miller
36da65e15b [kernel] Add index to cpu_data
Because the firmware can set the APIC ids to whatever it wants, add a
sequential index to each cpu_data structure that jsix will use for its
main identifier, or for indexing into arrays, etc.
2021-02-11 00:00:34 -08:00
Justin C. Miller
214ff3eff0 Update gitignore
Adding files that have been hanging around my development environment
that should not get checked in
2021-02-10 23:59:05 -08:00
Justin C. Miller
8c0d52d0fe [kernel] Add spinlocks to vm_space, frame_allocator
Also updated the spinlock interface to be a class, and added a scoped
lock object that uses it.
2021-02-10 23:57:51 -08:00
Justin C. Miller
793bba95b5 [boot] Do address virtualization in the bootloader
More and more places in the kernel init code are taking addresses from
the bootloader and translating them to offset-mapped addresses. The
bootloader can do this, so it should.
2021-02-10 01:23:50 -08:00
Justin C. Miller
2d4a65c654 [kernel] Pre-allocate cpu_data and pass to APs
In order to avoid cyclic dependencies in the case of page faults while
bringing up an AP, pre-allocate the cpu_data structure and related CPU
control structures, and pass them to the AP startup code.

This also changes the following:
- cpu_early_init() was split out of cpu_init() to allow early
  usage of current_cpu() on the BSP before we're ready for the rest of
  cpu_init(). (These functions were also renamed to follow the preferred
  area_action naming style.)
- isr_handler now zeroes out the IST entry for its vector instead of
  trying to increment the IST stack pointer
- the IST stacks are allocated outside of cpu_init, to also help reduce
  stack pressure and chance of page faults before APs are ready
- share stack areas between AP idle threads so we only waste 1K per
  additional AP for the unused idle stack
2021-02-10 15:44:07 -08:00
Justin C. Miller
872f178d94 [kernel] Update syscall MSRs for all CPUs
Since SYSCALL/SYSRET rely on MSRs to control their function, split out
syscall_enable() into syscall_initialize() and syscall_enable(), the
latter being called on all CPUs. This affects not just syscalls but also
the kernel_to_user_trampoline.

Additionally, do away with the max syscalls, and just make a single page
of syscall pointers and name pointers. Max syscalls was fragile and
needed to be kept in sync in multiple places.
2021-02-10 15:25:17 -08:00
Justin C. Miller
70d6094f46 [kernel] Add fake preludes to isr handler to trick GDB
By adding more debug information to the symbols and adding function
frame preludes to the isr handler assembly functions, GDB sees them as
valid locations for stack frames, and can display backtraces through
interrupts.
2021-02-10 01:10:26 -08:00
Justin C. Miller
31289436f5 [kernel] Use PAUSE in spinwait
Using PAUSE in a tight loop allows other logical cores on the same
physical core to make use of more of the core's resources.
2021-02-07 23:52:06 -08:00
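The shape of such a loop, as a hedged standalone sketch (x86 only; `__builtin_ia32_pause()` is the GCC/Clang builtin that emits the PAUSE instruction). PAUSE marks the loop as a spin-wait, freeing execution resources for a sibling hyperthread and avoiding the memory-order mis-speculation penalty when the wait ends:

```cpp
#include <atomic>
#include <thread>

// Spin until another CPU/thread sets the flag, yielding core resources
// on each iteration via the PAUSE instruction.
inline void spin_until(const std::atomic<bool> &flag) {
    while (!flag.load(std::memory_order_acquire))
        __builtin_ia32_pause();  // emits PAUSE
}
```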
Justin C. Miller
5e7792c11f [scripts] Add GDB j6tw page table walker
Added the command "j6tw <pml4> <addr>" which takes any arguments that
evaluate to addresses or integers. It displays the full breakdown of the
page table walk for the given address, with flags.
2021-02-07 23:50:53 -08:00
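The index math j6tw performs is the standard x86-64 4-level split: a 48-bit virtual address breaks into four 9-bit table indices (PML4, PDP, PD, PT) plus a 12-bit page offset. A minimal C++ equivalent of the script's breakdown:

```cpp
#include <array>
#include <cstdint>

// Split a canonical virtual address into its four page-table indices.
std::array<unsigned, 4> pt_indices(uint64_t addr) {
    return {
        unsigned((addr >> 39) & 0x1ff),  // PML4 index
        unsigned((addr >> 30) & 0x1ff),  // PDP index
        unsigned((addr >> 21) & 0x1ff),  // PD index
        unsigned((addr >> 12) & 0x1ff),  // PT index
    };
}
```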
Justin C. Miller
e73064a438 [kutil] Update spinlock to an MCS-style lock
Update the existing but unused spinlock class to an MCS-style queue
spinlock. This is probably still a WIP but I expect it to see more use
with SMP getting further integrated.
2021-02-07 23:50:00 -08:00
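The core idea of an MCS-style queue lock is that each waiter spins on its own queue node rather than on one shared flag, so contended CPUs don't all hammer the same cache line. A hedged user-space sketch using `std::atomic` (the kernel version differs in allocation and memory-order details):

```cpp
#include <atomic>
#include <thread>

struct mcs_node {
    std::atomic<mcs_node*> next{nullptr};
    std::atomic<bool> locked{false};
};

struct mcs_lock {
    std::atomic<mcs_node*> tail{nullptr};

    void acquire(mcs_node *me) {
        me->next.store(nullptr, std::memory_order_relaxed);
        me->locked.store(true, std::memory_order_relaxed);
        mcs_node *prev = tail.exchange(me, std::memory_order_acq_rel);
        if (prev) {
            prev->next.store(me, std::memory_order_release);
            while (me->locked.load(std::memory_order_acquire))
                ;  // spin only on our own node
        }
    }

    void release(mcs_node *me) {
        mcs_node *succ = me->next.load(std::memory_order_acquire);
        if (!succ) {
            mcs_node *expected = me;
            if (tail.compare_exchange_strong(expected, nullptr,
                    std::memory_order_acq_rel))
                return;  // no waiter arrived; lock is now free
            while (!(succ = me->next.load(std::memory_order_acquire)))
                ;  // a waiter swapped tail but hasn't linked in yet
        }
        succ->locked.store(false, std::memory_order_release);
    }
};

// Demo: two threads contending on the lock around a plain counter.
long mcs_demo() {
    mcs_lock l;
    long counter = 0;
    auto worker = [&] {
        for (int i = 0; i < 10000; ++i) {
            mcs_node n;
            l.acquire(&n);
            ++counter;
            l.release(&n);
        }
    };
    std::thread a(worker), b(worker);
    a.join(); b.join();
    return counter;
}
```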
Justin C. Miller
72787c0652 [kernel] Make sure all vma types have (virtual) dtors 2021-02-07 23:45:07 -08:00
Justin C. Miller
c88170f6e0 [kernel] Start all other processors in the system
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)

To do this, a bunch of major changes were needed:

- Moving a lot of the CPU initialization (including for the BSP) to
  init_cpu(). This includes setting up IST stacks, writing MSRs, and
  creating the cpu_data structure. For the APs, this also creates and
  installs the GDT and TSS, and installs the global IDT.

- Creating the AP startup code, which tries to be as position
  independent as possible. It's copied from its location to 0x8000 for
  AP startup, and some of it is fixed at that address. The AP startup
  code jumps from real mode to long mode with paging in one swell foop.

- Adding limited IPI capability to the lapic class. This will need to
  improve.

- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
  and really isn't anything but cpu_id anymore.

- Moved all the GDT, TSS, and IDT code into their own files and made
  them classes instead of a mess of free functions.

- Got rid of bsp_cpu_data everywhere. Now always call the new
  current_cpu() to get the current CPU's cpu_data.

- Device manager keeps a list of APIC ids now. This should go somewhere
  else eventually, device_manager needs to be refactored away.

- Moved some more things (notably the g_kernel_stacks vma) to the
  pre-constructor setup in memory_bootstrap. That whole file is in bad
  need of a refactor.
2021-02-07 23:44:28 -08:00
Justin C. Miller
a65ecb157d [fb] Fix fb log scrolling
While working on the double-buffering issue, I ripped out this feature
from scrollback and didn't put it back in. Also have main allocate
extra space for the message buffer, since calling malloc/free over and
over was causing malloc to panic. (Which should also separately
be fixed.)
2021-02-06 00:33:45 -08:00
Justin C. Miller
eb8a3c0e09 [kernel] Fix frame allocator next-block bug
The frame allocator was causing page faults when exhausting the first
(well, last, because it starts from the end) block of free pages. Turns
out it was just incrementing instead of decrementing and thus running
off the end.
2021-02-06 00:06:29 -08:00
Justin C. Miller
572fade7ff [fb] Use rep stosl for screen fill
This is mostly cleanup after fighting the double buffering bug - bring
back rep stosl for screen fill, and move draw_pixel to be an inline
function.
2021-02-05 23:55:42 -08:00
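A hedged sketch of such a fill (x86-64, GCC/Clang extended asm; `fill32` is a hypothetical helper, not the driver's API). Note that `rep stosl` modifies both RDI and RCX as it runs, so they must be read-write (`+`) operands; listing them as inputs only is exactly the kind of constraint bug that surfaces at higher optimization levels:

```cpp
#include <cstddef>
#include <cstdint>

// Fill `count` dwords at `dst` with `value` using rep stosl.
inline void fill32(uint32_t *dst, uint32_t value, size_t count) {
    asm volatile ("rep stosl"
                  : "+D"(dst), "+c"(count)  // both clobbered by the insn
                  : "a"(value)
                  : "memory");
}
```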
Justin C. Miller
b5885ae35f [fb] Dynamically allocate log entry buffer
Since the kernel will tell us what size of buffer we need for
j6_system_get_log(), malloc the buffer for it from the heap instead of a
fixed array on the stack.
2021-02-05 23:51:34 -08:00
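The calling convention here is "report insufficient, return the size needed, let the caller grow and retry." A self-contained mock of that pattern (the `j6_*` names are jsix's real API; the `get_log`/`read_log` pair below is purely illustrative):

```cpp
#include <cstring>
#include <string>
#include <vector>

enum status { ok, insufficient };

static const std::string g_log = "hello from the kernel log";

// Mock syscall: if the buffer is too small, report the needed size.
status get_log(void *buf, size_t *size) {
    if (*size < g_log.size() + 1) {
        *size = g_log.size() + 1;
        return insufficient;
    }
    std::memcpy(buf, g_log.c_str(), g_log.size() + 1);
    return ok;
}

// Caller side: start with no buffer and grow on demand, overallocating
// so repeated slightly-larger messages don't thrash the allocator.
std::string read_log() {
    std::vector<char> buffer;
    for (;;) {
        size_t size = buffer.size();
        status s = get_log(buffer.data(), &size);
        if (s == insufficient) {
            buffer.resize(size * 2);
            continue;
        }
        return std::string(buffer.data());
    }
}
```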
Justin C. Miller
335bc01185 [kernel] Fix page_tree growth bug
The logic was inverted in contains(), meaning that new parents were
never being created, and the same level-0 block was just getting reused.
2021-02-05 23:47:29 -08:00
56 changed files with 1748 additions and 737 deletions

2
.gitignore vendored
View File

@@ -3,8 +3,10 @@
*.bak
tags
jsix.log
*.out
*.o
*.a
sysroot
.gdb_history
.peru
__pycache__

View File

@@ -1,10 +1,10 @@
![jsix](assets/jsix.svg)
# jsix: A hobby operating system
# The jsix operating system
**jsix** is the hobby operating system that I am currently building. It's far
from finished, or even being usable. Instead, it's a sandbox for me to play
with kernel-level code and explore architectures.
**jsix** is a custom multi-core x64 operating system that I am building from
scratch. It's far from finished, or even being usable - see the *Status and
Roadmap* section, below.
The design goals of the project are:
@@ -23,9 +23,8 @@ The design goals of the project are:
by the traditional microkernel problems.
* Exploration - I'm really mostly doing this to have fun learning and exploring
modern OS development. Modular design may be tossed out (hopefully
temporarily) in some places to allow me to play around with the related
hardware.
modern OS development. Initial feature implementations may temporarily throw
away modular design to allow for exploration of the related hardware.
A note on the name: This kernel was originally named Popcorn, but I have since
discovered that the Popcorn Linux project is also developing a kernel with that
@@ -35,6 +34,67 @@ and my wonderful wife.
[cpu_features]: https://github.com/justinian/jsix/blob/master/src/libraries/cpu/include/cpu/features.inc
## Status and Roadmap
The following major feature areas are targets for jsix development:
#### UEFI boot loader
_Done._ The bootloader loads the kernel and initial userspace programs, and
sets up necessary kernel arguments about the memory map and EFI GOP
framebuffer. Possible future ideas:
- take over more init-time functions from the kernel
- rewrite it in Zig
#### Memory
_Virtual memory: Sufficient._ The kernel manages virtual memory with a number
of kinds of `vm_area` objects representing mapped areas, which can belong to
one or more `vm_space` objects which represent a whole virtual memory space.
(Each process has a `vm_space`, and so does the kernel itself.)
Remaining to do:
- TLB shootdowns
- Page swapping
_Physical page allocation: Sufficient._ The current physical page allocator
implementation uses a group of blocks, each representing an up-to-1GiB area of
usable memory as defined by the bootloader. Each block has a three-level
bitmap denoting free/used pages.
#### Multitasking
_Sufficient._ The global scheduler object keeps separate ready/blocked lists
per core. Cores periodically attempt to balance load via work stealing.
User-space tasks are able to create threads as well as other processes.
Several kernel-only tasks exist, though I'm trying to reduce that. Eventually
only the timekeeping task should be a separate kernel-only thread.
#### API
_In progress._ User-space tasks are able to make syscalls to the kernel via
fast SYSCALL/SYSRET instructions.
Major tasks still to do:
- The process initialization protocol needs to be rebuilt entirely.
- Processes' handles to kernel objects need the ability to check capabilities.
#### Hardware Support
* Framebuffer driver: _In progress._ Currently on machines with a video
device accessible by UEFI, jsix starts a user-space framebuffer driver that
only prints out kernel logs.
* Serial driver: _To do._ Machines without a video device should have a
user-space log output task like the framebuffer driver, but currently this
is done inside the kernel.
* USB driver: _To do_
* AHCI (SATA) driver: _To do_
## Building
jsix uses the [Ninja][] build tool, and generates the build files for it with a

View File

@@ -56,8 +56,77 @@ class PrintBacktraceCommand(gdb.Command):
return
class TableWalkCommand(gdb.Command):
def __init__(self):
super().__init__("j6tw", gdb.COMMAND_DATA)
def invoke(self, arg, from_tty):
args = gdb.string_to_argv(arg)
if len(args) < 2:
raise Exception("Must be: j6tw <pml4> <addr>")
pml4 = int(gdb.parse_and_eval(args[0]))
addr = int(gdb.parse_and_eval(args[1]))
indices = [
(addr >> 39) & 0x1ff,
(addr >> 30) & 0x1ff,
(addr >> 21) & 0x1ff,
(addr >> 12) & 0x1ff,
]
names = ["PML4", "PDP", "PD", "PT"]
table_flags = [
(0x0001, "present"),
(0x0002, "write"),
(0x0004, "user"),
(0x0008, "pwt"),
(0x0010, "pcd"),
(0x0020, "accessed"),
(0x0040, "dirty"),
(0x0080, "largepage"),
(0x0100, "global"),
(0x1080, "pat"),
((1<<63), "xd"),
]
page_flags = [
(0x0001, "present"),
(0x0002, "write"),
(0x0004, "user"),
(0x0008, "pwt"),
(0x0010, "pcd"),
(0x0020, "accessed"),
(0x0040, "dirty"),
(0x0080, "pat"),
(0x0100, "global"),
((1<<63), "xd"),
]
flagsets = [table_flags, table_flags, table_flags, page_flags]
table = pml4
entry = 0
for i in range(len(indices)):
entry = int(gdb.parse_and_eval(f'((uint64_t*){table})[{indices[i]}]'))
flagset = flagsets[i]
flag_names = " | ".join([f[1] for f in flagset if (entry & f[0]) == f[0]])
print(f"{names[i]:>4}: {table:016x}")
print(f" index: {indices[i]:3} {entry:016x}")
print(f" flags: {flag_names}")
if (entry & 1) == 0 or (i < 3 and (entry & 0x80)):
break
table = (entry & 0x7ffffffffffffe00) | 0xffffc00000000000
PrintStackCommand()
PrintBacktraceCommand()
TableWalkCommand()
gdb.execute("target remote :1234")
gdb.execute("display/i $rip")

View File

@@ -12,6 +12,7 @@ modules:
- src/kernel
source:
- src/kernel/apic.cpp
- src/kernel/ap_startup.s
- src/kernel/assert.cpp
- src/kernel/boot.s
- src/kernel/clock.cpp
@@ -24,8 +25,9 @@ modules:
- src/kernel/frame_allocator.cpp
- src/kernel/fs/gpt.cpp
- src/kernel/gdt.cpp
- src/kernel/gdt.s
- src/kernel/gdtidt.s
- src/kernel/hpet.cpp
- src/kernel/idt.cpp
- src/kernel/interrupts.cpp
- src/kernel/interrupts.s
- src/kernel/io.cpp
@@ -56,6 +58,7 @@ modules:
- src/kernel/syscalls/thread.cpp
- src/kernel/syscalls/vm_area.cpp
- src/kernel/task.s
- src/kernel/tss.cpp
- src/kernel/vm_space.cpp
boot:
@@ -111,6 +114,7 @@ modules:
- src/libraries/kutil/logger.cpp
- src/libraries/kutil/memory.cpp
- src/libraries/kutil/printf.c
- src/libraries/kutil/spinlock.cpp
cpu:
kind: lib
@@ -118,7 +122,7 @@ modules:
includes:
- src/libraries/cpu/include
source:
- src/libraries/cpu/cpu.cpp
- src/libraries/cpu/cpu_id.cpp
j6:
kind: lib

View File

@@ -8,7 +8,7 @@
#include <stdint.h>
#include "console.h"
#include "cpu/cpu.h"
#include "cpu/cpu_id.h"
#include "error.h"
#include "fs.h"
#include "hardware.h"
@@ -93,6 +93,8 @@ add_module(args::header *args, args::mod_type type, buffer &data)
m.type = type;
m.location = data.data;
m.size = data.size;
change_pointer(m.location);
}
/// Check that all required cpu features are supported
@@ -198,12 +200,15 @@ efi_main(uefi::handle image, uefi::system_table *st)
reinterpret_cast<kernel::entrypoint>(kernel.entrypoint);
status.next();
hw::setup_control_regs();
memory::virtualize(args->pml4, map, st->runtime_services);
status.next();
change_pointer(args);
change_pointer(args->pml4);
change_pointer(args->modules);
change_pointer(args->programs);
status.next();
kentry(args);

View File

@@ -92,14 +92,28 @@ main(int argc, const char **argv)
scrollback scroll(rows, cols);
int pending = 0;
constexpr int pending_threshold = 10;
constexpr int pending_threshold = 5;
j6_handle_t sys = __handle_sys;
size_t buffer_size = 0;
void *message_buffer = nullptr;
char message_buffer[256];
while (true) {
size_t size = sizeof(message_buffer);
j6_system_get_log(__handle_sys, message_buffer, &size);
if (size != 0) {
entry *e = reinterpret_cast<entry*>(&message_buffer);
size_t size = buffer_size;
j6_status_t s = j6_system_get_log(sys, message_buffer, &size);
if (s == j6_err_insufficient) {
free(message_buffer);
message_buffer = malloc(size * 2);
buffer_size = size;
continue;
} else if (s != j6_status_ok) {
j6_system_log("fb driver got error from get_log, quitting");
return s;
}
if (size > 0) {
entry *e = reinterpret_cast<entry*>(message_buffer);
size_t eom = e->bytes - sizeof(entry);
e->message[eom] = 0;
@@ -119,7 +133,6 @@ main(int argc, const char **argv)
}
}
j6_system_log("fb driver done, exiting");
return 0;
}

View File

@@ -9,7 +9,8 @@ screen::screen(volatile void *addr, unsigned hres, unsigned vres, unsigned scanl
m_resx(hres),
m_resy(vres)
{
m_back = reinterpret_cast<pixel_t*>(malloc(scanline*vres*sizeof(pixel_t)));
const size_t size = scanline * vres;
m_back = reinterpret_cast<pixel_t*>(malloc(size * sizeof(pixel_t)));
}
screen::pixel_t
@@ -33,15 +34,9 @@ screen::color(uint8_t r, uint8_t g, uint8_t b) const
void
screen::fill(pixel_t color)
{
const size_t len = m_resx * m_resy;
for (size_t i = 0; i < len; ++i)
m_back[i] = color;
}
void
screen::draw_pixel(unsigned x, unsigned y, pixel_t color)
{
m_back[x + y * m_resx] = color;
const size_t len = m_scanline * m_resy;
size_t count = len;    // rep stosl modifies rcx and rdi,
pixel_t *dst = m_back; // so give it writable copies
asm volatile ( "rep stosl" : "+c"(count), "+D"(dst) :
"a"(color) : "memory" );
}
void

View File

@@ -17,7 +17,11 @@ public:
pixel_t color(uint8_t r, uint8_t g, uint8_t b) const;
void fill(pixel_t color);
void draw_pixel(unsigned x, unsigned y, pixel_t color);
inline void draw_pixel(unsigned x, unsigned y, pixel_t color) {
const size_t index = x + y * m_scanline;
m_back[index] = color;
}
void update();

View File

@@ -45,8 +45,12 @@ scrollback::render(screen &scr, font &fnt)
const unsigned xstride = (m_margin + fnt.width());
const unsigned ystride = (m_margin + fnt.height());
unsigned start = m_count <= m_rows ? 0 :
m_count % m_rows;
for (unsigned y = 0; y < m_rows; ++y) {
char *line = &m_data[y*m_cols];
unsigned i = (start + y) % m_rows;
char *line = &m_data[i*m_cols];
for (unsigned x = 0; x < m_cols; ++x) {
fnt.draw_glyph(scr, line[x], fg, bg, m_margin+x*xstride, m_margin+y*ystride);
}
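The start-index computation in the hunk above renders a ring buffer of `m_rows` lines oldest-first instead of in physical order. The same logic as a standalone sketch (hypothetical helper, not part of the fb driver):

```cpp
#include <string>
#include <vector>

// Given the physical ring slots and the total number of lines ever
// written, return the lines in display (oldest-first) order. Once more
// than `rows` lines exist, the oldest line lives at slot count % rows.
std::vector<std::string> render_order(const std::vector<std::string> &slots,
                                      unsigned rows, unsigned count) {
    unsigned start = count <= rows ? 0 : count % rows;
    std::vector<std::string> out;
    for (unsigned y = 0; y < rows; ++y)
        out.push_back(slots[(start + y) % rows]);
    return out;
}
```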

149
src/kernel/ap_startup.s Normal file
View File

@@ -0,0 +1,149 @@
%include "tasking.inc"
section .ap_startup
BASE equ 0x8000 ; Where the kernel will map this at runtime
CR0_PE equ (1 << 0)
CR0_MP equ (1 << 1)
CR0_ET equ (1 << 4)
CR0_NE equ (1 << 5)
CR0_WP equ (1 << 16)
CR0_PG equ (1 << 31)
CR0_VAL equ CR0_PE|CR0_MP|CR0_ET|CR0_NE|CR0_WP|CR0_PG
CR4_DE equ (1 << 3)
CR4_PAE equ (1 << 5)
CR4_MCE equ (1 << 6)
CR4_PGE equ (1 << 7)
CR4_OSFXSR equ (1 << 9)
CR4_OSCMMEXCPT equ (1 << 10)
CR4_FSGSBASE equ (1 << 16)
CR4_PCIDE equ (1 << 17)
CR4_INIT equ CR4_PAE|CR4_PGE
CR4_VAL equ CR4_DE|CR4_PAE|CR4_MCE|CR4_PGE|CR4_OSFXSR|CR4_OSCMMEXCPT|CR4_FSGSBASE|CR4_PCIDE
EFER_MSR equ 0xC0000080
EFER_SCE equ (1 << 0)
EFER_LME equ (1 << 8)
EFER_NXE equ (1 << 11)
EFER_VAL equ EFER_SCE|EFER_LME|EFER_NXE
bits 16
default rel
align 8
global ap_startup
ap_startup:
jmp .start_real
align 8
.pml4: dq 0
.cpu: dq 0
.ret: dq 0
align 16
.gdt:
dq 0x0 ; Null GDT entry
dq 0x00209A0000000000 ; Code
dq 0x0000920000000000 ; Data
align 4
.gdtd:
dw ($ - .gdt) - 1 ; GDT limit is size - 1
dd BASE + (.gdt - ap_startup)
align 4
.idtd:
dw 0 ; zero-length IDT descriptor
dd 0
.start_real:
cli
cld
xor ax, ax
mov ds, ax
; set the temporary null IDT
lidt [BASE + (.idtd - ap_startup)]
; Enter long mode
mov eax, cr4
or eax, CR4_INIT
mov cr4, eax
mov eax, [BASE + (.pml4 - ap_startup)]
mov cr3, eax
mov ecx, EFER_MSR
rdmsr
or eax, EFER_VAL
wrmsr
mov eax, CR0_VAL
mov cr0, eax
; Set the temporary minimal GDT
lgdt [BASE + (.gdtd - ap_startup)]
jmp (1 << 3):(BASE + (.start_long - ap_startup))
bits 64
default abs
align 8
.start_long:
; set data segments
mov ax, (2 << 3)
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov eax, CR4_VAL
mov rdi, [BASE + (.cpu - ap_startup)]
mov rax, [rdi + CPU_DATA.rsp0]
mov rsp, rax
mov rax, [BASE + (.ret - ap_startup)]
jmp rax
global ap_startup_code_size
ap_startup_code_size:
dq ($ - ap_startup)
section .text
global init_ap_trampoline
init_ap_trampoline:
push rbp
mov rbp, rsp
; rdi is the kernel pml4
mov [BASE + (ap_startup.pml4 - ap_startup)], rdi
; rsi is the cpu data for this AP
mov [BASE + (ap_startup.cpu - ap_startup)], rsi
; rdx is the address to jump to
mov [BASE + (ap_startup.ret - ap_startup)], rdx
; rcx is the processor id
mov rdi, rdx
pop rbp
ret
extern long_ap_startup
global ap_idle
ap_idle:
call long_ap_startup
sti
.hang:
hlt
jmp .hang

View File

@@ -6,11 +6,18 @@
#include "kernel_memory.h"
#include "log.h"
uint64_t lapic::s_ticks_per_us = 0;
static constexpr uint16_t lapic_id = 0x0020;
static constexpr uint16_t lapic_spurious = 0x00f0;
static constexpr uint16_t lapic_icr_low = 0x0300;
static constexpr uint16_t lapic_icr_high = 0x0310;
static constexpr uint16_t lapic_lvt_timer = 0x0320;
static constexpr uint16_t lapic_lvt_lint0 = 0x0350;
static constexpr uint16_t lapic_lvt_lint1 = 0x0360;
static constexpr uint16_t lapic_lvt_error = 0x0370;
static constexpr uint16_t lapic_timer_init = 0x0380;
static constexpr uint16_t lapic_timer_cur = 0x0390;
@@ -25,6 +32,7 @@ apic_read(uint32_t volatile *apic, uint16_t offset)
static void
apic_write(uint32_t volatile *apic, uint16_t offset, uint32_t value)
{
log::debug(logs::apic, "LAPIC write: %x = %08lx", offset, value);
*(apic + offset/sizeof(uint32_t)) = value;
}
@@ -48,14 +56,58 @@ apic::apic(uintptr_t base) :
}
lapic::lapic(uintptr_t base, isr spurious) :
lapic::lapic(uintptr_t base) :
apic(base),
m_divisor(0)
{
apic_write(m_base, lapic_spurious, static_cast<uint32_t>(spurious));
apic_write(m_base, lapic_lvt_error, static_cast<uint32_t>(isr::isrAPICError));
apic_write(m_base, lapic_spurious, static_cast<uint32_t>(isr::isrSpurious));
log::info(logs::apic, "LAPIC created, base %lx", m_base);
}
uint8_t
lapic::get_id()
{
return static_cast<uint8_t>(apic_read(m_base, lapic_id) >> 24);
}
void
lapic::send_ipi(ipi mode, uint8_t vector, uint8_t dest)
{
// Wait until the APIC is ready to send
ipi_wait();
uint32_t command =
static_cast<uint32_t>(vector) |
static_cast<uint32_t>(mode);
apic_write(m_base, lapic_icr_high, static_cast<uint32_t>(dest) << 24);
apic_write(m_base, lapic_icr_low, command);
}
void
lapic::send_ipi_broadcast(ipi mode, bool self, uint8_t vector)
{
// Wait until the APIC is ready to send
ipi_wait();
uint32_t command =
static_cast<uint32_t>(vector) |
static_cast<uint32_t>(mode) |
(self ? 0 : (1 << 18)) |
(1 << 19);
apic_write(m_base, lapic_icr_high, 0);
apic_write(m_base, lapic_icr_low, command);
}
void
lapic::ipi_wait()
{
while (apic_read(m_base, lapic_icr_low) & (1<<12))
asm volatile ("pause" : : : "memory");
}
void
lapic::calibrate_timer()
{
@@ -72,10 +124,10 @@ lapic::calibrate_timer()
clock::get().spinwait(us);
uint32_t remaining = apic_read(m_base, lapic_timer_cur);
uint32_t ticks_total = initial - remaining;
m_ticks_per_us = ticks_total / us;
uint64_t ticks_total = initial - remaining;
s_ticks_per_us = ticks_total / us;
log::info(logs::apic, "APIC timer ticks %d times per microsecond.", m_ticks_per_us);
log::info(logs::apic, "APIC timer ticks %d times per microsecond.", s_ticks_per_us);
interrupts_enable();
}
@@ -95,7 +147,7 @@ lapic::set_divisor(uint8_t divisor)
case 64: divbits = 0x9; break;
case 128: divbits = 0xa; break;
default:
kassert(0, "Invalid divisor passed to lapic::enable_timer");
kassert(0, "Invalid divisor passed to lapic::set_divisor");
}
apic_write(m_base, lapic_timer_div, divbits);

View File

@@ -3,6 +3,7 @@
/// Classes to control both local and I/O APICs.
#include <stdint.h>
#include "kutil/enum_bitfields.h"
enum class isr : uint8_t;
@@ -18,6 +19,22 @@ protected:
uint32_t *m_base;
};
enum class ipi : uint32_t
{
// Delivery modes
fixed = 0x0000,
smi = 0x0200,
nmi = 0x0400,
init = 0x0500,
startup = 0x0600,
// Flags
deassert = 0x0000,
assert = 0x4000,
edge = 0x0000, ///< edge-triggered
level = 0x8000, ///< level-triggered
};
IS_BITFIELD(ipi);
/// Controller for processor-local APICs
class lapic :
@@ -26,8 +43,26 @@ class lapic :
public:
/// Constructor
/// \arg base Physical base address of the APIC's MMIO registers
/// \arg spurious Vector of the spurious interrupt handler
lapic(uintptr_t base, isr spurious);
lapic(uintptr_t base);
/// Get the local APIC's ID
uint8_t get_id();
/// Send an inter-processor interrupt.
/// \arg mode The sending mode
/// \arg vector The interrupt vector
/// \arg dest The APIC ID of the destination
void send_ipi(ipi mode, uint8_t vector, uint8_t dest);
/// Send an inter-processor broadcast interrupt to all other CPUs
/// \arg mode The sending mode
/// \arg self If true, include this CPU in the broadcast
/// \arg vector The interrupt vector
void send_ipi_broadcast(ipi mode, bool self, uint8_t vector);
/// Wait for an IPI to finish sending. This is done automatically
/// before sending another IPI with send_ipi().
void ipi_wait();
/// Enable interrupts for the LAPIC timer.
/// \arg vector Interrupt vector the timer should use
@@ -57,19 +92,14 @@ public:
void calibrate_timer();
private:
inline uint64_t ticks_to_us(uint32_t ticks) const {
return static_cast<uint64_t>(ticks) / m_ticks_per_us;
}
inline uint64_t us_to_ticks(uint64_t interval) const {
return interval * m_ticks_per_us;
}
inline static uint64_t ticks_to_us(uint64_t ticks) { return ticks / s_ticks_per_us; }
inline static uint64_t us_to_ticks(uint64_t interval) { return interval * s_ticks_per_us; }
void set_divisor(uint8_t divisor);
void set_repeat(bool repeat);
uint32_t m_divisor;
uint32_t m_ticks_per_us;
static uint64_t s_ticks_per_us;
};

View File

@@ -17,6 +17,6 @@ void
clock::spinwait(uint64_t us) const
{
uint64_t when = value() + us;
while (value() < when);
while (value() < when) asm ("pause");
}

View File

@@ -2,10 +2,18 @@
#include "kutil/assert.h"
#include "kutil/memory.h"
#include "cpu.h"
#include "cpu/cpu.h"
#include "cpu/cpu_id.h"
#include "device_manager.h"
#include "gdt.h"
#include "idt.h"
#include "kernel_memory.h"
#include "log.h"
#include "msr.h"
#include "objects/vm_area.h"
#include "syscall.h"
#include "tss.h"
cpu_data bsp_cpu_data;
cpu_data g_bsp_cpu_data;
void
cpu_validate()
@@ -29,3 +37,30 @@ cpu_validate()
#undef CPU_FEATURE_OPT
#undef CPU_FEATURE_REQ
}
void
cpu_early_init(cpu_data *cpu)
{
IDT::get().install();
cpu->gdt->install();
// Install the GS base pointing to the cpu_data
wrmsr(msr::ia32_gs_base, reinterpret_cast<uintptr_t>(cpu));
}
void
cpu_init(cpu_data *cpu, bool bsp)
{
if (!bsp) {
// The BSP already called cpu_early_init
cpu_early_init(cpu);
}
// Set up the syscall MSRs
syscall_enable();
// Set up the page attributes table
uint64_t pat = rdmsr(msr::ia32_pat);
pat = (pat & 0x00ffffffffffffffull) | (0x01ull << 56); // set PAT 7 to WC
wrmsr(msr::ia32_pat, pat);
}

View File

@@ -2,9 +2,12 @@
#include <stdint.h>
class GDT;
class lapic;
class process;
struct TCB;
class thread;
class process;
class TSS;
struct cpu_state
{
@@ -18,15 +21,39 @@ struct cpu_state
/// version in 'tasking.inc'
struct cpu_data
{
cpu_data *self;
uint16_t id;
uint16_t index;
uint32_t reserved;
uintptr_t rsp0;
uintptr_t rsp3;
TCB *tcb;
thread *t;
process *p;
thread *thread;
process *process;
TSS *tss;
GDT *gdt;
// Members beyond this point do not appear in
// the assembly version
lapic *apic;
};
extern cpu_data bsp_cpu_data;
extern "C" cpu_data * _current_gsbase();
// We already validated the required options in the bootloader,
// but iterate the options and log about them.
/// Set up the running CPU. This sets GDT, IDT, and necessary MSRs as well as creating
/// the cpu_data structure for this processor.
/// \arg cpu The cpu_data structure for this CPU
/// \arg bsp True if this CPU is the BSP
void cpu_init(cpu_data *cpu, bool bsp);
/// Do early (before cpu_init) initialization work. Only needs to be called manually for
/// the BSP, otherwise cpu_init will call it.
/// \arg cpu The cpu_data structure for this CPU
void cpu_early_init(cpu_data *cpu);
/// Get the cpu_data struct for the current executing CPU
inline cpu_data & current_cpu() { return *_current_gsbase(); }
/// Validate the required CPU features are present. Really, the bootloader already
/// validated the required features, but still iterate the options and log about them.
void cpu_validate();

View File

@@ -13,6 +13,7 @@ void
print_regs(const cpu_state &regs)
{
console *cons = console::get();
cpu_data &cpu = current_cpu();
uint64_t cr2 = 0;
__asm__ __volatile__ ("mov %%cr2, %0" : "=r"(cr2));
@@ -20,8 +21,8 @@ print_regs(const cpu_state &regs)
uintptr_t cr3 = 0;
__asm__ __volatile__ ( "mov %%cr3, %0" : "=r" (cr3) );
cons->printf(" process: %llx", bsp_cpu_data.p->koid());
cons->printf(" thread: %llx\n", bsp_cpu_data.t->koid());
cons->printf(" process: %llx", cpu.process->koid());
cons->printf(" thread: %llx\n", cpu.thread->koid());
print_regL("rax", regs.rax);
print_regM("rbx", regs.rbx);
@@ -43,7 +44,7 @@ print_regs(const cpu_state &regs)
cons->puts("\n\n");
print_regL("rbp", regs.rbp);
print_regM("rsp", regs.user_rsp);
print_regR("sp0", bsp_cpu_data.rsp0);
print_regR("sp0", cpu.rsp0);
print_regL("rip", regs.rip);
print_regM("cr3", cr3);

View File

@@ -4,6 +4,8 @@
#include <stdint.h>
struct cpu_state;
extern "C" {
uintptr_t get_rsp();
uintptr_t get_rip();

View File

@@ -63,7 +63,7 @@ void irq4_callback(void *)
device_manager::device_manager() :
m_lapic(nullptr)
m_lapic_base(0)
{
m_irqs.ensure_capacity(32);
m_irqs.set_size(16);
@@ -106,6 +106,26 @@ device_manager::parse_acpi(const void *root_table)
load_xsdt(memory::to_virtual(acpi2->xsdt_address));
}
const device_manager::apic_nmi *
device_manager::get_lapic_nmi(uint8_t id) const
{
for (const auto &nmi : m_nmis) {
if (nmi.cpu == 0xff || nmi.cpu == id)
return &nmi;
}
return nullptr;
}
const device_manager::irq_override *
device_manager::get_irq_override(uint8_t irq) const
{
for (const auto &o : m_overrides)
if (o.source == irq) return &o;
return nullptr;
}
ioapic *
device_manager::get_ioapic(int i)
{
@@ -163,38 +183,38 @@ device_manager::load_apic(const acpi_table_header *header)
{
const auto *apic = check_get_table<acpi_apic>(header);
uintptr_t local = apic->local_address;
m_lapic = new lapic(local, isr::isrSpurious);
m_lapic_base = apic->local_address;
size_t count = acpi_table_entries(apic, 1);
uint8_t const *p = apic->controller_data;
uint8_t const *end = p + count;
// Pass one: count IOAPIC objects
int num_ioapics = 0;
// Pass one: count objects
unsigned num_lapics = 0;
unsigned num_ioapics = 0;
unsigned num_overrides = 0;
unsigned num_nmis = 0;
while (p < end) {
const uint8_t type = p[0];
const uint8_t length = p[1];
if (type == 1) num_ioapics++;
switch (type) {
case 0: ++num_lapics; break;
case 1: ++num_ioapics; break;
case 2: ++num_overrides; break;
case 4: ++num_nmis; break;
default: break;
}
p += length;
}
m_apic_ids.set_capacity(num_lapics);
m_ioapics.set_capacity(num_ioapics);
m_overrides.set_capacity(num_overrides);
m_nmis.set_capacity(num_nmis);
// Pass two: set up IOAPIC objects
p = apic->controller_data;
while (p < end) {
const uint8_t type = p[0];
const uint8_t length = p[1];
if (type == 1) {
uintptr_t base = kutil::read_from<uint32_t>(p+4);
uint32_t base_gsr = kutil::read_from<uint32_t>(p+8);
m_ioapics.emplace(base, base_gsr);
}
p += length;
}
// Pass three: configure APIC objects
// Pass two: configure objects
p = apic->controller_data;
while (p < end) {
const uint8_t type = p[0];
@@ -204,38 +224,42 @@ device_manager::load_apic(const acpi_table_header *header)
case 0: { // Local APIC
uint8_t uid = kutil::read_from<uint8_t>(p+2);
uint8_t id = kutil::read_from<uint8_t>(p+3);
log::debug(logs::device, " Local APIC uid %x id %x", id);
m_apic_ids.append(id);
log::debug(logs::device, " Local APIC uid %x id %x", uid, id);
}
break;
case 1: // I/O APIC
case 1: { // I/O APIC
uintptr_t base = kutil::read_from<uint32_t>(p+4);
uint32_t base_gsi = kutil::read_from<uint32_t>(p+8);
m_ioapics.emplace(base, base_gsi);
log::debug(logs::device, " IO APIC gsi %x base %x", base_gsi, base);
}
break;
case 2: { // Interrupt source override
uint8_t source = kutil::read_from<uint8_t>(p+3);
isr gsi = isr::irq00 + kutil::read_from<uint32_t>(p+4);
uint16_t flags = kutil::read_from<uint16_t>(p+8);
irq_override o;
o.source = kutil::read_from<uint8_t>(p+3);
o.gsi = kutil::read_from<uint32_t>(p+4);
o.flags = kutil::read_from<uint16_t>(p+8);
m_overrides.append(o);
log::debug(logs::device, " Intr source override IRQ %d -> %d Pol %d Tri %d",
o.source, o.gsi, (o.flags & 0x3), ((o.flags >> 2) & 0x3));
}
break;
case 4: { // LAPIC NMI
apic_nmi nmi;
nmi.cpu = kutil::read_from<uint8_t>(p + 2);
nmi.lint = kutil::read_from<uint8_t>(p + 5);
nmi.flags = kutil::read_from<uint16_t>(p + 3);
m_nmis.append(nmi);
log::debug(logs::device, " LAPIC NMI Proc %02x LINT%d Pol %d Tri %d",
nmi.cpu, nmi.lint, nmi.flags & 0x3, (nmi.flags >> 2) & 0x3);
}
break;
@@ -245,17 +269,6 @@ device_manager::load_apic(const acpi_table_header *header)
p += length;
}
}
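The MADT controller entries walked above are variable-length records that each begin with a type byte and a length byte, which is what makes the two-pass (count, then parse) approach work. Below is a minimal, self-contained sketch of that walk; `madt_counts` and `count_entries` are illustrative names, not part of the kernel, and the entry types mirror the ACPI MADT values used above (0 = local APIC, 1 = I/O APIC, 2 = source override, 4 = NMI).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts gathered by pass one, used to size the containers before pass
// two actually parses the entries.
struct madt_counts { unsigned lapics = 0, ioapics = 0, overrides = 0, nmis = 0; };

madt_counts count_entries(const uint8_t *p, size_t size)
{
    madt_counts c;
    const uint8_t *end = p + size;
    while (p < end) {
        const uint8_t type = p[0];
        const uint8_t length = p[1];
        if (length == 0) break; // malformed entry: avoid an infinite loop
        switch (type) {
            case 0: ++c.lapics; break;
            case 1: ++c.ioapics; break;
            case 2: ++c.overrides; break;
            case 4: ++c.nmis; break;
            default: break;
        }
        p += length; // each record advances by its own length byte
    }
    return c;
}
```

The zero-length guard is an addition over the kernel code above; firmware tables are untrusted input, so a corrupt length byte should not hang the parse.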
void


@@ -24,10 +24,6 @@ public:
/// \returns A reference to the system device manager
static device_manager & get() { return s_instance; }
/// Get an IOAPIC
/// \arg i Index of the requested IOAPIC
/// \returns An object representing the given IOAPIC if it exists,
@@ -68,6 +64,39 @@ public:
/// \returns True if the interrupt was handled
bool dispatch_irq(unsigned irq);
struct apic_nmi
{
uint8_t cpu;
uint8_t lint;
uint16_t flags;
};
struct irq_override
{
uint8_t source;
uint16_t flags;
uint32_t gsi;
};
/// Get the list of APIC ids for other CPUs
inline const kutil::vector<uint8_t> & get_apic_ids() const { return m_apic_ids; }
/// Get the LAPIC base address
/// \returns The physical base address of the local apic registers
uintptr_t get_lapic_base() const { return m_lapic_base; }
/// Get the NMI mapping for the given local APIC
/// \arg id ID of the local APIC
/// \returns apic_nmi structure describing the NMI configuration,
/// or null if no configuration was provided
const apic_nmi * get_lapic_nmi(uint8_t id) const;
/// Get the IRQ source override for the given IRQ
/// \arg irq IRQ number (not isr vector)
/// \returns irq_override structure describing that IRQ's
/// configuration, or null if no configuration was provided
const irq_override * get_irq_override(uint8_t irq) const;
/// Register the existence of a block device.
/// \arg blockdev Pointer to the block device
void register_block_device(block_device *blockdev);
@@ -119,9 +148,13 @@ private:
/// that has no callback.
void bad_irq(uint8_t irq);
uintptr_t m_lapic_base;
kutil::vector<ioapic> m_ioapics;
kutil::vector<hpet> m_hpets;
kutil::vector<uint8_t> m_apic_ids;
kutil::vector<apic_nmi> m_nmis;
kutil::vector<irq_override> m_overrides;
kutil::vector<pci_group> m_pci;
kutil::vector<pci_device> m_devices;


@@ -1,6 +1,6 @@
#include "kutil/assert.h"
#include "kutil/memory.h"
#include "frame_allocator.h"
#include "kernel_args.h"
#include "kernel_memory.h"
@@ -17,22 +17,24 @@ frame_allocator::get()
}
frame_allocator::frame_allocator(kernel::args::frame_block *frames, size_t count) :
m_blocks {frames},
m_count {count}
{
}
inline unsigned
bsf(uint64_t v)
{
asm ("tzcntq %q0, %q1" : "=r"(v) : "0"(v) : "cc");
return v;
}
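The one-character constraint change above is the whole #DF fix from the commit log: with `"r"(v)` the compiler is free to place the input in a *different* register than output operand 0, so under -O3 the `tzcnt` could read a register that was never written. The matching constraint `"0"(v)` ties the input to operand 0's register. A hedged, standalone sketch of the corrected pattern (x86-64 assumed; a portable builtin fallback is added so this sketch compiles elsewhere, and `bit_scan_forward` is an illustrative name):

```cpp
#include <cassert>
#include <cstdint>

// Find the index of the lowest set bit. The "0"(v) matching constraint
// guarantees the asm input shares output operand 0's register.
inline unsigned bit_scan_forward(uint64_t v)
{
#if defined(__x86_64__)
    asm ("tzcntq %q0, %q1" : "=r"(v) : "0"(v) : "cc");
    return v;
#else
    return __builtin_ctzll(v); // portable fallback for this sketch only
#endif
}
```

Note the result is undefined for `v == 0` (tzcnt would return 64, `__builtin_ctzll` is undefined), which matches the allocator's use: it only scans nonzero bitmaps.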
size_t
frame_allocator::allocate(size_t count, uintptr_t *address)
{
kutil::scoped_lock lock {m_lock};
for (long i = m_count - 1; i >= 0; --i) {
frame_block &block = m_blocks[i];
if (!block.map1)
@@ -80,6 +82,8 @@ frame_allocator::allocate(size_t count, uintptr_t *address)
void
frame_allocator::free(uintptr_t address, size_t count)
{
kutil::scoped_lock lock {m_lock};
kassert(address % frame_size == 0, "Trying to free a non page-aligned frame!");
if (!count)
@@ -116,6 +120,8 @@ frame_allocator::free(uintptr_t address, size_t count)
void
frame_allocator::used(uintptr_t address, size_t count)
{
kutil::scoped_lock lock {m_lock};
kassert(address % frame_size == 0, "Trying to mark a non page-aligned frame!");
if (!count)


@@ -3,6 +3,7 @@
/// Allocator for physical memory frames
#include <stdint.h>
#include "kutil/spinlock.h"
namespace kernel {
namespace args {
@@ -43,7 +44,9 @@ public:
private:
frame_block *m_blocks;
size_t m_count;
kutil::spinlock m_lock;
frame_allocator() = delete;
frame_allocator(const frame_allocator &) = delete;


@@ -1,36 +1,80 @@
#include <stdint.h>
#include "kutil/assert.h"
#include "kutil/enum_bitfields.h"
#include "kutil/memory.h"
#include "kutil/no_construct.h"
#include "console.h"
#include "kernel_memory.h"
#include "cpu.h"
#include "gdt.h"
#include "log.h"
#include "tss.h"
extern "C" void gdt_write(const void *gdt_ptr, uint16_t cs, uint16_t ds, uint16_t tr);
static constexpr uint8_t kern_cs_index = 1;
static constexpr uint8_t kern_ss_index = 2;
static constexpr uint8_t user_cs32_index = 3;
static constexpr uint8_t user_ss_index = 4;
static constexpr uint8_t user_cs64_index = 5;
static constexpr uint8_t tss_index = 6; // Note that this takes TWO GDT entries
// The BSP's GDT is initialized _before_ global constructors are called,
// so we don't want it to have a global constructor, lest it overwrite
// the previous initialization.
static kutil::no_construct<GDT> __g_bsp_gdt_storage;
GDT &g_bsp_gdt = __g_bsp_gdt_storage.value;
GDT::GDT(TSS *tss) :
m_tss(tss)
{
kutil::memset(this, 0, sizeof(GDT));
m_ptr.limit = sizeof(m_entries) - 1;
m_ptr.base = &m_entries[0];
// Kernel CS/SS - always 64bit
set(kern_cs_index, 0, 0xfffff, true, gdt_type::read_write | gdt_type::execute);
set(kern_ss_index, 0, 0xfffff, true, gdt_type::read_write);
// User CS32/SS/CS64 - layout expected by SYSRET
set(user_cs32_index, 0, 0xfffff, false, gdt_type::ring3 | gdt_type::read_write | gdt_type::execute);
set(user_ss_index, 0, 0xfffff, true, gdt_type::ring3 | gdt_type::read_write);
set(user_cs64_index, 0, 0xfffff, true, gdt_type::ring3 | gdt_type::read_write | gdt_type::execute);
set_tss(tss);
}
GDT &
GDT::current()
{
cpu_data &cpu = current_cpu();
return *cpu.gdt;
}
void
GDT::install() const
{
gdt_write(
static_cast<const void*>(&m_ptr),
kern_cs_index << 3,
kern_ss_index << 3,
tss_index << 3);
}
void
GDT::set(uint8_t i, uint32_t base, uint64_t limit, bool is64, gdt_type type)
{
m_entries[i].limit_low = limit & 0xffff;
m_entries[i].size = (limit >> 16) & 0xf;
m_entries[i].size |= (is64 ? 0xa0 : 0xc0);
m_entries[i].base_low = base & 0xffff;
m_entries[i].base_mid = (base >> 16) & 0xff;
m_entries[i].base_high = (base >> 24) & 0xff;
m_entries[i].type = type | gdt_type::system | gdt_type::present;
}
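`GDT::set` above scatters a 20-bit limit and a 32-bit base across the descriptor's split fields; the `0xa0`/`0xc0` bytes OR'd into `size` set granularity plus the L (64-bit) or D/B (32-bit) flag. A standalone sketch of that packing, useful for sanity-checking the shifts (the `fields` struct and `encode` helper are illustrative, mirroring the private `descriptor` layout):

```cpp
#include <cassert>
#include <cstdint>

// Split fields of a legacy segment descriptor, in declaration order
// matching GDT::set above: limit split 16/4, base split 16/8/8.
struct fields {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t base_mid;
    uint8_t size;      // limit[19:16] in low nibble, flags in high nibble
    uint8_t base_high;
};

fields encode(uint32_t base, uint32_t limit, bool is64)
{
    fields d {};
    d.limit_low = limit & 0xffff;
    d.size = ((limit >> 16) & 0xf) | (is64 ? 0xa0 : 0xc0);
    d.base_low = base & 0xffff;
    d.base_mid = (base >> 16) & 0xff;
    d.base_high = (base >> 24) & 0xff;
    return d;
}
```

In long mode the CPU ignores base and limit for code/data segments, which is why every entry above can use base 0 and limit 0xfffff; only the type, DPL, and L/D bits matter.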
struct tss_descriptor
{
@@ -44,72 +88,16 @@ struct tss_descriptor
uint32_t reserved;
} __attribute__ ((packed));
void
GDT::set_tss(TSS *tss)
{
tss_descriptor tssd;
size_t limit = sizeof(TSS);
tssd.limit_low = limit & 0xffff;
tssd.size = (limit >> 16) & 0xf;
uintptr_t base = reinterpret_cast<uintptr_t>(tss);
tssd.base_00 = base & 0xffff;
tssd.base_16 = (base >> 16) & 0xff;
tssd.base_24 = (base >> 24) & 0xff;
@@ -121,123 +109,26 @@ tss_set_entry(uint8_t i, uint64_t base, uint64_t limit)
gdt_type::execute |
gdt_type::ring3 |
gdt_type::present;
kutil::memcpy(&m_entries[tss_index], &tssd, sizeof(tss_descriptor));
}
void
GDT::dump(unsigned index) const
{
console *cons = console::get();
unsigned start = 0;
unsigned count = (m_ptr.limit + 1) / sizeof(descriptor);
if (index != -1) {
start = index;
count = 1;
} else {
cons->printf(" GDT: loc:%lx size:%d\n", m_ptr.base, m_ptr.limit+1);
}
const descriptor *gdt =
reinterpret_cast<const descriptor *>(m_ptr.base);
for (int i = start; i < start+count; ++i) {
uint32_t base =
@@ -275,51 +166,3 @@ gdt_dump(unsigned index)
(gdt[i].size & 0x60) == 0x40 ? "32" : "16");
}
}


@@ -1,58 +1,66 @@
#pragma once
/// \file gdt.h
/// Definitions relating to a CPU's GDT table
#include <stdint.h>
#include "kutil/enum_bitfields.h"
class TSS;
enum class gdt_type : uint8_t
{
accessed = 0x01,
read_write = 0x02,
conforming = 0x04,
execute = 0x08,
system = 0x10,
ring1 = 0x20,
ring2 = 0x40,
ring3 = 0x60,
present = 0x80
};
IS_BITFIELD(gdt_type);
class GDT
{
public:
GDT(TSS *tss);
/// Get the currently running CPU's GDT
static GDT & current();
/// Install this GDT to the current CPU
void install() const;
/// Get the address of the GDT pointer structure
inline const void * pointer() const { return static_cast<const void*>(&m_ptr); }
/// Dump debug information about the GDT to the console.
/// \arg index Which entry to print, or -1 for all entries
void dump(unsigned index = -1) const;
private:
void set(uint8_t i, uint32_t base, uint64_t limit, bool is64, gdt_type type);
void set_tss(TSS *tss);
struct descriptor
{
uint16_t limit_low;
uint16_t base_low;
uint8_t base_mid;
gdt_type type;
uint8_t size;
uint8_t base_high;
} __attribute__ ((packed, aligned(8)));
struct ptr
{
uint16_t limit;
descriptor *base;
} __attribute__ ((packed, aligned(4)));
descriptor m_entries[8];
TSS *m_tss;
ptr m_ptr;
};


@@ -1,35 +0,0 @@
extern g_idtr
extern g_gdtr
global idt_write
idt_write:
lidt [rel g_idtr]
ret
global idt_load
idt_load:
sidt [rel g_idtr]
ret
global gdt_write
gdt_write:
lgdt [rel g_gdtr]
mov ax, si ; second arg is data segment
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
push qword rdi ; first arg is code segment
lea rax, [rel .next]
push rax
o64 retf
.next:
ltr dx ; third arg is the TSS
ret
global gdt_load
gdt_load:
sgdt [rel g_gdtr]
ret

src/kernel/gdtidt.s Normal file

@@ -0,0 +1,35 @@
global idt_write
idt_write:
lidt [rdi] ; first arg is the IDT pointer location
ret
global idt_load
idt_load:
sidt [rdi] ; first arg is where to write the idtr value
ret
global gdt_write
gdt_write:
lgdt [rdi] ; first arg is the GDT pointer location
mov ax, dx ; third arg is data segment
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
push qword rsi ; second arg is code segment
lea rax, [rel .next]
push rax
o64 retf
.next:
ltr cx ; fourth arg is the TSS
ret
global gdt_load
gdt_load:
sgdt [rdi] ; first arg is where to write the gdtr value
ret

src/kernel/idt.cpp Normal file

@@ -0,0 +1,137 @@
#include "kutil/memory.h"
#include "kutil/no_construct.h"
#include "idt.h"
#include "log.h"
extern "C" {
void idt_write(const void *idt_ptr);
#define ISR(i, s, name) extern void name ();
#define EISR(i, s, name) extern void name ();
#define IRQ(i, q, name) extern void name ();
#include "interrupt_isrs.inc"
#undef IRQ
#undef EISR
#undef ISR
}
// The IDT is initialized _before_ global constructors are called,
// so we don't want it to have a global constructor, lest it overwrite
// the previous initialization.
static kutil::no_construct<IDT> __g_idt_storage;
IDT &g_idt = __g_idt_storage.value;
IDT::IDT()
{
kutil::memset(this, 0, sizeof(IDT));
m_ptr.limit = sizeof(m_entries) - 1;
m_ptr.base = &m_entries[0];
#define ISR(i, s, name) set(i, & name, 0x08, 0x8e);
#define EISR(i, s, name) set(i, & name, 0x08, 0x8e);
#define IRQ(i, q, name) set(i, & name, 0x08, 0x8e);
#include "interrupt_isrs.inc"
#undef IRQ
#undef EISR
#undef ISR
}
IDT &
IDT::get()
{
return g_idt;
}
void
IDT::install() const
{
idt_write(static_cast<const void*>(&m_ptr));
}
void
IDT::add_ist_entries()
{
#define ISR(i, s, name) if (s) { set_ist(i, s); }
#define EISR(i, s, name) if (s) { set_ist(i, s); }
#define IRQ(i, q, name)
#include "interrupt_isrs.inc"
#undef IRQ
#undef EISR
#undef ISR
}
uint8_t
IDT::used_ist_entries() const
{
uint8_t entries = 0;
#define ISR(i, s, name) if (s) { entries |= (1 << s); }
#define EISR(i, s, name) if (s) { entries |= (1 << s); }
#define IRQ(i, q, name)
#include "interrupt_isrs.inc"
#undef IRQ
#undef EISR
#undef ISR
return entries;
}
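`used_ist_entries` above hides its accumulation inside the X-macro expansion of `interrupt_isrs.inc`. The same logic written out plainly (the `ist_bitmap` helper is hypothetical): every nonzero IST index 1-7 sets one bit, so the TSS code can later create exactly the stacks the ISR table references and no more.

```cpp
#include <cassert>
#include <cstdint>

// Collapse a table of per-entry IST indices (0 = none, 1-7 = IST slot)
// into a single bitmap byte, mirroring IDT::used_ist_entries.
uint8_t ist_bitmap(const uint8_t *ist_indices, unsigned n)
{
    uint8_t entries = 0;
    for (unsigned i = 0; i < n; ++i)
        if (ist_indices[i])
            entries |= (1 << ist_indices[i]);
    return entries;
}
```

Duplicate indices are harmless here: OR-ing the same bit twice still yields one stack allocation per distinct IST slot.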
void
IDT::set(uint8_t i, void (*handler)(), uint16_t selector, uint8_t flags)
{
uintptr_t addr = reinterpret_cast<uintptr_t>(handler);
m_entries[i].base_low = addr & 0xffff;
m_entries[i].base_mid = (addr >> 16) & 0xffff;
m_entries[i].base_high = (addr >> 32) & 0xffffffff;
m_entries[i].selector = selector;
m_entries[i].flags = flags;
m_entries[i].ist = 0;
m_entries[i].reserved = 0;
}
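`IDT::set` splits the 64-bit handler address into 16/16/32-bit gate fields, and `IDT::dump` reassembles it; the round trip has to be exact or interrupts land at the wrong address. A minimal sketch of that split/join (hypothetical `split`/`join` helpers over a bare struct with the same shifts):

```cpp
#include <cassert>
#include <cstdint>

// The three address fields of a 64-bit IDT gate, as in IDT::set above.
struct gate { uint16_t base_low; uint16_t base_mid; uint32_t base_high; };

gate split(uint64_t addr)
{
    return { static_cast<uint16_t>(addr & 0xffff),
             static_cast<uint16_t>((addr >> 16) & 0xffff),
             static_cast<uint32_t>((addr >> 32) & 0xffffffff) };
}

uint64_t join(const gate &g)
{
    return (static_cast<uint64_t>(g.base_high) << 32)
         | (static_cast<uint64_t>(g.base_mid) << 16)
         | g.base_low;
}
```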
void
IDT::dump(unsigned index) const
{
unsigned start = 0;
unsigned count = (m_ptr.limit + 1) / sizeof(descriptor);
if (index != -1) {
start = index;
count = 1;
log::info(logs::boot, "IDT FOR INDEX %02x", index);
} else {
log::info(logs::boot, "Loaded IDT at: %lx size: %d bytes", m_ptr.base, m_ptr.limit+1);
}
const descriptor *idt =
reinterpret_cast<const descriptor *>(m_ptr.base);
for (int i = start; i < start+count; ++i) {
uint64_t base =
(static_cast<uint64_t>(idt[i].base_high) << 32) |
(static_cast<uint64_t>(idt[i].base_mid) << 16) |
idt[i].base_low;
char const *type;
switch (idt[i].flags & 0xf) {
case 0x5: type = " 32tsk "; break;
case 0x6: type = " 16int "; break;
case 0x7: type = " 16trp "; break;
case 0xe: type = " 32int "; break;
case 0xf: type = " 32trp "; break;
default: type = " ????? "; break;
}
if (idt[i].flags & 0x80) {
log::debug(logs::boot,
" Entry %3d: Base:%lx Sel(rpl %d, ti %d, %3d) IST:%d %s DPL:%d", i, base,
(idt[i].selector & 0x3),
((idt[i].selector & 0x4) >> 2),
(idt[i].selector >> 3),
idt[i].ist,
type,
((idt[i].flags >> 5) & 0x3));
}
}
}

src/kernel/idt.h Normal file

@@ -0,0 +1,63 @@
#pragma once
/// \file idt.h
/// Definitions relating to a CPU's IDT table
#include <stdint.h>
class IDT
{
public:
IDT();
/// Install this IDT to the current CPU
void install() const;
/// Add the IST entries listed in the ISR table into the IDT.
/// This can't be done until after memory is set up so the
/// stacks can be created.
void add_ist_entries();
/// Get the IST entry used by an entry.
/// \arg i Which IDT entry to look in
/// \returns The IST index used by entry i, or 0 for none
inline uint8_t get_ist(uint8_t i) const {
return m_entries[i].ist;
}
/// Set the IST entry used by an entry.
/// \arg i Which IDT entry to set
/// \arg ist The IST index for entry i, or 0 for none
void set_ist(uint8_t i, uint8_t ist) { m_entries[i].ist = ist; }
/// Get the IST entries that are used by this table, as a bitmap
uint8_t used_ist_entries() const;
/// Dump debug information about the IDT to the console.
/// \arg index Which entry to print, or -1 for all entries
void dump(unsigned index = -1) const;
/// Get the global IDT
static IDT & get();
private:
void set(uint8_t i, void (*handler)(), uint16_t selector, uint8_t flags);
struct descriptor
{
uint16_t base_low;
uint16_t selector;
uint8_t ist;
uint8_t flags;
uint16_t base_mid;
uint32_t base_high;
uint32_t reserved; // must be zero
} __attribute__ ((packed, aligned(16)));
struct ptr
{
uint16_t limit;
descriptor *base;
} __attribute__ ((packed, aligned(4)));
descriptor m_entries[256];
ptr m_ptr;
};


@@ -240,6 +240,7 @@ IRQ (0xdf, 0xbf, irqBF)
ISR (0xe0, 0, isrTimer)
ISR (0xe1, 0, isrLINT0)
ISR (0xe2, 0, isrLINT1)
ISR (0xe3, 0, isrAPICError)
ISR (0xe4, 0, isrAssert)
ISR (0xef, 0, isrSpurious)


@@ -8,6 +8,7 @@
#include "debug.h"
#include "device_manager.h"
#include "gdt.h"
#include "idt.h"
#include "interrupts.h"
#include "io.h"
#include "kernel_memory.h"
@@ -15,6 +16,7 @@
#include "objects/process.h"
#include "scheduler.h"
#include "syscall.h"
#include "tss.h"
#include "vm_space.h"
static const uint16_t PIC1 = 0x20;
@@ -22,19 +24,14 @@ static const uint16_t PIC2 = 0xa0;
constexpr uintptr_t apic_eoi_addr = 0xfee000b0 + ::memory::page_offset;
extern "C" {
void _halt();
void isr_handler(cpu_state*);
void irq_handler(cpu_state*);
}
isr
@@ -60,7 +57,7 @@ get_irq(unsigned vector)
}
}
void
disable_legacy_pic()
{
// Mask all interrupts
@@ -80,28 +77,17 @@ disable_legacy_pic()
outb(PIC2+1, 0x02); io_wait();
}
void
isr_handler(cpu_state *regs)
{
console *cons = console::get();
uint8_t vector = regs->interrupt & 0xff;
// Clear out the IST for this vector so we just keep using
// this stack
uint8_t old_ist = IDT::get().get_ist(vector);
if (old_ist)
IDT::get().set_ist(vector, 0);
switch (static_cast<isr>(vector)) {
@@ -137,6 +123,16 @@ isr_handler(cpu_state *regs)
}
break;
case isr::isrDoubleFault:
cons->set_color(9);
cons->printf("\nDouble Fault:\n");
cons->set_color();
print_regs(*regs);
print_stacktrace(2);
_halt();
break;
case isr::isrGPFault: {
cons->set_color(9);
cons->puts("\nGeneral Protection Fault:\n");
@@ -150,13 +146,13 @@ isr_handler(cpu_state *regs)
switch ((regs->errorcode & 0x07) >> 1) {
case 0:
cons->printf(" GDT[%x]\n", index);
GDT::current().dump(index);
break;
case 1:
case 3:
cons->printf(" IDT[%x]\n", index);
IDT::get().dump(index);
break;
default:
@@ -275,7 +271,10 @@ isr_handler(cpu_state *regs)
print_stacktrace(2);
_halt();
}
// Return the IST for this vector to what it was
if (old_ist)
IDT::get().set_ist(vector, old_ist);
*reinterpret_cast<uint32_t *>(apic_eoi_addr) = 0;
}


@@ -29,6 +29,5 @@ extern "C" {
void interrupts_disable();
}
/// Disable the legacy PIC
void disable_legacy_pic();


@@ -1,8 +1,14 @@
%include "push_all.inc"
section .text
extern isr_handler
global isr_handler_prelude:function (isr_handler_prelude.end - isr_handler_prelude)
isr_handler_prelude:
push rbp ; Never executed, fake function prelude
mov rbp, rsp ; to calm down gdb
.real:
push_all
check_swap_gs
@@ -10,10 +16,15 @@ isr_handler_prelude:
mov rsi, rsp
call isr_handler
jmp isr_handler_return
.end:
extern irq_handler
global irq_handler_prelude:function (irq_handler_prelude.end - irq_handler_prelude)
irq_handler_prelude:
push rbp ; Never executed, fake function prelude
mov rbp, rsp ; to calm down gdb
.real:
push_all
check_swap_gs
@@ -21,36 +32,41 @@ irq_handler_prelude:
mov rsi, rsp
call irq_handler
; fall through to isr_handler_return
.end:
global isr_handler_return:function (isr_handler_return.end - isr_handler_return)
isr_handler_return:
check_swap_gs
pop_all
add rsp, 16 ; because the ISRs added err/num
iretq
.end:
%macro EMIT_ISR 2
global %1:function (%1.end - %1)
%1:
push 0
push %2
jmp isr_handler_prelude.real
.end:
%endmacro
%macro EMIT_EISR 2
global %1:function (%1.end - %1)
%1:
push %2
jmp isr_handler_prelude.real
.end:
%endmacro
%macro EMIT_IRQ 2
global %1:function (%1.end - %1)
%1:
push 0
push %2
jmp irq_handler_prelude.real
.end:
%endmacro
%define EISR(i, s, name) EMIT_EISR name, i ; ISR with error code


@@ -6,22 +6,28 @@
#include "kutil/assert.h"
#include "apic.h"
#include "block_device.h"
#include "clock.h"
#include "console.h"
#include "cpu.h"
#include "device_manager.h"
#include "gdt.h"
#include "idt.h"
#include "interrupts.h"
#include "io.h"
#include "kernel_args.h"
#include "kernel_memory.h"
#include "log.h"
#include "msr.h"
#include "objects/channel.h"
#include "objects/event.h"
#include "objects/thread.h"
#include "objects/vm_area.h"
#include "scheduler.h"
#include "serial.h"
#include "symbol_table.h"
#include "syscall.h"
#include "tss.h"
#include "vm_space.h"
#ifndef GIT_VERSION
#define GIT_VERSION
@@ -31,18 +37,26 @@ extern "C" {
void kernel_main(kernel::args::header *header);
void (*__ctors)(void);
void (*__ctors_end)(void);
void long_ap_startup(cpu_data *cpu);
void ap_startup();
void ap_idle();
void init_ap_trampoline(void*, cpu_data *, void (*)());
}
extern void __kernel_assert(const char *, unsigned, const char *);
using namespace kernel;
volatile size_t ap_startup_count;
static bool scheduler_ready = false;
/// Bootstrap the memory managers.
void setup_pat();
void memory_initialize_pre_ctors(args::header &kargs);
void memory_initialize_post_ctors(args::header &kargs);
process * load_simple_process(args::program &program);
unsigned start_aps(lapic &apic, const kutil::vector<uint8_t> &ids, void *kpml4);
/// TODO: not this. this is awful.
args::framebuffer *fb = nullptr;
@@ -77,12 +91,23 @@ kernel_main(args::header *header)
logger_init();
cpu_validate();
setup_pat();
log::debug(logs::boot, " jsix header is at: %016lx", header);
log::debug(logs::boot, " Memory map is at: %016lx", header->mem_map);
log::debug(logs::boot, "ACPI root table is at: %016lx", header->acpi_table);
log::debug(logs::boot, "Runtime service is at: %016lx", header->runtime_services);
log::debug(logs::boot, " Kernel PML4 is at: %016lx", header->pml4);
uint64_t cr0, cr4;
asm ("mov %%cr0, %0" : "=r"(cr0));
asm ("mov %%cr4, %0" : "=r"(cr4));
uint64_t efer = rdmsr(msr::ia32_efer);
log::debug(logs::boot, "Control regs: cr0:%lx cr4:%lx efer:%lx", cr0, cr4, efer);
bool has_video = false;
if (header->video.size > 0) {
has_video = true;
fb = &header->video;
const args::framebuffer &video = header->video;
log::debug(logs::boot, "Framebuffer: %dx%d[%d] type %d @ %llx size %llx",
@@ -95,20 +120,37 @@ kernel_main(args::header *header)
logger_clear_immediate();
}
extern IDT &g_idt;
extern TSS &g_bsp_tss;
extern GDT &g_bsp_gdt;
extern cpu_data g_bsp_cpu_data;
extern uintptr_t idle_stack_end;
IDT *idt = new (&g_idt) IDT;
cpu_data *cpu = &g_bsp_cpu_data;
kutil::memset(cpu, 0, sizeof(cpu_data));
cpu->self = cpu;
cpu->tss = new (&g_bsp_tss) TSS;
cpu->gdt = new (&g_bsp_gdt) GDT {cpu->tss};
cpu->rsp0 = idle_stack_end;
cpu_early_init(cpu);
disable_legacy_pic();
memory_initialize_pre_ctors(*header);
run_constructors();
memory_initialize_post_ctors(*header);
cpu->tss->create_ist_stacks(idt->used_ist_entries());
for (size_t i = 0; i < header->num_modules; ++i) {
args::module &mod = header->modules[i];
void *virt = memory::to_virtual<void>(mod.location);
switch (mod.type) {
case args::mod_type::symbol_table:
new symbol_table {mod.location, mod.size};
break;
default:
@@ -116,16 +158,29 @@ kernel_main(args::header *header)
}
}
syscall_initialize();
device_manager &devices = device_manager::get();
devices.parse_acpi(header->acpi_table);
// Need the local APIC to get the BSP's id
uintptr_t apic_base = devices.get_lapic_base();
lapic *apic = new lapic(apic_base);
apic->enable();
cpu->id = apic->get_id();
cpu->apic = apic;
cpu_init(cpu, true);
devices.init_drivers();
apic->calibrate_timer();
const auto &apic_ids = devices.get_apic_ids();
unsigned num_cpus = start_aps(*apic, apic_ids, header->pml4);
idt->add_ist_entries();
interrupts_enable();
/*
@@ -152,8 +207,8 @@ kernel_main(args::header *header)
}
*/
syscall_enable();
scheduler *sched = new scheduler {num_cpus};
scheduler_ready = true;
// Skip program 0, which is the kernel itself
for (unsigned i = 1; i < header->num_programs; ++i)
@@ -164,3 +219,126 @@ kernel_main(args::header *header)
sched->start();
}
unsigned
start_aps(lapic &apic, const kutil::vector<uint8_t> &ids, void *kpml4)
{
using memory::frame_size;
using memory::kernel_stack_pages;
extern size_t ap_startup_code_size;
extern process &g_kernel_process;
extern vm_area_guarded &g_kernel_stacks;
clock &clk = clock::get();
ap_startup_count = 1; // BSP processor
log::info(logs::boot, "Starting %d other CPUs", ids.count() - 1);
// Since we're using address space outside kernel space, make sure
// the kernel's vm_space is used
cpu_data &bsp = current_cpu();
bsp.process = &g_kernel_process;
uint16_t index = bsp.index;
// Copy the startup code somewhere the real mode trampoline can run
uintptr_t addr = 0x8000; // TODO: find a valid address, rewrite addresses
uint8_t vector = addr >> 12;
vm_area *vma = new vm_area_fixed(addr, 0x1000, vm_flags::write);
vm_space::kernel_space().add(addr, vma);
kutil::memcpy(
reinterpret_cast<void*>(addr),
reinterpret_cast<void*>(&ap_startup),
ap_startup_code_size);
// AP idle stacks need less room than normal stacks, so pack multiple
// into a normal stack area
static constexpr size_t idle_stack_bytes = 2048; // 2KiB is generous
static constexpr size_t full_stack_bytes = kernel_stack_pages * frame_size;
static constexpr size_t idle_stacks_per = full_stack_bytes / idle_stack_bytes;
uint8_t ist_entries = IDT::get().used_ist_entries();
size_t free_stack_count = 0;
uintptr_t stack_area_start = 0;
ipi mode = ipi::init | ipi::level | ipi::assert;
apic.send_ipi_broadcast(mode, false, 0);
for (uint8_t id : ids) {
if (id == bsp.id) continue;
// Set up the CPU data structures
TSS *tss = new TSS;
GDT *gdt = new GDT {tss};
cpu_data *cpu = new cpu_data;
kutil::memset(cpu, 0, sizeof(cpu_data));
cpu->self = cpu;
cpu->id = id;
cpu->index = ++index;
cpu->gdt = gdt;
cpu->tss = tss;
tss->create_ist_stacks(ist_entries);
// Set up the CPU's idle task stack
if (free_stack_count == 0) {
stack_area_start = g_kernel_stacks.get_section();
free_stack_count = idle_stacks_per;
}
uintptr_t stack_end = stack_area_start + free_stack_count-- * idle_stack_bytes;
stack_end -= 2 * sizeof(void*); // Null frame
*reinterpret_cast<uint64_t*>(stack_end) = 0; // pre-fault the page
cpu->rsp0 = stack_end;
// Set up the trampoline with this CPU's data
init_ap_trampoline(kpml4, cpu, ap_idle);
// Kick it off!
size_t current_count = ap_startup_count;
log::debug(logs::boot, "Starting AP %d: stack %llx", cpu->index, stack_end);
ipi startup = ipi::startup | ipi::assert;
apic.send_ipi(startup, vector, id);
for (unsigned i = 0; i < 20; ++i) {
if (ap_startup_count > current_count) break;
clk.spinwait(20);
}
// If the CPU already incremented ap_startup_count, it's done
if (ap_startup_count > current_count)
continue;
// Send the second SIPI (intel recommends this)
apic.send_ipi(startup, vector, id);
for (unsigned i = 0; i < 100; ++i) {
if (ap_startup_count > current_count) break;
clk.spinwait(100);
}
if (ap_startup_count == current_count)
log::warn(logs::boot, "No response from AP %d within timeout", id);
}
log::info(logs::boot, "%d CPUs running", ap_startup_count);
vm_space::kernel_space().remove(vma);
return ap_startup_count;
}
void
long_ap_startup(cpu_data *cpu)
{
cpu_init(cpu, false);
++ap_startup_count;
while (!scheduler_ready) asm ("pause");
uintptr_t apic_base =
device_manager::get().get_lapic_base();
cpu->apic = new lapic(apic_base);
cpu->apic->enable();
scheduler::get().start();
}


@@ -39,11 +39,8 @@ frame_allocator &g_frame_allocator = __g_frame_allocator_storage.value;
static kutil::no_construct<vm_area_untracked> __g_kernel_heap_area_storage;
vm_area_untracked &g_kernel_heap_area = __g_kernel_heap_area_storage.value;
vm_area_guarded g_kernel_stacks {
memory::stacks_start,
memory::kernel_stack_pages,
memory::kernel_max_stacks,
vm_flags::write};
static kutil::no_construct<vm_area_guarded> __g_kernel_stacks_storage;
vm_area_guarded &g_kernel_stacks = __g_kernel_stacks_storage.value;
vm_area_guarded g_kernel_buffers {
memory::buffers_start,
@@ -61,11 +58,19 @@ namespace kutil {
void kfree(void *p) { return g_kernel_heap.free(p); }
}
template <typename T>
uintptr_t
get_physical_page(T *p) {
return memory::page_align_down(reinterpret_cast<uintptr_t>(p));
}
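The `used()` calls above all round their argument down to a frame boundary first. A minimal standalone sketch of that align-down step, assuming 4 KiB frames as elsewhere in the kernel (the name `page_align_down` mirrors the call above; this is an illustration, not jsix's implementation):

```cpp
#include <cstdint>

// Round an address down to the base of its 4 KiB frame,
// assuming memory::frame_size == 0x1000.
constexpr uintptr_t page_align_down(uintptr_t addr) {
    return addr & ~static_cast<uintptr_t>(0xfff);
}
```

Any address within a frame maps to that frame's base, so passing a pointer into the middle of `kargs` still marks the correct physical page as used.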
void
memory_initialize_pre_ctors(args::header &kargs)
{
using kernel::args::frame_block;
page_table *kpml4 = static_cast<page_table*>(kargs.pml4);
new (&g_kernel_heap) kutil::heap_allocator {heap_start, kernel_max_heap};
frame_block *blocks = reinterpret_cast<frame_block*>(memory::bitmap_start);
@@ -73,17 +78,21 @@ memory_initialize_pre_ctors(args::header &kargs)
// Mark all the things the bootloader allocated for us as used
g_frame_allocator.used(
reinterpret_cast<uintptr_t>(kargs.frame_blocks),
get_physical_page(&kargs),
memory::page_count(sizeof(kargs)));
g_frame_allocator.used(
get_physical_page(kargs.frame_blocks),
kargs.frame_block_pages);
g_frame_allocator.used(
reinterpret_cast<uintptr_t>(kargs.pml4),
get_physical_page(kargs.pml4),
kargs.table_pages);
for (unsigned i = 0; i < kargs.num_modules; ++i) {
const kernel::args::module &mod = kargs.modules[i];
g_frame_allocator.used(
reinterpret_cast<uintptr_t>(mod.location),
get_physical_page(mod.location),
memory::page_count(mod.size));
}
@@ -97,7 +106,6 @@ memory_initialize_pre_ctors(args::header &kargs)
}
}
page_table *kpml4 = reinterpret_cast<page_table*>(kargs.pml4);
process *kp = process::create_kernel_process(kpml4);
vm_space &vm = kp->space();
@@ -105,42 +113,28 @@ memory_initialize_pre_ctors(args::header &kargs)
vm_area_untracked(kernel_max_heap, vm_flags::write);
vm.add(heap_start, heap);
vm_area *stacks = new (&g_kernel_stacks) vm_area_guarded {
memory::stacks_start,
memory::kernel_stack_pages,
memory::kernel_max_stacks,
vm_flags::write};
vm.add(memory::stacks_start, &g_kernel_stacks);
// Clean out any remaining bootloader page table entries
for (unsigned i = 0; i < memory::pml4e_kernel; ++i)
kpml4->entries[i] = 0;
}
void
memory_initialize_post_ctors(args::header &kargs)
{
vm_space &vm = vm_space::kernel_space();
vm.add(memory::stacks_start, &g_kernel_stacks);
vm.add(memory::buffers_start, &g_kernel_buffers);
g_frame_allocator.free(
reinterpret_cast<uintptr_t>(kargs.page_tables),
get_physical_page(kargs.page_tables),
kargs.table_count);
using memory::frame_size;
using memory::kernel_stack_pages;
constexpr size_t stack_size = kernel_stack_pages * frame_size;
for (int ist = 1; ist <= 3; ++ist) {
uintptr_t bottom = g_kernel_stacks.get_section();
log::debug(logs::boot, "Installing IST%d stack at %llx", ist, bottom);
// Pre-realize and zero these stacks, they're no good
// if they page fault
kutil::memset(reinterpret_cast<void*>(bottom), 0, stack_size);
// Skip two entries to be the null frame
tss_set_ist(ist, bottom + stack_size - 2 * sizeof(uintptr_t));
}
#define ISR(i, s, name) if (s) { idt_set_ist(i, s); }
#define EISR(i, s, name) if (s) { idt_set_ist(i, s); }
#define IRQ(i, q, name)
#include "interrupt_isrs.inc"
#undef IRQ
#undef EISR
#undef ISR
}
static void
@@ -198,15 +192,6 @@ log_mtrrs()
pat_names[(pat >> (6*8)) & 7], pat_names[(pat >> (7*8)) & 7]);
}
void
setup_pat()
{
uint64_t pat = rdmsr(msr::ia32_pat);
pat = (pat & 0x00ffffffffffffffull) | (0x01ull << 56); // set PAT 7 to WC
wrmsr(msr::ia32_pat, pat);
log_mtrrs();
}
process *
load_simple_process(args::program &program)


@@ -13,15 +13,11 @@ static kutil::no_construct<process> __g_kernel_process_storage;
process &g_kernel_process = __g_kernel_process_storage.value;
kutil::vector<process*> process::s_processes;
process::process() :
kobject {kobject::type::process},
m_next_handle {1},
m_state {state::running}
{
s_processes.append(this);
j6_handle_t self = add_handle(this);
kassert(self == self_handle(), "Process self-handle is not 1");
}
@@ -39,10 +35,9 @@ process::~process()
{
for (auto &it : m_handles)
if (it.val) it.val->handle_release();
s_processes.remove_swap(this);
}
process & process::current() { return *bsp_cpu_data.p; }
process & process::current() { return *current_cpu().process; }
process & process::kernel_process() { return g_kernel_process; }
process *
@@ -63,7 +58,7 @@ process::exit(int32_t code)
thread->exit(code);
}
if (this == bsp_cpu_data.p)
if (this == current_cpu().process)
scheduler::get().schedule();
}


@@ -94,6 +94,4 @@ private:
enum class state : uint8_t { running, exited };
state m_state;
static kutil::vector<process*> s_processes;
};


@@ -9,7 +9,7 @@
extern "C" void kernel_to_user_trampoline();
static constexpr j6_signal_t thread_default_signals = 0;
extern vm_area_guarded g_kernel_stacks;
extern vm_area_guarded &g_kernel_stacks;
thread::thread(process &parent, uint8_t pri, uintptr_t rsp0) :
kobject(kobject::type::thread, thread_default_signals),
@@ -43,13 +43,9 @@ thread::from_tcb(TCB *tcb)
return reinterpret_cast<thread*>(kutil::offset_pointer(tcb, offset));
}
thread &
thread::current()
{
return *bsp_cpu_data.t;
}
thread & thread::current() { return *current_cpu().thread; }
inline void schedule_if_current(thread *t) { if (t == bsp_cpu_data.t) scheduler::get().schedule(); }
inline void schedule_if_current(thread *t) { if (t == current_cpu().thread) scheduler::get().schedule(); }
void
thread::wait_on_signals(kobject *obj, j6_signal_t signals)
@@ -225,7 +221,5 @@ thread::create_idle_thread(process &kernel, uint8_t pri, uintptr_t rsp0)
thread *idle = new thread(kernel, pri, rsp0);
idle->set_state(state::constant);
idle->set_state(state::ready);
log::info(logs::task, "Created idle thread as koid %llx", idle->koid());
return idle;
}


@@ -66,9 +66,7 @@ vm_area_fixed::vm_area_fixed(uintptr_t start, size_t size, vm_flags flags) :
{
}
vm_area_fixed::~vm_area_fixed()
{
}
vm_area_fixed::~vm_area_fixed() {}
size_t vm_area_fixed::resize(size_t size)
{
@@ -91,9 +89,7 @@ vm_area_untracked::vm_area_untracked(size_t size, vm_flags flags) :
{
}
vm_area_untracked::~vm_area_untracked()
{
}
vm_area_untracked::~vm_area_untracked() {}
bool
vm_area_untracked::get_page(uintptr_t offset, uintptr_t &phys)
@@ -119,6 +115,8 @@ vm_area_open::vm_area_open(size_t size, vm_flags flags) :
{
}
vm_area_open::~vm_area_open() {}
bool
vm_area_open::get_page(uintptr_t offset, uintptr_t &phys)
{
@@ -134,6 +132,8 @@ vm_area_guarded::vm_area_guarded(uintptr_t start, size_t buf_pages, size_t size,
{
}
vm_area_guarded::~vm_area_guarded() {}
uintptr_t
vm_area_guarded::get_section()
{


@@ -114,6 +114,7 @@ public:
/// \arg size Initial virtual size of the memory area
/// \arg flags Flags for this memory area
vm_area_open(size_t size, vm_flags flags);
virtual ~vm_area_open();
virtual bool get_page(uintptr_t offset, uintptr_t &phys) override;
@@ -155,6 +156,8 @@ public:
size_t size,
vm_flags flags);
virtual ~vm_area_guarded();
/// Get an available section in this area
uintptr_t get_section();


@@ -44,7 +44,7 @@ inline bool contains(uint64_t page_off, uint64_t word, uint8_t &index) {
uint64_t base = to_base(word);
uint64_t bits = to_level(word) * bits_per_level;
index = (page_off >> bits) & 0x3f;
return (page_off & (~0x3full << bits)) != base;
return (page_off & (~0x3full << bits)) == base;
}
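The fix above flips the comparison so a word matches when the page offset's high bits equal the stored base. A self-contained sketch of that containment test — here the `to_base`/`to_level` word encoding is abstracted into plain parameters, as an assumption for illustration:

```cpp
#include <cstdint>

constexpr uint64_t bits_per_level = 6; // 64 entries per word, as above

// A word at `level` covers a 64-entry block starting at `base`;
// page_off is inside it iff the bits above the 6-bit index match base.
inline bool contains(uint64_t page_off, uint64_t base, uint64_t level,
                     uint8_t &index) {
    uint64_t bits = level * bits_per_level;
    index = (page_off >> bits) & 0x3f;
    return (page_off & (~0x3full << bits)) == base;
}
```

With the original `!=`, every in-range offset reported "not contained" and vice versa, which is why the one-character change matters.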
inline uint64_t index_for(uint64_t page_off, uint8_t level) {


@@ -17,6 +17,7 @@
#include "objects/channel.h"
#include "objects/process.h"
#include "objects/system.h"
#include "objects/thread.h"
#include "objects/vm_area.h"
#include "scheduler.h"
@@ -25,40 +26,37 @@
#include "kutil/assert.h"
extern "C" void task_switch(TCB *tcb);
scheduler *scheduler::s_instance = nullptr;
const uint64_t rflags_noint = 0x002;
const uint64_t rflags_int = 0x202;
extern uint64_t idle_stack_end;
scheduler::scheduler(lapic *apic) :
m_apic(apic),
m_next_pid(1),
m_clock(0),
m_last_promotion(0)
struct run_queue
{
kassert(!s_instance, "Multiple schedulers created!");
tcb_node *current = nullptr;
tcb_list ready[scheduler::num_priorities];
tcb_list blocked;
uint64_t last_promotion = 0;
uint64_t last_steal = 0;
kutil::spinlock lock;
};
scheduler::scheduler(unsigned cpus) :
m_next_pid {1},
m_clock {0}
{
kassert(!s_instance, "Created multiple schedulers!");
if (!s_instance)
s_instance = this;
process *kp = &process::kernel_process();
m_run_queues.set_size(cpus);
}
log::debug(logs::task, "Kernel process koid %llx", kp->koid());
thread *idle = thread::create_idle_thread(*kp, max_priority,
reinterpret_cast<uintptr_t>(&idle_stack_end));
log::debug(logs::task, "Idle thread koid %llx", idle->koid());
auto *tcb = idle->tcb();
m_runlists[max_priority].push_back(tcb);
m_current = tcb;
bsp_cpu_data.rsp0 = tcb->rsp0;
bsp_cpu_data.tcb = tcb;
bsp_cpu_data.p = kp;
bsp_cpu_data.t = idle;
scheduler::~scheduler()
{
// Not truly necessary - if the scheduler is going away, the whole
// system is probably going down. But let's be clean.
if (s_instance == this)
s_instance = nullptr;
}
template <typename T>
@@ -69,20 +67,6 @@ inline T * push(uintptr_t &rsp, size_t size = sizeof(T)) {
return p;
}
thread *
scheduler::create_process(bool user)
{
process *p = new process;
thread *th = p->create_thread(default_priority, user);
TCB *tcb = th->tcb();
log::debug(logs::task, "Creating thread %llx, priority %d, time slice %d",
th->koid(), tcb->priority, tcb->time_left);
th->set_state(thread::state::ready);
return th;
}
void
scheduler::create_kernel_task(void (*task)(), uint8_t priority, bool constant)
{
@@ -112,25 +96,42 @@ scheduler::quantum(int priority)
void
scheduler::start()
{
log::info(logs::sched, "Starting scheduler.");
wrmsr(msr::ia32_gs_base, reinterpret_cast<uintptr_t>(&bsp_cpu_data));
m_apic->enable_timer(isr::isrTimer, false);
m_apic->reset_timer(10);
cpu_data &cpu = current_cpu();
run_queue &queue = m_run_queues[cpu.index];
kutil::scoped_lock lock {queue.lock};
process *kp = &process::kernel_process();
thread *idle = thread::create_idle_thread(*kp, max_priority, cpu.rsp0);
log::debug(logs::task, "CPU%02x idle thread koid %llx", cpu.index, idle->koid());
auto *tcb = idle->tcb();
cpu.process = kp;
cpu.thread = idle;
cpu.tcb = tcb;
queue.current = tcb;
log::info(logs::sched, "CPU%02x starting scheduler", cpu.index);
cpu.apic->enable_timer(isr::isrTimer, false);
cpu.apic->reset_timer(10);
}
void
scheduler::add_thread(TCB *t)
{
m_blocked.push_back(static_cast<tcb_node*>(t));
t->time_left = quantum(t->priority);
cpu_data &cpu = current_cpu();
run_queue &queue = m_run_queues[cpu.index];
kutil::scoped_lock lock {queue.lock};
queue.blocked.push_back(static_cast<tcb_node*>(t));
t->time_left = quantum(t->priority);
}
void scheduler::prune(uint64_t now)
void scheduler::prune(run_queue &queue, uint64_t now)
{
// Find processes that are ready or have exited and
// move them to the appropriate lists.
auto *tcb = m_blocked.front();
auto *tcb = queue.blocked.front();
while (tcb) {
thread *th = thread::from_tcb(tcb);
uint8_t priority = tcb->priority;
@@ -138,7 +139,7 @@ void scheduler::prune(uint64_t now)
bool ready = th->has_state(thread::state::ready);
bool exited = th->has_state(thread::state::exited);
bool constant = th->has_state(thread::state::constant);
bool current = tcb == m_current;
bool current = tcb == queue.current;
ready |= th->wake_on_time(now);
@@ -153,7 +154,7 @@ void scheduler::prune(uint64_t now)
// page tables
if (current) continue;
m_blocked.remove(remove);
queue.blocked.remove(remove);
process &p = th->parent();
// thread_exited deletes the thread, and returns true if the process
@@ -161,19 +162,19 @@ void scheduler::prune(uint64_t now)
if(!current && p.thread_exited(th))
delete &p;
} else {
m_blocked.remove(remove);
queue.blocked.remove(remove);
log::debug(logs::sched, "Prune: readying unblocked thread %llx", th->koid());
m_runlists[remove->priority].push_back(remove);
queue.ready[remove->priority].push_back(remove);
}
}
}
void
scheduler::check_promotions(uint64_t now)
scheduler::check_promotions(run_queue &queue, uint64_t now)
{
for (auto &pri_list : m_runlists) {
for (auto &pri_list : queue.ready) {
for (auto *tcb : pri_list) {
const thread *th = thread::from_tcb(m_current);
const thread *th = thread::from_tcb(queue.current);
const bool constant = th->has_state(thread::state::constant);
if (constant)
continue;
@@ -188,80 +189,145 @@ scheduler::check_promotions(uint64_t now)
if (stale) {
// If the thread is stale, promote it
m_runlists[priority].remove(tcb);
queue.ready[priority].remove(tcb);
tcb->priority -= 1;
tcb->time_left = quantum(tcb->priority);
m_runlists[tcb->priority].push_back(tcb);
queue.ready[tcb->priority].push_back(tcb);
log::info(logs::sched, "Scheduler promoting thread %llx, priority %d",
th->koid(), tcb->priority);
}
}
}
m_last_promotion = now;
queue.last_promotion = now;
}
static size_t
balance_lists(tcb_list &to, tcb_list &from)
{
size_t to_len = to.length();
size_t from_len = from.length();
// Only steal from the rich, don't be Dennis Moore
if (from_len <= to_len)
return 0;
size_t steal = (from_len - to_len) / 2;
for (size_t i = 0; i < steal; ++i)
to.push_front(from.pop_front());
return steal;
}
void
scheduler::steal_work(cpu_data &cpu)
{
// First grab a scheduler-wide lock to avoid deadlock
kutil::scoped_lock steal_lock {m_steal_lock};
// Lock this cpu's queue for the whole time while we modify it
run_queue &my_queue = m_run_queues[cpu.index];
kutil::scoped_lock my_queue_lock {my_queue.lock};
const unsigned count = m_run_queues.count();
for (unsigned i = 0; i < count; ++i) {
if (i == cpu.index) continue;
run_queue &other_queue = m_run_queues[i];
kutil::scoped_lock other_queue_lock {other_queue.lock};
size_t stolen = 0;
// Don't steal from max_priority, that's the idle thread
for (unsigned pri = 0; pri < max_priority; ++pri)
stolen += balance_lists(my_queue.ready[pri], other_queue.ready[pri]);
stolen += balance_lists(my_queue.blocked, other_queue.blocked);
if (stolen)
log::debug(logs::sched, "CPU%02x stole %2d tasks from CPU%02x",
cpu.index, stolen, i);
}
}
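The `balance_lists` rule used by `steal_work` moves half of the surplus from the richer list. A sketch of that rule with `std::deque` standing in for `tcb_list` (an assumption for illustration; the real list type is intrusive):

```cpp
#include <cstddef>
#include <deque>

// Move half the surplus from `from` to `to`; returns how many moved.
size_t balance(std::deque<int> &to, std::deque<int> &from) {
    if (from.size() <= to.size())
        return 0; // only steal from the rich
    size_t steal = (from.size() - to.size()) / 2;
    for (size_t i = 0; i < steal; ++i) {
        to.push_front(from.front());
        from.pop_front();
    }
    return steal;
}
```

Halving the difference converges the two queues without ping-ponging tasks back on the next steal pass.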
void
scheduler::schedule()
{
uint8_t priority = m_current->priority;
uint32_t remaining = m_apic->stop_timer();
m_current->time_left = remaining;
thread *th = thread::from_tcb(m_current);
cpu_data &cpu = current_cpu();
run_queue &queue = m_run_queues[cpu.index];
lapic &apic = *cpu.apic;
uint32_t remaining = apic.stop_timer();
if (m_clock - queue.last_steal > steal_frequency) {
steal_work(cpu);
queue.last_steal = m_clock;
}
// We need to explicitly lock/unlock here instead of
// using a scoped lock, because the scope doesn't "end"
// for the current thread until it gets scheduled again
kutil::spinlock::waiter waiter;
queue.lock.acquire(&waiter);
queue.current->time_left = remaining;
thread *th = thread::from_tcb(queue.current);
uint8_t priority = queue.current->priority;
const bool constant = th->has_state(thread::state::constant);
if (remaining == 0) {
if (priority < max_priority && !constant) {
// Process used its whole timeslice, demote it
++m_current->priority;
log::info(logs::sched, "Scheduler demoting thread %llx, priority %d",
th->koid(), m_current->priority);
++queue.current->priority;
log::debug(logs::sched, "Scheduler demoting thread %llx, priority %d",
th->koid(), queue.current->priority);
}
m_current->time_left = quantum(m_current->priority);
queue.current->time_left = quantum(queue.current->priority);
} else if (remaining > 0) {
// Process gave up CPU, give it a small bonus to its
// remaining timeslice.
uint32_t bonus = quantum(priority) >> 4;
m_current->time_left += bonus;
queue.current->time_left += bonus;
}
m_runlists[priority].remove(m_current);
if (th->has_state(thread::state::ready)) {
m_runlists[m_current->priority].push_back(m_current);
queue.ready[queue.current->priority].push_back(queue.current);
} else {
m_blocked.push_back(m_current);
queue.blocked.push_back(queue.current);
}
clock::get().update();
prune(++m_clock);
if (m_clock - m_last_promotion > promote_frequency)
check_promotions(m_clock);
prune(queue, ++m_clock);
if (m_clock - queue.last_promotion > promote_frequency)
check_promotions(queue, m_clock);
priority = 0;
while (m_runlists[priority].empty()) {
while (queue.ready[priority].empty()) {
++priority;
kassert(priority < num_priorities, "All runlists are empty");
}
m_current->last_ran = m_clock;
queue.current->last_ran = m_clock;
auto *next = m_runlists[priority].pop_front();
auto *next = queue.ready[priority].pop_front();
next->last_ran = m_clock;
m_apic->reset_timer(next->time_left);
apic.reset_timer(next->time_left);
if (next == queue.current) {
queue.lock.release(&waiter);
return;
}
if (next != m_current) {
thread *next_thread = thread::from_tcb(next);
bsp_cpu_data.t = next_thread;
bsp_cpu_data.p = &next_thread->parent();
m_current = next;
cpu.thread = next_thread;
cpu.process = &next_thread->parent();
queue.current = next;
log::debug(logs::sched, "Scheduler switching threads %llx->%llx",
th->koid(), next_thread->koid());
log::debug(logs::sched, "CPU%02x switching threads %llx->%llx",
cpu.index, th->koid(), next_thread->koid());
log::debug(logs::sched, " priority %d time left %d @ %lld.",
m_current->priority, m_current->time_left, m_clock);
log::debug(logs::sched, " PML4 %llx", m_current->pml4);
next->priority, next->time_left, m_clock);
log::debug(logs::sched, " PML4 %llx", next->pml4);
task_switch(m_current);
}
queue.lock.release(&waiter);
task_switch(queue.current);
}
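The timeslice accounting in `schedule()` reduces to a small rule: a thread that exhausted its quantum restarts with a fresh (possibly demoted) slice, while one that yielded early keeps its remainder plus a 1/16 bonus. A sketch with `quantum` passed in as a parameter, since its formula isn't shown in this hunk:

```cpp
#include <cstdint>

// remaining == 0: the thread used its whole slice; restart from the
// quantum for its (possibly just-demoted) priority.
// remaining > 0: the thread yielded; keep the remainder plus a bonus.
uint32_t next_time_left(uint32_t remaining, uint32_t quantum) {
    if (remaining == 0)
        return quantum;
    return remaining + (quantum >> 4);
}
```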


@@ -3,20 +3,19 @@
/// The task scheduler and related definitions
#include <stdint.h>
#include "objects/thread.h"
#include "kutil/spinlock.h"
#include "kutil/vector.h"
namespace kernel {
namespace args {
struct program;
}}
struct cpu_data;
class lapic;
class process;
struct page_table;
struct cpu_state;
extern "C" void isr_handler(cpu_state*);
extern "C" void task_switch(TCB *next);
struct run_queue;
/// The task scheduler
@@ -42,8 +41,9 @@ public:
static const uint16_t process_quanta = 10;
/// Constructor.
/// \arg apic Pointer to the local APIC object
scheduler(lapic *apic);
/// \arg cpus The number of CPUs to schedule for
scheduler(unsigned cpus);
~scheduler();
/// Create a new process from a program image in memory.
/// \arg program The descriptor of the program in memory
@@ -69,47 +69,35 @@ public:
/// Run the scheduler, possibly switching to a new task
void schedule();
/// Get the current TCB.
/// \returns A pointer to the current thread's TCB
inline TCB * current() { return m_current; }
/// Start scheduling a new thread.
/// \arg t The new thread's TCB
void add_thread(TCB *t);
/// Get a reference to the system scheduler
/// Get a reference to the scheduler
/// \returns A reference to the global system scheduler
static scheduler & get() { return *s_instance; }
private:
friend uintptr_t syscall_dispatch(uintptr_t, cpu_state &);
friend class process;
static constexpr uint64_t promote_frequency = 10;
static constexpr uint64_t steal_frequency = 10;
/// Create a new process object. This process will have its pid
/// set but nothing else.
/// \arg user True if this thread will enter userspace
/// \returns The new process' main thread
thread * create_process(bool user);
void prune(uint64_t now);
void check_promotions(uint64_t now);
lapic *m_apic;
void prune(run_queue &queue, uint64_t now);
void check_promotions(run_queue &queue, uint64_t now);
void steal_work(cpu_data &cpu);
uint32_t m_next_pid;
uint32_t m_tick_count;
process *m_kernel_process;
tcb_node *m_current;
tcb_list m_runlists[num_priorities];
tcb_list m_blocked;
kutil::vector<run_queue> m_run_queues;
// TODO: lol a real clock
uint64_t m_clock = 0;
uint64_t m_last_promotion;
kutil::spinlock m_steal_lock;
static scheduler *s_instance;
};


@@ -1,20 +1,19 @@
#include <stddef.h>
#include "kutil/memory.h"
#include "console.h"
#include "cpu.h"
#include "debug.h"
#include "log.h"
#include "msr.h"
#include "scheduler.h"
#include "syscall.h"
extern "C" {
void syscall_invalid(uint64_t call);
void syscall_handler_prelude();
}
uintptr_t syscall_registry[static_cast<unsigned>(syscall::MAX)];
const char * syscall_names[static_cast<unsigned>(syscall::MAX)];
uintptr_t syscall_registry[256] __attribute__((section(".syscall_registry")));
const char * syscall_names[256] __attribute__((section(".syscall_registry")));
static constexpr size_t num_syscalls = sizeof(syscall_registry) / sizeof(syscall_registry[0]);
void
syscall_invalid(uint64_t call)
@@ -23,13 +22,10 @@ syscall_invalid(uint64_t call)
cons->set_color(9);
cons->printf("\nReceived unknown syscall: %02x\n", call);
const unsigned num_calls =
static_cast<unsigned>(syscall::MAX);
cons->printf(" Known syscalls:\n");
cons->printf(" invalid %016lx\n", syscall_invalid);
for (unsigned i = 0; i < num_calls; ++i) {
for (unsigned i = 0; i < num_syscalls; ++i) {
const char *name = syscall_names[i];
uintptr_t handler = syscall_registry[i];
if (name)
@@ -41,33 +37,14 @@ syscall_invalid(uint64_t call)
}
void
syscall_enable()
syscall_initialize()
{
// IA32_STAR - high 32 bits contain k+u CS
// Kernel CS: GDT[1] ring 0 bits[47:32]
// User CS: GDT[3] ring 3 bits[63:48]
uint64_t star =
(((1ull << 3) | 0) << 32) |
(((3ull << 3) | 3) << 48);
wrmsr(msr::ia32_star, star);
// IA32_LSTAR - RIP for syscall
wrmsr(msr::ia32_lstar,
reinterpret_cast<uintptr_t>(&syscall_handler_prelude));
// IA32_FMASK - FLAGS mask inside syscall
wrmsr(msr::ia32_fmask, 0x200);
static constexpr unsigned num_calls =
static_cast<unsigned>(syscall::MAX);
kutil::memset(&syscall_registry, 0, sizeof(syscall_registry));
kutil::memset(&syscall_names, 0, sizeof(syscall_names));
#define SYSCALL(id, name, result, ...) \
syscall_registry[id] = reinterpret_cast<uintptr_t>(syscalls::name); \
syscall_names[id] = #name; \
static_assert( id <= num_calls, "Syscall " #name " has id > syscall::MAX" ); \
log::debug(logs::syscall, "Enabling syscall 0x%02x as " #name , id);
#include "j6/tables/syscalls.inc"
#undef SYSCALL


@@ -10,13 +10,10 @@ enum class syscall : uint64_t
#define SYSCALL(id, name, ...) name = id,
#include "j6/tables/syscalls.inc"
#undef SYSCALL
// Maximum syscall id. If you change this, also change
// MAX_SYSCALLS in syscall.s
MAX = 0x40
};
void syscall_enable();
void syscall_initialize();
extern "C" void syscall_enable();
namespace syscalls
{


@@ -1,17 +1,32 @@
%include "tasking.inc"
; Make sure to keep MAX_SYSCALLS in sync with
; syscall::MAX in syscall.h
MAX_SYSCALLS equ 0x40
; SYSCALL/SYSRET control MSRs
MSR_STAR equ 0xc0000081
MSR_LSTAR equ 0xc0000082
MSR_FMASK equ 0xc0000084
; IA32_STAR - high 32 bits contain k+u CS
; Kernel CS: GDT[1] ring 0 bits[47:32]
; User CS: GDT[3] ring 3 bits[63:48]
STAR_HIGH equ \
(((1 << 3) | 0)) | \
(((3 << 3) | 3) << 16)
; IA32_FMASK - Mask off interrupts in syscalls
FMASK_VAL equ 0x200
extern __counter_syscall_enter
extern __counter_syscall_sysret
extern syscall_registry
extern syscall_invalid
global syscall_handler_prelude
global syscall_handler_prelude:function (syscall_handler_prelude.end - syscall_handler_prelude)
syscall_handler_prelude:
push rbp ; Never executed, fake function prelude
mov rbp, rsp ; to calm down gdb
.real:
swapgs
mov [gs:CPU_DATA.rsp3], rsp
mov rsp, [gs:CPU_DATA.rsp0]
@@ -36,14 +51,7 @@ syscall_handler_prelude:
inc qword [rel __counter_syscall_enter]
cmp rax, MAX_SYSCALLS
jle .ok_syscall
.bad_syscall:
mov rdi, rax
call syscall_invalid
.ok_syscall:
and rax, 0xff ; Only 256 possible syscall values
lea r11, [rel syscall_registry]
mov r11, [r11 + rax * 8]
cmp r11, 0
@@ -52,8 +60,14 @@ syscall_handler_prelude:
call r11
inc qword [rel __counter_syscall_sysret]
jmp kernel_to_user_trampoline
global kernel_to_user_trampoline
.bad_syscall:
mov rdi, rax
call syscall_invalid
.end:
global kernel_to_user_trampoline:function (kernel_to_user_trampoline.end - kernel_to_user_trampoline)
kernel_to_user_trampoline:
pop r15
pop r14
@@ -70,3 +84,28 @@ kernel_to_user_trampoline:
swapgs
o64 sysret
.end:
global syscall_enable:function (syscall_enable.end - syscall_enable)
syscall_enable:
push rbp
mov rbp, rsp
mov rcx, MSR_STAR
mov rax, 0
mov rdx, STAR_HIGH
wrmsr
mov rcx, MSR_LSTAR
mov rax, syscall_handler_prelude.real
mov rdx, rax
shr rdx, 32
wrmsr
mov rcx, MSR_FMASK
mov rax, FMASK_VAL
wrmsr
pop rbp
ret
.end:
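The `STAR_HIGH` packing above can be checked numerically: the kernel CS selector (GDT entry 1, RPL 0) lands in the low 16 bits of the value written to the high half of IA32_STAR, and the user base selector (GDT entry 3, RPL 3) in the next 16. A sketch mirroring that arithmetic:

```cpp
#include <cstdint>

// Mirrors STAR_HIGH from the assembly above.
constexpr uint32_t star_high =
    ((1u << 3) | 0) |        // kernel CS selector: 0x08
    (((3u << 3) | 3) << 16); // user base selector: 0x1b
```

On a 64-bit SYSRET the CPU loads user CS as base+16 and SS as base+8, which is why GDT[3] is used as the base rather than the actual user CS selector.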


@@ -1,6 +1,5 @@
%include "tasking.inc"
extern g_tss
global task_switch
task_switch:
push rbp
@@ -18,7 +17,7 @@ task_switch:
mov [rax + TCB.rsp], rsp
; Copy off saved user rsp
mov rcx, [gs:CPU_DATA.rsp3] ; rcx: curretn task's saved user rsp
mov rcx, [gs:CPU_DATA.rsp3] ; rcx: current task's saved user rsp
mov [rax + TCB.rsp3], rcx
; Install next task's TCB
@@ -31,7 +30,7 @@ task_switch:
mov rcx, [rdi + TCB.rsp0] ; rcx: top of next task's kernel stack
mov [gs:CPU_DATA.rsp0], rcx
lea rdx, [rel g_tss] ; rdx: address of TSS
mov rdx, [gs:CPU_DATA.tss] ; rdx: address of TSS
mov [rdx + TSS.rsp0], rcx
; Update saved user rsp
@@ -67,3 +66,8 @@ initialize_main_thread:
; the entrypoint should already be on the stack
jmp kernel_to_user_trampoline
global _current_gsbase
_current_gsbase:
mov rax, [gs:CPU_DATA.self]
ret


@@ -6,9 +6,17 @@ struc TCB
endstruc
struc CPU_DATA
.self: resq 1
.id: resw 1
.index: resw 1
.reserved resd 1
.rsp0: resq 1
.rsp3: resq 1
.tcb: resq 1
.thread: resq 1
.process: resq 1
.tss: resq 1
.gdt: resq 1
endstruc
struc TSS

src/kernel/tss.cpp Normal file

@@ -0,0 +1,68 @@
#include "kutil/assert.h"
#include "kutil/memory.h"
#include "kutil/no_construct.h"
#include "cpu.h"
#include "kernel_memory.h"
#include "log.h"
#include "objects/vm_area.h"
#include "tss.h"
// The BSP's TSS is initialized _before_ global constructors are called,
// so we don't want it to have a global constructor, lest it overwrite
// the previous initialization.
static kutil::no_construct<TSS> __g_bsp_tss_storage;
TSS &g_bsp_tss = __g_bsp_tss_storage.value;
TSS::TSS()
{
kutil::memset(this, 0, sizeof(TSS));
m_iomap_offset = sizeof(TSS);
}
TSS &
TSS::current()
{
return *current_cpu().tss;
}
uintptr_t &
TSS::ring_stack(unsigned ring)
{
kassert(ring < 3, "Bad ring passed to TSS::ring_stack.");
return m_rsp[ring];
}
uintptr_t &
TSS::ist_stack(unsigned ist)
{
kassert(ist > 0 && ist < 8, "Bad ist passed to TSS::ist_stack.");
return m_ist[ist];
}
void
TSS::create_ist_stacks(uint8_t ist_entries)
{
extern vm_area_guarded &g_kernel_stacks;
using memory::frame_size;
using memory::kernel_stack_pages;
constexpr size_t stack_bytes = kernel_stack_pages * frame_size;
for (unsigned ist = 1; ist < 8; ++ist) {
if (!(ist_entries & (1 << ist))) continue;
// Two zero entries at the top for the null frame
uintptr_t stack_bottom = g_kernel_stacks.get_section();
uintptr_t stack_top = stack_bottom + stack_bytes - 2 * sizeof(uintptr_t);
log::debug(logs::memory, "Created IST stack at %016lx size 0x%lx",
stack_bottom, stack_bytes);
// Pre-realize these stacks, they're no good if they page fault
for (unsigned i = 0; i < kernel_stack_pages; ++i)
*reinterpret_cast<uint64_t*>(stack_bottom + i * frame_size) = 0;
ist_stack(ist) = stack_top;
}
}
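The stack-top computation in `create_ist_stacks` leaves two pointer-sized slots at the top as a null frame, so stack walks terminate cleanly. A sketch of that arithmetic (the sizes here are assumptions for illustration; jsix's real `kernel_stack_pages` may differ):

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t frame_size = 0x1000;    // assumed 4 KiB frames
constexpr size_t kernel_stack_pages = 4; // assumption for illustration

// Usable stack top: end of the region minus a two-slot null frame.
constexpr uintptr_t ist_stack_top(uintptr_t bottom) {
    return bottom + kernel_stack_pages * frame_size - 2 * sizeof(uintptr_t);
}
```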

src/kernel/tss.h Normal file

@@ -0,0 +1,39 @@
#pragma once
/// \file tss.h
/// Definitions relating to the TSS
#include <stdint.h>
/// The 64bit TSS table
class TSS
{
public:
TSS();
/// Get the currently running CPU's TSS.
static TSS & current();
/// Ring stack accessor. Returns a mutable reference.
/// \arg ring Which ring (0-2) to get the stack for
/// \returns A mutable reference to the stack pointer
uintptr_t & ring_stack(unsigned ring);
/// IST stack accessor. Returns a mutable reference.
/// \arg ist Which IST entry (1-7) to get the stack for
/// \returns A mutable reference to the stack pointer
uintptr_t & ist_stack(unsigned ist);
/// Allocate new stacks for the given IST entries.
/// \arg ist_entries A bitmap of used IST entries
void create_ist_stacks(uint8_t ist_entries);
private:
uint32_t m_reserved0;
uintptr_t m_rsp[3]; // stack pointers for CPL 0-2
uintptr_t m_ist[8]; // ist[0] is reserved
uint64_t m_reserved1;
uint16_t m_reserved2;
uint16_t m_iomap_offset;
} __attribute__ ((packed));


@@ -33,7 +33,7 @@ vm_space::vm_space(page_table *p) :
{}
vm_space::vm_space() :
m_kernel(false)
m_kernel {false}
{
m_pml4 = page_table::get_table_page();
page_table *kpml4 = kernel_space().m_pml4;
@@ -163,6 +163,7 @@ void
vm_space::page_in(const vm_area &vma, uintptr_t offset, uintptr_t phys, size_t count)
{
using memory::frame_size;
kutil::scoped_lock lock {m_lock};
uintptr_t base = 0;
if (!find_vma(vma, base))
@@ -190,6 +191,7 @@ void
vm_space::clear(const vm_area &vma, uintptr_t offset, size_t count, bool free)
{
using memory::frame_size;
kutil::scoped_lock lock {m_lock};
uintptr_t base = 0;
if (!find_vma(vma, base))


@@ -4,6 +4,7 @@
#include <stdint.h>
#include "kutil/enum_bitfields.h"
#include "kutil/spinlock.h"
#include "kutil/vector.h"
#include "page_table.h"
@@ -127,6 +128,8 @@ private:
bool operator==(const struct area &o) const;
};
kutil::vector<area> m_areas;
kutil::spinlock m_lock;
};
IS_BITFIELD(vm_space::fault_type);

View File

@@ -1,5 +1,5 @@
#include <stdint.h>
#include "cpu/cpu.h"
#include "cpu/cpu_id.h"
namespace cpu {
@@ -94,4 +94,13 @@ cpu_id::has_feature(feature feat)
return (m_features & (1 << static_cast<uint64_t>(feat))) != 0;
}
uint8_t
cpu_id::local_apic_id() const
{
uint32_t eax_unused;
uint32_t ebx;
__cpuid(1, 0, &eax_unused, &ebx);
return static_cast<uint8_t>(ebx >> 24);
}
}

View File

@@ -1,5 +1,5 @@
#pragma once
/// \file cpu.h Definition of required cpu features for jsix
/// \file cpu_id.h Definition of required cpu features for jsix
#include <stdint.h>
@@ -48,6 +48,9 @@ public:
/// \returns A |regs| struct of the values returned
regs get(uint32_t leaf, uint32_t sub = 0) const;
/// Get the local APIC ID of the current CPU
uint8_t local_apic_id() const;
/// Get the name of the cpu vendor (eg, "GenuineIntel")
inline const char * vendor_id() const { return m_vendor_id; }

View File

@@ -6,6 +6,7 @@
#include <stdint.h>
#include "kutil/bip_buffer.h"
#include "kutil/spinlock.h"
namespace kutil {
namespace log {
@@ -111,6 +112,7 @@ private:
uint8_t m_sequence;
kutil::bip_buffer m_buffer;
kutil::spinlock m_lock;
static logger *s_log;
static const char *s_level_names[static_cast<unsigned>(level::max)];

View File

@@ -1,19 +1,46 @@
/// \file spinlock.h
/// Spinlock types and related definitions
#pragma once
#include <atomic>
namespace kutil {
/// An MCS based spinlock
class spinlock
{
public:
spinlock() : m_lock(false) {}
spinlock();
~spinlock();
inline void enter() { while (!m_lock.exchange(true)); }
inline void leave() { m_lock.store(false); }
/// A node in the wait queue.
struct waiter
{
bool locked;
waiter *next;
};
void acquire(waiter *w);
void release(waiter *w);
private:
std::atomic<bool> m_lock;
waiter *m_lock;
};
/// Scoped lock that owns a spinlock::waiter
class scoped_lock
{
public:
inline scoped_lock(spinlock &lock) : m_lock(lock) {
m_lock.acquire(&m_waiter);
}
inline ~scoped_lock() {
m_lock.release(&m_waiter);
}
private:
spinlock &m_lock;
spinlock::waiter m_waiter;
};
} // namespace kutil

View File

@@ -91,6 +91,8 @@ logger::output(level severity, area_t area, const char *fmt, va_list args)
header->bytes +=
vsnprintf(header->message, sizeof(buffer) - sizeof(entry), fmt, args);
kutil::scoped_lock lock {m_lock};
if (m_immediate) {
buffer[header->bytes] = 0;
m_immediate(area, severity, header->message);
@@ -117,6 +119,8 @@ logger::output(level severity, area_t area, const char *fmt, va_list args)
size_t
logger::get_entry(void *buffer, size_t size)
{
kutil::scoped_lock lock {m_lock};
void *out;
size_t out_size = m_buffer.get_block(&out);
if (out_size == 0 || out == 0)

View File

@@ -0,0 +1,49 @@
#include "kutil/spinlock.h"
namespace kutil {
static constexpr int memorder = __ATOMIC_SEQ_CST;
spinlock::spinlock() : m_lock {nullptr} {}
spinlock::~spinlock() {}
void
spinlock::acquire(waiter *w)
{
w->next = nullptr;
w->locked = true;
// Point the lock at this waiter
waiter *prev = __atomic_exchange_n(&m_lock, w, memorder);
if (prev) {
// If there was a previous waiter, wait for them to
// unblock us
prev->next = w;
while (w->locked) {
asm ("pause");
}
} else {
w->locked = false;
}
}
void
spinlock::release(waiter *w)
{
if (!w->next) {
// If we're still the last waiter, we're done. Copy w first:
// on failure the CAS writes the value it observed back into
// its 'expected' argument, which must not clobber w itself.
waiter *expected = w;
if (__atomic_compare_exchange_n(&m_lock, &expected, nullptr, false, memorder, memorder))
return;
}
// Wait for the subsequent waiter to tell us who they are
while (!w->next) {
asm ("pause");
}
// Unblock the subsequent waiter
w->next->locked = false;
}
} // namespace kutil