Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.
Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
just gathers relevant info from the APCI tables. Each CPU creates its
own local APIC object. This also spurred the APIC timer calibration to
become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
its ready list, however the current task is never on the ready list
(except the idle task was first set up as both current and ready).
This was causing the lists to get into bad states. Now a task can only
ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
processes, instead of trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
objects.
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)
To do this, a bunch of major changes were needed:
- Moving a lot of the CPU initialization (including for the BSP) to
init_cpu(). This includes setting up IST stacks, writing MSRs, and
creating the cpu_data structure. For the APs, this also creates and
installs the GDT and TSS, and installs the global IDT.
- Creating the AP startup code, which tries to be as position
independent as possible. It's copied from its location to 0x8000 for
AP startup, and some of it is fixed at that address. The AP startup
code jumps from real mode to long mode with paging in one swell foop.
- Adding limited IPI capability to the lapic class. This will need to
improve.
- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
and really isn't anything but cpu_id anymore.
- Moved all the GDT, TSS, and IDT code into their own files and made
them classes instead of a mess of free functions.
- Got rid of bsp_cpu_data everywhere. Now always call the new
current_cpu() to get the current CPU's cpu_data.
- Device manager keeps a list of APIC ids now. This should go somewhere
else eventually, device_manager needs to be refactored away.
- Moved some more things (notably the g_kernel_stacks vma) to the
pre-constructor setup in memory_bootstrap. That whole file is in bad
need of a refactor.
Instead of always mapping the framebuffer at an arbitrary location, and
so reporting that to userspace, send the physical address so drivers can
call system_map_mmio().
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
This also prompted a change of the process initialization protocol to
allow handles to get typed, and changing to marking them as just
self/other handls. This also means exposing the object type enum to
userspace.
To enable setting sections as NX or read-only, the boot program loader
now loads programs as lists of sections, and the kernel args are updated
accordingly. The kernel's loader now just takes a program pointer to
iterate the sections. Also enable NX in IA32_EFER in the bootloader.
There was previously no good way to block log-display tasks, either the
fb driver or the kernel log task. Now the system object has a signal
(j6_signal_system_has_log) that gets asserted when the log is written
to.
Several changes were needed to make this work:
- Update the page_table::flags to understand memory caching types
- Set up the PAT MSR to add the WC option
- Make page-offset area mapped as WT
- Add all the MTRR and PAT MSRs, and log the MTRRs for verification
- Add a vm_area flag for write_combining
It was not consistent how processes got handles to themselves or their
threads, ending up with double entries. Now make such handles automatic
and expose them with new self_handle() methods.
If there's no video, do as we did before, otherwise route logs to the fb
driver instead. (Need to clean this up to just have a log consumer
general interface?) Also added a "scrollback" class to fb driver and
updated the system_get_log syscall.
Move process init from each process needing a main.s with _start to
crt0.s in libc. Also change to a sysv-like initial stack with a
j6-specific array of initialization values after the program arguments.
Create a new framebuffer driver. Also hackily passing frame buffer size
in the list of init handles to all processes and mapping the framebuffer
into all processes. Changed bootloader passing frame buffer as a module
to its own struct.
In order to implement capabilities on system resources like IRQs so that
they may be restricted to drivers only, add a new 'system' kobject type,
and move the bind_irq functionality from endpoint to system.
Also fix some stack bugs passing the initial handles to a program.
- Add a tag field to all endpoint messages, which doubles as a
notification field
- Add a endpoint_bind_irq syscall to enable an endpoint to listen for
interrupt notifications. This mechanism needs to change.
- Add a temporary copy of the serial port code to nulldrv, and let it
take responsibility for COM2
Remove ELF and initrd loading from the kernel. The bootloader now loads
the initial programs, as it does with the kernel. Other files that were
in the initrd are now on the ESP, and non-program files are just passed
as modules.
Instead of making every callsite that may make a thread do a blocking
operation also invoke the scheduler, move that logic into thread
implementation - if the thread is blocking and is the current thread,
call schedule().
Related changes in this commit:
- Also make exiting threads and processes call the scheduler when
blocking.
- Threads start blocked, and get automatically added to the scheduler's
blocked list.
A check was added in scheduler::prune() which defers deleting threads
and processes if they're the current ones. However, they were still
getting removed from the block list, so they were being leaked.
As mentioned in the last commit, with processes owning spaces, there was
a weird extra space in the "kernel" process that owns the kernel
threads. Now we use that space as the global kernel space, and don't
create a separate one.
vm_space and page_table continue to take over duties from
page_manager:
- creation and deletion of address spaces / pml4s
- cross-address-space copies for endpoints
- taking over pml4 ownership from process
Also fixed the bug where the wrong process was being set in the cpu
data.
To solve: now the kernel process has its own vm_space which is not
g_kernel_space.
Defer from calling process::thread_exited() in scheduler::prune() if the
thread in question is the currently-executing thread, so that we don't
blow away the stack we're executing on. The next call to prune will pick
up the exited thread.
The "fake" stdout channel is now being passed in the new j6_process_init
structure to processes, and nulldrv now uses it to print a message to
the console.
The scheduler singleton was getting constructed twice, once at static
time and then again in main(). Make the singleton a pointer so we only
construct it once.
Previously we added startup_bonus to work around a segfault happening
when we preemted a newly created process instead of letting it give up
the CPU. Bug is not longer occuring, though that makes me nervous.
Implement the syscalls necessary for threads to create other threads in
their same process. This involved rearranging a number of syscalls, as
well as implementing object_wait and a basic implementation of a
process' list of handles.
The TCB is always stored at a constant offset within the thread object.
So instead of carrying an extra pointer, just implement thread::from_tcb
to get the thread.
Re-implent the concept of processes as separate from threads, and as a
kobject API object. Also improve scheduler::prune which was doing some
unnecessary iterations.
A few updates to scheduler policies:
* Grant processes a startup timeslice bonus for time spent loading the
process
* Grant processes a small fraction of a timeslice for yielding the CPU
with time left
Create a clock class which can be queried for current timestamp in
nanoseconds. Also implements a simple HPET class as one possible clock
source.
Tags: time
The scheduler again tracks remaining timeslice. Timeslices are bigger,
but once a process uses all of its timeslice, it's demoted and
replenished at the next priority. The scheduler also tracks the last
time a process ran, and promotes it if it's been starved for twice its
full timeslice.
TODO: replenish a small amount of timeslice each time a process is run,
so that more interactive processes keep their priorities.
Instead of many timer interrupts and decrementing a process' remaining
quanta, change to setting a single timer for when a process should be
preempted. If it uses its whole timeslice, demote it. If it uses less
than half before blocking, promote it. Determine timeslice based on
priority as well.
This change also required changing the apic timer interface to be purely
interval (in microseconds) based instead of its previous interval/tick
hybrid.
Many kernel objects had to keep a hold of refrences to allocators in
order to pass them on down the call chain. Remove those explicit
refrences and use `operator new`, `operator delete`, and define new
`kalloc` and `kfree`.
Also remove `slab_allocator` and replace it with a new mixin for slab
allocation, `slab_allocated`, that overrides `operator new` and
`operator free` for its subclass.
Remove some no longer used related headers, `buddy_allocator.h` and
`address_manager.h`
Tags: memory
This commit makes several fundamental changes to memory handling:
- the frame allocator is now only an allocator for free frames, and does
not track used frames.
- the frame allocator now stores its free list inside the free frames
themselves, as a hybrid stack/span model.
- This has the implication that all frames must currently fit within
the offset area.
- kutil has a new allocator interface, which is the only allowed way for
any code outside of src/kernel to allocate. Code under src/kernel
_may_ use new/delete, but should prefer the allocator interface.
- the heap manager has become heap_allocator, which is merely an
implementation of kutil::allocator which doles out sections of a given
address range.
- the heap manager now only writes block headers when necessary,
avoiding page faults until they're actually needed
- page_manager now has a page fault handler, which checks with the
address_manager to see if the address is known, and provides a frame
mapping if it is, allowing heap manager to work with its entire
address size from the start. (Currently 32GiB.)
There are a lot of under the hood changes here:
- Move syscalls to be a dispatch table, defined by syscalls.inc
- Don't need a full process state (push_all) in syscalls now
- In push_all, define REGS instead of using offsets
- Save TWO stack pointers as well as current saved stack pointer in TCB:
- rsp0 is the base of the kernel stack for interrupts
- rsp3 is the saved user stack from cpu_data
- Update syscall numbers in nulldrv
- Some asm-debugging enhancements to the gdb script
- fork() still not working
- More sensible stack tracer, in C++ (no symbols yet)
- Was forgetting to add null frame to new kernel stacks
- __kernel_assert was using an old vector
- A GP fault will only print its associated table entry