The scheduler queue locks could deadlock if the timer fired before the
scoped lock destructor ran. Also, reduce lock contention by letting only
one CPU steal work at a time.
Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.
Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
just gathers relevant info from the APCI tables. Each CPU creates its
own local APIC object. This also spurred the APIC timer calibration to
become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
its ready list, however the current task is never on the ready list
(except the idle task was first set up as both current and ready).
This was causing the lists to get into bad states. Now a task can only
ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
processes, instead of trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
objects.
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)
To do this, a bunch of major changes were needed:
- Moving a lot of the CPU initialization (including for the BSP) to
init_cpu(). This includes setting up IST stacks, writing MSRs, and
creating the cpu_data structure. For the APs, this also creates and
installs the GDT and TSS, and installs the global IDT.
- Creating the AP startup code, which tries to be as position
independent as possible. It's copied from its location to 0x8000 for
AP startup, and some of it is fixed at that address. The AP startup
code jumps from real mode to long mode with paging in one swell foop.
- Adding limited IPI capability to the lapic class. This will need to
improve.
- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
and really isn't anything but cpu_id anymore.
- Moved all the GDT, TSS, and IDT code into their own files and made
them classes instead of a mess of free functions.
- Got rid of bsp_cpu_data everywhere. Now always call the new
current_cpu() to get the current CPU's cpu_data.
- Device manager keeps a list of APIC ids now. This should go somewhere
else eventually, device_manager needs to be refactored away.
- Moved some more things (notably the g_kernel_stacks vma) to the
pre-constructor setup in memory_bootstrap. That whole file is in bad
need of a refactor.
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
To enable setting sections as NX or read-only, the boot program loader
now loads programs as lists of sections, and the kernel args are updated
accordingly. The kernel's loader now just takes a program pointer to
iterate the sections. Also enable NX in IA32_EFER in the bootloader.
Create a new framebuffer driver. Also hackily passing frame buffer size
in the list of init handles to all processes and mapping the framebuffer
into all processes. Changed bootloader passing frame buffer as a module
to its own struct.
Remove ELF and initrd loading from the kernel. The bootloader now loads
the initial programs, as it does with the kernel. Other files that were
in the initrd are now on the ESP, and non-program files are just passed
as modules.
Instead of making every callsite that may make a thread do a blocking
operation also invoke the scheduler, move that logic into thread
implementation - if the thread is blocking and is the current thread,
call schedule().
Related changes in this commit:
- Also make exiting threads and processes call the scheduler when
blocking.
- Threads start blocked, and get automatically added to the scheduler's
blocked list.
vm_space and page_table continue to take over duties from
page_manager:
- creation and deletion of address spaces / pml4s
- cross-address-space copies for endpoints
- taking over pml4 ownership from process
Also fixed the bug where the wrong process was being set in the cpu
data.
To solve: now the kernel process has its own vm_space which is not
g_kernel_space.
The scheduler singleton was getting constructed twice, once at static
time and then again in main(). Make the singleton a pointer so we only
construct it once.
Previously we added startup_bonus to work around a segfault happening
when we preemted a newly created process instead of letting it give up
the CPU. Bug is not longer occuring, though that makes me nervous.
Implement the syscalls necessary for threads to create other threads in
their same process. This involved rearranging a number of syscalls, as
well as implementing object_wait and a basic implementation of a
process' list of handles.
Re-implent the concept of processes as separate from threads, and as a
kobject API object. Also improve scheduler::prune which was doing some
unnecessary iterations.
A few updates to scheduler policies:
* Grant processes a startup timeslice bonus for time spent loading the
process
* Grant processes a small fraction of a timeslice for yielding the CPU
with time left
The scheduler again tracks remaining timeslice. Timeslices are bigger,
but once a process uses all of its timeslice, it's demoted and
replenished at the next priority. The scheduler also tracks the last
time a process ran, and promotes it if it's been starved for twice its
full timeslice.
TODO: replenish a small amount of timeslice each time a process is run,
so that more interactive processes keep their priorities.
Instead of many timer interrupts and decrementing a process' remaining
quanta, change to setting a single timer for when a process should be
preempted. If it uses its whole timeslice, demote it. If it uses less
than half before blocking, promote it. Determine timeslice based on
priority as well.
This change also required changing the apic timer interface to be purely
interval (in microseconds) based instead of its previous interval/tick
hybrid.
Removing the `allocator.h` file defining the `kutil::allocator`
interface, now that explicit allocators are not being passed around.
Also removed the unused `frame_allocator::raw_allocator` class and
`kutil::invalid_allocator` object.
Tags: memory
Many kernel objects had to keep a hold of refrences to allocators in
order to pass them on down the call chain. Remove those explicit
refrences and use `operator new`, `operator delete`, and define new
`kalloc` and `kfree`.
Also remove `slab_allocator` and replace it with a new mixin for slab
allocation, `slab_allocated`, that overrides `operator new` and
`operator free` for its subclass.
Remove some no longer used related headers, `buddy_allocator.h` and
`address_manager.h`
Tags: memory
This commit makes several fundamental changes to memory handling:
- the frame allocator is now only an allocator for free frames, and does
not track used frames.
- the frame allocator now stores its free list inside the free frames
themselves, as a hybrid stack/span model.
- This has the implication that all frames must currently fit within
the offset area.
- kutil has a new allocator interface, which is the only allowed way for
any code outside of src/kernel to allocate. Code under src/kernel
_may_ use new/delete, but should prefer the allocator interface.
- the heap manager has become heap_allocator, which is merely an
implementation of kutil::allocator which doles out sections of a given
address range.
- the heap manager now only writes block headers when necessary,
avoiding page faults until they're actually needed
- page_manager now has a page fault handler, which checks with the
address_manager to see if the address is known, and provides a frame
mapping if it is, allowing heap manager to work with its entire
address size from the start. (Currently 32GiB.)
Previously CPU statue was passed on the stack, but the compiler is
allowed to clobber values passed to it on the stack in the SysV x86 ABI.
So now leave the state on the stack but pass a pointer to it into the
ISR functions.
Processes can now wait on signals/children/time. There is no clock
currently so "time" is just a monotonically increating tick count. Added
a SLEEP syscall to test this waiting/waking.
The syscall/sysret instructions don't swap stacks. This was bad but
passable until syscalls caused the scheduler to run, and scheduling a
task that paused due to interrupt.
Adding a new (hopefully temporary) syscall interrupt `int 0xee` to allow
me to test syscalls without stack issues before I tackle the
syscall/sysret issue.
Also implemented a basic `pause` syscall that causes the calling process
to become unready. Because nothing can wake a process yet, it never
returns.
- Scheduler now has multiple linked_lists of processes at different
priorities
- Process structure improvements
- scheduler::tick() and scheduler::schedule() separation
Now any initrd file is treated like a program image and passed to the
loader to load as a process. Very rudimentary elf loading just allocates
pages, copies sections, and sets the ELF's entrypoint as the RIP to
iretq to.