Previously, when adding a new thread, we only ever added it to the
current CPU and relied on work stealing to balance the CPUs. This commit
has the scheduler schedule new tasks round-robin across CPUs in hopes of
having to steal fewer tasks.
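A minimal sketch of round-robin placement, assuming a hypothetical `cpu_picker` helper (the name and the atomic counter are illustrative, not the scheduler's actual code):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: picks the CPU that should receive the next new thread.
class cpu_picker {
public:
    explicit cpu_picker(std::size_t cpu_count) : m_count(cpu_count) {
        assert(cpu_count > 0);
    }

    // Returns CPU indices 0, 1, ..., count-1, 0, 1, ... in order.
    // fetch_add keeps the counter consistent if several CPUs create
    // threads at the same time.
    std::size_t next() {
        return m_next.fetch_add(1, std::memory_order_relaxed) % m_count;
    }

private:
    std::size_t m_count;
    std::atomic<std::size_t> m_next {0};
};
```

Work stealing still covers the cases where round-robin placement leaves the CPUs unevenly loaded, for example when threads have very different run times.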
Also adds the run_queue.prev pointer for debugging what task was just
running on the given CPU.
When waking another thread, if that thread has a more urgent priority
than the current thread on the same CPU, send that CPU an IPI to tell it
to run its scheduler.
Related changes in this commit:
- Addition of the ipiSchedule isr (vector 0xe4) and its handler in
isr_handler().
- Change the APIC's send_ipi* functions to take an isr enum and not an
int for their vector parameter
- Thread TCBs now contain a pointer to their current CPU's cpu_data
structure
- Add the maybe_schedule() call to the scheduler, which sends the
schedule IPI to the given thread's CPU only when that CPU is running a
less-urgent thread.
- Move the locking of a run queue lock earlier in schedule() instead of
taking the lock in steal_work() and again in schedule().
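The maybe_schedule() decision can be sketched roughly as follows. The struct layouts are simplified stand-ins, "lower number means more urgent" is an assumption of this sketch, and a counter replaces the actual IPI send:

```cpp
#include <cassert>

struct cpu_data;

// Hypothetical, simplified TCB.
struct tcb {
    unsigned priority;  // lower value = more urgent (assumption)
    cpu_data *cpu;      // the CPU this thread is assigned to
};

// Hypothetical, simplified per-CPU structure.
struct cpu_data {
    tcb *current;       // thread currently running on this CPU
    unsigned ipis_sent; // stand-in for sending the schedule IPI
};

// Ask the woken thread's CPU to reschedule, but only if that CPU is
// running a less-urgent thread. Returns true if an IPI was "sent".
bool maybe_schedule(tcb &woken) {
    assert(woken.cpu && woken.cpu->current);
    cpu_data &cpu = *woken.cpu;
    if (cpu.current->priority <= woken.priority)
        return false;   // current thread is at least as urgent
    ++cpu.ipis_sent;    // real code would send vector 0xe4 here
    return true;
}
```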
In preparation for the new mailbox IPC model, blocking threads needed an
overhaul. The `wait_on_*` and `wake_on_*` methods are gone, and the
`block()` and `wake()` calls on threads now pass a value between the
waker and the blocked thread.
As part of this change, the concept of signals on the base kobject class
was removed, along with the queue of blocked threads waiting on any
given object. Signals are now exclusively the domain of the event object
type, and the new wait_queue utility class helps manage waiting threads
when an object does actually need this functionality. In some cases (eg,
logger) an event object is used instead of the lower-level wait_queue.
Since this change has a lot of ramifications, this large commit includes
the following additional changes:
- The j6_object_wait, j6_object_wait_many, and j6_thread_pause syscalls
have been removed.
- The j6_event_clear syscall has been removed - events are "cleared" by
reading them now. A new j6_event_wait syscall has been added to read
events.
- The generic close() method on kobject has been removed.
- The on_no_handles() method on kobject now deletes the object by
default, and must be overridden by classes whose objects should not be
deleted.
- The j6_system_bind_irq syscall now takes an event handle, as well as a
signal that the IRQ should set on the event. IRQs will cause a waiting
thread to be woken with the appropriate bit set.
- Timeout-based waking of threads is simplified to just a wake_timeout()
accessor that returns a timestamp.
- The new wait_queue uses util::deque, which led to the discovery of two
bugs in the deque implementation: empty deques could still have a
single array allocated and thus return false for empty(), and new
arrays getting allocated were not being zeroed first.
- Exposed a new erase() method on util::map that takes a node pointer
instead of a key, skipping lookup.
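One way to picture "events are cleared by reading them" is an atomic exchange. This is a sketch only: the `event` class, its method names, and the 64-bit signal width are assumptions, and the blocking half of a real wait is left out:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event object holding a set of signal bits.
class event {
public:
    // Set one or more signal bits.
    void signal(std::uint64_t bits) {
        assert(bits != 0);
        m_signals.fetch_or(bits, std::memory_order_acq_rel);
    }

    // Return the pending signals and clear them in one atomic step, so
    // no separate "clear" call is needed. A real wait would block while
    // this returns 0.
    std::uint64_t read() {
        return m_signals.exchange(0, std::memory_order_acq_rel);
    }

private:
    std::atomic<std::uint64_t> m_signals {0};
};
```

Because the read and the clear are a single operation, a signal that arrives between them cannot be lost.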
Two minor issues: scheduler::prune wasn't formatted correctly, and
j6/caps.h was not using the ull suffix when shifting 64-bit numbers.
(It's doubtful an object would get more than 32 caps any time soon, but
better to be correct.)
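To illustrate the shift issue, here is a sketch with a hypothetical `cap_bit` helper (not the actual contents of j6/caps.h):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that builds a single-bit capability mask.
constexpr std::uint64_t cap_bit(unsigned index) {
    // 1ull is at least 64 bits wide, so shifts up to 63 are well defined.
    // A plain `1 << index` is an int shift, which is undefined for
    // index >= 32 on platforms where int is 32 bits.
    return 1ull << index;
}

static_assert(cap_bit(40) == 0x10000000000ull, "bit 40 must survive the shift");
```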
The scheduler was accidentally checking the state of the _currently
running_ thread when seeing if it should promote a thread in the ready
queue. For example, constant-priority threads would get promoted as long
as some non-constant-priority thread was the currently-running thread.
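A sketch of the corrected check, using a hypothetical simplified thread record. The field names and the "lower value is more urgent" convention are assumptions:

```cpp
#include <cassert>

// Hypothetical, simplified thread record.
struct thread_info {
    bool constant_priority; // never promoted
    unsigned priority;      // lower value = more urgent (assumption)
};

// Decide promotion from the state of the queued thread itself. The bug
// described above amounts to passing the running thread here instead.
bool should_promote(const thread_info &queued) {
    return !queued.constant_priority && queued.priority > 0;
}

// Promote a ready-queue entry if it is eligible. `running` is
// deliberately not consulted.
void maybe_promote(thread_info &queued, const thread_info &running) {
    (void)running;
    if (should_promote(queued))
        --queued.priority;
}
```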
This commit contains a couple large, interdependent changes:
- In preparation for capability checking, the _syscall_verify_*
functions now load most handles passed in, and verify that they exist
and are of the correct type. Lists and out-handles are not converted
to objects.
- Also in preparation for capability checking, the internal
representation of handles has changed. j6_handle_t is now 32 bits, and
a new j6_cap_t (also 32 bits) is added. Handles of a process are now a
util::map<j6_handle_t, handle> where handle is a new struct containing
the id, capabilities, and object pointer.
- The kernel object definition DSL gained a few changes to support auto
generating the handle -> object conversion in the _syscall_verify_*
functions, mostly knowing the object type, and an optional "cname"
attribute on objects where their names differ from C++ code.
(Specifically vma/vm_area)
- Kernel object code and other code under kernel/objects is now in a new
obj:: namespace, because fuck you <cstdlib> for putting "system" in
the global namespace. Why even have that header then?
- Kernel object types constructed with the construct_handle helper now
have a creation_caps static member to declare what capabilities a
newly created object's handle should have.
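The handle representation described above might look roughly like this. std::map stands in for util::map, `kobject` is reduced to a type tag, and `lookup` with its capability test is a hypothetical illustration of where the checking is headed, not the generated _syscall_verify_* code:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using j6_handle_t = std::uint32_t;
using j6_cap_t = std::uint32_t;

struct kobject { int type; };   // stand-in for the kernel object base

struct handle {
    j6_handle_t id;
    j6_cap_t caps;
    kobject *object;
};

// std::map stands in for util::map here.
using handle_table = std::map<j6_handle_t, handle>;

// Return the handle's object only if the handle exists, refers to an
// object of the expected type, and carries every capability in `required`.
kobject * lookup(const handle_table &table, j6_handle_t id,
                 int type, j6_cap_t required) {
    auto it = table.find(id);
    if (it == table.end()) return nullptr;
    const handle &h = it->second;
    if (!h.object || h.object->type != type) return nullptr;
    if ((h.caps & required) != required) return nullptr;
    return h.object;
}
```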
There has been a global clock object for a while now, but scheduler was
never using it, instead still using its simple increment clock. Now it
uses the hpet clock.
First attempt at a UART driver. I'm not sure it's the most stable. Now
that userspace is handling displaying logs, also removed serial and log
output support from the kernel.
The j6threads command shows the current thread, ready threads, and
blocked threads for a given CPU.
To support this, TCB structs gained a pointer to their thread (instead
of trying to do offset magic) and threads gained a pointer to their
creator. Also removed thread::from_tcb() now that the TCB has a pointer.
This means the kernel now depends on libj6. I've added the macro
definition __j6kernel when building for the kernel target, so I can
remove parts with #ifdefs.
This is a rather large commit that is widely focused on cleaning things
out of the 'junk drawer' that is src/include. Most notably, several
things that were put in there because they needed somewhere that the
kernel, boot, and init could all read them have been moved to a new lib,
'bootproto'.
- Moved kernel_args.h and init_args.h to bootproto as kernel.h and
init.h, respectively.
- Moved counted.h and pointer_manipulation.h into util, renaming the
latter to util/pointers.h.
- Created a new src/include/arch for very arch-dependent definitions,
and moved some kernel_memory.h constants like frame size, page table
entry count, etc to arch/amd64/memory.h. Also created arch/memory.h
which detects platform and includes the former.
- Got rid of kernel_memory.h entirely in favor of a new, cog-based
approach. The new definitions/memory_layout.csv lists memory regions
in descending order from the top of memory, their sizes, and whether
they are shared outside the kernel (ie, boot needs to know them). The
new header bootproto/memory.h exposes the addresses of the shared
regions, while the kernel's memory.h gains the start and size of all
the regions. Also renamed the badly-named page-offset area the linear
area.
- The python build scripts got a few new features: the ability to parse
the csv mentioned above in a new memory.py module; the ability to add
dependencies to existing source files (The list of files that I had to
pull out of the main list just to add them with the dependency on
memory.h was getting too large. So I put them back into the sources
list, and added the dependency post-hoc.); and the ability to
reference 'source_root', 'build_root', and 'module_root' variables in
.module files.
- Some utility functions that were in the kernel's memory.h got moved to
util/pointers.h and util/misc.h, and misc.h's byteswap was renamed
byteswap32 to be more specific.
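As an illustration of how a "descending from the top of memory" list turns into addresses, here is a sketch in C++ (the real generator is the Python memory.py module). It assumes the regions are contiguous and that the top of memory is 2^64; the `region` struct and any region names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct region {
    std::string name;     // hypothetical names, not from memory_layout.csv
    std::uint64_t size;
    std::uint64_t start;  // filled in by assign_addresses()
};

// Regions are listed from the top of the address space downward, so each
// one starts `size` bytes below the previous region's start.
void assign_addresses(std::vector<region> &regions) {
    std::uint64_t top = 0; // 2^64 wraps to 0 in 64-bit arithmetic
    for (region &r : regions) {
        assert(r.size > 0);
        top -= r.size;     // unsigned wraparound is well defined
        r.start = top;
    }
}
```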
Now that kutil has no kernel-specific code in it anymore, it can
actually be linked to by anything, so I'm renaming it 'util'.
Also, I've tried to unify the way that the system libraries from
src/libraries are #included using <> instead of "".
Other small change: util::bip_buffer got a spinlock to guard against
state corruption.
Continuing moving things out of kutil. The assert as implemented could
only ever work in the kernel, so remaining kutil uses of kassert have
been moved to including standard C assert instead.
Along the way, kassert was broken out into panic::panic and kassert,
and the panic.serial namespace was renamed panicking.
Stop creating stacks in user space for user threads; that should be done
by the thread's creator. This change adds process and stack_top
arguments to the thread_create syscall, so that threads can be created
in other processes, and given a stack address.
Also included is a fix in add_thunk_user due to the r11/flags change.
THIS COMMIT BREAKS USERSPACE. See subsequent commits for the user side
changes related to this change.
I'm a tabs guy. I like tabs, it's an elegant way to represent
indentation instead of brute-forcing it. But I have to admit that the
world seems to be going towards spaces, and tooling tends not to play
nice with tabs. So here we go, changing the whole repo to spaces since
I'm getting tired of all the inconsistent formatting.
The scheduler queue locks could deadlock if the timer fired before the
scoped lock destructor ran. Also, reduce lock contention by letting only
one CPU steal work at a time.
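Limiting stealing to one CPU at a time could be done with a try-lock style flag. This is a sketch under that assumption; `steal_guard` is a hypothetical name, and the commit may implement the restriction differently:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: at most one CPU may be stealing at any moment.
class steal_guard {
public:
    // Try to become the one stealing CPU. Never spins: a CPU that loses
    // the race just skips stealing this time around.
    bool try_begin() {
        return !m_busy.exchange(true, std::memory_order_acquire);
    }

    // Called by the stealing CPU when it is done.
    void end() {
        assert(m_busy.load(std::memory_order_relaxed));
        m_busy.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> m_busy {false};
};
```

Skipping instead of waiting means a CPU with nothing to do never adds contention on the run queue locks just to find out that another CPU is already stealing.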
The idle threads for the APs have intentionally tiny stacks. Logging is
currently an absolute hog of stack space, so avoid logging on the idle
stacks as much as possible.
Eventually we should instead just reclaim the physical pages used by
most of the stack instead of making them tiny.
Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.
Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
just gathers relevant info from the ACPI tables. Each CPU creates its
own local APIC object. This also spurred the APIC timer calibration to
become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
its ready list, even though the current task is never on the ready list
(except that the idle task was initially set up as both current and
ready).
This was causing the lists to get into bad states. Now a task can only
ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
processes, instead of trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
objects.
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)
To do this, a bunch of major changes were needed:
- Moving a lot of the CPU initialization (including for the BSP) to
init_cpu(). This includes setting up IST stacks, writing MSRs, and
creating the cpu_data structure. For the APs, this also creates and
installs the GDT and TSS, and installs the global IDT.
- Creating the AP startup code, which tries to be as position
independent as possible. It's copied from its location to 0x8000 for
AP startup, and some of it is fixed at that address. The AP startup
code jumps from real mode to long mode with paging in one swell foop.
- Adding limited IPI capability to the lapic class. This will need to
improve.
- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
and really isn't anything but cpu_id anymore.
- Moved all the GDT, TSS, and IDT code into their own files and made
them classes instead of a mess of free functions.
- Got rid of bsp_cpu_data everywhere. Now always call the new
current_cpu() to get the current CPU's cpu_data.
- Device manager keeps a list of APIC ids now. This should go somewhere
else eventually, device_manager needs to be refactored away.
- Moved some more things (notably the g_kernel_stacks vma) to the
pre-constructor setup in memory_bootstrap. That whole file is in bad
need of a refactor.
Instead of always mapping the framebuffer at an arbitrary location, and
so reporting that to userspace, send the physical address so drivers can
call system_map_mmio().
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
This also prompted a change of the process initialization protocol to
allow handles to be typed, and to mark them as just self/other handles.
This also means exposing the object type enum to userspace.
To enable setting sections as NX or read-only, the boot program loader
now loads programs as lists of sections, and the kernel args are updated
accordingly. The kernel's loader now just takes a program pointer to
iterate the sections. Also enable NX in IA32_EFER in the bootloader.
There was previously no good way to block log-display tasks, either the
fb driver or the kernel log task. Now the system object has a signal
(j6_signal_system_has_log) that gets asserted when the log is written
to.
Several changes were needed to make this work:
- Update the page_table::flags to understand memory caching types
- Set up the PAT MSR to add the WC option
- Make page-offset area mapped as WT
- Add all the MTRR and PAT MSRs, and log the MTRRs for verification
- Add a vm_area flag for write_combining
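For reference, adding a WC entry to the PAT comes down to replacing one byte of the IA32_PAT MSR value. The sketch below only computes the new value; which entry this commit actually repurposes is not stated here, so the choice of entry is left to the caller, and the helper name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Memory type encodings used in PAT entries.
enum class pat_type : std::uint8_t {
    uncacheable = 0x00, write_combining = 0x01, write_through = 0x04,
    write_protected = 0x05, write_back = 0x06, uncached_minus = 0x07,
};

constexpr std::uint32_t msr_ia32_pat = 0x277;
constexpr std::uint64_t pat_reset_value = 0x0007040600070406ull;

// Return `pat` with entry `index` (0-7) replaced by `type`. Each entry
// occupies one byte of the 64-bit MSR value.
constexpr std::uint64_t pat_with_entry(std::uint64_t pat, unsigned index, pat_type type) {
    return (pat & ~(0xffull << (index * 8)))
         | (static_cast<std::uint64_t>(type) << (index * 8));
}
```

The result would then be written to `msr_ia32_pat` with wrmsr on every CPU, and page table entries select the entry through their PAT, PCD, and PWT bits.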
It was not consistent how processes got handles to themselves or their
threads, ending up with double entries. Now make such handles automatic
and expose them with new self_handle() methods.
If there's no video, do as we did before, otherwise route logs to the fb
driver instead. (Need to clean this up to just have a log consumer
general interface?) Also added a "scrollback" class to fb driver and
updated the system_get_log syscall.
Move process init from each process needing a main.s with _start to
crt0.s in libc. Also change to a sysv-like initial stack with a
j6-specific array of initialization values after the program arguments.
Create a new framebuffer driver. Also hackily passing frame buffer size
in the list of init handles to all processes and mapping the framebuffer
into all processes. Changed bootloader passing frame buffer as a module
to its own struct.
In order to implement capabilities on system resources like IRQs so that
they may be restricted to drivers only, add a new 'system' kobject type,
and move the bind_irq functionality from endpoint to system.
Also fix some stack bugs passing the initial handles to a program.
- Add a tag field to all endpoint messages, which doubles as a
notification field
- Add an endpoint_bind_irq syscall to enable an endpoint to listen for
interrupt notifications. This mechanism needs to change.
- Add a temporary copy of the serial port code to nulldrv, and let it
take responsibility for COM2
Remove ELF and initrd loading from the kernel. The bootloader now loads
the initial programs, as it does with the kernel. Other files that were
in the initrd are now on the ESP, and non-program files are just passed
as modules.
Instead of making every callsite that may make a thread do a blocking
operation also invoke the scheduler, move that logic into thread
implementation - if the thread is blocking and is the current thread,
call schedule().
Related changes in this commit:
- Also make exiting threads and processes call the scheduler when
blocking.
- Threads start blocked, and get automatically added to the scheduler's
blocked list.
A check was added in scheduler::prune() which defers deleting threads
and processes if they're the current ones. However, they were still
getting removed from the block list, so they were being leaked.
As mentioned in the last commit, with processes owning spaces, there was
a weird extra space in the "kernel" process that owns the kernel
threads. Now we use that space as the global kernel space, and don't
create a separate one.
vm_space and page_table continue to take over duties from
page_manager:
- creation and deletion of address spaces / pml4s
- cross-address-space copies for endpoints
- taking over pml4 ownership from process
Also fixed the bug where the wrong process was being set in the cpu
data.
To solve: now the kernel process has its own vm_space which is not
g_kernel_space.
Defer calling process::thread_exited() in scheduler::prune() if the
thread in question is the currently-executing thread, so that we don't
blow away the stack we're executing on. The next call to prune will pick
up the exited thread.
The "fake" stdout channel is now being passed in the new j6_process_init
structure to processes, and nulldrv now uses it to print a message to
the console.
The scheduler singleton was getting constructed twice, once at static
time and then again in main(). Make the singleton a pointer so we only
construct it once.
Previously we added startup_bonus to work around a segfault happening
when we preempted a newly created process instead of letting it give up
the CPU. Bug is no longer occurring, though that makes me nervous.
Implement the syscalls necessary for threads to create other threads in
their same process. This involved rearranging a number of syscalls, as
well as implementing object_wait and a basic implementation of a
process' list of handles.
The TCB is always stored at a constant offset within the thread object.
So instead of carrying an extra pointer, just implement thread::from_tcb
to get the thread.
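The constant-offset trick can be sketched like this. The struct layouts are hypothetical and simplified; both are standard-layout types, so offsetof is well defined:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified TCB.
struct TCB {
    std::uintptr_t rsp;
    std::uintptr_t rsp0;
};

// Hypothetical, simplified thread object.
struct thread {
    std::uint64_t id;
    TCB tcb;    // embedded at a constant offset
};

// Recover the enclosing thread from a pointer to its embedded TCB by
// subtracting the member's offset.
inline thread * from_tcb(TCB *t) {
    assert(t);
    auto addr = reinterpret_cast<std::uintptr_t>(t) - offsetof(thread, tcb);
    return reinterpret_cast<thread *>(addr);
}
```

This only works for a TCB that really is embedded in a thread object; passing any other TCB pointer yields a bogus thread pointer.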
Re-implement the concept of processes as separate from threads, and as a
kobject API object. Also improve scheduler::prune which was doing some
unnecessary iterations.