Update the cpu data to point to the fake kernel process in
cpu_early_init so there can never be a race condition where the current
process may not be set.
The idle threads for the APs have intentionally tiny stacks. Logging is
currently an absolute hog of stack space, so avoid logging on the idle
stacks as much as possible.
Eventually we should instead just reclaim the physical pages used by
most of the stack instead of making them tiny.
Changing the SFINAE/enable_if strategy from a type to a constexpr
function means that it can be defined in other scopes than the functions
themselves, because of function overloading. This lets us put everything
into the kutil::bitfields namespace, and make bitfields out of enums in
other namespaces. Also took the chance to clean up the implementation a
bit.
Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.
Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
just gathers relevant info from the APCI tables. Each CPU creates its
own local APIC object. This also spurred the APIC timer calibration to
become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
its ready list, however the current task is never on the ready list
(except the idle task was first set up as both current and ready).
This was causing the lists to get into bad states. Now a task can only
ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
processes, instead of trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
objects.
KVM didn't like setting all the CR4 bits we wanted at once. I suspect
that means real hardware won't either. Delay the setting of the rest of
CR4 until after the CPU is in long mode - only set PAE and PGE from real
mode.
Because the firmware can set the APIC ids to whatever it wants, add a
sequential index to each cpu_data structure that jsix will use for its
main identifier, or for indexing into arrays, etc.
More and more places in the kernel init code are taking addresses from
the bootloader and translating them to offset-mapped addresses. The
bootloader can do this, so it should.
In order to avoid cyclic dependencies in the case of page faults while
bringing up an AP, pre-allocate the cpu_data structure and related CPU
control structures, and pass them to the AP startup code.
This also changes the following:
- cpu_early_init() was split out of cpu_early_init() to allow early
usage of current_cpu() on the BSP before we're ready for the rest of
cpu_init(). (These functions were also renamed to follow the preferred
area_action naming style.)
- isr_handler now zeroes out the IST entry for its vector instead of
trying to increment the IST stack pointer
- the IST stacks are allocated outside of cpu_init, to also help reduce
stack pressue and chance of page faults before APs are ready
- share stack areas between AP idle threads so we only waste 1K per
additional AP for the unused idle stack
Since SYSCALL/SYSRET rely on MSRs to control their function, split out
syscall_enable() into syscall_initialize() and syscall_enable(), the
latter being called on all CPUs. This affects not just syscalls but also
the kernel_to_user_trampoline.
Additionally, do away with the max syscalls, and just make a single page
of syscall pointers and name pointers. Max syscalls was fragile and
needed to be kept in sync in multiple places.
By adding more debug information to the symbols and adding function
frame preludes to the isr handler assembly functions, GDB sees them as
valid locations for stack frames, and can display backtraces through
interrupts.
Update the existing but unused spinlock class to an MCS-style queue
spinlock. This is probably still a WIP but I expect it to see more use
with SMP getting further integrated.
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)
To do this, a bunch of major changes were needed:
- Moving a lot of the CPU initialization (including for the BSP) to
init_cpu(). This includes setting up IST stacks, writing MSRs, and
creating the cpu_data structure. For the APs, this also creates and
installs the GDT and TSS, and installs the global IDT.
- Creating the AP startup code, which tries to be as position
independent as possible. It's copied from its location to 0x8000 for
AP startup, and some of it is fixed at that address. The AP startup
code jumps from real mode to long mode with paging in one swell foop.
- Adding limited IPI capability to the lapic class. This will need to
improve.
- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
and really isn't anything but cpu_id anymore.
- Moved all the GDT, TSS, and IDT code into their own files and made
them classes instead of a mess of free functions.
- Got rid of bsp_cpu_data everywhere. Now always call the new
current_cpu() to get the current CPU's cpu_data.
- Device manager keeps a list of APIC ids now. This should go somewhere
else eventually, device_manager needs to be refactored away.
- Moved some more things (notably the g_kernel_stacks vma) to the
pre-constructor setup in memory_bootstrap. That whole file is in bad
need of a refactor.
While working on the double-buffering issue, I ripped out this feature
from scrollback and didn't put it back in. Also having main allocate
extra space for the message buffer since calling malloc/free over again
several times was causing malloc to panic. (Which should also separately
be fixed..)
The frame allocator was causing page faults when exhausting the first
(well, last, because it starts from the end) block of free pages. Turns
out it was just incrementing instead of decrementing and thus running
off the end.
Since the kernel will tell us what size of buffer we need for
j6_system_get_log(), malloc the buffer for it from the heap instead of a
fixed array on the stack.
We have one big array of characters in scrollback, with lines of
constant width -- there's no reason to have an array of pointers when
offsets will do.
Instead of always mapping the framebuffer at an arbitrary location, and
so reporting that to userspace, send the physical address so drivers can
call system_map_mmio().
Probably due to old UEFI page tables going away, some systems failed to
load ACPI tables at their physical location. Make sure to translate them
to kernel offset-mapped addresses.
Create a syscall for drivers to be able to ask the kernel for a VMA that
maps a MMIO area. Also expose vm_flags via j6 table style include file
and new flags.h header.
The 0 index was still sometimes not handled properly in the hash table.
Also 0 is sometimes indicative of an error. Let's just start handles
from 1 to avoid those issues.
We started actually running up against the page boundary for kernel
stacks and thus double-faulting on page faults from kernel space. So I
finally added IST stacks. Note that we currently just
increment/decrement the IST entry by a page when we enter the handler to
avoid clobbering on re-entry, but this means:
* these handlers need to be able to operate with only a page of stack
* kernel stacks always have to be >1 pages
* the amount of nesting possible is tied to the kernel stack size.
These seem fine for now, but we should maybe find a way to use something
besides g_kernel_stacks to set up the IST stacks if/when this becomes an
issue.
The previous method of VMA page tracking relied on the VMA always being
mapped at least into one space and just kept track of pages in the
spaces' page tables. This had a number of drawbacks, and the mapper
system was too complex without much benefit.
Now make VMAs themselves keep track of spaces that they're a part of,
and make them responsible for knowing what page goes where. This
simplifies most types of VMA greatly. The new vm_area_open (nee
vm_area_shared, but there is now no reason for most VMAs to be
explicitly shareable) adds a 64-ary radix tree for tracking allocated
pages.
The page_tree cannot yet handle taking pages away, but this isn't
something jsix can do yet anyway.
ktuil::vector can take a static area of memory as its initial memory,
but the case was never handled where it outgrew that memory and had to
reallocate. Steal the high bit from the capacity value to indicate the
current memory should not be kfree()'d. Also added checks in the heap
allocator to make sure pointers look valid.
Pull syscall code out of libc and create new libj6. This should
eventually become a vDSO, but for now it can still be a static lib.
Also renames all the _syscall_* symbol names to j6_*
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
Refactored out vm_space::handle_fault's allocation code into a separate
vm_space::allocate function, and reimplemented handle_fault in terms of
the new function.
This is a minor refactor including:
- Removing old commented-out syscall_dispatch function
- Removing IA32_EFER syscall-enable flag setting (this is done by the
bootloader now)
- Moving much logging from inside process/thread syscalls to the 'task'
log area, allowing for turning the 'syscall' area down to info by
default.