The 0 index was still sometimes not handled properly in the hash table.
Also 0 is sometimes indicative of an error. Let's just start handles
from 1 to avoid those issues.
We started actually running up against the page boundary for kernel
stacks and thus double-faulting on page faults from kernel space. So I
finally added IST stacks. Note that we currently just
increment/decrement the IST entry by a page when we enter the handler to
avoid clobbering on re-entry, but this means:
* these handlers need to be able to operate with only a page of stack
* kernel stacks always have to be >1 pages
* the amount of nesting possible is tied to the kernel stack size.
These seem fine for now, but we should maybe find a way to use something
besides g_kernel_stacks to set up the IST stacks if/when this becomes an
issue.
The previous method of VMA page tracking relied on the VMA always being
mapped at least into one space and just kept track of pages in the
spaces' page tables. This had a number of drawbacks, and the mapper
system was too complex without much benefit.
Now make VMAs themselves keep track of spaces that they're a part of,
and make them responsible for knowing what page goes where. This
simplifies most types of VMA greatly. The new vm_area_open (nee
vm_area_shared, but there is now no reason for most VMAs to be
explicitly shareable) adds a 64-ary radix tree for tracking allocated
pages.
The page_tree cannot yet handle taking pages away, but this isn't
something jsix can do yet anyway.
ktuil::vector can take a static area of memory as its initial memory,
but the case was never handled where it outgrew that memory and had to
reallocate. Steal the high bit from the capacity value to indicate the
current memory should not be kfree()'d. Also added checks in the heap
allocator to make sure pointers look valid.
Pull syscall code out of libc and create new libj6. This should
eventually become a vDSO, but for now it can still be a static lib.
Also renames all the _syscall_* symbol names to j6_*
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
Refactored out vm_space::handle_fault's allocation code into a separate
vm_space::allocate function, and reimplemented handle_fault in terms of
the new function.
This is a minor refactor including:
- Removing old commented-out syscall_dispatch function
- Removing IA32_EFER syscall-enable flag setting (this is done by the
bootloader now)
- Moving much logging from inside process/thread syscalls to the 'task'
log area, allowing for turning the 'syscall' area down to info by
default.
This also prompted a change of the process initialization protocol to
allow handles to get typed, and changing to marking them as just
self/other handls. This also means exposing the object type enum to
userspace.
This makes the job of the kernel easier when marking module pages as
used in the frame allocator. This will also help when sending modules
over to the init process.
The previous frame allocator involved a lot of splitting and merging
linked lists and lost all information about frames while they were
allocated. The new allocator is based on an array of descriptor
structures and a bitmap. Each memory map region of allocatable memory
becomes one or more descriptors, each mapping up to 1GiB of physical
memory. The descriptors implement two levels of a bitmap tree, and have
a pointer into the large contiguous bitmap to track individual pages.
To enable setting sections as NX or read-only, the boot program loader
now loads programs as lists of sections, and the kernel args are updated
accordingly. The kernel's loader now just takes a program pointer to
iterate the sections. Also enable NX in IA32_EFER in the bootloader.
There was previously no good way to block log-display tasks, either the
fb driver or the kernel log task. Now the system object has a signal
(j6_signal_system_has_log) that gets asserted when the log is written
to.
Now that the spinwait bug is fixed, the raised time for APIC calibration
can be put back to a lower value. It was previously raised thinking more
time would get a more accurate result -- but accuracy was not the issue.
The endpoint syscalls endpoint_recv and endpoint_sendrecv gained new
local stack variables for calling into possibly blocking endpoint
functions, but the len variable was being initialized to 0 instead of
the incoming buffer size.
The amount of spurious IRQ activity on real hardware severely slows down
the system (minutes per frame instead of frames per second). There's no
reason to unmask all of them from the get-go before they're set up to be
handled.
In order to allow the bootloader to do preliminary CPUID validation
while UEFI is still handling displaying information to the user, split
most of the kernel's CPUID handling into a library to be used by both
kernel and boot.
Several changes were needed to make this work:
- Update the page_table::flags to understand memory caching types
- Set up the PAT MSR to add the WC option
- Make page-offset area mapped as WT
- Add all the MTRR and PAT MSRs, and log the MTRRs for verification
- Add a vm_area flag for write_combining
Scanline size can differ from horizontal resolution in some
framebuffers. This isn't currently handled, but at least log it so
it's visible if this lack of handling is a potential error.
The UEFI spec specifically calls out memory types with the high bit set
as being available for OS loaders' custom use. However, it seems many
UEFI firmware implementations don't handle this well. (Virtualbox, and
the firmware on my Intel NUC and Dell XPS laptop to name a few.)
So sadly since we can't rely on this feature of UEFI in all cases, we
can't use it at all. Instead, treat _all_ memory tagged as EfiLoaderData
as possibly containing data that's been passed to the OS by the
bootloader and don't free it yet.
This will need to be followed up with a change that copies anything we
need to save and frees this memory.
See: https://github.com/kiznit/rainbow-os/blob/master/boot/machine/efi/README.md
The UEFI spec specifically calls out memory types with the high bit set
as being available for OS loaders' custom use. However, it seems many
UEFI firmware implementations don't handle this well. (Virtualbox, and
the firmware on my Intel NUC and Dell XPS laptop to name a few.)
So sadly since we can't rely on this feature of UEFI in all cases, we
can't use it at all. Instead, treat _all_ memory tagged as EfiLoaderData
as possibly containing data that's been passed to the OS by the
bootloader and don't free it yet.
This will need to be followed up with a change that copies anything we
need to save and frees this memory.
See: https://github.com/kiznit/rainbow-os/blob/master/boot/machine/efi/README.md
After exiting UEFI, the bootloader had no way of displaying status to
the user. Now it will display a series of small boxes as a progress bar
along the bottom of the screen if a framebuffer exists. Errors or
warnings during a step will cause that step's box to turn red or orange,
and display bars above it to signal the error code.
This caused the simplification of the error handling system (which was
mostly just calling status_line::fail) and added different types of
status objects.
The endpoint receive syscalls can block and then write to userspace
memory. Since the current address space may be different after blocking,
make sure to only actually write to the user memory after returning to
the syscall handler - pass values that are on the syscall handler stack
deeper into the kernel.