Split the functionality of outputting kernel logs out of the UART
driver, and into a new service. The UART driver now registers a console
out channel with the service locator, which the logger service
retrieves, and then enters a loop getting logs from the kernel and
printing them out to the console.
The init process now serves as a service locator for its children,
passing all children a mailbox handle on which it is serving the service
locator protocol.
This commit contains a number of related mailbox issues:
- Add extra parameters to mailbox_respond_receive to allow both the
number of bytes/handles passed in, and the size of the byte/handle
buffers to be passed in.
- Don't delete mailbox messages on receipt if the caller is waiting on
reply
- Correctly pass status messages along with a mailbox::replyer object
- Actually wake the calling thread in the mailbox::replyer dtor
- Make sure to release locks _before_ calling thread::wake() on blocked
threads, as that may cause them to be scheduled ahead of the current
thread.
This commit adds a new flag, j6_channel_block, and a new flags param to
the channel_receive syscall. When the block flag is specified, the
caller will block waiting for data on the channel if the channel is
empty.
Influenced by other libc implementations, I had tried to make memcpy
smarter for differently-sized ranges, but my benchmarks showed no real
change. So change memcpy back to the simple rep movsb implementation.
There was a specialization of util::hash() for uint64_t (which just
returns the integer value), but other integer sizes did not previously
have similar specializations.
Also, two minor semi-related changes to util::map - skip copying empty
nodes when growing the map, and assert that the hash is non-zero when
inserting a new node.
It seems more common to want to sleep for a duration than to sleep to a
specific time. Change the implementation to not make the process look up
the current time first. (Plus, there's no current syscall to do so)
When waking another thread, if that thread has a more urgent priority
than the current thread on the same CPU, send that CPU an IPI to tell it
to run its scheduler.
Related changes in this commit:
- Addition of the ipiSchedule isr (vector 0xe4) and its handler in
isr_handler().
- Change the APIC's send_ipi* functions to take an isr enum and not an
int for their vector parameter
- Thread TCBs now contain a pointer to their current CPU's cpu_data
structure
- Add the maybe_schedule() call to the scheduler, which sends the
schedule IPI to the given thread's CPU only when that CPU is running a
less-urgent thread.
- Move the locking of a run queue lock earlier in schedule() instead of
taking the lock in steal_work() and again in schedule().
The new logger event object for making get_entry() block when no logs
are available was consuming the event's notification even if the thread
did not need to block. This was causing excessive blocking - if multiple
logs had been added since the last call to get_entry(), only one would
be returned, and the next call would block until yet another log was
added.
Now only call event::wait() to block the calling thread if there are no
logs available.
Added profiler.h which defines classes and macros for defining profiler
objects. Also added gdb command j6prof for printing profile data. Added
the syscall_profiles profiler class and auto wrapping of syscalls with
profile objects.
Other changes in this commit:
- Made the gdb command `j6threads` argument for specifying a CPU
optional. Without an argument, it loops through all CPUs.
- Switched to -mcmodel=kernel for kernel code, which makes `call`
instructions easier to follow when debugging / looking at disassembly.
A few changes to the panic handler's display:
- Change rdi and rsi to match other general-purpose registers. (They
were previously blue, matching the stack/base pointer registers.)
- Change the ordering of r8-r15 to be column-major instead of row-major.
I find myself wanting to read down the columns to find the register
I'm looking for, and rax-rdx are already this way.
- Make the flags register yellow, matching the ss and cs registers
- Comment out the call to print_rip() call, as it's only occasionally
helpful and can cause the panic handler to page fault.
When debugging, or in panic callstacks, the BSP idle thread used to be
reported as `_kernel_start`, because it was just the loop at the end of
that assembly function. Now, wrap that loop in a separate symbol called
`bsp_idle` to make it clearer that the cpu is in the idle thread.
The constexpr_hash.h header has fallen out of use. As constexpr hashing
will be used for IDs with the service locator protocol, update these
hashes to be 32 and 64 bit FNV-1a, and replace the _h user-defined
literal with _id (a 64-bit hash), and _id8 (a 32-bit hash folded down to
8 bits). These are now in the util/hash.h header along with the runtime
hash functions.
Three issues that caused build breaks when regenerating the build
directory after the previous commits:
- system.def was including endpoint.def
- syscalls/vm_area.cpp was including j6/signals.h
- util/util.h was missing an include of stddef.h
The new mailbox kernel object API offers asynchronous message-based IPC
for sending data and handles between threads, as opposed to endpoint's
synchronous model.
In preparation for the new mailbox IPC model, blocking threads needed an
overhaul. The `wait_on_*` and `wake_on_*` methods are gone, and the
`block()` and `wake()` calls on threads now pass a value between the
waker and the blocked thread.
As part of this change, the concept of signals on the base kobject class
was removed, along with the queue of blocked threads waiting on any
given object. Signals are now exclusively the domain of the event object
type, and the new wait_queue utility class helps manage waiting threads
when an object does actually need this functionality. In some cases (eg,
logger) an event object is used instead of the lower-level wait_queue.
Since this change has a lot of ramifications, this large commit includes
the following additional changes:
- The j6_object_wait, j6_object_wait_many, and j6_thread_pause syscalls
have been removed.
- The j6_event_clear syscall has been removed - events are "cleared" by
reading them now. A new j6_event_wait syscall has been added to read
events.
- The generic close() method on kobject has been removed.
- The on_no_handles() method on kobject now deletes the object by
default, and needs to be overridden by classes that should not be.
- The j6_system_bind_irq syscall now takes an event handle, as well as a
signal that the IRQ should set on the event. IRQs will cause a waiting
thread to be woken with the appropriate bit set.
- Threads waking due to timeout is simplified to just having a
wake_timeout() accessor that returns a timestamp.
- The new wait_queue uses util::deque, which caused the disovery of two
bugs in the deque implementation: empty deques could still have a
single array allocated and thus return true for empty(), and new
arrays getting allocated were not being zeroed first.
- Exposed a new erase() method on util::map that takes a node pointer
instead of a key, skipping lookup.
After hitting 700 commits last week, I thought it'd be a good time to
update the project status in the Readme. This change also pulls out the
toolchain scripts that moved to jsix-os/toolchain, and updates the
Readme about that as well.
Another issue related to the bug fix in 3be4b10 - if the segment is
non-aligned, the size of the VMA needs to be seg.mem_size + the prologue
size.
Also renamed the variables from prelude/prologue to prologue/epilogue;
it must have been late at night that I wrote that...
The bug from 3be4b10 should not have happened in the first place, as
level_names and area_names should not have been in .data but in .rodata
(or .data.rel.ro in this case), so this change makes them const.
The __init_libc function was already running the .init_array functions,
but was never running the .preinit_array functions. Now it runs them
both, in the correct order.
The drv.uart ELF currently ends up with a segment vaddr starting at
0x215010, which includes .data and .bss. The old loader was mishandling
this in a few ways:
- Not zeroing out the leading 16 bytes, or the trailing .bss section
- Copying the segment data to the start of the page, so it was offset by
-16 bytes.
- Mapping the VMA into the child program at the non-page-aligned
address, which causes all sorts of trouble.
When displaying a set of user regs, also display memory around the
current rip from those user regs. This helps find or rule out memory
corruption errors causing invalid code to run.
This change introduces test_runner, which runs unit or integration tests
and then tells the kernel to exit QEMU with a status code indicating the
number of failed tests.
The test_runner program is not loaded by default. Use the test manifest
to enable it:
./configure --manifest=assets/manifests/test.yml
A number of tests from the old src/tests have moved over. More to come,
as well as moving code from testapp before getting rid of it.
The test.sh script has been repurposed to be a "headless" version of
qemu.sh for running tests, and it exits with the appropriate exit code.
(Though ./qemu.sh gained the ability to exit with the correct exit code
as well.) Exit codes from kernel panics have been updated so that the
bash scripts should exit with code 127.
Adding -lc++ -lc++abi -lunwind to user programs. Also, to support this,
start building using the custom toolchain and its new x86_64-jsix-elf
target triplet.
This has always been on the todo list, but it finally bit me. srv.init
re-uses load addresses when loading multiple programs, and collision
between reused addresses was causing corruption without the TLB flush.
Now srv.init also doesn't increment its load address for sections when
loading a single program either, since unmapping pages actually works.
This commit joins the implementation of exit, _Exit, and abort into a
single translation unit, and also adds atexit, at_quick_exit, and
quick_exit. While this does go against the ideal of all libc functions
being in their own translation unit, their implementations are very
related, and so I think this makes sense.
The ctype functions are now both macros and functions (as allowed by the
spec). They're now implemented in the ctype_b style of glibc, as
libunwind wants __ctype_b_loc to work.
The new "noreturn" option tag on syscall methods causes those methods to
be generated with [[noreturn]] / _Noreturn to avoid clang complaining
that other functions marked noreturn, like exit(), because it can't tell
that the syscall never returns.
This new libc is mostly from scratch, with *printf() functions provided
by Marco Paland and Eyal Rozenberg's tiny printf library, and malloc and
friends provided by dlmalloc.
The great header shift: It didn't make sense to regenerate headers for
the same module for every target (boot/kernel/user) it appeared in. And
now that core headers are out of src/include, this was going to cause
problems for the new libc changes I've been working on. So I went back
to re-design how module headers work.
Pre-requisites:
- A module's public headers should all be available in one location, not
tied to target.
- No accidental includes. Another module should not be able to include
anything (creating an implicit dependency) from a module without
declaring an explicit dependency.
- Exception to the previous: libc's headers should be available to all,
at least for the freestanding headers.
New system:
- A new "public_headers" property of module declares all public headers
that should be available to dependant modules
- All public headers (after possible processing) are installed relative
to build/include/<module> with the same path as their source
- This also means no "include" dir in modules is necessary. If a header
should be included as <j6/types.h> then its source should be
src/libraries/j6/j6/types.h - this caused the most churn as all public
header sources moved one directory up.
- The "includes" property of a module is local only to that module now,
it does not create any implicit public interface
Other changes:
- The bonnibel concept of sources changed: instead of sources having
actions, they themselves are an instance of a (sub)class of Source,
which provides all the necessary information itself.
- Along with the above, rule names were standardized into <type>.<ext>,
eg "compile.cpp" or "parse.cog"
- cog and cogflags variables moved from per-target scope to global scope
in the build files.
- libc gained a more dynamic .module file
The manifest can now supply a list of boot flags, including "test".
Those get turned into the bootproto::args::flags field by the
bootloader. The kernel takes those and uses the test flag to control
enabling syscalls with the new "test" attribute, like the new
test_finish syscall, which lets automated tests call back to the kernel
to shut down the system.
If the thread waiting is the current thread, it should have the result
when it wakes. Might as well return it, so that syscalls that know
they're putting the current thread to sleep can get the result easily.
The "handle" tag on syscall parameters causes syscall_verify.cpp to pass
the resulting object as a obj::handle* instead of directly as an object
pointer. Now the handle tag is supported directly on the syscall itself
as well, causing the "self" object to be passed as a handle pointer.
The event object was missing any syscalls. Furthermore, kobject had an
old object_signal implementation (the syscall itself no longer exists),
which was removed. User code should only be able to set signals on
events.
Two minor issues: scheduler::prune wasn't formatted correctly, and
j6/caps.h was not using the ull prefix when shifting 64 bit numbers.
(It's doubtful an object would get more than 32 caps any time soon, but
better to be correct.)
The cpu.cpp/smp.cpp cleanup out of kernel_main missed an important call:
kernel_main never called smp::ready() to unblock the APs waiting for the
scheduler to be ready.
The return of slab_allocated! Now after the kutil/util/kernel giant
cleanup, this belongs squarely in the kernel, and works much better
there. Slabs are allocated via a bump pointer into a new kernel VMA,
instead of using kalloc() or allocating pages directly.
Adding the util::deque container, implemented with the util::linked_list
of arrays of items.
Also, use the deque for a kobject's blocked thread list to maintain
order instead of a vector using remove_swap().
The new zero_ok flag is similar to optional for reference parameters,
but only in cases where there is a length parameter. If that parameter
is a reference parameter itself and is null, or it is non-null and
contains a non-zero length, or there is no length parameter, then the
main parameter may not be null.
The syscall interface is designed to closely follow the System V amd64
calling convention, so that as much as possible, the call into the
assembly trampoline for the syscall sets up the call correctly. Before
this change, the only exception was using r10 (a caller-saved register
already) to stash the contents of rcx, which gets clobbered by the
syscall instruction. However, this only preserves registers for the
function call, as the stack is switched upon kernel entry, and
additional call frames have been added by the time the syscall gets back
into C++ land.
This change adds a new parameter to the syscall in rbx. Since rbx is
callee-saved, the syscall trampoline pushes it to the stack, and then
puts the address of the stack-passed arguments into rbx. Now that the
syscall implementations are wrapped in the _syscall_verify_* functions,
we can piggy-back on those to also set up the extra arguments from the
user stack.
Now, for any syscall with 7 or more arguments, the verify wrapper takes
the first six arguments normally, then gets a stack pointer (the rbx
value) as its 7th and final argument. It's then the job of the verify
wrapper to get the remaining arguments from that stack pointer and pass
them to the implementation function as normal arguments.
The main point of this change is to allow "global" capabilities defined
on the base object type. The example here is the clone capability on all
objects, which governs the ability to clone a handle.
Related changes in this commit:
- Renamed `kobject` to `object` as far as the syscall interface is
concerned. `kobject` is the cname, but j6_cap_kobject_clone feels
clunky.
- The above change made me realize that the "object <type>" syntax for
specifying object references was also clunky, so now it's "ref <type>"
- Having to add `.object` on everywhere to access objects in
interface.exposes or object.super was cumbersome, so those properties
now return object types directly, instead of ObjectRef.
- syscall_verify.cpp.cog now generates code to check capabilities on
handles if they're specified in the definition, even when not passing
an object to the implementation function.
Added the handle_clone syscall which allows for cloning a handle with
a subset of the original handle's capabilities.
Related changes:
- srv.init now calls handle_clone on its system handle, and load_program
was changed to allow this second system handle to be passed to loaded
programs instead. However, as drv.uart is still a driver AND a log
reader, this new handle is not actually passed yet.
- The definition parser was using a set for the cap list, which meant
the order (and thus values) of caps was not static.
- Some code in objects/handle.h was made more explicit about what bits
meant what.
This change finally adds capabilities to handles. Included changes:
- j6_handle_t is now again 64 bits, with the highest 8 bits being a type
code, and the next highest 24 bits being the capability mask, so that
programs can check type/caps without calling the kernel.
- The definitions grammar now includes a `capabilities [ ]` section on
objects, to list what capabilities are relevant.
- j6/caps.h is auto-generated from object capability lists
- init_libj6 again sets __handle_self and __handle_sys, this is a bit
of a hack.
- A new syscall, j6_handle_list, will return the list of existing
handles owned by the calling process.
- syscall_verify.cpp.cog now actually checks that the needed
capabilities exist on handles before allowing the call.
Channels were unused, and while they were listed in syscalls.def, they
had no syscalls listed in their interface. This change adds them back,
and updates them to the curren syscall style.
The kernel/main.cpp and kernel/memory_bootstrap.cpp files had become
something of a junk drawer. This change cleans them up in the following
ways:
- Most CPU initialization has moved to cpu.cpp, allowing several
functions to be made static and removed from cpu.h
- Multi-core startup code has moved to the new smp.h and smp.cpp, and
ap_startup.s has been renamed smp.s to match.
- run_constructors() has moved to memory_bootstrap.cpp, and all the
functionality of that file has been hidden behind a new public
interface mem::initialize().
- load_init_server() has moved from memory_bootstrap.cpp to main.cpp
Endpoints could previously crash if two senders were concurrently
writing to them, so this change adds a spinlock and protects functions
that touch the signals and blocked list.
Added methods for releasing the lock held in a scoped_lock early, as
well as reacquiring it after. Useful when, eg a thread is about to block
and should not be holding the lock while blocked.
Changes to qemu.sh:
- Now takes the -l/--log option to create the qemu log in jsix.log,
otherwise does not ask qemu to log. (Way faster operations.)
- Remove the debug-isa-exit device when starting with the --debug flag,
so that we can inspect panics
- Changed from 512M to 4G of ram
The compile_commands.json database is used by clangd to understand the
compilation settings of the project. Keep this up to date by making
ninja update it as part of the build, and have it depend on all build
and module files.
If a bip_buffer's A buffer is in the middle of being appended to, but
that append has not yet been committed, and all committed A data has
been read, the buffer would get into a bad state where m_start_r pointed
to the end of the previous A buffer, but that data is no longer
connected to either A or B. So now we make sure to check m_size_r before
considering A "done".
Added a list of currently-set flags on the thread's state. Also stopped
tracebacks and returned instead of erroring out when they threw
gdb.MemoryError.
A spinlock was recenly added to thread wait states, so that they
couldn't get stuck if their wake event happened while setting the wake
state, and cause the thread to deadlock. But the scoped_lock objects
locking that spinlock were being instantiated as temporaries and
immediately being thrown away because I forgot to name them.
The scheduler was accidentally checking the state of the _currently
running_ thread when seeing if it should promote a thread in the ready
queue. So, ie, constant-priority threads would get promoted as long as
some non-constant-priority thread was the currently-running thread.
This commit contains a couple large, interdependent changes:
- In preparation for capability checking, the _syscall_verify_*
functions now load most handles passed in, and verify that they exist
and are of the correct type. Lists and out-handles are not converted
to objects.
- Also in preparation for capability checking, the internal
representation of handles has changed. j6_handle_t is now 32 bits, and
a new j6_cap_t (also 32 bits) is added. Handles of a process are now a
util::map<j6_handle_t, handle> where handle is a new struct containing
the id, capabilities, and object pointer.
- The kernel object definition DSL gained a few changes to support auto
generating the handle -> object conversion in the _syscall_verify_*
functions, mostly knowing the object type, and an optional "cname"
attribute on objects where their names differ from C++ code.
(Specifically vma/vm_area)
- Kernel object code and other code under kernel/objects is now in a new
obj:: namespace, because fuck you <cstdlib> for putting "system" in
the global namespace. Why even have that header then?
- Kernel object types constructed with the construct_handle helper now
have a creation_caps static member to declare what capabilities a
newly created object's handle should have.
Since we have a DSL for specifying syscalls, we can create a verificaton
method for each syscall that can cover most argument (and eventually
capability) verification instead of doing it piecemeal in each syscall
implementation, which can be more error-prone.
Now a new _syscall_verify_* function exists for every syscall, which
calls the real implementation. The syscall table for the syscall handler
now maps to these verify functions.
Other changes:
- Updated the definition grammar to allow options to have a "key:value"
style, to eventually support capabilities.
- Added an "optional" option for parameters that says a syscall will
accept a null value.
- Some bonnibel fixes, as definition file changes weren't always
properly causing updates in the build dep graph.
- The syscall implementation function signatures are no longer exposed
in syscall.h. Also, the unused syscall enum has been removed.
There has been a global clock object for a while now, but scheduler was
never using it, instead still using its simple increment clock. Now it
uses the hpet clock.
First attempt at a UART driver. I'm not sure it's the most stable. Now
that userspace is handling displaying logs, also removed serial and log
output support from the kernel.
The j6threads command shows the current thread, ready threads, and
blocked threads for a given CPU.
To support this, TCB structs gained a pointer to their thread (instead
of trying to do offset magic) and threads gained a pointer to their
creator. Also removed thread::from_tcb() now that the TCB has a pointer.
The logger had a lot of code that was due to it being in kutil instead
of the kernel. Simplifying it a bit here in order to make the uart
logger easier and eventual paring down of the logger easier.
- Log areas are no longer hashes of their names, just an enum
- Log level settings are no longer saved in 4 bits, just a byte
- System signal updating is done in the logger now
This syscall allows a process to give another process access to an
object it has a handle to. The value of the handle as seen in the
receiver process is returned to the caller, so that the caller may
notify the recipient which handle was given.
In places where the "user" state is available, like interrupt handlers,
panic() and kassert() can now take an optional pointer to that user
cpu_state structure, and the panic handler will print that out as well.
The last line of the boot output was always getting cut off by anything
else getting printed to the serial port. Adding two newlines to clear
the cursor of the previous output.
These commands had a number of issues. They weren't evaluating their
arguments (eg, you couldn't use a symbol name instead of a number), and
they weren't explicitly using hex when evaluating numbers, so they were
getting incorrect values when the default radix was not 10.
A structure, system_config, which is dynamically defined by the
definitions/sysconf.yaml config, is now mapped into every user address
space. The kernel fills this with information about itself and the
running machine.
User programs access this through the new j6_sysconf fake syscall in
libj6.
See: Github bug #242
See: [frobozz blog post](https://jsix.dev/posts/frobozz/)
Tags:
This means the kernel now depends on libj6. I've added the macro
definition __j6kernel when building for the kernel target, so I can
remove parts with #ifdefs.
The last commit was a bandaid, but this needed a real fix. Now we create
a .parse_deps.phony file in every module build dir that implicitly
depends on that module's dependencies' .parse_deps.phony files, as well
as order-only depends on any cog-parsed output for that module. This
causes the cog files to get generated first if they never have been, but
still leaves real header dependency tracking to clang's depfile
generation.
Kernel panics previously only stopped the calling core. This commit
re-implements the panic system to allow us to stop all cores on a panic.
Changes include:
- panic now sends an NMI to all cores. This means we can't control the
contents of their registers, so panic information has been moved to a
global struct, and the panicking cpu sets the pointer to that data in
its cpu_data.
- the panic_handler is now set up with mutexes to print appropriately
and only initialize objects once.
- copying _current_gsbase into the panic handler, and #including the
cpprt.cpp file (so that we can define NDEBUG and not have it try to
link the assert code back in)
- making the symbol data pointer in kargs an actual pointer again, not
an address - and carrying that through to the panic handler
- the number of cpus is now saved globally in the kernel as g_num_cpus
While bonnibel already had the concept of a manifest, which controls
what goes into the built disk image, the bootloader still had filenames
hard-coded. Now bonnibel creates a 'jsix_boot.dat' file that tells the
bootloader what it should load.
Changes include:
- Modules have two new fields: location and description. location is
their intended directory on the EFI boot volume. description is
self-explanatory, and is used in log messages.
- New class, boot::bootconfig, implements reading of jsix_boot.dat
- New header, bootproto/bootconfig.h, specifies flags used in the
manifest and jsix_boot.dat
- New python module, bonnibel/manifest.py, encapsulates reading of the
manifest and writing jsix_boot.dat
- Syntax of the manifest changed slightly, including adding flags
- Boot and Kernel target ccflags unified a bit (this was partly due to
trying to get enum_bitfields to work in boot)
- util::counted gained operator+= and new free function util::read<T>
I just can't get enum_bitfields to work in boot. The same code works in
other targets. None of the compiler options should change that. I gave
up, but I'm leaving these changes in because they do actually handle
const better.
This is a rather large commit that is widely focused on cleaning things
out of the 'junk drawer' that is src/include. Most notably, several
things that were put in there because they needed somewhere where both
the kernel, boot, and init could read them have been moved to a new lib,
'bootproto'.
- Moved kernel_args.h and init_args.h to bootproto as kernel.h and
init.h, respectively.
- Moved counted.h and pointer_manipulation.h into util, renaming the
latter to util/pointers.h.
- Created a new src/include/arch for very arch-dependent definitions,
and moved some kernel_memory.h constants like frame size, page table
entry count, etc to arch/amd64/memory.h. Also created arch/memory.h
which detects platform and includes the former.
- Got rid of kernel_memory.h entirely in favor of a new, cog-based
approach. The new definitions/memory_layout.csv lists memory regions
in descending order from the top of memory, their sizes, and whether
they are shared outside the kernel (ie, boot needs to know them). The
new header bootproto/memory.h exposes the addresses of the shared
regions, while the kernel's memory.h gains the start and size of all
the regions. Also renamed the badly-named page-offset area the linear
area.
- The python build scripts got a few new features: the ability to parse
the csv mentioned above in a new memory.py module; the ability to add
dependencies to existing source files (The list of files that I had to
pull out of the main list just to add them with the dependency on
memory.h was getting too large. So I put them back into the sources
list, and added the dependency post-hoc.); and the ability to
reference 'source_root', 'build_root', and 'module_root' variables in
.module files.
- Some utility functions that were in the kernel's memory.h got moved to
util/pointers.h and util/misc.h, and misc.h's byteswap was renamed
byteswap32 to be more specific.
Now that kutil has no kernel-specific code in it anymore, it can
actually be linked to by anything, so I'm renaming it 'util'.
Also, I've tried to unify the way that the system libraries from
src/libraries are #included using <> instead of "".
Other small change: util::bip_buffer got a spinlock to guard against
state corruption.
Continuing moving things out of kutil. The assert as implemented could
only ever work in the kernel, so remaining kutil uses of kassert have
been moved to including standard C assert instead.
Along the way, kassert was broken out into panic::panic and kassert,
and the panic.serial namespace was renamed panicking.
The moving of kernel-only code out of kutil continues. (See 042f061)
This commit moves the following:
- The heap allocator code
- memory.cpp/h which means:
- letting string.h be the right header for memset and memcpy, still
including an implementation of it for the kernel though, since
we're not linking libc to the kernel
- Changing calls to kalloc/kfree to new/delete in kutil containers
that aren't going to be merged into the kernel
- Fixing a problem with stdalign.h from libc, which was causing issues
for type_traits.
Part two of rearranging kutil code. (See 042f061) Removing unused kutil
headers:
I can imagine that avl_tree or slab_allocated may want to be returned to
at some point, but for now they're just clutter.
Part one of a series of code moves. The kutil library is not very
useful, as most of its code is kernel-specific. This was originally for
testing purposes, but that can be achieved in other ways with the
current build system. I find this mostly creates a strange division in
the kernel code.
Instead, I'm going to move everything kernel-specific to actually be in
the kernel, and replace kutil with just 'util' for generic utility code
I want to share.
This commit:
- Moves the logger into the kernel.
- Updates the 'printf' library used from mpaland/printf to
eyalroz/printf and moved it into the kernel, as it's only used by the
logger in kutil.
- Removes some other unused kutil headers from some files, to help
future code rearrangement.
Note that the (now redundant-seeming) log.cpp/h in kernel is currently
still there - these files are more about log output than the logging
system, and will get replaced once I add user-space log output.
Fixing two page allocation issues I came across while debugging:
- Added a spinlock to the page_table static page cache, to avoid
multiple CPUs grabbing the same page. This cache should probably
just be made into per-CPU caches.
- Fixed a bitwise math issue ("1" instead of "1ull" when working with
64-bit numbers) that made it so that pages were never marked as
allocated when allocating 32 or more.
The j6tw (j6 table walk) command to debug page tables was throwing an
exception for an integer that was too big when the default radix was 16,
because it would interpret ints as hex even without the 0x prefix. Now
j6tw explicitly converts to hex and uses the prefix to be explicit.
Updated kassert to be an actual function, and used the __builtin_*
functions for location data. Updated the panic handler protocol to
include sending location data as three more parameters. Updated the
serial panic handler to display that data along with the (optional)
message.
Add a simple ELF loader to srv.init to load and start any module_program
parameters passed from the bootloader. Also creates stacks for newly
created threads.
Also update thread creation in testapp to create stacks.
Stop creating stacks in user space for user threads, that should be done
by the thread's creator. This change adds process and stack_top
arguments to the thread_create syscall, so that threads can be created
in other processes, and given a stack address.
Also included is a fix in add_thunk_user due to the r11/flags change.
THIS COMMIT BREAKS USERSPACE. See subsequent commits for the user side
changes related to this change.
LLVM was updated by system update, and is now unhappy building because
of libc++ threads settings. I'm explicitly turning them off in base.yml
for now until I can focus on a better fix.
Bonnibel used to copy any asset into the root - now a path can be
specified for assets, and debug output goes into .debug so GDB will pick
it up automatically.
The vm_area objects had a number of issues I have been running into when
working on srv.init:
- It was impossible to map a VMA, fill it, unmap it, and hand it to
another process. Unmapping the VMA in this process would cause all the
pages to be freed, since it was removed from its last mapping.
- If a VMA was marked with vm_flag::zero, it would be zeroed out _every
time_ it was mapped into a vm_space.
- The vm_area_open class was leaking its page_tree nodes.
In order to fix these issues, the different VMA types all work slightly
differently now:
- Physical pages allocated for a VMA are now freed when the VMA is
deleted, not when it is unmapped.
- A knock-on effect from the first point is that vm_area_guarded is now
based on vm_area_open, instead of vm_area_untracked. An untracked area
cannot free its pages, since it does not track them.
- The vm_area_open type now deletes its root page_tree node. And
page_tree nodes will delete child nodes or free physical pages in
their dtors.
- vm_flag::zero has been removed; pages will need to be zeroed out
further at a higher level.
- vm_area also no longer deletes itself only on losing its last handle -
it will only self-delete when all handles _and_ mappings are gone.
The page_tree struct was doing a lot of bit manipulation to keep its
base, level, and flags in a single uint64_t. But since this is such a
large structure anyway, another word doesn't change it much and greatly
simplifies both the code and reasoning about it.
Overall, I believe TOML to be a superior configuration format than YAML
in many situations, but it gets ugly quickly when nesting data
structures. The build configs were fine in TOML, but the manifest (and
my future plans for it) got unwieldy. I also did not want different
formats for each kind of configuration on top of also having a custom
DSL for interface definitions, so I've switched all the TOML to YAML.
Also of note is that this change actually adds structure to the manifest
file, which was little more than a CSV previously.
This change adds a new interface DSL for specifying objects (with
methods) and interfaces (that expose objects, and optionally have their
own methods).
Significant changes:
- Add the new scripts/definitions Python module to parse the DSL
- Add the new definitions directory containing DSL definition files
- Use cog to generate syscall-related code in kernel and libj6
- Unify ordering of pointer + length pairs in interfaces
This change moves Bonnibel from a separate project into the jsix tree,
and alters the project configuration to be jsix-specific. (I stopped
using bonnibel for any other projects, so it's far easier to make it a
custom generator for jsix.) The build system now also uses actual python
code in `*.module` files to configure modules instead of TOML files.
Target configs (boot, kernel-mode, user-mode) now moved to separate TOML
files under `configs/` and can inherit from one another.
I'm a tabs guy. I like tabs, it's an elegant way to represent
indentation instead of brute-forcing it. But I have to admit that the
world seems to be going towards spaces, and tooling tends not to play
nice with tabs. So here we go, changing the whole repo to spaces since
I'm getting tired of all the inconsistent formatting.
I was seeing more ignored interrupts when debugging, trying to shorten
their path more. Adding a separate ISR for ignored interrupts was the
shortest path, but that caused some strange instability that I'm not in
the mood to track down.
When setting a radix other than 10, the j6stack command will start
adding the wrong values - make sure to specify a radix for the offset
explicitly.
Also show the frame pointer for each frame with the j6bt command, and
don't throw exceptions if the name of the block is unknown.
Created the framework for using different loadable panic handlers,
loaded by the bootloader. Initial panic handler is panic.serial, which
contains its own serial driver and stacktrace code.
Other related changes:
- Asserts are now based on the NMI handler - panic handlers get
installed as the NMI interrupt handler
- Changed symbol table generation: now use nm's own demangling and
sorting, and include symbol size in the table
- Move the linker script argument out of the kernel target, and into the
kernel's specific module, so that other programs (ie, panic handlers)
can use the kernel target as well
- Some asm changes to boot.s to help GDB see stack frames - but this
might not actually be all that useful
- Renamed user_rsp to just rsp in cpu_state - everything in there is
describing the 'user' state
Resurrect the existing but unused ELF library in libraries/elf, and use
it instead of boot/elf.h for parsing ELF files in the bootloader.
Also adds a const version of offset_iterator called
const_offset_iterator.
Pull this widely-useful header out of kutil, so more things can use it.
Also replace its dependency on <type_traits> by defining our own custom
basic_types.h which contains a subset of the standard's types.
Create a new usermode program, srv.init, and have it read the initial
module_page args sent to it by the bootloader. Doesn't yet do anything
useful but sets up the way for loading the rest of the programs from
srv.init.
Other (mostly) related changes:
- bootloader: The allocator now has a function for allocating init
modules out of a modules_page slab. Also changed how the allocator is
initialized and passes the allocation register and modules_page list
to efi_main().
- bootloader: Expose the simple wstrlen() to the rest of the program
- bootloader: Move check_cpu_supported() to hardware.cpp
- bootloader: Moved program_desc to loader.h and made the loader
functions take it as an argument instead of paths.
- kernel: Rename the system_map_mmio syscall to system_map_phys, and
stop having it default those VMAs to having the vm_flags::mmio flag.
Added a new flag mask, vm_flags::driver_mask, so that drivers can be
allowed to ask for the MMIO flag.
- kernel: Rename load_simple_process() to load_init_server() and got rid
of all the stack setup routines in memory_bootstrap.cpp and task.s
- Fixed formatting in config/debug.toml, undefined __linux and other
linux-specific defines, and got rid of _LIBCPP_HAS_THREAD_API_EXTERNAL
because that's just not true.
Separate the video mode setting out from the console code into video.*,
and remove the framebuffer from the kernel args, moving it to the new
init args format.
A long overdue cleanup of the src/ tree.
- Moved src/drivers to src/user because it contains more than drivers
- Removed src/drivers/ahci because it's unused - will restore it when I
make a real AHCI driver
- Removed unused src/tools
- Moved kernel.ld (the only used file under src/arch) to src/kernel for
now, if/when there's a multi-platform effort that should be figured
out as part of it
- Removed the rest of the unused src/arch
- Renamed 'fb' to 'drv.uefi_fb' and 'nulldrv' to 'testapp'
The bootloader's load_program was reproducing all loadable program
header sections into new pages. Now only do that for sections containing
BSS sections (eg, where file size and mem size do not match).
The bootloader relied on the kernel to know which parts of memory to not
allocate over. For the future shift of having the init process load
other processes instead of the kernel, the bootloader needs a mechanism
to just hand the kernel a list of allocations. This is now done through
the new bootloader allocator, which all allocation goes through. Pool
memory will not be tracked, and so can be overwritten - this means the
args structure and its other structures like programs need to be handled
right away, or copied by the kernel.
- Add bootloader allocator
- Implement a new linked-list based set of pages that act as allocation
registers
- Allow for operator new in the bootloader, which goes through the
global allocator for pool memory
- Split memory map and frame accouting code in the bootloader into
separate memory_map.* files
- Remove many includes that could be replaced by forward declaration in
the bootloader
- Add a new global template type, `counted`, which replaces the
bootloader's `buffer` type, and updated kernel args structure to use it.
- Move bootloader's pointer_manipulation.h to the global include dir
- Make offset_iterator try to return references instead of pointers to
make it more consistent with static array iteration
- Implement a stub atexit() in the bootloader to satisfy clang
The scheduler queue locks could deadlock if the timer fired before the
scoped lock destructor ran. Also, reduce lock contention by letting only
one CPU steal work at a time.
Spinlock release uses __atomic_compare_exchange_n, which overwrites the
`desired` parameter with the actual value when the compare fails. This
was causing releases to always spin when there was lock contention.
Add the object_wait_many syscall to allow programs to wait for signals
on multiple objects at once. Also removed the object argument to
thread::wait_on_signals, which does nothing with it. That information is
saved in the thread being in the object's blocked threads list.
The kernel's file header has not been verified for a long time. This
change returns file verification to the bootloader to make sure the ELF
loaded in position 0 is actually the kernel.
The kernel::args namespace is really the protocol for initializing the
kernel from the bootloader. Also, the header struct in that namespace
isn't actually a header, but a collection of parameters. This change
renames the namespace to kernel::init and the struct to args.
Update the build_sysroot to build an LLVM 11 sysroot, dealing with
LLVM's change in structure. Also re-add peru.yaml for getting a
pre-built sysroot via peru.
See: https://github.com/buildinspace/peru
Update the cpu data to point to the fake kernel process in
cpu_early_init so there can never be a race condition where the current
process may not be set.
The idle threads for the APs have intentionally tiny stacks. Logging is
currently an absolute hog of stack space, so avoid logging on the idle
stacks as much as possible.
Eventually we should instead just reclaim the physical pages used by
most of the stack instead of making them tiny.
Changing the SFINAE/enable_if strategy from a type to a constexpr
function means that it can be defined in other scopes than the functions
themselves, because of function overloading. This lets us put everything
into the kutil::bitfields namespace, and make bitfields out of enums in
other namespaces. Also took the chance to clean up the implementation a
bit.
Now that the other CPUs have been brought up, add support for scheduling
tasks on them. The scheduler now maintains separate ready/blocked lists
per CPU, and CPUs will attempt to balance load via periodic work
stealing.
Other changes as a result of this:
- The device manager no longer creates a local APIC object, but instead
just gathers relevant info from the APCI tables. Each CPU creates its
own local APIC object. This also spurred the APIC timer calibration to
become a static value, as all APICs are assumed to be symmetrical.
- Fixed a bug where the scheduler was popping the current task off of
its ready list, however the current task is never on the ready list
(except the idle task was first set up as both current and ready).
This was causing the lists to get into bad states. Now a task can only
ever be current or in a ready or blocked list.
- Got rid of the unused static process::s_processes list of all
processes, instead of trying to synchronize it via locks.
- Added spinlocks for synchronization to the scheduler and logger
objects.
KVM didn't like setting all the CR4 bits we wanted at once. I suspect
that means real hardware won't either. Delay the setting of the rest of
CR4 until after the CPU is in long mode - only set PAE and PGE from real
mode.
Because the firmware can set the APIC ids to whatever it wants, add a
sequential index to each cpu_data structure that jsix will use for its
main identifier, or for indexing into arrays, etc.
More and more places in the kernel init code are taking addresses from
the bootloader and translating them to offset-mapped addresses. The
bootloader can do this, so it should.
In order to avoid cyclic dependencies in the case of page faults while
bringing up an AP, pre-allocate the cpu_data structure and related CPU
control structures, and pass them to the AP startup code.
This also changes the following:
- cpu_early_init() was split out of cpu_early_init() to allow early
usage of current_cpu() on the BSP before we're ready for the rest of
cpu_init(). (These functions were also renamed to follow the preferred
area_action naming style.)
- isr_handler now zeroes out the IST entry for its vector instead of
trying to increment the IST stack pointer
- the IST stacks are allocated outside of cpu_init, to also help reduce
stack pressue and chance of page faults before APs are ready
- share stack areas between AP idle threads so we only waste 1K per
additional AP for the unused idle stack
Since SYSCALL/SYSRET rely on MSRs to control their function, split out
syscall_enable() into syscall_initialize() and syscall_enable(), the
latter being called on all CPUs. This affects not just syscalls but also
the kernel_to_user_trampoline.
Additionally, do away with the max syscalls, and just make a single page
of syscall pointers and name pointers. Max syscalls was fragile and
needed to be kept in sync in multiple places.
By adding more debug information to the symbols and adding function
frame preludes to the isr handler assembly functions, GDB sees them as
valid locations for stack frames, and can display backtraces through
interrupts.
Added the command "j6tw <pml4> <addr>" which takes any arguments that
evaluate to addresses or integers. It displays the full breakdown of the
page table walk for the given address, with flags.
Update the existing but unused spinlock class to an MCS-style queue
spinlock. This is probably still a WIP but I expect it to see more use
with SMP getting further integrated.
This very large commit is mainly focused on getting the APs started and
to a state where they're waiting to have work scheduled. (Actually
scheduling on them is for another commit.)
To do this, a bunch of major changes were needed:
- Moving a lot of the CPU initialization (including for the BSP) to
init_cpu(). This includes setting up IST stacks, writing MSRs, and
creating the cpu_data structure. For the APs, this also creates and
installs the GDT and TSS, and installs the global IDT.
- Creating the AP startup code, which tries to be as position
independent as possible. It's copied from its location to 0x8000 for
AP startup, and some of it is fixed at that address. The AP startup
code jumps from real mode to long mode with paging in one swell foop.
- Adding limited IPI capability to the lapic class. This will need to
improve.
- Renaming cpu/cpu.* to cpu/cpu_id.* because it was just annoying in GDB
and really isn't anything but cpu_id anymore.
- Moved all the GDT, TSS, and IDT code into their own files and made
them classes instead of a mess of free functions.
- Got rid of bsp_cpu_data everywhere. Now always call the new
current_cpu() to get the current CPU's cpu_data.
- Device manager keeps a list of APIC ids now. This should go somewhere
else eventually, device_manager needs to be refactored away.
- Moved some more things (notably the g_kernel_stacks vma) to the
pre-constructor setup in memory_bootstrap. That whole file is in bad
need of a refactor.
While working on the double-buffering issue, I ripped out this feature
from scrollback and didn't put it back in. Also having main allocate
extra space for the message buffer since calling malloc/free over again
several times was causing malloc to panic. (Which should also separately
be fixed..)
The frame allocator was causing page faults when exhausting the first
(well, last, because it starts from the end) block of free pages. Turns
out it was just incrementing instead of decrementing and thus running
off the end.
Since the kernel will tell us what size of buffer we need for
j6_system_get_log(), malloc the buffer for it from the heap instead of a
fixed array on the stack.
We have one big array of characters in scrollback, with lines of
constant width -- there's no reason to have an array of pointers when
offsets will do.
Instead of always mapping the framebuffer at an arbitrary location, and
so reporting that to userspace, send the physical address so drivers can
call system_map_mmio().
Probably due to old UEFI page tables going away, some systems failed to
load ACPI tables at their physical location. Make sure to translate them
to kernel offset-mapped addresses.
Create a syscall for drivers to be able to ask the kernel for a VMA that
maps a MMIO area. Also expose vm_flags via j6 table style include file
and new flags.h header.
The 0 index was still sometimes not handled properly in the hash table.
Also 0 is sometimes indicative of an error. Let's just start handles
from 1 to avoid those issues.
We started actually running up against the page boundary for kernel
stacks and thus double-faulting on page faults from kernel space. So I
finally added IST stacks. Note that we currently just
increment/decrement the IST entry by a page when we enter the handler to
avoid clobbering on re-entry, but this means:
* these handlers need to be able to operate with only a page of stack
* kernel stacks always have to be >1 pages
* the amount of nesting possible is tied to the kernel stack size.
These seem fine for now, but we should maybe find a way to use something
besides g_kernel_stacks to set up the IST stacks if/when this becomes an
issue.
The previous method of VMA page tracking relied on the VMA always being
mapped at least into one space and just kept track of pages in the
spaces' page tables. This had a number of drawbacks, and the mapper
system was too complex without much benefit.
Now make VMAs themselves keep track of spaces that they're a part of,
and make them responsible for knowing what page goes where. This
simplifies most types of VMA greatly. The new vm_area_open (nee
vm_area_shared, but there is now no reason for most VMAs to be
explicitly shareable) adds a 64-ary radix tree for tracking allocated
pages.
The page_tree cannot yet handle taking pages away, but this isn't
something jsix can do yet anyway.
ktuil::vector can take a static area of memory as its initial memory,
but the case was never handled where it outgrew that memory and had to
reallocate. Steal the high bit from the capacity value to indicate the
current memory should not be kfree()'d. Also added checks in the heap
allocator to make sure pointers look valid.
Pull syscall code out of libc and create new libj6. This should
eventually become a vDSO, but for now it can still be a static lib.
Also renames all the _syscall_* symbol names to j6_*
In preparation for moving things to the init process, move process
loading out of the scheduler. memory_bootstrap now has a
load_simple_process function for mapping an args::program into memory,
and the stack setup has been simplified (though all the initv values are
still being added by the kernel - this needs rework) and normalized to
use the thread::add_thunk_user code path.
Refactored out vm_space::handle_fault's allocation code into a separate
vm_space::allocate function, and reimplemented handle_fault in terms of
the new function.
This is a minor refactor including:
- Removing old commented-out syscall_dispatch function
- Removing IA32_EFER syscall-enable flag setting (this is done by the
bootloader now)
- Moving much logging from inside process/thread syscalls to the 'task'
log area, allowing for turning the 'syscall' area down to info by
default.
This also prompted a change of the process initialization protocol to
allow handles to get typed, and changing to marking them as just
self/other handls. This also means exposing the object type enum to
userspace.
This makes the job of the kernel easier when marking module pages as
used in the frame allocator. This will also help when sending modules
over to the init process.
The previous frame allocator involved a lot of splitting and merging
linked lists and lost all information about frames while they were
allocated. The new allocator is based on an array of descriptor
structures and a bitmap. Each memory map region of allocatable memory
becomes one or more descriptors, each mapping up to 1GiB of physical
memory. The descriptors implement two levels of a bitmap tree, and have
a pointer into the large contiguous bitmap to track individual pages.
To enable setting sections as NX or read-only, the boot program loader
now loads programs as lists of sections, and the kernel args are updated
accordingly. The kernel's loader now just takes a program pointer to
iterate the sections. Also enable NX in IA32_EFER in the bootloader.
There was previously no good way to block log-display tasks, either the
fb driver or the kernel log task. Now the system object has a signal
(j6_signal_system_has_log) that gets asserted when the log is written
to.
Now that the spinwait bug is fixed, the raised time for APIC calibration
can be put back to a lower value. It was previously raised thinking more
time would get a more accurate result -- but accuracy was not the issue.
The endpoint syscalls endpoint_recv and endpoint_sendrecv gained new
local stack variables for calling into possibly blocking endpoint
functions, but the len variable was being initialized to 0 instead of
the incoming buffer size.
The amount of spurious IRQ activity on real hardware severely slows down
the system (minutes per frame instead of frames per second). There's no
reason to unmask all of them from the get-go before they're set up to be
handled.
In order to allow the bootloader to do preliminary CPUID validation
while UEFI is still handling displaying information to the user, split
most of the kernel's CPUID handling into a library to be used by both
kernel and boot.
Several changes were needed to make this work:
- Update the page_table::flags to understand memory caching types
- Set up the PAT MSR to add the WC option
- Make page-offset area mapped as WT
- Add all the MTRR and PAT MSRs, and log the MTRRs for verification
- Add a vm_area flag for write_combining
Scanline size can differ from horizontal resolution in some
framebuffers. This isn't currently handled, but at least log it so
it's visible if this lack of handling is a potential error.
The UEFI spec specifically calls out memory types with the high bit set
as being available for OS loaders' custom use. However, it seems many
UEFI firmware implementations don't handle this well. (Virtualbox, and
the firmware on my Intel NUC and Dell XPS laptop to name a few.)
So sadly since we can't rely on this feature of UEFI in all cases, we
can't use it at all. Instead, treat _all_ memory tagged as EfiLoaderData
as possibly containing data that's been passed to the OS by the
bootloader and don't free it yet.
This will need to be followed up with a change that copies anything we
need to save and frees this memory.
See: https://github.com/kiznit/rainbow-os/blob/master/boot/machine/efi/README.md
The UEFI spec specifically calls out memory types with the high bit set
as being available for OS loaders' custom use. However, it seems many
UEFI firmware implementations don't handle this well. (Virtualbox, and
the firmware on my Intel NUC and Dell XPS laptop to name a few.)
So sadly since we can't rely on this feature of UEFI in all cases, we
can't use it at all. Instead, treat _all_ memory tagged as EfiLoaderData
as possibly containing data that's been passed to the OS by the
bootloader and don't free it yet.
This will need to be followed up with a change that copies anything we
need to save and frees this memory.
See: https://github.com/kiznit/rainbow-os/blob/master/boot/machine/efi/README.md
After exiting UEFI, the bootloader had no way of displaying status to
the user. Now it will display a series of small boxes as a progress bar
along the bottom of the screen if a framebuffer exists. Errors or
warnings during a step will cause that step's box to turn red or orange,
and display bars above it to signal the error code.
This caused the simplification of the error handling system (which was
mostly just calling status_line::fail) and added different types of
status objects.
The endpoint receive syscalls can block and then write to userspace
memory. Since the current address space may be different after blocking,
make sure to only actually write to the user memory after returning to
the syscall handler - pass values that are on the syscall handler stack
deeper into the kernel.
It was not consistent how processes got handles to themselves or their
threads, ending up with double entries. Now make such handles automatic
and expose them with new self_handle() methods.
Using a hash of zero to signal an empty slot doesn't play nice with the
hash_node specialization that uses the key for the hash, when 0 is a
common key.
I thought it would be ok, that it'd just be something to remember. But
then I used 0 as a key anyway, so clearly it was a bad idea.
Now that the ELF loader is known to be working correctly, remove its
extra print statements about section loading to keep the bootloader
output to one screen.
If there's no video, do as we did before, otherwise route logs to the fb
driver instead. (Need to clean this up to just have a log consumer
general interface?) Also added a "scrollback" class to fb driver and
updated the system_get_log syscall.
Moved old PSF parsing code from kernel, and switched to embedding whole
PSF instead of just glyph data to make font class the same code paths
for both cases.
For the fb driver to have a font before loading from disk is available,
create a hard-coded font as a byte array.
To create this, added a new scripts/psf_to_cpp.py which also refactored
out much of scripts/parse_font.py into a new shared module
scripts/fontpsf.py.
Move process init from each process needing a main.s with _start to
crt0.s in libc. Also change to a sysv-like initial stack with a
j6-specific array of initialization values after the program arguments.
Improve the j6stack command in two ways: first, swap the order of the
arguments, as depth is much more likely to be changed. Second, on any
exception accessing memory in the stack, print the exception and
continue instead of failing the whole command.
Create a new framebuffer driver. Also hackily passing frame buffer size
in the list of init handles to all processes and mapping the framebuffer
into all processes. Changed bootloader passing frame buffer as a module
to its own struct.
The console's putc() was looking for CRs and if it saw one, appending an
LF. The output was only writing LFs, though, so instead what's needed is
to look for LFs, and if it sees one, insert a CR first.
On my laptop, a few symbol names raise errors as invalid when attempting
to demangle them. This should not stop the rest of the symbol table from
building.
Previously kutil::vector used size_t as its size type. Since most uses
in the kernel will never approach 4 billion items, default the size type
to uint32_t but make it an optional template argument. This saves 8
bytes per vector, which can be non-trivial with lots of vectors.
In order to implement capabilities on system resources like IRQs so that
they may be restricted to drivers only, add a new 'system' kobject type,
and move the bind_irq functionality from endpoint to system.
Also fix some stack bugs passing the initial handles to a program.
- Add a tag field to all endpoint messages, which doubles as a
notification field
- Add a endpoint_bind_irq syscall to enable an endpoint to listen for
interrupt notifications. This mechanism needs to change.
- Add a temporary copy of the serial port code to nulldrv, and let it
take responsibility for COM2
Remove ELF and initrd loading from the kernel. The bootloader now loads
the initial programs, as it does with the kernel. Other files that were
in the initrd are now on the ESP, and non-program files are just passed
as modules.
The vm_area_shared type of VMA used to track mappings in a separate
array, which doubled information and wasted space. This was no longer
used, and is now removed.
Instead of making every callsite that may make a thread do a blocking
operation also invoke the scheduler, move that logic into thread
implementation - if the thread is blocking and is the current thread,
call schedule().
Related changes in this commit:
- Also make exiting threads and processes call the scheduler when
blocking.
- Threads start blocked, and get automatically added to the scheduler's
blocked list.
The allowed flag was janky and easy to get lost when doing page table
manipulation. All allocation goes throug vm_area now, so 'allowed' can
be dropped.
The rcx register is used by the function call ABI for the 4th argument,
but is also clobbered by SYSCALL to hold the IP. The r10 register is
caller-saved but not part of the ABI, so stash rcx there when crossing
the syscall boundary.
The vm_space allow() functionality was a bit janky; using VMAs for all
regions would be a lot cleaner. To that end, this change:
- Adds a "static array" ctor to kutil::vector for setting the kernel
address space's VMA list. This way a kernel heap VMA can be created
without the heap already existing.
- Splits vm_area into different subclasses depending on desired behavior
- Splits out the concept of vm_mapper which maps vm_areas to vm_spaces,
so that some kinds of VMA can be inherently single-space
- Implements VMA resizing so that userspace can grow allocations.
- Obsolete page_table_indices is removed
Also, the following bugs were fixed:
- kutil::map iterators on empty maps no longer break
- memory::page_count was doing page-align, not page-count
See: Github bug #242
See: [frobozz blog post](https://jsix.dev/posts/frobozz/)
Tags:
Finished the VMA kobject and added the related syscalls. Processes can
now allocate memory! Other changes in this commit:
- stop using g_frame_allocator and add frame_allocator::get()
- make sure to release all handles in the process dtor
- fix kutil::map::iterator never comparing to end()
Added the syscalls/helpers.h file to templatize common kobject syscall
operations. Also moved most syscall implementations to using
process::current() and thread::current() instead of asking the
scheduler.
A check was added in scheduler::prune() which defers deleting threads
and processes if they're the current ones. However, they were still
getting removed from the block list, so they were being leaked.
vm_space no longer relies on page_manager to map pages during a page
fault. Other changes that come with this commit:
- C++ standard has been changed to C++17
- enum bitfield operators became constexpr
- enum bifrield operators can take a mix of ints and enum arguments
- added page table flags enum instead of relying on ints
- remove page_table::unmap_table and page_table::unmap_pages
As mentioned in the last commit, with processes owning spaces, there was
a weird extra space in the "kernel" process that owns the kernel
threads. Now we use that space as the global kernel space, and don't
create a separate one.
vm_space and page_table continue to take over duties from
page_manager:
- creation and deletion of address spaces / pml4s
- cross-address-space copies for endpoints
- taking over pml4 ownership from process
Also fixed the bug where the wrong process was being set in the cpu
data.
To solve: now the kernel process has its own vm_space which is not
g_kernel_space.
This is the first commit of several reworking the VM system. The main
focus is replacing page_manager's global functionality with objects
representing individual VM spaces. The main changes in this commit were:
- Adding the (as yet unused) vm_area object, which will be the main
point of control for programs to allocate or share memory.
- Replace the old vm_space with a new one based on state in its page
tables. They will also be containers for vm_areas.
- vm_space takes over from page_manager as the page fault handler
- Commented out the page walking in memory_bootstrap; I'll probably need
to recreate this functionality, but it was broken as it was.
- Split out the page_table.h implementations from page_manager.cpp into
the new page_table.cpp, updated it, and added page_table::iterator as
well.
Add an iterator type to kutil::map, and allow for each loops. Also
unify the compare() signature expected by sorting containers, and fixes
to adding and sorting in kutil::vector.
Replace linearly-indexed vector of handles with new kutil::map. Also
provide thread::current() and process::current() accessors so that every
syscall doesn't need to include the scheduler to deduce the current
process.
Added the kutil::map collection, an open addressing, robinhood hash map
with backwards shift deletes. Also added hash.h with templated
implementations of the FNV-1a 64 bit hash function, and pulled the log2
function out of the heap_allocator code into the new util.h.
The tests clearly haven't even been built in a while. I've added a
helper script to the project root. Also added a kassert() handler that
will allow tests to catch or fail on asserts.
The buffer cache will try to clean up a returned buffer by unmapping it,
but it may have only been committed without ever getting mapped. Allow
for page_out to handle non-fully mapped regions.
Defer from calling process::thread_exited() in scheduler::prune() if the
thread in question is the currently-executing thread, so that we don't
blow away the stack we're executing on. The next call to prune will pick
up the exited thread.
The "fake" stdout channel is now being passed in the new j6_process_init
structure to processes, and nulldrv now uses it to print a message to
the console.
Multiple changes regarding channels. Mainly channels are now stream
based and can handle partial reads or writes. Channels now use the
kernel buffers area with the related buffer_cache. Added a fake stdout
stream channel and kernel task to read its contents to the screen in
preparation for handing channels as stdin/stdout to processes.
Renamed and genericized the stack_cache class to manage any address
range area of buffers or memory regions. Removed singleton and created
some globals that map to different address regions (kernel stacks,
kernel buffers).
Tags: vmem virtual memeory
When committing an area of vmem and splitting from a larger block, the
block that is returned was set to the unknown state, and the leading
block was incorrectly set to the desired state.
Also remove extra unused thread ctor.
Added the functionality for unreseve and uncommit to vm_space. Also
added the documented but not implemented ability to pass a 0 address to
get any address region with the given size.
Create a script that builds a simple-to-read symbol table from the
output of `nm`. Include running that script over the kernel in the
build, and including that output in the initrd.
Tags: callstack debugging
The bootloader now creates all PD tables in kernel space, so remove
memory_bootstrap.cpp code that dealt with cases where there was no PD
for a given range, and kassert that all PDs exist.
Also deal with the case where the final PD exists, which never committed
the last address range.
We were previously allocating kernel stacks as large objects on the
heap. Now keep track of areas of the kernel stack area that are in use,
and allocate them from there. Also required actually implementing
vm_space::commit(). This still needs more work.
The scheduler singleton was getting constructed twice, once at static
time and then again in main(). Make the singleton a pointer so we only
construct it once.
Process PML4s all point their high (kernelspace) entries at the same set
of PDs, but that copying only happens on process creation. New PDs added
to the kernel PML4 won't get shared among other PML4s. This change
instantiates empty PDs for all PML4 entries in the higher half to make
sure this can't happen.
When moving heap allocation order calculation to use __builtin_clz, we
based that calculation off of the length requested, not the totla length
including the memory header size.
Previously we were using a less efficient loop method of finding the
appropriate block size order, now we use __builtin_clz, which should use
the CPU's clz instruction if it's available.
Heap allocator is a buddy allocator that deals with power-of-two block
sizes. Previously it referred to both a number of bytes and an order of
magnitude as a 'size'. Rename functions and variables referring to
orders of magnitude to 'order'.
Tags: pedantry
Previously we added startup_bonus to work around a segfault happening
when we preemted a newly created process instead of letting it give up
the CPU. Bug is not longer occuring, though that makes me nervous.
Add a system call to assert signals on a given object, only within the
range of user-settable signals. Also made object_wait return
immediately if any of the given signals are already set.
Add the channel object for sending messages between threads. Currently
no good of passing channels to other threads, but global variables in a
single process work. Currently channels are slow and do double copies,
need to refine more.
Tags: ipc
Implement the syscalls necessary for threads to create other threads in
their same process. This involved rearranging a number of syscalls, as
well as implementing object_wait and a basic implementation of a
process' list of handles.
The TCB is always stored at a constant offset within the thread object.
So instead of carrying an extra pointer, just implement thread::from_tcb
to get the thread.
Re-implent the concept of processes as separate from threads, and as a
kobject API object. Also improve scheduler::prune which was doing some
unnecessary iterations.
Calling `spinwait()` was hanging due to improper computation of the
clock rate because justin did a dumb at math. Also the period can be
greater than 1ns, so the clock's units were updated to microseconds.
When loading ELF headers (as opposed to sections), the `file_size` of
the data may be smaller than the `mem_size` of the section to be loaded
in memory. Don't blindly copy `mem_size` bytes from the ELF file, but
instead only `file_size`, then zero the rest.
Tags: elf loader
A few updates to scheduler policies:
* Grant processes a startup timeslice bonus for time spent loading the
process
* Grant processes a small fraction of a timeslice for yielding the CPU
with time left
Create a clock class which can be queried for current timestamp in
nanoseconds. Also implements a simple HPET class as one possible clock
source.
Tags: time
The scheduler again tracks remaining timeslice. Timeslices are bigger,
but once a process uses all of its timeslice, it's demoted and
replenished at the next priority. The scheduler also tracks the last
time a process ran, and promotes it if it's been starved for twice its
full timeslice.
TODO: replenish a small amount of timeslice each time a process is run,
so that more interactive processes keep their priorities.
Instead of many timer interrupts and decrementing a process' remaining
quanta, change to setting a single timer for when a process should be
preempted. If it uses its whole timeslice, demote it. If it uses less
than half before blocking, promote it. Determine timeslice based on
priority as well.
This change also required changing the apic timer interface to be purely
interval (in microseconds) based instead of its previous interval/tick
hybrid.
The bip buffer implementation was not initializing it's write-state
member, which was causing logs to always fail when the logger was not in
immediate mode.
Removing the `allocator.h` file defining the `kutil::allocator`
interface, now that explicit allocators are not being passed around.
Also removed the unused `frame_allocator::raw_allocator` class and
`kutil::invalid_allocator` object.
Tags: memory
Reformat the cpu_features.inc file and add the `in_hv` feature that is
supposedly set by hypervisors when running in emulation. QEMU does not
set it.
Tags: cpuid
Before global constructors were working, I had added a hack to zero out
the static vector member of in `slab_allocated`. Now that they are
working, this can be removed.
Back when the bootloader could only mirror higher half memory linearly
to the lower half, the kernel couldn't be loaded at the beginning of
kernel space because it was unlikely to find enough pages there, so it
was loaded at +1MiB in kernel space. Now the bootloader can map any
pages to the necessary locations, the offset can be removed.
Tags: elf linker virtual memory
Look up the global constructor list that the linker outputs, and run
them all. Required creation of the `kutil::no_construct` template for
objects that are constructed before the global constructors are run.
Also split the `memory_initialize` function into two - one for just
those objects that need to happen before the global ctors, and one
after.
Tags: memory c++
Many kernel objects had to keep a hold of refrences to allocators in
order to pass them on down the call chain. Remove those explicit
refrences and use `operator new`, `operator delete`, and define new
`kalloc` and `kfree`.
Also remove `slab_allocator` and replace it with a new mixin for slab
allocation, `slab_allocated`, that overrides `operator new` and
`operator free` for its subclass.
Remove some no longer used related headers, `buddy_allocator.h` and
`address_manager.h`
Tags: memory
With the new bootloader changes that use clang to directly build an EFI
application, the last piece of GNU-EFI that was used (the linker script)
is no longer necessary.
Thanks for being a great starting point, GNU-EFI!
Eventually the UEFI headers should be brought in from their own project,
but for now, like the other projects under external/, these are being
copied into this repository.
Tags: boot uefi
The last bug getting back to par with master - looks like there might be
threading issues with the logger task at the moment. Turning it off for
now.
Tags: log bug todo
There were a few lingering bugs due to places where 510/511 were
hard-coded as the kernel-space PML4 entries. These are now constants
defined in kernel_memory.h instead.
Tags: boot memory paging
GDB works far better now with QEMU's `-S` flag. No longer does it
complain about changing the target from 32 to 64 bits. Get rid of the
old `waiting` loop and `sleep` call in the GDB config for the kernel.
Tags: debugging
`kutil::vector` was calling `operator delete []` on memory that had not
been allocated with `operator new []`, and so was deleting the wrong
pointer.
Tags: bug memory allocator
The `kernel_main()` had a lot change out from under it with the
bootloader changes. This change brings most of it back in line with the
new kernel arguments.
Tags: pml4 paging boot
Created a new `memory_initialize()` function that uses the new-style
kernel args structure from the new bootloader.
Additionally:
* Fixed a hard-coded interrupt EOI address that didn't work with new
memory locations
* Make the `page_manager::fault_handler()` automatically grant pages
in the kernel heap
Tags: boot page fault
At some point, `init_console()` ended up not being before the first
usage of some `log::` functions, which were jumping off into garbage.
Tags: initialization boot
The bootloader was previously just passing the memory map as a module,
but the memory map is important enough to want a direct pointer, instead
of having to search the modules.
The `kernel_offset` and `page_offset` had already been updated with
previous bootloader changes, but `kernel_max_heap` had not. Also, make
all the constants `constexpr` instead of `static const` that would live
in multiple TUs.
When `page_entry_iterator` became a template and changed its static shifts
translating virtual address to table indices into a for loop, that loop
was getting the indices backwards (ie, PML4E index was really the PTE
index, and so on).
Tags: paging
* When using the non-allocating version of `get_uefi_mappings` the
length was not getting set. Reworked this function.
* Having `build_kernel_mem_map` from `bootloader_main_uefi` caused it to
get an out of date map key. Moved this function into `efi_main` right
before exiting boot services.
Searching for `hlt` in disassembly is an easy way to find the error
handler. This change centralizes it to just one, to better match
disassembly with code.
The page table code had been copied mostly verbatim from the kernel, and
was a dense mess. I abstraced the `page_table_indices` class and the old
loop behavior of `map_in` into a new `page_entry_iterator` class, making
both `map_pages` and the initial offset mapping code much cleaner.
Tags: vmem paging
Set up initial page tables for both the offset-mapped area and the
loaded kernel code and data.
* Got rid of the `loaded_elf` struct - the loader now runs after the
initial PML4 is created and maps the ELF sections itself.
* Copied in the `page_table` and `page_table_indices` from the kernel,
still need to clean this up and extract it into shared code.
* Added `page_table_cache` to the kernel args to pass along free pages
that can be used for initial page tables.
Tags: paging
Clang needs memset, memcpy, etc to exist even in freestanding situations
because it will emit calls to those functions. This commit adds a simple
weak-linked memset implementation.
The `build_kernel_mem_map` function now calls `get_uefi_mappings`
itself, instead of having the efi map passed in. `get_uefi_mappings`
also now takes a `bool allocate` to direct it to actually allocate
the map or not. If it doesn't, it instead just returns the size of
the map and the metadata - which `build_kernel_mem_map` uses to decide
how much space to first allocate for the kernel's map.
Adding this now because I'm sure I'll forget later. Make sure to
annotate the entrypoint function pointer as `__attribute__((sysv_abi))`
so that it's not called via ms abi like the rest of the loader.
Exiting boot services can't actually be done from inside
`bootloader_uefi_main`, because there are objects in that scope that run
code requiring boot services in their destructors.
Also added `support.cpp` with `memcpy` because clang will emit
references to `memcpy` even in freestanding mode.
Added a `debug_break` function to allow for faking breakpoints when
connecting to the bootloader with GDB.
Tags: debug
The `get_mappings()` function was getting too large, and some of its
output is needed by more than just the building of the kernel map. Split
it out into two.
Tags: boot memory
- The old kernel_args structure is now mostly represented as a series of
'modules' or memory ranges, tagged with a type. An arbitrary number
can be passed to the kernel
- Update bootloader to allocate space for the args header and 10 module
descriptors
<?xml version="1.0" encoding="UTF-8" standalone="no"?><!-- Generator: Gravit.io --><svgxmlns="http://www.w3.org/2000/svg"xmlns:xlink="http://www.w3.org/1999/xlink"style="isolation:isolate"viewBox="176.562 356.069 211.11 113"width="211.11pt"height="113pt"><g><g><rectx="176.562"y="356.069"width="211.11"height="113"transform="matrix(1,0,0,1,0,0)"fill="rgb(255,255,255)"/><g><pathd=" M 212.981 372.36 L 219.564 376.16 L 226.147 379.961 L 226.147 387.563 L 226.147 395.164 L 219.564 398.965 L 212.981 402.766 L 206.398 398.965 L 199.815 395.164 L 199.815 387.563 L 199.815 379.961 L 206.398 376.16 L 212.981 372.36 L 212.981 372.36 L 212.981 372.36 Z M 256.292 397.366 L 262.875 401.166 L 269.458 404.967 L 269.458 412.569 L 269.458 420.17 L 262.875 423.971 L 256.292 427.772 L 249.709 423.971 L 243.126 420.17 L 243.126 412.569 L 243.126 404.967 L 249.709 401.166 L 256.292 397.366 L 256.292 397.366 Z M 183.622 387.283 L 205.52 374.64 L 227.418 361.997 L 249.316 374.64 L 271.214 387.283 L 271.214 412.569 L 271.214 437.854 L 249.316 450.497 L 227.418 463.14 L 205.52 450.497 L 183.622 437.854 L 183.622 412.569 L 183.622 387.283 L 183.622 387.283 L 183.622 387.283 Z M 241.855 372.36 L 248.438 376.16 L 255.021 379.961 L 255.021 387.563 L 255.021 395.164 L 248.438 398.965 L 241.855 402.766 L 235.272 398.965 L 228.689 395.164 L 228.689 387.563 L 228.689 379.961 L 235.272 376.16 L 241.855 372.36 Z "fill-rule="evenodd"fill="rgb(49,79,128)"/><pathd=" M 298.642 379.579 L 291.621 379.579 L 291.621 372.558 L 298.642 372.558 L 298.642 379.579 Z M 285.214 446.718 L 285.214 441.452 L 287.32 441.452 L 287.32 441.452 Q 289.339 441.452 290.524 440.092 L 290.524 440.092 L 290.524 440.092 Q 291.708 438.731 291.708 436.625 L 291.708 436.625 L 291.708 387.039 L 298.729 387.039 L 298.729 436.011 L 298.729 436.011 Q 298.729 440.925 295.921 443.822 L 295.921 443.822 L 295.921 443.822 Q 293.113 446.718 288.286 446.718 L 288.286 446.718 L 285.214 446.718 Z M 306.628 432.676 L 306.628 427.41 L 314.088 427.41 L 314.088 427.41 Q 317.862 427.41 319.573 425.347 L 319.573 425.347 L 319.573 425.347 Q 321.285 423.285 321.285 419.95 L 321.285 419.95 L 321.285 419.95 Q 321.285 417.317 319.705 415.474 L 319.705 415.474 L 319.705 415.474 Q 318.125 413.631 314.966 411.174 L 314.966 411.174 L 314.966 411.174 Q 312.245 408.98 310.621 407.356 L 310.621 407.356 L 310.621 407.356 Q 308.998 405.732 307.813 403.319 L 307.813 403.319 L 307.813 403.319 Q 306.628 400.905 306.628 397.746 L 306.628 397.746 L 306.628 397.746 Q 306.628 393.095 309.744 390.067 L 309.744 390.067 L 309.744 390.067 Q 312.859 387.039 318.125 387.039 L 318.125 387.039 L 325.76 387.039 L 325.76 392.305 L 319.441 392.305 L 319.441 392.305 Q 313.21 392.305 313.21 398.185 L 313.21 398.185 L 313.21 398.185 Q 313.21 400.467 314.615 402.134 L 314.615 402.134 L 314.615 402.134 Q 316.019 403.802 319.003 406.083 L 319.003 406.083 L 319.003 406.083 Q 321.723 408.19 323.479 409.901 L 323.479 409.901 L 323.479 409.901 Q 325.234 411.613 326.463 414.202 L 326.463 414.202 L 326.463 414.202 Q 327.691 416.791 327.691 420.301 L 327.691 420.301 L 327.691 420.301 Q 327.691 426.532 324.4 429.604 L 324.4 429.604 L 324.4 429.604 Q 321.109 432.676 315.141 432.676 L 315.141 432.676 L 306.628 432.676 Z M 342.611 379.579 L 335.59 379.579 L 335.59 372.558 L 342.611 372.558 L 342.611 379.579 Z M 342.611 432.676 L 335.59 432.676 L 335.59 387.039 L 342.611 387.039 L 342.611 432.676 Z M 356.126 432.676 L 348.754 432.676 L 361.392 409.77 L 349.632 387.039 L 356.39 387.039 L 364.639 403.187 L 372.977 387.039 L 379.735 387.039 L 367.974 409.77 L 380.612 432.676 L 373.24 432.676 L 364.639 416.001 L 356.126 432.676 Z "fill="rgb(49,79,128)"/></g></g></g></svg>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.