Initial support for XSAVE, but not XSAVEOPT or XSAVEC:
- Enable XSAVE and set up xcr0 for all CPUs
- Allocate XSAVE area for all non-kernel threads
- Call XSAVE and XRSTOR on task switch
This commit does a number of things to start the transition of channels
from kernel to user space:
- Remove channel objects / syscalls from the kernel
- Add mutex type in libj6
- Add condition type in libj6
- Add a `ring` type flag for VMA syscalls to create ring buffers
- Implement a rudimentary shared memory channel using all of the above
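Roughly, those pieces compose into a channel shaped like the sketch
below. This is an illustration, not the real libj6 code: std::mutex and
std::condition_variable stand in for the new mutex and condition types,
and a plain byte array stands in for the ring-mapped VMA.

    // Sketch only: single producer, single consumer shared-memory channel.
    #include <algorithm>
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    class shm_channel
    {
    public:
        shm_channel(unsigned char *ring, size_t size) :
            m_ring {ring}, m_size {size} {}

        void send(const void *data, size_t len) {
            std::unique_lock lock {m_mutex};
            m_can_send.wait(lock, [&] { return m_size - (m_head - m_tail) >= len; });
            copy_in(m_head, data, len);
            m_head += len;
            m_can_recv.notify_one();
        }

        size_t receive(void *out, size_t len) {
            std::unique_lock lock {m_mutex};
            m_can_recv.wait(lock, [&] { return m_head != m_tail; });
            size_t n = std::min(len, m_head - m_tail);
            copy_out(m_tail, out, n);
            m_tail += n;
            m_can_send.notify_one();
            return n;
        }

    private:
        // With a ring-mapped VMA these could each be a single memcpy; here we
        // wrap byte-by-byte to stay self-contained.
        void copy_in(size_t pos, const void *data, size_t len) {
            auto *src = static_cast<const unsigned char*>(data);
            for (size_t i = 0; i < len; ++i)
                m_ring[(pos + i) % m_size] = src[i];
        }
        void copy_out(size_t pos, void *out, size_t len) {
            auto *dst = static_cast<unsigned char*>(out);
            for (size_t i = 0; i < len; ++i)
                dst[i] = m_ring[(pos + i) % m_size];
        }

        unsigned char *m_ring;
        size_t m_size;
        size_t m_head = 0, m_tail = 0;
        std::mutex m_mutex;
        std::condition_variable m_can_send, m_can_recv;
    };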
Add the syscalls j6_futex_wait and j6_futex_wake. Currently marking this
as WIP as they need more testing.
Added to support futexes:
- vm_area and vm_space support for looking up physical address for a
virtual address
- libj6 mutex implementation using futex system calls
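A minimal sketch of what the futex-backed mutex can look like, assuming
j6_futex_wait / j6_futex_wake follow the usual compare-then-block /
wake-n-waiters futex semantics; C++20 atomic wait/notify stands in for
the actual syscalls so the sketch is self-contained.

    #include <atomic>
    #include <cstdint>

    class futex_mutex
    {
    public:
        void lock() {
            uint32_t expected = 0;
            // Fast path: uncontended 0 -> 1 transition, no kernel involvement.
            if (m_state.compare_exchange_strong(expected, 1))
                return;

            // Slow path: mark the lock contended (2) and sleep until woken.
            do {
                if (expected == 2 || m_state.compare_exchange_strong(expected, 2))
                    m_state.wait(2);   // would be a j6_futex_wait on &m_state, 2
                expected = 0;
            } while (!m_state.compare_exchange_strong(expected, 2));
        }

        void unlock() {
            // Only issue a wake if there may be contended waiters.
            if (m_state.exchange(0) == 2)
                m_state.notify_one();  // would be a j6_futex_wake on &m_state
        }

    private:
        // 0 = unlocked, 1 = locked, 2 = locked with possible waiters
        std::atomic<uint32_t> m_state {0};
    };

The uncontended path never enters the kernel; the kernel side only gets
involved for contended waits, which is where the new physical-address
lookup in vm_area/vm_space comes in.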
Previously processes and threads would be deleted by the scheduler. Now,
only delete them based on refcounts - this allows joining an
already-exited thread, for instance.
Previously event tried to read its value in event::wake_observer, which
required jumping through some hoops in how wait_queue was designed, so
that a value wouldn't be wasted if the wait_queue was empty. Now, read
the event value in event::wait after returning from the thread::block
call instead, which simplifies the whole process and lets us simplify
the wait_queue API as well.
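A hedged sketch of the simplified flow; C++20 atomic wait/notify stands
in for thread::block and the wait_queue wake, and the member and method
names are illustrative rather than the kernel's.

    #include <atomic>
    #include <cstdint>

    class event
    {
    public:
        // Signaler side: just accumulate bits and wake any waiter.
        void signal(uint64_t bits) {
            m_value.fetch_or(bits);
            m_value.notify_one();      // kernel: wake via the wait_queue
        }

        // Waiter side: block first, then read-and-clear the value after
        // waking. Nothing is "wasted" if no waiter exists, because the value
        // stays set until some later wait() consumes it.
        uint64_t wait() {
            uint64_t value = m_value.exchange(0);
            while (value == 0) {
                m_value.wait(0);       // kernel: thread::block()
                value = m_value.exchange(0);
            }
            return value;
        }

    private:
        std::atomic<uint64_t> m_value {0};
    };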
For the coming switch to cap/handle ref-counting being the main lifetime
determiner of objects, get rid of self handles for threads and processes
to avoid circular references. Instead, passing 0 to syscalls expecting a
thread or process handle signifies "this process/thread".
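A small illustration of the convention from the kernel's side; the type
and helper names below are stand-ins, not the actual syscall plumbing.

    #include <cstdint>

    using j6_handle_t = uint64_t;   // stand-in; the real width has changed over time
    struct thread {};                // stand-in for the kernel object

    thread g_current_thread;                                  // stand-in for the current thread
    thread * lookup_thread(j6_handle_t) { return nullptr; }   // stub for the real handle table

    // A handle of 0 now means "the calling thread"; anything else goes
    // through the normal handle lookup.
    thread * resolve_thread(j6_handle_t handle)
    {
        return handle == 0 ? &g_current_thread : lookup_thread(handle);
    }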
The boolean logic for determining how many consecutive pages were
available was inverted.
Also add some memory debugging tools used to track down the recent
memory bugs:
- A direct debugcon::write call, for logging to the debugcon without
the possible page faults that come with the logger.
- A new vm_space::lock call, to make a page not fillable in memory
debugging mode
- A mode in heap_allocator to always alloc new pages, and lock freed
pages to cause page faults for use-after-free bugs.
- Logging in kobject on creation and deletion
- Page table cache structs are now page-sized for easy pointer math
Add a version of thread::block() that takes a lock and releases it after
marking the thread as unready, but before calling the scheduler.
Use this version of block() in the wait_queue.
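A sketch of the ordering this enables, with std::mutex standing in for
the kernel spinlock and a stub scheduler; the point is only that the
thread is marked unready before the caller's lock is released.

    #include <mutex>

    using spinlock = std::mutex;      // stand-in for the kernel's spinlock
    inline void run_scheduler() {}     // stub standing in for the scheduler

    struct thread
    {
        bool ready = true;

        // Mark the thread unready while the caller's lock is still held, only
        // then release the lock and enter the scheduler. This closes the
        // window where a waker could run between "drop the lock" and "mark
        // unready" and have its wake lost.
        void block(spinlock &lock) {
            ready = false;
            lock.unlock();
            run_scheduler();
        }
    };

wait_queue::wait() can then add the thread to its list and call block()
while still holding the queue's own lock.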
Split out different constants for scheduler::idle_priority and
scheduler::max_priority, so that threads never fall to the same priority
level as the idle threads.
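A rough sketch of the split; the concrete values, and the assumption
that a larger number means less urgent, are illustrative rather than
taken from the kernel.

    #include <cstdint>

    namespace scheduler {
        constexpr uint8_t max_priority  = 15;                // lowest level a normal thread reaches
        constexpr uint8_t idle_priority = max_priority + 1;  // reserved for the idle threads
    }

    // Priority decay clamps at max_priority, so a busy thread can never share
    // a run-queue level with the idle threads.
    inline uint8_t decay(uint8_t priority) {
        return priority < scheduler::max_priority ? priority + 1
                                                  : scheduler::max_priority;
    }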
Previously process::exit() was going through the threads in order
calling thread::exit() - which blocks and never wakes if called on the
current thread. Since the current thread likely belongs to the process
which is exiting, and the current thread wasn't guaranteed to be last in
the list, this could leave threads not cleaned up.
Worse, in every case this left the m_threads_lock held forever on exit,
keeping the scheduler from ever finishing a call to
process::thread_exited() on its threads.
Another spot I meant to go back and clean up with a lock - found it when
a process with threads running on two CPUs exited, and the scheduler
tried to delete the process on both CPUs.
This commit fixes the mailbox tests in test_runner, which broke when
mailbox was simplified to just use call and respond. It also fixes a
bug the tests uncovered: if the mailbox is closed while a caller is in
the reply map (ie, when its call data has been passed on to a thread
calling respond, but has yet to be responded to itself), that caller is
never awoken.
This commit changes add_user_thunk to point to a new routine,
initialize_user_cpu, which sets all the registers that were previously
unset when starting a new user thread. The values for rdi and rsi are
popped off the initial stack values that add_user_thunk sets up, so that
user thread procs can take up to two arguments.
To support this, j6_thread_create gained two new arguments, which are
passed on to the thread.
This also let me finally get rid of the hack of passing an argument in
rsp when starting init.
In prep for the coming change to keep log entries as a true ring buffer,
move the log buffer from bss into its own memory section.
Related changes in this commit:
- New vm_area_ring, which maps a set of pages twice to allow easy linear
reading of data from a ring buffer when it wraps around the end (see the
sketch after this list).
- logger_init() went away, and the logger ctor is called from
mem::initialize()
- Instead of an event object, the logger just has a bare wait_queue
- util::counted::from template type changed slightly to allow easy
conversion from an intptr_t as well as a pointer
- Previously added debugcon_logger code removed - this will be added in
a separate file in a followup commit
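To illustrate the vm_area_ring idea mentioned above: because the same
pages appear twice back to back, a read that wraps past the end of the
buffer is still one linear copy. The demo below simulates the second
mapping by mirroring writes; the real mapping gets the mirror for free.

    #include <cassert>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    int main()
    {
        constexpr size_t size = 16;
        std::vector<char> buf(size * 2);     // stands in for the ring VMA

        auto put = [&](size_t pos, char c) {
            buf[pos % size] = c;
            buf[pos % size + size] = c;      // the mirror a ring mapping gives for free
        };

        // Write 8 bytes starting near the end of the buffer so they wrap.
        const char msg[] = "wrapped!";
        size_t head = 12;
        for (size_t i = 0; i < 8; ++i)
            put(head + i, msg[i]);

        // Because the pages appear twice, the wrapped entry can be read with
        // one linear copy -- no split into "tail part" and "head part".
        char out[9] = {};
        std::memcpy(out, &buf[head % size], 8);
        assert(std::memcmp(out, msg, 8) == 0);
    }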
In preparation for futexes, I wanted to make kobjects a bit lighter.
By storing 32 bits of object id and 8 bits of type (and no longer
ending the class with a ushort handle count, which left most kobjects
with a bunch of pad bytes), the kobject class data now fits in a single
8-byte word.
Also from this, logs that mention threads or processes now print just
2 bytes of the object id for each instead of the full koid, which makes
following the logs much easier.
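A hedged sketch of the packing; the id and type widths follow the text
above, but the exact layout and what the remaining bits hold are
assumptions.

    #include <cstdint>

    struct kobject_header
    {
        uint64_t id       : 32;   // object id (koid)
        uint64_t type     : 8;    // object type code
        uint64_t handles  : 16;   // handle refcount (assumed placement)
        uint64_t reserved : 8;

        // The short form used in logs: just the low 2 bytes of the object id.
        uint16_t short_id() const { return static_cast<uint16_t>(id); }
    };
    static_assert(sizeof(kobject_header) == 8, "header fits in one word");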
The status code from thread exit had too many issues (eg, how does it
relate to the process exit code? what happens when different threads
exit with different exit codes?) and not enough value, so I'm getting
rid of it.
A number of simplifications of mailboxes now that the interface is much
simpler, and synchronous.
* call and respond can now only transfer one handle at a time
* mailbox objects got rid of the message queue, and just have
wait_queues of blocked threads, and a reply_to map.
* threads now have a message_data struct on them for use by mailboxes
Instead of handles / capabilities having numeric ids that are only valid
for the owning process, they are now global in a system capabilities
table. This will allow for specifying capabilities in IPC that doesn't
need to be kernel-controlled.
Processes will still need to be granted access to given capabilities,
but that can become a simpler system call than the current method of
sending them through mailbox messages (and worse, having to translate
every one into a new capability like was the case before). In order to
track which handles a process has access to, a new node_set based on
node_map allows for efficient storage and lookup of handles.
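A rough sketch of that shape, with standard containers standing in for
node_map/node_set and all names illustrative: the capability entries
live in one global table, and a process only records which handle ids
it has been granted.

    #include <cstdint>
    #include <unordered_map>
    #include <unordered_set>

    using j6_handle_t = uint64_t;

    struct capability { void *object; uint32_t caps; };

    std::unordered_map<j6_handle_t, capability> g_cap_table;   // system-wide

    struct process
    {
        std::unordered_set<j6_handle_t> handles;   // ids this process may use

        const capability * lookup(j6_handle_t h) const {
            if (!handles.count(h)) return nullptr;   // not granted to this process
            auto it = g_cap_table.find(h);
            return it == g_cap_table.end() ? nullptr : &it->second;
        }

        // Granting access is now just inserting the id -- no per-process copy
        // or translation of the capability itself.
        void grant(j6_handle_t h) { handles.insert(h); }
    };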
The kernel log levels are now numerically reversed so that more-verbose
levels can be added to the end. Replaced 'debug' with 'verbose', and
added new 'spam' level.
This commit contains a number of related mailbox fixes:
- Add extra parameters to mailbox_respond_receive so that both the
number of bytes/handles passed in and the size of the byte/handle
buffers can be specified.
- Don't delete mailbox messages on receipt if the caller is waiting on
reply
- Correctly pass status messages along with a mailbox::replyer object
- Actually wake the calling thread in the mailbox::replyer dtor
- Make sure to release locks _before_ calling thread::wake() on blocked
threads, as that may cause them to be scheduled ahead of the current
thread.
This commit adds a new flag, j6_channel_block, and a new flags param to
the channel_receive syscall. When the block flag is specified, the
caller will block waiting for data on the channel if the channel is
empty.
When waking another thread, if that thread has a more urgent priority
than the current thread on the same CPU, send that CPU an IPI to tell it
to run its scheduler.
Related changes in this commit:
- Addition of the ipiSchedule isr (vector 0xe4) and its handler in
isr_handler().
- Change the APIC's send_ipi* functions to take an isr enum and not an
int for their vector parameter
- Thread TCBs now contain a pointer to their current CPU's cpu_data
structure
- Add the maybe_schedule() call to the scheduler, which sends the
schedule IPI to the given thread's CPU only when that CPU is running a
less-urgent thread.
- Move the locking of a run queue lock earlier in schedule() instead of
taking the lock in steal_work() and again in schedule().
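A hedged sketch of the maybe_schedule() decision described above; the
types are reduced to stand-ins, and "lower number means more urgent" is
an assumption about the priority encoding.

    #include <cstdint>

    struct thread;
    struct cpu_data
    {
        uint32_t id;
        thread *current;
    };

    struct thread
    {
        uint8_t priority;
        cpu_data *cpu;    // the TCB now records its current CPU's cpu_data
    };

    // Stub: would tell the APIC to send the ipiSchedule vector to that CPU.
    inline void send_ipi_schedule(uint32_t /*cpu_id*/) {}

    // Only interrupt another CPU if the thread just woken is more urgent
    // than whatever that CPU is currently running.
    inline void maybe_schedule(thread *woken)
    {
        cpu_data *cpu = woken->cpu;
        if (!cpu || !cpu->current)
            return;
        if (woken->priority < cpu->current->priority)
            send_ipi_schedule(cpu->id);
    }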
The new mailbox kernel object API offers asynchronous message-based IPC
for sending data and handles between threads, as opposed to endpoint's
synchronous model.
In preparation for the new mailbox IPC model, blocking threads needed an
overhaul. The `wait_on_*` and `wake_on_*` methods are gone, and the
`block()` and `wake()` calls on threads now pass a value between the
waker and the blocked thread.
As part of this change, the concept of signals on the base kobject class
was removed, along with the queue of blocked threads waiting on any
given object. Signals are now exclusively the domain of the event object
type, and the new wait_queue utility class helps manage waiting threads
when an object does actually need this functionality. In some cases (eg,
logger) an event object is used instead of the lower-level wait_queue.
Since this change has a lot of ramifications, this large commit includes
the following additional changes:
- The j6_object_wait, j6_object_wait_many, and j6_thread_pause syscalls
have been removed.
- The j6_event_clear syscall has been removed - events are "cleared" by
reading them now. A new j6_event_wait syscall has been added to read
events.
- The generic close() method on kobject has been removed.
- The on_no_handles() method on kobject now deletes the object by
default, and needs to be overridden by classes that should not be
deleted.
- The j6_system_bind_irq syscall now takes an event handle, as well as a
signal that the IRQ should set on the event. IRQs will cause a waiting
thread to be woken with the appropriate bit set.
- Threads waking due to timeout is simplified to just having a
wake_timeout() accessor that returns a timestamp.
- The new wait_queue uses util::deque, which caused the discovery of two
bugs in the deque implementation: a deque emptied of items could keep
a single array allocated and thus not report itself as empty, and new
arrays getting allocated were not being zeroed first.
- Exposed a new erase() method on util::map that takes a node pointer
instead of a key, skipping lookup.
If the thread waiting is the current thread, it should have the result
when it wakes. Might as well return it, so that syscalls that know
they're putting the current thread to sleep can get the result easily.
The return of slab_allocated! Now after the kutil/util/kernel giant
cleanup, this belongs squarely in the kernel, and works much better
there. Slabs are allocated via a bump pointer into a new kernel VMA,
instead of using kalloc() or allocating pages directly.
Adding the util::deque container, implemented as a util::linked_list of
arrays of items.
Also, use the deque for a kobject's blocked thread list to maintain
order instead of a vector using remove_swap().
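A hedged sketch of the container's shape, using std::list and
std::array in place of util::linked_list and raw chunk arrays; the
chunk size and method names are illustrative.

    #include <array>
    #include <cstddef>
    #include <list>

    template <typename T, size_t N = 16>
    class deque_sketch
    {
    public:
        void push_back(const T &value) {
            if (m_chunks.empty() || m_back == N) {
                m_chunks.emplace_back();
                m_back = 0;
            }
            m_chunks.back()[m_back++] = value;
        }

        // Precondition: !empty()
        T pop_front() {
            T value = m_chunks.front()[m_front++];
            bool last_chunk = m_chunks.size() == 1;
            if (m_front == (last_chunk ? m_back : N)) {
                m_chunks.pop_front();
                m_front = 0;
                if (last_chunk) m_back = 0;
            }
            return value;
        }

        bool empty() const {
            // Judge emptiness by item count, not by whether a chunk happens
            // to still be allocated (the bug noted above).
            return m_chunks.empty() || (m_chunks.size() == 1 && m_front == m_back);
        }

    private:
        std::list<std::array<T, N>> m_chunks;
        size_t m_front = 0;   // index of the next item in the front chunk
        size_t m_back = 0;    // one past the last item in the back chunk
    };

Popping from the front preserves the order threads blocked in, which is
the point of replacing the vector + remove_swap() approach.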
Added the handle_clone syscall which allows for cloning a handle with
a subset of the original handle's capabilities.
Related changes:
- srv.init now calls handle_clone on its system handle, and load_program
was changed to allow this second system handle to be passed to loaded
programs instead. However, as drv.uart is still a driver AND a log
reader, this new handle is not actually passed yet.
- The definition parser was using a set for the cap list, which meant
the order (and thus values) of caps was not static.
- Some code in objects/handle.h was made more explicit about what bits
meant what.
This change finally adds capabilities to handles. Included changes:
- j6_handle_t is now again 64 bits, with the highest 8 bits being a type
code, and the next highest 24 bits being the capability mask, so that
programs can check type/caps without calling the kernel (see the sketch
after this list).
- The definitions grammar now includes a `capabilities [ ]` section on
objects, to list what capabilities are relevant.
- j6/caps.h is auto-generated from object capability lists
- init_libj6 again sets __handle_self and __handle_sys; this is a bit
of a hack.
- A new syscall, j6_handle_list, will return the list of existing
handles owned by the calling process.
- syscall_verify.cpp.cog now actually checks that the needed
capabilities exist on handles before allowing the call.
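Putting the stated handle layout into code, roughly (the helper names
and any concrete capability values are illustrative, not from
j6/caps.h):

    #include <cstdint>

    using j6_handle_t = uint64_t;
    using j6_cap_t = uint32_t;

    constexpr unsigned type_shift = 56;          // highest 8 bits
    constexpr unsigned caps_shift = 32;          // next highest 24 bits
    constexpr uint64_t caps_mask  = 0xffffff;

    constexpr uint8_t handle_type(j6_handle_t h) {
        return static_cast<uint8_t>(h >> type_shift);
    }

    constexpr j6_cap_t handle_caps(j6_handle_t h) {
        return static_cast<j6_cap_t>((h >> caps_shift) & caps_mask);
    }

    // No syscall needed: a program can test for a capability bit directly.
    constexpr bool handle_has_caps(j6_handle_t h, j6_cap_t wanted) {
        return (handle_caps(h) & wanted) == wanted;
    }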
Channels were unused, and while they were listed in syscalls.def, they
had no syscalls listed in their interface. This change adds them back,
and updates them to the current syscall style.
Endpoints could previously crash if two senders were concurrently
writing to them, so this change adds a spinlock and protects functions
that touch the signals and blocked list.
A spinlock was recently added to thread wait states, so that a thread
couldn't get stuck and deadlock if its wake event happened while it was
setting up its wake state. But the scoped_lock objects locking that
spinlock were being instantiated as temporaries and immediately thrown
away, because I forgot to name them.
This commit contains a couple large, interdependent changes:
- In preparation for capability checking, the _syscall_verify_*
functions now load most handles passed in, and verify that they exist
and are of the correct type. Lists and out-handles are not converted
to objects.
- Also in preparation for capability checking, the internal
representation of handles has changed. j6_handle_t is now 32 bits, and
a new j6_cap_t (also 32 bits) is added. Handles of a process are now a
util::map<j6_handle_t, handle> where handle is a new struct containing
the id, capabilities, and object pointer.
- The kernel object definition DSL gained a few changes to support auto
generating the handle -> object conversion in the _syscall_verify_*
functions, mostly knowing the object type, and an optional "cname"
attribute on objects where their names differ from C++ code.
(Specifically vma/vm_area)
- Kernel object code and other code under kernel/objects is now in a new
obj:: namespace, because fuck you <cstdlib> for putting "system" in
the global namespace. Why even have that header then?
- Kernel object types constructed with the construct_handle helper now
have a creation_caps static member to declare what capabilities a
newly created object's handle should have.
There has been a global clock object for a while now, but the scheduler
was never using it, instead still relying on its simple increment clock.
Now it uses the hpet clock.
First attempt at a UART driver; I'm not sure it's the most stable. Now
that userspace handles displaying logs, serial and log output support
has also been removed from the kernel.