Previously, when adding a new thread, we only ever added it to the
current CPU and relied on work stealing to balance the CPUs. This commit
has the scheduler schedule new tasks round-robin across CPUs in hopes of
having to steal fewer tasks.
Also adds the run_queue.prev pointer for debugging what task was just
running on the given CPU.