SCHED_SOFTRR
Linux Scheduler Policy
by
Davide Libenzi <davidel@xmailserver.org>
The Linux scheduler
currently implements the scheduling policies defined by the POSIX standard,
namely SCHED_OTHER, SCHED_FIFO and SCHED_RR. The SCHED_OTHER
scheduling policy is basically timeslice driven: each task is assigned a
maximum timeslice that it can use before being expired.
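
As a concrete illustration of the three policies, the following user-space sketch asks the kernel whether the calling process may switch to a given policy. It uses only the standard sched_setscheduler() family of calls; the helper name try_policy is made up for this example. On a stock kernel, an unprivileged process would typically be expected to get EPERM back for SCHED_FIFO and SCHED_RR.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): try to switch the calling
 * process to `policy` at the lowest static priority that policy allows.
 * Returns 0 on success, or the errno value reported on failure.
 * On success the previous policy is restored before returning.
 */
int try_policy(int policy)
{
    struct sched_param old_param, new_param;
    int old_policy = sched_getscheduler(0);

    if (old_policy < 0 || sched_getparam(0, &old_param) < 0)
        return errno;

    memset(&new_param, 0, sizeof(new_param));
    new_param.sched_priority = sched_get_priority_min(policy);
    if (new_param.sched_priority < 0)
        return errno;

    if (sched_setscheduler(0, policy, &new_param) < 0)
        return errno;

    /* Success: put the original policy back so the caller is unaffected. */
    sched_setscheduler(0, old_policy, &old_param);
    return 0;
}
```

Calling try_policy(SCHED_OTHER), try_policy(SCHED_FIFO) and try_policy(SCHED_RR) first from a normal user account and then as root shows what each account is allowed to request.
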
Tasks that need almost deterministic latencies have both the SCHED_FIFO and
SCHED_RR policies available. The problem is that those policies can be
used only by the superuser account, since they can in theory lock down
the machine if the task that is using them does not explicitly release the
CPU. Many applications, multimedia players for example,
need almost deterministic timings to perform
their operations correctly, and this would require the application to be run as
superuser. Since the POSIX definition is very clear about the SCHED_RR
policy, namely that the task should never be pre-empted by SCHED_OTHER
tasks, we need another policy to be defined so that we can have both
deterministic scheduler latencies and, at the same time, avoid the starvation
of other tasks because of a greedy realtime process. It is fairly easy
to modify the current scheduler to have both the egg and the chicken by
introducing a bound on the CPU time that a non-root realtime task can consume. We
will define a new scheduler policy, SCHED_SOFTRR, that makes the
target task run with realtime priority while, at the same time,
enforcing a bound on the CPU time the process itself will consume.
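
The bound can be pictured as a decision taken each time a SCHED_SOFTRR task exhausts its timeslice, which is the mechanism the text that follows describes. The sketch below is a user-space model of that decision, not code from the patch: struct softrr_task, its timeslice field and softrr_should_expire are illustrative names, while ts_timestamp and SCHED_TS_KSOFTRR (5) come from the description.

```c
#include <stdbool.h>

/* Multiplier quoted in the text (currently 5). */
#define SCHED_TS_KSOFTRR 5UL

/*
 * Hypothetical stand-in for the task fields the check needs.
 * Only the name ts_timestamp is taken from the patch description.
 */
struct softrr_task {
    unsigned long ts_timestamp; /* jiffies when the last fresh timeslice was granted */
    unsigned long timeslice;    /* length of one full timeslice, in jiffies */
};

/*
 * Model of the check made when the task's timeslice runs out at time
 * `now` (in jiffies). Returns true if the task should go to the expired
 * array, false if it should be requeued in the active array as plain
 * SCHED_RR would do.
 */
bool softrr_should_expire(const struct softrr_task *t, unsigned long now)
{
    /* Unsigned subtraction stays correct across a jiffies wraparound. */
    unsigned long elapsed = now - t->ts_timestamp;

    return elapsed < SCHED_TS_KSOFTRR * t->timeslice;
}
```

Read this way, a task that burns a whole timeslice in less than SCHED_TS_KSOFTRR timeslices of wall-clock time is treated as greedy, so with the value 5 a task would need to stay at or below roughly one fifth of the CPU to keep its realtime treatment. That ratio is an inference from the description rather than a figure stated by the patch.
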
A new field (ts_timestamp) has been added to the task struct to
record the time at which a task receives a brand new timeslice. When
a SCHED_SOFTRR task's timeslice expires, a check is performed to
compare the difference between the current time (jiffies) and the
time at which the task received its last timeslice. If this difference
is lower than SCHED_TS_KSOFTRR (currently 5) times the task
timeslice, the process is dropped into the expired array,
giving other (non realtime) tasks a chance to run; otherwise it is
reinjected into the active array,
exactly following the POSIX SCHED_RR policy. The current patch
actually contains a hack (which should probably be removed in future versions) so
that if a non-root user requests the SCHED_RR policy, the request is
automatically downgraded to SCHED_SOFTRR, which makes it possible to test existing
application binaries without rebuilding them. I also coded a simple
latency test application that can be used to measure scheduler latencies
under different policies. The test program is named lattest, and its
source code is available at the bottom of this page. Running lattest, even under
heavy CPU load, shows very predictable latencies, and running a CPU hog
with SCHED_SOFTRR leaves the system in a usable state. SCHED_TS_KSOFTRR can
obviously be tuned to find the value that best
keeps the system usable even in the presence of badly behaved
SCHED_SOFTRR processes. Testing has been done using the lattest tool to
compare the expected latency against the one actually measured. On my
1GHz Athlon with 768MB of RAM, a `make -j 40 bzImage` was used to
load the machine (a `make clean` preceded each test), and lattest was then
run with:
lattest --sched-other --sleep-mstime X --test-stime 60
to measure SCHED_OTHER latencies, and with:
lattest --sched-softrr --sleep-mstime X --test-stime 60
to measure SCHED_SOFTRR latencies. The results show very unpredictable
latency (as expected) under the SCHED_OTHER scheduling policy, and very
predictable latency under SCHED_SOFTRR.
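
To make the comparison between expected and measured latency concrete, here is a minimal sketch of the general technique: sleep for a fixed interval, time the sleep with a monotonic clock, and record how late each wakeup was. It is not the lattest source. The names measure_sleep_delay and struct delay_stats are assumptions made for this example, and the loop counts iterations rather than running for a number of seconds as the --test-stime option suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical result record: wakeup delay is measured minus expected time. */
struct delay_stats {
    long long min_ns;
    long long max_ns;
    long long sum_ns;      /* divide by samples for the average delay */
    unsigned int samples;
};

/* Nanoseconds elapsed from reading a to reading b. */
static long long ns_between(const struct timespec *a, const struct timespec *b)
{
    return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (long long)(b->tv_nsec - a->tv_nsec);
}

/*
 * Sleep `sleep_ms` milliseconds `iterations` times and collect the wakeup
 * delay of each sleep. Returns 0 on success, -1 on bad arguments or if a
 * clock or sleep call fails.
 */
int measure_sleep_delay(long sleep_ms, unsigned int iterations,
                        struct delay_stats *out)
{
    const long long expected_ns = (long long)sleep_ms * 1000000LL;
    unsigned int i;

    if (sleep_ms < 0 || iterations == 0 || out == NULL)
        return -1;
    out->min_ns = out->max_ns = out->sum_ns = 0;
    out->samples = 0;

    for (i = 0; i < iterations; i++) {
        struct timespec req = {
            .tv_sec = sleep_ms / 1000,
            .tv_nsec = (sleep_ms % 1000) * 1000000L
        };
        struct timespec start, end;
        long long delay;

        if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
            return -1;
        /* Resume the remaining sleep if a signal interrupts it. */
        while (nanosleep(&req, &req) == -1) {
            if (errno != EINTR)
                return -1;
        }
        if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
            return -1;

        delay = ns_between(&start, &end) - expected_ns;
        if (out->samples == 0 || delay < out->min_ns)
            out->min_ns = delay;
        if (out->samples == 0 || delay > out->max_ns)
            out->max_ns = delay;
        out->sum_ns += delay;
        out->samples++;
    }
    return 0;
}
```

Running such a loop under load, once with the default policy and once after switching the process to a realtime policy, gives the spread between min_ns and max_ns as a rough measure of how predictable the scheduler latency is.
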
Patches And Test Software