SCHED_SOFTRR Linux Scheduler Policy

The Linux scheduler currently implements the POSIX standard scheduling policies: SCHED_OTHER, SCHED_FIFO and SCHED_RR. The SCHED_OTHER policy is basically timeslice driven: each task is assigned a maximum timeslice that it can use before being expired. Tasks that need almost deterministic latencies have the SCHED_FIFO and SCHED_RR policies available. The problem is that those policies can be used only by the superuser, since in theory they can lock down the machine if the task using them does not explicitly release the CPU. Many applications, multimedia players for example, need almost deterministic timings to perform their operations correctly, and this would require running the application as superuser.

The POSIX definition of SCHED_RR is very clear that the task should never be pre-empted by SCHED_OTHER tasks, so we need another policy if we want both deterministic scheduler latencies and, at the same time, no starvation of other tasks because of a greedy realtime process. It is fairly easy to modify the current scheduler to have both the egg and the chicken by introducing a bound on the CPU time that a non-root realtime task can consume. We define a new scheduler policy, SCHED_SOFTRR, that makes the target task run with realtime priority while enforcing a bound on the CPU time the process itself will consume.

A new field (ts_timestamp) has been added to the task struct to record the timestamp at which a task receives a brand new timeslice. When a SCHED_SOFTRR task's timeslice expires, a check is performed on the difference between the current timestamp (jiffies) and the timestamp at which the task received its last timeslice.
If this difference is lower than SCHED_TS_KSOFTRR (currently 5) times the task timeslice, the process is dropped into the expired array, giving other (non-realtime) tasks a chance to run. Otherwise it is reinserted into the active array, exactly following the POSIX SCHED_RR policy. The current patch also contains a hack (likely to be removed in future versions): if a non-root user requests the SCHED_RR policy, the request is automatically downgraded to SCHED_SOFTRR, so that existing application binaries can be tested without rebuilding them.

I also coded a simple latency test application, lattest, that can be used to measure scheduler latencies under different policies. Its source code is available at the bottom of this page. Running lattest even under huge CPU loads shows very predictable latencies, and running a CPU hog with SCHED_SOFTRR leaves the system in a usable state. SCHED_TS_KSOFTRR can obviously be tuned to find the value that best keeps the system usable even in the presence of badly behaved SCHED_SOFTRR processes.

Testing has been done using the lattest tool to compare the expected latency against the one actually measured. On my Athlon 1GHz with 768MB of RAM, a `make -j 40 bzImage` was used to load the machine (a `make clean` preceded every test), and then lattest was run with:

to measure SCHED_SOFTRR latencies. Results show a very unpredictable latency (as expected) using the SCHED_OTHER scheduling policy, and a very predictable one using SCHED_SOFTRR.
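As a rough illustration of what "expected latency against the measured one" can mean, the following is a minimal sketch of one common way to sample scheduler latency from user space: sleep for a known period and record how much later than requested the task actually wakes up. This is not the lattest source. The helper names `timespec_diff_ns` and `sleep_overshoot_ns` are hypothetical, and the choice of `nanosleep` with `CLOCK_MONOTONIC` is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end (hypothetical helper). */
int64_t timespec_diff_ns(const struct timespec *start,
                         const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/*
 * Sleep for period_ns nanoseconds and return how many nanoseconds
 * later than requested the task woke up (measured minus expected).
 * Returns -1 if a clock read fails or the sleep is interrupted.
 */
int64_t sleep_overshoot_ns(long period_ns)
{
    struct timespec req, before, after;

    assert(period_ns >= 0);
    req.tv_sec = period_ns / 1000000000L;
    req.tv_nsec = period_ns % 1000000000L;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)   /* interrupted or invalid request */
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    /* Measured elapsed time minus the time we asked to sleep. */
    return timespec_diff_ns(&before, &after) - period_ns;
}
```

Calling `sleep_overshoot_ns` in a loop while the machine is loaded, once under SCHED_OTHER and once after selecting a realtime policy with `sched_setscheduler`, and comparing the largest values seen, gives a comparison of the kind described here.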

Also, a purely CPU-bound task has been run with SCHED_SOFTRR without the system suffering for it. In fact, the CPU bound imposed on SCHED_SOFTRR processes proves effective in preventing any sort of starvation of other tasks. More tests have been done, this time using the simple thud.c load and the same lattest parameters to sample latency. Results show that huge latencies can be expected to hit applications running with the SCHED_OTHER POSIX policy. By contrast, the SCHED_SOFTRR policy guarantees a deterministic latency under every load.
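The CPU bound that prevents starvation comes down to the check made when a SCHED_SOFTRR timeslice expires. The sketch below restates that rule as a small standalone function. It is not the patch code: the `softrr_task` struct, the `softrr_queue` enum and the function name are hypothetical stand-ins, and only `ts_timestamp` and `SCHED_TS_KSOFTRR` (with its current value of 5) come from the description on this page.

```c
#include <assert.h>

#define SCHED_TS_KSOFTRR 5UL   /* value quoted in the text */

/* Hypothetical, simplified stand-in for the fields the check needs. */
struct softrr_task {
    unsigned long ts_timestamp; /* tick at which the timeslice was granted */
    unsigned long timeslice;    /* length of a full timeslice, in ticks */
};

/* Which priority array the task is queued on after its timeslice expires. */
enum softrr_queue { SOFTRR_ACTIVE, SOFTRR_EXPIRED };

/*
 * Decide where a SCHED_SOFTRR task goes when its timeslice runs out.
 * "now" plays the role of jiffies.
 */
enum softrr_queue softrr_on_timeslice_expiry(const struct softrr_task *t,
                                             unsigned long now)
{
    unsigned long elapsed;

    assert(t->timeslice > 0);

    /* Unsigned subtraction stays correct if the tick counter wraps. */
    elapsed = now - t->ts_timestamp;

    /*
     * The task burned its whole timeslice in less than K timeslices of
     * wall-clock time: treat it as a CPU hog and let other tasks run.
     */
    if (elapsed < SCHED_TS_KSOFTRR * t->timeslice)
        return SOFTRR_EXPIRED;

    /* Otherwise behave like plain SCHED_RR and requeue it as active. */
    return SOFTRR_ACTIVE;
}
```

With a 100-tick timeslice, a task that consumes it within 100 ticks of wall-clock time is moved to the expired array, while a task that mostly sleeps and needs 500 ticks or more to consume it goes back to the active array.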