## Real-Time Scheduling on Heterogenous Multicore Processors

Many DoD computing systems--particularly cyber-physical systems--are subject to stringent size, weight, and power requirements. The quantity of sensor readings and functionalities is also increasing, and their associated processing must fulfill real-time requirements. This situation motivates the need for computers with greater processing capacity. For example, to fulfill the requirements of nano-sized unmanned aerial vehicles (UAVs), developers must choose a computer platform that offers significant processing capacity and use its processing resources to meet its needs for autonomous surveillance missions. This blog post discusses these issues and highlights our research that addresses them.

To choose a computer platform that offers greater capacity, it is necessary to observe the major trends among chip makers. Historically, advances in semiconductor miniaturization (a.k.a., Moore's Law) periodically yielded microprocessors with significantly greater clock speeds. Unfortunately, microprocessor serial processing speed is reaching a physical limit due to excessive power consumption. As a result, semiconductor manufacturers are now producing chips without increasing the clock speed, but instead are increasing the number of processor cores on a chip, which results in multicore processors. For nearly a decade, the use of *homogeneous* multicore processors (which are chips with identical processing cores) gave us some headroom in terms of power consumption and allowed us to enjoy greater computing capacity. This headroom is diminishing, unfortunately, and is about to vanish, forcing semiconductor manufacturers to seek new solutions.

We are currently witnessing a shift among semiconductor manufacturers from homogeneous multicore processors with identical processor cores to *heterogeneous* multicore processors. The impetus for this shift is that processor cores tailored for a specific class of applications behavior have the potential to offer much better power-efficiency. AMD Fusion and NVIDIA Tegra 3 are examples of this shift. Intel Sandybridge, which has a graphics processor integrated onto the same chip as the normal processor, also reflects this shift, as well.

In a heterogeneous multicore environment, the execution time of a software task depends on which processor core it executes on. For example, a software task performing computer graphics rendering, simulating physics, or estimating trajectories of flying objects runs much faster on a graphics processor than on a normal processor. Conversely, some software tasks are inherently sequential and cannot benefit from the graphics processor; they execute much faster on a normal processor. For example, a software task with many branches and no inherent parallelism runs much faster on a normal processor than on a graphics processor. Ideally, each task would be assigned to the processor where it executes with the greatest speed, but unfortunately the workload is often not perfectly balanced to the types of processor cores available.

Efficient use of processing capacity in the new generation of microprocessors therefore requires that tasks are assigned to processors intelligently. In this context, "intelligently" means that the resources requested by the program are the ones possessed by the processor. Moreover, the desire for short design cycles, rapid fielding, and upgrades necessitates that task assignment be done automatically--with algorithms and associated tools.

**THE TASK ASSIGNMENT PROBLEM**

The problem of assigning tasks to processors can be described as follows: A task (such as computer graphics rendering or a program determining whether the process half-or-triple-plus-one reaches one with a known starting value) is described with its processor utilization, but it has different processor utilizations for different processors. For example, if a given task is assigned to a graphics processor, then the task will have a utilization of 10 percent. If the task is assigned to a normal processor, the task will have a utilization of 70 percent. We are interested in assigning each task to exactly one processor such that for each processor, the sum of utilization of all tasks assigned to this processor will not exceed 100 percent. If we can find such an assignment then it is known that if tasks have deadlines described with the model implicit-deadline sporadic tasks--and if the scheduling algorithm Earliest-Deadline-First (EDF) is used--then all deadlines will be met at run-time (with a minor modification, we can use Rate-Monotonic scheduling as well).

**PREVIOUS APPROACHES FOR TASK ASSIGNMENT**

The task assignment problem belongs to a class of problems that are computationally intractable, meaning that it is highly unlikely to be possible to design an algorithm that finds a good assignment and always runs fast. So we should either create an algorithm that always finds a good assignment or one that always runs fast. For the goal of designing an algorithm that always finds a good assignment, task assignment can be modeled as integer-linear programming (ILP) as follows:

Minimize z

subject to the constraints that

for each processor p: x_{1,p}*u_{1,p} + x_{2,p}*u_{2,p} + ... + x_{n,p}*u_{n,p} <= z

and

for each task i: x_{i,1} + x_{i,2} + ... + x_{i,m} = 1

and

for each pair (i,p) of task i and processor p: x_{i,p} is either 0 or 1

In the optimization problem above, n is the number of tasks and m is the number of processors and u_{i,p} is the utilization of task i if it would be assigned to processor p. x_{i,p} is a decision variable with the interpretation that it is one if task i is assigned to processor p; zero otherwise.

Unfortunately, solving this integer linear program takes a long time.

To design an algorithm that always runs reasonably fast, there are several algorithms, as described in a research paper by Sanjoy K. Baruha, that transform the ILP into a linear program (LP) and then perform certain tricks. Although an LP runs faster than the ones based on ILP, they still have to solve an optimization problem, which can be time-consuming. To design algorithms that run faster, we would like to perform task assignment in a way that does not require solving LP.

**OUR APPROACH FOR TASK ASSIGNMENT**

Previous work on task assignment for homogeneous multicore processors where all processor cores are identical is based on a framework called bin-packing heuristics. Such algorithms work approximately as follows:

1.Sort tasks according to some criterion.

2.**for** each task **do**

3.**for** each processor **do**

4.if the task has not yet been assigned and it is possible to assign the task to the processor so that the sum of utilization of tasks on the processor does not exceed 100 percent then

5.Assign the task on the processor

6.**end if**

7.**end for**

8.**end for**

Our approach involves adapting bin-packing heuristics to heterogeneous multicore processors. We believe it is possible to modify the algorithm structure outlined above so we can also assign tasks to processors even when the utilization of a task depends on the processor to which it is assigned. One can show that the use of bin-packing can perform poorly if processors and tasks are not considered in any particular order. Specifically, for a set of tasks that could be assigned, such an approach can fail even when given processors that are "infinitely" faster. One of our main research challenges is therefore to determine how to sort tasks (step 1) and in which order we should consider processors (in step 3). We are evaluating our new algorithms in the following ways:

- We plan to prove (mathematically) the performance of our new algorithms. Specifically, we are interested in proving that if it is possible to assign tasks to processors then our algorithm will succeed in assigning tasks to a processor if a given processor is x times as fast. Given that x is our performance metric; the lower its value, the better.
- We also plan to evaluate the performance of our algorithms by applying the algorithms on randomly-generated task sets. This will demonstrate the "typical" behavior of the algorithms.

**CONCLUSION**

Most semiconductor manufacturers are shifting towards heterogeneous multicore processors to offer greater computing capacity while keep power consumption sufficiently low. But using a heterogeneous multicore efficiently for cyber-physical systems with stringent size, weight, and power requirements requires that tasks are assigned properly. This blog post has discussed the state-of-art and summarized our ongoing work in this area.

**ADDITIONAL RESOURCES**

To read the paper "Partitioning real-time tasks among heterogeneous multiprocessors" by Sanjoy Baruah, please visit

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1327956

To read the proceedings "Assigning Real-Time Tasks on Heterogeneous Multiprocessors with Two Unrelated Types of Processors", please visit

http://www.cister.isep.ipp.pt/activities/seminars/%28S%28otidyq454nfyy255alnrdb3m%29%29/GetFile.aspx?File=%2Fspringseminars2011%2Frtss10_het2.pdf