Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Citation:
Pliska Studia Mathematica Bulgarica, Vol. 12, No 1, (1998), 5p-20p
Abstract:
We discuss in this paper the problem of generating highly efficient code when an
(n + 1)-dimensional nested loop program is executed on an n-dimensional torus/grid
of distributed-memory general-purpose machines. We focus on a class of uniform
recurrences with non-negative components of the dependency matrix. Using the strategy
of tiling the iteration space, we show that minimizing the total running time reduces
to solving a non-trivial non-linear integer optimization problem. For the latter we
present a mathematical framework that enables us to derive an O(n log n) algorithm
for finding a good approximate solution. The theoretical evaluations and the
experimental results show that the obtained solution approximates the original minimum
sufficiently well in the context of the considered problem. Such an algorithm is usable
in real time for very large values of n and can be used as an optimization technique in
parallelizing compilers as well as in performance tuning of parallel codes by hand.
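
As an illustration of the tiling strategy mentioned in the abstract, the following minimal sketch runs a 2-dimensional uniform recurrence both untiled and in rectangular tiles. It is not the algorithm of the paper: the recurrence, the function names, and the tile sizes are assumptions chosen for the example, and no attempt is made to pick tile sizes that minimize running time. The dependence vectors are (1, 0) and (0, 1), so all components are non-negative and rectangular tiles can be executed atomically in lexicographic order.

```python
def step(a, i, j):
    """Hypothetical uniform recurrence: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    up = a[i - 1][j] if i > 0 else 0
    left = a[i][j - 1] if j > 0 else 0
    a[i][j] = up + left + 1


def run_untiled(n_i, n_j):
    """Execute the loop nest in plain lexicographic order."""
    a = [[0] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            step(a, i, j)
    return a


def run_tiled(n_i, n_j, tile_i, tile_j):
    """Execute the same loop nest tile by tile.

    Tiles are visited in lexicographic order. Because every dependence
    vector is non-negative, each tile only reads values produced by
    tiles that were visited earlier, or by earlier points of the same tile.
    """
    a = [[0] * n_j for _ in range(n_i)]
    for ti in range(0, n_i, tile_i):          # loop over tile origins
        for tj in range(0, n_j, tile_j):
            for i in range(ti, min(ti + tile_i, n_i)):   # points inside one tile
                for j in range(tj, min(tj + tile_j, n_j)):
                    step(a, i, j)
    return a
```

In a distributed-memory setting each tile would be a unit of computation between communication steps, which is why the choice of tile_i and tile_j affects the total running time; the sketch only checks that tiling preserves the result.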