work correctly [2,3]. We cannot deploy binary search if the data is not pre-sorted; otherwise the search process may be led into a blind alley and exhibit worst-case complexity. The literature review has highlighted a number of interesting observations that can be regarded as parts of folklore and/or theorems. These folklore and theorem parts are usually omitted by the faculty teaching these topics, and students find them extremely difficult to understand. We will examine sorting algorithms along these lines and try to discover the trade-offs between folklore and theorems. This paper covers different complexity aspects of basic sorting models, such as big-O notation; best-, worst- and average-case analysis; time-space trade-offs; and lower bounds.
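The sorted-input requirement for binary search mentioned above can be demonstrated with a short sketch. The following Python snippet (the function binary_search and the sample data are illustrative assumptions, not taken from the cited sources) shows the search succeeding on sorted data and missing an element that is actually present when the same data is unsorted:

def binary_search(data, target):
    # Standard iterative binary search; assumes data is sorted in ascending order.
    low, high = 0, len(data) - 1
    while low <= high:
        mid = (low + high) // 2
        if data[mid] == target:
            return mid
        elif data[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # not found

sorted_data = [3, 8, 15, 23, 42]
unsorted_data = [42, 3, 23, 8, 15]
print(binary_search(sorted_data, 8))    # 1  -- correct position
print(binary_search(unsorted_data, 8))  # -1 -- 8 is present, but the search is misled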
2. Origin of Folklore
When we look to develop or use a sorting algorithm on large problems, it is important to understand how long the algorithm might take to run. The time for most sorting algorithms depends on the amount of data, or the size of the problem. In order to analyze an algorithm, we try to find a relationship showing how the time needed by the algorithm depends on the amount of data. This is called the "complexity" of the algorithm [4].
A simple sorting algorithm like bubble sort may have a high complexity, whereas an algorithm that is more complex in its organization, such as shell sort, sometimes pays off by having a lower complexity in terms of the time needed for computation [5,6]. Besides running-time analysis, it is seen that factors other than the sorting algorithm selected to solve a problem affect the time needed by a program. Different people carrying out a solution to a problem may work at different speeds, and even when they use the same sorting method, different computers work at different speeds. The different speeds of computers running the same program can be due to different "clock speeds", the rates at which the steps of the program are executed by the machine, and different "architectures", the way in which the internal instructions and circuitry of the computer are organized. The literature review shows that researchers have attempted rigorous analysis of sorting algorithms and produced prolific comments [7,8,9] uncovering such complexity-dependent factors.
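To make the bubble sort versus shell sort contrast concrete, the following Python timing sketch may help; the implementations, the simple halving gap sequence, and the input size of 5000 random keys are our own assumptions, and the absolute timings will vary with clock speed and architecture, as discussed above:

import random
import time

def bubble_sort(a):
    # O(n^2) comparisons and swaps in the average and worst case.
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def shell_sort(a):
    # Shell sort with the simple n/2, n/4, ... gap sequence.
    a = list(a)
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            temp, j = a[i], i
            while j >= gap and a[j - gap] > temp:
                a[j] = a[j - gap]
                j -= gap
            a[j] = temp
        gap //= 2
    return a

data = [random.random() for _ in range(5000)]
for sort in (bubble_sort, shell_sort):
    start = time.perf_counter()
    sort(data)
    print(sort.__name__, round(time.perf_counter() - start, 3), "seconds")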
Consequently, the analysis of a sorting algorithm cannot predict exactly how long it will take on a particular computer [5]. What analysis can do is tell us how the time needed depends on the amount of data. For example, we often come across folklore such as: for a given algorithm, when we double the amount of data, the time needed also doubles; in other words, the time needed is proportional to the amount of data. The analysis of another algorithm might tell us that when we double the amount of data, the time increases by a factor of four; that is, the time needed is proportional to the square of the amount of data. The time needed by the latter algorithm increases much more rapidly than that of the first.
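This doubling folklore can be checked empirically. The sketch below (illustrative only; the helper measure and the chosen input sizes are assumptions) times a linear-time scan and a quadratic-time selection sort on inputs of size n and 2n and prints the observed ratio, which should come out close to 2 for the former and close to 4 for the latter:

import random
import time

def measure(func, n):
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start

def linear_scan(a):
    # O(n): touches every element once (finds the minimum).
    return min(a)

def quadratic_sort(a):
    # O(n^2): simple selection sort.
    a = list(a)
    for i in range(len(a)):
        k = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[k] = a[k], a[i]
    return a

# Larger sizes for the linear scan so its timing is not dominated by noise.
for func, n in ((linear_scan, 2000000), (quadratic_sort, 2000)):
    t1, t2 = measure(func, n), measure(func, 2 * n)
    print(func.__name__, "doubling ratio:", round(t2 / t1, 1))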
When analyzing sorting algorithms, it often happens that the efficiency also depends considerably on the nature of the data. For example, if the original data set is already almost ordered, a sorting algorithm may behave rather differently than if the data set originally contains random data or is ordered in the reverse direction. Similarly, for many sorting algorithms it is difficult to analyze the average-case complexity. Generally speaking, the difficulty comes from the fact that one has to analyze the time complexity for all inputs of a given length and then compute the average, which is a difficult task. Using the incompressibility method, we can choose just one input as a representative input. Via Kolmogorov complexity, we can show that the time complexity of this input is in fact the average-case complexity of all inputs of this length. Constructing such a "representative input" is impossible, but we know it exists, and this is sufficient [10].
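The sensitivity to the nature of the input can be illustrated with an adaptive algorithm such as insertion sort, which runs in near-linear time on almost-ordered data but in quadratic time on reverse-ordered data. The following sketch (the input size and labels are our own choices) times the same routine on sorted, random, and reversed inputs:

import random
import time

def insertion_sort(a):
    # Adaptive: near-linear on (almost) sorted input, quadratic on reversed input.
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

n = 4000
inputs = {
    "already sorted": list(range(n)),
    "random": random.sample(range(n), n),
    "reverse ordered": list(range(n, 0, -1)),
}
for label, data in inputs.items():
    start = time.perf_counter()
    insertion_sort(data)
    print(label, round(time.perf_counter() - start, 3), "seconds")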
For these reasons, the analysis of sorting algorithms often considers separate cases, depending on the original nature of the data. The price of this generality is exponential complexity, with the result that many problems of practical interest are, in principle, solvable better than the folklore of sorting suggests, but the limitations of computational capacity prevent them from being solved in practice. The increasing diversity of computing platforms motivates consideration of the multi-processor environment. The literature review suggests that no folklore is found regarding complexity in a multiprocessor environment.
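As a rough illustration of what sorting in a multi-processor environment can look like, the sketch below (the function parallel_sort, the worker count, and the merge step are hypothetical choices, not taken from the literature surveyed here) sorts chunks of the input in separate processes and then merges the sorted runs:

import heapq
import random
from multiprocessing import Pool

def parallel_sort(data, workers=4):
    # Split the input into chunks, sort each chunk in its own process,
    # then merge the sorted runs into a single sorted list.
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        runs = pool.map(sorted, chunks)
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    data = [random.random() for _ in range(100000)]
    assert parallel_sort(data) == sorted(data)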
Recently, many results on the computational complexity of sorting algorithms have been obtained using Kolmogorov complexity (the incompressibility method). In particular, the usually hard average-case analysis is amenable to this method. A survey [10] presents such results for Bubble sort, Heap sort, Shell sort, Dobosiewicz sort, Shaker sort, and sorting with stacks and queues in sequential or parallel mode. It is also found that the trade-off between memory and sorting is influenced by the increasing availability of computer memory and the increase in processor speed. Currently, the prices of computer memory are decreasing; therefore, acquiring larger memory configurations is no longer an obstacle, making it easier to equip a computer with more memory.
Similarly, every year there is an increase in computer speed. Moore's Law, a computer-industry standard