The Folklore of Sorting Algorithms




work correctly [2,3]. We cannot deploy binary search if the
data is not pre-sorted; otherwise the search process may get
trapped in a blind alley and exhibit worst-case
complexity. The literature review has highlighted a number of
interesting results that can be regarded as folklore, as
theorems, or as both. These folklore and theorem aspects are
usually omitted by the instructors teaching these topics, and
students find them extremely difficult to understand. We
examine sorting algorithms along these lines and try to
discover the trade-offs between folklore and theorems. This
paper covers the different complexity aspects of basic sorting
models, such as big-O notation; best-, worst- and average-case
analysis; time-space trade-offs; and lower bounds.
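To make the binary-search caveat concrete, the following minimal Python sketch shows the same routine locating an element in sorted data but reporting it absent when the ordering invariant is violated; the function name, the sample arrays and the probe value are our own illustrative choices, not material from the cited works.

def binary_search(a, target):
    # Classic iterative binary search; correctness relies on a being sorted ascending.
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

sorted_data = [3, 9, 14, 27, 31, 56]
unsorted_data = [31, 3, 56, 14, 9, 27]

print(binary_search(sorted_data, 14))    # 2: found after O(log n) probes
print(binary_search(unsorted_data, 14))  # -1: 14 is present, but the halving
                                         # steps are misled by the unsorted order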

2. Origin of Folklore

When we look to develop or use a sorting algorithm on
large problems, it is important to understand how long the
algorithm might take to run. The time for most sorting
algorithms depends on the amount of data, or the size of the
problem. In order to analyze an algorithm, we try to find a
relationship showing how the time needed by the
algorithm depends on the amount of data. This relationship is
called the "complexity" of the algorithm [4].

A simple sorting algorithm like bubble sort may have a
high complexity, whereas an algorithm that is more intricate
in its organization, such as shell sort, sometimes pays off by
having a lower complexity in terms of the time needed for
computation [5,6]. Beyond running-time analysis, it is seen
that factors other than the sorting algorithm selected to solve
a problem also affect the time a program needs. Different
people carrying out a solution to a problem may work at
different speeds, and even when they use the same sorting
method, different computers work at different speeds. The
different speeds of computers on the same program can be
due to different "clock speeds", the rates at which steps in
the program are executed by the machine, and to different
"architectures", the ways in which the internal instructions
and circuitry of the computer are organized. The literature
review shows that researchers have attempted rigorous
analyses of sorting algorithms and produced prolific
commentary [7,8,9] identifying such complexity-determining
factors.
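To illustrate this contrast, the sketch below gives straightforward Python versions of the two algorithms just mentioned; the shell-sort variant uses the simple halving gap sequence, which is our own assumption here, since other gap sequences lead to different bounds.

def bubble_sort(a):
    # Repeated sweeps that swap adjacent out-of-order pairs: O(n^2) comparisons.
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        for j in range(i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def shell_sort(a):
    # Gapped insertion sort with gaps n/2, n/4, ..., 1 (an illustrative choice).
    a = list(a)
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            key, j = a[i], i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
print(shell_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]

On random inputs of moderate size, the gapped passes of shell_sort move elements over long distances at once and typically perform far fewer data moves than the adjacent swaps of bubble_sort, which is the pay-off referred to above.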

Consequently, the analysis of a sorting algorithm cannot
predict exactly how long it will take on a particular
computer [5]. What the analysis can do is tell us how the time
needed depends on the amount of data. For example, we
often come across folklore such as: for a given algorithm, when
we double the amount of data, the time needed is also
doubled; in other words, the time needed is
proportional to the amount of data. The analysis of
another algorithm might tell us that when we double the
amount of data, the time increases by a factor of four,
that is, the time needed is proportional to the square of the
amount of data. The latter algorithm would have its running
time increase much more rapidly than the first.
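A small experiment along these lines can make this folklore visible; in the sketch below, quadratic_sort is a deliberately naive stand-in for a quadratic algorithm and Python's built-in sorted stands in for an n log n one, both being our own illustrative choices.

import random
import time

def quadratic_sort(a):
    # Selection sort: a deliberately simple O(n^2) routine for the experiment.
    a = list(a)
    for i in range(len(a)):
        m = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[m] = a[m], a[i]
    return a

def elapsed(sort_fn, n):
    # Time sort_fn on n random keys.
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    sort_fn(data)
    return time.perf_counter() - start

for n in (2000, 4000, 8000):
    print(f"n={n:5d}  quadratic: {elapsed(quadratic_sort, n):7.3f}s"
          f"  built-in: {elapsed(sorted, n):9.6f}s")
# Doubling n roughly quadruples the quadratic routine's time, while the
# built-in (n log n) sort's time only slightly more than doubles.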

When analyzing sorting algorithms, it often happens that
the analysis of efficiency also depends considerably on the
nature of the data. For example, if the original data set
already is almost ordered, a sorting algorithm may behave
rather differently than if the data set originally contains
random data or is ordered in the reverse direction.
Similarly, for many sorting algorithms, it is difficult to
analyze the average-case complexity. Generally speaking,
the difficulty comes from the fact that one has to analyze
the time complexity for all inputs of a given length and
then compute the average. This is a difficult task. Using
the incompressibility method, we can choose just one
input as a representative input. Via Kolmogorov
complexity, we can show that the time complexity of this
input is in fact the average-case complexity of all inputs of
this length. Constructing such a “representative input”
is impossible, but we know that it exists, and this is
sufficient [10].
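The dependence on the nature of the input discussed at the start of this paragraph can be made visible by counting comparisons; the sketch below uses insertion sort purely as a convenient example, an assumption on our part rather than an algorithm singled out by [10].

import random

def insertion_sort_comparisons(a):
    # Sort a copy of a by insertion and return the number of key comparisons made.
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            else:
                break
        a[j + 1] = key
    return comparisons

n = 1000
nearly_sorted = list(range(n))
nearly_sorted[0], nearly_sorted[-1] = nearly_sorted[-1], nearly_sorted[0]
shuffled = list(range(n))
random.shuffle(shuffled)
reversed_data = list(range(n - 1, -1, -1))

print(insertion_sort_comparisons(nearly_sorted))  # grows roughly linearly in n
print(insertion_sort_comparisons(shuffled))       # about n*n/4 comparisons on average
print(insertion_sort_comparisons(reversed_data))  # about n*n/2 comparisons, the worst case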

For these reasons, the analysis of sorting algorithms often
considers separate cases, depending on the original nature
of the data. The price of this generality is exponential
complexity, with the result that many problems of
practical interest are, in principle, solvable better than the
folklore of sorting would suggest, but the limitations of
computational capacity prevent them from being solved in
practice. The increasing diversity of computing platforms
motivates consideration of multi-processor environments. The
literature review suggests that no folklore has been recorded
regarding complexity in a multiprocessor environment.

Recently, many results on the computational complexity of
sorting algorithms have been obtained using Kolmogorov
complexity (the incompressibility method). In particular, the
usually hard average-case analysis is amenable to this
method. A survey [10] presents such results for Bubble
sort, Heap sort, Shell sort, Dobosiewicz sort, Shaker sort,
and sorting with stacks and queues in sequential or parallel
mode. It is also found that the trade-off between memory
and sorting is enhanced by the increasing availability of
computer memory and the increase in processor speed.
Because the price of computer memory keeps decreasing,
acquiring larger memory configurations is no longer an
obstacle, and it is easy to equip a computer with more
memory.

Similarly, every year there is an increase in computer
speed. Moore's Law, a computer-industry standard
