internal: uses main memory only
external: uses auxiliary storage
stable: retains the original order of records whose keys are the same
oblivious: performs the same amount of work regardless of the actual input
sort by address: uses indirect addressing (key/address pairs) so the records themselves don't have to be moved
Count number of elements less than each item plus the number of equal elements which occur to its left. Then place the item at the correct location.
Try pseudocode at seats.
For efficiency, record which is higher for each compare.
template<class T>
void Rank(T a[], int n, int rank[])
{// Rank the n elements a[0:n-1].
for (int i = 0; i < n; i++)
rank[i] = 0; // initialize
for (int i = 1; i < n; i++) // redeclare i: the first loop's i is out of scope here
for (int j = 0; j < i; j++)
if (a[j] <= a[i]) rank[i]++;
else rank[j]++;
}
Note that this only counts the number of items less than (or to the left and equal) for each item. There must still be a separate routine which actually moves each item into the proper location based on rank.
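A minimal sketch of that separate routine, assuming an auxiliary array is acceptable. The names computeRanks and rearrange are made up for illustration; computeRanks repeats the counting loop of Rank on a std::vector so the example is self-contained.

```cpp
#include <cassert>
#include <vector>

// Same counting as Rank: rank[i] = number of smaller elements,
// plus the number of equal elements to the left of a[i].
std::vector<int> computeRanks(const std::vector<int>& a) {
    std::vector<int> rank(a.size(), 0);
    for (std::size_t i = 1; i < a.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            if (a[j] <= a[i]) ++rank[i];
            else ++rank[j];
        }
    }
    return rank;
}

// Move each element to the slot named by its rank (uses an auxiliary array).
std::vector<int> rearrange(const std::vector<int>& a, const std::vector<int>& rank) {
    assert(a.size() == rank.size());
    std::vector<int> sorted(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        sorted[rank[i]] = a[i];
    return sorted;
}
```

Because equal keys to the left are counted, every element gets a distinct rank, so no two elements compete for the same slot and equal keys keep their original order.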
How are equal keys handled? If there are three copies of the same key, do they all have the same rank? Try computing ranks for 8 1 5 3 5
Count number of compares as 1 + 2 + 3 + 4 + ... + (n-1) = n(n-1)/2. (Derive by listing the sum backwards and adding the two together.)
selection sort: select elements one at a time and place each in its proper final position.
Repeatedly find the smallest.
3 6 43 1 9
1 6 43 3 9
1 3 43 6 9
1 3 6 43 9
1 3 6 9 43
Analysis: (n-1) + (n-2) + ... + 1 = n(n-1)/2 compares
Only n moves - use when records are long
Requires n(n-1)/2 compares even when already sorted - oblivious.
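A short sketch of selection sort on a std::vector<int>, written so that it also counts key compares. The function name and the choice to return the compare count are assumptions made for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sorts a in place and returns the number of key compares performed.
long selectionSort(std::vector<int>& a) {
    const std::size_t n = a.size();
    long compares = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < n; ++j) {
            ++compares;
            if (a[j] < a[smallest]) smallest = j;
        }
        std::swap(a[i], a[smallest]);   // one exchange per pass
    }
    // Oblivious: the count depends only on n, never on the data.
    assert(compares == static_cast<long>(n) * (static_cast<long>(n) - 1) / 2);
    return compares;
}
```

Running it on 3 6 43 1 9 and on an already sorted list of five items should give the same compare count, which is the oblivious behavior noted above.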
bubble sort: compares adjacent elements and exchanges them if they are out of order.
The unsorted part of the list gets smaller on each pass - at least one element is placed in its final position.
The place of the last swap is as far as the next pass has to look.
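A possible sketch of the adjacent-exchange sort with the last-swap improvement. The name bubbleSort and the returned pass count are assumptions added for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sorts a in place; returns the number of passes made over the list.
int bubbleSort(std::vector<int>& a) {
    std::size_t limit = a.size();   // a[limit..] is already in final position
    int passes = 0;
    while (limit > 1) {
        std::size_t lastSwap = 0;
        for (std::size_t j = 1; j < limit; ++j) {
            if (a[j - 1] > a[j]) {
                std::swap(a[j - 1], a[j]);
                lastSwap = j;
            }
        }
        ++passes;
        assert(lastSwap < limit);   // the part still to be examined shrinks every pass
        limit = lastSwap;           // nothing at or beyond lastSwap can still move
    }
    return passes;
}
```

On nearly sorted input the limit collapses quickly, so the sort stops after very few passes.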
insertion sort: sorts by inserting records into an already existing sorted file.
Two groups of keys - sorted and unsorted.
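Insert n times - each time, on average, move about half of the sorted elements to make room.
The number of elements in the sorted list changes, so the total is about (1 + 2 + ... + (n-1))/2 = n(n-1)/4 moves.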
11 5 17 1 21
5 11 17 1 21
5 11 17 1 21
1 5 11 17 21
1 5 11 17 21
Will perform better when the degree of unsortedness is low - recommended for data that is nearly sorted.
Improvements
Could use a sentinel containing the key: the search then always stops at the sentinel at the latest.
A sentinel is an extra item added to one end of the list to ensure that the loop will terminate without having to include a separate check.
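One way to sketch the sentinel idea, under the assumption that no spare array slot is available: move the smallest element to the front first, so it acts as the sentinel for every later insertion. This is a variation on the sentinel described above, not necessarily the one intended in the notes, and moving the minimum with a swap gives up stability.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

void insertionSortWithSentinel(std::vector<int>& a) {
    if (a.size() < 2) return;
    // Put the smallest element at a[0]; it serves as the sentinel.
    std::iter_swap(a.begin(), std::min_element(a.begin(), a.end()));
    for (std::size_t i = 2; i < a.size(); ++i) {
        const int key = a[i];
        std::size_t j = i;
        // No "j > 0" test needed: a[0] <= key always ends the scan.
        while (a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        assert(j >= 1);   // the sentinel stopped the scan before the front
        a[j] = key;
    }
}
```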
quicksort: partition the set into: the pivot j, those elements less than j, and those elements greater than j.
Often done as: use two pointers - top pointer looks for a value smaller than j, bottom pointer looks for a value larger than j. Then interchange. (This does only a third as many swaps.)
Apply recursively
Analysis: At each level, all elements of the array are examined. The number of levels depends on how equally the pieces are divided. Best case: log n levels yielding O(n log n)
Worst case: let n be the size of the array to be sorted:
C(n) = n-1 + C(n-1) = n(n-1)/2
Space requirement: depends on recursive stacking.
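A sketch of the two-pointer partition and the recursive sort, assuming the first element of each range is used as the pivot (a choice made here for simplicity; it leads to the worst case on already sorted input). The names twoPointerPartition and quickSort are illustrative.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partition a[low..high] around the pivot a[low]; returns the pivot's final index.
int twoPointerPartition(std::vector<int>& a, int low, int high) {
    assert(0 <= low && low <= high && high < static_cast<int>(a.size()));
    const int pivot = a[low];
    int i = low;        // moves right, stops at a value >= pivot
    int j = high + 1;   // moves left, stops at a value <= pivot
    while (true) {
        do { ++i; } while (i <= high && a[i] < pivot);
        do { --j; } while (a[j] > pivot);   // stops at a[low] at the latest
        if (i >= j) break;
        std::swap(a[i], a[j]);              // both are on the wrong side: interchange
    }
    std::swap(a[low], a[j]);                // put the pivot between the two parts
    return j;
}

void quickSort(std::vector<int>& a, int low, int high) {
    if (low >= high) return;                // 0 or 1 elements
    const int p = twoPointerPartition(a, low, high);
    quickSort(a, low, p - 1);
    quickSort(a, p + 1, high);
}
```

The recursion depth, and therefore the stack space, follows the number of levels: about log n when the pieces are balanced, up to n when they are not.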
Improvements
mergesort: chop the list into two sublists. Sort the two pieces. Combine by merging. (Some techniques just build sublists of larger and larger powers of two.) The pieces may ALSO be sorted via mergesort, so it is recursive.
In merging sublists of length n, clearly no more than 2n compares are required. Actually it is less than this, but it is easy to count this way.
Each level takes at most n compares and there are lg n levels, so the complexity is O(n lg n).
A closer count is n lg n -1.25n + 1 (by running test cases)
If linked lists, no problem with storage space. If we have an array of items to be stored, the mergesort requires an auxiliary storage array.
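A sketch of the array version, showing where the auxiliary storage comes in. The helper name sequentialMerge follows the pseudocode used in these notes; the rest of the interface is an assumption.

```cpp
#include <cassert>
#include <vector>

// Merge the sorted runs a[low..mid] and a[mid+1..high] through the auxiliary array.
void sequentialMerge(std::vector<int>& a, std::vector<int>& aux,
                     int low, int mid, int high) {
    assert(aux.size() == a.size());
    int i = low, j = mid + 1, k = low;
    while (i <= mid && j <= high)
        aux[k++] = (a[j] < a[i]) ? a[j++] : a[i++];   // take the left item on ties: stable
    while (i <= mid) aux[k++] = a[i++];
    while (j <= high) aux[k++] = a[j++];
    for (k = low; k <= high; ++k) a[k] = aux[k];      // copy the merged run back
}

void mergeSort(std::vector<int>& a, std::vector<int>& aux, int low, int high) {
    if (low >= high) return;                          // 0 or 1 elements
    const int mid = low + (high - low) / 2;
    mergeSort(a, aux, low, mid);
    mergeSort(a, aux, mid + 1, high);
    sequentialMerge(a, aux, low, mid, high);
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> aux(a.size());                   // the extra O(n) storage
    mergeSort(a, aux, 0, static_cast<int>(a.size()) - 1);
}
```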
Notice that quicksort and mergesort have similar structure
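quicksort(a[],low,high) {
if (low >= high) return   // 0 or 1 elements: nothing to do
pivot = partition(a,low,high)
quicksort(a,low,pivot-1)
quicksort(a,pivot+1,high)
}
mergesort(a[],low,high) {
if (low >= high) return   // 0 or 1 elements: nothing to do
mid=(low+high)/2
mergesort(a,low,mid)
mergesort(a,mid+1,high)
sequentialmerge(a,low,mid,high)
}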
Both have O(n) work to either (1) divide into chunks or (2) put the chunks back together.
The pictures we draw (for expected case) look the same.
The formula analysis looks the same.
We see it doesn't matter whether we do the work before the recursion or after. The work is the same.
Performance: memory and time required
performance analysis: analytical
performance measurement: experimental
Space complexity (usually less important):
Time complexity:
Components of space complexity:
Log Review
For a binary tree with 63 nodes, how many levels are there?
If I have an array of size 120, how many times can I split the array in half?
Components of Time Complexity:
asymptotics: the study of functions of n as n gets large (without bound)
If the running time of an algorithm is proportional to n, when we double n we double the running time.
If the running time is proportional to lg n (c log n), when we double n we only change the running time by c (c is constant of proportionality).
c log 2n = c(log 2 + log n) = c(1 + log n) = c + c log n
Since the original time was c log n, doubling n only increased the time by c.
List examples of something being proportional to something else.
Operation Count: how many times you add, multiply, compare, etc.
Step Counts: Attempt to account for time spent in all parts of the program/function as a function of the characteristics of the program.
Example 2.20
template<class T>
void Add( T **a, T **b, T **c, int rows, int cols)
{// Add matrices a and b to obtain matrix c.
for (int i = 0; i < rows; i++) {
count++; // preceding for loop
for (int j = 0; j < cols; j++) {
count++; // preceding for loop
c[i][j] = a[i][j] + b[i][j];
count++; // assignment
}
count++; // last time of j for loop
}
count++; // last time of i for loop
}
Can also assign counts on a per statement basis.
void Add( T **a, T **b, T **c, int rows, int cols)
{
for (int i = 0; i < rows; i++)          // rows+1
for (int j = 0; j < cols; j++)          // rows(cols+1)
c[i][j] = a[i][j] + b[i][j];            // rows*cols
}
TOTAL: 2(rows*cols) + 2rows + 1
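The total can be checked by running the counting version without the matrices. This is a small sketch; addStepCount is a made-up name, and it mirrors the count++ placement of Example 2.20.

```cpp
#include <cassert>

// Counts the steps of Add for a rows x cols matrix without touching any data.
long addStepCount(int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    long count = 0;
    for (int i = 0; i < rows; i++) {
        count++;                  // i-loop test that succeeded
        for (int j = 0; j < cols; j++) {
            count++;              // j-loop test that succeeded
            count++;              // the assignment to c[i][j]
        }
        count++;                  // final, failing j-loop test
    }
    count++;                      // final, failing i-loop test
    return count;
}
```

For any rows and cols the result should agree with 2(rows*cols) + 2rows + 1.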
Key reason for operation or step counts is to compare two programs which compute the same results.
We want to give an upper bound on the amount of time it takes to solve a problem.
defn: f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) <= c g(n) whenever n >= n0.
Termed complexity: has nothing to do with difficulty of coding or understanding, just time to execute.
Important tool for analyzing and describing the behavior of algorithms
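Is an algorithm in a lower complexity class (say O(n)) always better than an algorithm in a higher class (say O(n^2))? No, it depends on the specific constants, but if n is large enough, the algorithm in the higher class is always slower than the algorithm in the lower class.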
Complexity Class: O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n)
For small n, c may dominate.
Intractable: all known algorithms are of exponential complexity
In this example there are two values which affect the complexity: m and n
for (i=0;i < n;i++) x++;
for (j=0;j < m;j++) x++;
The first statement has complexity O(n). The second statement has complexity O(m).
Therefore, the additive property indicates:
O(n) + O(m) = O(max(n,m))
if cond then S1 else S2
The complexity is the running time of the cond plus the larger of the complexities of S1 and S2.
Analyze nested for loops inside out. The total complexity of a statement inside a group of nested for loops is the complexity of the statement multiplied by the product of the size of each for loop.
for (int i=0;i < m;i++)
for (int j=0;j < n;j++)
x++;
O(mn)
Consider a pictorial view of the work. Let each row represent the work done in one iteration of the outermost loop. The number of rows represents the number of times the outermost loop executes. The area of the figure will then represent the total work.
for (beg = 1;beg < m; beg++)
for (j = beg; j < m; j++)
x++;
In this example, our complexity picture is triangular. The outermost loop executes m-1 times, but since the j loop has a different beginning location each time it is entered, the rows are of different lengths: m-1, m-2, ..., 1. The total is m(m-1)/2, which is O(m^2).
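The area of the triangle can be confirmed by counting how often x++ runs. A small sketch; triangularWork is a hypothetical name.

```cpp
#include <cassert>

// Returns how many times the innermost statement of the triangular loop executes.
long triangularWork(int m) {
    assert(m >= 1);
    long x = 0;
    for (int beg = 1; beg < m; beg++)
        for (int j = beg; j < m; j++)
            x++;
    return x;
}
```

The count should equal m(m-1)/2: roughly half of the m by m square that the rectangular double loop would fill.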
The complexity for recursive algorithms requires additional techniques.
void doit(int n)
{
if (n==1) return;
for (int i=0; i < n; i++)
x = x + i;
doit(n/2);
doit(n/2);
}
If we let T(n) represent the time to solve doit(n), the running time is represented recursively as T(n) = n+ 2 T(n/2) . In other words, the time for method doit to execute when n is the parameter is n (because of the for loop) plus two times the running time of T(n/2) (since doit is called twice recursively with a parameter of n/2).
Since T is defined in terms of T, this is called a recurrence relation
In our pictorial view, we let each line represent a layer of recursion. (The first call is the first row, the two calls at the second level (doit calls doit) comprise the second row, and the four third-level calls (doit calls doit calls doit) make up the third row.) The length of a row represents the work done in the calls themselves (ignoring costs incurred by the recursive calls). In other words, to determine the size of the first row, measure how much work is done in that call, not counting the recursive calls it makes.
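The number of rows is determined by how many times n can be halved before reaching 1, which is about lg n. Each row has total length n, so the total work is O(n log n).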

void doit(int n)
{ if (n<=0) return;
for (int i=0; i < n; i++)
x = x + i;
doit(n/2);
}
If we let T(n) represent the time to solve this problem, the time is represented recursively as T(n) = T(n/2) +n.
In our pictorial view, we let each line represent a layer of recursion. (The first call is the first row, the one call at the second level (doit calls doit) is the second row, and the one third-level call (doit calls doit calls doit) is the third row.) The length of a row represents the work done in the call itself (ignoring costs incurred by the recursive calls). In other words, to determine the size of the first row, measure how much work is done in that call, not counting the recursive calls it makes.
void doit(int n){
if (n <=1) return;
x = x + 1; // constant work, independent of n
doit(n/2);
}
If we let T(n) represent the time to solve this problem, the
time is represented recursively as
T(n) = T(n/2) +1.
In this case, a single call to doit(n) (ignoring recursive calls) takes constant time. We draw that as a square of length 1. Since there are log n levels representing the log n calls, the picture looks like:
void doit(int n){
if (n <=1) return;
x = x + 1; // constant work, independent of n
doit(n/2);
doit(n/2);
}
If we let T(n) represent the time to solve this problem, the
time is represented recursively as
T(n) = 2T(n/2) +1.
Again, the time to execute doit(n) ignoring recursive calls is constant. However, the number of calls required at each level doubles. Our picture is
Mathematicians have developed a formula approach to determining complexity.
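Theorem:
T(n) = a T(n/b) + O(n^k)
if a > b^k the complexity is O(n^(log_b a))
if a = b^k the complexity is O(n^k log n)
if a < b^k the complexity is O(n^k)
In this case: a is the number of recursive calls, n/b is the problem size passed to each call, and n^k is the work done in a single call.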
Let's use the theorem to revisit the same problems we solved pictorially.
void doit(int n)
{
if (n==1) return;
for (int i=0; i < n; i++)
x = x + i;
doit(n/2);
doit(n/2);
}
There are two recursive calls made:
a=2
Each recursive call does half of the work:
b=2
The work done in a single call is n. k is the power on n:
k=1 (as work is n)
Since a = b^k (2 = 2^1), we are in the ``equals'' case.
if a = b^k the complexity is O(n^k log n) = O(n^1 log n) = O(n log n), which is exactly what our pictures told us.
void doit(int n)
{ if (n<=0) return;
for (int i=0; i < n; i++)
x = x + i;
doit(n/2);
}
There is one recursive call made:
a=1
The recursive call does half of the work:
b=2
The work done in a single call is n. k is the power on n:
k=1 (as work is n)
Since a < b^k (1 < 2^1), we are in the ``less than'' case.
if a < b^k the complexity is O(n^k) = O(n^1) = O(n), which is exactly what our pictures told us.
void doit(int n){
if (n <=1) return;
x = x + 1; // constant work, independent of n
doit(n/2);
}
There is one recursive call made:
a=1
The recursive call does half of the work:
b=2
The work done in a single call is independent of n. k is the power on n:
k=0 (as work is 1)
Since a = b^k, as 1 = 2^0, we are in the ``equals'' case.
if a = b^k the complexity is O(n^k log n) = O(n^0 log n) = O(log n), which is exactly what our pictures told us.
void doit(int n){
if (n <=1) return;
x = x + 1; // constant work, independent of n
doit(n/2);
doit(n/2);
}
There are two recursive calls made:
a=2
Each recursive call does half of the work:
b=2
The work done in a single call is independent of n. k is the power on n:
k=0 (as work is 1)
Since a > b^k (2 > 2^0 = 1), we are in the ``greater than'' case.
if a > b^k the complexity is O(n^(log_b a)) = O(n^(log_2 2)) = O(n), which is exactly what our pictures told us.
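The four results can also be checked numerically by evaluating the recurrence directly. The sketch below is only an illustration: recurrenceWork is a made-up name, it assumes T(n) = 0 for n <= 1, and it charges exactly n^k for the work in a single call.

```cpp
#include <cassert>

// Evaluates T(n) = a*T(n/b) + n^k with T(n) = 0 for n <= 1.
long long recurrenceWork(long long n, int a, int b, int k) {
    assert(a >= 1 && b >= 2 && k >= 0);
    if (n <= 1) return 0;
    long long local = 1;                 // n^k: the work done in this call alone
    for (int i = 0; i < k; ++i) local *= n;
    return local + a * recurrenceWork(n / b, a, b, k);
}
```

For n = 1024 the four versions of doit give 10240 (n lg n), 2046 (about 2n), 10 (lg n) and 1023 (n - 1), in line with O(n log n), O(n), O(log n) and O(n).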
In analyzing algorithms, we often encounter progressions. Most calculus books list the formulas below, but you should be able to derive them.
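Example: S = 1 + 2 + 3 + ... + n
Writing the same sum backwards:
S = n + (n-1) + (n-2) + ... + 1
If we add the two S's together, each of the n columns sums to n+1:
2S = n(n+1), so S = n(n+1)/2
Example: S = 1 + 2 + 4 + ... + 2^n
Multiplying both sides by the base:
2S = 2 + 4 + 8 + ... + 2^n + 2^(n+1)
Subtracting S from 2S we get
S = 2^(n+1) - 1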
At times we may want to verify the complexity of written code.
To do this we can either use
To determine complexity from experimental evidence, you must have data for several different problem sizes. Since we do not know the constant, we can determine nothing from a single data point.
For example, if our runtime information consists of

The complexity could be anything
Even with two pieces of data it is not completely determined as the timing doesn't have to be exact (you could have been lucky and finished faster).
The easiest way to visually see the complexity is to have four or five data points in which the problem size keeps doubling.
Consider the following timings obtained from running code. What is the complexity?

This complexity is O(1) - the run time is basically constant.

This complexity is O(n) with a c of 5. There is a bit of variability, however.

This is O(log n) with a c =5. The first entry doesn't fit the pattern, but remember that it may take a while for the pattern to emerge.

This is O(n^2) with c=2

This is O(n log n) with c=1;
How do you figure out the complexity from experimental data when you know neither the constant nor the complexity?
I ``eyeball it'' to come up with a good guess. Then I figure out the constant for one entry. Next I see if that constant works for all the entries. This is basically my ``approximate-then-finalize'' approach.
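The approximate-then-finalize check can be sketched as follows: guess a growth function f, compute the constant time/f(n) implied by each entry, and see whether the constants agree. The function names and the relative tolerance are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// For a guessed growth function f, returns the constant c = time / f(n) implied by each entry.
std::vector<double> impliedConstants(const std::vector<double>& sizes,
                                     const std::vector<double>& times,
                                     const std::function<double(double)>& f) {
    assert(sizes.size() == times.size());
    std::vector<double> c;
    for (std::size_t i = 0; i < sizes.size(); ++i)
        c.push_back(times[i] / f(sizes[i]));
    return c;
}

// True when every constant is within the given relative tolerance of the first one.
bool roughlyConstant(const std::vector<double>& c, double tolerance) {
    if (c.empty()) return false;
    for (double v : c)
        if (std::fabs(v - c.front()) > tolerance * std::fabs(c.front()))
            return false;
    return true;
}
```

If the guess is right, the implied constants stay close to one value as n doubles. If the guess grows too fast, the constants shrink steadily; if it grows too slowly, they climb.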
How do I guess?
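Assume the running time of P is at most cn for some constant c and all n >= n1, and the running time of Q is at least dn^2 for some constant d and all n >= n2. Since cn <= dn^2 when n >= c/d, program P is faster when n > max(n1, n2, c/d).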
Note, the coefficients matter in this comparison. Eventually the lower complexity wins out, but n might be quite large.
Study Figure 2.25. Note that even for small n (1000), time is measured in years for O(2^n).

One problem that concerns us is, ``Is n^2 always worse than n log n, no matter what the constants?''
The following table helps us answer that question:

Notice that while the constants may make n log n worse than n^2 for a while, eventually (for large enough n) the higher complexity will be worse.