Lecture 01 - Algorithm Analysis
1.0 - Course Overview
1.1 - Course Aims
- Expand your ability to analyse, critique, design and implement advanced data structures and algorithms
1.2 - Assumed Background
- Programming experience, including basic data structures and recursive procedures
- Familiarity with using a programming language - using Java for assignments
- Mathematical background
- Familiarity with proof by mathematical induction
- Knowledge of calculus, including differentiation, limits, L’Hôpital’s rule, and summations
1.3 - Motivation
- Practical basis for creating more efficient algorithms
- Theoretical basis for justifying your choice of algorithms
- Improve your problem-solving skills
- A prerequisite for looking for a job at Google, Oracle, etc.
2.0 - Recap of Algorithm Analysis
- Design and analyse efficient algorithms
- Analysis: Bounded below Ω, bounded above O, bounded tightly Θ
- Constant terms can be disregarded for large enough inputs
- Summations useful for analysing the cost of an algorithm
2.1 - What is an Algorithm
- An algorithm is a well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output [CLRS, Chapter 1]
- Usually defined to solve a specific computational problem
- We would like algorithms to be correct (with respect to their problem) as well as efficient.
- We would also like algorithms to be designed to be readable, so that we can (a) verify that they are correct and (b) maintain them.
2.2 - Sorting Problem
💡 More broadly, how do we define a problem in a way that lets us design algorithms to solve it?
Input
A sequence of n numbers ⟨a₁, a₂, a₃, …, aₙ⟩
Output
A permutation ⟨a₁′, a₂′, a₃′, …, aₙ′⟩ of the input sequence such that a₁′ ≤ a₂′ ≤ ⋯ ≤ aₙ′
2.2.1 - Sorting Algorithms: Insertion Sort
🌱 We have many sorting algorithms, and one of them is called Insertion Sort.
for j = 2 to A.length
    key = A[j]
    i = j - 1
    while i > 0 and A[i] > key
        A[i + 1] = A[i]
        i = i - 1
    A[i + 1] = key
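The pseudocode above follows the 1-indexed CLRS convention; a direct Java translation (0-indexed arrays, so the outer loop starts at j = 1) might look like this sketch:

```java
// Insertion sort: a 0-indexed Java translation of the CLRS pseudocode above.
public class InsertionSort {
    static void sort(int[] a) {
        for (int j = 1; j < a.length; j++) {   // pseudocode: for j = 2 to A.length
            int key = a[j];
            int i = j - 1;
            // Shift elements of the sorted prefix a[0..j-1] that exceed key
            while (i >= 0 && a[i] > key) {
                a[i + 1] = a[i];
                i = i - 1;
            }
            a[i + 1] = key;                    // Insert key into its correct slot
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 4, 2, 6, 3};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // prints [2, 3, 4, 5, 6]
    }
}
```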
🌱 Run insertion sort on the array A = [5, 4, 2, 6, 3]
Initially, j=2 - that is, we begin by inserting the second element, so key = 4 and i = 1.
Since A[1] = 5 > key = 4, we shuffle the comparison element up one position: A = [5, 5, 2, 6, 3].
We decrement i (to 0), the while loop terminates, and we have found where to insert 4: A = [4, 5, 2, 6, 3].
We now insert each of the following elements into the sorted prefix in the same way, eventually producing A = [2, 3, 4, 5, 6].
How do we verify that this algorithm is correct?
🌱 We can use the idea of invariants to help us prove that algorithms are correct: the invariant serves as the inductive hypothesis in an inductive argument.
- An invariant for insertion sort could be
- A[1..j-1] contains the original elements from A[1..j-1], but in sorted order
- This invariant is initially true, as a single element is inherently in sorted order
- The body of the loop preserves the loop invariant
- Therefore, using an inductive argument, the invariant holds when the loop terminates (j = A.length + 1), so A[1..n] contains the original elements in sorted order, i.e. the algorithm is correct.
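The invariant can also be checked mechanically while the algorithm runs: after each outer-loop iteration, the prefix processed so far must be sorted. A small sketch (the checkSortedPrefix helper is our own illustration, not from the lecture):

```java
// Insertion sort instrumented to check the loop invariant:
// after iteration j, a[0..j] holds its original elements in sorted order.
public class InvariantCheck {
    static void sortWithInvariant(int[] a) {
        for (int j = 1; j < a.length; j++) {
            int key = a[j];
            int i = j - 1;
            while (i >= 0 && a[i] > key) {
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = key;
            checkSortedPrefix(a, j + 1);   // invariant: first j+1 elements sorted
        }
    }

    static void checkSortedPrefix(int[] a, int len) {
        for (int k = 1; k < len; k++) {
            if (a[k - 1] > a[k]) throw new AssertionError("invariant violated at " + k);
        }
    }

    public static void main(String[] args) {
        sortWithInvariant(new int[]{5, 4, 2, 6, 3});  // completes without error
        System.out.println("invariant held on every iteration");
    }
}
```

A runtime check like this only tests particular inputs; the inductive proof is what covers all of them.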
3.0 - Execution Time
🌱 What does execution time depend on?
- Execution time depends on a variety of factors, including:
Input Size
e.g. sorting 10 vs 1000 elements
Input Value
e.g. sorting an already-sorted list vs a reverse-sorted list
Computer Architecture
e.g. the basic instructions available, computation speed
- Generally, we want an upper bound on execution time
3.1 - Execution Time: Worst, Average and Best Case
Worst Case
T(n)= Maximum execution time over all inputs of size n
Average Case
T(n)= Average execution time over all inputs of size n, weighted by probability of input.
Best Case
T(n)= Minimum execution time over all inputs of size n
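These three definitions can be made concrete for insertion sort by running it on every input of a small size. The sketch below (our own measurement harness, not from the lecture) counts the key comparison on all 24 permutations of {1, 2, 3, 4}, so the minimum, maximum, and mean counts are exactly the best, worst, and average cases for n = 4:

```java
// Measure best-, worst- and average-case comparison counts of insertion
// sort by running it on all 24 permutations of {1, 2, 3, 4}.
public class CaseAnalysis {
    // Sort a copy of the input, returning the number of A[i] > key tests made.
    static int count(int[] input) {
        int[] a = input.clone();
        int comparisons = 0;
        for (int j = 1; j < a.length; j++) {
            int key = a[j];
            int i = j - 1;
            while (i >= 0) {
                comparisons++;              // one execution of the test A[i] > key
                if (a[i] <= key) break;
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = key;
        }
        return comparisons;
    }

    // Generate all permutations of a[k..] by recursive swapping.
    static void permute(int[] a, int k, java.util.List<int[]> out) {
        if (k == a.length) { out.add(a.clone()); return; }
        for (int i = k; i < a.length; i++) {
            int t = a[k]; a[k] = a[i]; a[i] = t;
            permute(a, k + 1, out);
            t = a[k]; a[k] = a[i]; a[i] = t;
        }
    }

    public static void main(String[] args) {
        java.util.List<int[]> perms = new java.util.ArrayList<>();
        permute(new int[]{1, 2, 3, 4}, 0, perms);
        int min = Integer.MAX_VALUE, max = 0, total = 0;
        for (int[] p : perms) {
            int c = count(p);
            min = Math.min(min, c);
            max = Math.max(max, c);
            total += c;
        }
        // Best case is n-1 = 3, worst case is n(n-1)/2 = 6
        System.out.println("best=" + min + " worst=" + max
                + " average=" + (double) total / perms.size());
    }
}
```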
3.2 - Running Time Analysis
🌱 The running time of an algorithm on a given input can be measured in terms of the number of primitive steps executed.
- Let's compute the running time of insertion sort
- Let n be A.length
- Let's measure the time in terms of the number of executions of the array comparison A[i] > key
- This is a reasonable approximation, as the total work done during execution (i.e., the execution time) is proportional to the number of times this comparison is executed.
for j = 2 to A.length
    key = A[j]
    i = j - 1
    while i > 0 and A[i] > key
        A[i + 1] = A[i]
        i = i - 1
    A[i + 1] = key
- We want to compute the worst-case running time of our algorithm
- When j=2, we can perform the comparison a maximum of 1 time.
- When j=3, we can perform the comparison a maximum of 2 times (if both preceding elements are greater than the key, we have to shuffle both of them)
- …
- For an arbitrary value of j, we can perform the comparison a maximum of j−1 times.
- Therefore, the worst-case running time of the algorithm is:
T(n) = ∑_{j=2}^{n} (j−1) = n(n−1)/2
- We now want to compute the best-case running time of our algorithm
- The best case occurs when the array we want to sort is already in sorted order.
- That is, in every iteration of the for loop, we perform the comparison only once.
- Therefore, the best-case running time is:
T(n) = ∑_{j=2}^{n} 1 = n−1
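Both closed forms can be sanity-checked empirically: a reverse-sorted array triggers the worst case and an already-sorted array the best case. A small sketch (our own check, not from the lecture):

```java
// Check the derived comparison counts: a reverse-sorted array of size n
// should cost n(n-1)/2 comparisons and an already-sorted array n-1.
public class RunningTime {
    // Sort a, returning the number of A[i] > key tests executed.
    static int countComparisons(int[] a) {
        int c = 0;
        for (int j = 1; j < a.length; j++) {
            int key = a[j];
            int i = j - 1;
            while (i >= 0) {
                c++;                       // the comparison A[i] > key
                if (a[i] <= key) break;
                a[i + 1] = a[i];
                i--;
            }
            a[i + 1] = key;
        }
        return c;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 10; n++) {
            int[] sorted = new int[n], reversed = new int[n];
            for (int k = 0; k < n; k++) { sorted[k] = k; reversed[k] = n - k; }
            System.out.println("n=" + n
                    + " best=" + countComparisons(sorted)      // n - 1
                    + " worst=" + countComparisons(reversed)); // n(n-1)/2
        }
    }
}
```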
3.3 - Running Time Analysis - Recursion
- More complicated to analyse than conditionals and loops
- Running time can often be described by a recurrence
- Overall running time on a problem of size n is described in terms of running time(s) on smaller inputs and functions of n.
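For example, merge sort on n elements recursively sorts two half-size subarrays and then merges them in linear time, giving the recurrence T(n) = 2T(n/2) + n with T(1) = 0; for powers of two this solves exactly to T(n) = n·log₂n. A quick sketch evaluating the recurrence against its closed form:

```java
// Evaluate the merge sort recurrence T(n) = 2*T(n/2) + n, T(1) = 0,
// for powers of two; the closed form is T(n) = n * log2(n).
public class Recurrence {
    static long T(long n) {
        if (n == 1) return 0;          // base case: one element needs no work
        return 2 * T(n / 2) + n;       // two half-size subproblems plus a linear merge
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 1024; n *= 2) {
            long log2 = Long.numberOfTrailingZeros(n);   // log2(n) for a power of two
            System.out.println("n=" + n + " T(n)=" + T(n) + " n*log2(n)=" + n * log2);
        }
    }
}
```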
3.4 - Asymptotic Analysis: The General Idea
- Groups functions together based on their rate of growth:
Merge Sort
Θ(n log n)
Insertion Sort
Θ(n²)
- For large inputs, the difference in order outweighs constant factors:
- For example, merge sort is ultimately better for large enough n no matter what the constant factors are.
- Ignores implementation dependent constants such as machine speed or the compiler.
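To see this concretely, suppose merge sort costs 200·n·log₂n steps while insertion sort costs only 2·n² (both constants invented for illustration). Insertion sort wins for small n, but past a crossover point merge sort is always cheaper:

```java
// Constants don't change the eventual winner: compare a hypothetical
// 200*n*log2(n) merge sort against a 2*n^2 insertion sort (made-up constants).
public class Crossover {
    static double merge(double n)     { return 200 * n * (Math.log(n) / Math.log(2)); }
    static double insertion(double n) { return 2 * n * n; }

    public static void main(String[] args) {
        for (int n = 16; n <= 4096; n *= 2) {
            String winner = merge(n) < insertion(n) ? "merge" : "insertion";
            System.out.println("n=" + n + " cheaper=" + winner);
        }
    }
}
```

Here the crossover sits near n = 1024 (where 100·log₂n first drops below n); with different constants the crossover moves, but it always exists.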
4.0 - Asymptotic Notation
4.1 - Growth of Functions
🌱 The following table shows the largest instance that can be solved in a given time
| T(n) | 1 second | 1 day | 1 year |
| --- | --- | --- | --- |
| n | 1,000,000 | 86,400,000,000 | 31,536,000,000,000 |
| n log n | 62,746 | 2,755,147,514 | 798,160,978,500 |
| n² | 1,000 | 293,938 | 5,615,629 |
| n³ | 100 | 4,421 | 31,593 |
| 2ⁿ | 19 | 36 | 44 |
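One table entry can be re-derived directly, assuming the table counts basic operations at 10⁶ per second (consistent with the n row's one-second entry): the largest n log n instance solvable in one second is the largest n with n·log₂n ≤ 10⁶, which should land at (or within rounding error of) the table's 62,746.

```java
// Re-derive one table entry: the largest n with n*log2(n) <= 10^6, i.e.
// the biggest n log n instance solvable in one second at 10^6 ops/second.
public class TableCheck {
    static double cost(double n) { return n * (Math.log(n) / Math.log(2)); }

    public static void main(String[] args) {
        long n = 1;
        while (cost(n + 1) <= 1_000_000) n++;   // linear scan; cost is increasing
        System.out.println("largest n log n instance in one second: n=" + n);
    }
}
```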
4.2 - Limitations of Asymptotic Analysis
- Constant factors are relevant for:
- Small input sizes
- Algorithms of the same order
4.3 - Asymptotic Notation
For functions f and g:
- f ∈ O(g) - f is asymptotically bounded above by g to within a constant factor.
    - n ∈ O(n²)
    - 64,000n ∈ O(n)
- f ∈ Ω(g) - f is asymptotically bounded below by g to within a constant factor.
    - Equivalently, g ∈ O(f)
    - n² ∈ Ω(n)
- f ∈ Θ(g) - f is asymptotically bounded both above and below by g to within a constant factor.
    - f ∈ O(g) ∧ f ∈ Ω(g)
    - 42n ∈ Θ(n)
    - n ∉ Θ(n²) (n is bounded above by n², but not below)
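Unpacking the first definition: f ∈ O(g) means there exist constants c > 0 and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀. The sketch below numerically spot-checks candidate witnesses (c, n₀) over a finite range; a finite scan can only support such a claim, never prove it:

```java
import java.util.function.LongUnaryOperator;

// Numerically spot-check f in O(g): f(n) <= c*g(n) for all n0 <= n <= limit.
public class BigO {
    static boolean boundedAbove(LongUnaryOperator f, LongUnaryOperator g,
                                long c, long n0, long limit) {
        for (long n = n0; n <= limit; n++) {
            if (f.applyAsLong(n) > c * g.applyAsLong(n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // 64,000n in O(n): witnesses c = 64,000, n0 = 1
        System.out.println(boundedAbove(n -> 64_000 * n, n -> n, 64_000, 1, 10_000));
        // n in O(n^2): witnesses c = 1, n0 = 1
        System.out.println(boundedAbove(n -> n, n -> n * n, 1, 1, 10_000));
        // n^2 is not in O(n): c = 1,000 fails once n exceeds 1,000
        System.out.println(boundedAbove(n -> n * n, n -> n, 1_000, 1, 10_000));
    }
}
```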