- Review Efficiency lecture notes
Review the notes from the Efficiency lecture. In particular, be sure you understand:
- What (approximately) 210, 220, and 230 equal.
- What approximately log21k, log21m, log21b equal.
- The fact that logn is much, much smaller than n.
- Why we ignore lower-order terms and constants in Big-O.
- Why we ignore the base in logs in Big-O.
- How selectionSort, bubbleSort, and mergeSort work.
- The proof that selectionSort is O(n2).
- The proof that, for selectionSort (or any quadratic algorithm), when you multiply the input size by c, you multiply the runtime by c2.
- The proof that mergeSort is O(nlogn).
- The fact that, for mergeSort (or any nlogn algorithm), when you multiply the input size by c, you multiply the runtime by a value slightly
larger than c but quite a bit smaller than c2.
- The fact that O(nlogn) is theoretically optimal for any comparison-based sort, and consequently that mergeSort is within a constant time as fast
as any possible comparison-based sort.
- These class notes on sorting (in detail!) -- just the sorting section. Be sure to really read these notes, including for example watching the videos of the sorting animations.
- Review xSortLab
Run
xSortLab, and...
- Do a visual sort of bubbleSort, and predict each step before confirming it visually.
- Do a visual sort of selectionSort, and predict each step before confirming it visually.
- Do a visual sort of mergeSort, and predict each step before confirming it visually.
- Do a timed sort of selectionSort, and confirm that doubling the input size increases the runtime by about 4x, and that this prediction grows more accurate as the input size increases.
- Do a timed sort of mergeSort, and confirm that doubling the input size increases the runtime by more than 2x, but just barely, and well under 4x, and that this prediction grows more accurate as the input size increases.
- Rewrite (from scratch, without notes) Selection and Merge Sort
Rewrite the code from scratch for Selection and Merge Sort as written
here. You are responsible for all the code in that file, including selectionSort, mergeSort (and merge), and the timing and testing code. Be sure to study our implementations, and not others online (and be sure not to use recursion!). You need to know this code very well by the upcoming quiz!
- Big-O Analysis: Problems and Solutions
Find the Big-O runtime of the given function by computing the runtime of each line.
Problems |
Mild |
Problem 1 |
def foo(L): #L is a list
i = 1
listLength = len(L)
result = []
while i < listLength:
result += L[i]
i *= 3
return i |
Problem 2 |
def foo(S): #S is a string
stringLength = len(S)
i = stringLength
result = {}
while i > 0:
result.add(S[i])
i //= 3
return result |
Problem 3 |
def foo(L): # L is a list
lenList = len(L)
count = 0
for i in range(lenList):
for j in range(lenList):
count += L[i]
return count |
Medium |
Problem 4 |
def foo(s): #s is a string of length N
result = 0
for char in string.ascii_lowercase:
if char in s:
s = s[1:]
result += 1
return result |
Problem 5 |
def foo(s):
return len(s) |
Problem 6 |
def foo(L): #L is a list
n = len(L)
for i in range(n**2, n**3, n):
L.append(i)
for j in range(n//5, n//2, n//10):
L.pop()
return L |
Spicy |
Problem 7 |
def foo(L):
result = []
for i in range(1, len(sorted(L)) + 1):
newList = len(L) * [i]
result.extend(newList)
return sorted(result) |
Problem 8 |
def foo(L): # L is a square, 2D list
n = len(L)
j = 1
count = 0
while j < n:
for i in range(n):
if max(L[j]) in L[i]:
count += 1
j *= 2
return count |
Problem 9 |
def bigOh(L):
new = list()
for i in range(len(L)):
new.extend(L[:i:2])
new.sort()
result = set(new)
return result
|
Solutions |
Mild |
Problem 1 |
def foo(L): #L is a list
i = 1 # O(1)
listLength = len(L) # O(1)
result = [] # O(1)
while i < listLength: # O(log(N))
result += L[i] # O(1)
i *= 3 # O(1)
return i # O(1)
# Overall -- O(log(N)) |
Problem 2 |
def foo(S): #S is a string
stringLength = len(S) # O(1)
i = stringLength # O(1)
result = {} # O(1)
while i > 0: # O(log(N))
result.add(S[i]) # O(1)
i //= 3 # O(1)
return result # O(1)
# Overall -- O(log(N)) |
Problem 3 |
def foo(L): # L is a list
lenList = len(L) # O(1)
count = 0 # O(1)
for i in range(lenList): # O(N)
for j in range(lenList): # O(N)
count += L[i] # O(1)
return count # O(1)
# Overall -- O(N ** 2) |
Medium |
Problem 4 |
def foo(s): #s is a string of length N
result = 0 #O(1)
for char in string.ascii_lowercase: #O(1)
if char in s: #O(N)
s = s[1:] #O(N)
result += 1 #O(1)
return result #O(1)
#Overall - #O(N) |
Problem 5 |
def foo(s):
return len(s) # O(1)
# Overall O(1) |
Problem 6 |
def foo(L): #L is a list
n = len(L) #O(1)
for i in range(n**2, n**3, n): #O(n**2)
L.append(i) #O(1)
for j in range(n//5, n//2, n//10): #O(1)
L.pop() #O(1)
return L #O(1)
#Overall: O(n**2) |
Spicy |
Problem 7 |
def foo(L):
result = [] # O(1)
# initial computation of O(nlogn), then runs O(n) times
for i in range(1, len(sorted(L)) + 1): # O(nlogn) + n iterations
newList = len(L) * [i] # O(n)
result.extend(newList) # O(n)
# result has length O(n**2)
return sorted(result) # n**2 log(n**2)
# Overall O(n**2 log(n)) |
Problem 8 |
def foo(L): # L is a square, 2D list
n = len(L) #O(1)
j = 1 #O(1)
count = 0 #O(1)
while j < n: #O(logn)
for i in range(n): #O(n)
if max(L[j]) in L[i]: #O(n)
count += 1 #O(1)
j *= 2 #O(1)
return count #O(1)
#Overall: O(n**2logn) |
Problem 9 |
def bigOh(L):
new = list() # O(1)
for i in range(len(L)): # n times
new.extend(L[:i:2]) # O(i) = O(n)
new.sort() # O(n**2 log(n))
result = set(new) # O(n**2)
return result # O(1)
# O(n**2 log(n)) |
- mostCommonName(L) in O(n) time
Write the function mostCommonName, that takes a list of names (such as ["Jane", "Aaron", "Cindy", "Aaron"], and returns the most common name in this list (in this case, "Aaron"). If there is more than one such name, return a set of the most common names. So mostCommonName(["Jane", "Aaron", "Jane", "Cindy", "Aaron"]) returns the set {"Aaron", "Jane"}. If the set is empty, return None. Also, treat names case sensitively, so "Jane" and "JANE" are different names. You should write three different versions, one that runs in O(n**2), O(nlogn) and O(n).
def mostCommonName(L):
return 42 # place your answer here!
def testMostCommonName():
print("Testing mostCommonName()...", end="")
assert(mostCommonName(["Jane", "Aaron", "Cindy", "Aaron"])
== "Aaron")
assert(mostCommonName(["Jane", "Aaron", "Jane", "Cindy", "Aaron"])
== {"Aaron", "Jane"})
assert(mostCommonName(["Cindy"]) == "Cindy")
assert(mostCommonName(["Jane", "Aaron", "Cindy"])
== {"Aaron", "Cindy", "Jane"})
assert(mostCommonName([]) == None)
print("Passed!")
testMostCommonName()
- invertDictionary(d)
Write the function invertDictionary(d) that takes a dictionary d that maps keys
to values and returns a dictionary of its inverse, that maps the original values back to their keys. One complication: there can be duplicate values in the original dictionary. That is, there can be keys
k1 and k2 such that (d[k1] == v) and (d[k2] == v) for the same value v.
For this reason, we will in fact map values back to the set of keys
that originally mapped to them. So, for example:
assert(invertDictionary({1:2, 2:3, 3:4, 5:3}) ==
{2:set([1]), 3:set([2,5]), 4:set([3])})
Also, you may assume that the values in the original dictionary are all
immutable, so that they are legal keys in the resulting inverted dictionary.
- mergeSortWithOneAuxList(a)
Write the function mergeSortWithOneAuxList(a) that works just like mergeSort from the notes, only here you can only create a single aux list just one time, rather than once for each call to merge. To do this, you will need to create the aux list in the outer function (mergeSortWithOneAuxList) and then pass it as a parameter into the merge function. The rest is left to you. In a comment at the top of this function, include some timing measurements comparing this function to the original mergeSort, and a brief reflection on whether or not this change was worthwhile.
- threeWayMergesort(L)
First, write the function threeWayMergesort(L) that takes a list L and does
a modified version of mergesort on L, where instead of combining two
sorted sublists at a time, you should
combine three sorted sublists at a time (so after one pass, you have sublists of size
3, and after two passes, sublists of size 9, and so on). You may not assume
that the size of the list is a power of 3.
Next, in a triple-quoted string just below your threeWayMergesort function,
write a short proof that the function runs in O(nlogn) -- that is, in the
same big-oh worst-case runtime as normal two-way mergesort. Then,
write the function threeWayMergesortTiming() that empirically demonstrates that
threeWayMergesort runs in the same big-oh as normal (two-way) mergesort
(no more than a constant time faster or slower, even as n grows large).
So all that extra work to write threeWayMergesort was for naught. Sigh.
When you are done with this function, run it, get the timing data that
proves your point, and include it with a very brief explanation
at the end of that triple-quoted string just below threeWayMergesort.
- Short Answers
- For each of these functions, circle the closest family it belongs to
(big O notation).
A. T = 2N + 5
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
B. T = N + logN
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
C. T = N**3 + 45N**2 + 100logN
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
D. T = 10N * 3N
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
E. T = 10N + 3N
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
F. T = number of steps merge sort takes to sort a list of N numbers
O(logN) O(N) O(NlogN) O(N**2) O(N**3)
- Briefly describe how hashing works for a set.
- Answer the following questions:
- Your code takes 5 seconds to run on an input of length 1000. How long does your
code take to run on an input of length 2000 if it runs in O(n**2)?
- Your code takes 0.005 seconds to run on an input of length 1000 and 0.01 seconds
to run on an input of length 4000. What is the big oh?
- Your code takes 10 seconds to run on an input of length 1,000,000 and 20 seconds
to run on an input of length 2,000,000. What is the big oh?
- Fill in the blanks with the big-oh of each line and each function overall:
def bigOh4(L): # N = len(L), L is a list
N = len(L) # O(_)
R = [ ] # O(_)
for k in L: # O(_)
M, s, d = copy.copy(L), set(), dict() # O(_)
while (M != [ ]): # O(_)
s.add(M[0]**2) # O(_)
a = M.pop() # O(_)
d[k] = a + k # O(_)
M = M[::-1] # O(_)
R += [(k + v**3) for v in s] # O(_)
return (min(R), max(R)) # O(_)
# Overall: O(_)
def bigOh5(L): # N = len(L), L is a list
N = len(L) # O(_)
M1 = [L[i]**2 for i in range(1, len(L), 3)] # O(_)
M2 = [L[i]**3 for i in range(1, 3)] # O(_)
M3 = sorted([x*y for x in L for y in L]) # O(_)
return sum(sorted(M1 + M2 + M3)) # O(_)
# Overall: O(_)
def bigOh6(s): # N = len(s), s is a string
for letter in string.ascii_uppercase: # O(_)
if s[-1] == letter: # O(_)
return "" # O(_)
i = len(s) - 1 # O(_)
result = "" # O(_)
while i >= 0: # O(_)
result += s[int(i)] # O(_)
i -= len(s) / 4 # O(_)
return result # O(_)
# Overall: O(_)
- Code Tracing: indicate what each of the following will print:
def ct1(n):
s, t = set(), set()
while (n > 0):
(d, n) = (n%10, n//10)
if (d in t): t.remove(d)
elif (d in s): t.add(d)
s.add(d)
return sorted(t)
print(ct1(13051231))
def ct2(d, key):
while (key in d) and ((key+2) not in d):
d[key+2] = key+1
key = d[key]
L = [ ]
for key in sorted(d.keys()):
L.append(10*key + d[key])
return L
print(ct2({1:5, 0:2}, 0))
- Reasoning over Code: For each function, give an input that makes the function return True:
def rc1Helper(d):
s = "" for key in d:
if key % 2 == 0:
s += d[key]
return s
def rc1(d):
assert(sorted(d.keys()) == list(range(0,4)))
s = rc1Helper(d)
return s == "good luck!"
def rc2(d):
i = j = k = m = 0
for key in d:
i += 1
value = d[key]
if (key == value):
j = min(value, j)
k = max(value, k)
else:
m += 1
return ((i, j, k, m) == (4, -2, +2, 1))
- mostCommonWebsite(L)
As a busy CMU student, you notice that you're getting distracted a lot while you're trying to work and decide to put that to rest by using your internet history to track which sites you visit the most. Luckily, since you're a superstar 112 student who just learned about efficiency, you can do this analysis super fast.
Write the function mostCommonWebsite, that takes a browser history (list
of strings) such as ["google.com", "agar.io", "cs.cmu.edu/~112", "agar.io"],
and returns a set of the most commonly visited sites in this list (in this case, there is only one most common site ("agar.io"), so we return {"agar.io"}). If there is more than one most common site, then return a set containing all of them. So, mostCommonWebsite(["cs.cmu.edu/~112", "agar.io", "cs.cmu.edu/~112", "google.com", "agar.io"]) returns the set {"agar.io", "cs.cmu.edu/~112"} since both occur twice. Your solution should run in O(n) where n is the length of your history.
- mostPopularFriend(d)
Recall that friendsOfFriends(d) takes a dictionary d like this:
d = dict()
d["fred"] = set(["wilma", "betty", "barney"])
d["wilma"] = set(["fred", "betty", "dino"])
With this in mind, write the function mostPopularFriend(d) that takes a dictionary of that form, and returns the name that occurs the most number of times in all the sets of friends. In the example above, mostPopularFriend(d) would return "betty". You may assume that there is exactly one such name, so ignore ties.
- findTriplets(L)
Write the function findTriplets(L) that takes as input a list L of integers of length N and returns a set of all triplets in the list whose sum is equal to 0.
For example, if the given list is [-1, 0, -3, 2, 1], you should return {(1, 0, -1), (-3, 2, 1)} (or any permutation of those numbers). If there is no valid triplet, you should return the empty set. You may assume that L is a list containing only integers.
This must be written in N**2 time.