Heap queue algorithm dissection
Table of Contents
- Theory overview
- Python implementation
6.006 Introduction to Algorithms Heaps and heap sort
Applied usage as a
Implements a set
Sof elements, each of elements associated with a key.
- Insert(S, x): insett element x into set
- Max(S): return element of
Swith a largest key
- Extract_max(S): ^^^ and remove it from
- Increase(S, x, k): increase the value of x's key to new value k
Figure 1: Max-heap
Heap as a Tree
- Root of tree: first element
- Parent(i) = i/2
- Left(i) = 2i
- Right(i) = 2i + 1
The key of a node >= the keys of its childen
produces a max heap from an unsorted array
correct a single violation of the heap property in a subtree's root
Tree leaves definition
Elements A[n/2+1 … n] are all leaves O(nlgn) simple analysis
max_heapifytakes O(1) for nodes that are one level above the leaves and in general O(l) time for nodes that are l levels above the leaves
Official documentation proclaims:
Heapq.heapyfy(x)ransform list x into a heap, in-place, in linear time
The heap is an integral component in many algorithms — a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.
Lets take an attempt to disclose much unique and useful heapq features.
Heap and numbers
from heapq import heapify h1 = [-1, 3, 5, 7, 9, 2, 4, 6, 8, 0] heapify(h1) print(h1)
>>> >>> >>> [-1, 0, 2, 6, 3, 5, 4, 7, 8, 9]
h2 = [-2.2, 3.8, 1.1, -5] heapify(h2) print(h2)
>>> [-5, -2.2, 1.1, 3.8]
Heap of letters
h3 = ['f', 'd', 'q', 'g', 'b'] heapify(h3) print(h3)
>>> ['b', 'd', 'q', 'g', 'f']
Heap of words
h4 = ['break', 'border', 'backer', 'bachelor', 'baccara'] heapify(h4) print(h4)
>>> ['baccara', 'bachelor', 'backer', 'break', 'border']
The latter output looks a bit odd, but it's a
its definition (!)
In the same manner,
wouldn't work as expected:
from heapq import nlargest, nsmallest, heappush print(nlargest(3, h4)) # for clarity heappush(h4, 'baby') print(h4) print('\n') print(nsmallest(3, h4))
>>> ['break', 'border', 'backer'] ... >>> ['baby', 'bachelor', 'baccara', 'break', 'border', 'backer'] ['baby', 'baccara', 'bachelor']
nlargest return the largest values in a descendant order,
nsmallest do it in a rising one.
Heap of tuples
from heapq import heappop h =  heappush(h, (5, 'write code')) heappush(h, (7, 'release product')) heappush(h, (1, 'write spec')) heappush(h, (3, 'create tests')) print(type(h),'\n', h) heappop(h)
>>> >>> >>> >>> >>> >>> >>> <class 'list'> [(1, 'write spec'), (3, 'create tests'), (5, 'write code'), (7, 'release product')] (1, 'write spec')
As you might recon a heap in this particular case is a specific list, which is sorted on the fly.
heappush(h, (6, 'test it up')) print('\n', h)
[(3, 'create tests'), (6, 'test it up'), (5, 'write code'), (7, 'release product')]
Obviously that order is corrupted, but it is strictly following by the
heappop(h) print('\n', h) heappop(h)
(3, 'create tests') [(5, 'write code'), (6, 'test it up'), (7, 'release product')] (5, 'write code')
If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.
The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.
The sorted is faster if you will need to retrieve all elements in order later.
The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.
Lets compare how fast they are work
Comparison Task: We have a large list of integers which is appended 1000 times with a single value during each iteration. Try to extract 10 smallest values out of it after every modification. Compare the execution time
Lets create a random sequence of integers
from random import randint numbers = [randint(0,10000) for i in range(100)]
Lets create a function to detect minimum values in the data-set:
def min_10_values_by_native_sort(l): l.sort() return l[:10]
from heapq import nsmallest, heapify def min_10_values_in_the_heap(h): heapify(h) return nsmallest(10, h)
Now it's time to evaluate with this data-set:
def time_complexity_analysis(n, l, f): result =  for i in range(n): l.append(randint(0,100)) result.append(f(l)) return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); time_complexity_analysis(10, numbers, min_10_values_in_the_heap); print(time()-i)
[24, 29, 30, 34, 52, 54, 64, 65, 72, 93] 0.00017976760864257812 [1, 5, 24, 25, 27, 29, 30, 34, 41, 50] 0.0005156993865966797
heapq module works 4 time slowly than a native sort.
But try to elaborate our algorithm:
heapifyinitial data-set once
heappushnew values into it many times
nsmallestafter each modification
Wrong usage #2
Lets restrict our algorithm to
heapify the data array only once.
def custom_heap_algorithm(n, h): """h is a preliminarily prepared heap""" result =  for i in range(n): heappush(h, randint(0, 100)) result.append(nsmallest(10, h)) return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10, numbers); print(time()-i)
[1, 1, 5, 19, 23, 24, 25, 27, 29, 30] 0.0001804828643798828 [1, 1, 2, 5, 8, 14, 18, 19, 23, 24] 0.00045680999755859375
i = time(); time_complexity_analysis(10**3, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10**3, numbers); print(time()-i)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.02643871307373047 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.2994558811187744
blog comments powered by Disqus