Heap queue algorithm dissection
Table of Contents
Theory overview
6.006 Introduction to Algorithms Heaps and heap sort
Applied usage as a Priority Queue
Implements a set
S
of elements, each of elements associated with a key.
Typical operations
- Insert(S, x): insett element x into set
S
- Max(S): return element of
S
with a largest key - Extract_max(S): ^^^ and remove it from
S
- Increase(S, x, k): increase the value of x's key to new value k
Figure 1: Max-heap
Heap as a Tree
- Root of tree: first element
- Parent(i) = i/2
- Left(i) = 2i
- Right(i) = 2i + 1
Linear view
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|
100 | 19 | 36 | 17 | 3 | 25 | 1 | 2 | 7 |
Max-Heap property
The key of a node >= the keys of its childen
Heap operations
Build_max_heap:
produces a max heap from an unsorted array
max_heapify:
correct a single violation of the heap property in a subtree's root
Tree leaves definition
Elements A[n/2+1 … n] are all leaves O(nlgn) simple analysis
Time complexity
Observe:
max_heapify
takes O(1) for nodes that are one level above the leaves and in general O(l) time for nodes that are l levels above the leaves
Other sources
Wikipedia article
Python implementation
Docs
Official documentation proclaims:
Heapq.heapyfy(x)
ransform list x into a heap, in-place, in linear time
Third-party documentation
The heap is an integral component in many algorithms — a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.
Code samples
Lets take an attempt to disclose much unique and useful heapq features.
Heap and numbers
Integers
from heapq import heapify h1 = [-1, 3, 5, 7, 9, 2, 4, 6, 8, 0] heapify(h1) print(h1)
>>> >>> >>> [-1, 0, 2, 6, 3, 5, 4, 7, 8, 9]
Decimal
h2 = [-2.2, 3.8, 1.1, -5] heapify(h2) print(h2)
>>> [-5, -2.2, 1.1, 3.8]
Heap of letters
h3 = ['f', 'd', 'q', 'g', 'b'] heapify(h3) print(h3)
>>> ['b', 'd', 'q', 'g', 'f']
Heap of words
h4 = ['break', 'border', 'backer', 'bachelor', 'baccara'] heapify(h4) print(h4)
>>> ['baccara', 'bachelor', 'backer', 'break', 'border']
The latter output looks a bit odd, but it's a heap
by
its definition (!)
Unexpected results
In the same manner, nlargest
& nsmallest
wouldn't work as expected:
from heapq import nlargest, nsmallest, heappush print(nlargest(3, h4)) # for clarity heappush(h4, 'baby') print(h4) print('\n') print(nsmallest(3, h4))
>>> ['break', 'border', 'backer'] ... >>> ['baby', 'bachelor', 'baccara', 'break', 'border', 'backer'] ['baby', 'baccara', 'bachelor']
Seems like nlargest
return the largest values in a descendant order,
but nsmallest
do it in a rising one.
Heap of tuples
from heapq import heappop h = [] heappush(h, (5, 'write code')) heappush(h, (7, 'release product')) heappush(h, (1, 'write spec')) heappush(h, (3, 'create tests')) print(type(h),'\n', h) heappop(h)
>>> >>> >>> >>> >>> >>> >>> <class 'list'> [(1, 'write spec'), (3, 'create tests'), (5, 'write code'), (7, 'release product')] (1, 'write spec')
As you might recon a heap in this particular case is a specific list, which is sorted on the fly.
heappush(h, (6, 'test it up')) print('\n', h)
[(3, 'create tests'), (6, 'test it up'), (5, 'write code'), (7, 'release product')]
Obviously that order is corrupted, but it is strictly following by the
heap
definition.
heappop(h) print('\n', h) heappop(h)
(3, 'create tests') [(5, 'write code'), (6, 'test it up'), (7, 'release product')] (5, 'write code')
Time complexity
From StackOverflow
If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.
The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.
The sorted is faster if you will need to retrieve all elements in order later.
The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.
Lets compare how fast they are work
Comparison Task: We have a large list of integers which is appended 1000 times with a single value during each iteration. Try to extract 10 smallest values out of it after every modification. Compare the execution time
Lets create a random sequence of integers
from random import randint numbers = [randint(0,10000) for i in range(100)]
Comparison between heapq
and sort
Lets create a function to detect minimum values in the data-set:
def min_10_values_by_native_sort(l): l.sort() return l[:10]
And using heapq
module
from heapq import nsmallest, heapify def min_10_values_in_the_heap(h): heapify(h) return nsmallest(10, h)
Now it's time to evaluate with this data-set:
def time_complexity_analysis(n, l, f): result = [] for i in range(n): l.append(randint(0,100)) result.append(f(l)) return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); time_complexity_analysis(10, numbers, min_10_values_in_the_heap); print(time()-i)
[24, 29, 30, 34, 52, 54, 64, 65, 72, 93] 0.00017976760864257812 [1, 5, 24, 25, 27, 29, 30, 34, 41, 50] 0.0005156993865966797
Algorithm analysis
Obviously that heapq
module works 4 time slowly than a native sort.
But try to elaborate our algorithm:
heapify
initial data-set onceheappush
new values into it many times- detect
nsmallest
after each modification
Wrong usage #2
Lets restrict our algorithm to heapify
the data array only once.
def custom_heap_algorithm(n, h): """h is a preliminarily prepared heap""" result = [] for i in range(n): heappush(h, randint(0, 100)) result.append(nsmallest(10, h)) return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10, numbers); print(time()-i)
[1, 1, 5, 19, 23, 24, 25, 27, 29, 30] 0.0001804828643798828 [1, 1, 2, 5, 8, 14, 18, 19, 23, 24] 0.00045680999755859375
i = time(); time_complexity_analysis(10**3, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10**3, numbers); print(time()-i)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.02643871307373047 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.2994558811187744
blog comments powered by Disqus