Theory overview

6.006 Introduction to Algorithms Heaps and heap sort

Applied usage as a Priority Queue

Implements a set S of elements, each of elements associated with a key.

Typical operations

  • Insert(S, x): insett element x into set S
  • Max(S): return element of S with a largest key
  • Extract_max(S): ^^^ and remove it from S
  • Increase(S, x, k): increase the value of x's key to new value k

Max-heap

Figure 1: Max-heap

Heap as a Tree

  • Root of tree: first element
  • Parent(i) = i/2
  • Left(i) = 2i
  • Right(i) = 2i + 1

Linear view

1 2 3 4 5 6 7 8 9
100 19 36 17 3 25 1 2 7

Max-Heap property

The key of a node >= the keys of its childen

Heap operations

Build_max_heap:

produces a max heap from an unsorted array

max_heapify:

correct a single violation of the heap property in a subtree's root

Tree leaves definition

Elements A[n/2+1 … n] are all leaves O(nlgn) simple analysis

Time complexity

Observe: max_heapify takes O(1) for nodes that are one level above the leaves and in general O(l) time for nodes that are l levels above the leaves

Other sources

Wikipedia article

Python implementation

Docs

Official documentation proclaims:

Heapq.heapyfy(x) ransform list x into a heap, in-place, in linear time

Third-party documentation

The heap is an integral component in many algorithms — a data structure that keeps elements organised so that it is always easy to find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

Code samples

Lets take an attempt to disclose much unique and useful heapq features.

Heap and numbers

Integers
from heapq import heapify

h1 = [-1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapify(h1)
print(h1)
>>> >>> >>> [-1, 0, 2, 6, 3, 5, 4, 7, 8, 9]
Decimal
h2 = [-2.2, 3.8, 1.1, -5]
heapify(h2)
print(h2)
>>> [-5, -2.2, 1.1, 3.8]

Heap of letters

h3 = ['f', 'd', 'q', 'g', 'b']
heapify(h3)
print(h3)
>>> ['b', 'd', 'q', 'g', 'f']

Heap of words

h4 = ['break', 'border', 'backer', 'bachelor', 'baccara']
heapify(h4)
print(h4)
>>> ['baccara', 'bachelor', 'backer', 'break', 'border']

The latter output looks a bit odd, but it's a heap by its definition (!)

Unexpected results

In the same manner, nlargest & nsmallest wouldn't work as expected:

from heapq import nlargest, nsmallest, heappush

print(nlargest(3, h4))
# for clarity
heappush(h4, 'baby')
print(h4)
print('\n')
print(nsmallest(3, h4))
>>> ['break', 'border', 'backer']
... >>> ['baby', 'bachelor', 'baccara', 'break', 'border', 'backer']

['baby', 'baccara', 'bachelor']

Seems like nlargest return the largest values in a descendant order, but nsmallest do it in a rising one.

Heap of tuples

from heapq import heappop


h = []
heappush(h, (5, 'write code'))
heappush(h, (7, 'release product'))
heappush(h, (1, 'write spec'))
heappush(h, (3, 'create tests'))
print(type(h),'\n', h)
heappop(h)
>>> >>> >>> >>> >>> >>> >>> <class 'list'> 
 [(1, 'write spec'), (3, 'create tests'), (5, 'write code'), (7, 'release product')]
(1, 'write spec')

As you might recon a heap in this particular case is a specific list, which is sorted on the fly.

heappush(h, (6, 'test it up'))
print('\n', h)
[(3, 'create tests'), (6, 'test it up'), (5, 'write code'), (7, 'release product')]

Obviously that order is corrupted, but it is strictly following by the heap definition.

heappop(h)
print('\n', h)
heappop(h)
(3, 'create tests')
[(5, 'write code'), (6, 'test it up'), (7, 'release product')]
(5, 'write code')

Time complexity

From StackOverflow

If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.

The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.

The sorted is faster if you will need to retrieve all elements in order later.

The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.

Lets compare how fast they are work

Comparison Task: We have a large list of integers which is appended 1000 times with a single value during each iteration. Try to extract 10 smallest values out of it after every modification. Compare the execution time

Lets create a random sequence of integers

from random import randint
numbers = [randint(0,10000) for i in range(100)]

Comparison between heapq and sort

Lets create a function to detect minimum values in the data-set:

def min_10_values_by_native_sort(l):
    l.sort()
    return l[:10]

And using heapq module

from heapq import nsmallest, heapify

def min_10_values_in_the_heap(h):
    heapify(h)
    return nsmallest(10, h)

Now it's time to evaluate with this data-set:

def time_complexity_analysis(n, l, f):
    result = []
    for i in range(n):
        l.append(randint(0,100))
        result.append(f(l))
    return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i)
i = time(); time_complexity_analysis(10, numbers, min_10_values_in_the_heap); print(time()-i)
[24, 29, 30, 34, 52, 54, 64, 65, 72, 93]
0.00017976760864257812
[1, 5, 24, 25, 27, 29, 30, 34, 41, 50]
0.0005156993865966797

Algorithm analysis

Obviously that heapq module works 4 time slowly than a native sort. But try to elaborate our algorithm:

  • heapify initial data-set once
  • heappush new values into it many times
  • detect nsmallest after each modification

Wrong usage #2

Lets restrict our algorithm to heapify the data array only once.

def custom_heap_algorithm(n, h):
    """h is a preliminarily prepared heap"""
    result = []
    for i in range(n):
        heappush(h, randint(0, 100))
        result.append(nsmallest(10, h))
    return result[-1]
i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i)
i = time(); heapify(numbers); custom_heap_algorithm(10, numbers); print(time()-i)
[1, 1, 5, 19, 23, 24, 25, 27, 29, 30]
0.0001804828643798828
[1, 1, 2, 5, 8, 14, 18, 19, 23, 24]
0.00045680999755859375
i = time(); time_complexity_analysis(10**3, numbers, min_10_values_by_native_sort); print(time()-i)
i = time(); heapify(numbers); custom_heap_algorithm(10**3, numbers); print(time()-i)
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
0.02643871307373047
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
0.2994558811187744


blog comments powered by Disqus

Published

04 November 2016

Categories

emacs Python algorithms heap literate programming

Tags