# Heap queue algorithm dissection

## Table of Contents

## Theory overview

### 6.006 Introduction to Algorithms Heaps and heap sort

#### Applied usage as a `Priority Queue`

Implements a set

`S`

of elements, each of elements associated with a key.

#### Typical operations

- Insert(S, x): insett element x into set
`S`

- Max(S): return element of
`S`

with a largest key - Extract_max(S): ^^^ and remove it from
`S`

- Increase(S, x, k): increase the value of x's key to new value k

Figure 1: Max-heap

#### Heap as a Tree

- Root of tree: first element
- Parent(i) = i/2
- Left(i) = 2i
- Right(i) = 2i + 1

#### Linear view

1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|

100 | 19 | 36 | 17 | 3 | 25 | 1 | 2 | 7 |

#### Max-Heap property

The key of a node >= the keys of its childen

#### Heap operations

##### Build_max_heap:

produces a max heap from an unsorted array

##### max_heapify:

correct a single violation of the heap property in a subtree's root

#### Tree leaves definition

Elements A[n/2+1 … n] are all leaves O(nlgn) simple analysis

#### Time complexity

Observe:

`max_heapify`

takes O(1) for nodes that are one level above the leaves and in general O(l) time for nodes that are l levels above the leaves

### Other sources

Wikipedia article

## Python implementation

### Docs

#### Official documentation proclaims:

`Heapq.heapyfy(x)`

ransform list x into a heap, in-place, in linear time

#### Third-party documentation

The heap is an integral component in many algorithms — a data structure that keeps elements organised so that it is always easy to

find the smallest value. We'll show you how you can use the heapq module to implement heaps in Python in just a few lines of code.

### Code samples

Lets take an attempt to disclose much unique and useful heapq features.

#### Heap and numbers

##### Integers

from heapq import heapify h1 = [-1, 3, 5, 7, 9, 2, 4, 6, 8, 0] heapify(h1) print(h1)

>>> >>> >>> [-1, 0, 2, 6, 3, 5, 4, 7, 8, 9]

##### Decimal

h2 = [-2.2, 3.8, 1.1, -5] heapify(h2) print(h2)

>>> [-5, -2.2, 1.1, 3.8]

#### Heap of letters

h3 = ['f', 'd', 'q', 'g', 'b'] heapify(h3) print(h3)

>>> ['b', 'd', 'q', 'g', 'f']

#### Heap of words

h4 = ['break', 'border', 'backer', 'bachelor', 'baccara'] heapify(h4) print(h4)

>>> ['baccara', 'bachelor', 'backer', 'break', 'border']

The latter output *looks a bit odd*, but it's a `heap`

by
its definition (!)

#### Unexpected results

In the same manner, `nlargest`

& `nsmallest`

*wouldn't* work as expected:

from heapq import nlargest, nsmallest, heappush print(nlargest(3, h4)) # for clarity heappush(h4, 'baby') print(h4) print('\n') print(nsmallest(3, h4))

>>> ['break', 'border', 'backer'] ... >>> ['baby', 'bachelor', 'baccara', 'break', 'border', 'backer'] ['baby', 'baccara', 'bachelor']

Seems like `nlargest`

return the largest values in a descendant order,
but `nsmallest`

do it in a rising one.

#### Heap of tuples

from heapq import heappop h = [] heappush(h, (5, 'write code')) heappush(h, (7, 'release product')) heappush(h, (1, 'write spec')) heappush(h, (3, 'create tests')) print(type(h),'\n', h) heappop(h)

>>> >>> >>> >>> >>> >>> >>> <class 'list'> [(1, 'write spec'), (3, 'create tests'), (5, 'write code'), (7, 'release product')] (1, 'write spec')

As you might recon a heap in this particular case is a specific list, which is sorted on the fly.

heappush(h, (6, 'test it up')) print('\n', h)

[(3, 'create tests'), (6, 'test it up'), (5, 'write code'), (7, 'release product')]

Obviously that order is corrupted, but it is strictly following by the
`heap`

definition.

heappop(h) print('\n', h) heappop(h)

(3, 'create tests') [(5, 'write code'), (6, 'test it up'), (7, 'release product')] (5, 'write code')

### Time complexity

From StackOverflow

If you use binary heap to pop all elements in order, the thing you do is basically heapsort. It is slower than sort algorightm in sorted function apart from it's implementation is pure python.

The heapq is faster than sorted in case if you need to add elements on the fly i.e. additions and insertions could come in unspecified order. Adding new element preserving inner order in any heap is faster than resorting array after each insertion.

The sorted is faster if you will need to retrieve all elements in order later.

The only problem where they can compete - if you need some portion of smallest (or largest) elements from collection. Although there are special algorigthms for that case, whether heapq or sorted will be faster here depends on the size of the initial array and portion you'll need to extract.

Lets compare how fast they are work

Comparison Task: We have a large list of integers which is appended 1000 times with a single value during each iteration. Try to extract 10 smallest values out of it after every modification. Compare the execution time

Lets create a random sequence of integers

from random import randint numbers = [randint(0,10000) for i in range(100)]

Comparison between `heapq`

and `sort`

Lets create a function to detect minimum values in the data-set:

def min_10_values_by_native_sort(l): l.sort() return l[:10]

And using `heapq`

module

from heapq import nsmallest, heapify def min_10_values_in_the_heap(h): heapify(h) return nsmallest(10, h)

Now it's time to evaluate with this data-set:

def time_complexity_analysis(n, l, f): result = [] for i in range(n): l.append(randint(0,100)) result.append(f(l)) return result[-1]

i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); time_complexity_analysis(10, numbers, min_10_values_in_the_heap); print(time()-i)

[24, 29, 30, 34, 52, 54, 64, 65, 72, 93] 0.00017976760864257812 [1, 5, 24, 25, 27, 29, 30, 34, 41, 50] 0.0005156993865966797

#### Algorithm analysis

Obviously that `heapq`

module works 4 time slowly than a native sort.
**But** try to elaborate our algorithm:

`heapify`

initial data-set once`heappush`

new values into it many times- detect
`nsmallest`

after each modification

#### Wrong usage #2

Lets restrict our algorithm to `heapify`

the data array only once.

def custom_heap_algorithm(n, h): """h is a preliminarily prepared heap""" result = [] for i in range(n): heappush(h, randint(0, 100)) result.append(nsmallest(10, h)) return result[-1]

i = time(); time_complexity_analysis(10, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10, numbers); print(time()-i)

[1, 1, 5, 19, 23, 24, 25, 27, 29, 30] 0.0001804828643798828 [1, 1, 2, 5, 8, 14, 18, 19, 23, 24] 0.00045680999755859375

i = time(); time_complexity_analysis(10**3, numbers, min_10_values_by_native_sort); print(time()-i) i = time(); heapify(numbers); custom_heap_algorithm(10**3, numbers); print(time()-i)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.02643871307373047 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0.2994558811187744

blog comments powered by Disqus