Sorting Methods in STL Libraries

Last translate with upstream: 08eb0c5 PR #3333 on Jul 17, 2021.

This article will briefly introduce sorting algorithms implemented in C standard library and C++ standard template library.

Unless otherwise specified, all functions mentioned in this article are defined in header <algorithm>.

qsort

Reference: qsort, std::qsort

This function is an implementation of quicksort in C standard library defined in <stdlib.h>. In C++ standard library it is defined in <cstdlib>.

std::sort

Reference: std::sort

Usage:

// a[0] .. a[n - 1]: array needs to be sorted
std::sort(a, a + n);
// The above code directly modifies the order of elements in the array so that it is now sorted in ascending order.

// cmp is a custom comparison function
std::sort(a, a + n, cmp);  

std::sort is a sorting function in standard library that is more commonly used. The third parameter requires a binary comparison function. If cmp is not specified the array will be sorted from smallest to largest by default (or, std::less<>).

The old version of the C++ standard only requires its average time complexity to reach O(n\log n) , but since the C++11 standard requires its worst time complexity to reach O( n\log n)

However the C++ standard does not strictly specify the implementation of this function and it depends on the compiler used. In both libstdc++ and libc++, introsort is used.

std::nth_element

Reference: std::nth_element

Usage:

std::nth_element(first, nth, last);
std::nth_element(first, nth, last, cmp);

It re-sorts elements between [first, last) , making the element pointed by nth be the element at the same position in ordered version of [first, last) . Any elements before the new n -th element are equal to or smaller than the new n -th element.

The implementation is the unfinished introsort.

For the two usages, the C++ standard requires their average-case time complexity to be O(n) , where n = std::distance(first, last).

Often used to build K-D Tree.

std::stable_sort

Reference: std::stable_sort

Usage:

std::stable_sort(begin, end);
std::stable_sort(begin, end, cmp);

Stable sort means that the relative positions of equal elements after the sorting are the same as the original sequence.

The time complexity is O(n\log (n)^2) , or O(n\log n) if extra memory is available.

std::partial_sort

Reference: std::partial_sort

Usage:

std::partial_sort(first, mid, last);
std::partial_sort(first, mid, last, cmp);

Place the top k smallest elements in the first k positions in the sequence (the order of equal elements is not guaranteed), and the order of the remaining elements is not specified. If cmp is not specified the array will be sorted from smallest to largest by default (or, std::less<>).

Time complexity: Approximately performing cmp() for (last-first)\log(mid-first) times.

Principle:

The principle of implementing partial_sort is to perform the make_heap() operation on the elements in the interval [first, middle) in the original sequence to construct a max heap, and then take each element in [middle, last) to compare with the first — the maximum value in the heap. If it is less than the maximum value, the two elements are swapped, and the elements in [first, middle) are adjusted to maintain the max heap. After the comparison, the elements in [first, middle) are sorted by sort_heap() in ascending order. Please note that heap order is different ascending order.

Define operator

Reference: Operator Overloading.

For both built-in types (such as int) and user-defined structures, it is allowed to define a comparison function when calling the STL sorting function. You can pass a comparison function (usually the last parameter) when calling the function.

For user-defined structures, it is required to define at least one relationship operator or provide a comparison function before calling a STL sort function with it. Generally it is recommended to define operator<. 1

Here are a few examples:

int a[1009], n = 10;
// ......
std::sort(a + 1, a + 1 + n);                  // sort in ascending order
std::sort(a + 1, a + 1 + n, greater<int>());  // sort in descending order
struct data {
  int a, b;
  bool operator<(const data rhs) const {
    return (a == rhs.a) ? (b < rhs.b) : (a < rhs.a);
  }
} da[1009];
bool cmp(const data u1, const data u2) {
  return (u1.a == u2.a) ? (u1.b > u2.b) : (u1.a > u2.a);
}
// ......
std::sort(da + 1, da + 1 + 10);  // use the <operator defined in the structure to sort in ascending order
std::sort(da + 1, da + 1 + 10, cmp);  // use cmp function to compare, sort in descending order

Strictly weak order

Reference: Strict weak orderings The sorting operator must satisfy the strict weak ordering, otherwise there will be unpredictable situations (such as runtime error, can not be sorted correctly, etc).

The requirements of the strict weak order:

  1. x \not< x (non-reflexive)
  2. If x < y , then y \not< x (asymmetry)
  3. If x < y, y < z , then x < z (transitivity)
  4. If x \not< y, y \not< x, y \not< z, z \not< y , then x \not< z, z \not< x (incomparable transitivity)

Common mistakes:

  • Use <= to define the less-than operator in sorting.
  • When calling the sorting operator, reading the external value may change the array. (commonly seen in the shortest path algorithm)
  • The result of comparing the maximum and minimum values of multiple numbers is used as a sorting operator. (For example, the classic error in queens game or processing production scheduling)

Footnotes


  1. It is because most standard algorithms are using operator< for comparison. 


Comments