Order in Python

Effective ways to avoid chaos



Being able to sort a list of items (numbers, words, etc.) is a fundamental aspect in computer sciences. Just have a look at this extensive range of sorting algorithms to start understanding how many people have tried to find an answer to this “simple” task. From “bubble sort” to “simple pancake sort”, there is an algo out there tailored for any task you throw at it.

Intuitively, it is easy to understand why it is such a critical step in data handling. Imagine you need to find a customer’s phone number. Would you access that information faster if your customers were ordered in alphabetical order, or by looking through every single name?

Because Python is user friendly, there is a lot of indexation, sorting, and comparing going on under the hood in Python. You are probably familiar with some data types such as lists, dictionaries, strings, sets, and you probably know some of them are ordered (like strings, or lists), whilst others, such as sets and dictionary, are unordered.

It all depends on the type of data and what you want to do with it but you probably already know if a list is better suited than a set, Python will do all the heavy lifting without you having to worry about it.

This being said, most of the data you are dealing with will probably need some processing before you can get any insights. Fortunately, Python offers a few tools to do this kind of data manipulation.

You can always implement a sorting algorithm from scratch. For example, the following code cell will show you one of the ways to implement Bubble sort:

###################  collections used in this post  ################
>>> numbers = [19, 223, 2, 31, 45, 6, 12, -11, 121, 27, 38]
>>> colours = ['purple', 'red', 'blue', 'green', 'yellow']
>>> doors = dict(zip(numbers, colours))

>>> bubblesort(numbers)
[-11, 2, 6, 12, 19, 27, 31, 38, 45, 121, 223])

And, as Python can compare strings, this sorting function can sort strings in alphabetical order too:

>>> 'b' > 'a'
>>> bubblesort(colours)
['blue', 'green', 'purple', 'red', 'yellow']

It is all well and fine, but this code is rather annoying to maintain:

  • What bit of the code would you change to order the list in descending order?
  • What needs updating if you are after ordering a list of words by the number of letters they contain?
  • What about ordering a dictionary?

Built-in Functions

Fortunately, Python has built-in sorting functions: sorted() and list.sort(). sorted() works on iterables and creates a new ordered list. list.sort() orders the elements of the list in situ.

I am sure it won’t take a lot to convince you to use the built-in functions next time you need to sort data but, just in case:

  • There is no need to reinvent the wheel every time, they are built-in and ready to use.
  • Because they are built-in (and therefore coded in C), they are much quicker than anything you can come up with in python.
  • They offer a lot of flexibility thanks to a couple of arguments they accept.

How fast are the built in functions?

>>> %%timeit
>>> bubblesort(numbers)
[-11, 2, 6, 12, 19, 27, 31, 38, 45, 121, 223]
5.9 µs ± 897 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %%timeit
>>> sorted(numbers)
[-11, 2, 6, 12, 19, 27, 31, 38, 45, 121, 223]
165 ns ± 2.65 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %%timeit
>>> [numbers].sort()
>>> numbers
[-11, 2, 6, 12, 19, 27, 31, 38, 45, 121, 223]
77.8 ns ± 0.698 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The arguments

The syntax for those functions is sorted(iterable[, key = None][, reverse = False]). As you can see they can take up to two arguments:

reverse= is self explanatory. It is a boolean that determines if the ordering is ascending or descending:

>>> sorted(numbers, reverse=True)
[223, 121, 45, 38, 31, 27, 19, 12, 6, 2, -11]

key=, on the other hand, is where the magic happens. It is the parameter that is going to determine the basis used to order the iterable. As you will see below, it can be used in many ways.

For example, our list of colours can be sorted by the length of the words using the len function:

>>> sorted(colours, key=len)
['red', 'blue', 'green', 'purple', 'yellow']

And this is the reason why sorted() is so powerful. The key argument takes a function! Combined with the lambda function for example, it becomes very easy to sort our list of numbers by their squared values:

>>> sorted(numbers, key=lambda x: x*x)
[2, 6, -11, 12, 19, 27, 31, 38, 45, 121, 223]

Or circumvent Python treating capitalized letters separately by sorting the values based on their lower case:

>>> pangram = "The quick brown fox jumps over the lazy dog">>> sorted(pangram.split(), key=str.lower)
['brown', 'dog', 'fox', 'jumps', 'lazy', 'over', 'quick', 'The', 'the']

You can also build a function to sort a list on your own terms. For example, the following is comparing the last letter of words, returning -1, 0 or 1 for every pair compared. cmp_to_key() then translates those values into keys, which are used to sort the list:

>>> from functools import cmp_to_key >>> def chicken_or_egg(a, b): 
... if a[-1] > b[-1]:
... return 1
... elif a[-1] < b[-1]:
... return -1
... else:
... return 0
>>> sorted(colours, key = cmp_to_key(chicken_or_egg))
['red', 'blue', 'purple', 'green', 'yellow']

With these in mind, you might wonder what about dictionaries? How can sorted() be used to order a dictionary?

>>> sorted(doors)
[2, 19, 31, 45, 223]

Applying sorted() to the dictionary returns an ordered list of keys. If you want to keep the pair key:value, you can use the view object .items():

>>> sorted(doors.items())
[(2, 'blue'), (19, 'purple'), (31, 'green'), (45, 'yellow'), (223, 'red')]

And, as you might expect, sorting the values() of the dictionary, returns a list of alphabetically ordered values:

>>> sorted(doors.values())
['blue', 'green', 'purple', 'red', 'yellow']

What if you want to order the whole dictionary (pair keys:values) based on the values? Let’s throw lambda in the mix:

>>> sorted(doors.items(), key=lambda x:x[1])
[(2, 'blue'), (31, 'green'), (19, 'purple'), (223, 'red'), (45, 'yellow')]
>>> doors
{19: 'purple', 223: 'red', 2: 'blue', 31: 'green', 45: 'yellow'}

lambda can also get you out of trouble if the data structure is more complex:

>>> data = [{"number":19, "colour":'purple'},
... {"number":223, "colour":'red'},
... {"number":2, "colour":'blue'},
... {"number":31, "colour":'green'},
... {"number":45, "colour":'yellow'}
... ]
>>> sorted(data, key=lambda x: x['colour'])
[{'number': 2, 'colour': 'blue'},
{'number': 31, 'colour': 'green'},
{'number': 19, 'colour': 'purple'},
{'number': 223, 'colour': 'red'},
{'number': 45, 'colour': 'yellow'}]

Final Thoughts

.sort() and sorted() behave in the same way, the only differences being .sort() only works with lists , and replacing in situ.

As long as the information held in the iterables/lists is comparable, these methods will work wonders!

Keep in mind that unordered collections will remain unordered (technically). Even if your dictionary is alphabetically ordered, it won’t remember in which order the values were inserted. A work around would be to use OrderedDict from the collections library:

>>> from collections import OrderedDict>>> OrderedDict(sorted(doors.items(), key=lambda x:x[1]))
OrderedDict([(2, 'blue'),
(31, 'green'),
(19, 'purple'),
(223, 'red'),
(45, 'yellow')])

Useful Links

Sorted() - Python docs
list.sort() - Python docs
Sorting - HOW TO - Python docs
functools.cmp_to_key - Python docs
collections.OrderedDict() - Python docs