In the following the optimal sort algorithm bitsort is presented. It sorts stable with at all times equal time complexity O(n) and needs O(n) additional storage space. Here is assumed that the comparison of two values needs the time complexity O(1), the values are present as bit sequence and all bits of a value can be also processed in O(1). It is furthermore presupposed that the storage is realised as binary tree and the path from root to leaf can be run through also in O(1). If p is the number of parallel steps for parallel bitsort, so the time complexity is O(n/p).

Once a value has been processed, the values inserted so far are available sorted. Inserting an additional value has the time complexity O(1). After each insertion, the m values can be output with time complexity O(m). The storage must be provided with a certain own intelligence: Each storage node must be able to store the value that reaches it independently and to pass it to the next storage node, if it was already visited, resp. to give it back to the central unit, if it was still not visited, after it stored the value. The next storage node at each time results from the corresponding bit of the value within the storage hierarchy.

The bits of a value are at each time - beginning with the leading bit - processed as long until a difference to the previous value or equality is certain. Every value is inserted as leaf into a binary tree. For equality, a leaf stores either the number of the values for this leaf, the address of a list that contains the addresses for this value, or, if the bits of the value were still not processed completely, the value itself or the bit rest. The bit chain is always then broken up at the position where a leaf is built. Very many bits of the same value can be stored with their number. This is advisable, for example, for floating point numbers in exponential representation. The lists can be summarised to one list, if every entry refers to its successor (predecessor) of the same value resp. to null, if there is none. The inner nodes store the value that passed them at last, resp. the common bit chain of two values below the node from that these have branched.

Discussion of bitsort: It is well suited for parallel programming and stable. It is optimal under the presuppositions made. One should consider carefully whether an adaptation of the hardware pays off to achieve the described time complexity. Bitsort does only what is absolutely required, it is optimal for the rapid ongoing building of indices in (relatively fast growing) databases and for searching. Search trees need not to be balanced. It beats for values with relatively few bits (maximally address length) and corresponding hardware all sorting procedures known to me, if the values have to be read successively.

© 21.03.2010 by Boris Haase

• disclaimer • mail@boris-haase.de • pdf-version • bibliography • subjects • definitions • statistics • php-code • rss-feed • top