
The following assumes set theory and answers, for \(k, m, n \in {}^{\nu}\mathbb{N}^*\), these questions: How can \(n = \epsilon^{\sigma} \le k^m\) values, encoded in a positional numeral system to the base \(k\), be sorted in \(\mathcal{O}(1)\)? How quickly can arbitrary-precision numbers be sorted in practice by deterministic parallelisation? How can two matrices \(\in {}^{\nu}\mathbb{R}^{n \times n}\) be multiplied in \(\mathcal{O}(\sigma)\)? How can the minimum or maximum spanning tree of a graph with \(n\) nodes be calculated in \(\mathcal{O}(\sigma)\)? What is the answer to the halting problem? Why does \(P = NP\) hold?
Let the memory be partially organised in the form of a binomial tree of order \(k\) with \(k^{\grave{m}}\) memory locations, where \(k^{\grave{m}} = 2^h\) with \(h \in {}^{\nu}\mathbb{N}\). Let each memory location be capable of fully accommodating a value to be sorted, and let each edge correspond to a data line. To distinguish multiple identical values, a sequentially appended index forms, together with the old value, a new one. Each \(j\)-th intermediate node corresponds hierarchically to the \(j\)-th digit to the base \(k\) of the value to be sorted.
If the leaves store their addresses instead of the values (in one step), a further step retrieves the addresses, sorted, into the root and pushes them back down the tree at most until they separate. If a value passes through an intermediate node, the latter sets its bit to 1, but to 0 if the intermediate node has no successor to be read. The sorting order determines the first value.
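A minimal software simulation of this digit tree may clarify the data flow. It is a sequential sketch in Java with illustrative identifiers: where the hardware above acts on all tree levels in parallel, the loops below need \(\mathcal{O}(m)\) per value of \(m\) digits.

```java
import java.util.ArrayList;
import java.util.List;

// Sequential simulation of the digit-tree memory: level j of the trie
// corresponds to the j-th digit to the base k of a value. The hardware
// described above works level-parallel; this sketch is O(m) per value.
final class DigitTreeSort {
    private static final class Node {
        Node[] child;   // up to k successors, one per digit (the data lines)
        int count;      // multiplicity at a leaf (identical values)
    }

    /** Sorts non-negative values smaller than k^m. */
    static List<Integer> sort(int[] values, int k, int m) {
        Node root = new Node();
        for (int v : values) {                       // insert digit by digit
            Node node = root;
            for (int j = m - 1; j >= 0; j--) {       // most significant digit first
                int digit = (v / pow(k, j)) % k;
                if (node.child == null) node.child = new Node[k];
                if (node.child[digit] == null) node.child[digit] = new Node();
                node = node.child[digit];
            }
            node.count++;
        }
        List<Integer> sorted = new ArrayList<>();
        collect(root, 0, k, m, sorted);              // ordered readout from the root
        return sorted;
    }

    private static void collect(Node node, int prefix, int k, int depth, List<Integer> out) {
        if (depth == 0) {
            for (int i = 0; i < node.count; i++) out.add(prefix);
            return;
        }
        if (node.child == null) return;
        for (int d = 0; d < k; d++)                  // digits in ascending order
            if (node.child[d] != null) collect(node.child[d], prefix * k + d, k, depth - 1, out);
    }

    private static int pow(int base, int exp) {
        int r = 1;
        while (exp-- > 0) r *= base;
        return r;
    }
}
```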
The subsequent architecture with autonomous memory units possesses simple computational capabilities. Interconnected distributed computers are able to simulate it. The complexity increases by \(\mathcal{O}(\sigma)\) when inserting an additional value, as this corresponds to a binary search. It is inversely proportional to the number of tree nodes in the memory. Using \(d \in {}^{\nu}\mathbb{N}^*\) data lines reduces it by the factor \(\tilde{d}\); occupying a memory location \(g\) times increases it by the factor \(g\).
If the memory does not contain a binomial tree of order \(k\), the complexity increases by the factor \({_k}n\), because each intermediate node must be read for a value. If the first \(k^p\) memory locations, with \(p \in {}^{\nu}\mathbb{N}\), are otherwise in use, all intermediate nodes of the \(p\)-th level must be searched. The complexity increases marginally if the memory is well-filled, otherwise by the worst-case factor \(\mathcal{O}(k^p)\).
The matrices to be multiplied are to be stored square-wise \(n\) times each in a memory cube such that each memory unit multiplies two values and each entry of the product matrix is determined as a sum with \(n\) addends in \(\mathcal{O}(\sigma)\) (vector processor). Matrix multiplication can also be realised via addressed swarming entries. Tensor products can be efficiently mapped by \(n\)-memory cubes, which remain in three-dimensional space via corresponding data lines.
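The cube can be imitated in software by computing all \(n^2\) entry sums concurrently; a minimal sketch using Java parallel streams follows. Note that the \(\mathcal{O}(\sigma)\) bound additionally presupposes hardware that reduces the \(n\) addends in parallel, which the sequential inner loop below does not.

```java
import java.util.stream.IntStream;

// Software imitation of the memory cube: every entry c[i][j] of the product
// is an independent sum of n addends, so all n^2 sums may run concurrently.
// The O(σ) claim above also assumes the n addends are reduced in parallel;
// the inner loop below performs this sum sequentially.
final class CubeMultiply {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n * n).parallel().forEach(idx -> {
            int i = idx / n, j = idx % n;
            double sum = 0.0;
            for (int l = 0; l < n; l++) sum += a[i][l] * b[l][j];  // n addends
            c[i][j] = sum;
        });
        return c;
    }
}
```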
The optimal sorting algorithm Bitsort sorts stably with a time complexity of \(\mathcal{O}(n)\) and requires \(\mathcal{O}(n)\) additional memory space. Prerequisites: Any two values exist as a bit sequence, can be compared in \(\mathcal{O}(1)\), and all their bits can be processed likewise. Furthermore, let the memory be partially realised as a binary tree, and the path from root to leaf can also be traversed in \(\mathcal{O}(1)\).
If \(p \in {}^{\nu}\mathbb{N}^*\) is the number of parallel steps for parallel Bitsort, the time complexity is \(\mathcal{O}(\tilde{p} n)\). Each insertion into the constantly sorted tree or output of a value has a time complexity of \(\mathcal{O}(1)\). Let the memory be equipped with the following intelligence: Each tree node independently stores the value reaching it and can pass it on to the next tree node if it has already been visited, or otherwise return its address to the central processing unit. No bit value corresponds to the root.
Insertion begins with the most significant bit of a value. Each of its sequentially processed bits determines the next node within the binary tree. If all bits have been processed, or a difference from the compared bit arises and the corresponding node has not yet been visited, the node reached next stores either the address of the value's remaining bit chain not located in the tree or, if the order of identical values is immaterial, the number of values for this node.
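A sequential Java sketch of this insertion (illustrative identifiers), using the counter variant for identical values: the intelligent memory above handles each tree level in \(\mathcal{O}(1)\), whereas this loop needs \(\mathcal{O}(b)\) per value of bit length \(b\).

```java
import java.util.function.LongConsumer;

// Sequential sketch of Bitsort's insertion: each bit of a value, most
// significant first, selects the next node of the binary tree; a counter
// at the leaf replaces the appended index when the order of identical
// values is immaterial.
final class BitsortTrie {
    private static final class Node { Node zero, one; int count; }

    private final Node root = new Node();   // no bit value corresponds to the root
    private final int bits;

    BitsortTrie(int bits) { this.bits = bits; }

    void insert(long value) {
        Node node = root;
        for (int i = bits - 1; i >= 0; i--) {            // most significant bit first
            if (((value >>> i) & 1L) != 0) {
                if (node.one == null) node.one = new Node();
                node = node.one;
            } else {
                if (node.zero == null) node.zero = new Node();
                node = node.zero;
            }
        }
        node.count++;                                    // multiplicity of this value
    }

    /** Emits all stored values in ascending order. */
    void output(LongConsumer sink) { emit(root, 0L, bits, sink); }

    private void emit(Node n, long prefix, int depth, LongConsumer sink) {
        if (depth == 0) {
            for (int i = 0; i < n.count; i++) sink.accept(prefix);
            return;
        }
        if (n.zero != null) emit(n.zero, prefix << 1, depth - 1, sink);
        if (n.one != null) emit(n.one, (prefix << 1) | 1L, depth - 1, sink);
    }
}
```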
The time complexity of all methods sorting values of bit length \(b = q a \in {}^{\nu}\mathbb{N}^*\) with address length \(a \in {}^{\nu}\mathbb{N}^*\) increases per value from \(\mathcal{O}(1)\) to \(\mathcal{O}(q)\). If \(r\) is the average value of \(q\), at least one \(\mathcal{O}(n)\) in the time complexity of the sorting method must be replaced by \(\mathcal{O}(r n)\). If \(s \in {}^{\nu}\mathbb{N}^*\) is the average number of distinct (!) values that land in a bucket after sorting up to bit length \(a\) and whose bit length is greater than \(a\), the time complexity of the method is \(\mathcal{O}(({_\epsilon}s + r) n)\).
Here, the aforementioned values of a bucket are stored per bucket in an AVL tree, or in a B-tree if they reside in external memory that is slow relative to the main memory. In each step, only the significant sub-value of address length of a value is processed. Forgoing fully sorted values after the processing of each value reduces the overall sorting time to \(\mathcal{O}(r n)\) by using two main memory areas.
The second main memory area is cleared after each pass to be available for the next pass, until a bucket has been processed. Only the addresses in the current sorting order are consistently written back into the first main memory area before the values with the same recently processed sub-value are further sorted. For rapid access to each sub-value, every value must be stored contiguously in the memory.
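A sequential Java sketch of this \(\mathcal{O}(r n)\) scheme under simplifying assumptions (non-negative 64-bit values, sub-value length \(a = 16\); identifiers hypothetical): two index arrays play the roles of the two main memory areas, the addresses are written back into the first area after every pass, and each bucket of equal sub-values is then sorted one level deeper.

```java
// Sketch of the O(r n) scheme: values of bit length b = q*a are sorted
// a bits at a time (here a = 16), most significant sub-value first, via
// their addresses (indices), so the contiguous values are never moved.
// Values are assumed non-negative (bit patterns compare unsigned).
final class ChunkedRadixSort {
    private static final int A = 16, RADIX = 1 << A;

    static void sort(long[] values) {
        int[] first = new int[values.length];            // first main memory area
        int[] second = new int[values.length];           // second main memory area
        for (int i = 0; i < first.length; i++) first[i] = i;
        sortBucket(values, first, second, 0, values.length, 64 - A);
        long[] tmp = new long[values.length];            // apply the final address order
        for (int i = 0; i < first.length; i++) tmp[i] = values[first[i]];
        System.arraycopy(tmp, 0, values, 0, values.length);
    }

    private static void sortBucket(long[] v, int[] first, int[] second, int lo, int hi, int shift) {
        if (hi - lo < 2 || shift < 0) return;
        int[] count = new int[RADIX + 1];
        for (int i = lo; i < hi; i++) count[chunk(v[first[i]], shift) + 1]++;
        for (int c = 0; c < RADIX; c++) count[c + 1] += count[c];     // bucket start offsets
        for (int i = lo; i < hi; i++)                                 // stable scatter into area two
            second[lo + count[chunk(v[first[i]], shift)]++] = first[i];
        System.arraycopy(second, lo, first, lo, hi - lo);             // write addresses back
        for (int c = 0, start = lo; c < RADIX; c++) {                 // recurse per bucket of
            int end = lo + count[c];                                  // equal sub-values
            sortBucket(v, first, second, start, end, shift - A);
            start = end;
        }
    }

    private static int chunk(long value, int shift) {
        return (int) ((value >>> shift) & (RADIX - 1));
    }
}
```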
Using the \(\mathcal{O}(r n)\) method for master data and the \(\mathcal{O}(({_\epsilon}s + r) n)\) method for transactional data combines both well. This allows, in particular, table indices to be created quickly. For extensive tables, it makes a significant difference whether an index is created in \(\mathcal{O}(r n)\) instead of \(\mathcal{O}(r \sigma n)\) and a value can then be searched for in \(\mathcal{O}({_\epsilon}s + r)\) on average instead of \(\mathcal{O}(\sigma + r)\) or worse. Sorting by merging sorted lists is also possible in \(\mathcal{O}(r n)\).
Discussion of Bitsort: It is optimal under the specified prerequisites and well-suited for parallel programming, the rapid ongoing construction of table indices in databases, and searching within them. Its hardware implementation is rather unconventional. While it remains more theory than practice, the reverse is generally true for the following method.
Square Sort: In computer-assisted nonstandard mathematics, traditional sorting paradigms encounter physical boundaries. The processing of highly complex, pointer-based data types with extreme precision is not primarily limited by the number of mathematical comparisons, but by memory bandwidth, latencies during pointer dereferencing, and the overhead of dynamic memory management. Its time complexity is \(\mathcal{O}(N\;{_2}N)\), and its space complexity is \(\mathcal{O}(N)\).
It combines static memory management with asymmetric work-stealing scheduling and significantly outperforms classic generic methods (such as Timsort or conventional parallel MergeSort implementations) for complex data types, particularly with pseudo-random numbers. The exact runtimes and performance proofs can be found below. An internal porting to C++ yielded comparable results.
Classic merge approaches scale excellently in the lower tree but force the hardware into a single-thread bottleneck during the final merging step. Square Sort instead uses a parallel divide-and-conquer merge. Through the strategic search for medians and binary localisation within the sub-vectors, the remaining solution space is continuously partitioned. Each asymmetric sub-task is dynamically distributed to free processor cores.
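A minimal Java sketch of such a parallel divide-and-conquer merge (hypothetical identifiers, shown for primitive values rather than the arbitrary-precision types Square Sort targets): the median of the larger run is placed by binary localisation within the other run, and the two resulting, generally asymmetric sub-merges are handed to a work-stealing pool.

```java
import java.util.concurrent.RecursiveAction;

// Parallel divide-and-conquer merge: the median of the larger run splits
// the merge, via binary search in the smaller run, into two independent
// asymmetric sub-tasks that the work-stealing pool spreads over free cores.
// Usage: ForkJoinPool.commonPool().invoke(new ParallelMerge(src, 0, mid, mid, src.length, dst, 0));
// src and dst must be distinct arrays.
final class ParallelMerge extends RecursiveAction {
    private static final int CUTOFF = 1 << 13;  // merge small spans sequentially
    private final long[] src, dst;
    private final int lo1, hi1, lo2, hi2, out;

    ParallelMerge(long[] src, int lo1, int hi1, int lo2, int hi2, long[] dst, int out) {
        this.src = src; this.lo1 = lo1; this.hi1 = hi1;
        this.lo2 = lo2; this.hi2 = hi2; this.dst = dst; this.out = out;
    }

    @Override protected void compute() {
        int n1 = hi1 - lo1, n2 = hi2 - lo2;
        if (n1 < n2) {                                      // ensure run 1 is the larger one
            new ParallelMerge(src, lo2, hi2, lo1, hi1, dst, out).compute();
            return;
        }
        if (n1 + n2 <= CUTOFF) { mergeSequential(); return; }
        int mid1 = (lo1 + hi1) >>> 1;                       // median of the larger run
        int mid2 = lowerBound(src, lo2, hi2, src[mid1]);    // binary localisation
        int outMid = out + (mid1 - lo1) + (mid2 - lo2);     // final slot of the median
        dst[outMid] = src[mid1];
        invokeAll(new ParallelMerge(src, lo1, mid1, lo2, mid2, dst, out),
                  new ParallelMerge(src, mid1 + 1, hi1, mid2, hi2, dst, outMid + 1));
    }

    private void mergeSequential() {
        int i = lo1, j = lo2, o = out;
        while (i < hi1 && j < hi2) dst[o++] = src[i] <= src[j] ? src[i++] : src[j++];
        while (i < hi1) dst[o++] = src[i++];
        while (j < hi2) dst[o++] = src[j++];
    }

    private static int lowerBound(long[] a, int lo, int hi, long key) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```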
Hybrid chunking for radical cache locality helps overcome the limitations described by Amdahl’s law. To minimise expensive main memory accesses when evaluating heap references, Square Sort delegates the base level to an in-place algorithm. The data blocks are dimensioned such that they remain in the CPU’s core-exclusive L1/L2 cache. This eliminates the “cache thrashing” that inevitably occurs with traditional bottom-up methods.
While other algorithms place the garbage collector (GC) under massive pressure with millions of arbitrary-precision numbers, Square Sort achieves high resource efficiency and zero GC overhead through static memory management. The temporary working buffer is \(\mathcal{O}(N)\). Alternating swaps of memory pointers at each recursion level save additional memory and guarantee linear, deterministic behaviour without allocation spikes.
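A Java sketch of this skeleton under stated assumptions (hypothetical identifiers; primitive values; the block size stands in for a cache-tuned constant): one buffer is allocated once, base blocks are sorted in place, and the roles of input array and buffer alternate at each recursion level, so no allocation occurs while sorting.

```java
import java.util.Arrays;

// Skeleton of the static-memory scheme: one O(N) buffer is allocated up
// front, base blocks sized for the core-exclusive cache are sorted in
// place, and the roles of input array and buffer alternate at each merge
// level, so no allocation (and no GC pressure) occurs during sorting.
final class PingPongMergeSort {
    private static final int BLOCK = 1 << 12;   // stands in for a cache-tuned constant

    static void sort(long[] a) {
        if (a.length < 2) return;
        long[] buf = new long[a.length];        // the single temporary working buffer
        sort(a, buf, 0, a.length, false);       // final result must land in a
    }

    // Sorts the range [lo, hi); the flag says which array receives the result.
    private static void sort(long[] a, long[] buf, int lo, int hi, boolean resultInBuf) {
        if (hi - lo <= BLOCK) {
            Arrays.sort(a, lo, hi);             // in-place base level, cache-resident
            if (resultInBuf) System.arraycopy(a, lo, buf, lo, hi - lo);
            return;
        }
        int mid = (lo + hi) >>> 1;
        sort(a, buf, lo, mid, !resultInBuf);    // halves land in the *other* array,
        sort(a, buf, mid, hi, !resultInBuf);    // i.e. the pointer roles swap per level
        long[] src = resultInBuf ? a : buf;
        long[] dst = resultInBuf ? buf : a;
        merge(src, lo, mid, hi, dst);
    }

    private static void merge(long[] src, int lo, int mid, int hi, long[] dst) {
        int i = lo, j = mid, o = lo;
        while (i < mid && j < hi) dst[o++] = src[i] <= src[j] ? src[i++] : src[j++];
        while (i < mid) dst[o++] = src[i++];
        while (j < hi) dst[o++] = src[j++];
    }
}
```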
Discussion of Square Sort: It is not a generic all-rounder for primitive data types, but a precise mathematical instrument. It sacrifices the SIMD vectorisability of native machine numbers in favour of superior, fail-safe, and massively parallel scalability for the compute-intensive, one-dimensional structures of nonstandard mathematics, which has yet to establish itself. For this purpose, Square Sort provides a specialised, highly parallel sorting architecture.
(\(T\): runtime in ms, \(A\): allocated memory in MB; subscript \(t\): Timsort, subscript \(s\): Square Sort.)

| \(n \times 10^5\) | \(T_t\) (ms) | \(T_s\) (ms) | \(T_t / T_s\) | \(A_t\) (MB) | \(A_s\) (MB) | \(A_t / A_s\) |
|---|---|---|---|---|---|---|
| 1 | 4.29 | 0.95 | 4.51 | 1.26 | 0.06 | 20.08 |
| 4 | 16.36 | 2.25 | 7.27 | 5.05 | 0.10 | 50.89 |
| 16 | 64.51 | 11.56 | 5.58 | 20.23 | 0.15 | 139.34 |
| 64 | 271.74 | 46.83 | 5.80 | 80.90 | 0.30 | 267.73 |
| 256 | 4,250.16 | 377.37 | 11.26 | 323.60 | 0.87 | 371.40 |
| 1024 | 18,984.41 | 1,438.08 | 13.20 | 1,294.67 | 3.08 | 419.73 |
The table dismantles the idealised assumption behind the Landau notation \(\mathcal{O}(N\;{_2}N)\): its supposedly static constant \(C\) increases significantly for Timsort beyond the L3 cache threshold (\(N = 12.8 \times 10^6\)). Unpredictable pointer accesses oversaturate the L3 cache, causing \(C\) to escalate abruptly due to massive RAM latencies. Square Sort keeps \(C\) stable through strict locality, demonstrating that real-world scalability is determined not by asymptotic mathematics, but by physical memory management.
Let \(n \in {}^{\nu}\mathbb{N}^*\) be the number of nodes in a finite, simple, connected graph. (If it is not simple, sorting out in \(\mathcal{O}(n)\) retains only the smallest of any parallel edges and no loops. If it is not connected, all isolated nodes are removed in \(\mathcal{O}(n)\).) Excluding negative weights, Dijkstra’s algorithm requires \(\mathcal{O}(n)\) steps for each minimum path in the graph, since sorting as well as parallel addition and comparison are possible in \(\mathcal{O}(1)\).
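For contrast, a conventional software Dijkstra with a binary heap (a minimal Java sketch, identifiers hypothetical): the queue operations marked below cost \(\mathcal{O}({_2}n)\) each, and it is exactly these that the \(\mathcal{O}(1)\)-sorting memory above reduces to constant time.

```java
import java.util.*;

// Conventional Dijkstra for non-negative weights; adj[u] lists {target,
// weight} pairs. The priority-queue operations are the part replaced by
// the O(1)-sorting memory, which yields O(n) per minimum path.
final class Dijkstra {
    static long[] shortestPaths(List<int[]>[] adj, int source) {
        long[] dist = new long[adj.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<long[]> queue = new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        queue.add(new long[]{0, source});                 // O(log n) in software
        while (!queue.isEmpty()) {
            long[] top = queue.poll();                    // O(log n) in software
            long d = top[0]; int u = (int) top[1];
            if (d > dist[u]) continue;                    // skip stale entries
            for (int[] e : adj[u]) {                      // parallel in the hardware model
                long nd = d + e[1];
                if (nd < dist[e[0]]) {
                    dist[e[0]] = nd;
                    queue.add(new long[]{nd, e[0]});
                }
            }
        }
        return dist;
    }
}
```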
Its minimum or maximum spanning tree can be calculated in \(\mathcal{O}(\sigma n)\) or even in \(\mathcal{O}(\sigma)\): The edges of a node can be sorted in parallel in \(\mathcal{O}(n)\) or \(\mathcal{O}(1)\), respectively. Let all edges be coloured white initially. Each node takes the smallest weight among all white-coloured edges originating from it and appends to it, bit by bit, first the smaller and then the larger number of the edge’s two end nodes, forming a value. It stores the smallest value that reaches it. This is possible in \(\mathcal{O}(n)\) or \(\mathcal{O}(1)\), respectively.
Colouring the edges black between any two nodes of the same minimum spanning tree requires at most the same time. Each node then has a nearest neighbour node. Continuing with the white-coloured edges, each of \(\mathcal{O}(\sigma)\) steps determines the smallest value of a minimum spanning tree in \(\mathcal{O}(n)\) or \(\mathcal{O}(1)\), respectively, until the entire minimum spanning tree is formed in \(\mathcal{O}(\sigma n)\) or \(\mathcal{O}(\sigma)\). For a maximum spanning tree, all edge weights are subtracted from a sufficiently large constant.
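Read sequentially, this white/black colouring is a Borůvka-style contraction; a minimal Java sketch with union-find standing in for the per-node hardware (identifiers hypothetical): in each round every component joins the tree along its lightest white edge, and since the number of components at least halves per round, logarithmically many rounds suffice.

```java
import java.util.Arrays;

// Sequential Borůvka-style reading of the colouring procedure: per round,
// every component selects its lightest white (outgoing) edge, which turns
// black (joins the tree). The hardware above runs each round for all
// nodes in parallel; ties are broken here implicitly by scan order.
final class BoruvkaMST {
    /** edges[i] = {u, v, weight}; returns the total weight of a minimum spanning tree. */
    static long minimumSpanningTree(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        long total = 0;
        for (int components = n; components > 1; ) {
            int[] best = new int[n];                    // lightest white edge per component
            Arrays.fill(best, -1);
            for (int i = 0; i < edges.length; i++) {
                int ru = find(parent, edges[i][0]), rv = find(parent, edges[i][1]);
                if (ru == rv) continue;                 // already black: same tree
                if (best[ru] < 0 || edges[i][2] < edges[best[ru]][2]) best[ru] = i;
                if (best[rv] < 0 || edges[i][2] < edges[best[rv]][2]) best[rv] = i;
            }
            boolean merged = false;
            for (int r = 0; r < n; r++) {
                if (best[r] < 0) continue;
                int ru = find(parent, edges[best[r]][0]), rv = find(parent, edges[best[r]][1]);
                if (ru == rv) continue;                 // already joined from the other side
                parent[ru] = rv;                        // colour the edge black
                total += edges[best[r]][2];
                components--; merged = true;
            }
            if (!merged) break;                         // guard against disconnected input
        }
        return total;
    }

    private static int find(int[] p, int x) {
        while (p[x] != x) { p[x] = p[p[x]]; x = p[x]; }  // path halving
        return x;
    }
}
```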
Halting Theorem: To reach a fixed \(r_m\) of the number \(0.r_{1}r_{2}…r_{m}\), a programme halts for arbitrary (pseudo-)random numbers \(r_j \in {}^{\nu}\mathbb{N}\) with \(k \in {}^{\nu}\mathbb{N}^{*}\) digits and base \(p := 2^k\) in \(m \in \mathbb{N}^{*}\) steps with the probability \(\tilde{p}\).
Proof: This is an optimal result of probability theory\(^{1}\). \(\square\)
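For illustration, assuming each digit is uniformly distributed: a digit takes each of the \(p = 2^k\) possible values with equal probability, so a single trial matches the fixed \(r_m\) with \[\Pr[r_j = r_m] = 2^{-k} = \tilde{p};\] for \(k = 8\), for instance, this is \(1/256\) per step.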
Remark: Randomness is undecidable. Thus, programmes exist that halt with arbitrary probability, and Gödel’s incompleteness theorems follow. Gödel’s statement “I am not provable” is sophistic insofar as it exhibits the fallacious argument petitio principii ex negativo. For solving the halting problem, a self-reference that simultaneously commands and forbids halting is nonsensical. Self-reference renders the diagonal argument for the halting problem invalid.
Theorem: If \(\mathcal{L}\) is the set of all (Type-0) languages \(L_k\) composed of finite words with a maximum length \(\nu\) for mid-finite \(|\mathcal{L}| = 2^{({\nu}^{\grave{\nu}}-1)/\acute{\nu}}\) (applying the GS) and \(k \in \{1, \dots, |\mathcal{L}|\}\), then \(P = NP = \mathcal{L}\) holds.
Proof: Since every deterministic Turing machine can also be conceived as non-deterministic\(^{2}\), \(P \subseteq NP\) holds. \(P\) contains all linear languages \(L_k\), and \(NP\) contains only such languages. \(\square\)
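The cardinality \(|\mathcal{L}|\) can be read off as follows (assuming an alphabet of \(\nu\) symbols and counting the empty word): by the geometric series (GS), there are \[\sum_{j=0}^{\nu} \nu^{j} = \frac{\nu^{\grave{\nu}} - 1}{\acute{\nu}}\] words of maximum length \(\nu\), and every language \(L_k\) is one of the \(2^{(\nu^{\grave{\nu}}-1)/\acute{\nu}}\) subsets of this set of words.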
© 2006-2026 by Boris Haase