Revisiting popcount on Postgres 14
Tl;dr: Skip to benchmark results.
With PostgreSQL version 14, the database management system introduced its own built-in SQL function for population count, called
bit_count. That was not the case in 2018, when I was conducting some research based on bitwise encodings. MySQL did have a
BIT_COUNT function limited to 64 bits, which did not suffice for my purposes, so I implemented my own extension based on common algorithms for the problem. This blog post revisits said implementation and draws a performance comparison to the built-in function. This is all for curiosity, as I don’t have any real need anymore, meaning this post will not go into details.
What is popcount#
Population count (short popcount), or sometimes also called bit count, is a function that returns the number of bits which are set to one for a given bit sequence. A simple example:
SELECT bit_count(5::bit(512)); # returns 2
This SQL command casts the decimal number 5 to a bit sequence of length 512 and counts the number of bits set to one. The number five corresponds to
101 in binary.
In combination with binary operations, popcount allows to extract a lot of valuable information from bitwise encoded data. Let’s say your standard user table contains a column containing which languages that each user speaks in bit-wise encoding. Counting the number of users that speak two specific languages can be achieved with one binary AND operation and a popcount. Suddenly otherwise very costly queries becomes easily reachable and performant.
Encoding data in bit sequences is rather uncommon, however, and requires some extra effort. This is definitely trading verbosity and convenience for performance. My research was specifically for scaling a matching engine to process high volumes of candidates on the fly, as a matching profile is being specified. In the end, processing 100'000 candidates added ~250ms to requests, with linear growth.
The implementation details of the new
bit_count function can be found in the PostgreSQL git repository here (or Github Mirror). It reveals that on the first call the most efficient and available implementation is detected and all future calls are rerouted directly thereafter. Interestingly, the intrinsic function for population count found in Visual Studio compilers takes precedent over using the hardware instruction
POPCNT in assembly directly. Without hardware support, the compiler’s intrinsic
__builtin_popcountl function is checked. As a last resort, an 8 bit lookup table is used.
My own implementation additionally provides a Hamming Weight approach, which sits somewhere between the lookup table and the intrinsic functions performance-wise. However, it only explores the assembly instruction directly as part of an 256bit unrolled experiment. As such, I’d expect that the new built-in function will provide comparable or slightly better performance to my own extension. Let’s see!
My benchmarking results from 2018 are as follows:
The take-aways were that generally all provided implementations showcased linear time as the column length increased and that something seemed to happen at 2^14, for which i could not find an explanation. Details can be found in the extension’s git repository. Let’s see how the numbers stack up today, in 2023.
Seeding the benchmark database#
To conduct the new benchmarks, a database had to be generated and seeded with enough data. There still seems to be no built-in way in PostgreSQL to generate random values for bit types at the lengths that this benchmark requires. In general, it would be cumbersome to input bit-data directly. One way around that is to use compression. Using OpenSSL, it’s possible to generate random values in hex format:
$ openssl rand -hex 8 # 8*8=64 bits 9e7f08f389cb4466
The hex format can be cast to the target bit format:
INSERT INTO table_name (column_name) VALUES (x'9e7f08f389cb4466'::bit(64));
Packaging above in a command line tool provides an convenient (albeit slow) way for seeding large databases with randomised bitwise encodings. At larger column lengths, such as 2^17, you might find that string/pipe lengths induce a limitation on how many inserts can be done in batch operations.
New benchmark results#
The test bench was running an Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (4 Hyper-Threads) on Ubuntu 22.04 and PostgreSQL 14.7.
Above is a log-log-plot of the six popcount algorithms, as well as the new built-in
bit_count function, over a constant sample size of
500'000, while increasing the column length (bit alignment) by powers of two.
The results confirm my expectations, however there are two interesting observations:
- Significantly reduced jump in runtime after lengths of 2^14 for the new implementation
- Slight degradation in performance after 2^16 for all algorithms, not seen in 2018
It would be very interesting to talk to someone closer to postgres development, to get an explanation for the sudden increased runtime around 2^16.
Nice to see built-in support for population count in PostgreSQL, allowing to sunset my own extension. Its use is therefore only recommended for postgres versions <14, when really necessary.