Why are blocks on Bluesky public?#
(...)
Technical approaches we’ve considered for private blocks#
One proposed mechanism to make blocks less public on Bluesky is the use
of
bloom filters. The basic idea is to encode block relationships in a statistical data
structure, and to distribute that data structure instead of the set of
actual blocks. The data structure would make it easy to check if there
was a block relationship between two specific accounts, but not make it
easy to list all of the blocks. Other servers and clients in the network
would then use the data structure to enforce the blocking behaviors.
they go on to list some drawbacks to this approach, such as:
-
bloom filter blocks would incur a lot of CPU, bandwidth, and storage
overhead
- mitigations for the above incur latency & reliability issues
-
it doesn't actually solve the problem (blocks are still
enumerable on a per-account basis. lol. lmao)
what baffles me is that there are
significantly worse problems than these that go completely
unmentioned! I'll list some below (note that I assume some familiarity
with
the data structure in question):
-
bloom filters incur false positives - this is the
first thing you learn about them in educational contexts! in
this application, false positives means you block one account and later
find that you've apparently blocked a random subset of other accounts,
too. not ideal!
-
traditional bloom filters have no delete operation,
meaning you just can't unblock people. modifications that
do permit deletes (like
counting bloom filters) unavoidably cause false negatives as well:
-
in the above scenario, imagine that you find a random account that
got blocked as a false-positive. unblocking that account means
subtracting the same hash that collided to block them in the first
place, unblocking the person you intended to block as well.
-
once you've blocked multiple accounts, you can start to experience
false positives on accounts whose hashes only partially collided
with any given account that you meant to block. the bloom filter
doesn't know this, so it allows you to unblock the false positive
anyway, breaking up to 100% of your intended blocks whose hashes
partially collided.
-
since federated blocks would require instances to agree on a hashing
algorithm, collisions can be brute-forced relatively
easily. an attacker could choose a victim, identify a potential
collision (or even multiple partial collisions that overlap to
create a FP, which would be much faster), then create account(s) that
bait out blocks from the entire userbase, or maybe just the victim's
network. anyone who blocks the bait account(s) inadvertently blocks the
victim as well!
-
this could be made more difficult to execute by hashing features
that the user doesn't control, but I think federation still means
that a malicious instance could facilitate this attack.
-
splitting up the bloom filter by instance would rapidly increase the
storage costs, and since bloom filters can't really be resized once
non-empty, this mitigation would scale awfully.
all of the above can be made less likely by increasing the size of the
filter, but this exacerbates Bluesky's stated problems with the concept,
namely overhead costs. it's just a terrible idea all around. it's unlikely
that they'll ever take this idea past the blog post, but the fact that
they would publish this without even addressing these failure modes
dramatically lowers my faith in their technical understanding of
distributed systems (one of the few areas where I thought they had an
advantage in the field).