What were the limitations that required you to move all customers to a shared sy...

garrettf · on Oct 6, 2021

Good question, that was an option. The main motivating factor here was that vacuums were beginning to take dangerously long. O(weeks) to complete, independent of the load on the database. While migrating spaces in segments would have reduced the number of records future vacuums need to scan, we were already running against the clock to complete one vacuum prior to TXID wraparound[0]. To kick off replication for specific spaces we would have needed to write our shard key to all data owned by those spaces. That would further contribute to TXID growth, and was not something we were comfortable doing.

At the end of the day, this is something we could have explored in more depth, but we were ultimately comfortable with the risk tradeoff of migrating all users at once vs. the consequences of depending on the monolith for longer, largely thanks to the effort we put into validating our migration strategy.

[0] https://blog.sentry.io/2015/07/23/transaction-id-wraparound-...

nicoburns · on Oct 7, 2021

Seems like something that might still be worth exploring, as if I’m thinking about this correctly, it would allow you to create new shards on the fly, and to migrate workspaces between shards while only locking one workspace at a time, and only for the amount of time required to catch up that single workspace.