
Update:

I updated the post based on the conversation below. I wholly missed an important callout about performance, and I wasn't clear that you do need to wait for the completion record to be written before responding to the client. The post implied this by placing the completion-record write before the response, but I made it explicit to avoid confusion.

Also, the dual WAL approach is worse for latency unless you can amortize the double write over multiple async writes. With a large batch the extra cost is spread across the batch, but as the batch size approaches 1 the per-request cost rises.
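To make the amortization point concrete, here is a rough sketch of the arithmetic. It assumes a simplified model in which one extra durable write is shared evenly by every request in a batch; the function name and the 100 microsecond figure are hypothetical, not measurements from the post.

```python
def extra_latency_per_request(extra_write_cost_us: float, batch_size: int) -> float:
    """Per-request share of the second WAL write under a simplified model.

    Assumption: the extra write costs `extra_write_cost_us` once per batch
    and that cost is divided evenly among the requests in the batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return extra_write_cost_us / batch_size


if __name__ == "__main__":
    # Hypothetical cost of 100 us for the extra durable write.
    for batch in (1, 10, 50):
        share = extra_latency_per_request(100.0, batch)
        print(f"batch={batch}: {share} us per request")
```

Under this model a batch of 1 pays the whole extra write, which is the case where the dual WAL hurts most.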



From the update added to the post:

> This is tracked through io_uring's completion queue - we only send a success response after receiving confirmation that the completion record has been persisted to stable storage.

Which completion queue event(s) are you examining here? I ask because the way this is worded makes it sound like you're waiting solely for the completion queue event for the _write_ to the "completion wal".

Doing that (waiting only on the "completion wal" write CQE):

1. doesn't ensure that the "intent wal" has been written, because the "intent wal" write goes through a different io_uring and a different submission queue entry than the "completion wal" write, and

2. doesn't indicate that the "intent wal" data or the "completion wal" data has made it to durable storage. One needs fsync for that; the completion queue events for writes don't make that promise. The CQE for an fsync opcode would indicate that the data has made it to durable storage, provided the fsync is ordered correctly with respect to the writes and refers to the appropriate fd and data ranges. Alternatively, there are flags that imply an fsync following a write and could be used instead, but the post doesn't mention them.
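The ordering described in the two points above can be sketched with plain synchronous syscalls. This is not the post's io_uring code: it is a minimal stand-in that uses blocking `os.write` and `os.fsync`, and the names `append_durably`, `handle_request`, and the record formats are hypothetical. With io_uring, the equivalent would be to wait for the CQE of an fsync operation ordered after each write, not only for the write CQE.

```python
import os
import tempfile


def append_durably(fd: int, record: bytes) -> None:
    """Append `record` to `fd` and return only after fsync succeeds."""
    view = memoryview(record)
    while view:
        # os.write may write fewer bytes than requested, so loop.
        written = os.write(fd, view)
        view = view[written:]
    # A successful write only means the kernel accepted the data;
    # fsync is what asks for it to reach stable storage.
    os.fsync(fd)


def handle_request(intent_fd: int, completion_fd: int, payload: bytes) -> str:
    """Hypothetical request path: intent, then completion, then reply."""
    append_durably(intent_fd, b"INTENT " + payload + b"\n")
    # ... the actual operation would be applied here ...
    append_durably(completion_fd, b"DONE " + payload + b"\n")
    # Only now is it safe to report success to the client.
    return "ok"


if __name__ == "__main__":
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    with tempfile.TemporaryDirectory() as tmp:
        intent_fd = os.open(os.path.join(tmp, "intent.wal"), flags, 0o600)
        completion_fd = os.open(os.path.join(tmp, "completion.wal"), flags, 0o600)
        try:
            print(handle_request(intent_fd, completion_fd, b"req-1"))
        finally:
            os.close(intent_fd)
            os.close(completion_fd)
```

The key design point is that the reply depends on both fsync calls returning, which covers both logs and both durability gaps listed above. A fully durable setup for newly created files would typically also fsync the containing directory, which this sketch omits.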


How can you know that the completion record is written to disk?



