Hacker News on Gopher (unofficial)

COMMENT PAGE FOR: Writing a storage engine for Postgres: An in-memory table access method (2023)

rubenvanwyk wrote 8 hours 0 min ago:
I've always wondered why OLTP databases didn't go the route of tiered storage systems: save to memory, cache to NVMe, save permanently to object storage, with different levels of guarantees for each level.

bittermandel wrote 35 min ago:
Neon does a variant of this. The WAL goes through a Paxos consensus directly on NVMe, and is then transformed into page files and stored in object storage.

branko_d wrote 5 hours 30 min ago:
Probably because of the "D" in ACID transactions: the transaction log cannot be meaningfully write-cached. OTOH, writing to tables/indexes is already done "out of order" and aggressively cached in the buffer pool, and flushed to permanent storage only occasionally (and relatively rarely, e.g. SQL Server does it approximately once a minute).

whizzter wrote 5 hours 44 min ago:
Even if they wanted to try something like that, in many cases it'd probably require a fair bit of code restructuring, so ideas aren't tried willy-nilly. PostgreSQL is great in that they've put serious engineering effort into things like the SQL standard, reliability, etc., but one thing that's frankly quite horrid in 2025 is that their reliance on a fork model for processing has left them with many _important global variables_ that need a fair bit of refactoring to take out (the fork model does give some stability perks since the code is written in C, so it's not an entirely horrible choice).

hans_castorp wrote 6 hours 35 min ago:
Oracle's "flash cache" was that, but it was mainly intended to mitigate the performance of spinning hard disks. Not sure if that is still a thing, though. If I'm not mistaken, Oracle's Exadata puts "intelligence" into the storage nodes so they can evaluate WHERE conditions independently, so they seem to take the role of a compute node as well, not only storage. I don't know if they are capable of evaluating other operations there as well (e.g. aggregations or joins).

tanelpoder wrote 5 hours 19 min ago:
Google's (Postgres-compatible) AlloyDB Omni also has similar functionality now - the main DB action, persistence, etc. still has to land on persistent block storage, but additional data can be cached for reading on local NVMe disks. Oracle's Exadata is a whole other beast (I co-authored a book about it back in 2010 and it has improved even further since then). It's a hybrid, shared-storage distributed system - not consensus-based replication (although they support Raft for global data systems now too), but a distributed, coherent buffer cache (global cache) based database system. As it's shared storage, you can write copies of blocks and WAL to multiple separate storage cells (NVMe or even just remote RAM) via direct RDMA operations, without OS kernel or system calls involved. For analytic queries, yep, Oracle can push down filtering, column projection, many aggregations and join filters (bloom filters) for early filtering into the storage cells. The bloom filters are used for early filtering of the next table in the join, based on the output of the previous query plan nodes so far.
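The "different levels of guarantees" idea discussed above maps loosely onto durability knobs stock Postgres already exposes, even without a custom storage engine. A rough sketch (the table names are illustrative, not from the article):

    -- Per-transaction durability: this commit returns before its WAL is
    -- flushed to disk, trading a small loss window for lower latency.
    CREATE TABLE events (ts timestamptz, payload text);
    BEGIN;
    SET LOCAL synchronous_commit = off;
    INSERT INTO events VALUES (now(), 'example');
    COMMIT;

    -- Per-table durability: UNLOGGED tables skip WAL entirely and are
    -- truncated after a crash, so they behave more like a cache tier.
    CREATE UNLOGGED TABLE hot_cache (k text, v text);

    -- Session-local scratch data: temporary tables are never WAL-logged.
    CREATE TEMP TABLE scratch (x int);

None of this reaches object storage, of course; systems like Neon do their tiering at the storage layer rather than through settings like these.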
hardwaresofton wrote 7 hours 16 min ago:
See:

   URI [1]: https://github.com/neondatabase/neon

beoberha wrote 7 hours 44 min ago:
This is what SQL Server Hyperscale does. I'd assume Aurora does something similar too.

o11c wrote 9 hours 7 min ago:
(2023), but this still seems to be the only real documentation on the internet. For reference, the (very minimal!) official docs:

   URI [1]: https://www.postgresql.org/docs/current/tableam.html
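Since the official docs o11c links are indeed very minimal, it may help to see how a finished table access method surfaces at the SQL level. A minimal sketch, assuming a C extension has already defined a handler function named mem_tableam_handler (all names here are hypothetical, not from the article):

    -- The handler is a C function returning a TableAmRoutine struct;
    -- CREATE ACCESS METHOD just points Postgres at it.
    CREATE ACCESS METHOD mem_am
        TYPE TABLE HANDLER mem_tableam_handler;

    -- Individual tables opt in with USING ...
    CREATE TABLE kv (k text, v text) USING mem_am;

    -- ... or the method can be made the default for newly created tables.
    SET default_table_access_method = 'mem_am';

The C side, which is what the article actually walks through, is the interesting part: the handler has to fill in the TableAmRoutine callbacks for scans, inserts, vacuum, and so on.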