_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Writing a storage engine for Postgres: An in-memory table access method (2023)
       
       
        rubenvanwyk wrote 8 hours 0 min ago:
         I’ve always wondered why OLTP databases didn’t go the route of
         tiered storage systems: save to memory, cache to NVMe, save
         permanently to object storage, with different levels of
         guarantees for each level.
       
          bittermandel wrote 35 min ago:
           Neon does a variant of this. The WAL goes through a Paxos
           consensus directly on NVMe, and is then transformed into page
           files and stored in object storage.
       
          branko_d wrote 5 hours 30 min ago:
           Probably because of the "D" (durability) in ACID transactions:
           the transaction log cannot be meaningfully write-cached.
          
          OTOH, writing to tables/indexes is already done "out of order" and
          aggressively cached in the buffer pool, and flushed to permanent
          storage only occasionally (and relatively rarely, e.g. SQL Server
          does it approximately once a minute).
       
          whizzter wrote 5 hours 44 min ago:
          Even if they wanted to try something like that, it many cases it'd
          probably require a fair bit of code-restructuring so ideas aren't
          tried willy-nilly.
          
           PostgreSQL is great in that they've put serious engineering
           effort into things like SQL-standard compliance, reliability,
           etc., but one thing that's frankly quite horrid in 2025 is
           that their reliance on a fork model for processing has left
           them with many _important global variables_ that need a fair
           bit of refactoring to take out (the fork model does give some
           stability perks since the code is written in C, so it's not
           an entirely horrible choice).
       
          hans_castorp wrote 6 hours 35 min ago:
           Oracle's "flash cache" was that, but it was mainly intended
           to mitigate the performance of spinning hard disks. Not sure
           if that is still a thing, though.
          
           If I'm not mistaken, Oracle's Exadata puts "intelligence"
           into the storage nodes so they can evaluate WHERE conditions
           independently; they seem to take the role of a compute node
           as well, not only storage. I don't know if they are capable
           of evaluating other operations there as well (e.g.
           aggregations or joins).
       
            tanelpoder wrote 5 hours 19 min ago:
            Google's (Postgres-compatible) AlloyDB Omni also has similar
            functionality now - the main DB action, persistence, etc still has
            to land on persistent block storage, but additional data can be
            cached for reading on local NVMe disks.
            
             Oracle's Exadata is a whole other beast (I co-authored a
             book about it back in 2010, and it has improved even
             further since then). It's a hybrid, shared-storage
             distributed system - not consensus-based replication
             (although they support Raft for global data systems now
             too), but a distributed, coherent buffer cache (global
             cache) based database system. As it's shared storage, you
             can write copies of blocks and WAL to multiple separate
             storage cells (NVMe or even just remote RAM) via direct
             RDMA operations, without OS kernel or system calls
             involved.
            
             For analytic queries, yep, Oracle can push filtering,
             column projection, many aggregations, and join filters
             (Bloom filters) down into the storage cells for early
             filtering. The Bloom filters are used to pre-filter the
             next table in a join, based on the output of the preceding
             query plan nodes.
       
          hardwaresofton wrote 7 hours 16 min ago:
          See:
          
   URI    [1]: https://github.com/neondatabase/neon
       
          beoberha wrote 7 hours 44 min ago:
          This is what SQL Server Hyperscale does. I’d assume Aurora does
          something similar too
       
        o11c wrote 9 hours 7 min ago:
        (2023), but this still seems to be the only real documentation on the
        internet.
        
        For reference, the (very minimal!) official docs:
        
   URI  [1]: https://www.postgresql.org/docs/current/tableam.html
       
       
   DIR <- back to front page