_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Cache warming at Netflix: Leveraging EBS for moving petabytes of data
       
       
        Naac wrote 2 days ago:
        So reading this article am I understanding correctly that multiple VMs
        are mounting the same EBS volumes, and some write while others read?
        
         What's the underlying technology here? If EBS is implemented over
         iSCSI, AFAIK that isn't currently supported in most distros.
       
        throwdbaaway wrote 2 days ago:
         It is interesting to see that they could move 76.8GB of data per hour
         per instance with the old architecture, while I can move 120GB of
         data in an hour locally between two laptops over wifi.
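
         For comparison, converting both figures to sustained throughput (a
         quick back-of-envelope sketch; the numbers are just the ones quoted
         above):

           for label, gb_per_hour in [("old architecture, per instance", 76.8),
                                      ("laptop over wifi", 120)]:
               mbit_per_s = gb_per_hour * 8 * 1000 / 3600
               print(f"{label}: ~{mbit_per_s:.0f} Mbit/s")
           # old architecture, per instance: ~171 Mbit/s
           # laptop over wifi: ~267 Mbit/s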
       
          sneak wrote 2 days ago:
          The article mentions that this is due to S3 throttling.
          
           An additional issue is that the same interface is used both for
           serving the data out to the cache customers (hundreds of gigs
           served from RAM to hundreds/thousands of clients) and for fetching
           from S3, which is something your laptop probably isn't doing
           either.
          
          S3 is big and reliable. It isn't fast or cheap.
       
            throwdbaaway wrote 2 days ago:
            I think it is fair to expect that even with the interface actively
             serving traffic, the data transfer over S3 should still be faster
            than the data transfer over wifi. Someone should take a closer look
            at that S3 throttling?
       
        quiffledwerg wrote 2 days ago:
        Surely Netflix pays only a small fraction of list price for AWS.
        
        I wonder how that reduced price might influence the technical advice
        from Netflix.
       
          WatchDog wrote 2 days ago:
           I imagine most medium to large enterprises using AWS have some
           discount, especially if they are operating at the kind of scale
           this article deals with.
       
          quiffledwerg wrote 2 days ago:
           Further to this … following the technical advice of Netflix on AWS
           may well bankrupt you, given how little Netflix pays for AWS.
          
          I wouldn’t be racing to emulate the way they do things.
       
            maltalex wrote 2 days ago:
            > I wouldn’t be racing to emulate the way they do things.
            
            Netflix or not, emulating how someone else conducts their business
            without a firm understanding of the underlying reasons is often a
            bad idea.
       
        Loic wrote 2 days ago:
        Nearly off-topic remark. The pandemic pushed our family to get a
        Netflix account. Yesterday, I did a small review of my router logs.
         Before Netflix we had an average of 60GB of download traffic per
         month; since Netflix we are at around 400GB per month.
        
         Even though (or perhaps because) I started my Internet life with a
         14.4 kbps modem, went through all the upgrades, now have a 70 Mbit/s
         DSL line at home, and manage servers all over the world, I was
         surprised by the incredible amount of data that streaming moves.
       
          ksec wrote 2 days ago:
          An extra 340GB per month. At 10Mbps that is roughly 75 hours of
          Netflix per month. And that is for the whole family. So really not
          that much.
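
           A back-of-envelope check of that figure (a minimal Python sketch;
           the 10 Mbps average bitrate is the assumption above):

             bitrate_mbps = 10                    # assumed average bitrate
             extra_gb_per_month = 340             # 400GB - 60GB from upthread

             gb_per_hour = bitrate_mbps * 3600 / 8 / 1000   # Mbit/s -> GB/h
             hours = extra_gb_per_month / gb_per_hour
             print(f"{gb_per_hour:.1f} GB/hour -> {hours:.0f} hours/month")
             # 4.5 GB/hour -> 76 hours/month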
          
           But I have been mentioning this for quite some time: we are
           fundamentally limited by the time we can spend watching video,
           which means our appetite for data won't grow forever.
       
            nickysielicki wrote 2 days ago:
            Resolution and framerate requirements will only expand. If Facebook
            succeeds at the metaverse idea where everyone lives in VR after
             work, you’re going to have every member of a household consuming
             a server-side rendering of a 360-degree, extremely high
             resolution, high framerate feed.
       
          ilogik wrote 2 days ago:
          just a quick note here, most of that traffic was probably served from
          your ISP's network and never reached AWS:
          
   URI    [1]: https://openconnect.netflix.com/en/
       
            treesknees wrote 2 days ago:
            And yet it still counts against my Xfinity monthly bandwidth cap.
             Pretty soon our wireline ISPs will be playing the same games
             we've seen in the mobile space for years, upcharging for data,
             etc.
       
          bentsku wrote 2 days ago:
           I was also surprised. I was living in Finland for a while, and they
           often have unlimited data in their mobile phone plans there. We
           used an iPad to watch Netflix while sharing my phone's connection.
           When I checked the data downloaded after 3 or 4 months, it was
           close to 2TB. It seems... unreal?
       
            bluedino wrote 2 days ago:
             3.2TB/month works out to streaming 24/7 at 10 Mbps.
            
            How much Netflix were you watching?
       
              nstart wrote 2 days ago:
              I think the commenter you are replying to meant 2TB after 3 or 4
              months. If so, I’m guessing that sounds more reasonable
       
        netol wrote 2 days ago:
        Is this to cache huge static files such as videos encoded in multiple
        resolutions/formats?
       
          virtuallynathan wrote 2 days ago:
           No, this isn’t for video data; that tends to just be stored on S3.
           EVCache is like memcached for small data.
       
        WatchDog wrote 2 days ago:
        What file system works while being mounted on two different operating
        systems?
       
          throw0101a wrote 2 days ago:
          Solaris has (had?) QFS which allows for multi-host access:
          
          > Shared QFS adds a multi-writer global filesystem, allowing multiple
          machines to read from & write to the same disks concurrently through
          the use of multi-ported disks or a storage area network. (QFS also
          has a single-writer/multi-reader mode which can be used to share
          disks between hosts without the need for a network connection.)
          
           * [1] A few jobs ago I inherited a setup where files/artefacts were
           uploaded to a file server and had to be shared with clients, but
           for 'legal' reasons the internal host could not be exposed to the
           outside world, though (some of?) the data could be. So an external
           disk pack was purchased; one SAS port went to the internal machine
           and the other SAS port went to the external machine.
          
   URI    [1]: https://en.wikipedia.org/wiki/QFS
       
          fragmede wrote 2 days ago:
           Cluster filesystems are designed for this: Ceph, CXFS, Isilon,
           Gluster, and I'm probably forgetting one or two more. (Notably, not
           ZFS.) It used to be a much rarer need, but even with the advent of
           VMs all over the place, the market hasn't seen fit to produce a
           quality cluster filesystem that also has Windows and macOS drivers
           in addition to Linux & BSD.
       
          toast0 wrote 2 days ago:
           Not the one they're using, apparently.
          
          > On the destination instance, the EBS volume is mounted with RO
          (Read-Only) permissions. But the caveat to this is — we do not use
          a clustered file system on top of EBS. So the destination side
          can’t see the changes made to EBS volume instantaneously. Instead,
          the Cache Populator instance will unmount and mount the EBS volume to
          see the latest changes that happened to the file system. This allows
          both writers and readers to work concurrently, thereby speeding up
          the entire warming process.
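
           A minimal sketch of what that populate loop could look like
           (assuming an XFS volume; the device/mount paths and the
           is_complete/populate callbacks are placeholders, not anything from
           the article):

             import pathlib, subprocess

             DEV, MNT = "/dev/xvdf", "/mnt/warmer"     # hypothetical paths

             def remount_readonly():
                 # Drop the stale view, then pick up whatever the writer has
                 # flushed since. "norecovery" skips journal replay, which a
                 # read-only attachment must not do while another instance is
                 # still writing.
                 subprocess.run(["umount", MNT], check=False)
                 subprocess.run(["mount", "-o", "ro,norecovery", DEV, MNT],
                                check=True)

             def warm_once(is_complete, populate):
                 remount_readonly()
                 for path in pathlib.Path(MNT).rglob("*"):
                     if path.is_file() and is_complete(path):
                         populate(path)     # push the file into the cache
                 # incomplete files are left for the next iteration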
          
           In the before times, I had read about attaching a disk to two
           different SCSI controllers (in different hosts), and you can also
           do that with Fibre Channel or extra-fancy double-ended SAS drives.
           But that was almost always a way to access the drive from one
           system at a time.
       
          shaicoleman wrote 2 days ago:
           I'm guessing any filesystem would work when only one server is
           writing.
          
          "On the destination instance, the EBS volume is mounted with RO
          (Read-Only) permissions. But the caveat to this is — we do not use
          a clustered file system on top of EBS. So the destination side
          can’t see the changes made to EBS volume instantaneously. "
       
            mlyle wrote 2 days ago:
            Nah.  Because not every intermediate state of a filesystem is
            self-consistent, and because caching can "tear" updates.  So
            Amazon's multi-attach documentation says:
            
            > Standard file systems, such as XFS and EXT4, are not designed to
            be accessed simultaneously by multiple servers, such as EC2
            instances. Using Multi-Attach with a standard file system can
            result in data corruption or loss, so this is not safe for
            production workloads. You can use a clustered file system to ensure
            data resiliency and reliability for production workloads.
            
            > Multi-Attach enabled volumes do not support I/O fencing. I/O
            fencing protocols control write access in a shared storage
            environment to maintain data consistency. Your applications must
            provide write ordering for the attached instances to maintain data
            consistency.
            
            Or from the Netflix document:
            
            > It ignores files with partial file system metadata. Cache
            populators can see incomplete file system metadata due to the way
            it mounts the filesystem, with “No Recovery” option. It
            proceeds with complete files, leaving incomplete files to be
            processed by a subsequent iteration. This is explained further in
            step 7.
            
            AKA you get random read errors, etc, and need to cope.
            
            Behavior of something like this can be expected to be very kernel
            version and filesystem dependent, even with the most defensive
            application access strategy.
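
             A minimal sketch of that kind of defensive read path (the
             checksum manifest and retry policy here are assumptions for
             illustration, not the article's mechanism):

               import hashlib, time

               def read_verified(path, expected_sha256, attempts=3, delay=1.0):
                   # Tolerate torn metadata/data on the shared RO mount:
                   # verify against a known digest and retry; otherwise give
                   # up and let a later warming pass handle the file.
                   for _ in range(attempts):
                       try:
                           data = open(path, "rb").read()
                           if hashlib.sha256(data).hexdigest() == expected_sha256:
                               return data
                       except OSError:
                           pass            # stale mount, vanished file, etc.
                       time.sleep(delay)
                   return None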
       
              profile53 wrote 1 day ago:
              > Nah. Because not every intermediate state of a filesystem is
              self-consistent, and because caching can "tear" updates. So
              Amazon's multi-attach documentation says:
              
              Pretty much every modern file system is tolerant to computer
              crashes via journaling. Mounting an in-use FS as read only is
              effectively the same as mounting a file system after a computer
              crash. Not self consistent, but close enough it can be repaired
              in memory (i.e. without touching the disk) via replaying the
              journal.
       
                mlyle wrote 1 day ago:
                > Pretty much every modern file system is tolerant to computer
                crashes via journaling. Mounting an in-use FS as read only is
                effectively the same as mounting a file system after a computer
                crash. Not self consistent, but close enough it can be repaired
                in memory (i.e. without touching the disk) via replaying the
                journal.
                
                Filesystems generally don't replay the journal to memory, but
                instead replay it to disk as part of going read-write. 
                Conventional replay of the journal would make these issues even
                worse.
       
                  profile53 wrote 1 day ago:
                   You're right. I knew ext4 is crash-consistent even in a
                   read-only mode, but after looking it up, ext4 will ignore
                   the read-only mount request, replay the journal to disk,
                   then mount read-only. So it doesn't just replay the journal
                   to an in-memory structure when mounted read-only.
       
                    mlyle wrote 1 day ago:
                     Yah, and replaying the journal while there are other
                     writers is dangerous: it can undo their changes. So you
                     either make the device fully read-only and make your
                     filesystem cope with not being able to apply the
                     journal, or you use something like LVM where you can
                     just apply the journal to a snapshot / set of in-memory
                     deltas.
                    
                    But it's worse than this and the above doesn't solve the
                    problems.  Say you've mounted already, the filesystem is
                    changing under you, and assume no cache for simplicity
                    (caching makes this even worse):
                    
                     For instance, say update A truncates a file and frees up
                     some storage. Update B allocates this free space to
                     another file. Update C writes to it.

                     A different system starts a read and finds the metadata
                     for the first file. Then it hesitates, and later reads
                     the data written by update C as if it were the first
                     file's contents.
                    
                     Most filesystems are not written with this use case in
                     mind: mounting read-only while someone else updates. It's
                     something that I've tried doing with varying degrees of
                     success in the past (sometimes perfect, sometimes with
                     strangeness at the application level, sometimes with
                     kernel panics).
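
                     A toy illustration of that race (hypothetical
                     block/metadata dicts, no real filesystem involved):

                       blocks = {5: b"tail of file A"}
                       meta = {"A": 5}

                       # Reader caches A's location early.
                       stale_a = meta["A"]

                       # Update A: truncate A, freeing block 5.
                       del meta["A"]; del blocks[5]
                       # Update B: give block 5 to file B.
                       meta["B"] = 5
                       # Update C: write B's data there.
                       blocks[5] = b"contents of file B"

                       # Reader finishes with stale metadata:
                       print(blocks[stale_a])
                       # b'contents of file B'  <- wrong file's data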
       
                profile53 wrote 1 day ago:
                 Oh, I just realized I misunderstood your point. Because the
                 data on disk gets updated while the in-memory structures on
                 the read-only side don’t, you’ll get garbage and have to
                 swallow random errors. I bet it only works in very specific
                 write-once environments like Netflix's, because otherwise
                 updating a file would lead to reading garbage on the other
                 side.
       
              eloff wrote 2 days ago:
              It sounds like one would have to code very defensively to make
              that work safely.
       
              a-dub wrote 2 days ago:
               might be kinda fun to build a "ringbufferfs" that uses
               multi-attach EBS volumes as slow shared memory for pushing bits
               around via the EBS side channel.
               
               probably cheaper and more reliable, too, than provisioning a
               full-size volume and relying on implementation quirks in
               existing single-host filesystems and EBS.
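
               A minimal sketch of that idea: a single-writer ring of
               fixed-size records on a shared device, with an 8-byte header
               slot holding the publish counter (the device path, sizes, and
               layout are all made up; a real version would also need to
               defeat page-cache staleness, e.g. with O_DIRECT, and handle
               wraparound):

                 import os, struct

                 DEV = "/tmp/fake-ebs"      # stand-in for the shared device
                 REC, N = 4096, 1024        # record size, slot count
                 HDR = struct.Struct("<Q")  # write counter at offset 0
                 # one-time setup for the stand-in file:
                 # open(DEV, "wb").truncate(REC * (1 + N))

                 def write_rec(fd, seq, payload):
                     # Single writer: fill slot seq % N, then publish by
                     # bumping the header (data is synced before the header).
                     rec = struct.pack("<I", len(payload)) + payload
                     os.pwrite(fd, rec.ljust(REC, b"\0"), REC * (1 + seq % N))
                     os.fsync(fd)
                     os.pwrite(fd, HDR.pack(seq + 1), 0)
                     os.fsync(fd)

                 def read_new(fd, last):
                     # Reader on another instance: poll the header, then pull
                     # any slots published since `last`.
                     head, = HDR.unpack(os.pread(fd, HDR.size, 0))
                     out = []
                     for seq in range(last, head):
                         rec = os.pread(fd, REC, REC * (1 + seq % N))
                         n, = struct.unpack_from("<I", rec)
                         out.append(rec[4:4 + n])
                     return head, out

               The writer would open DEV with os.O_RDWR and readers with
               os.O_RDONLY; syncing data before the header is what lets a
               reader trust any slot below the published counter.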
       
       
   DIR <- back to front page