_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Digital Archivists: Protecting Public Data from Erasure
       
       
        hsuduebc2 wrote 1 day ago:
        I wonder: might blockchain actually be a useful technology for
        this?
       
          jefurii wrote 4 hours 57 min ago:
          git-annex is not exactly blockchain but because of the way it
          operates -- storing files by their hashes, the whole Git commit
          structure -- it gives you several useful things:  It becomes easy to
          clone repositories while guaranteeing that clones are identical.  It
          also becomes easy to ensure that files are not tampered with.
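
          A minimal sketch of the idea described above (naming files by
          the hash of their contents so that tampering is detectable).
          This is an illustration only, not git-annex's actual key format
          or storage layout; the function names content_key, store and
          find_tampered are hypothetical, and SHA-256 is an assumed
          choice of hash.

```python
import hashlib
import tempfile
from pathlib import Path


def content_key(data: bytes) -> str:
    """Return the SHA-256 hex digest that serves as the object's name."""
    return hashlib.sha256(data).hexdigest()


def store(root: Path, data: bytes) -> str:
    """Write data under a file named after its own hash; return that name."""
    key = content_key(data)
    (root / key).write_bytes(data)
    return key


def find_tampered(root: Path) -> list:
    """Return the names of stored files whose bytes no longer match their name."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and content_key(p.read_bytes()) != p.name
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        key = store(root, b"public dataset, version 1\n")
        print("clean store, tampered files:", find_tampered(root))
        # Simulate someone silently editing the archived copy.
        (root / key).write_bytes(b"public dataset, version 1 (edited)\n")
        print("after edit, tampered files:", find_tampered(root))
```

          Because the name is derived from the bytes, two clones holding
          the same names hold the same contents, and any edit shows up as
          a name/content mismatch.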
       
          badlibrarian wrote 20 hours 15 min ago:
           [1] Though given the space in general and some of the people
          involved it all should be audited very carefully.
          
   URI    [1]: https://blog.archive.org/2023/10/20/celebrating-1-petabyte-o...
       
        mikrl wrote 1 day ago:
        How does this relate to dox?
        
        Let’s say an individual posted identifying or incriminating
        information online, inadvertently or intentionally, in a public place.
        
        Then a third party decides to store it, and possibly make it accessible
        to others.
        
        If the original self-doxxing user then pulled the original dox, but was
        unable to scrub the rest, would that information still be considered
        public, or would it be private? Was it ever truly public? Or private
        for that matter?
       
          ziddoap wrote 1 day ago:
          If you intentionally post something publicly, it's public. Full stop.
          
          The tricky part is dealing with inadvertent or malicious (i.e. by
          some other party) posting of private information to a public space.
          That's really hard to deal with on multiple levels.
          
          For one, the archives would retain the information and scrubbing it
          is effectively impossible.
          
          Secondly, legitimate things which should remain public (i.e. were
          posted publicly, are of public interest, etc.) can be argued to have
          been inadvertently or maliciously posted. So you need some way to
          moderate and create rulings for each individual case, which quickly
          becomes untenable due to the sheer volume of information being posted
          and the inordinate amount of time required to investigate vs. post.
       
          calebio wrote 1 day ago:
          That's a really good question.
          
          In my head, I'm imagining someone early in the morning posting a
          flyer up on a bulletin board downtown.
          
          Throughout the day many folks walked by and took photos of the flyer
          with their cell phones.
          
          At the end of the day, the original person came back and removed the
          flyer.
          
          IMO, at the time that the folks took the photo of the flyer, that
          flyer was public information. It remains public information even
          after the flyer is removed[0].
          
          This isn't a great analogy of mine, and has plenty of holes, but was
          interesting to me after I read your comment.  I know it was in the
          context of doxxing, but I think it's pretty interesting
          philosophically.
          
          I think something similar applies to photos taken of other people in
          public spaces.  Both the person who took the photo and the subject of
          the photo are no longer in that physical public space, but the
          actions took place within that space.
          
          I think something similar applies to digital "public spaces".  But
          what does a public space even mean in the context of walled
          gardens[1], etc.
          
          [0] you then run into the question of what happens if someone posts
          non-public information, publicly?
          [1] are digital walled garden communities that different from
          physical communities that gate access, whether free or paid?
          Whether information shared within those contexts is public or
          private is an interesting thread as well.
       
          sixothree wrote 1 day ago:
          Which data set are you thinking this might apply to?
       
        Damogran6 wrote 1 day ago:
        Hypothetically: 
        -Government leader says they're nuking data
        -Mad rush to back up data through other means
        -Government leader declares they've 'transferred the cost of
        maintaining data out of government, thus making for a smaller, more
        efficient, government'
        
        I hate everything about this.
       
          riku_iki wrote 1 day ago:
          In general it makes sense to shift this part to business: if data is
          valuable, there will be a market and services. The problem is
          probably how fast they nuked it, without a grace period.
       
            tehjoker wrote 22 hours 15 min ago:
            im okay with data being hosted for free or cheap by the government
            and not being price gouged for access to public data
       
              riku_iki wrote 21 hours 4 min ago:
              I think many people are not at all OK with how government handles
              data:
              
   URI        [1]: https://news.ycombinator.com/item?id=43237352
       
                forgetfreeman wrote 20 hours 37 min ago:
                Are these same people proposing private industry would do a
                better job?
                
   URI          [1]: https://privacybee.com/blog/these-are-the-largest-data...
       
                  riku_iki wrote 20 hours 26 min ago:
                  Government is also regularly being hacked
       
                    tehjoker wrote 18 hours 30 min ago:
                    when was the last time we didn't hear about private
                    companies getting hacked lmao they're terrible!!
       
          krunck wrote 1 day ago:
          There is inherent inefficiency in government accountability efforts.
          I'm ok with that.
       
        dmillar wrote 1 day ago:
        Many criminal records, petty or otherwise, are public record. When
        archived, expunged or dismissed infractions never truly become that. A
        traffic violation or other petty misdemeanor from 20 years ago, that
        has been expunged from official record, can show up on a background
        check because companies archive public data. So, there is a flip side
        to this.
       
          InvOfSmallC wrote 17 hours 21 min ago:
          The fact that you get it removed from your criminal record doesn't
          mean it gets forgotten. Think about a newspaper writing about your
          crime. That will be public and archived forever.
       
          overfeed wrote 23 hours 18 min ago:
          Public data is incompatible with secrecy. Expunged records still
          appear in newspaper archives if the local reporter on the crime
          beat captured the proceedings. IMO, "expunged" means removed from
          official court records - not from the public memory, including
          newspapers, archived websites, police blotters and prosecutors'
          files.
       
        nla wrote 1 day ago:
        Best thing I ever heard from the head of archives at the BBC:
        
        Once you format shift, you will always be format shifting.
        
        Keep your originals whenever you can.
       
          rippit wrote 8 hours 7 min ago:
          As someone who spent the last 2 days figuring out how best to
          digitise my father's old Hi8, Digital8 and MiniDV tapes, I take
          umbrage with this!
          
          Keep originals if you can, but make copies ASAP, as close to lossless
          as possible. Don't depend on the right hardware being around in the
          future.
       
          pjc50 wrote 10 hours 55 min ago:
          I can see the value in this, but .. originals, and the gear to read
          them, do not last forever. Plus for many formats the act of reading
          puts wear on the physical artifacts. So if you want to actually use
          the information, you have to format shift it to digital in the first
          place. And then you're back to the same question as the rest of us,
          how to maintain the bits.
       
          anitil wrote 18 hours 34 min ago:
          I don't understand this phrase, are you able to explain it?
       
            bell-cot wrote 14 hours 5 min ago:
            Guess:  If properly stored (physically), good-quality paper
            documents and photographs will last for centuries.  But as soon as
            you digitize them - you're now chained to the treadmill of
            maintaining/upgrading/migrating digital archiving systems.
            Compared to keeping the old-fashioned Archive Storage Room dry (and
            fire-free), that's 100X the labor and expense.  Forever.
       
              wizzard0 wrote 10 hours 5 min ago:
              A lot of paper archives and libraries burned just recently in LA.
       
                bell-cot wrote 8 hours 22 min ago:
                True.
                
                But from fire-resistant storage cabinets, to concrete-lined
                file rooms, to underground archives, the tech to make archives
                ~99.5% fire-proof is more than a century old.  And if you add
                redundant storage sites for the high-value stuff...
                
                Vs. anything digital is far more vulnerable to digital malice.
       
        badlibrarian wrote 1 day ago:
        There's a lot of panic and overlap in the space; a way to coordinate
        these efforts would be helpful.
        
        Internet Archive et al. made noise and promises but told volunteers to
        stop because they couldn't actually handle the ingest. [1] These folks
        made a notable effort. [2]
        
   URI  [1]: https://www.reddit.com/r/Archiveteam/comments/1jbgycm/us_gover...
   URI  [2]: https://webrecorder.net/blog/2025-03-25-govarchive-us-and-mirr...
       
        Teever wrote 1 day ago:
        I made this related submission[0] recently but it was flagged.
        
        This stuff is very important to talk about so I hope that this
        submission by rbanffy isn't also flagged.
        
        [0]
        
   URI  [1]: https://news.ycombinator.com/item?id=43543075
       
          donnachangstein wrote 1 day ago:
          No it isn't. It's merely a cause du jour for data hoarders to justify
          their hobby in light of this Chicken Little hysteria.
          
          30 years ago it was thought collecting every issue of magazines like
          TV Guide was important. No one even knows what that is anymore.
          
          No one is ever going to look at 99% of this data. In the meantime,
          send more hard drives for my NAS!!
       
            thowawatp302 wrote 22 hours 45 min ago:
            I’ve had the idea of recreating tv channels on my plex server by
            using tv guide data from the late 90s early 00s
            
            The insurmountable part of that project would be getting the guide
            data.
            
            You don’t know what other people will want in the future
       
              Teever wrote 21 hours 22 min ago:
              That's a great idea.
              
              There are sites that stream old content with an old tube TV UI
              wrapped around the video frame but they don't have all the
              commercials and they don't follow the old schedules like you
              suggest.
              
              I've got a friend who has hoarded digitized copies of VHS
              recordings of old cartoons from that era complete with the
              commercials, so the content is definitely out there.
       
            squarefoot wrote 1 day ago:
            Among the deleted data there was the police accountability
            database. You probably won't have to deal with thugs now feeling
            omnipotent and immune from prosecution because of this.
            
   URI      [1]: https://www.police1.com/federal-law-enforcement/national-l...
       
              squarefoot wrote 14 hours 36 min ago:
              Typo that I can't correct anymore: that would be "won't want to
              deal".
       
            hermannj314 wrote 1 day ago:
            My wife takes thousands of photos every year, when my daughter was
            young she took even more.
            
            When we were moving out of our apartment there was damage to a door
            hinge that we never noticed when we moved in but that had
            definitely been there from the onset of our two years of living in
            that apartment.
            
            Guess what?  I had a photo from the day after we moved in of that
            door hinge in a state of damage!  Not because we took the photo for
            that intention, but because my daughter was playing in the hallway
            and my wife snapped a photo and it just happened to capture the
            damage.  Saved me several hundreds of dollars in repair costs from
            my landlord.
            
            You are right, 99% of the data will never be looked at.  But do you
            know what the 1% is today?  I'm guessing you don't.
       
              donnachangstein wrote 1 day ago:
              Your example of personal family photos is in no way comparable to
              storing terabytes of essentially unindexed data about which one
              has no detailed knowledge, under the notion that the government
              is somehow lighting a match to everything, and they're going to
              save it.
              
              The government doesn't delete anything. It might be moved or
              inaccessible to the public but that data is somewhere in
              perpetuity.
              
              It's one of the most deranged larps I've ever seen, then they pat
              each other on the back on BlueSky, desperately wanting to be a
              part of something.
              
              These people envision themselves as folk heroes when what they
              really need to do is go outside and touch grass.
       
                spookie wrote 20 hours 54 min ago:
                > The government doesn't delete anything. It might be moved or
                inaccessible to the public but that data is somewhere in
                perpetuity.
                
                If the government is democratic and values integrity? Sure.
                
                Otherwise I wouldn't bet on it. My own country's history books
                and my parents' own life stories have already warned me about
                how fickle democracy is. No democratic country is free from
                that fact. Some think "checks and balances" ought to be enough
                to prevent it, but I wouldn't be so sure.
       
                alnwlsn wrote 1 day ago:
                Patently false.
                
   URI          [1]: https://www.archives.gov/personnel-records-center/fire...
       
                nancyminusone wrote 1 day ago:
                If it's inaccessible to the public, it might as well be
                deleted. What's the difference? If you can't get it, you don't
                have it.
       
            peppermill wrote 1 day ago:
            I think the data being discussed is quite a bit different than old
            TV Guides...
       
              zorpner wrote 23 hours 15 min ago:
              I wonder if those would be useful in identifying the potential
              contents of specific Marion Stokes tapes (my understanding is
              that they're sorted, but are only labeled with channel and
              date/time and are being archived slowly):
              
   URI        [1]: https://libwww.freelibrary.org/blog/post/5393
       
              NoMoreNicksLeft wrote 23 hours 36 min ago:
              I was, believe it if you wish, thinking about old TV guides just
              this morning and wondering how one would even go about archiving
              those. Most of the stumbling blocks for taking apart the glued
              binding for scanning have been figured out, of course, but for
              any given week there may have been as many as 60 or 70 editions
              (for each television market, I think). None of these have proper
              ISSN numbers as far as I'm aware, and other than the listings
              they can be visually indistinguishable. Then there is the
              challenge of finding those, and not knowing whether this or that
              edition is missing (from time to time, the company would create
              new editions for new regions, or fold old ones back into some
              other area) along with even parsing the content. Many of these tv
              shows aren't on themoviedb or thetvdb, and if the shows are, then
              there won't be episode listings (there were 6000 Donahue talk
              show episodes, after all). On top of all of that, you can't
              necessarily know what was on tv at a given time and day, with
              federal government preemptions, commercials, unreported
              last-minute rescheduling, etc.
              
              But I can also see why people might want to keep more interesting
              data, like when the Federal Cheese-Sniffing Agency moved offices
              back in 1982 and they have meticulous records of the 483 filing
              cabinets that had to be moved from the original location to their
              new home in Furrytown, Pennsylvania.
       
            dreamworld wrote 1 day ago:
            It might be of some interest to cultural historians in the future.
            But I think it makes more sense to take sample+curated data. But in
            any case if we can afford it, eh why not.
       
              rbanffy wrote 1 day ago:
              We don't know now what to curate for the future. We should
              preserve as much of everything we can - we don't know what will
              be important in 50, or 500 years.
              
              Case in point: retrocomputing is my hobby. I buy, restore,
              preserve, and use old computers. Most of them are home computers,
              because business computers go directly from the office to the
              recycling facility or the landfill. Unless someone deliberately
              preserved, say, a Burroughs B-25 desktop, or the similar from
              Data General, they are gone.
       
                Suppafly wrote 1 day ago:
                My son is into retrocomputing, mostly using older hardware I
                have from when I was younger, and we have a stack of old Compaq
                desktops where you can't access the BIOS because it requires a
                specific floppy that is nearly impossible to find online. This
                is 486/pentium era stuff, the older stuff is even harder to
                find.
       
                  rbanffy wrote 10 hours 2 min ago:
                  I've been looking for a DEC terminal with Sixel, Tektronix
                  and ReGIS graphics for a while, with zero success. They
                  weren't rare at all - they were a massive success, and, yet,
                  it seems almost all ended up in a recycling facility or an
                  e-waste dumpster. Many other terminals emulated them and
                  expanded on their feature set.
       
          hsuduebc2 wrote 1 day ago:
          I agree. I do not understand how this is perceived as a political
          issue and thus got flagged.
          
          Climate change is, for some reason, perceived as political too, and
          it does not get flagged so often.
       
       
   DIR <- back to front page