_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   VASA-1: Lifelike audio-driven talking faces generated in real time
       
       
        lufeofpierrre wrote 13 hours 8 min ago:
         The people that killed my fam used this to maintain the illusion
         they were alive for like 4 years and extract information etc. On
         one hand it was nice to see them, but on the other it was a very
         odd feeling talking to them knowing they were dead (will for sure
         get downvoted but idk, trippy interesting skynet moment the usual
         crowd on HN will never experience)
       
        fennecbutt wrote 18 hours 17 min ago:
        Don't know why they're not releasing it right away.
        
        If they can do it, so can someone else and hiding it makes it worse. If
        it's widely available people will quickly realise that the talking head
        on YT spouting racist BS is AI. This process needs to happen faster.
        
        Ofc there will still be people who don't care or understand, but there
        will always be people who are for example racist and don't care if the
        affirmation for their beliefs comes from a human or a machine.
       
        thih9 wrote 1 day ago:
        I could see this being used in movie production.
       
        metalspoon wrote 2 days ago:
         AI can talk with me. Why would I need a friend in real life?
       
        dang wrote 2 days ago:
        Related: [1] (via [2] , but we merged that thread hither)
        
   URI  [1]: https://arstechnica.com/information-technology/2024/04/microso...
   URI  [2]: https://news.ycombinator.com/item?id=40088826
       
        RcouF1uZ4gsC wrote 2 days ago:
        > To show off the model, Microsoft created a VASA-1 research page
        featuring many sample videos of the tool in action
        
        With AI stuff, I have learned to be very skeptical until and unless a
        relatively publicly accessible demo with user specified inputs is
        available.
        
         It is way too easy for humans to cherry-pick the nice outputs, or
         to take advantage of biases in the training data to generate nice
         outputs, and that is not at all reflective of how the tool holds
         up in the real world.
         
         Part of the reason why ChatGPT, Stable Diffusion, and DALL-E had
         such an impact is that people could try them and see for
         themselves without being told how awesome they were by the people
         making them.
       
        smusamashah wrote 2 days ago:
         This is good but nowhere near as good as EMO [1] ( [2] )
         
         This one has too much fake-looking body movement and looks
         eerie/robotic/uncanny valley. The lips don't sync properly in many
         places. Eye movement and overall head and body movement are not
         very natural at all.
         
         EMO, meanwhile, mostly looks just about perfect. The very first
         two videos on the EMO page are perfect examples of that. See the
         rap near the end to see how good EMO is at lip sync.
        
   URI  [1]: https://humanaigc.github.io/emote-portrait-alive/
   URI  [2]: https://news.ycombinator.com/item?id=39533326
       
          ec109685 wrote 21 hours 6 min ago:
          There were some misses with emo too, but Hepburn at the end was
          amazing.
       
          majkinetor wrote 1 day ago:
          This is real time!
       
          cchance wrote 2 days ago:
          Another research project with 0 model release
       
        BobaFloutist wrote 2 days ago:
        Oh good!
       
        egberts1 wrote 2 days ago:
         Cool! Now we can expect to see an endless stream of dead
         presidents' speeches "LIVE" from the White House.
        
        This should end well.
       
        andrewstuart wrote 2 days ago:
         Despite vast investment in AI by VCs and vast numbers of startups
         in the field, these sorts of things remain unavailable as simple
         consumer-installable software.
         
         Every second day HN has some post about some new amazing AI
         system. Never available to download, run, and use.
         
         Why all the investment, and yet no startup selling consumer
         downloadable software to do it?
       
        cs702 wrote 2 days ago:
        And it's only going to get faster, better, easier, cheaper.[a]
        
        Meanwhile, yesterday my credit card company asked me if I wanted to use
        voice authentication for verifying my identity "more securely" on the
        phone. Surely the company spent many millions of dollars to enable this
        new security-theater feature.
        
         It raises the question: Is every single executive and manager at my
        credit card company completely unaware that right now anyone can clone
        anyone else's voice by obtaining a short sample audio clip taken from
        any social network? If anyone is aware, why is the company acting like
        this?
        
        Corporate America is so far behind the times it's not even funny.
        
        ---
        
        [a] With apologies to Daft Punk.
       
          tennisflyi wrote 21 hours 48 min ago:
           Pretty much. You think they’re smart or with it. They’re just
           lucky fogies.
       
          stubish wrote 1 day ago:
           The point is lowering liability. If you choose not to use voice
           authentication (or whatever), it becomes easier to argue that
           fraud is your fault. And if you did use it, the company 'is
           doing everything they can' and 'exceeding industry standards',
           so it isn't their fault either. It also just makes them seem
           more secure to the uninitiated (the security-theater bit, yes).
          
          Maybe one day someone will successfully argue that adding easily
          defeated checks lowers security, by adding friction for no reason or
          instilling false confidence in users at both ends.
       
          dade_ wrote 2 days ago:
           Yes, they are aware, and they also know it isn't foolproof, so
           that isn't the only information being compared against. Some
           services compare the calling number against live activity on the
           PSTN (e.g. the subscriber's phone not being in an active call
           while their number is presented as the caller ID is one such
           metric). Many of these deep fake generators with public access
           embed watermarks in the audio. The audio stream comparison goes
           further: it needs to speak like you, down to word and phrase
           choices. There are other fingerprints of generated audio that
           you can't hear, but that are still obvious at the moment. With
           security, it's always cat and mouse with fraudsters on one hand
           and effort/frustration for customers on the other.
           
           Asking customers questions that they don't remember and that
           fraudsters have in front of them isn't working, and the time it
           takes for agents to authenticate is very expensive.
           
           While there is no doubt that companies will screw up security,
           you are making wild accusations without reference to any
           evidence.
       
          supercheetah wrote 2 days ago:
           That scene from Sneakers would be so different nowadays. [1] "My
           voice is my passport. Verify me." [2]
          
   URI    [1]: https://youtu.be/WdcIqFOc2UE?si=Df3DtSakatp9eD0L
   URI    [2]: https://youtu.be/-zVgWpVXb64?si=yT2GZpb7E2yZoEYl
       
          ryandrake wrote 2 days ago:
          Any time you add a "new" security gate to your product, it should be
          in addition to and not instead of the existing gates. Biometrics
          should not replace username/password, they should be in addition to.
          Security Questions like "What was your first pet's name" should not
          be able to get you in the backdoor. SMS verification alone should not
          allow you to reset your password. Same with this voice authentication
          stuff. It should be another layer, not a replacement of your actual
          credentials.
          
          If you treat it as OR instead of AND, then your security is only as
          good as the worst link in the chain.
       
            recursive wrote 2 days ago:
            If you make your product sufficiently inconvenient, then you'll
            have the unassailable security posture of having no users.
       
          fragmede wrote 2 days ago:
           I mean, what do you want them to do? Whether their security
           officers are freaking out and holding meetings right now about
           what to do, or they're asleep at the wheel, we'd be seeing the
           same thing from the outside, no?
       
            addandsubtract wrote 2 days ago:
             No, because multiple companies are pushing this atm. If it
             were only one company I would agree, but with multiple, you'd
             have at least one that would back out of it again.
       
          user_7832 wrote 2 days ago:
          >  Is every single executive and manager at my credit card company
          completely unaware that right now anyone can clone anyone else's
          voice by obtaining a short sample audio clip taken from any social
          network?
          
          Your mistake is assuming the company cares. The "company" is a
          hundred different disjointed departments that only care about not
          getting caught Equifax-style (or filing for bankruptcy if caught). If
          the marketing director sees a shiny new thing that might boost some
          random KPI they may not really care about security.
          
          However in the rare chance that your bank is actually half decent,
          I'd suggest contacting their IT/Security teams about your concerns.
          Maybe you'll save some folks from getting scammed?
       
            cyanydeez wrote 2 days ago:
             Also, this feature is probably just some mid-level exec's plan
             for a bonus, not something rigorously reviewed and planned.
             It's also probably been in the pipeline for a decade, so if
             they don't push it out, suddenly somebody gets no bonus for
             having cancelled a project.
            
            Corporations are ultimately no better than governments and likely
            worse depending on what their regulatory environment looks like.
       
              iamflimflam1 wrote 2 days ago:
              There’s a really important thing here for anyone trying to do
              sales to big companies.
              
              Find an exec that needs a project to advance their career. Make
              your software that project.
              
               Pull as many other execs into the project as you can, so
               their careers become coupled to getting your software rolled
               out.
       
                amindeed wrote 2 days ago:
                That's clever!
       
        SirMaster wrote 2 days ago:
        It looks all warpy and stretchy.  That's not how skin and face muscles
        work.  Looks fake to me.
       
          Zopieux wrote 2 days ago:
           I find the hair to be the least realistic; it looks elastic,
           which is unsurprising: highly detailed things like hair are hard
           to simulate with good fidelity.
       
        FredPret wrote 2 days ago:
        Anyone have any good ideas for how we're going to do politics now?
        
         Today a big ML model can do this and it's somewhat regulatable;
         tomorrow people will do this on their contact-lens supercomputers
         and anyone will be able to generate a video of anything.
        
        Is going back to personally knowing your local representative the only
        way? How will we vote for national candidates if nobody knows what they
        think or say?
       
          fennecfoxy wrote 5 hours 9 min ago:
           Same way we've always done it: largely ignorant and apathetic
           masses that only care about waving their team's flag and don't
           give a damn about most of their team's policies, as long as they
           can still say X, Y, and Z things at the Christmas dinner table.
          
           Democracy is already an illusion of choice anyway; just look at
           the candidates. It's gonna be Biden v Trump _again_. For the
           London mayoral election Sadiq is pretty much guaranteed to get
           in _again_. For the UK general election it's gonna be the
           typical Tories v Labour BS _again_, with no fresh young
           candidates with new ideas.
          
          Democracy is rotting everywhere it exists thanks to the idea of
          parties, party politics and the human need to pick a tribe and attack
          every other tribe.
       
          TimedToasts wrote 2 days ago:
          > Anyone have any good ideas for how we're going to do politics now?
          
          If a business is showing a demo of this you can be assured that the
          Government already has this tech and has for a period of time.
          
          > How will we vote for national candidates if nobody knows what they
          think or say?
          
          You don't know what they think or say now - hopefully this disabuses
          people of this notion.
       
            _djo_ wrote 1 day ago:
            > If a business is showing a demo of this you can be assured that
            the Government already has this tech and has for a period of time.
            
             That may have been true once upon a time, but it no longer is.
             And even where it was true, it was mostly in niche areas like
             cryptanalysis.
            
            Governments simply cannot attract or keep the level of talent
            required to have been far ahead of industry on LLMs and similar
            tech, especially not with the huge difference in salaries and
            working conditions.
       
          hooverd wrote 2 days ago:
          People already believe any quote you slap on a JPEG.
       
          dwb wrote 2 days ago:
           We already rely on chains of trust going back to the original
           source, and we still will. I find these alarmist posts a bit
           mystifying – before photography, anyone could fake a quote of
           anyone, and human civilisation got quite far. We had a bit over
           a hundred years where photographic-quality images were possible
           and very hard to fake (which did and still does vary with
           technology), but clearly now we're past that. We'll manage!
       
            GeoAtreides wrote 2 days ago:
             In the before times we didn't have social media with its
             algorithms and reach. Does it matter that the chains of trust
             debunk a viral lie 24 hours after it has spread? Not that
             there's a lot of trust in the chains of trust to begin with.
             And if you still have trust, then you're not the target of the
             viral lie. And how long can you hold on to that trust when the
             lies keep coming 24/7, one after another, without end? As one
             movie critic once put it: you might not have noticed it, but
             your brain did. Very malleable, this brain of ours.
             
             The civilization might be fine, sure. Democracy, on the other
             hand...
       
            woleium wrote 2 days ago:
            The issue is better phrased as “how will we survive the
            transition while some folk still believe the video they are seeing
            is irrefutable proof the event happened?”
       
            marcusverus wrote 2 days ago:
            Presidential elections are frequently pretty close. Taking the
            electoral college into account (not the popular vote, which doesn't
            matter) Donald Trump won the 2016 election by a grand total of
            ~80,000 votes in three states[0].
            
            Knowing that retractions rarely get viral exposure, it's not
            difficult to imagine that a few sufficiently-viral videos could
            swing enough votes to impact a presidential election. Especially
            when considering that the average person is not up to speed on the
            current state of the tech, and so has not been prompted to build up
            the mindset that's required to fend off this new threat.
            
            [0]
            
   URI      [1]: https://www.washingtonpost.com/news/the-fix/wp/2016/12/01/...
       
            BobaFloutist wrote 2 days ago:
            Yeah I mean tabloids have been fooling people with doctored photos
            for decades.
            
            Potentially we'll need slightly tighter regulations on formal press
            (so that people that care for accurate information have a place
            they can get it) and definitely we'll want to steer the culture
            back towards holding them accountable for misinformation, but
            credulous people have always had easy access to bad information.
            
             I'm much more worried about the potential abuse cases that
             involve ordinary people who aren't public figures and have
             much less ability to defend themselves. Heck, even celebrities
             are more vulnerable targets than politicians.
       
          kmlx wrote 2 days ago:
          > How will we vote for national candidates if nobody knows what they
          think or say?
          
          i’m going to burst your bubble here, but most voters have no idea
          about policies or candidates. most voters vote based on inertia or
          minimal cues, not on policies or candidates.
          
          i suggest you look up “The American Voter”, “The Democratic
          Dilemma: Can Citizens Learn What They Need to Know?” and
          “American National Election Studies”.
       
          4ndrewl wrote 2 days ago:
          DNS? Might be that we need a radical (for some) change of viewpoint.
          
           Just as there's no privacy on the internet, how about "there's
           very little trust on the internet"? Assume everything not
           securely signed by a trusted party is false.
       
            fennecfoxy wrote 5 hours 2 min ago:
            A large number of people don't really care about verifying what
            they've heard is true or not before repeating it, eventually making
            it fact amongst themselves.
            
             Hell, I've been guilty of spouting BS before, just because
             I've heard something from so many people. Then I look it up
             and find it's not true.
            
            It's not really a tech problem, it's more of a human problem imo,
            like so many others. But there is literally nothing we can do about
            it.
       
          hx8 wrote 2 days ago:
          Hyper targeted placement of generated content designed to entice you
          to donate to political campaigns and to vote.  Perhaps leading to a
          point where entire video clips are generated for a single viewer. 
          Politicians and political commentators will lease their likeness and
          voice out for targeted messaging to be generated using their
          likeness.  Less reputable platforms will allow disinformation
          campaigns to spread.
       
          T-A wrote 2 days ago:
           > Today a big ML model can do this
           
           Not that big: [1] (weights: [2])
          
   URI    [1]: https://github.com/Zejun-Yang/AniPortrait
   URI    [2]: https://huggingface.co/ZJYang/AniPortrait/tree/main
       
            cchance wrote 2 days ago:
             Hadn't seen that one, pretty cool. Not as good as EMO or VASA,
             but pretty good.
       
          qup wrote 2 days ago:
          People in my circles have been saying this for a few years now, and
          we've yet to see it happen.
          
          I've got my popcorn ready.
          
          But you can rest easy. Everyone just votes for the candidate their
          party picked, anyway.
       
            FredPret wrote 2 days ago:
            It'll happen - deepfakes aren't good enough yet. But when they
            become ubiquitous and hard to spot, it'll be chaos until the
            average person is mentally inoculated against believing any video /
            anything on the internet.
            
             I wonder if it's possible to digitally sign footage as it's
             captured? It'd be nice to have some shareable, demonstrably
             true media.
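             
             A minimal sketch of that idea in Python, using the
             `cryptography` package (the key handling is illustrative; a
             real camera would keep the private key in secure hardware):
             
               import hashlib
               from cryptography.hazmat.primitives.asymmetric.ed25519 \
                   import Ed25519PrivateKey
               
               def sign_chunk(key: Ed25519PrivateKey,
                              chunk: bytes) -> bytes:
                   # Hash a chunk of raw footage and sign the digest.
                   return key.sign(hashlib.sha256(chunk).digest())
               
               key = Ed25519PrivateKey.generate()  # would live in hardware
               chunk = b"raw frame data"
               sig = sign_chunk(key, chunk)
               # Anyone holding the device's public key can then check it;
               # verify() raises InvalidSignature on tampered footage.
               key.public_key().verify(sig, hashlib.sha256(chunk).digest())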
            
            Edit: I'm a centrist and I definitely would lean one way or the
            other based on who the options are (or who I think they are).
       
        binkHN wrote 2 days ago:
        Full details at
        
   URI  [1]: https://www.microsoft.com/en-us/research/project/vasa-1/
       
        karaterobot wrote 2 days ago:
        We need some clear legislation around this right now.
       
          CamperBob2 wrote 2 days ago:
          Legislation only impairs the good guys.
       
          4ndrewl wrote 2 days ago:
          In which jurisdiction?
       
            karaterobot wrote 2 days ago:
            What jurisdiction would not benefit from legislation around
            duplicating people's identities using AI?
       
          stronglikedan wrote 2 days ago:
          counterpoint: we don't need any more legislation
       
            qwertox wrote 2 days ago:
             I tend towards agreeing with you. Many of the problematic
             uses, like impersonation, are already illegal.
             
             And replacing a person who spreads lies, as can be seen in
             most TV or glossy cover ads, shouldn't trigger some new legal
             action. The only difference is that now the actor is also a
             lie.
             
             And countries which use actors or news anchors to spread
             propaganda surely won't see an issue with replacing them with
             AI characters.
             
             People who then get to read that their favorite, stunningly
             beautiful Instagram or TikTok influencer is nothing but a fat,
             chips-eating, ugly person using AI may try to raise some legal
             issues to soothe their disappointment. They might then raise a
             point which sounds reasonable, but which would force
             politicians to also tackle the lies spread in TV/magazine ads.
             
             Maybe laws could help, in the spirit of transparency, with
             clearly labeling any use of this tech, perhaps even with a QR
             code linking to whoever operates the AI, similar to the QR
             codes on meat packaging which let you trace the origin of the
             meat.
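             
             For what it's worth, generating such a label is the trivial
             part; here is a sketch with the Python `qrcode` package (the
             URL is purely illustrative, and the registry behind it would
             be the hard part):
             
               import qrcode  # pip install qrcode
               
               # Points at a hypothetical public page naming the operator.
               img = qrcode.make("https://example.com/ai-operator/123")
               img.save("disclosure-tag.png")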
       
        physhster wrote 2 days ago:
        A fantastic technological advance for election interference!
       
          IshKebab wrote 2 days ago:
          As if this technology was needed.
       
          RGamma wrote 2 days ago:
          Such an exciting startup idea! I'm thrilled!
       
        balls187 wrote 2 days ago:
         I'm curious what the reason for deepfake research is, and what
         the practical applications are.
         
         Can someone explain the commercial need to take someone's
         likeness and generate video content?
         
         If I were an A-list celebrity, I might give Coke permission to
         make a commercial with my likeness, provided I was allowed final
         approval of the finished ad.
         
         Do I have an avatar that attends my Zoom work calls?
       
          criddell wrote 2 days ago:
          If beautiful people have an advantage in the job market, maybe people
          will use deepfake technology when doing zoom interviews? Maybe they
          will use it to alter their accent?
       
          bonton89 wrote 2 days ago:
          Propaganda, political manipulation, narrative nudging, regular scams
          and advertising.
          
           Even though most of those things are illegal, you could just
           have foreign cat's-paw firms do it. Maybe you fire them for
           "going too far" after the damage is done, assuming someone even
           manages to connect the dots.
       
          jdietrich wrote 2 days ago:
          In this case, replacing humans in service jobs. From the paper:
          
          "Such technology holds the promise of enriching digital
          communication,
          increasing accessibility for those with communicative impairments,
          transforming education methods with interactive AI tutoring, and
          providing therapeutic support and social interaction in healthcare."
          
          A convincing simulacrum of empathy could plausibly be the most
          profitable product since oil.
       
          szundi wrote 2 days ago:
           Imagine being the CEO: you just grab your salary and options, go
           home, and sit in the hot tub while one of the interns carefully
           prompts GPT and VASA into giving a speech as you, online, about
           strategic directions. /s
       
          SkyPuncher wrote 2 days ago:
           On the surface, it's a simple, understandable demo for the
           masses. At the same time, it hints at deeper commercial usage.
           
           Disney has been using digital likenesses to maintain characters
           whose actors have died. Princess Leia is the most prominent
           example. Arguably, there is significant value in being able to
           generate a human-like character that doesn't have to be recast.
           That character can be any age, at any time, and look exactly
           like the actor.
           
           As for actors, I suspect many of them will start licensing
           their image/likeness as they look to wind down their careers.
           It gives them ongoing income with very little effort.
       
          r1chardnl wrote 2 days ago:
          Apple Vision Pro personas competition
       
          JamesBarney wrote 2 days ago:
          Video games, entertainment, and avatars seems like the big ones.
       
            HeatrayEnjoyer wrote 2 days ago:
            If that is really the reason then this is insane and everyone
            involved should put their keyboards down and stop what they are
            doing.
            
            This would be as if we invented and sold nuclear weapons to dig out
            quarry mines faster. The inconvenience it saves us quickly
            disappears into the overwhelming shadow of the enormous harm now
            enabled.
       
              ImPostingOnHN wrote 2 days ago:
              > This would be as if we invented and sold nuclear weapons to dig
              out quarry mines faster.
              
              ”Project Plowshare was the overall United States program for
              the development of techniques to use nuclear explosives for
              peaceful construction purposes.”[0]
              
              0:
              
   URI        [1]: https://en.wikipedia.org/wiki/Project_Plowshare
       
                wumeow wrote 2 days ago:
                Yeah, and it was terminated. Much harder to put this genie back
                in the bottle.
       
          mensetmanusman wrote 2 days ago:
          The purpose is to give remote workers the ability to clone themselves
          and automate their many jobs. /s
          
          (but actually, because laziness is the driver of all innovation, I
          wouldn't be surprised if this happens).
       
          hypeatei wrote 2 days ago:
          Entertainment maybe? I know that's not necessarily an ethical reason
          but some have made hilarious AI-generated songs already.
       
          bugglebeetle wrote 2 days ago:
          State disinformation and propaganda campaigns.
       
            NortySpock wrote 2 days ago:
            Corporate disinformation and propaganda campaigns.
            
            Personal disinformation and propaganda campaigns.
            
            Oh Brave New World, that has such fake people in it!
       
        TriangleEdge wrote 2 days ago:
        Why is this research being done? Is this some kind of arms race? The
        only purpose of this technology I can think of is getting spies to
        abuse others.
        
         Am I going to have to do AuthN and AuthZ on every phone call and
         Zoom now?
       
          zamadatix wrote 22 hours 6 min ago:
          > It paves the way for real-time engagements with lifelike avatars
          that emulate human conversational behaviors.
          
           Teams started rolling out Avatars [1]; this would be a step up.
           I'm not really a fan, but that doesn't mean I can't see the use
           case.
          
   URI    [1]: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/...
       
          berniedurfee wrote 1 day ago:
           I was thinking about this the other day: an implantable
           YubiKey-type device that integrates with whatever device you're
           using, to validate your identity for phone calls or video
           conferences.
           
           Subdermal X.509, maybe with some sort of Neuralink adapter so
           you can confirm the request for identity. Though first versions
           might be just a small button you need to press during the
           handshake.
       
          HarHarVeryFunny wrote 2 days ago:
          > Why is this research being done?
          
          I think it's mostly "because it can be done". These types of
          impressive demos have become relatively low hanging fruit in terms of
          how modern machine learning can be applied.
          
          One could imagine commercial applications (VR, virtual "try before
          you buy", etc), but things like this can also be a flex by the AI
          labs, or a PhD student wanting to write a paper.
       
          phkahler wrote 2 days ago:
          Newscasters and other talking heads will be out of business. Just
          pipe the script into some AI and get video.
       
          danmur wrote 2 days ago:
          We all know why this is really happening. Clippy 2.0.
       
          1659447091 wrote 2 days ago:
          Advertising. Now you and your friends star in the streaming
          commercials and digital billboards near you! (whether you want to or
          not)
       
          andybak wrote 2 days ago:
           Because the tech for this is only a slight variation of the
           tech for a broad range of legitimate applications?
           
           Because even this precise tech has legitimate use cases?
           
           > The only purpose of this technology I can think of is getting
           spies to abuse others.
           
           Can you really not think of any other use cases?
       
            lo0dot0 wrote 1 day ago:
            Why don't you get more specific about your claims?
       
              andybak wrote 1 day ago:
              Jeez. I dunno. Sometimes I just reach my threshold for the time
              I'm prepared to spend debating with strangers on the internet.
       
            krainboltgreene wrote 2 days ago:
             Why don't you list some legitimate and useful applications of
             this work? Especially at the price we and this company are
             paying.
       
          tithe wrote 2 days ago:
          I get the feeling it's "someone's going to do this, so it might as
          well be us."
          
          It's fascinating how research can take on a life of its own and will
          be pushed, by someone, to its own conclusion.  Even for immensely
          destructive technologies (e.g., atomic weapons, viruses), the impact
          of a technology is its own attractor (could you say that's
          risk-seeking behavior?)
          
          > Am I going to have to do AuthN and AuthZ on every phone call and
          zoom now?
          
          "Alexa, I need an alibi for yesterday at noon."
       
          Arnavion wrote 2 days ago:
          On the other hand, if deepfaking becomes common enough that everyone
          stops trusting everything they read / see on the internet, it would
          be a net good against the spread of disinformation compared to today.
       
            piva00 wrote 2 days ago:
             That's the whole issue, though: the spread of disinformation
             eroded trust, and furthering this into the obliteration of
             all trust is not a good outcome.
       
            anigbrowl wrote 2 days ago:
            everyone stops trusting everything
            
            Why would you expect this to happen? Lots of people are gullible,
            if it were otherwise a lot of well-known politicians would be out
            of a job or would never have been elected to begin with.
       
              ryandrake wrote 2 days ago:
               If it's even more common than "common enough", then anyone
               could at least try to help their gullible friends and family
               by sending them a deepfake video of themselves doing or
               saying something they never did or said. A lot of people
               will suddenly wise up when a problem affects them directly.
       
            notaustinpowers wrote 2 days ago:
            I don't see the extinction of trust through the introduction of
            garbage falsehoods to be a net good.
            
            Believing that everything you eat is poisoned is no way to live.
            Believing that everything you see is a lie is also no way to live.
       
              throwthrowuknow wrote 2 days ago:
               Before photography this was just the normal state of the
               world. Think a little: back then, any story or picture you
               saw was made by a person, and you only had their reputation
               to go by. Think some more and you realize that’s never
               changed, even with pictures and video. Easy AI-generated
               pictures and video just remove the illusion of trust.
       
            hiatus wrote 2 days ago:
            I don't see that as an outcome. We have already seen a grand
            erosion of trust in institutions. Moving to an even lower trust
            society does not sound like it would have positive consequences for
            discourse, public policy, or society at large.
       
              throwthrowuknow wrote 2 days ago:
               The benefit is that you can then only trust in-person
               interaction with social and governmental institutions, so
               people will have to leave their damn houses again and go
               talk to each other face to face. Too many of our current
               problems are caused by people only interacting with each
               other and the world through third parties who are
               performing a MITM operation for their own benefit.
       
                1attice wrote 2 days ago:
                This assumes that it's a two-way door.
                
                Over the past century and a half, we've moved into vast,
                anonymous spaces, where I'm as likely to know and get along
                with my neighbour as I am to win the lottery.
                
                 And this is important. No, it's not just a matter of
                 putting in an effort to learn who my neighbour is -- my
                 neighbour is literally someone whose life experiences are
                 wildly different, whose social outcomes will be wildly
                 different, whose beliefs and values are wildly different,
                 and who, for all I know, goes to conferences about how to
                 eliminate me and my kind.
                
                (This last part is not speculation; I'm trans; see: CPAC)
                
                And these are my reasons. My neighbour is probably equivalently
                terrified of me, or what I represent, or the media I consume,
                or the conferences that I go to.
                
                Generalizing, you can't take a bunch of random people whose
                only bond is that they share meatspace-proximity, draw a circle
                around them, and declare them a community; those communities
                are _gone_, and you can no more bring them back than you can
                revive a corpse. (This would also probably not be a good idea,
                even if it were possible: they were also incredibly
                uncomfortable places for anyone who didn't fit in, and we have
                generations of fiction about people risking everything to leave
                for those big anonymous cities we created in step 1.)
                
                So, here we are, dependent on technology to stay in touch with
                far-flung friends and lovers and family, all of us, scattered
                like spiderwebs across the globe, and now into the strands
                drips a poison.
                
                Daniel Dennett was right. Counterfeit people are an enormous
                danger to civilization. Research like this should stop
                immediately.
       
              rightbyte wrote 2 days ago:
               Ironically, low-effort deep fakes might increase trust in
               organizations that have had the budget to fake stuff since
               their inception. The losers are 'citizen journalists'
               broadcasting on YouTube etc.
       
        alfalfasprout wrote 2 days ago:
         What this is starting to reveal is that there's a clear need for
         some kind of chain-of-custody system that guarantees the
         authenticity of what we see. Nikon/Canon tried doing this in the
         past, but improper storage of private keys led to
         vulnerabilities. As far as I'm aware it never extended to video
         either.
         
         With modern secure hardware keys it may yet be possible. The
         difficulty is that any kind of photo/video manipulation would
         break the signature (and there are obvious practical reasons to
         want to be able to edit videos).
         
         In an ideal world, any mutation of the content would be traceable
         back to the original source. But that's not an easy problem to
         solve.
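         
         As a toy sketch of that "traceable mutation" idea (the names here
         are illustrative, not an existing standard): every derived
         version records the hash of its parent, so an edit history can be
         walked back to the original capture. In a real system each record
         would also be signed.
         
           import hashlib
           
           def record(data: bytes, parent: str | None, note: str) -> dict:
               # One node in the edit chain: content hash + parent hash.
               return {
                   "hash": hashlib.sha256(data).hexdigest(),
                   "parent": parent,  # None marks the original capture
                   "note": note,
               }
           
           original = record(b"raw sensor output", None, "capture")
           cropped = record(b"cropped pixels", original["hash"],
                            "crop to 16:9")
           # A verifier recomputes each hash and walks cropped -> original.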
       
          qingcharles wrote 1 day ago:
          None of that works, it's simply theatre.
          
          I can just take a (crypto-signed) photo of another photo.
       
            pedalpete wrote 1 day ago:
             The public blockchain would show the chain of
             custody/ownership, so a photo of a photo would show that the
             final crypto signature does not belong to the claimed owner.
             
             You are correct that I as a viewer can't just rely on a
             crypto signature like a watermark; I'd have to verify the
             chain of custody. But if I wanted to do that, it would be
             available to do so.
       
          PeterisP wrote 2 days ago:
           I think it's practically impossible for such a system to be
           globally trustworthy, due to the practical inevitability of
           "improper storage of private keys led to vulnerabilities"
           scenarios.
          
          People will expect or require that chain of custody only if all or at
          least the vast majority of the content they want would have that
          chain of custody.
          
          Photo/video content will have that chain of custody only if all or
          almost all of devices recording that content will support it -
          including all the cheapest mass-produced devices in reasonably
          widespread use anywhere in the world.
          
          And that chain of custody provides the benefit only if literally 100%
          of these manufacturers have their private keys secure 100% of the
          time, which is simply not happening; at least one such key will leak,
          if not unintentionally then intentionally for some intelligence
          agency who wants to fake content.
          
          And what do you do once you see a leak of the private keys used for
          signing the certificates for the private keys securely embedded in
          (for example) all of 2029 Huawei smartphones, which could be like 200
          million phones? The users won't replace their phones just because of
          that, and you'll have all these users making content - so everyone
          will have to choose to either auto-block and discard everything from
          all those 200 million users, or permit content with a potentially
          fake chain of custody; and I'm totally certain that most people will
          prefer the latter.
       
            macrolime wrote 2 days ago:
            Multisig by the user and camera manufacturer can help to some
            extent.
       
              PeterisP wrote 1 day ago:
              Multisig requires user cooperation, many users will not care to
              cooperate, and chain of custody verification really starts
              working only if you can get (force) ~100% of legitimate users
              globally to adopt the system.
              
               Also, for the potential creators of political fakes, such a
               multisig won't change things - getting a manufacturer's key
               may take some effort, but getting (and 'burning') the keys
               of a dozen random people is relatively trivial in many
               ways, e.g. buying them off poor people, stealing them from
               compromised machines, or simply issuing fake identities for
               state-backed actors.
       
          bonton89 wrote 2 days ago:
          I expect this type of system to be implemented in my lifetime. It
          will allow whistleblowers and investigative sources to be discredited
          or tracked down and persecuted.
       
            20after4 wrote 2 days ago:
            Unfortunately that seems inevitable.
       
          throw__away7391 wrote 2 days ago:
          No, we are merely returning to the pre-photography state of things
          where a mere printed image is not sufficient evidence for anything.
       
            anigbrowl wrote 2 days ago:
            merely
            
            You say this as if it were not a big deal, but losing a century's
            worth of authentication infrastructure/practises is a Bad Thing
            which will have large negative externalities.
       
              throw__away7391 wrote 2 days ago:
               It isn't really, though. It has been technically possible
               to convincingly doctor photos for some time already, and it
               has been getting easier, cheaper, and faster for decades.
               Even now the currently available tech has limitations, and
               the full change is not going to happen overnight.
       
            BobaFloutist wrote 2 days ago:
             Pre-photography, it at least took effort, practice, and time
             to draw something convincing. Any skill with that much of a
             barrier to entry kind of automatically reduces the ability to
             be anonymous. And we didn't have the ability to
             instantaneously distribute images world-wide.
       
            hx8 wrote 2 days ago:
            True, an image, audio clip, or video is not enough evidence to
            establish truth.
            
            We still need a way to establish truth.  It's important for
            security cameras, for politics, and for public figures.  Here are
            some things we could start looking into.
            
             * Cameras that sign their output.  Yes, this camera caught
             this video, and it hasn't been modified.  This is a must for
             recordings used as court evidence IMO.  Otherwise framing a
             crime is as easy as a few deep fakes and planting some DNA or
             fingerprints at the scene of the crime.  (A sketch of the
             verify side follows this list.)
            
            * People digitally signing pictures/audio/videos of them.  Even if
            they digitally modified the data it shows that they consent to
            having their image associated with that message.  It reduces the
            strength of the attack vector of deep fake videos for reputation
            sabotage.
            
            * Malicious content source detection and flagging.  Think email
            spam filter type tagging of fake content.  Community notes on X
            would be another good example.
            
            * Digital manipulation detection.  I'm less than hopeful this will
            be the way in the long term, but could be used to disprove some
            fraud.
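             
             A minimal sketch of that verify side in Python, using the
             `cryptography` package and assuming a (hypothetical) registry
             mapping camera serial numbers to vendor-published keys:
             
               from cryptography.exceptions import InvalidSignature
               from cryptography.hazmat.primitives.asymmetric.ed25519 \
                   import Ed25519PublicKey
               
               # serial number -> trusted public key (the hard part)
               TRUSTED_KEYS: dict[str, Ed25519PublicKey] = {}
               
               def is_authentic(serial: str, video: bytes,
                                sig: bytes) -> bool:
                   key = TRUSTED_KEYS.get(serial)
                   if key is None:
                       return False  # unknown device, no provenance
                   try:
                       key.verify(sig, video)  # raises if tampered with
                       return True
                   except InvalidSignature:
                       return False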
       
              djmips wrote 1 day ago:
              Every image is an NFT?
       
              alex_suzuki wrote 2 days ago:
              Signing is great, but the hard part is managing keys and trust.
       
              alchemist1e9 wrote 2 days ago:
               Blockchains can be used for cryptographic time-stamping.
               
               I’ve always had a suspicion that governments and large
               companies would prefer a world without hard cryptographic
               proofs. After WikiLeaks they noticed DKIM can cause them
               major blowback. Somehow the general public isn’t aware that
               all the emails were proven authentic by their DKIM
               signatures, and even in fairly educated circles people
               believe the “emails were fake”, though faking them isn’t
               actually possible.
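               
               Anyone can re-check those signatures themselves: given the
               raw .eml bytes of a message, DKIM verification is a few
               lines with the dkimpy package (the filename is
               illustrative; verification fetches the signing domain's
               public key from DNS, so it needs network access):
               
                 import dkim  # pip install dkimpy
                 
                 with open("leaked_message.eml", "rb") as f:
                     raw = f.read()
                 
                 # True iff the DKIM-Signature header validates against
                 # the domain's published DNS key.
                 print(dkim.verify(raw))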
       
                PeterisP wrote 2 days ago:
                Quite the opposite, governments and large companies even
                explicitly run services for digital timestamping of documents -
                if I wanted to potentially assert some facts in court, I'd
                definitely prefer having that e-document with a timestamp
                notarized from my local government service instead of Bitcoin,
                because while the cryptography is the same, it would be much
                simpler from the practical legal perspective, requiring less
                time and effort and cost to get the court to accept that.
       
            tass wrote 2 days ago:
            There goes the dashcam industry…
       
              barbazoo wrote 2 days ago:
               You're being downvoted, but I think the comment raises a
               good question: what will happen when someone gets accused
               of doctoring their dashcam footage? Or any footage used as
               evidence?
       
                tass wrote 1 day ago:
                I wasn’t really kidding with my comment. I just recently used
                camera footage as part of an accident claim and the assessor
                immediately said “that wasn’t your fault, we take
                responsibility on behalf of the driver”.
                
                In a few years time when (if) faking realistic footage becomes
                trivial, I suspect this kind of video will have a much, much
                higher level of scrutiny or only be accepted from certain
                sources such as government owned traffic cameras.
       
        m3kw9 wrote 2 days ago:
         If you see talking heads with static/simple/blurred backgrounds
         from now on, assume they are fake.  In the near future fakes will
         come with realistic backgrounds and be even less detectable; we
         will have to assume all videos could be faked.
       
          hypeatei wrote 2 days ago:
          I wonder how video evidence in court is going to be affected by this.
          Both from a defense and prosecution perspective.
          
           Technically videos could have been faked before, but it would
           have required a ton of effort and skill that no average person
           has.
       
            PeterisP wrote 2 days ago:
            Just as before, a major part of photo or video evidence in court is
            not the actual video itself, but a person testifying "on that day I
            saw this horrible event, where these things happened, and here's
            attached evidence that I filmed which illustrates some details of
            what I saw." - which would be a valid consideration even without
            the photo/video, but the added details do obviously help.
            
            Courts already wouldn't generally approve random footage without
            clear provenance.
       
            greenavocado wrote 2 days ago:
            There will be a new cottage industry of AI detectives that serve as
            expert witnesses and they will attest to the originality of media
            to the court
       
          Retric wrote 2 days ago:
          I still find the faces themselves to be really obviously wrong. The
          sound is just off, close enough to tell who is being imitated but not
          particularly good.
       
            tyingq wrote 2 days ago:
            It's interesting to me that some of the long-standing things are
            still there.  For example, lots of people with an earring in only
            one ear, unlikely asymmetry in the shape or size of their ears,
            etc.
       
            tredre3 wrote 2 days ago:
            Especially the hair "physics" and sometimes the teeth shift around
            a bit.
            
            But that's nitpicking. It's good enough to fool someone not
            watching too closely. And the fact that the result is this good
            with a single photo is truly astonishing, we used to have to train
            models on thousands of photos for days only to end up with a worse
            result!
       
        jazzyjackson wrote 3 days ago:
        i get why this is interesting but why is it desirable?
        
        real jurassic park "too preoccupied with whether they could" vibes
       
          acidburnNSA wrote 3 days ago:
          Now I can join the meeting "in a suit" while being out
          paddleboarding!
       
        ilaksh wrote 3 days ago:
         The paper mentions it uses Diffusion Transformers. The
         open-source implementation that comes up on Google is Facebook
         Research's PyTorch implementation, which is under a
         non-commercial license. [1] Is there something equivalent but MIT
         or Apache?
         
         I feel like diffusion transformers are key now.
         
         I wonder if OpenAI implemented their Sora stuff from scratch or
         if they built on the Facebook Research diffusion transformers
         library. It would be interesting if they violated the
         non-commercial part.
        
        Hm. Found one:
        
   URI  [1]: https://github.com/facebookresearch/DiT
   URI  [2]: https://github.com/milmor/diffusion-transformer-keras
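         
         For context, the core of a diffusion transformer is an ordinary
         transformer block whose layer-norm shift/scale and residual gates
         are regressed from the conditioning vector ("adaLN-Zero" in the
         DiT paper). A minimal PyTorch sketch of that block, illustrative
         only and not VASA-1's actual implementation:
         
           import torch
           import torch.nn as nn
           
           class DiTBlock(nn.Module):
               def __init__(self, dim: int, heads: int):
                   super().__init__()
                   self.norm1 = nn.LayerNorm(dim,
                                             elementwise_affine=False)
                   self.attn = nn.MultiheadAttention(dim, heads,
                                                     batch_first=True)
                   self.norm2 = nn.LayerNorm(dim,
                                             elementwise_affine=False)
                   self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim),
                                            nn.GELU(),
                                            nn.Linear(4 * dim, dim))
                   # Six modulation signals (shift/scale/gate for the
                   # attention and MLP paths), zero-initialized so each
                   # block starts out as the identity ("adaLN-Zero").
                   self.ada = nn.Linear(dim, 6 * dim)
                   nn.init.zeros_(self.ada.weight)
                   nn.init.zeros_(self.ada.bias)
           
               def forward(self, x, c):
                   # x: (batch, tokens, dim) latent patches;
                   # c: (batch, dim) conditioning, e.g. diffusion
                   # timestep plus an audio/identity embedding.
                   s1, g1, a1, s2, g2, a2 = \
                       self.ada(c)[:, None].chunk(6, dim=-1)
                   h = self.norm1(x) * (1 + g1) + s1
                   x = x + a1 * self.attn(h, h, h,
                                          need_weights=False)[0]
                   h = self.norm2(x) * (1 + g2) + s2
                   return x + a2 * self.mlp(h)
           
           block = DiTBlock(dim=256, heads=8)
           out = block(torch.randn(2, 64, 256), torch.randn(2, 256))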
       
        IshKebab wrote 3 days ago:
        Oh god don't watch their teeth! Proper creepy.
        
        Still, apart from the teeth this looks extremely convincing!
       
          mtremsal wrote 2 days ago:
          The teeth resizing dynamically is incredibly distracting, or more
          positively, a nice way to identify fakes. For now.
       
          ygjb wrote 3 days ago:
           Yeah - the teeth, the tongue movement, the lack of tongue
           shape, and the "stretching" of the skin around the cheeks
           pushed the videos right into the uncanny valley for me.
       
        pxoe wrote 3 days ago:
        maybe making a webpage with 27 videos isn't the greatest web design
        idea
       
          zamadatix wrote 21 hours 58 min ago:
           It's up to your browser whether those are actually loaded all
           at once. E.g. Chrome on desktop, with no data-saver modes
           enabled, buffers the first couple of seconds of each video and
           then, when you play one, grabs the remaining MBs for it. That
           way you can see the videos as quickly as you like without
           actually loading all 27 fully just because you opened the page.
       
          sitzkrieg wrote 3 days ago:
           the two busted scrolling sections on mobile really don't help
       
        gedy wrote 4 days ago:
         My first thought was "oh no, the interview fakes", but then I
         realized - what if they just kept using the face? Would I care?
       
          PeterisP wrote 2 days ago:
           It would be interesting: a remote candidate could easily
           present as whatever ethnicity, age, or even gender they
           consider most beneficial for hiring, to avoid discrimination or
           fit certain diversity incentives.
           
           Tech like this has the potential to bring us back to the days
           of "on the Internet, nobody knows you're a dog"
          
   URI    [1]: https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...
       
          acidburnNSA wrote 3 days ago:
           Yeah, even if they just use LLMs to do all the work, or are an
           LLM themselves, as long as they can do the work I guess.
          
          Weird implications for various regulations though.
       
        fluffet wrote 4 days ago:
        This is absolutely crazy. And it'll only get better from here. Imagine
        "VASA-9" or whatever.
        
         I thought deepfakes were still quite a bit away, but after this I
         will have to be way more careful online. It's not far from being
         something that can show up in your "YouTube Shorts" feed and
         trick you if you didn't already know it was AI.
       
          smusamashah wrote 3 days ago:
           This is good but nowhere near as good as EMO [1] ( [2] )
           
           This one has too much movement and looks eerie/robotic/uncanny
           valley, while EMO looks just about perfect.
          
   URI    [1]: https://humanaigc.github.io/emote-portrait-alive/
   URI    [2]: https://news.ycombinator.com/item?id=39533326
       
            vessenes wrote 3 days ago:
             Hard disagree -- I think you might be misremembering how EMO
             looks in practice. I'm sure we'll learn VASA-1 "telltales",
             but to my eyes there are far fewer of them than with EMO;
             zero of the EMO videos were 'perfect' for me, and many show
             little glitches or missing sync. VASA-1 still blinks a bit
             more than I think is natural, but it looks much more fluid.
            
            Both are, BTW, AMAZING!! Pretty crazy.
       
              smusamashah wrote 3 days ago:
               In VASA there is way too much body movement, not just of
               the head, as if the camera is moving in strong winds. EMO
               is a lot more human-like. In the very first video on the
               EMO page I still cannot tell that it is a generated video;
               it's that real. The lip movement and the expressions are
               almost in perfect sync with the voice. That is absolutely
               not the case with VASA.
       
        fullstackchris wrote 4 days ago:
        lol how does something like this get only 50ish votes but some
        hallucinating video slop generator from some of the other competitors
        gets thousands?
       
        qwertox wrote 4 days ago:
        So an ugly person will be able to present his or her ideas on the same
        visual level as a beautiful person. Is this some sort of
        democratization?
       
        nycdatasci wrote 4 days ago:
        “We have no plans to release an online demo, API, product, additional
        implementation details, or any related offerings until we are certain
        that the technology will be used responsibly and in accordance with
        proper regulations.”
       
          araes wrote 2 days ago:
          Translation: "We're attempting to preserve our moat, and this is the
          correct PR blurb.  We'll release an API once we're far enough ahead
          and extracted enough money."
          
          Like somebody on Ars noted "anybody notice it's an election year?" 
          You don't need to release an API, all online videos are now
          suspicious authenticity.  Somebody make a video of Trump or Biden's
          eyes following the mouse cursor around.  Real videos turned into fake
          videos.
       
          sitzkrieg wrote 3 days ago:
          money will change that
       
          justinclift wrote 4 days ago:
          > until we are certain that the technology will be used responsibly
          ...
          
          That's basically "never" then, so we'll see how long they hold out.
          
           Scammers are already using the existing voice/image/video
           generation tools, apparently fairly successfully. :(
       
            spacemanspiff01 wrote 3 days ago:
             Having a delay, where people can see what's coming down the
             pipe, does have value. In a year there may well be an open
             source model.
            
             But knowing that this is possible is important.
            
            I'm fairly clued in, and am constantly surprised at how fast things
            are changing.
       
              justinclift wrote 3 days ago:
              > But knowing that this is possible ...
              
              Who knowing this is possible?
              
               The average elderly person isn't going to know any time
               soon. The SV IT people probably will.
              
              It's not an even distribution of knowledge. ;/
       
            ilaksh wrote 3 days ago:
             Eventually someone will implement one of these really good
             recent ones as open source, and then it will be on Replicate
             etc. Right now the open-source ones like SadTalker and Video
             Retalking are not live and are unconvincing.
       
          feyman_r wrote 4 days ago:
          /s it doesn’t have the phrase LLM in the title
       
        gavi wrote 4 days ago:
        The GPU requirements for realtime video generation are very minimal in
        the grand scheme of things. Assault on reality itself.
       
        nojvek wrote 4 days ago:
         I like the considerations section.
         
         There’s likely also an unsaid statement: this is for us only, and
         we’ll be the only ones making money from it, with our definition
         of “safety” and “positive”.
       
        mdrzn wrote 4 days ago:
        Holy shit these are really high quality and basically in realtime on a
        4090. What a time to be alive.
       
          rbinv wrote 4 days ago:
          It really is something. 40 FPS on a 4090, damn.
       
        acidburnNSA wrote 4 days ago:
        Oh no. "Cameras on please!" will be replaced by "AI generated faces off
        please!" in teams.
       
        nowhereai wrote 4 days ago:
        woah. so far not in the news. this is the only article
        
   URI  [1]: https://www.ai-gen.blog/2024/04/microsoft-vasa-1-ai-technology...
       
       
   DIR <- back to front page