Hacker News on Gopher (unofficial)

COMMENT PAGE FOR: VASA-1: Lifelike audio-driven talking faces generated in real time

lufeofpierrre wrote 13 hours 8 min ago: The people that killed my fam used this to maintain the illusion they were alive for like 4 years and extract information etc. On one hand it was nice to see them, but on the other a very odd feeling talking to them knowing they were dead (will for sure get downvoted, but idk, trippy interesting skynet moment the usual crowd on HN will never experience)

fennecbutt wrote 18 hours 17 min ago: Don't know why they're not releasing it right away. If they can do it, so can someone else, and hiding it makes it worse. If it's widely available, people will quickly realise that the talking head on YT spouting racist BS is AI. This process needs to happen faster. Ofc there will still be people who don't care or understand, but there will always be people who are, for example, racist and don't care if the affirmation for their beliefs comes from a human or a machine.

thih9 wrote 1 day ago: I could see this being used in movie production.

metalspoon wrote 2 days ago: AI can talk with me. Why need a friend in real life?

dang wrote 2 days ago: Related: [1] (via [2], but we merged that thread hither) URI [1]: https://arstechnica.com/information-technology/2024/04/microso... URI [2]: https://news.ycombinator.com/item?id=40088826

RcouF1uZ4gsC wrote 2 days ago: > To show off the model, Microsoft created a VASA-1 research page featuring many sample videos of the tool in action With AI stuff, I have learned to be very skeptical until and unless a relatively publicly accessible demo with user-specified inputs is available.
It is way too easy for humans to cherry-pick the nice outputs, or to take advantage of biases in the training data to generate nice outputs, and that is not at all reflective of how it holds up in the real world. Part of the reason ChatGPT, Stable Diffusion, and DALL-E had such an impact is that people could try them and see for themselves, without being told how awesome they were by the people making them.

smusamashah wrote 2 days ago: This is good but nowhere near as good as EMO [1] ( [2] ). This one has too much fake-looking body movement and looks eerie/robotic/uncanny valley. The lips don't sync properly in many places. Eye movement and overall head and body movement are not very natural at all, while EMO looks just about perfect. The very first two videos on the EMO page are a perfect example of that. See the rap near the end to see how good EMO is at lip sync. URI [1]: https://humanaigc.github.io/emote-portrait-alive/ URI [2]: https://news.ycombinator.com/item?id=39533326

ec109685 wrote 21 hours 6 min ago: There were some misses with EMO too, but Hepburn at the end was amazing.

majkinetor wrote 1 day ago: This is real time!

cchance wrote 2 days ago: Another research project with 0 model release

BobaFloutist wrote 2 days ago: Oh good!

egberts1 wrote 2 days ago: Cool! Now we can expect to see an endless stream of dead presidents' speeches "LIVE" from the White House. This should end well.

andrewstuart wrote 2 days ago: Despite vast investment in AI by VCs and vast numbers of startups in the field, these sorts of things remain unavailable as simple consumer-installable software. Every second day HN has some post about some new amazing AI system. Never available to download, run and use. Why the vast investment and no startup selling consumer downloadable software to do it?
cs702 wrote 2 days ago: And it's only going to get faster, better, easier, cheaper.[a] Meanwhile, yesterday my credit card company asked me if I wanted to use voice authentication for verifying my identity "more securely" on the phone. Surely the company spent many millions of dollars to enable this new security-theater feature. It raises the question: is every single executive and manager at my credit card company completely unaware that right now anyone can clone anyone else's voice by obtaining a short sample audio clip taken from any social network? If anyone is aware, why is the company acting like this? Corporate America is so far behind the times it's not even funny. --- [a] With apologies to Daft Punk.

tennisflyi wrote 21 hours 48 min ago: Pretty much. You think they're smart or with it. They're just lucky fogies

stubish wrote 1 day ago: The point is lowering liability. By choosing not to use voice authentication (or whatever), it becomes easier to argue that fraud is your fault. Or if you did use it, the company 'is doing everything they can' and 'exceeding industry standards', so it isn't their fault either. It also just makes them seem more secure to the uninitiated (the security-theater bit, yes). Maybe one day someone will successfully argue that adding easily defeated checks lowers security, by adding friction for no reason or instilling false confidence in users at both ends.

dade_ wrote 2 days ago: Yes, they are aware, and they also know it isn't foolproof, so that isn't the only information being compared against. Some services compare the calling number against live activity on the PSTN (e.g., the subscriber's phone is not in an active call, but their number is being presented as the caller ID, is one such metric). Many of these deepfake generators with public access have watermarks in the audio. The audio stream comparison continues: it needs to speak like you, with your word and phrase choices.
There are other fingerprints of generated audio that you can't hear, but that are still obvious at the moment. With security, it's always cat and mouse: fraudsters on one hand, and effort/frustration for customers on the other. Asking customers questions that they don't remember and that fraudsters have in front of them isn't working, and the time it takes for agents to authenticate is very expensive. While there is no doubt that companies will screw up security, you are making wild accusations without reference to any evidence.

supercheetah wrote 2 days ago: That scene from Sneakers would be so different nowadays. [1] "My voice is my passport. Verify me." [2] 1. [1] 2. URI [1]: https://youtu.be/WdcIqFOc2UE?si=Df3DtSakatp9eD0L URI [2]: https://youtu.be/-zVgWpVXb64?si=yT2GZpb7E2yZoEYl

ryandrake wrote 2 days ago: Any time you add a "new" security gate to your product, it should be in addition to, and not instead of, the existing gates. Biometrics should not replace username/password; they should be in addition to them. Security questions like "What was your first pet's name" should not be able to get you in the back door. SMS verification alone should not allow you to reset your password. Same with this voice authentication stuff: it should be another layer, not a replacement for your actual credentials. If you treat it as OR instead of AND, then your security is only as good as the worst link in the chain.

recursive wrote 2 days ago: If you make your product sufficiently inconvenient, then you'll have the unassailable security posture of having no users.

fragmede wrote 2 days ago: I mean, what do you want them to do? If we think their security officers are freaking out and holding meetings right now about what to do, or if they're asleep at the wheel, we'd be seeing the same thing from the outside, no?

addandsubtract wrote 2 days ago: No, because multiple companies are pushing this atm.
If it were only one company I would agree, but with multiple, you'd have at least one that would back out of it again.

user_7832 wrote 2 days ago: > Is every single executive and manager at my credit card company completely unaware that right now anyone can clone anyone else's voice by obtaining a short sample audio clip taken from any social network? Your mistake is assuming the company cares. The "company" is a hundred different disjointed departments that only care about not getting caught Equifax-style (or filing for bankruptcy if caught). If the marketing director sees a shiny new thing that might boost some random KPI, they may not really care about security. However, in the rare chance that your bank is actually half decent, I'd suggest contacting their IT/security teams about your concerns. Maybe you'll save some folks from getting scammed?

cyanydeez wrote 2 days ago: Also, this feature is probably just some mid-level exec's plan for a bonus, not a rigorously reviewed and planned project. It's also probably been in the pipeline for a decade, so if they don't push it out, suddenly they get no bonus for cancelling a project. Corporations are ultimately no better than governments, and likely worse depending on what their regulatory environment looks like.

iamflimflam1 wrote 2 days ago: There's a really important lesson here for anyone trying to do sales to big companies. Find an exec that needs a project to advance their career. Make your software that project. Suck as many other execs into the project as you can, so their careers become coupled to getting your software rolled out.

amindeed wrote 2 days ago: That's clever!

SirMaster wrote 2 days ago: It looks all warpy and stretchy. That's not how skin and face muscles work. Looks fake to me.

Zopieux wrote 2 days ago: I find the hair to be the least realistic; it looks elastic, which is unsurprising: highly detailed things like hair are hard to simulate with good fidelity.
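ryandrake's point upthread, that new factors should be composed with AND rather than OR, can be put in a few lines of code. This is a hypothetical sketch, not any real authentication API; `password_ok` and `voice_ok` are made-up stand-ins for real checks.

```python
# Sketch of composing authentication factors. All names are illustrative.

def password_ok(supplied: str, stored: str) -> bool:
    return supplied == stored  # stand-in for a real salted-hash comparison

def voice_ok(score: float, threshold: float = 0.9) -> bool:
    return score >= threshold  # stand-in for a real voiceprint match

def authenticate_or(pw_ok: bool, v_ok: bool) -> bool:
    # Weakest-link composition: an attacker only has to beat one factor.
    return pw_ok or v_ok

def authenticate_and(pw_ok: bool, v_ok: bool) -> bool:
    # Layered composition: every factor must pass.
    return pw_ok and v_ok

# An attacker with a cloned voice (high match score) but no password:
pw = password_ok("guess", "correct horse")
voice = voice_ok(0.97)
print(authenticate_or(pw, voice))   # True: the cloned voice alone gets in
print(authenticate_and(pw, voice))  # False: the missing password still blocks
```

With OR, the voice clone attack described earlier in the thread succeeds on its own; with AND, it only reduces security to the level of the remaining factors.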
FredPret wrote 2 days ago: Anyone have any good ideas for how we're going to do politics now? Today a big ML model can do this and it's somewhat regulate-able, tomorrow people can do this on their contact-lens supercomputers and anyone can generate a video of anything. Is going back to personally knowing your local representative the only way? How will we vote for national candidates if nobody knows what they think or say? fennecfoxy wrote 5 hours 9 min ago: Same way we've always done it; largely ignorant and apathetic masses that only care about waving their team's flag and don't give a damn about most of their teams' policies as long as they can still say X,Y and Z things at the Christmas dinner table. Democracy is already an illusion of choice anyway; just look at democratic candidates. It's gonna be Biden V Trump _again_. For London mayoral elections Sadiq is pretty much guaranteed to get in _again_. For UK main election it's gonna be the typical Tories V Labour BS _again_, with no new fresh young candidates with new ideas. Democracy is rotting everywhere it exists thanks to the idea of parties, party politics and the human need to pick a tribe and attack every other tribe. TimedToasts wrote 2 days ago: > Anyone have any good ideas for how we're going to do politics now? If a business is showing a demo of this you can be assured that the Government already has this tech and has for a period of time. > How will we vote for national candidates if nobody knows what they think or say? You don't know what they think or say now - hopefully this disabuses people of this notion. _djo_ wrote 1 day ago: > If a business is showing a demo of this you can be assured that the Government already has this tech and has for a period of time. That may have been true once upon a time, but it no longer is. And even in the areas it was true it was mostly for niche areas like cryptanalysis. 
Governments simply cannot attract or keep the level of talent required to have been far ahead of industry on LLMs and similar tech, especially not with the huge difference in salaries and working conditions.

hooverd wrote 2 days ago: People already believe any quote you slap on a JPEG.

dwb wrote 2 days ago: We already rely on chains of trust going back to the original source, and we still will. I find these alarmist posts a bit mystifying: before photography, anyone could fake a quote of anyone, and human civilisation got quite far. We had a bit over a hundred years where photographic-quality images were possible and very hard to fake (which did and still does vary with technology), but clearly now we're past that. We'll manage!

GeoAtreides wrote 2 days ago: In the before times we didn't have social media and its algorithms and reach. Does it matter that the chains of trust debunk a viral lie 24 hours after it has spread? Not that there's a lot of trust in the chains of trust to begin with. And if you still have trust, then you're not the target of the viral lie. And if you still have trust, then how long can you hold on to that trust when the lies keep coming 24/7, one after another, without end? As one movie critic once put it: you might not have noticed it, but your brain did. Very malleable, this brain of ours. Civilization might be fine, sure. Democracy, on the other hand...

woleium wrote 2 days ago: The issue is better phrased as "how will we survive the transition while some folk still believe the video they are seeing is irrefutable proof the event happened?"

marcusverus wrote 2 days ago: Presidential elections are frequently pretty close. Taking the electoral college into account (not the popular vote, which doesn't matter), Donald Trump won the 2016 election by a grand total of ~80,000 votes in three states[0].
Knowing that retractions rarely get viral exposure, it's not difficult to imagine that a few sufficiently viral videos could swing enough votes to impact a presidential election. Especially when considering that the average person is not up to speed on the current state of the tech, and so has not been prompted to build up the mindset required to fend off this new threat. [0] URI [1]: https://www.washingtonpost.com/news/the-fix/wp/2016/12/01/...

BobaFloutist wrote 2 days ago: Yeah, I mean, tabloids have been fooling people with doctored photos for decades. Potentially we'll need slightly tighter regulations on the formal press (so that people who care about accurate information have a place they can get it), and we'll definitely want to steer the culture back towards holding them accountable for misinformation, but credulous people have always had easy access to bad information. I'm much more worried about the potential abuse cases that involve ordinary people who aren't public figures and have much less ability to defend themselves. Heck, even celebrities are more vulnerable targets than politicians.

kmlx wrote 2 days ago: > How will we vote for national candidates if nobody knows what they think or say? i'm going to burst your bubble here, but most voters have no idea about policies or candidates. most voters vote based on inertia or minimal cues, not on policies or candidates. i suggest you look up "The American Voter", "The Democratic Dilemma: Can Citizens Learn What They Need to Know?" and the "American National Election Studies".

4ndrewl wrote 2 days ago: DNS? Might be that we need a radical (for some) change of viewpoint. Just as there's no privacy on the internet, how about 'there's very little trust on the internet'? Assume everything not securely signed by a trusted party is false.
fennecfoxy wrote 5 hours 2 min ago: A large number of people don't really care about verifying what they've heard is true or not before repeating it, eventually making it fact amongst themselves. Hell I've been guilty of spouting BS before, just because I've heard something from so many people. Then find that when I look it up, it's not true. It's not really a tech problem, it's more of a human problem imo, like so many others. But there is literally nothing we can do about it. hx8 wrote 2 days ago: Hyper targeted placement of generated content designed to entice you to donate to political campaigns and to vote. Perhaps leading to a point where entire video clips are generated for a single viewer. Politicians and political commentators will lease their likeness and voice out for targeted messaging to be generated using their likeness. Less reputable platforms will allow disinformation campaigns to spread. T-A wrote 2 days ago: > Today a big ML model can do this Not that big: [1] URI [1]: https://github.com/Zejun-Yang/AniPortrait URI [2]: https://huggingface.co/ZJYang/AniPortrait/tree/main cchance wrote 2 days ago: Didn't see that one pretty cool, not as good as Emo or Vasa but pretty good qup wrote 2 days ago: People in my circles have been saying this for a few years now, and we've yet to see it happen. I've got my popcorn ready. But you can rest easy. Everyone just votes for the candidate their party picked, anyway. FredPret wrote 2 days ago: It'll happen - deepfakes aren't good enough yet. But when they become ubiquitous and hard to spot, it'll be chaos until the average person is mentally inoculated against believing any video / anything on the internet. I wonder if it's possible to digitally sign footage as it's captured? It'd be nice to have some share-able demonstrably true media. Edit: I'm a centrist and I definitely would lean one way or the other based on who the options are (or who I think they are). 
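FredPret's question above, whether footage could be digitally signed as it is captured, can be sketched as a hash chain over frames: each frame's hash is folded into a running digest, so replacing, dropping, or reordering any frame invalidates the final tag. This is a hypothetical illustration, not any real camera API; a real device would sign with an asymmetric key in secure hardware, and `hmac` with a made-up device secret stands in here to keep the sketch self-contained.

```python
import hashlib
import hmac

# Illustrative per-device secret; a real camera would hold a private key
# in secure hardware instead.
DEVICE_SECRET = b"per-device-secret-provisioned-at-factory"

def sign_footage(frames: list) -> str:
    """Chain the frame hashes, then tag the final chain value."""
    chain = b"\x00" * 32
    for frame in frames:
        # Fold each frame's hash into the running chain digest.
        chain = hashlib.sha256(chain + hashlib.sha256(frame).digest()).digest()
    return hmac.new(DEVICE_SECRET, chain, hashlib.sha256).hexdigest()

frames = [b"frame-0", b"frame-1", b"frame-2"]
tag = sign_footage(frames)

# Untouched footage verifies; any tampering changes the tag.
assert sign_footage(frames) == tag
assert sign_footage([b"frame-0", b"DEEPFAKE", b"frame-2"]) != tag
assert sign_footage(list(reversed(frames))) != tag
```

The open problems raised later in the thread (key leakage, re-photographing a screen) sit outside what a scheme like this can address on its own.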
binkHN wrote 2 days ago: Full details at URI [1]: https://www.microsoft.com/en-us/research/project/vasa-1/

karaterobot wrote 2 days ago: We need some clear legislation around this right now.

CamperBob2 wrote 2 days ago: Legislation only impairs the good guys.

4ndrewl wrote 2 days ago: In which jurisdiction?

karaterobot wrote 2 days ago: What jurisdiction would not benefit from legislation around duplicating people's identities using AI?

stronglikedan wrote 2 days ago: counterpoint: we don't need any more legislation

qwertox wrote 2 days ago: I tend towards agreeing with you. Many of the problems, like impersonation, are already illegal. And replacing a person who spreads lies, as can be seen in most TV or glossy cover ads, shouldn't trigger some new legal action; the only difference is that now the actor is also a lie. And countries which use actors or news anchors for spreading propaganda surely won't see an issue with replacing them with AI characters. People who then get to read that their favorite, stunningly beautiful Instagram or TikTok influencer is nothing but a fat, chips-eating ugly person using AI may try to raise some legal issues to soothe their disappointment. They then might raise a point which sounds reasonable, but which would force politicians to also tackle the lies spread in TV/magazine ads. Maybe clearly labeling any use of this tech, perhaps even with a QR code linking to the owner of the AI (similar to the QR codes on meat packaging which let you track the origin of the meat), would be something laws could help with, in the spirit of transparency.

physhster wrote 2 days ago: A fantastic technological advance for election interference!

IshKebab wrote 2 days ago: As if this technology was needed.

RGamma wrote 2 days ago: Such an exciting startup idea! I'm thrilled!

balls187 wrote 2 days ago: I'm curious what the reason for deepfake research is and what the practical application is.
Can someone explain the commercial need to take someone's likeness and generate video content? If I were an A-list celebrity, I would give permission for Coke to make a commercial with my likeness, provided I was allowed final approval of the finished ad. Do I have an avatar that attends my Zoom work calls?

criddell wrote 2 days ago: If beautiful people have an advantage in the job market, maybe people will use deepfake technology when doing Zoom interviews? Maybe they will use it to alter their accent?

bonton89 wrote 2 days ago: Propaganda, political manipulation, narrative nudging, regular scams and advertising. Even though most of those things are illegal, you could just have foreign cat's-paw firms do it. Maybe you fire them for "going too far" after the damage is done, assuming someone even manages to connect the dots.

jdietrich wrote 2 days ago: In this case, replacing humans in service jobs. From the paper: "Such technology holds the promise of enriching digital communication, increasing accessibility for those with communicative impairments, transforming education methods with interactive AI tutoring, and providing therapeutic support and social interaction in healthcare." A convincing simulacrum of empathy could plausibly be the most profitable product since oil.

szundi wrote 2 days ago: Imagine being the CEO and you just grab your salary and options, go home, sit in the hot tub while one of the interns carefully prompts GPT and VASA for how you are giving a speech online about strategic directions. /s

SkyPuncher wrote 2 days ago: On the surface, it's a simple, understandable demo for the masses. At the same time, it hints at deeper commercial usage. Disney has been using digital likenesses to maintain characters whose actors/actresses have died; Princess Leia is the most prominent example. Arguably, there is significant real value in being able to generate a human-like character that doesn't have to be recast.
That character can be any age, at any time, and look exactly like the actor/actress. For actors/actresses, I suspect many of them will start licensing their image/likeness as they look to wind down their careers. It gives them ongoing income with very little effort.

r1chardnl wrote 2 days ago: Apple Vision Pro personas competition

JamesBarney wrote 2 days ago: Video games, entertainment, and avatars seem like the big ones.

HeatrayEnjoyer wrote 2 days ago: If that is really the reason then this is insane, and everyone involved should put their keyboards down and stop what they are doing. This would be as if we invented and sold nuclear weapons to dig out quarry mines faster. The inconvenience it saves us quickly disappears into the overwhelming shadow of the enormous harm now enabled.

ImPostingOnHN wrote 2 days ago: > This would be as if we invented and sold nuclear weapons to dig out quarry mines faster. "Project Plowshare was the overall United States program for the development of techniques to use nuclear explosives for peaceful construction purposes."[0] 0: URI [1]: https://en.wikipedia.org/wiki/Project_Plowshare

wumeow wrote 2 days ago: Yeah, and it was terminated. Much harder to put this genie back in the bottle.

mensetmanusman wrote 2 days ago: The purpose is to give remote workers the ability to clone themselves and automate their many jobs. /s (but actually, because laziness is the driver of all innovation, I wouldn't be surprised if this happens).

hypeatei wrote 2 days ago: Entertainment maybe? I know that's not necessarily an ethical reason, but some have made hilarious AI-generated songs already.

bugglebeetle wrote 2 days ago: State disinformation and propaganda campaigns.

NortySpock wrote 2 days ago: Corporate disinformation and propaganda campaigns. Personal disinformation and propaganda campaigns. Oh brave new world, that has such fake people in it!

TriangleEdge wrote 2 days ago: Why is this research being done? Is this some kind of arms race?
The only purpose of this technology I can think of is getting spies to abuse others. Am I going to have to do AuthN and AuthZ on every phone call and Zoom now?

zamadatix wrote 22 hours 6 min ago: > It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors. Teams started rolling out Avatars [1]; this would be a step up. I'm not really a fan, but that doesn't mean I can't see the use case. URI [1]: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/...

berniedurfee wrote 1 day ago: I was thinking about this the other day. An implantable Yubikey-type device that integrates with whatever device you're using, to validate your identity for phone calls or video conferences. Subdermal X.509, maybe with some sort of Neuralink adapter so you can confirm the request for identity. Though first versions might just be a small button you need to press during the handshake.

HarHarVeryFunny wrote 2 days ago: > Why is this research being done? I think it's mostly "because it can be done". These types of impressive demos have become relatively low-hanging fruit in terms of how modern machine learning can be applied. One could imagine commercial applications (VR, virtual "try before you buy", etc.), but things like this can also be a flex by the AI labs, or a PhD student wanting to write a paper.

phkahler wrote 2 days ago: Newscasters and other talking heads will be out of business. Just pipe the script into some AI and get video.

danmur wrote 2 days ago: We all know why this is really happening. Clippy 2.0.

1659447091 wrote 2 days ago: Advertising. Now you and your friends star in the streaming commercials and digital billboards near you! (whether you want to or not)

andybak wrote 2 days ago: Because the tech for this is only a slight variation of the tech for a broad range of legitimate applications? Because even this precise tech has legitimate use cases?
> The only purpose of this technology I can think of is getting spies to abuse others. Can you really not think of any other use cases? lo0dot0 wrote 1 day ago: Why don't you get more specific about your claims? andybak wrote 1 day ago: Jeez. I dunno. Sometimes I just reach my threshold for the time I'm prepared to spend debating with strangers on the internet. krainboltgreene wrote 2 days ago: Why don't you list some legitimate and useful values of this work? Especially at the price we and this company are paying. tithe wrote 2 days ago: I get the feeling it's "someone's going to do this, so it might as well be us." It's fascinating how research can take on a life of its own and will be pushed, by someone, to its own conclusion. Even for immensely destructive technologies (e.g., atomic weapons, viruses), the impact of a technology is its own attractor (could you say that's risk-seeking behavior?) > Am I going to have to do AuthN and AuthZ on every phone call and zoom now? "Alexa, I need an alibi for yesterday at noon." Arnavion wrote 2 days ago: On the other hand, if deepfaking becomes common enough that everyone stops trusting everything they read / see on the internet, it would be a net good against the spread of disinformation compared to today. piva00 wrote 2 days ago: That's the whole issue though, spread of disinformation eroded trust, furthering this into obliteration of all trust is not a good outcome. anigbrowl wrote 2 days ago: everyone stops trusting everything Why would you expect this to happen? Lots of people are gullible, if it were otherwise a lot of well-known politicians would be out of a job or would never have been elected to begin with. ryandrake wrote 2 days ago: If it's even commoner than "common enough" then anyone could at least try to help their gullible friends and family by sending them a deepfake video of them doing/saying something they've never said. A lot of people will suddenly wise up when a problem affects them directly. 
notaustinpowers wrote 2 days ago: I don't see the extinction of trust through the introduction of garbage falsehoods as a net good. Believing that everything you eat is poisoned is no way to live. Believing that everything you see is a lie is also no way to live.

throwthrowuknow wrote 2 days ago: Before photography this was just the normal state of the world. Think a little: back then, any story or picture you saw was made by a person, and you only had their reputation to go by. Think some more and you realize that's never changed, even with pictures and video. Easy AI-generated pictures and video just remove the illusion of trust.

hiatus wrote 2 days ago: I don't see that as an outcome. We have already seen a grand erosion of trust in institutions. Moving to an even lower-trust society does not sound like it would have positive consequences for discourse, public policy, or society at large.

throwthrowuknow wrote 2 days ago: The benefit is that you can only trust in-person interaction with social and governmental institutions, so people will have to leave their damn house again and go talk to each other face to face. Too many of our current problems are caused by people only interacting with each other and the world through third parties who are performing a MITM operation for their own benefit.

1attice wrote 2 days ago: This assumes that it's a two-way door. Over the past century and a half, we've moved into vast, anonymous spaces, where I'm as likely to know and get along with my neighbour as I am to win the lottery. And this is important. No, it's not just a matter of putting in the effort to learn who my neighbour is -- my neighbour is literally someone whose life experiences are wildly different, whose social outcomes will be wildly different, whose beliefs and values are wildly different, and, for all I know, goes to conferences about how to eliminate me and my kind. (This last part is not speculation; I'm trans; see: CPAC) And these are my reasons.
My neighbour is probably equivalently terrified of me, or what I represent, or the media I consume, or the conferences that I go to. Generalizing, you can't take a bunch of random people whose only bond is that they share meatspace proximity, draw a circle around them, and declare them a community; those communities are _gone_, and you can no more bring them back than you can revive a corpse. (This would also probably not be a good idea, even if it were possible: they were also incredibly uncomfortable places for anyone who didn't fit in, and we have generations of fiction about people risking everything to leave them for those big anonymous cities we created in step 1.) So, here we are, dependent on technology to stay in touch with far-flung friends and lovers and family, all of us scattered like spiderwebs across the globe, and now into the strands drips a poison. Daniel Dennett was right. Counterfeit people are an enormous danger to civilization. Research like this should stop immediately.

rightbyte wrote 2 days ago: Ironically, low-effort deepfakes might increase trust in organizations that have had the budget to fake stuff since their inception. The losers are 'citizen journalists' broadcasting on YouTube etc.

alfalfasprout wrote 2 days ago: What this is starting to reveal is that there's a clear need for some kind of chain-of-custody system that guarantees the authenticity of what we see. Nikon and Canon tried doing this in the past, but improper storage of private keys led to vulnerabilities. As far as I'm aware it never extended to video either. With modern secure hardware keys it may yet be possible. The difficulty is that any kind of photo/video manipulation would break the signature (and there are practical reasons to want to be able to edit videos, obviously). In an ideal world, any mutation of the source content would be traceable back to the original. But that's not an easy problem to solve.
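The "any mutation traceable to the original" idea in the comment above can be sketched as a chain of provenance records, where every edit commits to the hash of the record it was derived from. This is a toy illustration with made-up record fields; real provenance systems (C2PA-style manifests, for instance) also cryptographically sign each record, which is omitted here.

```python
import hashlib
import json

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def new_record(content, parent, note):
    """A provenance record committing to the content and its parent record."""
    return {
        "content_hash": h(content),
        "parent_hash": None if parent is None else
            h(json.dumps(parent, sort_keys=True).encode()),
        "note": note,
    }

original = b"raw sensor data"
r0 = new_record(original, None, "captured")   # root: the camera's capture
cropped = b"raw sensor data, cropped"
r1 = new_record(cropped, r0, "crop")          # an edit, linked to the capture

# Verifying r1 means checking its claimed parent record hashes correctly:
assert r1["parent_hash"] == h(json.dumps(r0, sort_keys=True).encode())
# A record created without a parent has no path back to any capture:
assert new_record(cropped, None, "crop")["parent_hash"] is None
```

Edits then no longer break authenticity outright; they produce content whose lineage can be walked back to a signed capture, which is the traceability the comment asks for.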
qingcharles wrote 1 day ago: None of that works; it's simply theatre. I can just take a (crypto-signed) photo of another photo.

pedalpete wrote 1 day ago: The public blockchain would show the chain of custody/ownership, so the photo of a photo would show that the final crypto signature does not belong to the claimed owner. You are correct that as a viewer I can't just rely on a crypto-signature like a watermark; I'd have to verify the chain of custody. But if I wanted to do that, it is available to do so.

PeterisP wrote 2 days ago: I think it's practically impossible for such a system to be globally trustworthy due to the practical inevitability of "improper storage of private keys led to vulnerabilities" scenarios. People will expect or require that chain of custody only if all, or at least the vast majority, of the content they want has that chain of custody. Photo/video content will have that chain of custody only if all or almost all of the devices recording that content support it, including all the cheapest mass-produced devices in reasonably widespread use anywhere in the world. And that chain of custody provides the benefit only if literally 100% of those manufacturers have their private keys secure 100% of the time, which is simply not happening; at least one such key will leak, if not unintentionally then intentionally for some intelligence agency that wants to fake content. And what do you do once you see a leak of the private keys used for signing the certificates for the private keys securely embedded in (for example) all 2029 Huawei smartphones, which could be like 200 million phones? The users won't replace their phones just because of that, and you'll have all these users making content. So everyone will have to choose to either auto-block and discard everything from all those 200 million users, or permit content with a potentially fake chain of custody; and I'm totally certain that most people will prefer the latter.
macrolime wrote 2 days ago: Multisig by the user and camera manufacturer can help to some extent.

PeterisP wrote 1 day ago: Multisig requires user cooperation, many users will not care to cooperate, and chain-of-custody verification really starts working only if you can get (force) ~100% of legitimate users globally to adopt the system. Also, for the potential creators of political fakes, such a multisig won't change things - getting a manufacturer's key may take some effort, but getting (and 'burning') the keys of a dozen random people is relatively trivial in many ways - e.g. buying them from poor people, stealing them from compromised random machines, or simply issuing fake identities for state-backed actors.

bonton89 wrote 2 days ago: I expect this type of system to be implemented in my lifetime. It will allow whistleblowers and investigative sources to be discredited or tracked down and persecuted.

20after4 wrote 2 days ago: Unfortunately that seems inevitable.

throw__away7391 wrote 2 days ago: No, we are merely returning to the pre-photography state of things where a mere printed image is not sufficient evidence for anything.

anigbrowl wrote 2 days ago: > merely
You say this as if it were not a big deal, but losing a century's worth of authentication infrastructure/practises is a Bad Thing which will have large negative externalities.

throw__away7391 wrote 2 days ago: It isn't really though. It has been technically possible to convincingly doctor photos for decades, gradually getting easier, cheaper, and faster with time, and even now the currently available tech has limitations and the full change is not going to happen overnight.

BobaFloutist wrote 2 days ago: Pre-photography it at least took effort, practice, and time to draw something convincing. Any skill with that much of a barrier to entry kind of automatically reduces the ability to be anonymous. And we didn't have the ability to instantaneously distribute images world-wide.
hx8 wrote 2 days ago: True, an image, audio clip, or video is not enough evidence to establish truth. We still need a way to establish truth. It's important for security cameras, for politics, and for public figures. Here are some things we could start looking into.

* Cameras that sign their output. Yes, this camera caught this video, and it hasn't been modified. This is a must for recordings being used as court evidence, IMO. Otherwise framing a crime is as easy as a few deep fakes and planting some DNA or fingerprints at the scene of the crime.

* People digitally signing pictures/audio/videos of themselves. Even if they digitally modified the data, it shows that they consent to having their image associated with that message. It reduces the strength of deep fake videos as an attack vector for reputation sabotage.

* Malicious content source detection and flagging. Think email-spam-filter-style tagging of fake content. Community notes on X would be another good example.

* Digital manipulation detection. I'm less than hopeful this will be the way in the long term, but it could be used to disprove some fraud.

djmips wrote 1 day ago: Every image is an NFT?

alex_suzuki wrote 2 days ago: Signing is great, but the hard part is managing keys and trust.

alchemist1e9 wrote 2 days ago: Blockchains can be used for cryptographic time-stamping. I've always had a suspicion that governments and large companies would prefer a world without hard cryptographic proofs. After WikiLeaks they noticed DKIM can cause them major blowback. Somehow the general public isn't aware that all the emails were proven authentic with DKIM signatures, and even in fairly educated circles people believe the "emails were fake" - but that's not actually possible.
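[Editor's note] The cryptographic time-stamping mentioned above boils down to publishing a hash commitment: a minimal sketch in Python, assuming only that some public record (a blockchain transaction, a notary service) stores the digest at a known time. The email contents here are hypothetical.

```python
import hashlib

def commitment(document: bytes) -> str:
    """Publish only this digest (e.g. in a blockchain transaction or a
    notarized record); the document itself can stay private."""
    return hashlib.sha256(document).hexdigest()

email = b"From: alice@example.com\r\nSubject: hello\r\n\r\nbody text"
digest = commitment(email)  # this hex string is what gets timestamped

# Later, anyone holding the original bytes recomputes the digest and
# compares it against the published record: a match proves the document
# existed, unmodified, no later than the commitment time.
print(commitment(email) == digest)          # True
print(commitment(email + b"!") == digest)   # False: any edit changes the hash
```

Note the commitment proves existence-at-a-time, not authorship; that is what a signature scheme like DKIM adds on top.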
PeterisP wrote 2 days ago: Quite the opposite, governments and large companies even explicitly run services for digital timestamping of documents - if I wanted to potentially assert some facts in court, I'd definitely prefer having that e-document with a timestamp notarized by my local government service instead of Bitcoin, because while the cryptography is the same, it would be much simpler from the practical legal perspective, requiring less time and effort and cost to get the court to accept it.

tass wrote 2 days ago: There goes the dashcam industry…

barbazoo wrote 2 days ago: You're being downvoted but I think the comment raises a good question. What will happen when someone gets accused of doctoring their dashcam footage? Or any footage used for evidence.

tass wrote 1 day ago: I wasn't really kidding with my comment. I just recently used camera footage as part of an accident claim and the assessor immediately said "that wasn't your fault, we take responsibility on behalf of the driver". In a few years' time, when (if) faking realistic footage becomes trivial, I suspect this kind of video will face a much, much higher level of scrutiny or only be accepted from certain sources such as government-owned traffic cameras.

m3kw9 wrote 2 days ago: If you see talking heads with static/simple/blurred backgrounds from now on, assume it is fake. In the near future they will come with realistic backgrounds and even less detectable fakes, and we will have to assume all videos could be faked.

hypeatei wrote 2 days ago: I wonder how video evidence in court is going to be affected by this. Both from a defense and prosecution perspective. Technically videos could've been faked before, but it would require a ton of effort and skill that no average person would have.
PeterisP wrote 2 days ago: Just as before, a major part of photo or video evidence in court is not the actual video itself, but a person testifying "on that day I saw this horrible event, where these things happened, and here's attached evidence that I filmed which illustrates some details of what I saw" - which would be a valid consideration even without the photo/video, but the added details do obviously help. Courts already wouldn't generally accept random footage without clear provenance.

greenavocado wrote 2 days ago: There will be a new cottage industry of AI detectives who serve as expert witnesses and attest to the originality of media in court.

Retric wrote 2 days ago: I still find the faces themselves to be really obviously wrong. The sound is just off - close enough to tell who is being imitated, but not particularly good.

tyingq wrote 2 days ago: It's interesting to me that some of the long-standing tells are still there. For example, lots of people with an earring in only one ear, unlikely asymmetry in the shape or size of their ears, etc.

tredre3 wrote 2 days ago: Especially the hair "physics", and sometimes the teeth shift around a bit. But that's nitpicking. It's good enough to fool someone not watching too closely. And the fact that the result is this good with a single photo is truly astonishing; we used to have to train models on thousands of photos for days only to end up with a worse result!

jazzyjackson wrote 3 days ago: I get why this is interesting, but why is it desirable? Real Jurassic Park "too preoccupied with whether they could" vibes.

acidburnNSA wrote 3 days ago: Now I can join the meeting "in a suit" while being out paddleboarding!

ilaksh wrote 3 days ago: The paper mentions it uses Diffusion Transformers. The open source implementation that comes up in Google is Facebook Research's PyTorch implementation, which is under a non-commercial license. [1] Is there something equivalent but MIT or Apache?
I feel like diffusion transformers are key now. I wonder if OpenAI implemented their Sora stuff from scratch or if they built on the Facebook Research diffusion transformers library. That would be interesting if they violated the non-commercial part. Hm. Found one:
URI [1]: https://github.com/facebookresearch/DiT
URI [2]: https://github.com/milmor/diffusion-transformer-keras

IshKebab wrote 3 days ago: Oh god, don't watch their teeth! Proper creepy. Still, apart from the teeth this looks extremely convincing!

mtremsal wrote 2 days ago: The teeth resizing dynamically is incredibly distracting - or, more positively, a nice way to identify fakes. For now.

ygjb wrote 3 days ago: Yeah, the teeth, the tongue movement and lack of tongue shape, and the "stretching" of the skin around the cheeks pushed the videos right into the uncanny valley for me.

pxoe wrote 3 days ago: Maybe making a webpage with 27 videos isn't the greatest web design idea.

zamadatix wrote 21 hours 58 min ago: It's up to your browser whether those are actually loaded all at once. E.g. on desktop Chrome with no data-saver modes enabled, it buffers the first couple seconds of each video and then, when you play one, grabs the remaining MBs for that video. That way you can see the videos as quickly as you like without actually having to load all 27 fully just because you opened the page.

sitzkrieg wrote 3 days ago: The two busted scrolling sections on mobile really don't help.

gedy wrote 4 days ago: My first thought was "oh no, the interview fakes", but then I realized - what if they just kept using the face? Would I care?

PeterisP wrote 2 days ago: It would be interesting that a remote candidate could easily identify as whatever ethnicity, age or even gender they consider most beneficial for hiring, to avoid discrimination or fit certain diversity incentives.
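[Editor's note] For readers unfamiliar with the diffusion transformers discussed above: the "diffusion" half refers to the standard DDPM forward-noising process that such models are trained to invert. A toy pure-Python sketch of that forward process, with an illustrative linear beta schedule (the schedule constants are conventional defaults, not taken from the VASA-1 paper):

```python
import math
import random

def alpha_bar(t: int, T: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) under a linear beta schedule.
    Measures how much of the original signal survives after t noise steps."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod

def noise_sample(x0: list, t: int, rng: random.Random) -> list:
    """Forward process q(x_t | x_0): sqrt(a_bar)*x0 + sqrt(1 - a_bar)*eps."""
    ab = alpha_bar(t)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in x0]

rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
# Early step: the sample stays close to the data; by t = T it is nearly pure noise.
print(noise_sample(x0, t=10, rng=rng))
print(alpha_bar(1000) < 0.01)  # True: signal almost fully destroyed by t = T
```

A diffusion transformer (as in the DiT repo linked above) is then a transformer trained to predict the added noise from x_t, t, and conditioning signals, which for a talking-head model would include the audio.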
Tech like this has the potential to bring us back to the days of "on the Internet, nobody knows you're a dog".
URI [1]: https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...

acidburnNSA wrote 3 days ago: Yeah, even if they just use LLMs to do all the work, or are an LLM themselves, as long as they can do the work, I guess. Weird implications for various regulations though.

fluffet wrote 4 days ago: This is absolutely crazy. And it'll only get better from here. Imagine "VASA-9" or whatever. I thought deepfakes were still quite a bit away, but after this I will have to be way more careful online. It's not far from being something that can show up in your "YouTube shorts" feed and trick you if you didn't already know it was AI.

smusamashah wrote 3 days ago: This is good but nowhere near as good as EMO [1] ( [2] ). This one has too much movement and looks eerie/robotic/uncanny valley. While EMO looks just about perfect.
URI [1]: https://humanaigc.github.io/emote-portrait-alive/
URI [2]: https://news.ycombinator.com/item?id=39533326

vessenes wrote 3 days ago: Hard disagree -- I think you might be misremembering how EMO looks in practice. I'm sure we'll learn VASA-1 "telltales", but to my eyes there are far fewer than EMO's - zero of the EMO videos were 'perfect' for me, and many show little glitches or missing sync. VASA-1 still blinks a bit more than I think is natural, but it looks much more fluid. Both are, BTW, AMAZING!! Pretty crazy.

smusamashah wrote 3 days ago: In VASA there is way too much body movement instead of just the head, as if the camera is moving in strong wind. EMO is a lot more human-like. In the very first video on the EMO page I still cannot see that it's a generated video; it's that real. The lip movement and the expressions are almost in perfect sync with the voice.
That is absolutely not the case with VASA.

fullstackchris wrote 4 days ago: lol, how does something like this get only 50ish votes but some hallucinating video slop generator from some of the other competitors gets thousands?

qwertox wrote 4 days ago: So an ugly person will be able to present his or her ideas on the same visual level as a beautiful person. Is this some sort of democratization?

nycdatasci wrote 4 days ago: "We have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations."

araes wrote 2 days ago: Translation: "We're attempting to preserve our moat, and this is the correct PR blurb. We'll release an API once we're far enough ahead and have extracted enough money." Like somebody on Ars noted, "anybody notice it's an election year?" You don't need to release an API; all online videos are now of suspect authenticity. Somebody make a video of Trump's or Biden's eyes following the mouse cursor around. Real videos turned into fake videos.

sitzkrieg wrote 3 days ago: Money will change that.

justinclift wrote 4 days ago: > until we are certain that the technology will be used responsibly ...
That's basically "never", so we'll see how long they hold out. Scammers are already using the existing voice/image/video generation apparently fairly successfully. :(

spacemanspiff01 wrote 3 days ago: Having a delay, where people can see what's coming down the pipe, does have value. In a year there may/will be an open source model. But knowing that this is possible is important. I'm fairly clued in, and am constantly surprised at how fast things are changing.

justinclift wrote 3 days ago: > But knowing that this is possible ...
Who knows this is possible? The average elderly person isn't going to know any time soon. The SV IT people probably will. It's not an even distribution of knowledge.
;/

ilaksh wrote 3 days ago: Eventually someone will implement one of these really good recent ones as open source, and then it will be on Replicate etc. Right now the open source ones like SadTalker and Video Retalking are not live and are unconvincing.

feyman_r wrote 4 days ago: /s it doesn't have the phrase LLM in the title

gavi wrote 4 days ago: The GPU requirements for realtime video generation are very minimal in the grand scheme of things. Assault on reality itself.

nojvek wrote 4 days ago: I like the considerations topic. There's likely also an unsaid statement: "This is for us only, and we'll be the only ones making money from it, with our definition of 'safety' and 'positive'."

mdrzn wrote 4 days ago: Holy shit, these are really high quality and basically in realtime on a 4090. What a time to be alive.

rbinv wrote 4 days ago: It really is something. 40 FPS on a 4090, damn.

acidburnNSA wrote 4 days ago: Oh no. "Cameras on please!" will be replaced by "AI generated faces off please!" in Teams.

nowhereai wrote 4 days ago: Woah. So far not in the news; this is the only article:
URI [1]: https://www.ai-gen.blog/2024/04/microsoft-vasa-1-ai-technology...