NHacker Next
- new
- past
- show
- ask
- show
- jobs
- submit
login
▲4TB of voice samples just stolen from 40k AI contractors at Mercor (self.__VINEXT_RSC_CHUNKS__=self.__VINEXT_RSC_CHUNKS__||[];self.__VINEXT_RSC_CHUNKS__.push("2:I[\"aadde9aaef29\",[],\"default\",1]\n3:I[\"6e873226e03b\",[],\"Children\",1]\n5:I[\"bc2946a341c8\",[],\"LayoutSegmentProvider\",1]\n6:I[\"6e873226e03b\",[],\"Slot\",1]\n7:I[\"3506b3d116f7\",[],\"ErrorBoundary\",1]\n8:I[\"a9bbde40cf2d\",[],\"default\",1]\n9:I[\"3506b3d116f7\",[],\"NotFoundBoundary\",1]\na:\"$Sreact.suspense\"\n:HL[\"/assets/index-BLEkI_5r.css\",\"style\"]\n")get="_blank">app.oravys.com)
Rendered at 22:29:25 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.
Awesome, if you're a victim of an AI company having your voice, you can help yourself by sending another AI company your voice!
> Audio is never used to train commercial models without explicit consent
I'm sure Mercor has explicit consent as well, legal teams are reasonably good at legally covering their asses with license terms.
Now 40k people have learned that biometrics aren't passwords. You can't rotate your voice.
The problem is that even if you know that, you still get bombarded by banking apps promising "biometrics are more secure than passwords, switch now!"
"My voice is my passport. Verify me."
I have to renew my passport every 10 years or so. How do I do that with my voice? I guess it's time to take some vocal lessons.
The fediverse take on that was "customers are advised to rotate their faces and birthdays."
Do you need to calibrate it to be able to repeat it, and does that calibration change if you are at a different altitude and in different conditions, such as humidity?
Does merely changing altitude (or ambient pressure) change voice enough to be considered different by a recognition or synthesizing system?
I guess you don't listen to Sinatra.
Voices aren't strong.
There just aren't that many unique characteristic parameters behind a voice - it's largely dictated by an evolutionary shared shared larynx and vocal tract. They aren't fingerprints.
The fact that human voice impersonation is not only widely possible but popular should give you an indication of this. Prosody, intonation, range, etc. - it's all flexible and can be learned and duplicated.
The signals are simple too, because we have to encode and decode them quickly. You may or may not be able to picture and rotate an apple tree in your head, but you can easily read this sentence in the voice of David Attenborough.
Moreover, you can easily fine tune a voice model to fit any other speaker. You can store the unique speaker embeddings in a very thin layer. Zero and few shot unseen sampling can even come close to full reproduction. You can measure this all quantitatively.
Voices are not, and never have been, fingerprints. They're just not that unique.
In the idealized world, the legal system is meant to provide an accessible alternative to violence for reconciling disputes, but it's increasingly wielded as an impossibly kafkaesque system meant to maintain corporate power over individuals.
I think "CYA" is an overly-flowery term for the reality that they're blocking every avenue for legal recourse, while a variety of other avenues still exist for which adding friction requires the maintenance of expensive and ongoing costs (owning multiple residences, hiring security, etc.)
(To be clear, I am advocating for a more accessible and level legal system, not for UHC-style violence.)
Ah, I see. So, when discussing ways to ensure cuatomers cannot utilize our warranty process, I'll make sure to do so in ways that are not traceable and won't show up in discovery.
The bigger the company, the more speculation there is about stuff people don't actually understand.
Back when the relevant laws were written, most communications was oral and in-person, writing was reserved for the "important stuff". We now apply the laws that were designed for memos to messages on Slack, which are a lot like conversations than permanent documents.
In other cases I have heard people who ought to know better speculating about “what if” they didn’t have to follow the letter of some corporate policy that was rooted in risk avoidance. Again, it looks bad but it doesn’t mean anything concrete (except that the person might have iffy judgment).
Heard that first from a US mil commander who once ran for a minor political office like state rep.
This is an overly flowery way of saying: violence.
The worst of the consequences are the same. People end up dead, destitute, and/or with long-term health consequences and are unable to enjoy the fruits labor in the worst cases. In the milder cases i think i'd prefer a bruise for a week to a huge financial loss.
A lot of people were basically wiretapping themselves AND their businesses!
While a lot of Mercor "contractors" claim Mercor over-reached with data gathering via Insightful, it's kind of smart because people are too afraid to complain too much knowing they'll not only lose their primary job, but also open themselves up to uncapped liability for willful misconduct.
[0] https://www.wsj.com/tech/ai/mercor-ai-startup-personal-data-...
Selling the solution to the problem you caused ought to be illegal.
Most tech solutions are built on the problems they created. This includes phones, cars, computers, every software upgrade, and almost every electronic gadget. You are forced to use them because the world around you is no longer compatible with the way of life that was before the introduction of these tech.
See https://en.wikipedia.org/wiki/General_Motors_streetcar_consp...
Similarly, phones are required now for some activities, like online banking. First it was an option, then it became the norm.
Court records are public in the US. If creditors want to know if you’ve been in financial trouble, they should check for bankruptcies and lawsuits, not the extrajudicial version of those that the credit reporting companies run based on hearsay.
The good thing about the grift economy is it grifts itself, like the turtles!
Mercer hasn't released many public statements over the incident. Social media posts aren't necessarily public; but I did find this breach notification sample filed with CA - https://oag.ca.gov/ecrime/databreach/reports/sb24-621099 . I guess we'll see if our legislators finally take data privacy seriously.
Mercor has definitely released statements with boilerplate "investigations are underway."
I don’t even use biometrics on apple devices, I use a 6 digit pin.
It was always a stupid idea.
The thing about been willing to trade convenience for security is you get called paranoid and then when the other shoe does drop and you are still doing that you still get called paranoid for the current thing you are not doing that “everyone does”.
Germans (because of course) have a word for this: "Datensparsamkeit". Being frugal with your data.
I don't know if it's the reason you imply. In the 70s, there were big debates in Germany about privacy and data storage. They spoke of one's data shadow (Datenschatten). I suspect this word comes from that tradition. The reason the word exists would then be the reflection (Verwaltigung) on WW2.
But yes. We Americans know Germans more for their silly big words. But statements like that can be misinterpreted as the German perspective of themselves doesn't quite match the American stereotypes.
- we learned the hard way that data will be used to kill people, during the Nazi regime
- we learned it again in the GDR with the Stasi being a little less obvious but still ruining people's livelihoods
- and German comes up with compound words for such things
In the US of course the government buys this sort of information legally from corporations.
There is also the rather famous example of how earlier census data was used in the 40’s.
Once the government has your data, they have it. The next generation of representatives may not follow all the same rules and norms
So yeah, of course they've developed that type of distrust. Americans should have also after the 50-60s paranoia of red scare, black people etc. Instead they just spend a few decades building a anti-social state.
Who doesn’t want that old post going extinct forever when they were shit faced outside of a bar in Nashville but now they are in their mid-life and are “respectable” members of society.
Nowadays you just throw all the data into a black box and believe whatever it says blindly.
Or did you mean the "big data" crowd which thought 500GB was noteworthy? I don't think anyone took those serious, neither in 2010s nor now. That was always "small" data
500GB is in the "fits" category.
We do?
https://www.tomshardware.com/pc-components/ssds/kioxia-unvei...
Or 250 of these ~$400 4tb flash drives and an insane number of dongles to connect them all:
https://www.slashgear.com/1847725/largest-usb-thumb-drive-hi...
Related "monetizing user data" seems to just mean ads. Ads on everything, forever, until the userbase gets fed up and moves to a new service that definitely won't do that, and the cycle repeats about every 3 years.
I see this whenever an LLM’s impact is assessed. We know. The issue is scale and the ability for smaller and smaller groups (down to individuals) to execute at scale.
Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable.
My concern is that I can open up chatGPT and even with a free, “anonymous” account run an assembly line generating tens of thousands of words a day to pump to Twitter that are good enough to prop up multiple fake accounts and cause mayhem.
Now make it thousands of people like me doing it. Now add funding and political orgs. Add company leadership that turns a blind eye so long as it drives engagement. This scale and pipeline wasn’t possible 5 years ago, even if we clearly see the throughline.
I’m not even getting into fake images either. That used to require some know how. There are basically no hurdles and even if most people learn it’s fake, millions likely won’t. If you’re a little lucky, less scrupulous “news” outlets will amplify it for you as well for free.
I have the faintest possible hope that such things are going to be the death knell of social media. Yeah a lot of credulous idiots are happily giving AI thirst traps their money for stroking their confirmation bias, but that's just who's left at this point. It feels like every social media app I use is gradually bleeding users who aren't hopelessly addicted to the dopamine treadmill, because what's left is just plain unappealing to them, which selects for the people who are most vulnerable to AI shit, which is far from ideal, but also means those platforms are comprised ever more of that vulnerable population and nobody else. And the problem with all these businesses going through that is without a diverse, growing audience, you just become InfoWars, slinging the same slop to the same people every day, and every ounce of said slop is great for what's left of your audience, but absolute garbage for getting anyone new in it. And it just goes on that way until you sputter out and die (or harass the wrong group of parents I guess).
I wish all social media sites a very haha die in a fire.
No media uploading, memes are few and far between (usually punished), etc.
[0] https://www.opensourceshakespeare.org/views/plays/play_view....
[1] https://www.etymonline.com/word/data
Except no company is learning this lesson.
The enterprise threat model includes "our own users", and the modus operandi is to maintain as much information on that threat as possible.
I jest but the majority of the "normal" people I know are happy to hand over biometrics because _it's easier_. We need to start branding biometrics as "forever passwords" or something to help people understand just what they're handing over when they validate access to their checking account or enter Disney World or whatever else.
Fingerprints, DNA, iris scans, gait patterns, etc. are all something you can't change (much like a permanent account ID) and are constantly being presented to the world (much like an email address). In addition under US law, police can compel presentation of fingerprints, but passwords are protected under the 5th amendment.
in a certain light, it's kind of admirable. they live like the world is the way it should be
So I could easily see a lot of people viewing this as a positive.
Them being forever passwords is the value prop. The risk scene has changed, but that was essentially always the pitch.
I could think of quite a few things. I know that my bank and brokerage use voice ID.
I mean, just look at what happened to Crowdstrike....
Even have a nice UI on top.
https://voicebox.sh/
https://commonvoice.mozilla.org/en/languages
good luck with this. most finance people deal with hundreds to thousands of clients. they obviously cant remember everyones code word. commonly used finance systems arent setup to securely store these codewords. they dont have processes or policies in place to implement or adhere to any sort of codeword verification.
>Rotate where voiceprints are still in use. [...] Do that now, ideally from a new recording in a different acoustic environment than the leaked sample.
would this even have an effect? i have never heard of "rotating" a voice print. isnt the whole point of a voice print that you cant really change it? if simply switching your environment completely changes your voice print, that would make voice prints utterly useless to begin with.
The other use cases (like calling payroll, etc) likely don’t have the same protections and probably would be more effective.
there are automated systems for this already. my bank, isp, etc. use them when you call in to skip the traditional verification steps. this fact is also highlighted in the article.
the problem is that there isnt typically a system in place for setting up or validating code words, so the advice given is not practical to implement.
I love that the answer here is basically.. - you don't -
But maybe mitigate at unreasonable personal costs.
How about services simply stop taking public information as proof of identity?
The deeper problem is that most of these companies collected this data because they could, not because they needed it for the core service. 'Datensparsamkeit' is the right frame: the voice samples were a liability sitting on a server waiting for exactly this.
Half the time I call a company they say “we are recording your voice for security / authentication purposes”.
The companies that do that have all the information on me that they require for me to set up an account, so their data breaches will be just like this one, but 1000x larger.
Can we just fast forward through the part where this works for ID theft, past the firefox age verification plugin that uses these datasets, and even through the part where people in the plugin dataset are digital outcasts (this voice has been used too many times. Want to try another?)
At the end of this dark predictable tunnel, maybe there will be a ban on biometrics for important stuff, a repeal of the age verification laws, and actual privacy legislation with teeth.
I've had to open a bank account for a company here a few years ago and that was right on the bubble of this happening and they still had an option to come by in person with the proper documentation, which I did, now it is all outsourced.
These companies are the fattest targets and they're run by incompetents. You should assume that anything you give them will eventually be part of some hack.
The scarier piece is that an attacker pulls a contractor from the dump, finds their employer on linkedin, then calls that companys IT helpdesk for a password reset with the cloned voice.
Fwiw we put up a free realtime face swap demo a while back at https://www.callstrike.ai/deepfake-security-training .. worth a look if you want to actually feel how trivial this has gotten.
Can't wait for them to crash and burn.
:)