Google Researchers’ Attack Prompts ChatGPT to Reveal Its Training Data

stopthatgirl7 · 2 years ago

Kbin_space_program · edited 2 years ago

That’s a bald-faced lie.

And it can produce copyrighted works.
E.g., I can ask it what a Mindflayer is and it gives a verbatim description from copyrighted material.

I can ask Dall-E “Angua Von Uberwald” and it gives a drawing of a blonde female werewolf. Oops, that’s a copyrighted character.

@KingRandomGuy@lemmy.world · 2 years ago

I think what they mean is that ML models generally don’t directly store their training data, but instead use it to form a compressed latent space. Some elements of the training data may be perfectly recoverable from the latent space, but most won’t be. As a result, it’s not very surprising that you can get a model to reproduce some copyrighted material word for word.
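
To make that concrete, here is a toy sketch. It is not how ChatGPT works: it swaps the neural network and its latent space for a simple table of word counts, and the training sentences and the `train`/`generate` helpers are made up for illustration. The model only keeps statistics about which word tends to follow each pair of words, yet a sentence it saw many times comes back verbatim, while a sentence it saw once gets crowded out.

```python
from collections import Counter, defaultdict

def train(texts, order=2):
    """Count which word follows each `order`-word context (hypothetical helper)."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            counts[context][words[i + order]] += 1
    return counts

def generate(counts, prompt, max_words=20, order=2):
    """Greedy decoding: always append the most frequent next word."""
    words = prompt.split()
    while len(words) < max_words:
        context = tuple(words[-order:])
        if context not in counts:
            break  # the model has never seen this context
        words.append(counts[context].most_common(1)[0][0])
    return " ".join(words)

# Made-up training data: one sentence repeated five times, one seen only once.
memorized = "call me ishmael some years ago never mind how long"
rare = "call me maybe some other time"
model = train([memorized] * 5 + [rare])

# The model stores counts, not documents, but the frequent sentence
# is still reproduced word for word from a two-word prompt.
print(generate(model, "call me"))
```

With the prompt "call me", the repeated sentence is reproduced exactly and the rare one is not, even though neither is stored as a document.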

ayaya · 2 years ago

I think you are confused. How does any of that make what I said a lie?

TimeSquirrel · 2 years ago

I can do that too. It doesn’t mean I directly copied it from the source material. I can draw a crude picture of Mickey Mouse without having a reference in front of me. What’s the difference there?

Flying Squid · 2 years ago

If you have a crude picture of Mickey Mouse and you make money from it, Disney definitely has a chance at going after you.

brianorca · 2 years ago

That’s due to trademark, not copyright.