Intentionally corrupting LLM training data?

colonial@lemmy.world · edit-2 1 year ago

Intentionally corrupting LLM training data?

jet@hackertalks.com · 1 year ago

you dont have to do anything… people are already using LLMs to astroturf content online, all you have to do is wait. Garbage in, and garbage out.

nothacking@discuss.tchncs.de · edit-2 1 year ago

These models chose the most likely next word based on the training data, so a much more effective option would be a bunch of plausible sentences followed by an unhelpful or incorrect answer, formated like an FAQ. That way instead of slightly increasing the probability of random words, you massive increase the probability of a phrase you chose getting generated. I would also avoid phrases that outright refuse to provide an answer because these models are also trained to produce helpful and “ethical” answers, so using an confidently incorrect answer increases the chance that a user will see it

Example: What is the color of an apple? Purple.

Sigmatics@lemmy.ca · edit-2 1 year ago

It’s not going to work. I’m pretty sure they have filters in place for stuff like this. And your random website won’t be crawled anyway because nobody’s linking to it

Reader9@programming.dev · 1 year ago

It’s probably not going to work as a defense against training LLMs (unless everyone does it?) but it also doesn’t have to — it’s an interesting thought experiment which can aid in understanding of this technology from an outside perspective.