Which of the following sounds more reasonable?

  • I shouldn’t have to pay for the content that I use to tune my LLM model and algorithm.

  • We shouldn’t have to pay for the content we use to train and teach an AI.

By calling it AI, the corporations are able to advocate for a position that’s blatantly pro corporate and anti writer/artist, and trick people into supporting it under the guise of a technological development.

  • aezart@lemmy.world
    link
    fedilink
    English
    arrow-up
    19
    arrow-down
    4
    ·
    1 year ago

    If an LLM was trained on a single page of GPL code or a single piece of CC-BY art, the entire set of model weights and any outputs from the model must be licensed the same way. Otherwise this whole thing is just blatant license laundering.

    • paperbenni@lemmy.world
      link
      fedilink
      English
      arrow-up
      8
      arrow-down
      2
      ·
      1 year ago

      This depends on how transformative the act of encoding the data in an LLM is. If you have overfitting out the ass and the model can recite its training material verbatim then it’s an illegal copy of the training material. If the model can only output content that would be considered transformative if a human with knowledge of the training data created it, then so is the model.