• givesomefucks@lemmy.world
    1 year ago

    In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

    Both filings make a broader case against AI, claiming that by definition, the models are a risk to the Copyright Act because they are trained on huge datasets that contain potentially copyrighted information.

    They’ve got a point.

    If you ask AI to summarize something, it needs to know what it’s summarizing. Reading other summaries might be legal, but then why not just read those summaries first?

    If the AI “reads” the work first, then it would have needed to pay for it. And how do you deal with that? Is a chatbot treated like one user? Or does it need to pay for a copy for each human that asks for a summary?

    I think if they had paid for a single ebook library subscription, they’d be fine. However, the article says they used pirate libraries so it could read anything on the fly.

    Pointing an AI at pirated media is going to be hard to defend in court. And a class action full of authors and celebrities isn’t going to be a cakewalk. They’ve got a lot of money to fight with, and plenty of contacts in copyright law. I’m sure all the publishers are pissed too.

    Everyone is going after AI money these days, but this seems like the rare case where it’s justified.

    • limeaide@lemmy.ml
      1 year ago

      Can the sources where ChatGPT got its information from be traced? What if it got the information from other summaries?

      I think the hardest thing for these companies will be validating the information their AI is using. I can see an encyclopedia-like industry popping up over the next couple of years.

      Btw, I know very little about this topic, but I find it fascinating.

      • rainroar@lemmy.ml
        1 year ago

        Yes! They publish the data sources and where they got everything from. Diffusion models (Stable Diffusion, Midjourney, etc.) and GPT both use tons of data that was taken in ways that likely violate that data’s usage agreement.

        Imo they deserve whatever lawsuits they have coming.

        • radarsat1@lemmy.ml
          1 year ago

          likely violate that data’s usage agreement.

          It doesn’t seem to be too common for books to include specific clauses or EULAs that prohibit their use as data in machine learning systems. I’m curious if there are really any aspects that cover this without it being explicitly mentioned. I guess we’ll find out.