• NeoNachtwaechter@lemmy.world
    6 months ago

    No surprise, and this is going to happen to everybody who uses neural net models for production. You just don’t know where your data is, and therefore it is unbelievably hard to change data.

    So, if you have legal obligations to know where that data is, or to delete some of it, then you are deep in the mud.

    • erv_za@lemmy.world
      6 months ago

      I think of ChatGPT as a “text generator”, similar to how Dall-E is an “image generator”.
      If I were OpenAI, I would post a fictitious-person disclaimer at the bottom of the page and hold the user responsible for what the model does. Nobody holds Adobe responsible when someone uses Photoshop.

        • vithigar@lemmy.ca
          6 months ago

          LLMs don’t actually store any of their training data, though. And any data being held in context is easily accessible and can be wiped or edited to remove personal data as necessary.

          • NeoNachtwaechter@lemmy.world
            6 months ago

            “LLMs don’t actually store any of their training data,”

            Data protection law covers all kinds of data processing.

            For example, input is processing, too. Output is processing, too. See Article 4(2) of the GDPR, which defines “processing”.

            If you really want to rely on excuses, you would need wayyy better ones.

            • vithigar@lemmy.ca
              6 months ago

              Right, so keep personal data out of the training set and use it only in the easily readable and editable context. It’ll still “hallucinate” details about people if you ask it for details about people, but those people are fictitious.
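
To make the "editable context" idea concrete, here is a minimal sketch, assuming personal data is held only in an application-side store that gets pasted into the prompt at request time. `ContextStore`, `add`, `erase`, and `build_prompt` are hypothetical names made up for illustration, not part of any real LLM API, and the sketch only covers the stored context, not whatever the model makes up in its output.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical store: personal data lives here, not in the model weights."""

    records: dict[str, list[str]] = field(default_factory=dict)

    def add(self, subject: str, fact: str) -> None:
        # Keep facts keyed by the person they describe so they can be found later.
        self.records.setdefault(subject, []).append(fact)

    def erase(self, subject: str) -> int:
        # Drop everything held about one person; return how many facts were removed.
        return len(self.records.pop(subject, []))

    def build_prompt(self, question: str) -> str:
        # Only facts still in the store end up in the text sent to the model.
        lines = [
            f"{subject}: {fact}"
            for subject, facts in sorted(self.records.items())
            for fact in facts
        ]
        return "\n".join(["Known facts:", *lines, f"Question: {question}"])


store = ContextStore()
store.add("alice", "lives in Berlin")
store.add("bob", "likes chess")
store.erase("alice")  # handle a deletion request for one person
prompt = store.build_prompt("Who likes chess?")
```

After the `erase` call, nothing about "alice" is left in the store or in any prompt built from it. That is the part that is easy to read, edit, and wipe.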