Which of the following sounds more reasonable?

  • I shouldn’t have to pay for the content that I use to tune my LLM model and algorithm.

  • We shouldn’t have to pay for the content we use to train and teach an AI.

By calling it AI, the corporations are able to advocate for a position that’s blatantly pro-corporate and anti-writer/artist, and trick people into supporting it under the guise of technological development.

  • Iceblade@lemmy.world
    English
    arrow-up
    17
    ·
    1 year ago

    IMO content created by either AI or LLMs should have a special license and be considered AI public domain (unless they can prove that they own all content the AI was trained on). Commercial content made based on content marked with this license would be subject to a flat % tax that should be applied to the product price which would be earmarked for a fund distributing to human creators (coders, writers, musicians etc.).

    • Trainguyrom@reddthat.com
      English
      arrow-up
      9
      ·
      1 year ago

      I think the cleaner (and most likely) outcome is that AI-generated work is considered public domain. Since public domain content can already be edited, combined and arranged to create copyrighted content, this would largely clear the path for creators to use AI more prominently in their workflows.

      • Iceblade@lemmy.world
        English
        arrow-up
        1
        ·
        1 year ago

        Honestly, I’d personally prefer the latter, but there is the argument made by artists, coders and content creators. Their work is being scraped to train these AIs, which in turn makes their future work less valuable. Hence the thought of enforcing a tiny “royalty”/tax on commercial products based on AI-generated content and funneling that money back to human creators of intellectual works.

      • makingStuffForFun@lemmy.ml
        English
        arrow-up
        2
        arrow-down
        1
        ·
        1 year ago

        So I can make derivative works from commercial works, make something from that material, then release the result as public domain? I would think not.

    • kklusz@lemmy.world
      English
      arrow-up
      4
      arrow-down
      1
      ·
      1 year ago

      What about LLM-generated content that was then edited by a human? Surely authors shouldn’t lose copyright over an entire book just because they enlisted the help of LLMs for the first draft.

      • Cethin@lemmy.zip
        English
        arrow-up
        0
        arrow-down
        1
        ·
        1 year ago

        If you take open source code under the GNU GPL and modify it, it retains the GNU GPL. It’s like saying it’s fine to take a book and just change some words and it’s totally not plagiarism.

  • pensivepangolin@lemmy.world
    English
    arrow-up
    16
    ·
    1 year ago

    I think it’s the same reason the CEOs of these corporations are clamoring about their own products being doomsday devices: it gives them massive power over crafting regulatory policy, thus letting them make sure it’s favorable to their business interests.

    Even more frustrating when you realize, and feel free to correct me if I’m wrong, these new “AI” programs and LLMs aren’t really novel in terms of theoretical approach: the real revolution is the amount of computing power and data to throw at them.

    • assassin_aragorn@lemmy.worldOP
      English
      arrow-up
      15
      ·
      1 year ago

      The funniest thing I’ve seen on this is the ChatGPT CEO, Altman, talking about how he’s a bit afraid of what they’ve created and how it needs limitations – and then when the EU begins to look at regulations, he immediately rejects the concept, to the point of threatening to leave the European market. It’s incredibly transparent what they’re doing.

      Unfortunately I don’t know enough about the technology to say if the algorithms and concepts themselves are novel, but without a doubt they couldn’t exist without modern computing power.

      • Peruvian_Skies@kbin.social
        arrow-up
        6
        arrow-down
        1
        ·
        1 year ago

        The concepts themselves are some 30 years old, but storage capacity and processing speed have only recently reached a point where generative AI outperforms competing solutions.

        But regarding the regulation thing, I don’t know what was said or proposed, and this is just me playing devil’s advocate: but could it be that the CEO simply doesn’t agree with the specifics of the proposed regulations while still believing that some other, different kind of regulation should exist?

        • rainh@kbin.social
          arrow-up
          8
          ·
          1 year ago

          Certainly could be, but probably an optimistic take. Most likely they’re just trying to do what corporations have been doing for ages, which is to weaponize government policy to prevent competition. They don’t want restrictions that will materially impact their product, they want restrictions that will materially impact startups to make it more difficult for them to intrude on the established space.

          • jumperalex@lemmy.world
            English
            arrow-up
            4
            ·
            1 year ago

            I think if you fed your response into ChatGPT and asked it to summarize in two words it would return,

            “Regulatory Capture”

    • ywein@lemmy.ml
      English
      arrow-up
      4
      ·
      1 year ago

      LLMs are pretty novel. They are made possible by the invention of the Transformer model, which operates significantly differently from, say, an RNN.

    • Phantom_Engineer@lemmy.ml
      English
      arrow-up
      3
      ·
      1 year ago

      The fear mongering is pretty ridiculous.

      “AI could DESTROY HUMANITY. It’s like the ATOMIC BOMB! Look at its RAW POWER!”

      AI generates an image of cats playing canasta.

      “By God…”

    • assassinatedbyCIA@lemmy.world
      English
      arrow-up
      3
      ·
      1 year ago

      It also plays into the hype cycle they’re trying to create. Saying you’ve made an AI is more likely to capture the attention of the masses than saying you have an LLM. Ditto for the existential doomerism that the CEOs have. Saying your tech is so powerful that it might lead to humanity’s extinction does wonders in building hype.

      • pensivepangolin@lemmy.world
        English
        arrow-up
        1
        ·
        1 year ago

        Agreed. And all you really need to do is browse any of the headlines from even respectable news outlets to see how well it’s working. It’s just article after article uncritically parroting whatever claims these CEOs make at face value at least 50% of the time. It’s mind-numbing.

    • RossoErcole@kbin.social
      arrow-up
      0
      arrow-down
      1
      ·
      1 year ago

      We could say that the human brain isn’t novel in terms of biological composition: the real evolution is the size increase compared to the body.

      The fact that insects exist doesn’t make us less intelligent.

      But I agree with the sentiment of the argument.

  • aezart@lemmy.world
    English
    arrow-up
    19
    arrow-down
    4
    ·
    1 year ago

    If an LLM was trained on a single page of GPL code or a single piece of CC-BY art, the entire set of model weights and any outputs from the model must be licensed the same way. Otherwise this whole thing is just blatant license laundering.

    • paperbenni@lemmy.world
      English
      arrow-up
      8
      arrow-down
      2
      ·
      1 year ago

      This depends on how transformative the act of encoding the data in an LLM is. If you have overfitting out the ass and the model can recite its training material verbatim then it’s an illegal copy of the training material. If the model can only output content that would be considered transformative if a human with knowledge of the training data created it, then so is the model.
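
      As a rough illustration of the "recites its training material verbatim" case, here is a minimal sketch that measures how much of a model's output appears word-for-word in a known training text. The function names (`ngrams`, `verbatim_overlap`), the window size, and the sample strings are all hypothetical choices made for this example. It is a crude heuristic, not a legal test for what counts as transformative.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(output_text, training_text, n=8):
    """Fraction of the output's n-grams that also occur verbatim in the training text.

    The window size n is an arbitrary assumption: larger values only flag
    longer word-for-word runs.
    """
    out = ngrams(output_text.split(), n)
    if not out:
        # Output shorter than n tokens: nothing to compare.
        return 0.0
    train = ngrams(training_text.split(), n)
    return len(out & train) / len(out)


# Hypothetical sample data, for illustration only.
training = "it was the best of times it was the worst of times it was the age of wisdom"
recited = "it was the best of times it was the worst of times"
paraphrase = "the era was at once wonderful and terrible for everyone involved in it"

print(verbatim_overlap(recited, training, n=5))     # every window is copied
print(verbatim_overlap(paraphrase, training, n=5))  # no window is copied
```

      A score near 1 suggests the output is mostly copied text, and a score near 0 suggests little word-for-word reuse. Where to draw the line between the two is exactly the open question in this thread.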

  • itsnotlupus@lemmy.world
    English
    arrow-up
    11
    arrow-down
    1
    ·
    1 year ago

    I’ll note that there are plenty of models out there that aren’t LLMs and that are also being trained on large datasets gathered from public sources.

    Image generation models, music generation models, etc.
    Heck, it doesn’t even need to be about generation. Music recognition and image recognition models can also be trained on the same sort of datasets, and arguably come with similar IP right questions.

    It’s definitely a broader topic than just LLMs, and attempting to exhaustively enumerate the flavors of AIs/models/whatever that should be part of this discussion is fairly futile given the fast-evolving nature of the field.

  • Chocrates@lemmy.world
    English
    arrow-up
    9
    ·
    1 year ago

    Both sound the same to me. Private companies are scraping ostensibly public data to sell it. No matter how you word it, they are trying to monetize stuff that is out in the open.

    • Dran@lemmy.world
      English
      arrow-up
      4
      arrow-down
      1
      ·
      edit-2
      1 year ago

      I don’t see why a single human should be able to profit off learning from others but a group of humans doing it for a company cannot. This is just how humanity advances at whatever scale.

      • Chocrates@lemmy.world
        English
        arrow-up
        5
        ·
        1 year ago

        I had a comment about the morality of it at first but I pulled it out. This is not an easy question to answer. Corporations gatekeeping knowledge seems weird and dystopian, but the knowledge is out there and they are just making connections between it. It also touches on copyright and fair use.

        • Dran@lemmy.world
          English
          arrow-up
          2
          arrow-down
          1
          ·
          1 year ago

          I agree it’s a much more complicated issue than most people give it credit for.

  • Zeth0s@lemmy.world
    English
    arrow-up
    4
    arrow-down
    1
    ·
    1 year ago

    That’s absolutely not correct. AI is a field of computer science/scientific computing built on the idea that some capabilities of biological intelligences could be simulated or even reproduced “in silicon”, i.e. by using computers.

    Nowadays it is an extremely broad term that covers a lot of computational methodologies. LLMs in particular are an evolution of methods born to simulate and act as human neural networks. Nowadays they work very differently, but they still provide great insights into how an “artificial” intelligence can be built. They are only one small corner of what will be a real general artificial intelligence, and a small step in that direction.

    AI as a name is absolutely unrelated to how programs based on the methodologies are built.

    Human intelligences are in charge of the whole copyright part. AI and copyright are orthogonal; it is people who cannot tell the two apart and keep talking about AI.

    There is AI, and there is copyright. It is time for all of us to properly frame the discussion as a “copyright discussion related to <company>'s product”.

    • assassin_aragorn@lemmy.worldOP
      English
      arrow-up
      2
      arrow-down
      1
      ·
      1 year ago

      What I’m getting at more so is that comparisons to humans for purposes of copyright law (e.g. likening it to students learning in school or reading library books) don’t hold water just because it’s called an AI. I don’t see that as an actual defence for these companies, and it seems to be somewhat prevalent.

      • Zeth0s@lemmy.world
        English
        arrow-up
        1
        arrow-down
        1
        ·
        edit-2
        1 year ago

        You can absolutely compare AI with students. The problem is that, legally, in many western countries students still have to pay copyright holders of the books they use to learn.

        It is purely a copyright discussion. How far does copyright apply? Should the law distinguish between human learning and machine learning? Can we retroactively change the copyright of material available online?

        For instance, Copilot is more at risk than an LLM that learned from 4chan, because licenses are clearer there. The problem is that we have no idea what data the big LLMs were trained on, so we can’t know whether some copyright law already applies.

        In the end it is just a legal dispute over companies making money out of AI trained on publicly available (but not necessarily copyright-free) data.

        • assassin_aragorn@lemmy.worldOP
          English
          arrow-up
          1
          ·
          1 year ago

          My argument is that an LLM here is reading the content for different reasons than a student would. The LLM uses it to generate text and answer user queries, for cash. The student uses it to learn their field of study, and then apply it to make money. The difference is that the student internalizes the concepts, while the LLM internalizes the text. If you used a different book that covered the same content, the LLM would generate different output, but the student would learn the same thing.

          I know it’s splitting hairs, but I think it’s an important point to consider.

          My take is that an LLM algorithm can’t freely consume any copyrighted work, even if it’s been reproduced online with the consent of the author. The company would need the permission of the author for the express purpose of training the AI. If there’s a copyright, it should apply.

          You have me thinking though about the student comparison. College students pay to attend lectures on material that can be found online or in their textbooks. Wouldn’t paying for any copyrighted material be analogous to this?

          • Zeth0s@lemmy.world
            English
            arrow-up
            0
            arrow-down
            1
            ·
            edit-2
            1 year ago

            Students and LLMs do the same thing with data, just in different ways. An LLM can learn more data; a student can understand more concepts, logic and context.

            And students study to make money.

            Both LLMs and students map the data into some internal representation. Those representations are pretty different, however, because a biological mind is different from an AI.

            Regarding your last paragraph, this is exactly the point. What should OpenAI and Microsoft pay, given that they are making a lot of money out of other people’s work? Currently it is unclear, as OpenAI hasn’t released what data they used, and because copyright laws do not cover generative AI. We need to wait for interpretations of existing laws and for new ones. But it will surely change in the near future.

  • QHC@lemmy.world
    English
    arrow-up
    2
    ·
    1 year ago

    I think you are likely right, but it’s more general than just about training costs. The term “AI” carries a ton of baggage, both good and bad.

    To some extent, I think we also keep pushing back the boundary of what we consider “intelligence” as we learn more and better understand what we’re creating. I wonder if every future tech generation will continue this cycle until/unless humanity actually does create a general artificial intelligence–every iteration getting slightly closer but still falling short of “true” AI, then being looked at as a disappointment and not worthy of the term anymore. Rinse and repeat.

  • Dodecahedron December@sh.itjust.works
    English
    arrow-up
    2
    ·
    1 year ago

    In fairness, AI is a buzzword that came out well before LLMs. It’s used to mean “tHe cOmpUtER cAn tHink!”. We play against “AI” in games all the time, but they aren’t AI as we know it today.

    ML (machine learning) is a more accurate descriptor but blah doesn’t have the same pizzazz as AI does.

    The larger issue is that innovation is sometimes done for innovation’s sake. Profit gets mixed up in there, a board has to show profits to shareholders, and then you get VCs trying to “productize” and monetize everything.

    What’s more is there are only a handful of players in the AI space, but because they are giving API access to other companies, those companies are building more and more sketchy uses of that tech.

    It wouldn’t be a huge deal if LLMs trained on copyrighted material and then gave the service away for free. As it stands, some LLMs are churning out work that could be protected under copyright law by humans (AI work can’t be copyrighted under US law), and turning a profit.

    I don’t think “it was AI” will hold up in court though. May need to do some more innovation.

    Also, there are some LLMs being trained on public domain info, to avoid copyright problems. But works go into the public domain 70 years after the copyright holder’s death (Disney being the biggest extender of that rule), so your AI will be a tad outdated in its “knowledge”.

  • BURN@lemmy.world
    English
    arrow-up
    2
    ·
    1 year ago

    AI has been a blanket term for Machine Learning, LLMs, Decision Trees and every other form of “intelligence”.

    Unfortunately I think that genie is out of the bottle and it’s never going back in.

    • stewsters@lemmy.world
      English
      arrow-up
      1
      ·
      1 year ago

      And it has been the technical term used in academia since the 1950s. If anyone is surprised by this usage then they have not studied it, only watched movies.

  • bioemerl@kbin.social
    arrow-up
    1
    ·
    1 year ago

    Both of those statements are reasonable. You shouldn’t have to pay to utilize anything you scrape from the internet, so long as you don’t violate copyright by redistributing it.

  • DrQuint@lemmy.world
    English
    arrow-up
    1
    ·
    1 year ago

    If we’re unmasking tech, LLMs right now are also all just Computer Vision models with a lot more abstraction layers thrown at them. Nothing but fit-assessment machines with a ludicrous number of extra steps.

    I am convinced this is all pedantry, and these models are going to become the de facto basis for true AI at some point. It was already weird enough that this type of tech got discovered from the goal of checking if an image has a cat or not.

    • Geek_King@lemmy.world
      English
      arrow-up
      2
      ·
      edit-2
      1 year ago

      Automated Teller Machine Machine, Personal Identification Number Number, Network Interface Card Card

      This has been a problem for as long as acronyms have existed (and yes it bothers me too).

  • Greenskye@lemmy.world
    English
    arrow-up
    0
    ·
    1 year ago

    We shouldn’t have to pay for the content we use to train and teach an AI

    Wait, people think that sounds reasonable?

  • lolpostslol@kbin.social
    arrow-up
    0
    arrow-down
    1
    ·
    1 year ago

    It’s just a happy coincidence for them. They call it AI because calling it “a search engine that steals stuff instead of linking to it and blends different sources together to look smarter” wouldn’t be as interesting to clueless financial markets people.