I was fascinated to read this in the FT yesterday:
The dispute is a pretty straightforward one. The EU wants AI-produced documents to reference their sources so that the copyright of original producers and writers of research is not abused. No AI programs come close to doing so, at least in the way the EU desires.
I admit that copyright is a contentious issue. I have major reservations over a great deal of copyright law, especially about the length of time for which much copyright protection is deemed to exist. I think the balance of legal power about copyright has shifted against the public interest. But that said, it is important that those who write for a living and whose work is used by others do have a right to make a copyright claim if that is appropriate, or at the very least, to be cited if that is required.
Overall, then, I am on the side of the EU in this potential dispute. Just because a machine is plagiarising somebody else's work does not stop the activity from being described as plagiarism, and that fact matters.
Our inability to ascribe what is right or wrong to technology is as big a threat to us as carbon.
In fact, look: if a human being has created the AI, then as far as I am concerned, the creator is the plagiarist. This idea that technology is ‘neutral’ is a lie.
The god in the machine here on earth is man. That should stand as a warning.
It’s very obvious how right-wing commentators pepper their analyses with assertions, claims and the like, and shy away when someone asks for a source.
Even on things where there is little evidence either way – that banning takeaways fights obesity, that planting trees fights climate change, that inequality leads to harm, and all the rest – say ‘citation please’ and the right-winger is gone like the night-time sun.
Using ChatGPT, I built (I did little writing per se) a 12-page outline proposal (which included 7 annexes) for a large-scale residential rebuild (using straw) in Ukraine. It took me 20 minutes. Most of the annexes were lists of institutions (question to ChatGPT: name the main universities that have undertaken work/research in the area of straw house construction, etc). Much of the doc was lists (or bullet points). Did it use other docs? Possibly. Passed the whole thing to the Commission/Council. Dead silence – gosh, surely not!
I agree that in principle sources should be cited. Particularly when answering questions such as “Produce an outline proposal for Electricity Market Reform in the EU” – why bother watching comedy on TV when one has docs like this to read? Tears of laughter. But yes, it would help to have sources, if only to confirm suspicions as to where ChatGPT obtained its wall-to-wall imbecility. Academe is somewhat concerned re students and essays, and quite rightly so.
Point? AI can be very useful to kick something off (e.g. the straw house proposal) and save a lot of time. Cite sources? Yes – because that is useful (particularly in tech docs). I also agree with your comments about copyright shifting against the public interest.
I am using ChatGPT occasionally.
I use Grammarly, which appears to be AI powered.
I think AI is useful.
But it cannot be an excuse to trample rights.
Sources are critical. Often the unintended consequences of a law defeat its purposes. This might be said of some copyright law, as you suggest Richard; but what I like about this introduction of law is that it will only be adequately challenged in court. AI would therefore have to reveal its sources, and how it works, in open court.
At the moment AI is operating under the principles of Permissionless Innovation, and as Big Tech proved with the digital revolution, this allowed them to take over the world, and all our data, for nothing.
You are right that no AI comes close to being able to cite sources, and I suspect that the current way we build them makes that functionally impossible. I have seen two main kinds of “AI” language model: ones that generate text from a large corpus of scraped data, and ones built into search engines. Those that draw from search engines can list their sources easily, because the “knowledge” being manipulated is just web search results, reformatted.
But the former, which have massive datasets, I do not think could ever adequately state their sources, because they are not fact models, they are just language models. My understanding is that the only way they can reproduce consistent facts is if they were trained on enough text passages stating a relationship between certain words that it becomes statistically probable for that relationship to appear in the output. This is why such a model can accurately tell you the sky is blue, but starts generating nonsensical, invalid output if you ask it to plot a route between points or construct an obscure recipe. It would be impossible to list all the sources that say the sky is blue, because there would need to be so many of them before the model would ever produce that as consistent output; and it could never list the sources for garbage output, because no single source ever made those claims.
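To make the point concrete, here is a toy next-word model, purely my own sketch: a bigram table, nothing like the neural networks behind ChatGPT in scale or mechanism, but with the same citation problem. Once training has reduced the corpus to co-occurrence statistics, there is no record of which document contributed what, so there is nothing left to cite.

```python
# Toy bigram "language model" - an illustration only. Real systems like
# ChatGPT use neural networks, not lookup tables, but the citation problem
# is the same: training keeps blended word statistics, not documents.
import random
from collections import defaultdict

corpus = [
    "the sky is blue",        # imagine millions of sentences like these
    "the sky is blue today",
    "the sea is blue",
]

follower_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follower_counts[current][nxt] += 1   # only co-occurrence counts survive

def generate(start, max_words=5):
    """Emit a statistically probable continuation of `start`."""
    out = [start]
    for _ in range(max_words):
        followers = follower_counts.get(out[-1])
        if not followers:
            break
        nxt = random.choices(list(followers),
                             weights=list(followers.values()))[0]
        out.append(nxt)
    return " ".join(out)

print(generate("the"))   # e.g. "the sky is blue today"
# follower_counts cannot say WHICH sentence produced any given output:
# the sources were dissolved into aggregate statistics at training time.
```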
If we want “AI” to accurately cite sources we will need a completely new approach to how models are built in the first place, because what we have right now simply cannot work that way.
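For contrast, an equally toy sketch of the search-grounded kind of system mentioned above, where citation is easy precisely because the source URL travels with each retrieved fragment. Every name and URL here is invented for illustration; this is not a real search API.

```python
# Hypothetical sketch of a search-grounded answerer. The SearchResult type,
# the search() stub and the URLs are all invented for illustration.
from dataclasses import dataclass

@dataclass
class SearchResult:
    url: str
    snippet: str

def search(query: str) -> list[SearchResult]:
    # Stand-in for a real web search call.
    return [
        SearchResult("https://example.org/a", "Straw-bale walls insulate well."),
        SearchResult("https://example.org/b", "Straw building dates to the 1800s."),
    ]

def answer_with_citations(query: str) -> str:
    results = search(query)
    # The source URL stays attached to each fragment, so citing is trivial.
    return "\n".join(f"{r.snippet} [{r.url}]" for r in results)

print(answer_with_citations("straw house construction"))
```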
I think this is the biggest risk of AI – not that it substitutes for humans, but that, because humans haven’t specifically instructed it what to do, those humans try to evade responsibility for the material produced. AI needs to be recognised as essentially a tool, with the person operating that tool taking legal responsibility for what is done with it.
In terms of documents I see a sort of similarity with the advent of word processors. Just because you can easily copy-and-paste in Microsoft Word doesn’t absolve you from the same responsibility for avoiding plagiarism you had when writing longhand in pen and ink. Someone using AI to write for them knows the computer has access to a huge dataset of text to copy-and-paste from, most of which the human won’t have ever read, but that doesn’t stop it being plagiarism.
I see the bigger problem being with AI decision-making, but it is essentially the same issue of defining who takes legal responsibility. We are all familiar with the computer-says-no frustration, but at the moment all the computer is doing is applying a corporately approved stress test to (say) whether an applicant is in the financial position to take on a mortgage without risk of default. The same company could instead feed an AI system not with instructions but with records of thousands of previous clients’ positions and whether they ultimately maintained their payments – but the fact that the company couldn’t then identify exactly what the stress test parameters were doesn’t stop it being responsible for applying them.
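A hedged sketch of that contrast, with all figures invented and scikit-learn standing in for whatever a lender might actually deploy: the rule-based test has parameters the company can point to, while the trained model’s decision rule is learned from past outcomes and written down nowhere, yet the company still chose the data and deployed the model.

```python
# All figures invented; scikit-learn is an assumption, used only to
# illustrate the rule-based vs learned-from-records contrast.
from sklearn.linear_model import LogisticRegression

# Rule-based stress test: the parameters are explicit and on the record.
def rule_based_approval(income: float, loan: float) -> bool:
    STRESS_RATE = 0.07       # corporately approved stress interest rate
    AFFORDABILITY = 0.35     # max share of income spent servicing the loan
    return loan * STRESS_RATE <= income * AFFORDABILITY

# "AI" version: fit a model to past clients' outcomes instead of writing rules.
past_clients = [[45, 150],   # [income, loan] in thousands, per past applicant
                [30, 200],
                [80, 180]]
repaid = [1, 0, 1]           # did each client maintain their payments?

model = LogisticRegression().fit(past_clients, repaid)

print(rule_based_approval(50, 160))          # auditable decision
print(model.predict([[50, 160]])[0])         # learned decision, opaque rule
# The lender cannot point to the exact "stress test" the model applies,
# but it chose the training data and deployed the model all the same.
```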
If I produce art or music that has been inspired by what I’ve seen in an art gallery or heard on the radio, then my derived work is considered original, and is not plagiarised. Musicians often produce work in the style of someone else.
My understanding is that AI does the same. It “looks at” original work, “understands” how it is constructed, and applies those rules to new original work.
If the EU is interested in new laws, then there must be money involved. That’s why copyright laws have been extended over the years, and I would guess that the big publishers are concerned. The rest of us are just fodder.
See another comment just made
It also occurs to me that if AI can produce work that can be copyrighted, these artificial systems will be able to create work at a vast rate.
Then if you produce your own original work, a publisher will easily be able to claim that you plagiarised an obscure AI-generated piece and that you owe them money. That also does not seem right.
It seems very unlikely that AI can produce copyrightable work.
This is a problem with “patent trolling”: legal entities, with no intent to do anything truly creative, generate or acquire the rights to loads of really generic patents, and then go after people who may have inadvertently infringed them. Right now it is driven by humans. I think this problem will only get worse if we let AI-generated patents become a thing, as they could start to pre-empt inventions through mass-produced generic patents.
The UK government recently ran a consultation on this:
https://www.gov.uk/government/consultations/artificial-intelligence-and-intellectual-property-call-for-views/government-response-to-call-for-views-on-artificial-intelligence-and-intellectual-property#executive-summary
I think this finding is key – “consensus that AI itself should not own intellectual property rights. But there were different opinions on whether works or inventions created by AI should be protected”.
My personal belief is that because AI often depends on prior work to train and build models, any AI driven creations that don’t involve substantial human interaction shouldn’t be copyrightable nor patentable.
For now I don’t think what we call AI is yet at the stage where it could produce work meaningful enough to be worthy of intellectual property rights, but ultimately it’s up to government what path they choose going forward, and it’s worth keeping an eye on.
Patents are not secured as easily as you imply
I wonder.
Given that ChatGPT doesn’t retain copies of any of the information used to train it (see https://ea.rna.nl/2023/05/11/where-are-gpt-and-friends-going/), it’s hard to see how unlicensed copying or plagiarism (two different things) could be involved.
ChatGPT makes things up. That’s all.
It’s different from a child claiming that the dog ate its homework in only one way: ChatGPT hopes you’ll accept what it says, but whereas the child would *think* the claim sounds plausible, ChatGPT has no concept of plausibility. It has no concept of anything.
This is why it’s controversial – we, as a race, are anthropomorphising a piece of software. Because ChatGPT’s outputs read and sound plausible, we go one step further and assume (a) that it has applied some measure of plausibility to what it says, (b) that it does its best to close the gap between plausibility and correctness, and (c) that it would suffer some form of shame and seek to correct itself once it realised the gap was too great for the person reading its responses.
None of that happens. ChatGPT applies *exactly* as much judgement to what it spews out as a rock does to the ground it has settled on.
The problem we are grappling with is not the manner of how it’s consuming information, but the credibility we give to its responses. If there’s something for the EU to fix, I think it’s that.
This is not because I don’t care about those who are concerned about what ChatGPT will do to their livelihoods, but because, if we take a copyright approach to this, the risk of collateral damage is too great and the effect would be negligible (cf. link taxes – using copyright to prop up the news industry when copying is not what’s causing it to fail).