Have you ever wondered how accurate AI writing detection tools really are? Recently, I tested several texts, from simple to more complex, to see if they were flagged as AI-generated. The results were surprising, and they raised questions about how effective AI detection tools (AIDTs) are and how much we should rely on them.
🙊 Caveman English OK, Formal English No OK
Something as simple as stating the Pythagorean Theorem is automatically detected as AI-generated.
In order to determine the length of a side in a right-angled triangle, algebraic techniques must be applied to rearrange the variables in such a way that the square of the hypotenuse is expressed as the sum of the squares of the other two sides. By performing these mathematical manipulations, the resulting equation can be solved by taking the square root of both sides, thereby obtaining the length of the desired side. This process comes from the Pythagorean theorem, which states that the square of the hypotenuse is equal to the sum of the squares of the other two sides in a right-angled triangle.
To verify this, I tried to dumb down my language and make it sound casual, using this text:
If you want to get a length of a side in a right-angled triangle, rearrange the variables so that the square of the hypotenuse is shown as the sum of the squares of the other two sides. Doing this gives you an equation that can be solved by taking the square root of both sides. After this, you can get the length of the side that you want. This universal fact is called the "Pythagorean theorem". It means that the square of the hypotenuse is equal to the sum of the squares of the remaining two sides.
I literally couldn't make this sound any simpler, so I asked ChatGPT to say what I just said, but in a way that sounds like a "caveman".
After I ran the caveman version through the AI detector, it came back as ZERO percent AI-generated.
📝 Proofreading? No way Jose.
Text edited with Grammarly, a proofreading app capable of changing entire paragraphs, is often flagged by AIDTs as AI-generated. This is not entirely without merit: Grammarly itself admits in a blog post to using AI systems to improve its proofreading capabilities.
There are several posts reporting mixed results with Grammarly, and saying that using it can yield a "partially AI" detection.
This raises the question: to what extent is it OK to use Microsoft Word's very own proofreader (called Editor), and where is the fine line past which spellchecking counts as "using AI"?
📜 The US Constitution? The founding fathers were obviously AI.
To make things more interesting, I ran text from random websites, legal documents and even Google Scholar articles through a detector. They were all flagged as using AI in some shape or form. I understand why "random websites" might be flagged, but more formally written texts?
One thing that surprised me was this Reddit post. Is the US Constitution so perfectly written that an AI detector from hundreds of years in the future still thinks it's AI?
💡 What we could do for now
For students, do NOT be discouraged by this tool and do not use it as a way to "dodge" your teacher. Own your work, show your work, and if you happen to have a stubborn teacher who wants to make you admit otherwise, do NOT give in just so you can avoid a fight. Escalate to the head of your school if you need to. Use other school documentation, like your course syllabus or something your teacher wrote, to prove that the website produces false positives regardless of where the text came from.
And as for teachers, if a colleague shares a tool with you that is in no way as "official" as a website like "TurnItIn.com", don't assume that a website like GPTZero is telling you the absolute truth. As I have demonstrated, it is easy to get a false positive, so I recommend not using AIDTs as justification for giving your students a zero grade. AI-generated text is currently cutting-edge technology, and detecting it is, at least at the time of this post, bleeding-edge technology, which makes the tools error-prone.