Generative AI has genuinely changed the game in education. It is creating a bifurcation of educational tools: the good and the bad. On one hand, students are gaining access to tools that guide them through their learning journey, assess their work as they go, and offer helpful nudges when needed. Students now have digital personal assistants, tutors, and resources that are laser-focused on each student's learning needs. These tools quickly generate learning paths, prepare learning materials, and assess learning so that students can master class objectives faster. On the other hand, students are dipping their toes (well, some just dive right in) into the turbulent world of using AI with a kaleidoscope of intent. Some wish to skirt the system and have the computer author their essay. Others want their original work checked for accuracy or grammatical structure. And there is a spectrum of AI uses in between.
As educators, how do we resolve the conflict between original academic work and the flagrant, or sometimes masked, use of AI for cheating?
Enter the world of AI detectors. Let's dive in.
What is an AI detector and how does it work?
An AI detector is a tool that uses machine learning algorithms to determine the source of a text. Its purpose is to identify whether a piece of content was generated by an AI or by a human. Let’s explore how these detectors work:
Language Models and Predictability:
AI detectors are usually based on language models similar to those used in AI writing tools.
These models ask a fundamental question: “Is this the sort of thing that I would have written?”
If the answer is “yes,” the detector concludes that the text is probably AI-generated.
Specifically, the models look for two key factors in a text: perplexity and burstiness.
Perplexity:
Perplexity measures how unpredictable a text is to a language model: how surprising its word choices are, and how likely it is to read unnaturally.
AI language models aim for low perplexity, resulting in texts that make sense and read smoothly but are also more predictable.
Human writing tends to have higher perplexity due to creative language choices and occasional typos.
Burstiness:
Burstiness refers to variation in sentence length and structure. Human writing tends to be bursty, mixing long, complex sentences with short ones, while AI-generated content often exhibits overly consistent, uniformly structured language.
Detectors capitalize on these disparities, using them as telltale signs of machine authorship.
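To make perplexity and burstiness concrete, here is a minimal Python sketch. The per-token probabilities are made-up numbers (real detectors get them from a language model), and measuring burstiness as the spread of sentence lengths is a simplification of what production tools compute.

```python
import math
import statistics

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token.
    Lower values mean the text was highly predictable to the model."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def burstiness(text):
    """A rough proxy: the spread (population std dev) of sentence lengths
    in words. Very uniform sentence lengths are a weak signal of AI text."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

# Hypothetical per-token probabilities a language model might assign:
predictable = [0.9, 0.8, 0.85, 0.9]   # the model "expected" these tokens
surprising  = [0.2, 0.05, 0.4, 0.1]   # creative or unusual word choices

print(perplexity(predictable))  # low perplexity: reads as AI-like
print(perplexity(surprising))   # high perplexity: reads as human-like
```

A detector built this way would flag the low-perplexity, low-burstiness text, which is exactly why formulaic human writing gets caught in the crossfire.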
Reliability and Challenges:
AI detectors are still experimental and considered somewhat unreliable.
They face challenges such as false positives/negatives, dependence on training data, and vulnerability to adversarial attacks.
Applications:
Educators use AI detectors to verify student work authenticity.
Moderators employ them to remove fake product reviews and spam content.
Remember, while AI detectors play a crucial role, they’re not infallible. Their reliability continues to evolve as research and development progress.
Source: "Understanding AI Detectors: Functionality and Limitations." Conversation with Microsoft Copilot. [March 26, 2024]. Original content provided by Microsoft Copilot during a chat session.
So now that we know a little about how AI detectors work, let's ask if they are accurate. Let's ask the expert.
From OpenAI:
Do AI detectors work?
In short, no, not in our experience. Our research into detectors didn't show them to be reliable enough given that educators could be making judgments about students with potentially lasting consequences. While other developers have released detection tools, we cannot comment on their utility.
Additionally, ChatGPT has no “knowledge” of what content could be AI-generated. It will sometimes make up responses to questions like “did you write this [essay]?” or “could this have been written by AI?” These responses are random and have no basis in fact.
To elaborate on our research into the shortcomings of detectors, one of our key findings was that these tools sometimes suggest that human-written content was generated by AI.
When we at OpenAI tried to train an AI-generated content detector, we found that it labeled human-written text like Shakespeare and the Declaration of Independence as AI-generated.
There were also indications that it could disproportionately impact students who had learned or were learning English as a second language and students [emphasis added] whose writing was particularly formulaic or concise.
Even if these tools could accurately identify AI-generated content (which they cannot yet), students can make small edits to evade detection.
In diving into the research, I have found a few things out about AI detectors that we should all know. First, the Declaration of Independence was flagged as AI-generated! Where are my conspiracy theorists? Ancient aliens...anyone? ;)
Ok, all joking aside, I'll keep this concise and let you dive into the resources for additional detail. Or, you can take me to a coffee shop and chat my ear off...I'm motivated by caffeine.
Accuracy
There are so many ways to measure accuracy. Let me summarize some that I've found useful.
Turnitin is the most accurate AI detector [2], identifying about 77% of AI-generated content. [1]
Turnitin's false positive rate depends on the study. Turnitin states that it falsely labels human text as AI-generated "less than 1% of the time" [3]. However, additional internal Turnitin studies found it to be more like 1.3-1.4%.
Turnitin's accuracy decreases with mixed human/AI content or disguised content, testing at about 63% accuracy. [1]
For online tools, “Tests completed, by van Oijen (2023), showed that the overall accuracy of tools in detecting AI-generated text reached only 27.9%, and the best tool achieved a maximum of 50% accuracy, while the tools reached an accuracy of almost 83% in detecting human-written content. The author concluded that detection tools for AI-generated text are "no better than random classifiers" (van Oijen 2023).”[2]. Please don't use GPTZero. [me]
Turnitin does not check for AI in submissions under 300 words because of the “slightly higher-than-comfortable false positive rate on short submissions (fewer than 300 words).” [3] AI detectors are not a good solution for discussion board posts. Also, papers must not exceed 15,000 words.
According to Turnitin "In order to maintain this low rate of 1% for false positives, there is a chance that we might miss 15% of AI written text in a document. We're comfortable with that since we do not want to incorrectly highlight human-written text as AI-written. For example, if we identify that 50% of a document is likely written by an AI tool, it could contain as much as 65% AI writing." [5] This seems arbitrary to me. Why 15%?
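Turnitin's 1% false-positive versus 15% missed-AI trade-off is less arbitrary than it looks: any detector picks a score threshold, and because human and AI score distributions overlap, driving false positives down necessarily means missing more AI text. A rough simulation makes the point; the score distributions here (means 0.30/0.70, spread 0.15) are my own assumptions for illustration, not Turnitin's actual numbers.

```python
import random

random.seed(0)

# Hypothetical detector scores (0-1): human text scores low on average,
# AI text scores high, but the two distributions overlap.
human_scores = [random.gauss(0.30, 0.15) for _ in range(10_000)]
ai_scores    = [random.gauss(0.70, 0.15) for _ in range(10_000)]

def rates(threshold):
    """False positive rate (humans flagged) and false negative rate
    (AI text missed) at a given flagging threshold."""
    fp = sum(s >= threshold for s in human_scores) / len(human_scores)
    fn = sum(s < threshold for s in ai_scores) / len(ai_scores)
    return fp, fn

for t in (0.5, 0.6, 0.7):
    fp, fn = rates(t)
    print(f"threshold {t}: false positives {fp:.1%}, missed AI text {fn:.1%}")
```

Raising the threshold squeezes false positives toward zero, but only by letting more AI-written text slip through, which is the trade-off Turnitin is describing.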
Bias
A study posted to arXiv (Cornell University's preprint server) found that GPT detectors are biased against non-native English writers. [4]
Other things
Testing of Detection Tools for AI-Generated Text, available on arXiv, is worth a read.
A positive approach to academic integrity is worth a read.
Copyright
The use of AI detectors can very likely fall under Fair Use when we use it for student work. In fact, Turnitin did face a lawsuit regarding the unauthorized use and storage of student work. The court upheld this practice as fair use. [6] It's still an icky business model. And, I bet they have used those tens of thousands of papers to train their AI model. This is just a guess and my opinion only, though they mentioned training their AI model on 80 thousand academic papers. Hmmmm. I wonder where they got those!
However, I would caution anyone against putting copyrighted data that isn't student work into a third-party AI detector without consent. Doing so could call your intent or motivations into question, which could lead to a harassment claim.
Conclusion and some friendly advice
AI detectors are quite good at finding AI content, but they are also often wrong. For example, say you teach 5 classes, each with 2 assignments that are checked for AI, and an average of 30 students per class. That's 150 students submitting 2 papers each, for a total of 300 papers. If you check all of those papers via Turnitin AI Detection, about 1.4% of them will be flagged as AI when, in fact, the student did not use AI for the flagged section. Statistically, you can expect about 4 false flags (300 × 0.014 = 4.2). So please be careful with those partial AI-detected papers. They may be legitimate.
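The arithmetic above, as a quick sanity check (the 1.4% figure is the internally reported Turnitin false positive rate cited earlier):

```python
# Expected false positives across a semester's submissions.
classes = 5
students_per_class = 30
papers_per_student = 2
false_positive_rate = 0.014  # Turnitin's internally reported rate

total_papers = classes * students_per_class * papers_per_student
expected_false_flags = total_papers * false_positive_rate
print(total_papers, round(expected_false_flags, 1))  # 300 papers, ~4.2 false flags
```

Four students per semester wrongly flagged is four difficult conversations you don't want to start from a position of accusation.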
Remember that mixed human/AI content is the hardest to identify correctly. And know that there are AI tools available to make AI content less detectable, which creates an equity disparity between students who can afford these tools and those who cannot.
Consider having an AI policy in your syllabus. Please talk to your students about the importance of their learning. Remind them why they are doing what they are doing and how it will help them in the future.
Create an environment of trust. People do not learn well when they are being judged, looked down upon, or when they feel untrusted. When you find that AI has been likely used, please have a conversation with your students about it. This is a teachable moment. At the Community College level, we do not have the luxury of presorted successful students. We have quite a few students who never thought they would go to college and feel unworthy of it. Studies show that if students fail their first college class, they are unlikely to continue. I would argue that Community College faculty have an additional obligation to teach our students several things about life and academia.
Talk to them. Ask them why they used AI. If they say they haven't, ask them about their paper. When they have a hard time describing what they wrote, talk about the importance of learning.
Let them know they are worth our time and investment in them.
They are expected to complete our learning assessments and adhere to the rules of our classroom. The rules are guide rails, intended to help them succeed. They ought not be punishments (in my opinion).
If they make a mistake, consider giving them the opportunity to learn from that mistake and try again. Failure might be what they need, sure. We all learn by failing, evaluating our failure, finding an alternate solution, and trying again. If we don't have a trusted environment that celebrates this type of failure, the learning is superficial. If you're having a hard time with this, consider attending @One's amazing Equitable Grading Strategies professional development class.
Hold them to a high standard. Offer them a hand when they have fallen.
Focus on Learning Objective mastery.
Assign authentic assessments and metacognitive assessments, like peer reviews. Some of our colleagues have even had greater success using AI in their assignments: having students review AI-generated content, evaluate it, edit it, find the bias, chase down sources, and so on. In an online class, you can have students record a video one week about a life experience related to your topic, or one just aimed at getting to know them. The next week, you could have them incorporate a fellow student's experience into their own work. It's more work for an AI to do that than it's worth.
We all have our own threshold for when a student needs to be referred to the Student Conduct office. However, AI detectors should not be the sole reason for your assertion of academic dishonesty. Many students have been falsely accused of using AI when, in fact, they were able to prove that they had not used it.
AI tools are getting better and better. AI learns from whatever data you give it. ChatGPT is trained on much of the public internet, but you could take, for example, only your previous documents and train a specialized AI on your personal writing style. Once you've done that, the statistical probabilities that AI detectors look for will be different, which may make the content less detectable. And anyone can freely pull the writing of authors or bloggers off the internet and train an AI on it.
There is good news! AI tools are being created to sherpa students through writing assignments, while creating receipts along the way. They will help with time management, structure, brainstorming, grammar, finding sources, citing sources, and much more.
Check out what LAUSD is doing with their new AI tool, which provides "resources, support, and tools to fast-track student achievement beyond the school day."
Also check out how the Khan Academy is using AI for a virtual tutor.
Please let me know your thoughts and strategies for detecting AI! Consider checking out some assessment ideas or submit your own so that we can include them on this website!
You can always contact me by clicking on the Contact Us link at the bottom of every page.