Detecting AI

On Monday, March 9, 2020, Governor Mike DeWine announced that three cases of COVID-19 had been confirmed in Ohio. Two days later, he reported that Ohio had four cases.

I was confused. Did that mean Ohio now had seven cases, or did the second announcement include the three previously reported? When I figured out that seven was the total, I knew we had a problem. As this played out, the total number of cases was going to be much less important than the number of new cases reported each day. And they weren’t reporting that number. If you want to know how many new cases there are today, you have to know how many total cases there were yesterday, and subtract that from today’s total.

So that’s what I did. I created a spreadsheet. Every day, I added the new numbers. Eventually, I expanded it to include the reports for my county, the zip codes covered by my school district, and the cases actually reported by the schools. This gave us a lot of data to analyze. We could compare the school district’s case trends to the county and to the state. We could look at 14-day averages and recognize trends in the data. We could see spikes coming before they arrived. We could predict school cases and take proactive measures to respond. We used this information, along with the expertise of health care officials and professionals, to inform some of the back-to-school policies that we implemented in the 2020-21 and 2021-22 school years.
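The arithmetic behind the spreadsheet is simple enough to sketch in a few lines of code. Here’s a minimal Python version, using made-up cumulative totals rather than Ohio’s actual data: daily new cases are just today’s total minus yesterday’s, and a rolling average smooths out the day-to-day reporting noise.

```python
# Hypothetical cumulative case totals, one entry per day (illustrative only).
cumulative = [3, 7, 13, 24, 37, 50, 66, 88, 112, 140, 169, 199, 247, 304, 356]

# New cases each day: today's total minus yesterday's total.
new_cases = [cumulative[0]] + [
    today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])
]

# Rolling average over a window (14 days in the real spreadsheet;
# a 7-day window here just so the toy data produces meaningful output).
def rolling_average(values, window):
    return [
        sum(values[max(0, i - window + 1): i + 1]) / min(i + 1, window)
        for i in range(len(values))
    ]

print("Daily new cases:", new_cases)
print("7-day averages: ", [round(x, 1) for x in rolling_average(new_cases, 7)])
```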

I kept tracking the numbers for two years. By then, we had ubiquitous testing. Vaccines were readily available to anyone who wanted them. In general, people were much less frightened of the virus, and were less likely to seek health care when they experienced Covid-like symptoms. That also meant that the officially reported cases were no longer representative of the actual numbers. If I’m not feeling well, I have three options: I can just stay home and wait it out like the flu. I can take an at-home Covid test. If it’s positive, I know to isolate to keep from spreading it, but otherwise the treatment is really the same. Or, I can go see a health care professional. They’re going to test me and, again, unless I have life-threatening symptoms, I’ll be sent home and told to isolate and rest. Only the last of those gets reported as a confirmed case. I realized that my spreadsheet was grossly understating the number of actual cases, because most cases weren’t being reported. So I stopped tracking.

I reasoned that my data could be misused to claim that the pandemic was over, and that we could just go back to our pre-Covid lives. Sometimes having no information is better than having inaccurate information, because numbers and graphs tend to carry a lot of credibility, even when they’re wrong.

A few weeks ago, I was asked about apps that detect artificial intelligence use in student writing. I asked a couple of generative AI tools how to detect this. Both indicated that there is no foolproof way to detect AI use, and suggested looking for inconsistencies in tone, writing style, and factual accuracy. I also asked whether they could generate text that would fool AI detectors. After insisting that it would be unethical to do so, both tools indicated that they could fool AI detection algorithms if they weren’t hampered by ethical guardrails.

Late last year, Forbes evaluated AI content detection tools. The best one on their list had a 90% accuracy rate. That’s pretty good, right? I give it a snippet of text, and 90% of the time it correctly tells me whether that text was written by a student or by AI. We can live with that.

But what if I give it two samples? The likelihood it correctly classifies each of them is 90%, but the likelihood that it gets BOTH correct (assuming its mistakes are independent) is 90% × 90% = 81%. That’s not as great. And if I give it 20 samples because the whole class did the assignment, I’m down to 0.90^20, or about 12%. If I have my students complete a writing assignment, and I submit their work to the best AI detector available, there’s only about a 12% chance it’ll correctly determine which ones are student-written and which ones used AI.
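The math is easy to verify. Here’s a quick Python sketch; the 90% figure is the Forbes number, and the independence of errors is my simplifying assumption. It shows how quickly the odds of getting every sample right collapse as the class grows.

```python
# Chance that a detector with per-sample accuracy p gets all n samples right,
# assuming each classification is an independent event.
def prob_all_correct(p: float, n: int) -> float:
    return p ** n

accuracy = 0.90  # the best detector in the Forbes roundup
for n in (1, 2, 5, 10, 20, 30):
    print(f"{n:2d} samples: {prob_all_correct(accuracy, n):.1%} chance of a clean sweep")
```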

That might be okay if this is used for formative feedback. The student could use that information to help become a better writer. They might learn to stop writing like a robot. But if we’re using it in a summative way to punish students for cheating, it’s not nearly reliable enough. I think this is another case where not having the data is better than having inaccurate data.

The AI tools themselves provided some guidance on detecting AI. Rely on human judgement. If you’re a seventh grade language arts teacher, you know what seventh grade writing looks like. Look for inconsistencies in writing style or voice, which AI has trouble getting right. And use all of the intermediate pieces of the writing assignment as part of the assessment. If the final work doesn’t follow naturally from the outline and rough drafts, there’s a problem.

AI is a lot like calculators. There was a time when math teachers were worried that the world was going to end because students had access to calculators. Ultimately, we changed both what and how we teach in math class. We don’t spend quite as much time on computational algorithms, and that allows us to spend more time on interesting problems and conceptual understanding. AI may change where we place emphasis on various aspects of writing instruction, but ultimately it will help our students become better writers.