
AI Content Detection: Bard vs. ChatGPT vs. Claude

Researchers recently conducted a study exploring AI content detection with three prominent AI models: Bard, ChatGPT, and Claude. The investigation aimed to determine whether an AI model has an advantage in detecting its own generated content compared to content created by other AI models.

Introduction

The study was carried out by researchers from the Department of Computer Science, Lyle School of Engineering at Southern Methodist University. They delved into the realm of AI content detection, a critical area of research given the proliferation of AI-generated content in various domains.

Detecting AI-Generated Content

AI content detection primarily revolves around identifying the unique “artifacts” that arise from the underlying transformer technology in AI models. These artifacts can be distinct for each AI model due to differences in training data and fine-tuning. Consequently, an AI model might excel at detecting its own content because it can recognize its specific artifacts better than those of other models.
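To make the idea of artifacts concrete, here is a toy illustration (not the study's method): one crude way to compare the word-usage "fingerprints" of two texts is to measure the divergence between their word-frequency distributions. Real detectors rely on far subtler signals, but the intuition is similar.

```python
# Toy illustration only: treat "artifacts" as statistical fingerprints in
# word usage, and compare two texts' unigram distributions with
# Jensen-Shannon divergence. A lower divergence between a candidate text and
# a model's known output weakly suggests a shared fingerprint.
from collections import Counter
import math

def unigram_dist(text: str) -> dict[str, float]:
    """Normalized word-frequency distribution for a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence between two word distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    def kl(a: dict[str, float]) -> float:
        return sum(a.get(w, 0.0) * math.log2(a.get(w, 0.0) / m[w])
                   for w in vocab if a.get(w, 0.0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

sample_a = "the study explores how models detect their own generated text"
sample_b = "researchers examined whether models recognize their own writing"
print(js_divergence(unigram_dist(sample_a), unigram_dist(sample_b)))
```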

Methodology

The researchers tested three AI models:

  • ChatGPT-3.5 by OpenAI
  • Bard by Google
  • Claude by Anthropic

All three models were tested in their September 2023 versions. The researchers built a dataset of fifty topics and gave each AI model identical prompts to generate essays on them. They also produced paraphrased versions of the original content and collected fifty human-written essays on the same topics from the BBC.
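As a rough sketch of that setup, the pseudocode below mirrors the described pipeline. The `generate` helper, model identifiers, and prompt wording are hypothetical stand-ins; the article does not give the study's exact prompts or APIs.

```python
# Hypothetical sketch of the dataset construction described above.
MODELS = ["chatgpt-3.5", "bard", "claude"]  # September 2023 versions
TOPICS = ["topic-01", "topic-02"]           # the study used fifty topics

def generate(model: str, prompt: str) -> str:
    """Placeholder for a call to the given model's API."""
    raise NotImplementedError

dataset = []
for topic in TOPICS:
    prompt = f"Write a short essay about {topic}."  # identical across models
    for model in MODELS:
        essay = generate(model, prompt)
        paraphrase = generate(model, f"Paraphrase the following essay:\n{essay}")
        dataset.append({"topic": topic, "model": model,
                        "essay": essay, "paraphrase": paraphrase})
# Fifty human-written BBC essays on the same topics serve as the human baseline.
```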

Zero-Shot Prompting for Self-Detection

To evaluate self-detection, the researchers employed zero-shot prompting. They tasked each AI model with detecting whether a given text matched its writing pattern and word choice. The results from this self-detection process were then compared with the performance of an AI detection tool called ZeroGPT.
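A minimal sketch of what such a self-detection query might look like follows. The exact prompt wording is not reproduced in the article, so the phrasing here is illustrative, and `ask` is a hypothetical stand-in for a model API call.

```python
# Hedged sketch of the zero-shot self-detection query.
def ask(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to the given model."""
    raise NotImplementedError

def self_detect(model: str, text: str) -> bool:
    """Ask a model whether a text matches its own writing pattern."""
    prompt = (
        "Does the following text match your writing pattern and word "
        "choice? Answer only 'yes' or 'no'.\n\n" + text
    )
    return ask(model, prompt).strip().lower().startswith("yes")
```

The yes/no answers are then scored against ground truth and compared with ZeroGPT's verdicts on the same texts.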

Results: Self-Detection

The initial findings from self-detection were intriguing. Bard and ChatGPT detected their own content at relatively high rates. Claude, however, posed a unique challenge: it struggled to self-detect its own content, performing significantly worse than Bard and ChatGPT.

Quality vs. Detectability

One hypothesis proposed by the researchers was that Claude’s content contained fewer detectable artifacts, suggesting a higher quality of output in terms of minimizing AI artifacts. This raised the question of whether content that is harder to detect is inherently of better quality.

Results: Self-Detecting Paraphrased Content

The researchers extended their investigation to self-detecting paraphrased content. It was expected that AI models would excel at this task because the artifacts present in the original content should carry over to the paraphrased versions.

However, the results were surprising. Bard demonstrated the ability to self-detect paraphrased content effectively. In contrast, ChatGPT struggled to self-detect paraphrased content, performing only slightly better than random guessing. Claude, surprisingly, was able to self-detect paraphrased content, even though it couldn’t self-detect its original essays.

Unpredictable Outcomes

The outcomes of these tests were largely unpredictable, especially in the case of Claude. This unpredictability continued as the researchers examined how well the AI models detected each other’s content.

Results: AI Models Detecting Each Other’s Content

The final test involved evaluating how well each AI model could detect content generated by the other AI models. It was expected that Bard, which potentially generated more artifacts, would be easier to detect.

The results indicated that Bard-generated content was indeed the easiest to detect by the other AI models. However, detecting content generated by ChatGPT and Claude proved challenging for all three models.
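Conceptually, this test produces a detector-by-generator grid of detection rates. Below is a hedged sketch of such an evaluation loop, reusing the hypothetical `self_detect` helper from the earlier sketch; the values it would compute are placeholders, not the study's figures.

```python
# Hypothetical evaluation loop: every model judges every model's essays,
# yielding a detector-by-generator grid of detection rates.
def detection_rates(models, essays_by_model, detect):
    """rates[detector][generator]: fraction of the generator's essays
    that the detector flags as matching its own writing pattern."""
    rates = {}
    for detector in models:
        rates[detector] = {}
        for generator in models:
            essays = essays_by_model[generator]
            flagged = sum(detect(detector, essay) for essay in essays)
            rates[detector][generator] = flagged / len(essays)
    return rates

# Example: rates = detection_rates(MODELS, essays_by_model, self_detect)
# High values in Bard's generator column would match the finding that
# Bard's content was the easiest for the other models to detect.
```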

Conclusions and Takeaways

In conclusion, this study highlights the complexity of AI content detection. Bard exhibited a relatively high success rate in detecting its own and paraphrased content. ChatGPT, while able to detect its own content, struggled with paraphrased versions. Claude, although unable to reliably self-detect its original content, surprisingly self-detected paraphrased content.

The results emphasize that detecting AI-generated content remains a challenging task, and the differences in detection capabilities among AI models are noteworthy. Further research in this area could benefit from larger datasets, diverse AI-generated text, and a comparison with more AI detectors. Additionally, studying how prompt engineering influences detection levels could provide valuable insights.

This study sheds light on the intricate relationship between AI models, their artifacts, and their ability to self-detect, offering valuable insights into the evolving field of AI content detection.
