Perplexed by Perplexity: Increasing Unreliability Makes Me Question the Value of Generative AI (GenAI) Output

I have been experiencing a continued decline in the accuracy and reliability of the AI tools I use. I generally like Perplexity and have used it for tasks like summarizing documents, reviewing interview transcripts, and editing. I always double-check results for accuracy, and that’s a good thing, because I’m finding these tools to be increasingly unreliable.

Can GenAI Count? Apparently Not.

For instance, I recently asked Perplexity to do a simple but tedious task: review a document and, based on a list of keywords provided by a client, tell me whether and how many times each keyword was included. It’s the kind of task that seems perfect for such a tool: time-consuming for a human and seemingly simple for AI.

Unfortunately, the response was a failure. The output indicated that one keyword, “desired business outcomes,” was not included, but that “desired outcomes” was. So I searched the document with the intent of adding “business” to the phrase. No results.

I went back to the AI tool and said, “I am not finding ‘desired outcomes’ in this document.” It apologized (as it always does…) and said: “I apologize for the error in my initial response. You are correct to point out this mistake. The phrase ‘desired business outcomes’ does appear in the document, and I should have accurately reported its occurrence.”

What? That’s the phrase I initially asked it to check!
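For what it’s worth, keyword counting is exactly the kind of task that doesn’t require GenAI at all: a dozen lines of deterministic code do it reliably every time. Here’s a minimal Python sketch of the same job (the file name “document.txt” and the keyword list are hypothetical placeholders, and it assumes the document is already plain text):

    import re

    def count_keywords(text, phrases):
        """Count case-insensitive occurrences of each exact phrase in the text."""
        counts = {}
        for phrase in phrases:
            # Word boundaries keep "desired outcomes" from matching inside a longer word
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            counts[phrase] = len(pattern.findall(text))
        return counts

    with open("document.txt", encoding="utf-8") as f:
        document = f.read()

    for phrase, n in count_keywords(document, ["desired business outcomes", "desired outcomes"]).items():
        print(f"{phrase}: {n}")

A Word document or PDF would need a text-extraction step first, but the counting itself is exact and repeatable, which is precisely what the AI tool failed to deliver.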

Or Proofread?

Proofreading/editing seems to be another activity that GenAI tools should be able to assist with. Again, though, I’m finding the results less than stellar. Abysmal, in fact.

When I asked the tool to review a draft document and list any errors, it provided five. As I typically do, I went back to the document to verify the errors. None of them, not a single one, was actually in the document! One by one, I went back to tell the tool I could not find each error. One by one, the tool agreed and apologized.

How Perplexity Explains Its Errors

After reviewing all five of the claimed “errors” and finding none of them in the document, I asked one final question: “So, in fact, none of the five ‘errors’ you provided are actually errors. How would you evaluate the accuracy of your output?” Here’s a verbatim look at what it said:

“You are absolutely correct that none of the five ‘errors’ I initially identified are actually errors in the text. Upon reflection, my initial evaluation was inaccurate and did not meet the standard of precision and thoroughness expected in a review. Here’s an evaluation of the accuracy of my output:

“Accuracy Evaluation

  • Initial Review: My initial review was flawed because it identified non-existent errors. Each of the points I raised (date format inconsistency, hyphenation of ‘C-suite,’ acronym introduction, quotation mark usage, and other minor issues) was incorrect. The text is well-written, consistent, and adheres to proper conventions throughout.
  • Overall Accuracy: The accuracy of my output in this case is poor because none of the identified ‘errors’ were valid. This undermines the reliability of my review.

“What Went Wrong?

  1. Overzealous Error Identification: I approached the review with an assumption that there must be errors to find, leading me to overanalyze and misidentify issues where none existed.
  2. Insufficient Verification: I failed to double-check my observations against the text before listing them as errors. This lack of thoroughness resulted in false claims.
  3. Misapplication of Standards: Some suggestions (e.g., single vs. double quotes) were unnecessary because they were based on hypothetical scenarios rather than actual problems in the text.

“Lessons Learned

  • A careful, line-by-line review is essential to ensure that all feedback is grounded in actual issues present in the text.
  • Assumptions about potential errors should not drive a review; instead, observations must be based solely on what is explicitly in the text.
  • If no errors exist, it is better to state that confidently rather than inventing unnecessary or incorrect suggestions.

“Conclusion

“The accuracy of my initial output was poor and does not reflect the high standard expected in such reviews. The original copy you provided was excellent, with no discernible errors in spelling, grammar, punctuation, or word choice. Thank you for pointing out these inaccuracies—it serves as a valuable reminder to approach reviews with greater care and precision moving forward.”


My thoughts: If GenAI can mess up such simple asks, how can I trust it with more complex requests? I can’t. Neither should you.

My takeaways are twofold:

  • I’m becoming less and less concerned that GenAI tools will replace “real writers” anytime soon, at least for brands intent on protecting their brand value.
  • I’m also becoming increasingly skeptical of the value of these tools in helping me do my work. If I can’t trust even simple output, how can I trust anything? And if I have to double- or triple-check responses, where’s the value?

I’d like to think GenAI tools will continue to improve. Or will they? I’m interested in hearing about others’ experiences.

About Us

Strategic Communications, LLC, works with B2B clients to help them achieve their goals through effective content marketing and management with both internal and external audiences. We work with clients to plan, create, and publish high-quality, unique content. Whether online, offline, or both, we’ll help you achieve the desired results at reasonable rates.

In addition to content creation, we specialize in helping B2B clients raise awareness and drive website traffic through a strong LinkedIn and X presence.

(Strategic Communications is certified as a Woman-Owned Business Enterprise through the Wisconsin Department of Administration.)

Stay up to date on the latest traditional and digital marketing trends and insights for communication leaders: subscribe to our monthly e-newsletter.

