I received this from my friend Enrico, who authorized me to post it:

Personally, after 6 months of intensive experimentation with OpenAI embeddings and the ChatGPT API, I find the answers quite predictable and very sensitive to how the question is posed.

By working carefully and methodically on prompt grounding and prompt engineering, a good deal of the magic that seems to appear on first attempts fades away. These are still excellent and useful tools, but even RAG is not exempt from “hallucinations”, even if they occur on a smaller scale and are less egregious.

With article texts, the points that really matter are almost never picked up: in 90% of my tests, the answers given by both ChatGPT and our RAG-based programs were judged NOT satisfactory by those who had written the article, but in 90% of the cases those answers were judged favorably by those who had NOT read the article or had only skimmed it.

Working on the prompts improves the situation somewhat, but we are still far from the answers that would be given by a person who understands what she is reading. More importantly, the answers cannot be trusted.

The same applies to code generation: the proposed code is almost never an optimal solution in the eyes of an experienced programmer, but it is good advice for someone who is not familiar with the language.

In fact, LLMs talk convincingly about things they don’t understand, to people who don’t know what they are talking about, just as, unfortunately, so many humans do on social media at this time in history.

The difference is that among humans there is always at least one person who really understands what they are saying; among LLMs I’m afraid this is not the case.

I would say they are tools in line with the zeitgeist.

On the point that “LLMs talk convincingly about things they don’t understand to people who don’t know what they’re talking about”, I made a similar observation in a previous post, “ChatGPT and the ‘Report effect’”. Report is a popular TV news show in Italy that usually produces the impression that, for topics one knows well, the content is trivial and/or unreliable.

