Diving Deeper into AI Package Hallucinations

We know that LLMs generate artifacts ("hallucinations" in the language of the brain metaphor, a term that even OpenAI, which coined it, says is wrong and misleading, but there it is).

So we don't trust the details of what an LLM says; we verify them.

It turns out that someone, while using an LLM to generate code, noticed that some of the package names it produced were artifacts: they didn't exist. So what did he do? He created an empty package with that name. And, surprise, it was downloaded 30k times.

Then, searching GitHub for references to the fake package, he discovered that many large companies recommend using it.

Hence the common-sense recommendations below, meant to mitigate the risk of installing dangerous software.

This may be the biggest damage done by LLMs to date.

Source: Lasso Security

Heads up: Hallucinated packages in the wild?

During our research we encountered an interesting hallucinated Python package called "huggingface-cli".

Hallucinated Package Case Study

I decided to upload an empty package by the same name and see what would happen. To verify the number of real downloads, I also uploaded a dummy package called "blabladsa123" as a baseline for how many of the downloads come from scanners.

The results are astonishing. In three months the fake, empty package received more than 30k authentic downloads (and still counting)!
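A rough way to reproduce this kind of download comparison is to query the public pypistats.org JSON API for both the hallucinated package and the dummy control. This is a minimal sketch, assuming that endpoint and the requests library are available; the numbers it returns are PyPI download statistics, not the author's original measurements.

import requests

PYPISTATS_URL = "https://pypistats.org/api/packages/{name}/recent"

def recent_downloads(package: str) -> dict:
    """Fetch recent download counts (day/week/month) for a PyPI package."""
    resp = requests.get(PYPISTATS_URL.format(name=package), timeout=10)
    resp.raise_for_status()
    # Expected shape (assumption): {"last_day": ..., "last_week": ..., "last_month": ...}
    return resp.json()["data"]

if __name__ == "__main__":
    # Compare the hallucinated package against the dummy control from the case study.
    for name in ("huggingface-cli", "blabladsa123"):
        try:
            print(name, recent_downloads(name))
        except requests.HTTPError as err:
            print(name, "lookup failed:", err)

Comparing the two counts is what separates real adoption from background noise generated by mirrors and scanners.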

Our Hallucinated Package Adoption

In addition, we conducted a search on GitHub to determine whether this package was utilized within other companies' repositories. Our findings revealed that several large companies either use or recommend this package in their repositories. For instance, instructions for installing this package can be found in the README of a repository dedicated to research conducted by Alibaba.
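To repeat that kind of GitHub survey, one could use GitHub's code search API to look for the package name inside README files. This is a minimal sketch, assuming a personal access token is exported as GITHUB_TOKEN (code search requires an authenticated request) and that the requests library is installed; the query string itself is illustrative.

import os
import requests

def search_github_code(query: str, token: str) -> list[str]:
    """Return the repositories whose files match a GitHub code-search query."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": query, "per_page": 30},
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return [item["repository"]["full_name"] for item in resp.json()["items"]]

if __name__ == "__main__":
    token = os.environ["GITHUB_TOKEN"]  # code search only works when authenticated
    # Look for install instructions mentioning the hallucinated package in README files.
    repos = search_github_code('"pip install huggingface-cli" filename:README.md', token)
    print("\n".join(sorted(set(repos))))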

Recommendations (What’s Next)

I'll divide my recommendations into two main points:

– First, exercise caution when relying on Large Language Models (LLMs). If you're presented with an answer from an LLM and you're not entirely certain of its accuracy, particularly concerning software packages, make sure to conduct thorough cross-verification to ensure the information is accurate and reliable (a quick verification sketch follows these points).

– Secondly, adopt a cautious approach to using Open Source Software (OSS). Should you encounter a package you’re unfamiliar with, visit the package’s repository to gather essential information about it. Evaluate the size of its community, its maintenance record, any known vulnerabilities, and the overall engagement it receives, indicated by stars and commits. Also, consider the date it was published and be on the lookout for anything that appears suspicious. Before integrating the package into a production environment, it’s prudent to perform a comprehensive security scan.
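Putting both points together, a simple pre-install check might query PyPI's JSON API to confirm a package actually exists and pull a few basic health signals (release history, project links) before deciding whether to dig further. This is a minimal sketch, with the package names purely illustrative placeholders for whatever an LLM suggested, and the requests library assumed.

import requests

def pypi_metadata(package: str) -> dict | None:
    """Return PyPI metadata for a package, or None if the name is not registered."""
    resp = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    if resp.status_code == 404:
        return None  # not on PyPI at all: possibly a hallucinated name
    resp.raise_for_status()
    return resp.json()

def summarize(package: str) -> None:
    meta = pypi_metadata(package)
    if meta is None:
        print(f"{package}: NOT on PyPI -- treat any install instruction with suspicion")
        return
    info = meta["info"]
    print(f"{package}: {len(meta['releases'])} release(s), "
          f"summary={info.get('summary')!r}, "
          f"links={info.get('project_urls') or info.get('home_page')}")

if __name__ == "__main__":
    # Placeholder names: one well-known package, the case-study name, one nonsense name.
    for name in ("requests", "huggingface-cli", "surely-not-a-real-package-xyz"):
        summarize(name)

From there, the repository link in the metadata can be inspected for stars, commit activity, and publish date, and a dedicated scanner such as pip-audit can check the dependency set for known vulnerabilities before anything reaches production.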

Continue reading here: Diving Deeper into AI Package Hallucinations

If you like this post, please consider sharing it.
