{"id":7742,"date":"2026-04-08T11:00:00","date_gmt":"2026-04-08T11:00:00","guid":{"rendered":"https:\/\/cybersecurityinfocus.com\/?p=7742"},"modified":"2026-04-08T11:00:00","modified_gmt":"2026-04-08T11:00:00","slug":"llm-generated-passwords-are-indefensible-your-codebase-may-already-prove-it","status":"publish","type":"post","link":"https:\/\/cybersecurityinfocus.com\/?p=7742","title":{"rendered":"LLM-generated passwords are indefensible. Your codebase may already prove it"},"content":{"rendered":"<div>\n<div class=\"grid grid--cols-10@md grid--cols-8@lg article-column\">\n<div class=\"col-12 col-10@md col-6@lg col-start-3@lg\">\n<div class=\"article-column__content\">\n<div class=\"container\"><\/div>\n<p>Two independent research programs, one from AI security firm Irregular, one from Kaspersky, have now converged on the same conclusion: Every frontier LLM generates structurally predictable passwords that standard entropy meters catastrophically overrate. AI coding agents are autonomously embedding those credentials in production infrastructure, and conventional secret scanners have no mechanism to detect them.<\/p>\n<p>As a security professional who has spent considerable time scrutinizing how generative AI integrates into enterprise development workflows, I confess that the quantification of what I already suspected still gave me pause. Irregular prompted Claude Opus 4.6 to generate passwords in 50 independent sessions. Only 30 distinct strings emerged from those 50 attempts. One specific sequence, <a href=\"https:\/\/www.irregular.com\/publications\/vibe-password-generation\">G7$kL9#mQ2&amp;xP4!w<\/a>, recurred 18 times, a repetition rate of 36 percent. Over a genuinely uniform distribution across a 94-character printable ASCII alphabet, the probability of any specific 16-character sequence appearing even twice in 50 draws is vanishingly small. 
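<\/p>\n<p>The arithmetic behind that claim is easy to check; a back-of-the-envelope sketch in Python (illustrative only, not Irregular\u2019s published methodology):<\/p>\n<pre class=\"wp-block-code\"><code>from math import comb

ALPHABET = 94   # printable ASCII characters
LENGTH = 16
DRAWS = 50

keyspace = ALPHABET ** LENGTH
# Union (birthday) bound: the probability that ANY two of the 50
# uniform draws coincide, let alone one string recurring 18 times
p_any_repeat = comb(DRAWS, 2) / keyspace
print(f'{p_any_repeat:.1e}')   # on the order of 1e-29
<\/code><\/pre>\n<p>Repetition at the observed rate is, for practical purposes, impossible under uniform sampling.<\/p>\n<p>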
The model is not generating passwords; it is retrieving them.<\/p>\n<p>That distinction is the crux of an emerging and underappreciated threat class. LLM-generated passwords satisfy every superficial heuristic we have trained practitioners to apply: requisite length, case heterogeneity, numerical and symbolic admixture, absence of recognizable dictionary fragments. Automated checkers consistently rate them as excellent. The peril is not in how they appear to tools designed for a different threat model; it is in how they function against an adversary who understands the distributional peculiarities of autoregressive generation.<\/p>\n<h2 class=\"wp-block-heading\">The architectural incompatibility<\/h2>\n<p>The root pathology is architectural rather than configurational, a distinction of considerable practical significance because it forecloses remediation through tuning. A <a href=\"https:\/\/csrc.nist.gov\/pubs\/sp\/800\/90\/a\/r1\/final\">cryptographically secure pseudorandom number generator<\/a> (CSPRNG), as mandated by NIST SP 800-90A Rev. 1 for all security-sensitive entropy generation, produces each character with statistically equal probability drawn from a truly uniform distribution. No character is preferentially weighted. No positional bias exists. Every token is independent of every antecedent token.<\/p>\n<p>Large language models operate on a fundamentally antithetical principle. They are trained to assign maximal probability to the most plausible successor token given an accumulated context, a mechanism that is simultaneously the source of their remarkable generative fluency and their categorical unsuitability for cryptographic applications. When prompted to produce a password, an LLM draws upon its internalized distributional knowledge of what human-generated passwords characteristically look like: The prevalence of uppercase initiation, the clustering of numerals in medial positions, the predilection for terminal exclamation marks. 
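<\/p>\n<p>None of those regularities appear in output drawn through a platform CSPRNG; a minimal sketch using Python\u2019s standard library (parameters are illustrative):<\/p>\n<pre class=\"wp-block-code\"><code>import secrets
import string

# Full 94-character printable set: letters, digits, punctuation
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def csprng_password(length=16):
    # Each position is an independent, uniform draw from the
    # operating system CSPRNG; no character or position is favored
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

print(csprng_password())
<\/code><\/pre>\n<p>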
These biases are not aberrations; they are the faithful expression of training-corpus statistics.<\/p>\n<p>Irregular\u2019s research quantifies this chasm using Shannon entropy applied to observed character-frequency distributions across generation corpora. A 16-character password drawn from a genuine CSPRNG over the full 94-character ASCII set carries approximately 98 bits of entropy by this measure. <a href=\"https:\/\/www.irregular.com\/publications\/vibe-password-generation\">Claude Opus 4.6 achieves roughly 27 bits<\/a>, a deficit of approximately 72 percent relative to the cryptographic baseline. GPT-5.2\u2019s 20-character passwords, evaluated via the log-probability method, exhibit entropy closer to 20 bits. Conventional strength estimators, including the widely deployed <a href=\"https:\/\/github.com\/dropbox\/zxcvbn\">zxcvbn library<\/a>, characterize these same passwords at 98 to 100 bits. The divergence is not marginal: a shortfall of roughly 70 bits corresponds to an effective keyspace more than 20 orders of magnitude smaller than the estimators report.<\/p>\n<h2 class=\"wp-block-heading\">Temperature is not a remedy<\/h2>\n<p>A reflexive objection from practitioners familiar with LLM configuration holds that increasing sampling temperature would attenuate these distributional biases by flattening the probability landscape from which characters are drawn. Irregular\u2019s empirical results are unambiguous in refuting this intuition. Testing conducted at temperature 1.0, the maximum setting on Claude, produces no statistically meaningful improvement in effective entropy. The character-position biases are encoded in model weights, not in sampling parameters, and temperature modulation operates downstream of those weight-instantiated distributions.<\/p>\n<p>Separately, <a href=\"https:\/\/www.kaspersky.com\/blog\/international-password-day-2025\/53355\/\">Kaspersky\u2019s Data Science Team Lead Alexey Antonov<\/a> conducted a complementary investigation analyzing 1,000 passwords generated by ChatGPT, Meta\u2019s Llama, and DeepSeek. 
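<\/p>\n<p>The frequency-based entropy measurement underlying these results is straightforward to reproduce in outline; a simplified sketch in Python (hypothetical strings, not either team\u2019s data or exact methodology):<\/p>\n<pre class=\"wp-block-code\"><code>from collections import Counter
from math import log2

def corpus_entropy_bits(passwords):
    # Pool all characters, estimate H = -sum(p * log2(p)) per
    # character, then scale by the average password length
    chars = ''.join(passwords)
    counts = Counter(chars)
    total = len(chars)
    per_char = -sum(n / total * log2(n / total) for n in counts.values())
    return per_char * (total / len(passwords))

# A hypothetical 50-draw corpus in which one string recurs 18 times
# scores well below the roughly 98-bit uniform baseline
biased = ['G7kL9mQ2xP4wZ8nT'] * 18 + ['R3vB6yH1sD5cF9gJ'] * 32
print(round(corpus_entropy_bits(biased), 1))
<\/code><\/pre>\n<p>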
The character-frequency histograms disclosed pronounced non-uniformity across all three models: ChatGPT exhibits a systematic preference for the characters x, p, and L; Llama for the hash symbol and the letter p; DeepSeek for t and w. At temperature 0.0, Claude produces the identical string on every invocation. These findings are consistent across different model families and measurement methodologies, corroborating the structural rather than incidental nature of the vulnerability.<\/p>\n<p>The practical corollary is that an adversary who has identified the LLM used to generate a target credential need not attempt exhaustive brute-force against a 94^16 keyspace. They can construct a model-specific attack dictionary, ordering candidates by their empirical generation frequency, and execute a probabilistically optimized search against a keyspace several orders of magnitude smaller. Kaspersky\u2019s cracking tests found that 88 percent of DeepSeek passwords and 87 percent of Llama passwords failed to withstand targeted attack, as did 33 percent of ChatGPT passwords, all using standard GPU hardware.<\/p>\n<h2 class=\"wp-block-heading\">The agentic injection problem<\/h2>\n<p>The portion of this problem amenable to user education (practitioners being counseled not to solicit passwords from conversational AI interfaces) represents a fraction of the aggregate exposure. The more consequential and considerably less tractable vector is autonomous credential generation by AI coding agents embedded in professional development toolchains.<\/p>\n<p>When an AI coding agent such as <a href=\"https:\/\/github.blog\/2023-07-28-smarter-more-efficient-coding-github-copilot-goes-beyond-codex-with-improved-ai-model\/\">GitHub Copilot<\/a>, Claude Code, or an analogous instrument receives a task specification entailing database initialization, containerized service configuration, or API bootstrapping, it generates credentials as a functional prerequisite of task completion. 
No explicit instruction to produce a password is required; the agent infers necessity from context. The resulting credential is embedded in a Docker Compose environment variable, a .env configuration file, or a Kubernetes secret manifest and is committed to version control by a developer whose attentional resources are directed at functional correctness, not credential provenance.<\/p>\n<p>The <a href=\"https:\/\/genai.owasp.org\/llm-top-10\/\">OWASP Top 10 for LLM Applications 2025<\/a> designates improper output handling as a critical risk category, one that encompasses precisely this failure mode, wherein LLM-generated content is consumed without appropriate validation by downstream systems and processes. The credential thus introduced is not flagged by <a href=\"https:\/\/github.com\/gitleaks\/gitleaks\">Gitleaks<\/a> or <a href=\"https:\/\/github.com\/trufflesecurity\/trufflehog\">TruffleHog<\/a>, because those tools employ pattern-matching against known secret formats and have no capacity to evaluate the character-position entropy distribution that distinguishes a CSPRNG-derived credential from an LLM-derived one.<\/p>\n<h2 class=\"wp-block-heading\">Organizational response priorities<\/h2>\n<p>The remediation landscape is tractable for organizations prepared to act methodically. The following priorities are sequenced by immediacy of risk reduction.<\/p>\n<p>Conduct a retrospective audit of all AI-assisted repositories dating back to early 2023, when agentic coding tools achieved widespread enterprise adoption. Particular scrutiny should be directed at configuration files, Docker Compose YAML, and .env entries. Credentials exhibiting LLM-characteristic distributional signatures (consistent uppercase initialization, medial numeral clustering, terminal special characters) warrant investigation regardless of their apparent complexity.<\/p>\n<p>Rotate every credential whose provenance cannot be affirmatively traced to a CSPRNG invocation. 
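<\/p>\n<p>Replacement values should come directly from such an invocation; in Python, for example:<\/p>\n<pre class=\"wp-block-code\"><code>import secrets

# 32 bytes from the OS CSPRNG, URL-safe Base64 encoded: roughly
# 256 bits of entropy, versus the 20 to 27 bits measured for
# LLM-generated passwords
new_secret = secrets.token_urlsafe(32)
print(new_secret)
<\/code><\/pre>\n<p>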
The canonical CSPRNG interfaces (Python\u2019s secrets.token_urlsafe(), openssl rand -base64, \/dev\/urandom) are the only acceptable sources. An audit trail establishing provenance is operationally valuable; absent such a trail, the presumption should favor rotation.<\/p>\n<p>Amend AI coding tool system prompts and secure development guidelines to mandate explicit CSPRNG invocation for all credential generation. The instruction must be categorical: The agent generates no password strings; it calls the appropriate platform function. This single-sentence policy amendment, consistently enforced, prevents this class of agentic injection at its origination point.<\/p>\n<p>Augment static secret scanning with entropy-aware analysis capable of evaluating character-position distributions rather than merely pattern-matching against known formats. This capability gap is currently the central technical challenge in operationalizing detection for this threat class.<\/p>\n<p>Escalate to LLM vendors through enterprise agreement channels. The architectural fix, routing password generation requests to a CSPRNG backend rather than processing them through the autoregressive generation pipeline, is an engineering decision available to AI providers. <a href=\"https:\/\/pages.nist.gov\/800-63-4\/sp800-63b\/passwords\/\">NIST SP 800-63B Revision 4<\/a>, released in August 2025, establishes unambiguous guidance on entropy requirements for authentication credentials. Vendor accountability to that standard is a legitimate contractual expectation.<\/p>\n<h2 class=\"wp-block-heading\">The broader epistemological challenge<\/h2>\n<p>The phenomenon of LLM-generated passwords, now being called \u2018vibe passwords\u2019 in security community discourse (an appellation that captures the verisimilitude without the substance), is a specific instantiation of a broader epistemological challenge that will recur as AI-generated content becomes more deeply entangled with security-sensitive infrastructure. 
The training objective that makes large language models extraordinarily capable of producing contextually appropriate, humanistically plausible outputs is structurally incompatible with the mathematical requirements of cryptographic security, which demand genuine unpredictability precisely where pattern and plausibility offer no traction.<\/p>\n<p>The diagnostic tools and remediation pathways exist. What the security community requires, with some urgency, is the systematic awareness that the problem has already propagated into production environments at a scale that warrants immediate and deliberate organizational response: not anticipatory policy, but retrospective investigation.<\/p>\n<p><strong>This article is published as part of the Foundry Expert Contributor Network.<\/strong><br \/><strong><a href=\"https:\/\/www.csoonline.com\/expert-contributor-network\/\">Want to join?<\/a><\/strong><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>Two independent research programs, one from AI security firm Irregular, one from Kaspersky, have now converged on the same conclusion: Every frontier LLM generates structurally predictable passwords that standard entropy meters catastrophically overrate. AI coding agents are autonomously embedding those credentials in production infrastructure, and conventional secret scanners have no mechanism to detect them. 
As [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":7743,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-7742","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-education"],"_links":{"self":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/7742"}],"collection":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7742"}],"version-history":[{"count":0,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/posts\/7742\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=\/wp\/v2\/media\/7743"}],"wp:attachment":[{"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cybersecurityinfocus.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}