Two independent datasets published in the last week of June put numbers on a shift most Swiss engineering organisations have felt but few have measured. A Flux survey of engineering leaders finds that nearly half of organisations now run AI-generated code in production, with fewer than 5% having no adoption plans at all. Cobalt's AI and Pentesting Pulse Report 2026, drawn from actual penetration-test findings rather than self-reporting, shows what that code and its surrounding features look like under adversarial pressure: AI and LLM pentests produce high-risk findings at 2.7 times the rate of every other system class, and the fix rate for those findings is the worst of any asset category tested. Read together, the two reports describe a production landscape whose risk profile has changed faster than the controls around it — a gap with specific consequences for Swiss firms whose software falls under FINMA governance expectations, the nDSG, or the EU's regulatory reach.
Adoption Outran the Feedback Loop
The Flux data shows why adoption keeps accelerating: two-thirds of organisations report higher productivity and a similar share cite faster prototyping. But the survey contains a corrective buried in the expectations gap — nearly half of non-adopters expect AI to reduce error rates, while only one-third of actual users observe that benefit. The productivity gains are real; the quality gains are largely assumed. Meanwhile the human control layer is saturating: roughly one in ten developers already spends over 40% of their time on code review, and about a third of engineering leaders say they cannot keep pace with the volume of weekly changes. The report's most operationally significant finding is which changes slip through when reviewers saturate: security modifications, dependency shifts, and performance-impacting alterations — precisely the categories where a subtle defect becomes an exploitable one.
This is the structural problem beneath the tooling debate. AI assistants multiply the volume of code flowing toward production while leaving review capacity flat. Organisations have responded — more than 80% have adjusted development and release processes, and close to half have bought code-quality analysis tooling — but a process adjusted around a saturated bottleneck is still bottlenecked. Nearly two-thirds of surveyed leaders believe AI could match human review performance, and 76% want tooling that specifically reduces AI-code risk, which signals where the market is heading: AI reviewing AI, with humans auditing the loop rather than every diff.
What the Pentest Data Actually Shows
Cobalt's report replaces anxiety with a measured baseline, and the baseline is poor. Roughly one in three findings from AI and LLM feature pentests is classified high-risk, against about one in eight for conventional systems. The vulnerability mix is a compounding of old and new: AI features inherit the classic web weaknesses — injection, broken authentication, insecure output handling — while adding prompt injection and model-level denial of service, classes for which most SDLC checklists have no entry. Mozilla's 0DIN research this same week demonstrated the practical end of that spectrum, showing that a malicious repository can hijack an AI coding agent into compromising a developer workstation without containing a single line of conventionally malicious code.
The remediation figures are the report's real warning for regulated firms. The resolution rate for serious AI findings fell to 38.4%, meaning roughly three of every five serious AI vulnerabilities identified in testing (61.6%) remain unfixed, the worst remediation performance of any asset class. Cobalt attributes the backlog to scarce dual-competence staff, dependence on model vendors for fixes the customer cannot apply, and immature security processes in fast-moving AI projects. The incident data adds that the largest single cause of confirmed AI security incidents is not one of the exotic new attack classes but shadow AI: employees feeding sensitive data into unapproved tools account for 44% of incidents, comfortably ahead of data poisoning and supply-chain vectors.
The Swiss Compliance Angle Is Already Concrete
For Swiss organisations this is not a future-regulation problem. FINMA's supervisory expectations on AI governance require banks and insurers to inventory their AI applications, assign accountability, and demonstrate risk-appropriate controls — an expectation that plainly covers AI-generated code shipped into client-facing systems and AI features bolted onto banking platforms. The nDSG applies in full to personal data leaking through an insecure AI feature or a shadow-AI paste into a public chatbot, and the FDPIC has shown no appetite for treating AI incidents as a special category deserving leniency. Swiss firms serving the EU market add the AI Act's obligations for general-purpose and high-risk systems, whose next application deadline lands in August — weeks away, not quarters.
The practical translation: a Swiss bank whose developers merge AI-generated changes into payment code, whose product team ships an LLM-powered client assistant, and whose staff quietly use consumer AI tools is running three distinct risk programmes' worth of exposure, usually under one unwritten policy. The Cobalt numbers say the pentest findings will come; the Flux numbers say the review layer will not catch everything first; and the 38.4% remediation rate says the findings will accumulate unless someone owns them with the same discipline applied to any other critical vulnerability class.
◆ Key Takeaway
The debate over whether AI-generated code is production-ready is over: nearly half of organisations already ship it. The data now shows the cost side: AI features produce high-risk pentest findings at 2.7x the baseline rate, only 38% of serious findings get fixed, and shadow AI causes 44% of confirmed incidents. For Swiss regulated firms, the control framework (inventory, review gates, remediation ownership) is a current supervisory expectation, not a roadmap item.
- Tag AI-generated code at commit time. Provenance metadata (tool, model, human reviewer) makes AI-origin code auditable, lets you measure its defect rate against human-written code, and answers the inventory question FINMA-aligned governance will ask.
- Gate high-risk change categories for mandatory senior review. Security-relevant modifications, dependency changes, and authentication or crypto code — the categories the Flux data says slip through — should never merge on an AI review alone.
- Add prompt injection and insecure output handling to your SDLC checklists. Test every LLM feature against the OWASP LLM Top 10 before release, and include AI features explicitly in annual pentest scope rather than treating them as part of the web estate.
- Set a remediation SLA specifically for AI findings. The 38.4% industry fix rate is a benchmark to beat, not accept; track AI-finding closure separately and escalate vendor-dependent fixes through contract channels with deadlines.
- Attack shadow AI with provision, not prohibition. A sanctioned, logged, enterprise-grade AI tool with clear data rules removes the incentive that drives 44% of incidents; a ban simply moves usage to personal devices.
- Sandbox AI coding agents. Agents that read repositories and execute commands need the containment 0DIN's research demands: isolated execution environments, allowlisted tools, no ambient credentials, and human approval for state-changing actions.
- Report AI-code risk to the same committee that owns operational risk. Adoption metrics, review-coverage rates, pentest findings, and remediation SLAs belong in existing governance — creating a separate AI committee usually means creating a separate blind spot.
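The first two recommendations above, provenance tagging and gated review, can be illustrated with a short sketch. It assumes provenance is recorded as trailer lines at the end of the commit message. The trailer names (AI-Tool, AI-Model, Reviewed-by), the path patterns, and the senior-reviewer list are hypothetical placeholders, not conventions taken from either report.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence, Set

# Hypothetical trailer names; any consistent convention would do.
TRAILER_KEYS = {"ai-tool", "ai-model", "reviewed-by"}

# Hypothetical patterns approximating the high-risk categories:
# dependency manifests plus authentication, crypto and security code.
HIGH_RISK_PATTERNS = (
    "*/requirements*.txt", "*/package.json", "*/go.mod",
    "*/auth/*", "*/crypto/*", "*/security/*",
)


@dataclass
class Provenance:
    tool: Optional[str] = None
    model: Optional[str] = None
    reviewer: Optional[str] = None

    @property
    def ai_generated(self) -> bool:
        # A commit counts as AI-origin if it names a generating tool.
        return self.tool is not None


def parse_provenance(commit_message: str) -> Provenance:
    """Read 'Key: value' trailers from the last paragraph of a commit message."""
    last_block = commit_message.strip().split("\n\n")[-1]
    found = {}
    for line in last_block.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in TRAILER_KEYS:
            found[key.strip().lower()] = value.strip()
    return Provenance(found.get("ai-tool"), found.get("ai-model"),
                      found.get("reviewed-by"))


def is_high_risk(path: str) -> bool:
    """True if a repository-relative path matches any high-risk pattern."""
    # The leading "/" lets "*/name" patterns match top-level files too.
    return any(fnmatchcase("/" + path, p) for p in HIGH_RISK_PATTERNS)


def merge_allowed(prov: Provenance, changed_paths: Sequence[str],
                  senior_reviewers: Set[str]) -> bool:
    """Block high-risk changes unless a listed senior reviewer signed off."""
    if not any(is_high_risk(p) for p in changed_paths):
        return True
    return prov.reviewer in senior_reviewers


message = """Rotate refresh tokens on use

AI-Tool: example-assistant
AI-Model: example-model
Reviewed-by: senior.dev@example.ch"""

prov = parse_provenance(message)
print(prov.ai_generated)
print(merge_allowed(prov, ["src/auth/token.py"], {"senior.dev@example.ch"}))
print(merge_allowed(prov, ["src/auth/token.py"], set()))
```

A check of this kind could run in CI before merge. The same trailers would also give the inventory a queryable record of which commits were AI-assisted and who reviewed them.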
The next phase is predictable from the incentives. AI-assisted development will keep expanding because the productivity gains are real and measured; the review bottleneck will push organisations toward AI-driven review of AI-written code; and the security tooling market will chase the 76% of leaders asking for exactly that. What will separate Swiss organisations is not adoption speed but whether the control loop — provenance, gated review, AI-aware testing, owned remediation — matures at the same rate as the code volume. The firms that build that loop now will treat next year's supervisory questionnaire as paperwork. The ones that let the 38% fix rate become their own number will meet it as a finding.