AI use case

GitHub Copilot Secret Scanning: AI-Powered Generic Password Detection

At a glance

Company/Organization: GitHub
Industry: Application Software
Location: San Francisco

Record fields

Title

GitHub Copilot Secret Scanning: AI-Powered Generic Password Detection

Content

GitHub built Copilot Secret Scanning, an AI-powered feature within GitHub Secret Protection that uses large language models to detect generic passwords in codebases, a class of credentials that traditional regex-based scanners could not identify reliably. The feature became generally available in October 2024, is now deployed across nearly 35% of all GitHub Secret Protection repositories, and has achieved up to a 94% reduction in false positives across organizations.
GitHub's first iteration used GPT-3.5-Turbo with few-shot prompting, a resource-efficient model choice for running detection at scale. The architecture routes potential-secret findings through an LLM prompt that includes the file's source code, location, and usage context, and it requires output in a strict JSON format so that results can be processed automatically. Private preview participants surfaced a problem: the model worked well on conventional coding patterns but failed spectacularly on unconventional file types and structures not typically seen in LLM training data.
To address this, the team enhanced its offline evaluation framework by adding real customer reports and by using GPT-4 to generate test cases from existing secret scanning alerts in open source repositories. It then ran systematic experiments across models (GPT-3.5-Turbo, GPT-4, GPT-4-Turbo, GPT-4o-mini) and prompting strategies (few-shot, zero-shot, Fill-in-the-Middle, Chain-of-Thought). In collaboration with Microsoft, the team adopted the MetaReflection technique, a novel offline reinforcement learning approach that synthesizes experiential learnings into a hybrid Chain-of-Thought and few-shot prompt, improving precision with only a small recall penalty. GitHub also tried voting (asking the model the same question many times) to get more deterministic responses, and it used GPT-4 as a confirming scanner for candidates found by GPT-3.5-Turbo. The final production stack combines all of these techniques.
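The voting and confirming-scanner steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not GitHub's implementation: the `Model` callable, the `is_password` JSON field, the prompt wording, and the sample counts are all hypothetical, since the real prompt and output schema are not given in this record.

```python
import json
from collections import Counter
from typing import Callable, Optional

# Hypothetical type: any function that takes a prompt and returns the
# model's raw text reply. A real client call would be wrapped to fit this.
Model = Callable[[str], str]


def build_prompt(source: str, line: int, candidate: str) -> str:
    """Assumed prompt shape: file contents, location, and a JSON-only reply rule."""
    return (
        "Decide whether the candidate string is a hard-coded password.\n"
        f"Candidate: {candidate!r} (line {line})\n"
        f"File contents:\n{source}\n"
        'Reply with JSON only, for example {"is_password": true}.'
    )


def parse_verdict(raw: str) -> Optional[bool]:
    """Return the boolean verdict, or None if the reply breaks the JSON contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    verdict = data.get("is_password")
    return verdict if isinstance(verdict, bool) else None


def majority_vote(model: Model, prompt: str, samples: int = 5) -> bool:
    """Ask the same question several times and keep the most common valid answer."""
    votes: Counter = Counter()
    for _ in range(samples):
        verdict = parse_verdict(model(prompt))
        if verdict is not None:
            votes[verdict] += 1
    # Ties and all-invalid runs fall back to "not a password" to favor precision.
    return votes[True] > votes[False]


def scan(candidate: str, source: str, line: int,
         first_pass: Model, confirmer: Model) -> bool:
    """Flag a candidate only if the cheap model votes yes and the stronger model agrees."""
    prompt = build_prompt(source, line, candidate)
    if not majority_vote(first_pass, prompt):
        return False
    return majority_vote(confirmer, prompt, samples=1)
```

Passing the models in as plain callables keeps the sketch independent of any particular client library, and rejecting malformed replies in `parse_verdict` is one way to enforce a strict output format before results reach automated processing.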
The most impactful scale engineering was a workload-aware request management system. Secret Scanning must process both incoming Git pushes and full Git history scans, and their traffic patterns differ: pushes correlate with working hours, while history scans correlate with discrete events such as enabling the feature on a new organization. With separate per-workload rate limits, one workload could hit its limit while the other left capacity idle. GitHub built a weighted fair-priority queue that lets resources flow between workloads, drawing on YouTube's Doorman and GitHub's own Freno. The algorithm was effective enough that GitHub reused it in Copilot Autofix and security campaigns.
Confidence for general availability came from a mirror testing framework: GitHub rescanned the public preview repositories with the latest improvements to measure the impact on real alert volumes and false positive rates, without exposing users to unreleased behavior. The result was a 94% drop in false positives, with very few real passwords missed.
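To illustrate how a weighted fair queue lets capacity flow between the push and history-scan workloads described above, here is a minimal Python sketch. It uses a textbook self-clocked fair queueing scheme based on virtual finish times, chosen for brevity. It is not GitHub's algorithm, nor Doorman's or Freno's, and the class name, workload names, and weights are assumptions made for the example.

```python
import heapq
from itertools import count
from typing import Any, Dict, List, Optional, Tuple


class WeightedFairQueue:
    """Shares one request budget across workloads in proportion to their weights."""

    def __init__(self, weights: Dict[str, float]) -> None:
        self._weights = weights
        # Virtual finish time of the most recent request submitted per workload.
        self._last_finish = {name: 0.0 for name in weights}
        # Virtual clock: finish tag of the request most recently dispatched.
        self._virtual_time = 0.0
        self._heap: List[Tuple[float, int, str, Any]] = []
        self._seq = count()  # tie-breaker so equal tags dispatch in FIFO order

    def submit(self, workload: str, request: Any, cost: float = 1.0) -> None:
        # A workload that was idle restarts from the current virtual time,
        # so it cannot bank unused capacity and later starve the others.
        start = max(self._virtual_time, self._last_finish[workload])
        finish = start + cost / self._weights[workload]
        self._last_finish[workload] = finish
        heapq.heappush(self._heap, (finish, next(self._seq), workload, request))

    def next_request(self) -> Optional[Tuple[str, Any]]:
        """Dispatch the request with the smallest virtual finish time, if any."""
        if not self._heap:
            return None
        finish, _, workload, request = heapq.heappop(self._heap)
        self._virtual_time = finish
        return workload, request


# Hypothetical weights: pushes get twice the share of history scans.
queue = WeightedFairQueue({"push": 2.0, "history": 1.0})
for i in range(6):
    queue.submit("push", f"push-{i}")
for i in range(6):
    queue.submit("history", f"history-{i}")
first_six = [queue.next_request()[0] for _ in range(6)]
```

With both workloads backlogged, `first_six` contains four `push` entries and two `history` entries, matching the 2:1 weights. When only one workload has requests queued, it is served at the full rate, which is the behavior that fixed per-workload limits cannot provide.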
