Ship with a reviewer
that finds real bugs.
Factory reviews every pull request against a rigorous bug-detection rubric. Findings come with P0-P3 severity, inline suggestions, and an approval when the diff is clean. Runs automatically in CI or locally with the /review command.
Wire it in once. Review every PR from then on.
Installed with a single `/install-code-review` command. Factory's GitHub App and GitLab workflow handle the rest.
01
Triggers on every PR
Runs on opened, synchronize, reopened, and ready-for-review events. Drafts are skipped so nothing fires during exploration.
02
Analyzes the diff
Fetches the PR diff and existing comments, then traces changed data flows across auth, validation, database, network, and filesystem boundaries.
03
Comments inline, approves clean diffs
Posts comments on problematic lines with severity, reasoning, and a suggested fix. Submits an approval when no issues are found.
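The trigger rule from step 01 can be pictured as a small predicate. This is only an illustrative sketch: the `should_review` function and its arguments are hypothetical, not Factory's implementation. The string `ready_for_review` is GitHub's spelling of the ready-for-review pull request action.

```python
# Hypothetical sketch of the trigger rule: start a review on the four
# listed PR actions and skip drafts. Not Factory's actual code.
REVIEW_ACTIONS = {"opened", "synchronize", "reopened", "ready_for_review"}


def should_review(action: str, is_draft: bool) -> bool:
    """Return True when a PR event should start a review run."""
    return action in REVIEW_ACTIONS and not is_draft


print(should_review("opened", is_draft=False))       # True
print(should_review("synchronize", is_draft=True))   # False (draft)
print(should_review("closed", is_draft=False))       # False (not a trigger)
```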
Bugs that would have caused an incident. Nothing that slows the review down.
The reviewer is calibrated to surface clear, actionable defects. It skips stylistic concerns, minor optimizations, and architectural opinions so reviews don't turn into noise.
- Dead or unreachable code
- Broken control flow (missing break, fallthrough bugs)
- Async / await mistakes
- Null or undefined dereferences
- Resource leaks
- SQL injection and XSS
- Missing error handling
- Off-by-one errors
- Race conditions
Four levels. Matched to how your team triages.
Every finding ships with a priority so the review is actionable at a glance. The same rubric applies to standard code review and the dedicated security pass.
P0
Critical
Blocks release or operations. RCE, hardcoded prod secret, auth bypass, unauthenticated admin endpoint.
P1
High
Address next cycle. SQL injection behind auth, stored XSS, sensitive-data IDOR, very new dependency.
P2
Medium
Fix eventually. CSRF on state-changing ops, information disclosure, prompt injection behind auth.
P3
Low
Nice-to-have. Minor security hardening with a concrete but low-impact exploit path.
A finding only ships when all eight are true.
This is why our review stays signal-dense instead of drowning authors in opinions. Every comment is accompanied by clear reasoning, an appropriate severity, a concrete location, and a suggested fix where one is available.
01
Meaningful impact
Affects accuracy, performance, security, or maintainability.
02
Discrete and actionable
Clear, specific issue with a clear fix. No vague hand-waves.
03
Appropriate rigor
Fix doesn't demand more rigor than the rest of the codebase.
04
Introduced in changes
The bug was added in the reviewed diff, not pre-existing.
05
Worth fixing
The author would likely fix it if made aware.
06
No unstated assumptions
Based on verifiable facts about the code, not speculation.
07
Provably affected
We can identify specific affected code, not theoretical scenarios.
08
Not intentional
Clearly not a deliberate design choice.
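One way to picture the eight-criteria gate is as a record of eight booleans that must all hold. The sketch below is an assumption-laden illustration: the `Criteria` class, its field names, and the `ships` and `failed` helpers are hypothetical and simply mirror the card titles above.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Criteria:
    """One boolean per rule; field names are illustrative, not Factory's."""
    meaningful_impact: bool
    discrete_and_actionable: bool
    appropriate_rigor: bool
    introduced_in_changes: bool
    worth_fixing: bool
    no_unstated_assumptions: bool
    provably_affected: bool
    not_intentional: bool


def ships(c: Criteria) -> bool:
    """A finding is posted only when every criterion holds."""
    return all(getattr(c, f.name) for f in fields(c))


def failed(c: Criteria) -> list[str]:
    """Names of the criteria a candidate finding does not meet."""
    return [f.name for f in fields(c) if not getattr(c, f.name)]


all_true = Criteria(*([True] * 8))
pre_existing = replace(all_true, introduced_in_changes=False)

print(ships(all_true))       # True
print(ships(pre_existing))   # False
print(failed(pre_existing))  # ['introduced_in_changes']
```

A single failed criterion is enough to drop the finding, which is what keeps pre-existing bugs and deliberate design choices out of the review.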
Dial thoroughness to match the risk of the repo.
Pick the depth per workflow, or override model and reasoning effort directly. Run deep reviews on your shared services and shallow reviews on internal tooling.
Default preset
Deep
High reasoning effort catches subtle bugs across control flow, concurrency, and security boundaries. Best for production code and security-sensitive repositories.
- Higher-reasoning frontier model
- Two-pass candidate + validation loop
- Full diff and cross-file traces
with:
  automatic_review: true
  review_depth: deep
Fast + cost-efficient
Shallow
Surface-level review for high-volume repositories, draft PRs, or teams watching spend. Fast feedback with lower cost per PR.
- Lower-latency model
- Skims for common bug classes
- Ideal for pre-merge early checks
with:
  automatic_review: true
  review_depth: shallow
A dedicated security pass on every PR.
Enable a two-pass security workflow that traces untrusted input across trust boundaries, validates exploitability, and reports only findings with a realistic path to impact.
Triggers
- `@droid security`: on-demand review of this PR
- `@droid security --full`: full-repo audit, opens a PR with the report
- `automatic_security_review: true`: every non-draft PR, no comment required
STRIDE threat modeling
Spoofing, tampering, repudiation, information disclosure, denial of service, elevation of privilege.
OWASP Top 10:2021
Access control, crypto, injection, misconfig, auth, SSRF, logging, integrity.
OWASP LLM Top 10
Prompt injection, sensitive disclosure, insecure output, excessive agency, vector weaknesses.
Supply-chain analysis
Typosquatting, install scripts, overly broad ranges, newly published packages.
Pipeline
Candidate generation → validation. The validation pass re-checks every candidate for reachability, exploitability, and existing controls before it is posted. See Security Review for the full methodology.
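The validation step can be sketched as a filter over candidates. Everything here is hypothetical: the `Candidate` record, its three flags, and the `validate` function are illustrative stand-ins for the reachability, exploitability, and existing-controls checks, not the actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A possible finding from the first pass (illustrative shape)."""
    title: str
    reachable: bool    # untrusted input can actually reach the code
    exploitable: bool  # there is a realistic path to impact
    mitigated: bool    # an existing control already blocks it


def validate(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that pass all three re-checks."""
    return [
        c for c in candidates
        if c.reachable and c.exploitable and not c.mitigated
    ]


candidates = [
    Candidate("SQL injection in search", True, True, False),
    Candidate("XSS in admin note", True, True, True),    # already escaped
    Candidate("SSRF in dead code", False, True, False),  # unreachable
]

for finding in validate(candidates):
    print(finding.title)  # SQL injection in search
```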
Review anything before it ships.
Same rubric as the CI reviewer, running in your terminal. Use it as a pre-PR smoke test, a WIP check, or to dig into a teammate's commit after the fact.
droid
> /review
Against a base branch
PR-style review comparing your branch to any local or remote target. Ideal before opening a PR.
Uncommitted changes
Reviews staged, unstaged, and untracked files in your working directory. Fast sanity check before committing.
A specific commit
Browse recent commits, pick one, and get a review of just that change set.
Custom instructions
Define your own review criteria - e.g. "focus on performance regressions and unnecessary re-renders".
Make it your reviewer.
The default rubric is strong out of the box. When you need repository-specific rules, layer them in without forking the workflow.
Repo-specific guidelines
Drop a SKILL.md at .factory/skills/review-guidelines/. It is automatically injected into every review run - no workflow edits needed.
Model and reasoning overrides
Override review_model or reasoning_effort per workflow. Use a smaller model for high-volume repos, a heavier one for shared services.
Path filters and skip rules
Scope to src/, skip generated code or bot PRs, or add a [skip-review] title marker to opt out of individual PRs.
Comment budgets
Tell the reviewer to submit at most N comments, prioritizing the most critical issues. Tune the signal density to match your team.
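A comment budget amounts to ranking findings by priority and keeping the first N. The sketch below shows one plausible way to do that; the `Priority` enum and `apply_budget` function are hypothetical names, and the real reviewer may rank differently.

```python
from enum import IntEnum


class Priority(IntEnum):
    """P0 is most critical, so lower values sort first."""
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3


Finding = tuple[Priority, str]


def apply_budget(findings: list[Finding], max_comments: int) -> list[Finding]:
    """Keep at most max_comments findings, most critical first.

    sorted() is stable, so findings with equal priority keep their
    original order.
    """
    ranked = sorted(findings, key=lambda f: f[0])
    return ranked[:max(0, max_comments)]


findings = [
    (Priority.P2, "CSRF on settings form"),
    (Priority.P0, "Auth bypass in login handler"),
    (Priority.P3, "Missing security header"),
    (Priority.P1, "SQL injection behind auth"),
]

for priority, summary in apply_budget(findings, max_comments=2):
    print(priority.name, summary)
# P0 Auth bypass in login handler
# P1 SQL injection behind auth
```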
167 validated bugs. 50 PRs. 5 open-source codebases.
The Factory Review Benchmark scores frontier and open-source models on the real bugs that slipped through human review in Sentry, Grafana, Keycloak, Discourse, and Cal.com. F1 combines precision (fraction of comments that are real bugs) with recall (fraction of golden bugs caught).
| Rank | Model | Mean F1 | Cost / PR |
| --- | --- | --- | --- |
| 01 | GPT-5.2 | 60.5% | $1.25 |
| 02 | Claude Opus 4.6 | 59.8% | $3.11 |
| 03 | Claude Sonnet 4.6 | 57.9% | $1.15 |
Top 3 of 13 models benchmarked. Last updated April 2026.
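For readers who want the arithmetic behind the F1 column, here is a minimal sketch using the standard harmonic-mean definition. The counts in the example are made up for illustration and are not benchmark data, and the sketch assumes each correct comment matches exactly one golden bug. How scores are averaged into "Mean F1" is not specified here, so that step is left out.

```python
def f1_score(true_positives: int, comments_posted: int, golden_bugs: int) -> float:
    """F1 from raw counts: harmonic mean of precision and recall."""
    if true_positives == 0 or comments_posted == 0 or golden_bugs == 0:
        return 0.0
    precision = true_positives / comments_posted  # comments that are real bugs
    recall = true_positives / golden_bugs         # golden bugs that were caught
    return 2 * precision * recall / (precision + recall)


# Hypothetical review: 12 comments posted, 9 of them real, 15 golden bugs.
score = f1_score(true_positives=9, comments_posted=12, golden_bugs=15)
print(f"{score:.1%}")  # 66.7%
```

Here precision is 9/12 = 75% and recall is 9/15 = 60%, so F1 lands between the two at about 66.7%.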
Configure reviews without leaving your workflow.
Every knob is available as an input on the Factory Droid action.
| Input | Default | Description |
| --- | --- | --- |
| automatic_review | false | Automatically review PRs without requiring @droid review. |
| review_depth | deep | Preset: deep (thorough) or shallow (fast). |
| review_model | from depth | Override model used for code review. |
| reasoning_effort | from depth | Override reasoning effort. |
| include_suggestions | true | Include code suggestion blocks in comments. |
| automatic_security_review | false | Run the security pass on every non-draft PR. |
| security_model | "" | Override model for security candidate generation and full-repo scans. |
| security_severity_threshold | medium | Full-repo scans only: minimum severity to include in the report. |
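To see how the defaults and overrides fit together, the sketch below merges user-supplied inputs over the documented defaults. It is a hypothetical model of the behavior, not the action's code: `resolve_inputs` is an invented helper, and using `None` to stand for "from depth" is just a representational choice.

```python
# Defaults copied from the inputs listed above. None stands in for
# "from depth" (an assumption about how to represent a derived value).
DEFAULTS = {
    "automatic_review": False,
    "review_depth": "deep",
    "review_model": None,
    "reasoning_effort": None,
    "include_suggestions": True,
    "automatic_security_review": False,
    "security_model": "",
    "security_severity_threshold": "medium",
}

VALID_DEPTHS = {"deep", "shallow"}


def resolve_inputs(overrides: dict) -> dict:
    """Merge overrides onto the defaults, rejecting unknown or invalid inputs."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown inputs: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    if merged["review_depth"] not in VALID_DEPTHS:
        raise ValueError(f"review_depth must be one of {sorted(VALID_DEPTHS)}")
    return merged


config = resolve_inputs({"automatic_review": True, "review_depth": "shallow"})
print(config["review_depth"])         # shallow
print(config["include_suggestions"])  # True (default kept)
```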
Install the reviewer. Let it catch what your team shouldn't have to.
Run `/install-code-review` in a Droid session to scaffold the workflow for GitHub or GitLab - or read the docs first.

