Factory.ai

Ship with a reviewer
that finds real bugs.

Wire it in once. Review every PR from then on.

Installed with a single `/install-code-review` command. Factory's GitHub App and GitLab workflow handle the rest.

01

Triggers on every PR

Runs on opened, synchronize, reopened, and ready-for-review events. Drafts are skipped so nothing fires during exploration.
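For a GitHub Actions setup, that trigger surface roughly corresponds to a workflow like the sketch below. This is illustrative only; the actual workflow scaffolded by `/install-code-review` may differ, and the step body is a placeholder.

```yaml
# Sketch: the PR events the reviewer listens to.
# The job-level `if` skips draft PRs so nothing fires during exploration.
name: code-review
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]

jobs:
  review:
    if: ${{ !github.event.pull_request.draft }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "review runs here"  # placeholder for the real action
```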

02

Analyzes the diff

Fetches the PR diff, existing comments, and traces changed data flows across auth, validation, database, network, and filesystem boundaries.

03

Comments inline, approves clean diffs

Posts comments on problematic lines with severity, reasoning, and a suggested fix. Submits an approval when no issues are found.

Bugs that would have caused an incident. Nothing that slows the review down.

The reviewer is calibrated to surface clear, actionable defects. It skips stylistic concerns, minor optimizations, and architectural opinions so reviews don't turn into noise.

  • Dead or unreachable code
  • Broken control flow (missing break, fallthrough bugs)
  • Async / await mistakes
  • Null or undefined dereferences
  • Resource leaks
  • SQL and XSS injection
  • Missing error handling
  • Off-by-one errors
  • Race conditions

Four levels. Matched to how your team triages.

Every finding ships with a priority so the review is actionable at a glance. The same rubric applies to standard code review and the dedicated security pass.

P0

Critical

Blocks release or operations. RCE, hardcoded prod secret, auth bypass, unauthenticated admin endpoint.

P1

High

Address next cycle. SQL injection behind auth, stored XSS, sensitive-data IDOR, newly published dependency.

P2

Medium

Fix eventually. CSRF on state-changing ops, information disclosure, prompt injection behind auth.

P3

Low

Nice-to-have. Minor security hardening with a concrete but low-impact exploit path.

A finding ships only when all eight criteria are true.

This is why our review stays signal-dense instead of drowning authors in opinions. Every comment is accompanied by clear reasoning, an appropriate severity, a concrete location, and a suggested fix where one is available.

01

Meaningful impact

Affects accuracy, performance, security, or maintainability.

02

Discrete and actionable

A specific issue with a concrete fix. No vague hand-waves.

03

Appropriate rigor

Fix doesn't demand more rigor than the rest of the codebase.

04

Introduced in changes

The bug was added in the reviewed diff, not pre-existing.

05

Worth fixing

The author would likely fix it if made aware.

06

No unstated assumptions

Based on verifiable facts about the code, not speculation.

07

Provably affected

We can identify specific affected code, not theoretical scenarios.

08

Not intentional

Clearly not a deliberate design choice.

Dial thoroughness to match the risk of the repo.

Pick the depth per workflow, or override model and reasoning effort directly. Run deep reviews on your shared services and shallow reviews on internal tooling.

Default preset

Deep

High reasoning effort catches subtle bugs across control flow, concurrency, and security boundaries. Best for production code and security-sensitive repositories.

  • Higher-reasoning frontier model
  • Two-pass candidate + validation loop
  • Full diff and cross-file traces

```yaml
with:
  automatic_review: true
  review_depth: deep
```

Fast + cost-efficient

Shallow

Surface-level review for high-volume repositories, draft PRs, or teams watching spend. Fast feedback with lower cost per PR.

  • Lower-latency model
  • Skims for common bug classes
  • Ideal for pre-merge early checks

```yaml
with:
  automatic_review: true
  review_depth: shallow
```

A dedicated security pass on every PR.

Enable a two-pass security workflow that traces untrusted input across trust boundaries, validates exploitability, and reports only findings with a realistic path to impact.

Triggers

  • `@droid security` - on-demand review of this PR
  • `@droid security --full` - full-repo audit that opens a PR with the report
  • `automatic_security_review: true` - every non-draft PR, no comment required
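A minimal sketch of the always-on variant, using the `automatic_security_review` input in the same `with:` style as the review presets:

```yaml
with:
  automatic_security_review: true
```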

STRIDE threat modeling

Spoofing, tampering, repudiation, disclosure, DoS, elevation of privilege.

OWASP Top 10:2021

Access control, crypto, injection, misconfig, auth, SSRF, logging, integrity.

OWASP LLM Top 10

Prompt injection, sensitive disclosure, insecure output, excessive agency, vector weaknesses.

Supply-chain analysis

Typosquatting, install scripts, overly broad ranges, newly published packages.

Pipeline

Candidate generation → validation. Re-checks every candidate for reachability, exploitability, and existing controls before posting it. See Security Review for the full methodology.
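Conceptually, the two-pass pipeline can be sketched as below. This is illustrative only: the function names and the stubbed candidate are hypothetical, not Factory's actual API, and a real validator would consult the model rather than a string check.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    location: str
    severity: str      # P0..P3
    description: str

def generate_candidates(diff: str) -> list[Finding]:
    # Pass 1 (hypothetical): a high-recall pass proposes possible issues.
    # Stubbed with a single canned candidate for illustration.
    return [Finding("api/login.py:42", "P1", "SQL built via string concatenation")]

def validate(finding: Finding, diff: str) -> bool:
    # Pass 2 (hypothetical): re-check reachability, exploitability, and
    # whether an existing control already mitigates the issue.
    reachable = finding.location.split(":")[0] in diff
    return reachable

def review(diff: str) -> list[Finding]:
    # Only candidates that survive validation are posted as comments.
    return [f for f in generate_candidates(diff) if validate(f, diff)]

print(len(review("api/login.py | 5 +++--")))  # prints 1
```

The point of the second pass is that discarding a plausible-but-unreachable candidate is cheap, while posting it costs reviewer trust.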

Review anything before it ships.

Same rubric as the CI reviewer, running in your terminal. Use it as a pre-PR smoke test, a WIP check, or to dig into a teammate's commit after the fact.

```
droid
> /review
```

Against a base branch

PR-style review comparing your branch to any local or remote target. Ideal before opening a PR.

Uncommitted changes

Reviews staged, unstaged, and untracked files in your working directory. Fast sanity check before committing.

A specific commit

Browse recent commits, pick one, and get a review of just that change set.

Custom instructions

Define your own review criteria - e.g. "focus on performance regressions and unnecessary re-renders".

Make it your reviewer.

The default rubric is strong out of the box. When you need repository-specific rules, layer them in without forking the workflow.

Repo-specific guidelines

Drop a SKILL.md at .factory/skills/review-guidelines/. It is automatically injected into every review run - no workflow edits needed.

Model and reasoning overrides

Override review_model or reasoning_effort per workflow. Use a smaller model for high-volume repos, a heavier one for shared services.

Path filters and skip rules

Scope to src/, skip generated code or bot PRs, or add a [skip-review] title marker to opt-out of individual PRs.
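In GitHub Actions terms, path scoping and skip rules can be expressed with standard workflow filters. The sketch below uses real Actions syntax, but the `[skip-review]` marker and the bot check are assumptions about how the scaffolded workflow is wired.

```yaml
# Sketch: scope the trigger and skip bot or opted-out PRs.
on:
  pull_request:
    paths:
      - "src/**"           # only review changes under src/
      - "!**/generated/**" # skip generated code

jobs:
  review:
    if: >-
      github.actor != 'dependabot[bot]' &&
      !contains(github.event.pull_request.title, '[skip-review]')
    runs-on: ubuntu-latest
```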

Comment budgets

Tell the reviewer to submit at most N comments, prioritizing the most critical issues. Tune the signal density to match your team.

167 validated bugs. 50 PRs. 5 open-source codebases.

The Factory Review Benchmark scores frontier and open-source models on the real bugs that slipped through human review in Sentry, Grafana, Keycloak, Discourse, and Cal.com. F1 combines precision (fraction of comments that are real bugs) with recall (fraction of golden bugs caught).
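F1 here is the standard harmonic mean of precision and recall; as a quick sketch:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. a reviewer whose comments are 70% real bugs and that
# catches 55% of the golden bugs:
print(round(f1(0.70, 0.55), 3))  # prints 0.616
```

Because it is a harmonic mean, F1 punishes imbalance: a reviewer that floods PRs with comments (high recall, low precision) scores poorly even if it catches every golden bug.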

| Rank | Model | Mean F1 | Cost / PR |
|------|-------------------|---------|-----------|
| 1 | GPT-5.2 | 60.5% | $1.25 |
| 2 | Claude Opus 4.6 | 59.8% | $3.11 |
| 3 | Claude Sonnet 4.6 | 57.9% | $1.15 |

Top 3 of 13 models benchmarked. Last updated April 2026.

Configure reviews without leaving your workflow.

Every knob is available as an input on the Factory Droid action.

| Input | Default | Description |
|-------|---------|-------------|
| `automatic_review` | `false` | Automatically review PRs without requiring `@droid review`. |
| `review_depth` | `deep` | Preset: `deep` (thorough) or `shallow` (fast). |
| `review_model` | from depth | Override the model used for code review. |
| `reasoning_effort` | from depth | Override reasoning effort. |
| `include_suggestions` | `true` | Include code suggestion blocks in comments. |
| `automatic_security_review` | `false` | Run the security pass on every non-draft PR. |
| `security_model` | `""` | Override the model for security candidate generation and full-repo scans. |
| `security_severity_threshold` | `medium` | Full-repo scans only: minimum severity to include in the report. |
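Putting several inputs together, a workflow step might look like the sketch below. The action reference is an assumption for illustration; `/install-code-review` scaffolds the real workflow.

```yaml
# Hypothetical step wiring the documented inputs into the action.
- uses: factory-ai/droid-action@v1   # action name is an assumption
  with:
    automatic_review: true
    review_depth: deep
    include_suggestions: true
    automatic_security_review: true
    security_severity_threshold: medium
```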

Install the reviewer. Let it catch what your team shouldn't have to.

Run `/install-code-review` in a Droid session to scaffold the workflow for GitHub or GitLab - or read the docs first.

Ready to build the software of the future?

Start building