Methodology

Data Source#

GH Archive stores all public GitHub events in BigQuery. We query these daily tables to count how AI code review bots interact with pull requests. The AI Share percentage and the weekly time-series charts (AI Share, Total Volume) are computed entirely from this BigQuery data — no GitHub API calls are involved, and we collect all public events (not a sample). Product-level rankings and totals also include emoji reaction reviews discovered via the GitHub API, which capture bot activity invisible to GH Archive.
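As a rough sketch of what such a count looks like, the snippet below builds a BigQuery query against the public GH Archive dataset's daily tables. The table naming (`githubarchive.day.YYYYMMDD`) follows the public dataset; the bot login and the query shape here are illustrative assumptions, not the pipeline's actual query:

```typescript
// Sketch: count review events per tracked bot for one day of GH Archive.
// Bot logins are examples — the real list lives in the Bot Registry.
function botReviewCountQuery(day: string, bots: string[]): string {
  const logins = bots.map((b) => `'${b}'`).join(", ");
  return `
    SELECT actor.login AS bot, COUNT(*) AS reviews
    FROM \`githubarchive.day.${day}\`
    WHERE type = 'PullRequestReviewEvent'
      AND actor.login IN (${logins})
    GROUP BY bot
    ORDER BY reviews DESC`;
}

const sql = botReviewCountQuery("20250101", ["coderabbitai[bot]"]);
```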

Additional metadata — repository stars, primary languages, comment reactions (👍/👎), and emoji-based review signals (🎉) — comes from the GitHub REST API via a separate enrichment pipeline. This data powers per-product detail pages, language breakdowns, and reaction sentiment (👍 Rate), but does not feed into the AI Share calculation.

Note: Only public repositories are included. Activity on private repos is invisible.

Interpreting the Numbers#

Review counts measure activity volume, not review quality or depth. Two bots with the same number of reviews may be doing very different things — and a direct comparison of their counts can be misleading. Here's why:

Bots have different scopes. Some bots do comprehensive code review covering style, bugs, security, and performance in a single pass. Others focus narrowly — bug prediction only, security scanning only, or enforcing team-specific linting rules. A bug-prediction bot that flags one critical defect is doing fundamentally different work than a style bot that flags 20 formatting issues, even though the latter generates 20× more review comment events. The same applies to a security scanner that catches a single vulnerability versus a bot that leaves comments on every function missing a docstring. Volume says nothing about severity or value.

Not every review is a code review. Bots generate events for operational reasons that have nothing to do with reviewing your code. A bot might post a review to tell you you've exceeded your usage quota, or respond to your reply explaining why a flagged issue might be a false positive. These are legitimate GitHub events that appear in our counts, but they're administrative overhead, not code analysis.

Some counts include benchmarking and testing. A few products run large-scale public evaluations — reviewing thousands of PRs across open-source repositories to benchmark their analysis engine. These reviews are real GitHub events and are counted in our data. There's no reliable way to distinguish a “benchmark run” from organic usage in GH Archive data, so these inflate the product's review counts.

Comment volume reflects bot design, not thoroughness. A bot that posts one summary comment per PR generates far fewer events than one that posts individual inline comments for each finding. A bot configured with strict custom rules for a large monorepo will generate more events per PR than the same bot with default settings on a small project. Volume is a function of configuration and design philosophy as much as adoption.

The bottom line: use the numbers to understand adoption trends and relative growth over time — not to judge which bot gives “better” reviews. For that, you'd need to evaluate the actual content and accuracy of their suggestions, which is beyond what event counting can tell you.

What Counts as a “Review”#

We track four types of GitHub signals that indicate a bot participated in code review:

1. Reviews (PullRequestReviewEvent)

Fired when a review is submitted — approve, request changes, or comment. This is the primary metric used for rankings. Even a silent approval (no comment body) generates this event.

2. Review Comments (PullRequestReviewCommentEvent)

Fired for each inline comment on a PR diff. A single review submission can contain many inline comments, each generating a separate event. This gives a more granular view of how verbose a bot's feedback is.

3. PR Comments (IssueCommentEvent on PRs)

Top-level comments posted on pull requests (not inline on diffs). Many bots use these for summaries, walkthrough guides, or analysis reports rather than the formal review API. In GitHub's data model, PRs are issues — so IssueCommentEvent fires for both. We filter to only include comments on pull requests.

4. Emoji Reactions on PRs

Some bots signal approval by adding emoji reactions to PR descriptions — for example, a 🎉 reaction indicates a bot has reviewed and approved the PR. GitHub's Events API has no event type for reactions, so these are invisible in GH Archive. We discover them by scanning PRs via the GitHub Reactions API and checking whether a tracked bot left a 🎉. Only hooray (🎉) counts as a review signal — other reactions like 👀 indicate the PR is still being reviewed.

To avoid double-counting, a reaction review is only counted if the bot has no other activity on that PR (no review event, no comments). This captures bots like Sentry that add 🎉 when they review a PR and find no issues — a deliberate low-noise approach that avoids leaving a formal review or comment.
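The four signals above can be sketched as a classifier over raw events. The payload field names follow GitHub's event payload shape (issue comments on PRs carry a `pull_request` key on the issue), but the surrounding code is an illustrative assumption, not the pipeline's actual implementation:

```typescript
type Signal = "review" | "review_comment" | "pr_comment" | null;

// Map a raw GH Archive event to one of the tracked review signals.
// In GitHub's data model PRs are issues, so IssueCommentEvent only
// counts when the underlying issue is actually a pull request.
function classify(event: {
  type: string;
  payload?: { issue?: { pull_request?: unknown } };
}): Signal {
  switch (event.type) {
    case "PullRequestReviewEvent":
      return "review";
    case "PullRequestReviewCommentEvent":
      return "review_comment";
    case "IssueCommentEvent":
      return event.payload?.issue?.pull_request ? "pr_comment" : null;
    default:
      return null; // emoji reactions never appear in the event stream
  }
}

// A 🎉 reaction (found via the GitHub API) only counts as a review
// when the bot left no other signal on that PR — avoiding double-counting.
function countReactionReview(hasHooray: boolean, otherSignals: number): boolean {
  return hasHooray && otherSignals === 0;
}
```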

Collecting this data requires individual GitHub API calls for each discovered PR, which can take days at scale. Check the /status page for current progress.

How “AI Share” Is Calculated#

The AI share percentage on the home page uses a simple formula, computed separately for each event type (reviews, review comments, PR comments):

AI Share % = tracked_bot_events / (tracked_bot_events + non_bot_events) × 100

Numerator (tracked bot events): Only events from the bot accounts we explicitly track (see Bot Registry). If an AI code review tool isn't in our registry, its activity does not count as “AI.”

Denominator: The sum of tracked bot events and non-bot events — i.e., the total pool of activity. The non-bot portion is calculated by taking all public events and excluding our tracked bots and any GitHub account with a [bot] suffix. This means non-AI automation bots (like dependabot[bot] or renovate[bot]) are excluded from both the AI count and the human count, so they don't inflate either side.

Some tracked bots use regular user accounts without the [bot] suffix — for example, GitHub Copilot operates as both copilot-pull-request-reviewer[bot] and the regular account Copilot. These non-bot accounts are explicitly excluded from the human count in the BigQuery query, so they correctly count as AI activity rather than inflating the human side.
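Putting the numerator and denominator rules together, the share for one event type can be sketched like this. The account names are examples only, and the real counting happens in BigQuery, not application code:

```typescript
// Tracked bot accounts, including regular user accounts like "Copilot"
// that lack the [bot] suffix. Illustrative subset, not the full registry.
const trackedBots = new Set([
  "copilot-pull-request-reviewer[bot]",
  "Copilot",
]);

function aiShare(eventsByActor: Map<string, number>): number {
  let ai = 0;
  let human = 0;
  eventsByActor.forEach((count, actor) => {
    if (trackedBots.has(actor)) {
      ai += count; // tracked AI bot → numerator
    } else if (!actor.endsWith("[bot]")) {
      human += count; // regular account → non-bot side
    }
    // Untracked [bot] accounts (dependabot[bot], renovate[bot], …)
    // fall through both branches: excluded from both sides.
  });
  return (ai / (ai + human)) * 100;
}
```

For example, 10 Copilot events, 100 dependabot[bot] events, and 30 events from a human yield a share of 10 / (10 + 30) = 25% — the non-AI automation never enters the ratio.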

No double counting: Each event type is counted independently — the AI Share and Total Volume charts let you toggle between Reviews, Review Comments, and PR Comments. A bot that submits 1 review with 5 inline comments contributes 1 to the Reviews metric and 5 to the Review Comments metric, but these are never combined. The same counting logic applies to both the bot and non-bot sides, so the ratio is apples-to-apples.

Events, not unique PRs: We count events, not unique pull requests — if a bot comments twice on the same PR (e.g., once when the PR is opened and again on a new commit push), that's two events. This means both sides of the ratio scale with activity intensity, not just reach. A bot that runs on every commit to a PR generates more events than one that runs once, and similarly, a human reviewer who leaves multiple rounds of feedback generates more events than one who reviews once.

Because the same counting applies to both sides, the AI Share percentage is a somewhat fair comparison — but with an important caveat: bots are automated and typically re-run on every commit pushed to a PR, generating a new review each time, whereas human reviewers usually review once or twice and don't re-review on every push. This asymmetry means event-based counting inherently amplifies bot activity relative to human activity, and the AI Share percentage likely overstates the true share of PRs that receive AI review. It measures share of review activity (volume of events), not share of PRs reviewed.

We don't currently have a “per run” metric (where a run is a single invocation of a bot, whether triggered by a PR opening, a new commit, or an @mention). Even with per-event timestamps from GH Archive and enriched comment timestamps from the GitHub API, reliably grouping events into runs would require heuristics (e.g., time-window clustering) that are fragile across different bot behaviors. Our pipeline aggregates events into weekly buckets for trend analysis, which precludes run-level detection.


This means the percentage represents “share of non-bot public GitHub code review activity attributable to tracked AI bots.” The true share of AI-assisted reviews may be higher, since we miss untracked tools and AI tools operating through regular user accounts rather than GitHub App bot accounts. Private repos are also invisible, though their AI adoption rate may differ from public repos.

How Bots Differ#

Not all bots use the same mix of event types. This affects how they rank depending on which metric you look at. For example:

  • Some bots (like GitHub Copilot) use the formal review API almost exclusively — they show up strongly in Reviews and Review Comments but produce few or no PR Comments.
  • CodeRabbit posts walkthrough summaries as top-level PR comments alongside inline review comments, so it generates significant activity across all three event types.
  • Sentry posts inline comments pointing out bugs on specific lines (Review Comments), but when it reviews a PR and finds nothing, it signals this with a 🎉 emoji reaction and a CI status check — neither of which produces a trackable event in GH Archive. This means some of Sentry's review activity is invisible to our BigQuery-based data — until we enrich it with GitHub API calls (see Emoji Reactions on PRs above for how we recover these).

You can see the exact event-type breakdown for each product on its detail page.

How Rankings Work#

Products are ranked by growth rate rather than absolute volume. A product with fewer total reviews but rapid adoption will rank higher than a larger product with flat or declining growth.

Growth is calculated by comparing review volume in the most recent 12-week window against the previous 12-week window:

Growth % = (recent_12w_reviews − previous_12w_reviews) / previous_12w_reviews × 100

This means a product that doubled its review count from one quarter to the next shows +100% growth, regardless of whether that's 1,000 → 2,000 or 100,000 → 200,000 reviews.

Minimum Baseline & “New” Products

Growth percentages require a meaningful baseline to be useful. A product that goes from 5 to 50 reviews shows +900% growth — technically correct but misleading when compared to established products. To prevent this, we require at least 100 reviews in the previous 12-week window before calculating a growth percentage.

Products below this threshold that still have recent activity display a New badge instead of a growth percentage. These are recently launched tools still building their initial user base. Once they accumulate enough review history, the badge is replaced with a real growth rate.

Products with zero reviews in the last 12 weeks are automatically detected as Inactive — no manual status change is needed. This catches tools that have stopped operating even if they haven't been formally retired. Their historical data is preserved and they reactivate automatically if activity resumes.

Additionally, growth is capped at ±999% to prevent extreme outliers from distorting rankings. In practice, this cap rarely triggers — the baseline threshold handles the most common case.
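Combining the growth formula, the 100-review baseline, and the ±999% cap, the ranking value can be sketched as follows. The status names mirror the badges described above; the implementation details are assumptions:

```typescript
type Growth =
  | { kind: "growth"; pct: number }
  | { kind: "new" }       // recent activity, but baseline under 100 reviews
  | { kind: "inactive" }; // zero reviews in the last 12 weeks

function growthStatus(previous12w: number, recent12w: number): Growth {
  if (recent12w === 0) return { kind: "inactive" };
  if (previous12w < 100) return { kind: "new" };
  const pct = ((recent12w - previous12w) / previous12w) * 100;
  // Cap extreme outliers so they can't distort the ranking.
  return { kind: "growth", pct: Math.max(-999, Math.min(999, pct)) };
}
```

So 1,000 → 2,000 reviews yields +100%, 5 → 50 yields a New badge rather than +900%, and 500 → 0 is detected as Inactive.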

Ranking Order

We chose growth over absolute volume because it makes the ranking more dynamic, giving credit to fast-growing tools over older, established ones — while the 12-week window keeps it stable enough to avoid noise from week-to-week fluctuations.

New products (with the New badge) have a growth of 0% for ranking purposes, placing them alongside products with neutral growth — above retired or declining tools, but below products with established positive growth trends. The detailed comparison table lets you sort by any metric — including total reviews, repos, and organizations — if you prefer a different ranking.

The default “Top 10” product selection in the filter bar also uses growth rate, so newly emerging tools appear by default alongside established ones.

PR Profile & Merge Characteristics#

Each product's detail page and the comparison table show characteristics of pull requests the bot has reviewed: average size (additions, deletions, files changed), merge rate, and time to merge.

This data comes from a separate enrichment step — the pipeline fetches PR metadata from the GitHub REST API for PRs discovered via GH Archive. It is not derived from GH Archive events directly.

Important caveats

  • Progressive enrichment. We discover every PR where a tracked bot left an event in GH Archive, then fetch metadata via the GitHub API. Until enrichment completes, statistics are based on the subset already enriched — check the /status page for current progress. The count is shown alongside each stat (“based on X PRs”).
  • Correlation, not causation. A bot reviewing a PR does not mean it influenced the merge rate or time to merge. These stats describe the kind of PRs the bot reviews — not the bot's impact on outcomes.
  • Merge rate is the percentage of enriched PRs in MERGED state (vs. CLOSED without merge or still OPEN).
  • Time to merge is the average hours between PR creation and merge, computed only for merged PRs. Products where no enriched PRs have been merged show “—”.
  • Products with fewer than 10 enriched PRs are excluded from the comparison table to avoid misleading statistics from insufficient data.
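The merge statistics above can be sketched as below. The PR shape is a simplified assumption of what the enrichment step stores, not its actual schema:

```typescript
interface EnrichedPr {
  state: "MERGED" | "CLOSED" | "OPEN";
  createdAt: Date;
  mergedAt?: Date;
}

// Percentage of enriched PRs that ended up merged.
function mergeRate(prs: EnrichedPr[]): number {
  const merged = prs.filter((pr) => pr.state === "MERGED").length;
  return (merged / prs.length) * 100;
}

// Average hours from PR creation to merge, over merged PRs only.
// Returns null when no enriched PR has been merged (shown as "—").
function avgHoursToMerge(prs: EnrichedPr[]): number | null {
  const merged = prs.filter((pr) => pr.state === "MERGED" && pr.mergedAt);
  if (merged.length === 0) return null;
  const totalHours = merged.reduce(
    (sum, pr) => sum + (pr.mergedAt!.getTime() - pr.createdAt.getTime()) / 3_600_000,
    0,
  );
  return totalHours / merged.length;
}
```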

👍 Rate & Reaction Data#

When someone reads a bot's inline review comment on GitHub, they can react with emoji — including 👍 and 👎. This includes human developers, but also coding agents and automation that can be instructed to react to review comments. We track these reactions and compute two metrics:

👍 Rate = thumbs_up / (thumbs_up + thumbs_down) × 100

Of all 👍 and 👎 reactions on a bot's comments, what percentage are 👍? Higher means people who react tend to agree with the bot's suggestions.

Reaction Rate = comments_with_reactions / total_comments × 100

What percentage of a bot's comments received any 👍 or 👎 reaction at all? This gives context to the 👍 Rate — a 95% 👍 Rate means something very different if 0.5% vs. 10% of comments get reactions.
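Both metrics, including the 30-reaction minimum described below, can be sketched as:

```typescript
// 👍 Rate: share of 👍 among all 👍/👎 reactions on a bot's comments.
// Returns null below the 30-reaction minimum (rendered as "—").
function thumbsUpRate(up: number, down: number): number | null {
  if (up + down < 30) return null;
  return (up / (up + down)) * 100;
}

// Reaction Rate: share of a bot's comments that received any 👍/👎 at all.
function reactionRate(commentsWithReactions: number, totalComments: number): number {
  return (commentsWithReactions / totalComments) * 100;
}
```

A bot with 90 👍 and 10 👎 shows a 90% 👍 Rate; if only 5 of its 1,000 comments drew any reaction, the 0.5% Reaction Rate puts that number in context.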

Important caveats

  • Most comments get zero reactions. The Reaction Rate shows what percentage actually do. Without it, the 👍 Rate is uninterpretable — a bot could show 100% 👍 Rate based on just a handful of reactions across thousands of comments.
  • Minimum threshold. Bots with fewer than 30 total 👍+👎 reactions show “—” instead of a percentage. Below this threshold the data is too sparse to be meaningful.
  • Selection bias. People who take the time to react are not representative of all readers. Happy users might 👍; annoyed users might just ignore the comment. Or vice versa. The signal is noisy.
  • Not a quality metric. A 👍 could mean “good catch, I'll fix it” or just “thanks for reviewing.” A 👎 could mean “bad suggestion” or “I disagree with the approach.” Neither tells you whether the code was actually changed.
  • Large PRs may be incomplete. The GitHub API returns at most 100 review threads per request. For the rare PR with more than 100 threads, we save the first 100 and move on — any bot comments beyond that are missed. This means reaction counts for those PRs are undercounted. In practice, very few PRs hit this limit.
  • No fix rate. We don't track whether a bot's suggestion was addressed by a subsequent commit. That would require analyzing commit diffs relative to comment content — a much harder problem we don't attempt. The 👍 Rate measures reaction sentiment, not whether suggestions are acted on.

What's NOT Tracked#

Private repositories

GH Archive only captures public GitHub events. Bots may be far more active on private repos, especially in enterprise settings.

Check runs and status checks

Some tools post analysis results as CI check runs or commit statuses (CheckRunEvent/CheckSuiteEvent/StatusEvent). These are not tracked. This affects even bots we do track — for example, Sentry posts a status check when it reviews a PR and finds no issues, so those “clean” reviews are invisible in our data. Tools like SonarQube and DeepSource report exclusively through check runs and are not tracked at all.

Bot-created pull requests

AI tools like Devin, Sweep, and Seer by Sentry create pull requests rather than review them. PullRequestEvent is a different signal and is not tracked.

Non-bot accounts

Some AI tools operate through regular GitHub user accounts rather than App bot accounts. These are not distinguishable from human users in GH Archive data. Where we know about these accounts, we track them explicitly — for example, GitHub Copilot appears as both copilot-pull-request-reviewer[bot] and the regular user account Copilot, and we include both in our tracking. Any non-bot accounts we don't know about are counted as human activity.

Untracked bot accounts

We maintain a curated registry of AI code review bot accounts. Any bot not in this registry is excluded from the AI share numerator. If it has a [bot] suffix, it's also excluded from the denominator (so it doesn't affect the percentage either way). If it uses a regular user account, it falls into the human count.

Products vs. Bots#

A product is a company or tool (e.g., “Qodo”), while a bot is a specific GitHub App account (e.g., qodo-merge-pro[bot]). Some products operate multiple bot accounts:

  • Qodo: codium-pr-agent[bot], qodo-merge[bot], qodo-merge-pro[bot]
  • Sentry: sentry[bot], seer-by-sentry[bot], codecov-ai[bot]
  • LinearB: gitstream-cm[bot], linearb[bot]

Product-level rankings aggregate activity across all of a product's bot accounts.
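That aggregation can be sketched as below. The registry entries mirror the examples above, but this is an illustrative slice — the canonical list lives in pipeline/src/bots.ts:

```typescript
// Illustrative slice of the registry: product → bot accounts.
const registry: Record<string, string[]> = {
  Qodo: ["codium-pr-agent[bot]", "qodo-merge[bot]", "qodo-merge-pro[bot]"],
  Sentry: ["sentry[bot]", "seer-by-sentry[bot]", "codecov-ai[bot]"],
  LinearB: ["gitstream-cm[bot]", "linearb[bot]"],
};

// Sum per-bot review counts into a per-product total.
function productTotals(reviewsByBot: Map<string, number>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const [product, bots] of Object.entries(registry)) {
    let sum = 0;
    for (const bot of bots) sum += reviewsByBot.get(bot) ?? 0;
    totals.set(product, sum);
  }
  return totals;
}
```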

Bot Registry#

The canonical list of tracked bots lives in pipeline/src/bots.ts in our GitHub repository. This file defines every product and its associated bot accounts — adding a new bot is as simple as adding an entry there.

We also run an automated discovery process (discover-bots) that scans GH Archive for new bot accounts performing code reviews on public repositories. Discovered candidates are reviewed manually before being added to the registry. This helps us stay up-to-date as new AI code review tools emerge.

Product descriptions and comparison blurbs were generated by Claude (Opus 4.6) based on research of each product's public website and documentation. They aim to highlight what makes each tool's approach to code review distinctive — for example, whether it's a dedicated reviewer, part of a broader platform, or focused on a specific aspect like production safety or refactoring. If you spot an inaccuracy or want to suggest a better description, open a pull request or use the Feedback widget below.

Known Data Gaps#

GH Archive is our sole source for trend data, and it has known data-collection issues that affect our charts. These are upstream problems we cannot fix — the raw event counts in BigQuery are lower than reality for the affected periods.

May 24, 2025 — permanent ~35% drop in captured events

Starting May 24, 2025 the number of events captured by GH Archive dropped by roughly 35% and has not recovered. The GH Archive crawler only fetches the first page of the GitHub Events API. Analysis of event IDs shows the archive has always missed some events, but the miss rate increased sharply on this date — likely due to a server-side change at GitHub. This is tracked in gharchive.org issue 310 (open, unresolved).

Impact: All absolute event counts (bot and human) after May 24 are ~35% lower than they should be. Because both sides are affected proportionally, the AI Share percentage remains approximately correct — ratios are preserved even when the underlying counts are undercounted.
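A quick arithmetic illustration of why a proportional undercount preserves the share (the counts here are made up): the capture rate multiplies numerator and denominator alike, so it cancels.

```typescript
// Share is a ratio, so a uniform capture rate on both sides cancels out.
function share(aiEvents: number, humanEvents: number): number {
  return (aiEvents / (aiEvents + humanEvents)) * 100;
}

const full = share(200, 800); // true counts → 20% AI share
// GH Archive missing ~35% of events on BOTH sides (capture rate 0.65):
const sampled = share(200 * 0.65, 800 * 0.65); // still 20%
```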

Oct 9–14, 2025 — near-total outage (5 days)

GitHub introduced a cache on the Events API that caused certain API tokens — including the one used by the GH Archive crawler — to see stale data. Event capture dropped from ~2.7 million events/day to ~18,000 events/day (a 99% reduction) for five days. GitHub Support confirmed the issue and disabled the cache. Normal collection resumed October 15.

Impact: The weeks of October 6 and October 13 show dramatically lower counts, visible as a sharp dip in all volume charts. The AI Share percentage for those weeks is also unreliable since the missing events may not be evenly distributed between bot and human activity.

These gaps are inherent to GH Archive. We do not attempt to interpolate, estimate, or backfill the missing data — what you see in the charts is exactly what GH Archive captured. If GH Archive recovers to full event coverage in the future, our next pipeline backfill will automatically reflect the corrected data.

Comparison with Other Trackers#

If you've seen different rankings on other trackers, it's likely because:

  • Different time windows: We show all-time cumulative totals by default. Other trackers may show rolling 7-day or 30-day windows, which favor bots with recent surges.
  • Different event types: Some trackers only count PullRequestReviewEvent. We track all four signal types separately (including emoji reactions), giving a more complete picture.
  • Different bot coverage: We track dozens of products and bot accounts. Other trackers may include different sets.

Who's Behind This#

I'm Bruno Garcia. I work at Sentry on the code review part of Seer, Sentry's AI debugging agent. Seer does root cause analysis, code review, and fix generation — using the context Sentry already has about your running application (errors, traces, logs, profiles) to find and fix bugs. The code review piece specifically looks at pull requests and predicts issues before they ship.

This is a personal project, not a Sentry product. I have an obvious bias — Sentry is one of the tracked bots — so I want to be upfront about that. The data is public, the code is public, and every bot gets the same treatment.

The motivation is simple: I wanted to understand how this space is evolving. Back in 2018 when I joined Sentry, I built NuGet Trends (now part of the .NET Foundation) to track adoption of the Sentry .NET SDK I was working on. Same idea here — scratch your own itch, make the data available, and maybe it's useful to others too.

If you spot a missing bot, have questions about the methodology, or just want to say hi: open an issue, or find me on X / Bluesky.

View data collection status →