Methodology

Korpora ships three products with three methodologies. The cross-channel agent mindshare report is the flagship. The MCP server agent-selection battery and the MCP server agent-readiness audit are free secondary utilities aimed at developers shipping MCP servers. All three methodologies prioritize mechanical verifiability over editorial judgment.

Cross-channel agent mindshare report

The flagship product. Measures how a brand surfaces in AI assistant recommendations against its real competitors, then triangulates that against the traditional buyer channels every CI tool currently tracks. Output is a ~10-page report with foundation data, agent layer findings, conversion ratios, per-rival capability map, and an engineering-actionable fix list.

Continuous re-measurement

Every brand we report on is re-measured on every major model release (Claude, GPT, Codex). Training cutoffs shift, agent rankings shift, conversion ratios shift; a one-shot report goes stale within months of the next model generation. Korpora treats measurement as an ongoing series rather than a single-point snapshot, and reports surface deltas between the most recent snapshot and the prior one so the reader can see what moved when a new model shipped.

Foundation channels

For each subject we pull traditional buyer signals across up to seven channels: Google monthly search volume (from Google Keyword Planner data), Reddit conversation across the buyer's vertical-specific subreddits (via Reddit's public JSON endpoints, with ScraperAPI as a fallback), X engagement on brand-tagged tweets via Xquik, Hacker News stories and comments via the Algolia search API, GitHub repositories matching brand keywords via the GitHub search API (through Octokit), arXiv papers matching brand keywords via the arXiv API, and G2 reviews where the category is contested. The G2 channel is included only when the competitive set has measurable presence; categories where no competitor has reviews are skipped rather than reported as ties, because a 0-vs-0 row signals 'wrong channel for this category' rather than 'tied positioning.' Channel selection per subject reflects where the buyer-discovery surfaces actually live: a B2B SaaS subject will typically lean on G2, Reddit, and X; a developer-tools subject will weight GitHub, Hacker News, and Reddit heavily; an academic-adjacent subject will weight arXiv and GitHub heavily.

Agent layer measurement

Each subject gets 12-48 install-decision queries (depending on subject scope) covering four framings: direct comparison, use-case specific, open-ended discovery, and differentiation-shaped. Each query runs against Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.5, and GPT-5.3-Codex using a recommendation-prompted format: 'A user asks [query]. Give your top recommendations. Name 2-4 specific products.' Raw responses are captured verbatim. Brands are extracted via case-insensitive regex, with per-brand override matchers for tricky cases (multi-word names, brand names that collide with common English words).
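The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the production extractor: the BrandMatcher shape, the helper names, and the brand names in the usage note are all hypothetical, and the default pattern (whole-word, case-insensitive, flexible whitespace inside multi-word names) is an assumption about what a sensible default looks like.

```typescript
// Hypothetical sketch of brand extraction with per-brand overrides.
// None of these names come from the Korpora codebase.
type BrandMatcher = {
  brand: string;
  // Optional override for tricky names. When absent, a default
  // case-insensitive whole-word pattern is built from the brand name.
  // Overrides should not use the "g" flag, since test() is stateful with it.
  override?: RegExp;
};

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function defaultPattern(brand: string): RegExp {
  // Multi-word names tolerate any run of whitespace between words.
  const body = brand.trim().split(/\s+/).map(escapeRegExp).join("\\s+");
  // Lookarounds keep "Acme" from matching inside "Acmeville".
  return new RegExp(`(?<![A-Za-z0-9])${body}(?![A-Za-z0-9])`, "i");
}

// Returns the brands mentioned at least once in a raw response,
// in the order the matchers were supplied.
function extractBrands(response: string, matchers: BrandMatcher[]): string[] {
  return matchers
    .filter((m) => (m.override ?? defaultPattern(m.brand)).test(response))
    .map((m) => m.brand);
}
```

For a brand whose name collides with a common English word, an override such as a case-sensitive /\bBeacon\b/ (a made-up example) would keep the lowercase noun "beacon" from counting as a mention.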

The agent layer as universal aggregator

LLM training corpora are sourced from open foundation channels: Reddit, Hacker News, GitHub READMEs and source, academic papers, public blogs, X. The agent layer is therefore not a separate universe from the foundation channels we measure; it is a weighted aggregation of them, with weights that vary per model family (Claude weights X less than GPT does; Codex weights GitHub heavier than both). Two implications follow. First, optimizing for AI mindshare is upstream of, not in competition with, foundation channel work; the same investments serve both. Second, only the indexable subset of foundation channels feeds the aggregation: gated, login-walled, and video-without-transcript content does not propagate, so the agent layer aggregates the open subset only, not every channel a brand might invest in.

Training cutoffs and the content deadline

Every model in any report has a training data cutoff documented in the provider's model card. The current battery's cutoffs: Claude Sonnet 4.6 and Opus 4.5 at August 2025, Claude Sonnet 4.5 and Haiku 4.5 at July 2025, OpenAI GPT-5.5 at December 1 2025, GPT-5.3-Codex at August 31 2025. The battery-wide aggregate cutoff is the latest of these, December 1 2025. Three things follow.

First, conversion ratios in our reports measure historical training efficiency on a 6-12 month lag rather than real-time, because foundation activity in the months after cutoff is not yet in any measured corpus.

Second, the lag is also a deadline, not only a drag. Content shipped today is what trains the next model generation, expected late 2026 or early 2027. Brands that delay foundation-channel investment until next-gen models ship will be 12-18 months behind, because the content needed to influence those models had to enter the corpus before its training cutoff. This narrow window is why the fix-list recommendations in each report are time-sensitive in a way that traditional CI recommendations are not: acting on them now compounds into the next training cycle ahead of competitors who wait for visible mindshare changes before investing.

Third, our measurement protocol uses direct API calls without web search tools enabled. This matches the experience of buyers using Claude Code, Codex CLI, API integrations, and recommendation-shaped queries on Claude.ai or ChatGPT.com, which typically do not auto-trigger search. Web-search-enabled queries on consumer surfaces blend training corpus with live results and surface recent news events for recency-sensitive question types. Our methodology measures the durable training-corpus layer, not the blended-live layer; a future methodology variant could measure both as contrasting signals.

Layer 1 measurement: customer-side only, no data egress

For brands that ship MCP servers, Layer 1 measurement (real inference-time tool invocations) runs entirely on the customer side. We do not accept Layer 1 data from customers in this iteration. The recipe is a copy-paste pattern the customer adds to their own MCP server: wrap each tool handler with a counter, accumulate daily aggregates in memory per (date, tool, model), and expose the aggregates to whatever internal dashboard or analytics the customer already runs. No per-event, per-session, per-user, or per-query data is captured at any point. No data is sent to us at any point: the recipe contains zero network calls. The customer's legal team can audit the ~50 lines of pattern code in a single sitting; there is no vendor dependency, no supply chain, and no update risk. We retain a reference implementation in the public repo (@agentbff/layer1-reference) for code review, but it is explicitly not a recommended dependency. Customers who want to share Layer 1 aggregates with us can do so via whatever explicit out-of-band mechanism they choose; there is no protocol or auth path that lets a customer push data to our servers.

Layer 1 recipe (TypeScript)

The wrap pattern in under 50 lines. Copy into your MCP server, modify freely, no dependency on us. Aggregate shape: one bucket per (date × tool × model) holding { invocations, successes }, matching the Bucket type below. Read your aggregates via getAggregates() and integrate with your own analytics, internal dashboard, or whatever fits your stack. The pattern works equally well in Python, Go, or any language; the data shape is the only contract that matters. Aggregates stay in your process; nothing reaches Korpora unless you choose to share specific numbers via whatever channel you prefer.

// Drop this into your MCP server (or any TypeScript service that
// invokes tools on behalf of agents). Zero dependencies, zero
// network calls, zero persistence. Aggregates live in memory; you
// decide what to do with them.

type Bucket = {
  date: string;        // YYYY-MM-DD
  tool: string;
  model: string;       // "(unknown)" if your transport doesn't surface it
  invocations: number;
  successes: number;
};

const buckets = new Map<string, Bucket>();

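// Day buckets are UTC calendar days: toISOString() always reports UTC,
// so invocations near local midnight may land on the adjacent date.
function dayKey(d = new Date()): string {
  return d.toISOString().slice(0, 10);
}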

function record(tool: string, model: string | null, success: boolean) {
  const date = dayKey();
  const m = model ?? "(unknown)";
  const key = `${date}|${tool}|${m}`;
  const b = buckets.get(key) ?? { date, tool, model: m, invocations: 0, successes: 0 };
  b.invocations++;
  if (success) b.successes++;
  buckets.set(key, b);
}

export function wrap<TArgs extends unknown[], TResult>(
  toolName: string,
  resolveModel: (args: TArgs) => string | null,
  handler: (...args: TArgs) => Promise<TResult>,
): (...args: TArgs) => Promise<TResult> {
  return async (...args) => {
    const model = resolveModel(args);
    try {
      const out = await handler(...args);
      record(toolName, model, true);
      return out;
    } catch (e) {
      record(toolName, model, false);
      throw e;
    }
  };
}

export function getAggregates(): Bucket[] {
  return [...buckets.values()].map((b) => ({ ...b }));
}

// Usage in your MCP server:
//
//   server.registerTool("submit_signal", spec, wrap(
//     "submit_signal",
//     (args) => null, // or extract model label from your transport
//     async (args) => {
//       // your actual handler
//       return result;
//     },
//   ));
//
// Read aggregates whenever you want:
//
//   const today = getAggregates().filter((b) => b.date === dayKey());
//   console.log(today);
//
// Share with us by including the numbers in your next report-feedback
// email, dropping them into a shared doc, or just keeping them
// internal — your call.

MCP-shipping brands and the three layers of agent effect

For brands that ship Model Context Protocol servers, the relationship between brand actions and agent-layer outcomes runs through three distinct layers, each with different timing properties.

Layer one is inference-time direct effect: when a user runs Claude Desktop, Codex CLI, Cursor, or any MCP-aware client with the subject's MCP installed, the agent loads the MCP's tool descriptions and capability summary fresh into every session. This effect is immediate and dominates agent behavior around the subject for users who have the MCP configured, regardless of when the MCP shipped or what the model was trained on.

Layer two is web-search-enabled inference: on agent surfaces with web search auto-triggered, queries about the subject's MCP can surface live registry listings, GitHub repos, blog posts, and recent discussions even when the MCP itself was never in the training corpus.

Layer three is training-corpus meta-content: announcements, GitHub commit history, README updates, HN and Reddit discussion of the launch, dotfile repos where users commit Claude Desktop or Cursor configs listing the subject's MCP, cross-references from MCP catalog pages. This propagates to the next training cycle on the same 6-12 month lag as foundation channels.

Our current methodology measures layer three only, with direct API calls and no MCPs installed in the test environment. We systematically under-credit MCP-shipping brands relative to their actual agent-layer presence today, especially when the MCP shipped after the current model training cutoffs. The reported mindshare reflects what the corpus captured, not the layer-one effect on installed users or the layer-two effect on web-search-enabled queries.

A future methodology variant could measure layer one explicitly by running the same query battery with the subject's MCP installed in the test environment, then computing the delta against the MCP-off baseline; that delta would quantify MCP installation lift, a metric that does not exist anywhere else in the CI landscape.

Conversion ratios (cutoff-split)

The metric that matters more than mention volume. We compute mindshare per Reddit mention, mindshare per X engagement, and mindshare per HN thread: percentage points of AI mindshare divided by the count of foundation-channel mentions that fall before the battery's training cutoff. Splitting at the cutoff is essential because current AI mindshare reflects a corpus snapshot from the cutoff date; foundation activity after the cutoff cannot have affected the measurement. Older versions of this report divided by the full 12-month rolling count, which conflated causal pre-cutoff content with post-cutoff content that cannot yet be in the corpus. Because that inflates the denominator, the legacy ratio consistently understated true conversion efficiency. Reports now show both the true historical ratio (using the pre-cutoff foundation count, the methodologically correct denominator) and the legacy ratio (full 12-month count, for comparison). A brand with technical-content density typically converts each mention into 3-15x more AI mindshare than a brand with promotional-content density. This ratio quantifies the moat: a brand winning conversion-per-mention has a structural advantage that survives competitor volume increases.
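As a rough sketch of the cutoff split, assuming each scraped mention carries its original timestamp in epoch milliseconds: the type and function names here are illustrative rather than taken from the report assembler, and the minimum-sample threshold of 5 is the suppression rule stated in the foundation velocity section.

```typescript
// Hypothetical sketch of the cutoff-split conversion ratio.
type Mention = { createdAtMs: number };

type ConversionRatios = {
  preCutoffCount: number;
  fullCount: number;
  // Percentage points of mindshare per pre-cutoff mention;
  // null when the pre-cutoff sample is too small to report.
  trueHistorical: number | null;
  // Percentage points of mindshare per mention over the full window.
  legacy: number | null;
};

function conversionRatios(
  mindsharePp: number,
  mentions: Mention[],
  cutoffMs: number,
  minPreCutoff = 5,
): ConversionRatios {
  // Only mentions strictly before the cutoff can be in the corpus.
  const preCutoffCount = mentions.filter((m) => m.createdAtMs < cutoffMs).length;
  const fullCount = mentions.length;
  return {
    preCutoffCount,
    fullCount,
    trueHistorical: preCutoffCount >= minPreCutoff ? mindsharePp / preCutoffCount : null,
    legacy: fullCount > 0 ? mindsharePp / fullCount : null,
  };
}
```

With 6 pre-cutoff mentions, 4 post-cutoff mentions, and 30pp of mindshare, the true historical ratio is 5pp per mention while the legacy ratio is 3pp per mention, which shows how the larger legacy denominator pulls the number down.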

Foundation velocity (next-cycle leading indicator)

Every scraper captures the original-content timestamp on every record (Reddit created_utc, Xquik tweet createdAt, HN created_at_i, G2 review date). We use these to compute a per-brand per-channel velocity ratio: the foundation activity rate from cutoff to report date divided by the foundation activity rate in the 6 months ending at cutoff, with both normalized to monthly rates. A velocity ratio above 1.0 means the brand is accelerating into the next training cycle. Below 1.0 means decelerating. Reports suppress the velocity number when the pre-cutoff sample count is below 10, displaying "insufficient pre-cutoff data" instead, and suppress the true historical conversion ratio when the pre-cutoff count is below 5, so small-sample noise never produces a misleading large ratio. Velocity also reports a 95% confidence interval using the log-normal approximation for ratios of two Poisson rates; CIs that straddle 1.0 mean the direction is uncertain at the current sample size. Brands with velocity > 1.0 on the highest-corpus-weight channels are the ones to bet on for next-cycle gains.
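A minimal sketch of the velocity calculation, under the assumption that counts and window lengths (in months) have already been derived from the captured timestamps. The function and type names are illustrative. The interval uses the usual log-scale approximation for a ratio of two Poisson rates, where the standard error of log(ratio) is sqrt(1/preCount + 1/postCount); treating a zero post-cutoff count as "ratio without interval" is a choice made for this sketch, not something the report specifies.

```typescript
// Hypothetical sketch of foundation velocity with a 95% CI.
type Velocity =
  | { suppressed: true; reason: string }
  | { suppressed: false; ratio: number; ci95: { lo: number; hi: number } | null };

function velocity(
  preCount: number,   // records in the 6 months ending at cutoff
  preMonths: number,  // length of that window in months
  postCount: number,  // records from cutoff to report date
  postMonths: number, // length of that window in months
  minPre = 10,
): Velocity {
  if (preCount < minPre || preCount <= 0) {
    return { suppressed: true, reason: "insufficient pre-cutoff data" };
  }
  if (preMonths <= 0 || postMonths <= 0) {
    throw new RangeError("window lengths must be positive");
  }
  // Post-cutoff monthly rate over pre-cutoff monthly rate:
  // above 1.0 means accelerating.
  const ratio = postCount / postMonths / (preCount / preMonths);
  if (postCount === 0) {
    // log(0) is undefined, so no interval can be formed here.
    return { suppressed: false, ratio, ci95: null };
  }
  const se = Math.sqrt(1 / preCount + 1 / postCount);
  const z = 1.96;
  return {
    suppressed: false,
    ratio,
    ci95: { lo: ratio * Math.exp(-z * se), hi: ratio * Math.exp(z * se) },
  };
}
```

A brand with 60 records in the 6 pre-cutoff months and 40 records in the 2 months since has a ratio of 2.0, and its interval sits entirely above 1.0, so the acceleration is unlikely to be sampling noise.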

Currently-measured model cutoffs

Sourced from official provider documentation. Updated when new models ship. Rendered from MODEL_CUTOFFS_MS in the code, so this list is always in sync with the assembler.

Cutoff (UTC)	Model
2025-12-01	gpt-5.5 (and 2 aliases)
2025-08-31	codex (and 5 aliases)
2025-07-31	haiku (and 2 aliases)

Per-channel corpus weights

Weights applied to per-channel pre-cutoff counts when computing the weighted composite conversion ratio. A weight of 1.0 is the reference baseline; above means higher per-record corpus contribution, below means lower. Calibrated estimates anchored on public corpus composition, not measured constants. Rendered from CHANNEL_WEIGHTS in code.

Channel	Weight	Rationale
hackernews	4.0×	HN is disproportionately represented in technical LLM training data. Threads are public, indexable, signal-dense, and frequently cross-referenced by other corpus sources. A single front-page HN story typically has 5-10x the corpus footprint of a Reddit post with equivalent engagement.
reddit	1.5×	Reddit is a heavy contributor to LLM corpora (the Pushshift dump alone trained multiple foundation models). Per-record weight is moderate because the corpus only captures a subset of moderated subreddits at high quality, and discussion length varies.
twitter	0.5×	Twitter / X has lower per-record corpus weight than Reddit or HN. Tweets are short, often ephemeral, and less reliably crawled. Some threads make it into corpora via quote-summarization in news articles or aggregator sites, but the long tail does not.
g2	0.3×	G2 reviews live behind a single domain that is crawlable but not heavily over-represented in known training corpora. Useful as a market-perception signal but a small contributor to per-record AI mindshare conversion.
google-trends	0.0×	Google Trends produces data points (search interest over time), not publishable content. Trend data does not feed AI training corpora directly. Excluded from weighted composite (weight 0).
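One way the weights above could feed a composite is sketched below. The weight values are copied from the table; the composite formula itself (mindshare divided by the weight-adjusted sum of pre-cutoff counts) is an assumption made for illustration, since the exact assembler formula is not spelled out here, and weightedComposite is a hypothetical name.

```typescript
// Weight values copied from the table above.
const CHANNEL_WEIGHTS: Record<string, number> = {
  hackernews: 4.0,
  reddit: 1.5,
  twitter: 0.5,
  g2: 0.3,
  "google-trends": 0.0,
};

// Assumed formula: percentage points of mindshare per weighted
// pre-cutoff record. Returns null when no weighted volume exists.
function weightedComposite(
  mindsharePp: number,
  preCutoffCounts: Record<string, number>,
): number | null {
  let weighted = 0;
  for (const channel of Object.keys(preCutoffCounts)) {
    const w = CHANNEL_WEIGHTS[channel];
    if (w === undefined) {
      throw new Error(`no corpus weight for channel: ${channel}`);
    }
    weighted += w * preCutoffCounts[channel];
  }
  return weighted > 0 ? mindsharePp / weighted : null;
}
```

Under this reading, a channel with weight 0 (google-trends) never moves the composite regardless of its raw count, and ten HN stories count for as much as eighty tweets.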

Contested-channel-only principle

Channels where no competitor in the set has measurable presence are excluded from the report rather than reported as 0-vs-0 ties. A G2 review tie at zero, or a TechCrunch mention tie at zero, tells the reader nothing about positioning. It tells them the channel isn't a buyer-discovery surface for this category. We surface that exclusion explicitly in the methodology section of each report so the reader knows what was probed and dropped.
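The exclusion rule can be expressed as a simple filter. This sketch assumes per-channel counts keyed by brand and treats every brand in the set, subject included, as part of the test; the names are illustrative.

```typescript
// channel -> brand -> measured count (reviews, mentions, and so on)
type ChannelCounts = Record<string, Record<string, number>>;

// Splits channels into those worth reporting and those dropped
// because nobody in the set has measurable presence there.
function splitContested(counts: ChannelCounts): { reported: string[]; dropped: string[] } {
  const reported: string[] = [];
  const dropped: string[] = [];
  for (const channel of Object.keys(counts)) {
    const byBrand = counts[channel];
    const contested = Object.keys(byBrand).some((brand) => byBrand[brand] > 0);
    (contested ? reported : dropped).push(channel);
  }
  return { reported, dropped };
}
```

Keeping the dropped list, rather than silently discarding it, is what lets the report state which channels were probed and excluded.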

Per-framing analysis

A brand's headline mention share averages across query types that may favor or disfavor that brand independently. Splitting by framing (direct comparison, use-case, discovery, differentiation) reveals where a brand is genuinely competitive versus where its overall share is artifact. Discovery-query mindshare in particular tends to surface invisible category gaps for brands that win specific-use-case queries.
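A small sketch of the split, assuming each observation records which framing its query used and whether the brand was mentioned. The framing labels and the function name are illustrative.

```typescript
type Framing = "direct-comparison" | "use-case" | "discovery" | "differentiation";
type Observation = { framing: Framing; mentioned: boolean };

// Mention share per framing, alongside the sample size behind it.
function shareByFraming(observations: Observation[]): Map<Framing, { n: number; share: number }> {
  const tally = new Map<Framing, { n: number; hits: number }>();
  for (const o of observations) {
    const t = tally.get(o.framing) ?? { n: 0, hits: 0 };
    t.n++;
    if (o.mentioned) t.hits++;
    tally.set(o.framing, t);
  }
  const out = new Map<Framing, { n: number; share: number }>();
  tally.forEach((t, framing) => out.set(framing, { n: t.n, share: t.hits / t.n }));
  return out;
}
```

A brand mentioned in 3 of 4 use-case responses but only 1 of 4 discovery responses has a 50% headline share that hides a 75% versus 25% split, which is the kind of gap this breakdown is meant to expose.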

Wilson 95% confidence intervals

Every per-query and aggregate share carries a Wilson 95% CI. Point estimates at n=4 per query are directional; CIs make the uncertainty explicit. Aggregate mindshare across 48 observations typically has CI width of ±10-15pp; per-query shares have CI width up to ±50pp. The CI structure makes any number in the report defensibly challengeable by the reader's engineering team.
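For readers who want to reproduce the intervals, the Wilson score interval can be computed as below. This is the standard formula with z = 1.96 for a 95% interval; the function name is illustrative.

```typescript
// Wilson score interval for a binomial proportion.
function wilsonInterval(successes: number, n: number, z = 1.96): { lo: number; hi: number } {
  if (n <= 0) throw new RangeError("n must be positive");
  if (successes < 0 || successes > n) throw new RangeError("successes must be within 0..n");
  const p = successes / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  // The center is pulled toward 0.5, which keeps small samples honest.
  const center = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { lo: Math.max(0, center - half), hi: Math.min(1, center + half) };
}
```

Unlike the naive normal approximation, this interval stays inside 0..1 and remains non-degenerate when a brand is mentioned in none or all of the responses, which matters at per-query sample sizes.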

Per-rival deep dives

For each meaningful rival in the agent layer (typically the top 2-3 by mention share), the report includes a capability map: what they claim publicly, what they ship, named customer logos, public outcome metrics, and what content surfaces they're investing in. The map compares each rival against the subject on the same dimensions so the reader can see exactly where the moat or gap is.

Engineering-actionable fix list

6-8 prioritized fixes with target metrics, not generic advice. Each fix specifies the concrete content piece, specific subreddit or publication to invest in, framing tweak with verbatim suggestion, or specific ship event the subject should run. Each carries an expected impact metric (e.g., 'discovery-query mindshare from 41% to >=65% over 12 months') and rough effort estimate.

MCP server agent-selection battery

Free secondary product for MCP server maintainers. The same query battery approach as the cross-channel report, but narrowed to MCP server selection specifically and delivered as a free 1-2 page focused Discovery Report on the single intent where the server has the most leverage.

Subject investigation

We pull your server's tool surface (directly from the MCP endpoint for hosted servers, or from the repo for GitHub-distributed ones), README, install method, recent merged PRs over 90 days, open milestones, and changelog. The roadmap signal is structured into an ambition vector that grounds later recommendations.

Intent derivation

A Sonnet-class model reads your tool surface and produces 8-14 owned intents (the kinds of queries your server is built to serve) plus 8-14 adjacent intents (intents an agent might route to you even though you're not built for them). All intents are derived in your buyer's language, not from a fixed taxonomy.

Competitor discovery, verified

For each intent, we discover 3-6 candidate competitor MCP servers and verify every one against the GitHub API before adding to the pool. No hallucinated rivals reach the selection battery.

Open-world chooser, then closed-list battery

Every query first goes through an open-world chooser (no candidate list) that decides whether an MCP tool is even the right primitive for that query. Tool-irrelevant queries (answerable from training, ~10-30% of generated queries) are filtered out so they don't dilute the closed-list signal. Surviving queries go through the closed-list chooser with the discovered competitor pool.
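The two-stage flow can be sketched with the model calls abstracted behind plain functions. Everything here is hypothetical scaffolding: needsTool stands in for the open-world chooser, pickFromPool for the closed-list chooser, and both are synchronous stubs where real calls would be asynchronous model requests.

```typescript
type Query = { id: string; text: string };

function runSelectionBattery(
  queries: Query[],
  pool: string[],
  // Open-world chooser stub: is an MCP tool the right primitive at all?
  needsTool: (q: Query) => boolean,
  // Closed-list chooser stub: which server from the pool gets picked?
  pickFromPool: (q: Query, pool: string[]) => string,
): { filteredOut: Query[]; picks: { queryId: string; pick: string }[] } {
  const filteredOut: Query[] = [];
  const picks: { queryId: string; pick: string }[] = [];
  for (const q of queries) {
    if (!needsTool(q)) {
      // Tool-irrelevant queries never reach the closed-list stage,
      // so they cannot dilute its win rates.
      filteredOut.push(q);
      continue;
    }
    picks.push({ queryId: q.id, pick: pickFromPool(q, pool) });
  }
  return { filteredOut, picks };
}
```

Returning the filtered-out queries alongside the picks makes the tool-irrelevant fraction itself reportable.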

Multi-model, multi-sample

The closed-list battery runs across multiple model families (Claude Sonnet, Haiku, Opus, and OpenAI GPT-5-class). Sampling happens at production-default temperature=1.0 to match what real Claude Desktop, Claude Code, Cursor, and Codex CLI users experience, including the inherent stochasticity. Aggregate win rates over 40+ queries are stable to ±3-5pp; per-query picks have a documented ~23% noise floor that is part of what we measure.

Leverage scoring

A composite score across five signals picks the single intent for the free Discovery Report deep dive: roadmap alignment (does this gap map to active or stated ambition?), pool-served quality (is the closed-list signal trustworthy?), sweet-spot loss magnitude (room to move without being structurally lost), tool-use rate (are agents reaching for tools on these queries at all?), and competitor concentration (is loss going to one targetable rival?).
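A sketch of how such a composite might pick the deep-dive intent. The five signal names follow the list above, but the normalization to 0..1 and the equal weights are assumptions made purely for illustration; the actual weighting is not stated here.

```typescript
// Each signal is assumed to be pre-normalized to the range 0..1.
type IntentSignals = {
  intent: string;
  roadmapAlignment: number;
  poolServedQuality: number;
  sweetSpotLossMagnitude: number;
  toolUseRate: number;
  competitorConcentration: number;
};

// Hypothetical equal weights; a real scorer may weight signals differently.
const LEVERAGE_WEIGHTS = {
  roadmapAlignment: 0.2,
  poolServedQuality: 0.2,
  sweetSpotLossMagnitude: 0.2,
  toolUseRate: 0.2,
  competitorConcentration: 0.2,
};

function leverageScore(s: IntentSignals): number {
  return (
    LEVERAGE_WEIGHTS.roadmapAlignment * s.roadmapAlignment +
    LEVERAGE_WEIGHTS.poolServedQuality * s.poolServedQuality +
    LEVERAGE_WEIGHTS.sweetSpotLossMagnitude * s.sweetSpotLossMagnitude +
    LEVERAGE_WEIGHTS.toolUseRate * s.toolUseRate +
    LEVERAGE_WEIGHTS.competitorConcentration * s.competitorConcentration
  );
}

// Returns the highest-scoring intent, or null for an empty list.
// Ties keep the earlier candidate.
function pickDeepDiveIntent(candidates: IntentSignals[]): IntentSignals | null {
  let best: IntentSignals | null = null;
  for (const c of candidates) {
    if (best === null || leverageScore(c) > leverageScore(best)) best = c;
  }
  return best;
}
```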

MCP server agent-readiness audit

Free instant utility for MCP server maintainers. It asks a different question from the selection battery above: whether an agent CAN trust your server (build quality, discoverability, install ergonomics), not whether agents actually DO pick it. The audit is a static rubric over your repository, fully reproducible and mechanically verifiable.

Composite weights

MCP servers

Category	Weight
Discoverability	15%
Installation	20%
Tool definitions	25%
Documentation	15%
Trust & adoption	25%
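The weights combine into a single score as a weighted sum. In this sketch the weights are copied from the table above, while the 0-100 per-category scale and the identifier names are assumptions made for illustration.

```typescript
// Category weights copied from the table above; they sum to 1.0.
const AUDIT_WEIGHTS = {
  discoverability: 0.15,
  installation: 0.2,
  toolDefinitions: 0.25,
  documentation: 0.15,
  trustAdoption: 0.25,
};

type AuditCategory = keyof typeof AUDIT_WEIGHTS;
// Assumed scale: each category is scored 0-100 by its own checks.
type CategoryScores = Record<AuditCategory, number>;

function compositeScore(scores: CategoryScores): number {
  let total = 0;
  (Object.keys(AUDIT_WEIGHTS) as AuditCategory[]).forEach((category) => {
    const s = scores[category];
    if (s < 0 || s > 100) {
      throw new RangeError(`${category} score out of range: ${s}`);
    }
    total += AUDIT_WEIGHTS[category] * s;
  });
  return total;
}
```

Because the weights sum to 1.0, the composite stays on the same 0-100 scale as the category scores, and a server with perfect tool definitions but nothing else would land at 25.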

Discoverability (15%)

Can an agent find your server and tell what it does? We check for a clear repository description, GitHub topics, a name or topics that identify it as an MCP server, and a one-line summary at the top of the README.

Installation (20%)

Can an agent or its user actually wire it up? We check for a copy-pasteable MCP client config block in the README, an explicit install or run command (npx, uvx, pip, docker), a package manifest, and documented transport and environment setup.

Tool definitions (25%)

Can an agent parse the tools and pick the right one? We check for a server.json manifest, a README section that enumerates every tool, descriptions detailed enough to convey what each tool does and when to call it, and usage examples. This is the heaviest category: tool definitions are what an agent reads to decide.

Documentation (15%)

Is there enough for an agent to understand and succeed with the server? We check README depth, a clear statement of what the server is for, section structure, and versioned releases or a changelog.

Trust & adoption (25%)

Can an agent trust the server enough to recommend and run it? We check for an open-source license, appearances in public claude_desktop_config.json and mcp.json files, recent activity, multiple contributors, stars, tests or CI, tagged releases, and discussion on Hacker News or Reddit. Appearances in real config files are the closest thing to a verified adoption signal.

What we deliberately do not measure

Editorial quality. Too subjective and gameable.
Self-reported usage. Vendor claims without independent verification are noise.
Sentiment polarity inside AI recommendation text. We measure mention share, not whether the model said nice things. Inspection of every report's verbatim quotes confirms framing in context, but the headline numbers do not weight by sentiment.
Whether the tools work at runtime. The audit cannot execute the server; it measures how well the server is presented for an agent to choose, not runtime correctness.

Disputes

Found a methodology gap or want to challenge a signal definition? Methodology is meant to be argued with. Reach out at hello@korpora.ai and we'll document the change.