

Replay Time Travelogue: Improving Nadia’s “Debugging with AI” Results Using Replay MCP

Mark Erikson
Story · Case studies

In my previous post, “How Replay MCP Helped Find a React Bug Faster Than Dan Abramov Did”, I showed how agents successfully identified and fixed a difficult React internals bug using just Replay recordings of the reproduction, and how different prompts and instructions led to faster analysis.

However, most of us aren’t actually working on React’s internals :) We’re using React in apps and dealing with app-level concerns: page loading times, network requests, good UX, and data correctness. What happens when you use AI to debug those kinds of problems? Can Replay MCP make agents better at finding real-world app bugs?

Background: Nadia’s “Debugging with AI” Example

Nadia Makarevich is author of the book Advanced React, as well as the excellent blog Developer Way, where she’s published numerous posts on React’s behavior and usage. She recently wrote a post titled “Debugging with AI: Can It Replace an Experienced Developer?”, where she tested LLM debugging abilities against an example Next.js app that had several representative bugs inserted to recreate problems she’d seen in real apps. She talked through her instructions to the LLM, how well it did solving each bug, and described where it came up with insufficient answers or just failed to solve the problem at all.

Nadia used Opus 4.5 to do realistic “blind” investigations on three different app bugs: a Zod schema error, a double loading skeleton, and a Next.js redirect error. She graded the LLM’s results in three ways:

  1. Did the LLM fix the problem?
  2. Did the LLM identify the root cause of the problem correctly?
  3. Did the LLM fix the root cause of the problem correctly?

Her results and final analysis indicated that while the LLM had some success, it struggled to understand the deeper architectural issues and either failed to identify the root cause or gave misleading answers:

User Page is Broken Double skeletons problem Weird Redirect Error
Did LLM fix the problem?
Did LLM identify the root cause of the problem correctly?
Did LLM fix the root cause of the problem correctly?

As she concluded:

💡 So, what's the verdict? Can AI replace an experienced developer for debugging?
No, of course not. AI is very good at pattern recognition, and it can be very, very useful. It will do exceptionally well on standard stuff: schema validation mishaps, forgotten null checks, common runtime errors, you've seen it for yourself already

But even if the fix works, I always retrace it step by step and make sure it's the actual root cause. If the fix doesn't work, however, I almost never iterate with the LLM, other than asking it to find me something in the repo. As you've seen, half of its responses in this case will be very confident hallucinations.

When the problem requires actual understanding of why the system behaves the way it does, or how it should behave, especially from a future or user perspective, the AI fails.

And the skill here isn't knowing how to prompt better. It's knowing when to stop prompting and start thinking.

I took this as a challenge :) I wanted to see if adding Replay recordings and Replay MCP to the mix would help an agent solve these problems any better. I also wanted to see if improving prompts and providing more context could produce better results.

Agentic Time-Travel Debugging with Replay MCP

Let’s step back and recap what Replay MCP actually is.

Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.

Replay DevTools is our debugging UI for humans - browser devtools with time-travel built in. Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to investigate the same way a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!

Investigation Process and Setup

For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.

I wanted to run this as a proper experiment: how much do prompts and investigation instructions matter, and how much does Replay MCP help? Bug #1 was pretty trivial, but for Bug #2 and Bug #3, I kicked off four parallel agent investigation sessions. Each agent session was given the same access to the Replay recordings of the bug and a local copy of the demo app source, but with varying prompt instructions:

  1. Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”
  2. Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening
  3. Additional context on the app’s architecture and patterns
  4. Additional relevant framework skills or concepts

How would they do? What differences would we see between them in results or investigation times?
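One way to picture the four variants is as layers stacked on a shared base prompt. The sketch below only illustrates that layering, assuming each variant includes everything from the previous one. The `PromptLayer` names, the `buildPrompt` helper, and the abbreviated section text are hypothetical placeholders, not the tooling or wording used in the experiment.

```typescript
// Hypothetical layering of the four prompt variants described above.
type PromptLayer = "base" | "methodology" | "appContext" | "frameworkSkills";

// Abbreviated placeholder headings; the real prompts are much longer.
const SECTIONS: Record<PromptLayer, string> = {
  base: "Task: Debug a Bug in a Replay Recording",
  methodology: "Investigation Process",
  appContext: "Application Context",
  frameworkSkills: "Framework Concepts",
};

// Assumption: variant N includes every layer up to and including layer N.
const LAYER_ORDER: PromptLayer[] = ["base", "methodology", "appContext", "frameworkSkills"];

function layersForVariant(variant: 1 | 2 | 3 | 4): PromptLayer[] {
  return LAYER_ORDER.slice(0, variant);
}

// Joins the selected sections into one prompt string.
function buildPrompt(variant: 1 | 2 | 3 | 4): string {
  return layersForVariant(variant)
    .map((layer) => SECTIONS[layer])
    .join("\n\n");
}
```

Keeping the base text identical across variants means any difference in results can be attributed to the added layers rather than to a rewording of the bug report.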

Bug #1: Zod Schema Error in a User Profile

Bug #1 was pretty trivial. Navigate to /users/1, and there’s a Zod schema error shown in both a UI error overlay and in the console messages. Nadia’s initial prompt was:

💡 When I navigate to the /users/1 route, the page shows a "Something went wrong" error. In the browser console, I see the following errors [copy-paste the entire error stack]. Server logs are clean. Can you fix it?

Her agent found this error immediately and correctly identified the fix: making the user address fields optional, and handling the type changes when rendering user.address.
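As a rough sketch of what “handling the type changes” can mean: once `address` is optional in the schema, its type becomes `Address | undefined`, so the rendering code has to cover the missing case. The `formatAddress` helper and its fallback string are hypothetical. The field names follow the demo app's `UserSchema` as quoted in the agent report further down.

```typescript
// Field names mirror the address object in the demo app's UserSchema.
interface Address {
  street: string;
  city: string;
  state: string;
  postalCode: string;
  country: string;
}

// After the schema change, `address` may be absent from the parsed user.
interface User {
  name: string;
  address?: Address;
}

// Hypothetical helper: returns a display string, with a fallback when no address exists.
function formatAddress(address: Address | undefined): string {
  if (!address) {
    return "No address on file";
  }
  const { street, city, state, postalCode, country } = address;
  return `${street}, ${city}, ${state} ${postalCode}, ${country}`;
}

// Example values for illustration only.
const withoutAddress: User = { name: "John Doe" };
const withAddress: User = {
  name: "John Doe",
  address: {
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    postalCode: "94105",
    country: "US",
  },
};
```

With the optional type in place, TypeScript flags any code that reads `user.address.city` without first checking that the address exists.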

My agent was given somewhat more detailed instructions on investigating and using the MCP, but this one was a gimme anyway. It called the RecordingOverview MCP tool, immediately saw the Zod error, used Screenshot to confirm the UI overlay, then used the NetworkRequest tool to check the actual API response and confirmed the address and phone fields were missing despite the schema marking them as required. It nailed the issue and cause in about 2 minutes.

Bug #1: Agent prompt

Task: Debug a Bug in a Replay Recording

Recording ID: b8c1cf71-0c97-40c2-8b46-93704e6e8407

You have access to a Replay.io recording of a Next.js application that has a bug. The bug manifests after the user navigates to the "User Profile" page.

Your Mission

Investigate the recording to:

  1. Identify what the bug is — what goes wrong from the user's perspective?
  2. Find the root cause — trace it back to the specific code/logic error
  3. Suggest a fix — what code change would resolve this?

Approach

Start with RecordingOverview to get the lay of the land, then use whatever Replay tools make sense:

  • Screenshot to see what the user saw
  • UserInteractions to understand what they did
  • ConsoleMessages for errors/warnings
  • NetworkRequest for failed/slow requests
  • ReactComponentTree / ReactRenders for component-level issues
  • UncaughtException / ReactException for crashes
  • SearchSources / ReadSource / DescribePoint / Evaluate for code-level investigation

Work methodically. Don't assume what the bug is — discover it from the recording evidence.

Deliverable

Write up your findings:

  1. Bug description — what the user experiences
  2. Root cause — the specific code/logic error, with evidence from the recording
  3. Suggested fix — concrete code change
  4. Key evidence trail — the chain of observations that led to the diagnosis

Keep it concise but thorough. This is a demo of Replay's debugging capabilities.

Bug #1: Agent final analysis

As expected — API routes run on the server and aren't in the browser recording. But we already have the response body, which is the definitive evidence.

Here's the complete bug analysis:


Bug Report

1. Bug Description

When the user navigates to the User Profile page (by clicking around ~7s into the recording), the page crashes and shows a Next.js error boundary instead of the user's profile. The user sees an error screen rather than their profile information.

2. Root Cause

Schema-API mismatch. The UserSchema Zod validator in src/lib/schemas.ts:238-255 requires fields that the /api/user endpoint doesn't return.

The schema expects (among other fields):

  • phone: z.string() (line 242) — required string
  • address: z.object({...}) (line 246-252) — required object with street, city, state, postalCode, country

The API returns:

{
  "id": 1,
  "name": "John Doe",
  "email": "john.doe@example.com",
  "department": "Engineering",
  "avatarUrl": null,
  "role": "admin",
  "joinedAt": "2023-03-15T09:00:00Z",
  "lastLoginAt": "2024-01-28T14:32:00Z"
}

No phone field. No address field.

The crash chain:

  1. User navigates to the profile page
  2. UserPage component (line 56) calls useUser() → TanStack Query calls fetchUser()
  3. fetchUser() in src/queries/useUser.ts:11 calls UserSchema.parse(data) on the API response
  4. Zod throws ZodError with two issues: phone expected string got undefined, address expected object got undefined
  5. TanStack Query surfaces this as error in the query result
  6. UserPage (line 62-63) checks if (error) and re-throws it: throw error
  7. Next.js ErrorBoundaryHandler catches the thrown error and renders the error screen
  8. Lines 66+ (actual profile rendering) execute 0 times — never reached

3. Suggested Fix

Option A: Fix the API — Add the missing phone and address fields to the /api/user response:

// In the API route handler for /api/user
return Response.json({
  id: 1,
  name: "John Doe",
  email: "john.doe@example.com",
  phone: "+1-555-0123",            // ADD
  department: "Engineering",
  avatarUrl: null,
  role: "admin",
  address: {                        // ADD
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    postalCode: "94105",
    country: "US",
  },
  joinedAt: "2023-03-15T09:00:00Z",
  lastLoginAt: "2024-01-28T14:32:00Z",
});

Option B: Fix the schema — Make phone and address optional in UserSchema so it tolerates the current API response:

// src/lib/schemas.ts
export const UserSchema = z.object({
  id: z.number(),
  name: z.string(),
  email: z.string().email(),
  phone: z.string().optional(),              // was: z.string()
  avatarUrl: z.string().url().nullable(),
  role: z.enum(['admin', 'manager', 'user']),
  department: z.string().nullable(),
  address: z.object({                        // was: z.object({...})
    street: z.string(),
    city: z.string(),
    state: z.string(),
    postalCode: z.string(),
    country: z.string(),
  }).optional(),                             // ADD .optional()
  joinedAt: z.string(),
  lastLoginAt: z.string().nullable(),
});

Option A is better if the data should exist. Option B is better if the data is genuinely optional.

4. Key Evidence Trail

| Step | Tool | Finding |
|---|---|---|
| 1 | RecordingOverview | ZodError for phone and address fields in console errors at ~8.2s |
| 2 | ConsoleMessages (errors) | Error originates in fetchUser at useUser.ts:15, caught by <UserPage> error boundary |
| 3 | UserInteractions | User clicks at ~7s (navigating to profile page) |
| 4 | NetworkRequest (detail for /api/user) | Response body has no phone or address fields, returns 200 OK |
| 5 | ReadSource (schemas.ts:238) | UserSchema requires phone: z.string() and address: z.object({...}) (both non-optional) |
| 6 | ReadSource (useUser.ts) | fetchUser() calls UserSchema.parse(data) which throws on the missing fields |
| 7 | ReadSource (UserPage.tsx:55) | Component re-throws the query error at line 63; lines 66+ (profile rendering) have 0 execution hits (never reached) |
| 8 | Screenshot at 15s | Confirms user sees an error boundary crash screen, not the profile |

Bug #2: User Page Double Loading Skeleton

After fixing the first issue, Nadia described the second bug in the app: throttle the network connection, navigate from the root page to a user profile, and see two different loading skeletons flash on screen before the final content is displayed. But loading the user profile directly only shows one skeleton.

Nadia’s prompt was:

💡 When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

She reported that her agent came up with a variety of answers, several of which involved switching from TanStack Query’s useQuery hook to useSuspenseQuery. That answer is half right, but switching to useSuspenseQuery uncovered a second problem.

The double skeletons came from mixing Next’s loading.tsx page-level Suspense boundary with an app-level skeleton component driven by the TSQ isLoading flag. However, just switching to useSuspenseQuery introduces a hydration mismatch error down the line. The real fix is architectural: actually prefetching data on the server, and hydrating the query hook on the client side with the same data.
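To make the mechanism concrete, here is a deliberately simplified model of what is on screen at each stage of a client-side navigation. It is not how React or Next.js are implemented. The `visibleUi` function, the `Strategy` names, and the three-stage timeline are all assumptions made for illustration, and the model ignores the hydration mismatch mentioned above.

```typescript
type Strategy = "useQuery" | "useSuspenseQuery" | "serverPrefetch";
type Ui = "loading.tsx skeleton" | "UserPageSkeleton" | "profile";

interface Stage {
  routeReady: boolean; // has the route's server payload arrived?
  dataReady: boolean; // is the user data available to the query hook?
}

function visibleUi(stage: Stage, strategy: Strategy): Ui {
  // The route-level Suspense boundary shows until the route payload arrives.
  if (!stage.routeReady) return "loading.tsx skeleton";
  if (stage.dataReady) return "profile";
  // Route is ready but data is not: an isLoading check renders its own
  // skeleton, while a suspending query falls back to the route boundary.
  return strategy === "useQuery" ? "UserPageSkeleton" : "loading.tsx skeleton";
}

// Client-side navigation as three stages. With server prefetching, the data
// is assumed to arrive together with the route payload.
function navigationTimeline(strategy: Strategy): Ui[] {
  const prefetched = strategy === "serverPrefetch";
  const stages: Stage[] = [
    { routeReady: false, dataReady: false },
    { routeReady: true, dataReady: prefetched },
    { routeReady: true, dataReady: true },
  ];
  return stages.map((stage) => visibleUi(stage, strategy));
}

// Counts how many different skeleton designs appear during the navigation.
function distinctSkeletons(timeline: Ui[]): number {
  return new Set(timeline.filter((ui) => ui !== "profile")).size;
}
```

In this model, `useQuery` produces two distinct skeletons during navigation, while the other two strategies produce one. Only the prefetch strategy reaches the profile as soon as the route payload is ready.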

Given that, I wanted to see: would an agent with Replay MCP actually find the real root cause and suggest server prefetching as the full solution?

I kicked off three agent sessions: Nadia’s baseline prompt, additional investigative method instructions, and method + some app architecture context. The results were partially an improvement on Nadia’s agent - here’s how they did.

Bug #2 Agent #1: Basic Prompt

Agent #1 started by inspecting the Replay recording to understand the overall behavior. It called RecordingOverview to get the initial details, UserInteractions to get a list of clicks, and made numerous Screenshot calls to get visuals on the navigation. It then used ListSources and SearchSources to identify the Users page routes and checked for Suspense handling.

A ReadSource call for UserPage.tsx returned hit counts showing it had rendered twice, once with isLoading: true. The agent then made multiple ReactRenders tool calls to check the actual render sequence and triggers. The render details showed a mixture of Suspense-based and loading-state-based causes, confirming the mismatch. Finally, it read the local repo files on disk to confirm the available hooks.

Overall, this agent ran a reasonable investigation, gathered real evidence from the recording, and wrapped up with a report. The answer was directionally correct, just not complete.

Total time: 5 minutes

Bug #2 Agent #1 Prompt

💡 Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to:

  1. Understand what's happening in both scenarios (navigation vs refresh)

  2. Find the root cause of the double skeleton

  3. Suggest a fix

Tools Available

  • Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

  • The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

Deliverable

Write up your findings:

  1. Bug description — what the user experiences

  2. Root cause — the specific code/logic error, with evidence

  3. Suggested fix — concrete code change

  4. Key evidence trail — the observations that led to your diagnosis

Bug #2 Agent #1 Solution

💡 Bug Report: Double Loading Skeleton on Client-Side Navigation to User Page

1. Bug Description

When navigating from the root page (/) to the User page (/users/[id]) via client-side navigation, the user sees two different loading skeletons in sequence:

  1. First, a simpler skeleton from loading.tsx (avatar circle + name bar + content area)

  2. Then, a more detailed skeleton from UserPageSkeleton inside the component (avatar + name + badge + separator + grid)

On a direct page refresh of the User page URL, only one skeleton appears. The visual "flash" between two different skeleton designs during navigation is jarring.

2. Root Cause

There are two independent loading mechanisms that fire sequentially during client-side navigation:

Mechanism 1 — Next.js loading.tsx Suspense boundary (src/app/users/[id]/loading.tsx)

The page component at src/app/users/[id]/page.tsx:5-10 is an async server component:

export default async function Page({ params }: { params: Promise<{ id: string }> }) {  
  const { id } = await params;  // <-- this `await` causes suspension  
  return <UserPage userId={id} />;  
}  

During client-side navigation, Next.js needs to fetch the server component payload. The await params causes the route segment to suspend, which triggers the loading.tsx Suspense fallback. This renders Skeleton #1 (commit 135 at 8.89s — RenderSuspended trigger, 145 DOM mutations).

Mechanism 2 — Component-level isLoading guard (src/components/pages/UserPage.tsx:57-59)

const { data: user, isLoading, error } = useUser();  // useQuery, NOT useSuspenseQuery  
  
if (isLoading) {  
  return <UserPageSkeleton />;  // <-- Skeleton #2  
}  

When the Suspense boundary resolves (server component payload arrives), UserPage mounts. The useQuery hook starts fetching /api/user and immediately returns isLoading: true, rendering Skeleton #2 (commit 136 at 9.02s — SuspenseResumed trigger).

Why refresh works fine: On a full page refresh, the async server component resolves server-side before sending HTML. The client only sees the useQuery loading state — there's no sequential Suspense → component loading transition.

3. Suggested Fix

The cleanest fix is to switch from useQuery to useSuspenseQuery so that the data fetching integrates with the existing Suspense boundary. The useUserSuspense hook already exists in src/queries/useUser.ts:22-27 but is unused.

In src/components/pages/UserPage.tsx:

- import { useUser } from '@/queries/useUser';  
+ import { useUserSuspense } from '@/queries/useUser';  
  
  export function UserPage({ userId }: UserPageProps) {  
-   const { data: user, isLoading, error } = useUser();  
-  
-   if (isLoading) {  
-     return <UserPageSkeleton />;  
-   }  
-  
-   if (error) {  
-     throw error;  
-   }  
+   const { data: user } = useUserSuspense();  

With useSuspenseQuery:

  • The component suspends while data is loading, keeping the loading.tsx skeleton visible

  • When data arrives, the component renders with the actual user data — no intermediate skeleton

  • Error handling is automatic (suspense queries throw to the nearest error boundary)

  • The UserPageSkeleton function and isLoading branch can be removed entirely (or kept for other uses)

  • Both navigation and refresh paths show a single, consistent loading skeleton from loading.tsx

Alternative fix (if you want to keep useQuery): Delete src/app/users/[id]/loading.tsx so only the component-level skeleton appears. But this is worse because you'd lose the instant loading feedback during navigation — there'd be a blank delay before the component even mounts.

4. Key Evidence Trail

| Step | Evidence | Tool/Source |
|---|---|---|
| Identified two skeletons | React commits 135 + 136 in the recording | ReactRenders commits mode, time range 8.5-10s |
| Commit 135: RenderSuspended trigger | loading.tsx fallback shown with 145 DOM mutations at 8.89s | ReactRenders commit detail |
| Commit 136: SuspenseResumed trigger | UserPage mounts, renders UserPageSkeleton (8 Skeleton children) at 9.02s | ReactRenders commit detail |
| Commit 137: Data arrives | UserPage re-renders with actual user data at 9.72s | ReactRenders commit detail |
| useQuery vs useSuspenseQuery | useUser() uses useQuery, which doesn't integrate with Suspense | Source: src/queries/useUser.ts |
| Unused suspense hook exists | useUserSuspense() already defined but not imported | Source: src/queries/useUser.ts:22-27 |
| page.tsx is async | await params causes server component to suspend during navigation | Source: src/app/users/[id]/page.tsx |

Bug #2 Agent #2: Investigative Process

Agent #2 was given a more detailed 8-step investigative process to follow. How much of a difference would that make?

Agent #2 started with the same initial tool calls: RecordingOverview to orient itself, UserInteractions and Screenshot to understand the usage. It also used NetworkRequest to check the timestamps of calls to /api/user and /api/current-user.

After calling ListSources to check the files in the recording, it opted to read them all from the local filesystem.

From there, it used ReactRenders to focus in on the critical Suspense-related renders and causes, but also started reading the various Next routes to see how they handled loading patterns. It identified that most routes did have an isLoading check, but only the users/[id] route had a loading.tsx file included, explaining why the bug is specific to the User page.

This agent came up with much better explanations and relevant details in its report. It identified a waterfall request happening, and found the loading skeleton mismatch. Its primary suggestion was the same incomplete useSuspenseQuery fix, but it did explicitly say “Ideal optimization would be server-side prefetching to eliminate the waterfall”.

Total time: 9 minutes
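The waterfall shows up directly in the timestamps from this agent's report, reproduced below: the /api/user fetch started at 9172ms and took 548ms, which lines up with the data render at about 9.72s. The sketch below just does that arithmetic. The `parallelReadyAt` case is a hypothetical what-if, not a measurement from the recording. It assumes the same fetch duration, a fetch that starts at the click roughly 580ms earlier, and the route resolving at the 9.02s commit.

```typescript
// Timestamps in milliseconds, taken from the agent's report for Bug #2.
const fetchStartMs = 9172; // /api/user request starts after UserPage mounts
const fetchDurationMs = 548; // time until the response arrives
const routeReadyMs = 9020; // commit 136: route payload resolved, UserPage mounts

// Waterfall: the fetch cannot start until the component has mounted.
function waterfallReadyAt(fetchStart: number, fetchDuration: number): number {
  return fetchStart + fetchDuration;
}

// Hypothetical what-if: the fetch starts at click time, in parallel with the
// route load, so the page is ready when the slower of the two finishes.
function parallelReadyAt(clickMs: number, fetchDuration: number, routeReady: number): number {
  return Math.max(routeReady, clickMs + fetchDuration);
}

// The report says the fetch began about 580ms after the click.
const approxClickMs = fetchStartMs - 580;

const waterfallMs = waterfallReadyAt(fetchStartMs, fetchDurationMs);
const parallelMs = parallelReadyAt(approxClickMs, fetchDurationMs, routeReadyMs);
const savedMs = waterfallMs - parallelMs;
```

Under those assumptions the sequential path is ready at 9720ms and the parallel path at 9140ms. Server-side prefetching changes where the fetch runs, so treat the difference as a rough indication of the cost of the waterfall rather than a prediction.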

Bug #2 Agent #2 Prompt

💡 Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

  • Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

  • The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

Investigation Process

Follow this structured approach:

  1. Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.

  2. Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.

  3. Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.

  4. Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.

  5. Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:

  6. Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.

  7. Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used?

  8. Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?

Deliverable

Write up your findings:

  1. Bug description — what the user experiences in each scenario

  2. Root cause analysis — the specific code/logic/architectural error, with evidence from the recording

  3. Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.

  4. Key evidence trail — the chain of observations that led to your diagnosis

  5. Remaining concerns — anything that your fix doesn't address, or that you're uncertain about

Bug #2 Agent #2 Solution

💡 Bug Report: Double Loading Skeleton on User Page Navigation

1. Bug Description

What the user sees:

  • Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:

  • Page refresh on User page: Only one loading skeleton appears before the data renders.

The two skeletons have different layouts, sizes, and levels of detail, making the transition jarring.

2. Root Cause Analysis

The bug is a conflict between two independent loading mechanisms that both activate during client-side navigation but not during SSR/refresh.

The two loading sources:

| # | Source | File | Trigger |
|---|---|---|---|
| 1 | Next.js Suspense boundary | src/app/users/[id]/loading.tsx | Route navigation triggers Suspense while RSC payload + JS chunks load |
| 2 | Client-side isLoading check | src/components/pages/UserPage.tsx:63-65 | useUser() (non-suspense useQuery) starts fetching /api/user after mount |

Why navigation shows two skeletons:

  1. User clicks "User" link → Next.js starts client-side route transition

  2. Commit 135 (8.89s): Next.js hits the Suspense boundary at users/[id]/loading.tsx, renders Skeleton #1 while loading the RSC payload and JS chunks

  3. Commit 136 (9.02s): RSC resolves, UserPage component mounts. But useUser() uses useQuery (not suspense-aware), so it returns isLoading: true immediately and renders Skeleton #2 (UserPageSkeleton)

  4. Commit 137 (9.72s): /api/user response arrives (fetched at 9172ms, took 548ms), UserPage re-renders with actual data

Why refresh shows one skeleton:

On a full page load, Next.js SSR renders the server component synchronously. The loading.tsx Suspense boundary either doesn't trigger or resolves instantly during SSR. The client hydrates with UserPage already mounted, and only the client-side useQuery loading state is visible.

The architectural error: UserPage uses useQuery (non-suspense) despite having a Next.js loading.tsx Suspense boundary in the route. This creates two competing loading state mechanisms. The codebase already has useUserSuspense() (using useSuspenseQuery) in src/queries/useUser.ts — it's just not being used.

3. Suggested Fix

Primary fix: Switch UserPage to use useSuspenseQuery and remove the manual loading state handling. This lets the existing loading.tsx Suspense boundary handle all loading UI.

src/components/pages/UserPage.tsx:

- import { useUser } from '@/queries/useUser';  
+ import { useUserSuspense } from '@/queries/useUser';  
  
  export function UserPage({ userId }: UserPageProps) {  
-   const { data: user, isLoading, error } = useUser();  
-  
-   if (isLoading) {  
-     return <UserPageSkeleton />;  
-   }  
-  
-   if (error) {  
-     throw error;  
-   }  
+   const { data: user } = useUserSuspense();  
  
    if (!user) {  

The UserPageSkeleton function (lines 27-55) can also be removed from this file since it's no longer used — the loading.tsx skeleton handles that role.

Additional consideration: The loading.tsx skeleton is currently simpler/smaller than UserPageSkeleton. You may want to update src/app/users/[id]/loading.tsx to match the richer skeleton layout (24px avatar, separator, 2-column grid) so the loading state better matches the final rendered page.

Why this is the correct approach:

  • Other query files (useOrder.ts, useProduct.ts) also export both useX() and useXSuspense() variants — but those pages don't have loading.tsx files, so the non-suspense version works fine

  • The users/[id]/loading.tsx file exists specifically to be the loading UI for this route — it should be the single source of loading state

  • useSuspenseQuery integrates with React Suspense natively, which is exactly how Next.js loading.tsx works (it wraps the page in a <Suspense> boundary)

4. Key Evidence Trail

  1. Screenshots at 8742ms vs 9187ms showed two visually different skeletons appearing in sequence

  2. React render commits 135→136→137 showed the exact sequence: RenderSuspended (loading.tsx) → SuspenseResumed + UserPageSkeleton mount → StoreRerender with data

  3. Network timeline showed /api/user fetch doesn't start until 9172ms (~580ms after click), proving the client-side fetch only begins after the component mounts

  4. Source code confirmed useUser() uses useQuery while useUserSuspense() using useSuspenseQuery exists but is unused

  5. File comparison showed users/[id]/loading.tsx exists but orders/[id]/loading.tsx and products/[id]/loading.tsx do not — explaining why this bug is unique to the User page

5. Remaining Concerns

  1. Waterfall still exists with useSuspenseQuery: The /api/user fetch still won't start until the client component mounts after RSC resolution. During the Suspense phase, the loading.tsx skeleton shows while the RSC payload loads, then Suspense resolves, UserPage mounts, useSuspenseQuery triggers the fetch, and Suspense suspends again while the fetch completes. This means the loading.tsx skeleton appears for the full duration (RSC + API fetch), which is better UX (one consistent skeleton) but the fetch itself is still sequential after RSC.

  2. Ideal optimization would be server-side prefetching: The best Next.js pattern would be to prefetch user data on the server (e.g., using TanStack Query's prefetchQuery in the server component at users/[id]/page.tsx) and dehydrate it. This would eliminate the client-side fetch waterfall entirely — the data would already be in the cache when useSuspenseQuery runs, so no second Suspense suspension would occur.

  3. loading.tsx skeleton fidelity: The current loading.tsx skeleton is less detailed than UserPageSkeleton. After the fix, users will see the simpler skeleton for the entire loading duration, which might feel like a regression in loading UI quality. Consider updating loading.tsx to match the richer layout.

Bug #2 Agent #3: App Architecture Summary

Agent #3 was given the same investigative steps, but also a few bullet points summarizing the known packages and structure of this app (Next.js + TanStack Query, mix of RSCs and hook loading).

Most of the investigative flow was the same: RecordingOverview, interactions, screenshots, sources, network requests. It then specifically browsed most of the query hook definitions and confirmed that Suspense-enabled query hooks existed, as well as checking more screenshots to confirm the loading skeleton visuals.

This agent came up with the fastest response. It still recommended useSuspenseQuery, but also listed server prefetching as a follow-up step. It flagged OrderPage and ProductPage as having the same problematic pattern, and noted that the UserPageSkeleton component would become dead code and could be removed.

Total time: 4 minutes

Bug #2 Agent #3 Prompt

💡 Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

  • Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

  • The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording

Application Context

Pre-analysis of the repository shows:

  • Framework: Next.js 14+ using the App Router (app/ directory)

  • Data fetching: TanStack Query (React Query) for client-side data management

  • Component model: Mix of React Server Components (RSC) and Client Components

  • Loading states: Route-level loading.tsx files provide framework-managed loading UI

  • Route structure: The User Profile page is at src/app/users/[id]/, with its own loading.tsx

  • Query hooks: The app defines both standard and suspense-aware variants of its query hooks (e.g. useUser and useUserSuspense)

Investigation Process

Follow this structured approach:

  1. Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.

  2. Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.

  3. Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.

  4. Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.

  5. Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:

  6. Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.

  7. Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used? Pay particular attention to the boundary between server and client — where does data originate, and is it being passed efficiently across that boundary?

  8. Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?

Deliverable

Write up your findings:

  1. Bug description — what the user experiences in each scenario

  2. Root cause analysis — the specific code/logic/architectural error, with evidence from the recording

  3. Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.

  4. Key evidence trail — the chain of observations that led to your diagnosis

  5. Remaining concerns — anything that your fix doesn't address, or that you're uncertain about

Bug #2 Agent #3 Solution

💡 Bug Report: Double Loading Skeleton on Navigation to User Page

1. Bug Description

What the user experiences:

  • Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:

  • Hard refresh on User page: Only one skeleton appears (the UserPageSkeleton from useUser()'s loading state). The transition is clean.

2. Root Cause Analysis

The problem has two layers:

Layer 1: Mismatched Suspense boundary (wrong skeleton appears)

During client-side navigation, Next.js lazy-loads the page component's JS chunk. While the chunk for app/users/[id]/page.js downloads (~323ms), React suspends. The framework looks for the nearest Suspense boundary with a fallback — but the route-level app/users/[id]/loading.tsx hasn't loaded yet either. So React falls back to app/loading.tsx (the root loading file), which shows a dashboard-style skeleton for a user profile page. This is visually jarring and contextually wrong.

Evidence from React commits:

  • Commit 135 (8.89s): RenderSuspended — React.lazy init for page.js chunk. 145 DOM mutations — this is the heavy swap to the root skeleton.

  • Commit 136 (9.02s): SuspenseResumed — chunk loaded, UserPage mounts.

Layer 2: Non-suspense data fetching creates a second loading phase

UserPage uses useUser() — a standard useQuery hook that returns { isLoading: true } initially. The component manually checks isLoading and renders <UserPageSkeleton />. This creates a sequential waterfall:

Navigation click (8.59s)  
  → RSC navigation (8.67s)  
    → Page JS chunk download (323ms)  
      → Root skeleton shown (Skeleton #1)  
        → UserPage mounts (9.02s)  
          → useUser() starts fetch  
            → UserPageSkeleton shown (Skeleton #2, 500ms API delay)  
              → Data arrives, final render (9.72s)  

On hard refresh, the JS chunk is included in the initial SSR payload, so the chunk-loading Suspense phase is skipped entirely. The user only sees the useUser() loading state.

Evidence from network timeline:

  • /api/user fetch doesn't start until 9172ms — 505ms after the navigation click

  • The 500ms API delay is on top of the chunk-loading waterfall

3. Suggested Fix

The app already has the correct solution defined but unused. Every query hook file exports a useSuspenseQuery variant. The fix is to use it.

Primary fix: Switch UserPage to use useUserSuspense()

src/components/pages/UserPage.tsx:

// Change import:  
- import { useUser } from '@/queries/useUser';  
+ import { useUserSuspense } from '@/queries/useUser';  
  
// Change hook usage in UserPage component:  
- const { data: user, isLoading, error } = useUser();  
-  
- if (isLoading) {  
-   return <UserPageSkeleton />;  
- }  
-  
- if (error) {  
-   throw error;  
- }  
-  
- if (!user) {  
+ const { data: user } = useUserSuspense();  
+  
+ if (!user) {  

This change:

  1. Makes data fetching participate in React Suspense

  2. Next.js's app/users/[id]/loading.tsx becomes the single loading boundary for both chunk loading AND data fetching

  3. The user sees one contextually-correct skeleton (user profile card) instead of two mismatched ones

  4. The UserPageSkeleton inline component and isLoading check become unnecessary (can be removed)

  5. Error handling is automatic — useSuspenseQuery throws on error, which React error boundaries catch

Same pattern applies to other pages

OrderPage and ProductPage have the identical issue. They should also switch to their suspense variants (useOrderSuspense, useProductSuspense).

4. Key Evidence Trail

  1. Screenshots at 9063ms vs 9187ms showed two visually distinct skeletons in rapid succession

  2. React render commits 135-137 confirmed the three-phase sequence: suspend (chunk) → resume (mount + fetch) → data arrives

  3. Network timeline showed /api/user starting at 9172ms, 505ms after click — confirming the waterfall

  4. Source code revealed useUser() (non-suspense) being used despite useUserSuspense() being defined in the same file

  5. app/loading.tsx confirmed the root skeleton is a dashboard layout — wrong for the user page context

  6. All query hooks export unused suspense variants — the codebase was designed for this pattern but didn't follow through

5. Remaining Concerns

  • Other pages: OrderPage and ProductPage use the same anti-pattern (useOrder/useProduct + isLoading + manual skeleton). They'll exhibit the same double-skeleton bug on client-side navigation. The fix should be applied consistently.

  • Server-side prefetching: Even with useSuspenseQuery, the data fetch still starts client-side after the component mounts. The ideal Next.js pattern would be to prefetch on the server (in the RSC page component) and dehydrate into the client, so the TanStack Query cache is warm before the client component mounts. This would eliminate the data-fetching loading state entirely on navigation. But switching to useSuspenseQuery is the correct first step — it fixes the double-skeleton bug and properly integrates with framework loading boundaries.

  • UserPageSkeleton component: After the fix, the inline UserPageSkeleton in UserPage.tsx becomes dead code. The route-level app/users/[id]/loading.tsx handles all loading UI. The inline skeleton should be removed to avoid confusion.

Bug #2: Initial Results

After running those three sessions, I was pleased to see that the agents had at least found the initial problems and identified the same useSuspenseQuery fix that Nadia’s agent had suggested. But, as she pointed out, that “fix” is incomplete and actually leads to hydration errors.

Meanwhile, it was clear that adding more codebase context and better instructions produces higher quality answers and faster results. Even just adding a few bullet points describing the rough loading patterns used in the app let Agent #3 zero in on the problem much faster.

So, what if I tried giving the agent more info on the actual technologies used in this app? More specifically, what if I gave it access to the official SKILL.md files for Next.js and TanStack Query? How would that change the results? Would the agent be able to identify the real root cause and suggest server prefetching as an actual solution?

Bug #2 Agent #4: Skill Files

I spun up Agent #4 with the same instructions as before (bug report, investigation steps, app context), but added in two extra sections: “CI context” saying this was a manual repro of a known E2E test failure and listing project dependencies, and specific pointers to the SKILL files for TanStack Query, Next.js App Router data fetching, Next.js Suspense boundaries, and Vercel’s “React best practices”.

This agent started by actively reading the skill files. From there, most of the investigation followed the same paths: Replay MCP tools to scan the recording, get screenshots, network requests, sources, and React renders. It also cross-checked all of the query hook files and the different app pages.

However, this time the result was different. While it still recommended useSuspenseQuery as the primary fix, the agent explicitly recommended “Deeper fix: eliminate the waterfall with server-side prefetching” and gave a copy-pasteable example showing use of queryClient.prefetchQuery() in the server component that exactly matched Nadia’s recommended fix! That is the correct answer!

This agent also identified several other concerns, including the Order and Product pages having the same issue, the query key not including userId, and other loading state issues.

Total time: 5 minutes

Bug #2 Agent #4 Prompt

💡 Task: Debug a Bug in a Replay Recording

Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f

You have access to a Replay.io recording of a web application that has a bug.

Bug report from user:

When I'm on the root page and navigate to the User page, I see two different
loading skeletons appear. However, when I just refresh the User page, I see
only one. Why is that and how to fix it?

Your Mission

Investigate the recording to find the root cause and suggest a complete fix.

Tools Available

  • Replay MCP tools for analyzing the recording (use replay_ prefixed tools)

  • The application source code is available at ~/projects/temp/debugging-with-ai
    if you need to read files that aren't accessible in the recording

CI Context

This was reported from a navigation E2E test for the User Profile page that
detected unexpected loading state behavior. The recording captures a manual
reproduction of the issue.

Key dependencies:

  • Next.js ^15.5.12 (App Router)

  • React ^19

  • @tanstack/react-query ^5.90.16

  • TypeScript ^5.3.0

Application Context

Pre-analysis of the repository shows:

  • Framework: Next.js 14+ using the App Router (app/ directory)

  • Data fetching: TanStack Query (React Query) for client-side data management

  • Component model: Mix of React Server Components (RSC) and Client Components

  • Loading states: Route-level loading.tsx files provide framework-managed
    loading UI

  • Route structure: The User Profile page is at src/app/users/[id]/, with
    its own loading.tsx

  • Query hooks: The app defines both standard and suspense-aware variants of
    its query hooks (e.g. useUser and useUserSuspense)

Technology Skills

You have been provided with best-practices skills for the technologies in this
app. Read the smaller skill files before starting your investigation — they
contain framework-specific patterns and idioms that may be relevant:

  • ~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query
    patterns including SSR, hydration, and query hook variants

  • ~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App
    Router data fetching patterns and decision trees

  • ~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense
    boundary and CSR bailout rules

  • ~/projects/temp/react-skills/skills/react-best-practices.md — Comprehensive
    React best practices (LARGE file, ~3800 lines). Do NOT read this fully — scan
    headers with rg "^#{1,3} " first, then read only sections relevant to the
    issue you're investigating.

Investigation Process

Follow this structured approach:

  1. Establish the user experience. Use screenshots and interaction timeline
    to understand exactly what happens visually in both scenarios (navigation vs
    refresh). Pin down timestamps.

  2. Map the data flow. Identify what data the page needs, where it comes
    from, when fetches start relative to navigation, and what components consume
    the results. Use network requests + source code to build this picture.

  3. Build the render timeline. Use React render analysis to understand what
    components mount, unmount, and re-render — in what order, triggered by what.
    Compare the component render sequence between the two scenarios if possible.

  4. Compare the two scenarios explicitly. Navigation and refresh produce
    different results. What's concretely different in the execution path? Don't
    just explain one path — explain why they diverge.

  5. Find the proximate cause, then go deeper. After identifying the immediate
    code-level issue, ask yourself: "Does my proposed fix fully resolve the
    problem, or does it just reduce the symptom?" Consider:

  6. Check for patterns. Look at how other pages/routes in the app handle data
    fetching and loading states. If there's a pattern that works correctly
    elsewhere, the fix should align with that pattern.

  7. Consider framework idioms. Is the code using the framework's intended
    data flow patterns? If the framework provides a mechanism for this exact
    scenario, is it being used? Pay particular attention to the boundary between
    server and client — where does data originate, and is it being passed
    efficiently across that boundary?

  8. Red-team your conclusion. Before presenting your fix, challenge it: What
    would still be suboptimal after applying it? Is there a remaining waterfall
    or unnecessary loading state? Would an expert reviewing this fix say "yes,
    but you should also..."?

Deliverable

Write up your findings:

  1. Bug description — what the user experiences in each scenario

  2. Root cause analysis — the specific code/logic/architectural error, with
    evidence from the recording

  3. Suggested fix — concrete code change(s). If there are multiple layers to
    the fix (immediate + architectural), describe both.

  4. Key evidence trail — the chain of observations that led to your diagnosis

  5. Remaining concerns — anything that your fix doesn't address, or that
    you're uncertain about

Bug #2 Agent #4 Solution

💡 Bug Report: Double Loading Skeleton on User Page Navigation

1. Bug Description

What the user sees:

  • Navigating to the User page (e.g., from dashboard): Two visually distinct loading skeletons appear in sequence — a simpler card skeleton (~130ms), then a more detailed profile skeleton (~700ms) — before the actual content renders.

  • Refreshing the User page directly: Only one skeleton (the detailed profile skeleton) appears, then content renders.

The two skeletons have different layouts (different card structure, different number of skeleton elements), creating a jarring visual transition during navigation.

2. Root Cause Analysis

The bug stems from two independent, overlapping loading mechanisms competing for the same UI slot:

Loading mechanism #1: Next.js route-level loading.tsx

src/app/users/[id]/loading.tsx provides a Suspense fallback for the route segment. During client-side navigation, Next.js lazy-loads the page chunk via React.lazy. While the JS chunk is downloading (323ms), React suspends and shows this loading.tsx skeleton.

Loading mechanism #2: Component-level isLoading check

UserPage (a 'use client' component) uses useUser() — which is standard useQuery, NOT useSuspenseQuery. This means:

  • The component renders immediately with isLoading: true

  • It manually returns <UserPageSkeleton /> during loading

  • When the fetch completes, it re-renders with actual data

Why navigation shows two skeletons:

| Time | Event | Visible |
|---|---|---|
| 8588ms | User clicks navigation link | |
| 8667ms | RSC navigation fetch starts | |
| 8764ms | Page JS chunk fetch starts | |
| 8890ms | React.lazy suspends → loading.tsx shown | Skeleton #1 (route-level) |
| 9020ms | Chunk loaded, UserPage mounts, useUser() fires | |
| 9020ms | isLoading: true → UserPageSkeleton shown | Skeleton #2 (component-level) |
| 9172ms | /api/user fetch actually starts | |
| 9720ms | Data arrives, content renders | User profile |
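
The skeleton durations quoted in the bug description can be rechecked from this timeline. The sketch below is only arithmetic over the timestamps reported above; the `TimelineEvent` type, the short event labels, and the `elapsed` helper are names made up for illustration and are not part of the app or of Replay.

```typescript
// Timestamps (ms) copied from the timeline above.
interface TimelineEvent {
  at: number;
  label: string;
}

const timeline: TimelineEvent[] = [
  { at: 8588, label: "click" },
  { at: 8667, label: "rsc-fetch" },
  { at: 8890, label: "skeleton-1" }, // loading.tsx shown
  { at: 9020, label: "skeleton-2" }, // UserPageSkeleton shown
  { at: 9172, label: "api-fetch" }, // /api/user fetch starts
  { at: 9720, label: "content" },
];

// Hypothetical helper: milliseconds between two labelled events.
function elapsed(events: TimelineEvent[], from: string, to: string): number {
  const start = events.find((e) => e.label === from);
  const end = events.find((e) => e.label === to);
  if (!start || !end) {
    throw new Error(`unknown event: ${from} or ${to}`);
  }
  return end.at - start.at;
}

const skeleton1Ms = elapsed(timeline, "skeleton-1", "skeleton-2"); // 130
const skeleton2Ms = elapsed(timeline, "skeleton-2", "content"); // 700
const clickToFetchMs = elapsed(timeline, "click", "api-fetch"); // 584
const rscToFetchMs = elapsed(timeline, "rsc-fetch", "api-fetch"); // 505
const clickToContentMs = elapsed(timeline, "click", "content"); // 1132
```

Measured from the click, the /api/user request starts 584ms later; measured from the RSC navigation fetch it is 505ms, which lines up with the roughly 500ms figure quoted in these reports.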

Why refresh shows only one skeleton:

On a full page load, the page JS is served as part of the initial HTML/SSR bundle. There's no lazy-loading, so loading.tsx never triggers. UserPage mounts immediately, useUser() returns isLoading: true, and only the component-level UserPageSkeleton appears.

The architectural mismatch:

The real issue is that UserPage uses useQuery (non-suspense) while also having a route-level loading.tsx (which implies Suspense-based loading). These are two different loading paradigms:

  • loading.tsx = "the framework manages loading via Suspense boundaries"

  • useQuery + manual isLoading check = "the component manages its own loading state"

Using both creates the double-skeleton problem during navigation.

3. Suggested Fix

Immediate fix: Switch to useSuspenseQuery

The app already defines useUserSuspense() — it's just not being used. The fix is to use it, which unifies the loading mechanism under Suspense:

src/components/pages/UserPage.tsx:

// BEFORE:  
import { useUser } from '@/queries/useUser';  
  
export function UserPage({ userId }: UserPageProps) {  
  const { data: user, isLoading, error } = useUser();  
  
  if (isLoading) {  
    return <UserPageSkeleton />;  
  }  
  
  if (error) {  
    throw error;  
  }  
  // ...  
}  
  
// AFTER:  
import { useUserSuspense } from '@/queries/useUser';  
  
export function UserPage({ userId }: UserPageProps) {  
  const { data: user } = useUserSuspense();  
  // No isLoading check needed — Suspense handles it  
  // No error check needed — useSuspenseQuery throws to error boundary  
  
  if (!user) {  
    return (/* user not found UI */);  
  }  
  // ... rest of the render  
}  

With useSuspenseQuery:

  • During navigation, when UserPage mounts, the query suspends → React shows the same loading.tsx fallback that was already showing for the chunk load. The user sees ONE continuous skeleton.

  • During refresh, the query suspends on first render → loading.tsx shows ONE skeleton.

  • Both paths now show the same single skeleton.

The UserPageSkeleton component and the isLoading branch can be removed entirely.
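
To make the "one continuous skeleton" claim concrete, here is a toy model of the navigation path. It is a sketch under loud assumptions, not React's or Next.js's implementation: suspending is modelled as throwing a marker object, a Suspense boundary is modelled as a try/catch that returns its fallback, and `Suspend`, `boundary`, `AppState`, and the two `userPage...` functions are all hypothetical names.

```typescript
// Marker thrown by a "component" that is not ready to render yet.
class Suspend {
  reason: string;
  constructor(reason: string) {
    this.reason = reason;
  }
}

type Frame = string; // a label for what the user sees on screen

// Toy Suspense boundary: show the fallback while anything below suspends.
function boundary(fallback: Frame, render: () => Frame): Frame {
  try {
    return render();
  } catch (thrown) {
    if (thrown instanceof Suspend) return fallback;
    throw thrown; // real errors would go to an error boundary instead
  }
}

interface AppState {
  chunkLoaded: boolean;
  dataLoaded: boolean;
}

// useQuery-style page: the component renders its own skeleton while loading.
function userPageWithIsLoading(state: AppState): Frame {
  if (!state.chunkLoaded) throw new Suspend("page chunk");
  if (!state.dataLoaded) return "UserPageSkeleton";
  return "profile";
}

// useSuspenseQuery-style page: a pending query suspends too.
function userPageWithSuspense(state: AppState): Frame {
  if (!state.chunkLoaded) throw new Suspend("page chunk");
  if (!state.dataLoaded) throw new Suspend("user query");
  return "profile";
}

// The three phases of a client-side navigation, in order.
const phases: AppState[] = [
  { chunkLoaded: false, dataLoaded: false },
  { chunkLoaded: true, dataLoaded: false },
  { chunkLoaded: true, dataLoaded: true },
];

function framesFor(page: (state: AppState) => Frame): Frame[] {
  return phases.map((state) => boundary("loading.tsx", () => page(state)));
}

const framesWithIsLoading = framesFor(userPageWithIsLoading);
// ["loading.tsx", "UserPageSkeleton", "profile"]
const framesWithSuspense = framesFor(userPageWithSuspense);
// ["loading.tsx", "loading.tsx", "profile"]
```

The model only covers the client-side navigation path. It says nothing about the hydration errors that this fix runs into, which are discussed elsewhere in this post.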

Deeper fix: Eliminate the waterfall with server-side prefetching

Even after the immediate fix, there's still a waterfall: the /api/user fetch only starts after the page JS loads and UserPage mounts (the fetch started at 9172ms, ~500ms after navigation). The ideal pattern in Next.js App Router is to prefetch data on the server and hydrate it:

src/app/users/[id]/page.tsx:

import { HydrationBoundary, dehydrate } from '@tanstack/react-query';  
import { getQueryClient } from '@/providers/getQueryClient';  
import { UserPage } from '@/components/pages/UserPage';  
  
interface PageProps {  
  params: Promise<{ id: string }>;  
}  
  
export default async function Page({ params }: PageProps) {  
  const { id } = await params;  
  const queryClient = getQueryClient();  
  
  // Prefetch on the server — starts immediately with the RSC request  
  await queryClient.prefetchQuery({  
    queryKey: ['user'],  
    queryFn: async () => {  
      // Direct server-side data fetch (no /api roundtrip needed)  
      const { getUser } = await import('@/lib/data');  
      return getUser();  
    },  
  });  
  
  return (  
    <HydrationBoundary state={dehydrate(queryClient)}>  
      <UserPage userId={id} />  
    </HydrationBoundary>  
  );  
}  

This eliminates the waterfall by fetching data in parallel with (or even before) the client JS chunk load. When UserPage mounts with useSuspenseQuery, the data is already in the cache — no suspension, no skeleton at all during navigation.
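
The "data is already in the cache" step can be illustrated with another toy model. Everything here is an assumption made for illustration: a plain `Map` stands in for the TanStack Query cache, `readQuery` is a hypothetical stand-in for a suspense-style read, and the sample user is invented data.

```typescript
// Marker thrown when a query has no cached data yet (toy model of suspending).
class Pending {}

type QueryCache = Map<string, unknown>;

// Hypothetical suspense-style read: return cached data or "suspend".
function readQuery<T>(cache: QueryCache, key: string): T {
  if (!cache.has(key)) throw new Pending();
  return cache.get(key) as T;
}

// Toy page render wrapped in a boundary whose fallback is loading.tsx.
function renderUserPage(cache: QueryCache): string {
  try {
    const user = readQuery<{ name: string }>(cache, "user");
    return `profile:${user.name}`;
  } catch (thrown) {
    if (thrown instanceof Pending) return "loading.tsx";
    throw thrown;
  }
}

// Without prefetching, the client component starts with an empty cache.
const coldCache: QueryCache = new Map();
const coldFrame = renderUserPage(coldCache); // "loading.tsx"

// With server prefetching, the dehydrated state seeds the cache before the
// client component renders. "Ada" is made-up sample data.
const dehydratedState: Array<[string, unknown]> = [["user", { name: "Ada" }]];
const warmCache: QueryCache = new Map(dehydratedState);
const warmFrame = renderUserPage(warmCache); // "profile:Ada"
```

With a warm cache the read never suspends, so the fallback is never shown for the data fetch.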

4. Key Evidence Trail

  1. Screenshots at 8829ms vs 9063ms — Two visually distinct skeletons confirmed on screen during navigation

  2. React render commit 135 (8.89s): RenderSuspended trigger: "React.lazy component initialized" → loading.tsx shown (skeleton #1)

  3. React render commit 136 (9.02s): SuspenseResumed, UserPage mounts, renders UserPageSkeleton due to isLoading: true (skeleton #2)

  4. React render commit 137 (9.72s): StoreRerender from TanStack Query → data arrives, profile renders

  5. Network timeline: /api/user fetch starts at 9172ms, 500ms after navigation click, confirming client-side waterfall

  6. Source code: UserPage uses useUser() (standard useQuery) not useUserSuspense(), despite useUserSuspense being defined and available

  7. Pattern comparison: OrderPage and ProductPage have the same useQuery + manual isLoading pattern but do NOT have loading.tsx files, so they only show one skeleton. The user/[id] route is the only one with both mechanisms.

5. Remaining Concerns

  1. OrderPage and ProductPage have the same anti-pattern (just less visible). They use useQuery + manual loading when they could use useSuspenseQuery + route-level loading.tsx for a more consistent architecture. The same waterfall exists for those routes — the API fetch only starts after the component mounts.

  2. The useUser() query key ['user'] doesn't include userId. This means all user pages share the same cache entry regardless of which user ID is in the URL. The userId prop is passed to UserPage but never used in the query. This is likely a separate bug.

  3. No server-side prefetching anywhere in the app. The getQueryClient.ts already has dehydrate configuration including shouldDehydrateQuery for pending queries, suggesting prefetching was planned but never implemented.

  4. The loading.tsx skeleton and UserPageSkeleton show different layouts. Even after fixing the double-skeleton issue, you should decide which skeleton design you want and consolidate. If using the Suspense approach, only loading.tsx matters.
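
The shared cache entry from concern 2 is easy to demonstrate with a toy cache. This is a sketch, assuming entries are looked up by a serialized query key; `ToyQueryCache`, `getOrFetch`, and `fetchUser` are hypothetical names and do not come from the app or from TanStack Query.

```typescript
type QueryKey = ReadonlyArray<string | number>;

// Toy cache: entries are stored under the JSON-serialized query key.
class ToyQueryCache {
  entries = new Map<string, unknown>();
  fetchCount = 0;

  // Return the cached value for `key`, calling `fetcher` only on a miss.
  getOrFetch<T>(key: QueryKey, fetcher: () => T): T {
    const hash = JSON.stringify(key);
    if (!this.entries.has(hash)) {
      this.fetchCount += 1;
      this.entries.set(hash, fetcher());
    }
    return this.entries.get(hash) as T;
  }
}

// Made-up fetcher that returns a different record per id.
const fetchUser = (id: number) => ({ id, name: `user-${id}` });

// Key without the id: the second page silently reuses the first page's entry.
const shared = new ToyQueryCache();
const sharedFirst = shared.getOrFetch(["user"], () => fetchUser(1));
const sharedSecond = shared.getOrFetch(["user"], () => fetchUser(2));
// sharedSecond.id is 1, and only one fetch happened

// Key that includes the id: each page gets its own entry.
const perUser = new ToyQueryCache();
const perUserFirst = perUser.getOrFetch(["user", 1], () => fetchUser(1));
const perUserSecond = perUser.getOrFetch(["user", 2], () => fetchUser(2));
// perUserSecond.id is 2, and two fetches happened
```

If the query key were changed this way in the real app, the key used for server prefetching would presumably need to match it so that the hydrated entry is actually found.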

Bug #2 Conclusions

I like Nadia’s rubric for measuring success, but I think we can expand it a bit. There’s both “did it come up with the right answers?”, and “how did it get those answers?”. (This is where I always struggled as a kid in school - I wrote the right answers, why should I have to write down my work to show how I got there? 🤣)

Looking at the four agents, here’s how we’d grade their results:

Run 1 (baseline) Run 2 (+methodology) Run 3 (+context) Run 4 (+skills)
Did it fix the problem?
Did it identify the root cause correctly? ⚠️ Proximate only ⚠️ Proximate only ⚠️ Proximate only
Did it investigate beyond the first fix?
Did it check the rest of the codebase? Partial
Did it produce the correct architectural fix?

The “fix” rating for agents 1-3 is a bit generous given that we know that useSuspenseQuery wasn’t the complete answer. Still, it does address the immediate issue, and if that change were pushed in a PR, the next test run would have hit the hydration error, which would eventually have led to implementing the prefetching solution. So, call that a step in the right direction - it’s investigative progress that you as a developer didn’t have to do yourself.

I was impressed to see that just adding a detailed investigative process in the prompt produced better results, faster. I was very impressed to see how much of a difference adding the skills files made! I honestly didn’t know if that would help, but figured it was worth trying for comparison, and it actually did lead to a full and complete architectural fix suggestion.

Bug #3: Next.js Redirect Error

Nadia’s third bug investigation involves a redirect error. Loading /users correctly redirects to /users/1 so that non-admins see their own user details. However, a “Rendered more hooks than during the previous render” React error shows up along the way.

Nadia’s LLM prompt was again pretty simple:

💡 When I navigate to the /users page, I see a "Something went wrong" page, and in the console I see the following error: [error code]. Investigate why the error happens and how to fix it.

She reported that her LLM completely failed in this case. It suggested a half dozen causes and “believably detailed” potential solutions, all completely wrong.

Nadia had to dig through Next’s GitHub issues, where she found some semi-relevant discussions but no firm fix. Eventually she started ripping out code from her app, and after “multiple days”, found that the <SendAnalyticsData> component was calling a Server Action function inside of a useEffect. Apparently this was actually due to a combination of a server-side redirect, and a Server Action in progress wrapped in Suspense (per this Next.js issue). That’s incredibly obscure - no wonder her agent couldn’t figure it out!

I created a fresh Replay recording of the bug and set up the same parallel agent sessions as before. This time, the question was: would Replay MCP enable any of my agent sessions to even come close to describing the real bug and fix? Or would this also be too obscure even with runtime data available?

Bug #3 Agent #0: No Recording

I tried doing one agent session with just access to the source to see what would happen. Interestingly, it did label <SendAnalyticsData> as the issue, but came up with a completely bogus explanation - it claimed it had to do with component position after an early return, and said it should be moved earlier in the return logic. Clearly the source code alone wasn’t sufficient to explain what was going on here.

Bug #3 Agent #1: Basic Instructions

Agent #1 went on an incredibly deep spelunking expedition through Next and React’s internals :)

It started with the usual RecordingOverview tool, which immediately surfaced the error message. It got more details via the ReactExceptions and ConsoleMessages tools, identified the Router component as the source of the thrown error, and dug into React’s behavior with the ReactRenders tool.

After reading all of the relevant app source files, it read the source of Router via the sources tools, saw the hit counts, and jumped from there to Next’s useActionQueue internal hook. It identified all the internal hook calls and came up with 11 hooks, and flagged the line return isThenable(state) ? use(state) : state as being relevant. It then used multiple Logpoint and Evaluate calls to trace more runtime execution inside of React internal methods like updateWorkInProgressHook().

The final recommended fix was to move the redirect into middleware. This would sort of resolve the problem, but only as a bandaid - it doesn’t address the Server Action being called in a useEffect.

Still, it’s incredibly impressive how deep inside React’s internals this agent was able to investigate and how much of the actual behavior it identified in the process!

Total time: 15 minutes

Bug #3 Agent #1 Prompt

💡 You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Deliverable

Write a detailed bug report with:

  1. Bug Description — What the user sees

  2. Root Cause Analysis — What exactly causes the error, traced through the code

  3. Suggested Fix — Concrete code changes to fix the root cause

  4. Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #1 Solution

💡 Bug Report: "Rendered more hooks than during the previous render" on /users page

1. Bug Description

When navigating to /users, users briefly see a "Something went wrong" error page (rendered by Next.js's DefaultGlobalError component) for approximately 2 seconds before being redirected to /users/1. The browser console shows:

Error: Rendered more hooks than during the previous render.

The redirect itself is intentional: the server component at src/app/users/page.tsx fetches the current user and calls redirect(`/users/${user.id}`). The error flash is the bug.

2. Root Cause Analysis

This is a bug in Next.js's Router component's interaction with React's use() hook when processing server-side redirects. Here's the exact chain of events:

Step 1: Server Component throws NEXT_REDIRECT

src/app/users/page.tsx calls redirect(), which throws a NEXT_REDIRECT error. This is caught by Next.js's RedirectErrorBoundary, which renders <HandleRedirect>. That component calls router.replace(url) inside a useEffect, dispatching a navigation action.

Step 2: Router's useActionQueue stores a Promise in useState

The navigation action puts a Promise into actionQueue.state. When the Router re-renders, useActionQueue (in use-action-queue.ts) calls use(state) at line 53 where state is the Promise from useState. This causes the render to suspend.

Step 3: Suspended render saves partial hook state

Before reaching use(state), the Router calls these hooks inside useActionQueue:

  1. useState: 1 hook

  2. useAppDevRenderingIndicator() → useTransition() (2 hooks) + useEffect() (1 hook): 3 hooks

Total: 4 hooks. Then use(state) suspends. React saves these 4 partial hooks on the work-in-progress fiber.

Step 4: Suspended fiber becomes the alternate

When React retries the render, the suspended fiber (with only 4 hooks) becomes the alternate (the "previous render" that React compares against).

Step 5: use() resolves → dispatcher set incorrectly

When the Promise resolves, use(state) returns a value. React's useThenable function (react-dom-client line 7004-7013) checks whether to set the Mount or Update dispatcher for subsequent hooks:

// Simplified from React source  
if (workInProgressHook.next === null) {  
  // No more WIP hooks after use()  
  if (alternate !== null && alternate.memoizedState !== null) {  
    // Alternate has hooks → use UPDATE dispatcher  
    ReactSharedInternals.H = HooksDispatcherOnUpdateInDEV;  
  } else {  
    // No alternate hooks → use MOUNT dispatcher  
    ReactSharedInternals.H = HooksDispatcherOnMountInDEV;  
  }  
}  

The check alternate.memoizedState !== null is too coarse. The alternate does have hooks (its memoizedState is non-null — it has 4 hooks), but all 4 of those hooks have already been consumed during the current render. There are zero remaining hooks to update against. React should set the Mount dispatcher here, but instead sets the Update dispatcher.

Step 6: Next hook call throws

The Router's useMemo at line 207 (immediately after useActionQueue returns) calls updateWorkInProgressHook (the Update-mode hook handler), which does:

nextCurrentHook = currentHook.next; // null — no more alternate hooks!  
if (nextCurrentHook === null) {  
  throw Error("Rendered more hooks than during the previous render.");  
}  

Step 7: Error boundary catches, shows error page

ErrorBoundaryHandler catches this error and renders DefaultGlobalError — the "Something went wrong" page the user sees. The redirect eventually completes via a separate code path, navigating to /users/1.

3. Suggested Fix

The root cause is in React's useThenable implementation, making it a React/Next.js framework bug. However, the app can work around it.

App-level fix — move redirect to middleware:

Create src/middleware.ts:

import { NextRequest, NextResponse } from 'next/server';  
  
export async function middleware(request: NextRequest) {  
  if (request.nextUrl.pathname === '/users') {  
    // In a real app, read the user ID from session/cookie  
    const userId = 1; // or fetch from auth  
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));  
  }  
}  
  
export const config = {  
  matcher: '/users',  
};  

This performs the redirect at the HTTP level (301/302) before any React rendering occurs, completely avoiding the client-side Router render cycle that triggers the bug.

Alternative app-level fix — use permanentRedirect or notFound pattern:

Replace the server component redirect with a client-side approach:

// src/app/users/page.tsx  
import { redirect } from 'next/navigation';  
import { getUser } from '@/services/user';  
  
export default async function UsersPage() {  
  const user = await getUser();  
  // Use permanentRedirect for HTTP-level redirect  
  redirect(`/users/${user.id}`);  
}  

If the above still triggers the same issue (since both redirect and permanentRedirect use the same NEXT_REDIRECT mechanism), the middleware approach is the most reliable workaround.

Framework-level fix (in React's useThenable):

The check at react-dom-client.development.js line ~7004 should verify whether the alternate has unconsumed hooks remaining, not merely whether it has any hooks at all:

// Current (buggy):  
if (alternate !== null && alternate.memoizedState !== null) {  
  
// Fixed — check if currentHook (the tracking pointer into the alternate)  
// has a .next, meaning there are unconsumed hooks:  
if (currentHook !== null && currentHook.next !== null) {  

This would correctly detect that all alternate hooks have been consumed and set the Mount dispatcher, allowing subsequent hooks (useMemo, etc.) to mount fresh rather than trying to update against non-existent previous hooks.

4. Key Evidence Trail

Step | Tool | Finding
1 | RecordingOverview | 1 console error at 1050ms: "Rendered more hooks than during the previous render"
2 | ReactRenders (summary) | 4 commits; commit 1 at 1.05s renders DefaultGlobalError (the error flash)
3 | ReactException | Two errors: NEXT_REDIRECT (caught by RedirectErrorBoundary) and hooks error (caught by ErrorBoundaryHandler)
4 | ConsoleMessages (detail) | Error stack points to updateWorkInProgressHook in react-dom-client.development.js:6979
5 | ReadSource (app-router.tsx) | Router component has 13+ hooks; useActionQueue is first, calls use(state) conditionally
6 | ReadSource (use-action-queue.ts) | Line 53: return isThenable(state) ? use(state) : state (conditional use() call)
7 | Logpoint (use-action-queue.ts:29) | Confirmed state from useState was a thenable (Promise) during renders at 999ms-1042ms
8 | Logpoint (app-router.tsx:207) | useMemo NOT reached during suspended renders (999ms, 1007ms, 1041ms); only 4 hooks saved
9 | Logpoint (app-router.tsx:439) | At 1042ms: useMemo IS reached but line 439 is NOT; render throws before completing
10 | Evaluate (at useThenable Point:130) | workInProgressHook.next === null, alternate.memoizedState !== null (4 hooks), all consumed; dispatcher set to Update incorrectly
11 | DescribePoint (Point:114) | Confirmed alternate fiber had exactly 4 hooks in memoizedState chain at the moment of the throw
12 | ReadSource (use-app-dev-rendering-indicator.tsx) | useTransition() + useEffect() = 3 hooks, explaining the 4-hook count (useState + 3)
13 | Source code (src/app/users/page.tsx) | Server component calling redirect(), the trigger for the entire chain
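
To make the hook-count mismatch in Agent #1's report concrete, here is a toy TypeScript model. It is only a sketch under stated assumptions, not React's implementation: RenderAttempt, renderWithHooks, and tryRender are hypothetical names, and the counts simply mirror the report (4 hooks saved before use() suspended, plus the useMemo that follows).

```typescript
// Toy model only. Real React tracks hooks as a linked list on each fiber;
// this sketch just counts them to show when the error would be thrown.
type Dispatcher = "mount" | "update";

interface RenderAttempt {
  hooksOnPreviousRender: number; // hooks saved on the alternate fiber
  hooksCalledThisRender: number; // hooks the component calls on this render
  dispatcherAfterUse: Dispatcher; // which mode handles hooks with no previous entry
}

function renderWithHooks(attempt: RenderAttempt): string {
  for (let index = 0; index < attempt.hooksCalledThisRender; index++) {
    const hasPreviousHook = index < attempt.hooksOnPreviousRender;
    // In update mode every hook must line up with a hook from the previous render.
    if (!hasPreviousHook && attempt.dispatcherAfterUse === "update") {
      throw new Error("Rendered more hooks than during the previous render.");
    }
  }
  return `rendered ${attempt.hooksCalledThisRender} hooks`;
}

// Helper that reports the outcome as a string instead of throwing.
function tryRender(attempt: RenderAttempt): string {
  try {
    return renderWithHooks(attempt);
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

// Suspended render saved 4 hooks; the retry reaches a 5th hook (useMemo).
console.log(
  tryRender({ hooksOnPreviousRender: 4, hooksCalledThisRender: 5, dispatcherAfterUse: "update" }),
); // Rendered more hooks than during the previous render.
console.log(
  tryRender({ hooksOnPreviousRender: 4, hooksCalledThisRender: 5, dispatcherAfterUse: "mount" }),
); // rendered 5 hooks
```

The model explains the error message itself, but not why the Router ended up suspended mid-redirect in the first place.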

Bug #3 Agent #2: Investigative Methodology

Agent #2 was given the same 8-step investigative approach we used earlier with Bug #2.

And Agent #2 actually nailed the exact React behavior and suggested fix!

Agent #2 started with RecordingOverview, saw the Next redirect error, and took a few screenshots to confirm visuals. It then used ReactRenders to dig into the specific React commits around the redirect.

After using SearchSources and ReadSource to read the full Router component and its hit counts, it used ReactRenders again to get the specific React fiber ID for the Router instance. It identified the render cause as a setState call, then used DescribePoint to backtrack the cause of the setState.

As with Agent #1, it traced down into useActionQueue, then read much of the app source. This time, though, it found <SendAnalyticsData>, and immediately flagged sendAnalyticsData as a Server Action. It used a mixture of source reads + hit counts, logpoints, and dependency graph checks to confirm the timing of the calls, and concluded the server response came back while the Router was still mid-redirect.

The suggested fix was indeed replacing the Server Action with a simple fetch call for the analytics request.

Total time: 12 minutes

Bug #3 Agent #2 Prompt

💡 You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Follow this structured 8-step investigation process:

  1. Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

  2. Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

  3. Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.

  4. Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?

  5. Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.

  6. Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.

  7. Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?

  8. Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?

Deliverable

Write a detailed bug report with:

  1. Bug Description — What the user sees

  2. Root Cause Analysis — What exactly causes the error, traced through the code

  3. Suggested Fix — Concrete code changes to fix the root cause

  4. Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #2 Solution

💡 Bug Report: "Something went wrong" flash on /users redirect

1. Bug Description

When navigating to /users, a "Something went wrong" error page briefly flashes (for roughly 2 seconds, visible from ~1015ms to ~3050ms in the recording) before the expected redirect to /users/1 completes. The console logs: "Rendered more hooks than during the previous render." The error is caught by Next.js's global error boundary (ErrorBoundaryHandler), which renders global-error.tsx (the "Something went wrong" page). The redirect eventually succeeds, but the UX flash is jarring.

2. Root Cause Analysis

The chain of events:

  1. User navigates to /users. The server component UsersPage (/app/users/page.tsx) calls await getUser() then redirect(`/users/${user.id}`), which throws a NEXT_REDIRECT error.

  2. Commit #0 (~1.00s): Next.js's RedirectErrorBoundary catches the NEXT_REDIRECT. It renders HandleRedirect, which calls router.replace('/users/1') inside a useEffect + React.startTransition. This commit also mounts the layout tree, including SendAnalyticsData components.

  3. Layout effects fire after commit #0:

  4. Server actions trigger Router state updates: Each server action call goes through Next.js's callServer → dispatchAppRouterAction → actionQueue.dispatch() → setState(deferredPromise) on the Router's internal state. This sets the Router's useActionQueue state to a Promise (thenable).

  5. The HandleRedirect effect also fires, calling router.replace('/users/1') which dispatches a navigate action — another setState(promise) on the Router.

  6. Commit #1 (~1.05s): The Router re-renders. Now its useActionQueue hook encounters isThenable(state) === true, so it calls use(state) (React 19's use() API). The use() call suspends because the promise is still pending. However, the interaction between:

  7. Commit #2 (~3.05s): The promises resolve, Suspense resumes, the redirect to /users/1 completes, and the correct page renders.

The root cause is the SendAnalyticsData components calling server actions during the same render cycle as a redirect. The server actions dispatch state updates to the Router's action queue while the redirect is also dispatching a navigate action. This concurrent mutation of the Router's state during error boundary recovery triggers the hooks mismatch error.

Specifically:

  • SendAnalyticsData in the root layout (/app/layout.tsx:24) — fires on every page load

  • SendAnalyticsData in the users layout (UserLayout.tsx:6) — fires when /users/* loads

Both call sendAnalyticsData() (a server action) from useEffect, which pushes promises into the Router's action queue at exactly the wrong time — while a NEXT_REDIRECT is being processed.

3. Suggested Fix

Option A (Recommended): Don't use server actions for analytics — use a regular API call or fire-and-forget fetch instead.

Server actions in Next.js are tightly coupled to the Router's action queue. Using them for fire-and-forget analytics is an anti-pattern because they trigger Router re-renders. Replace with a plain fetch:

// src/actions/analytics.ts — convert from server action to API utility  
// REMOVE: 'use server';  
export async function sendAnalyticsData(name: string) {  
  // Use a regular fetch to an API route, or just log client-side  
  await fetch('/api/analytics', {  
    method: 'POST',  
    body: JSON.stringify({ name }),  
  });  
}  

Or simpler — just make the analytics call not go through the server action mechanism:

// src/components/SetupAnalytics.tsx  
'use client';  
  
import { useEffect } from 'react';  
  
// Don't import a server action — use a plain function  
function sendAnalyticsBeacon(value: string) {  
  // navigator.sendBeacon, fetch to an API route, or third-party SDK  
  console.log('analytics data sent', value);  
}  
  
export function SendAnalyticsData({ value }: { value: string }) {  
  useEffect(() => {  
    sendAnalyticsBeacon(value);  
  }, [value]);  
  
  return <></>;  
}  

Option B (Alternative): Wrap the server action call in startTransition with lower priority.

This doesn't fix the fundamental issue but may work around the timing:

'use client';  
  
import { useEffect, useTransition } from 'react';  
import { sendAnalyticsData } from '@/actions/analytics';  
  
export function SendAnalyticsData({ value }: { value: string }) {  
  const [, startTransition] = useTransition();  
  
  useEffect(() => {  
    startTransition(() => {  
      sendAnalyticsData(value);  
    });  
  }, [value, startTransition]);  
  
  return <></>;  
}  

However, Option B is fragile — it still couples analytics to the Router internals and may break again with future Next.js changes. Option A is the correct fix.

Option A is recommended because:

  • Analytics should be fire-and-forget, not coupled to the Router's action queue

  • Server actions are designed for mutations that need to update the UI (form submissions, data writes), not for side-effect-only telemetry

  • This eliminates the entire class of bugs where analytics firing causes Router state conflicts

4. Key Evidence Trail

Step | Tool | Finding
1. Visual timeline | Screenshot at 723ms, 1015ms, 1077ms, 4595ms | Confirmed "Something went wrong" page flashes between ~1015ms and ~3050ms
2. Error identification | RecordingOverview | Two error boundary catches: NEXT_REDIRECT at 1005ms, hooks error at 1050ms
3. React render analysis | ReactRenders (commits, commit detail) | Commit #0: redirect caught. Commit #1: Router throws hooks error → GlobalError renders
4. React exceptions | ReactException | Router fiber #854 threw "Rendered more hooks than during the previous render", caught by ErrorBoundaryHandler
5. Router source analysis | ReadSource on app-router.tsx, use-action-queue.ts | Router's useActionQueue conditionally calls use(state) when state is thenable (Promise)
6. Hit count analysis | Logpoint on Router entry, useActionQueue, use(state) | 8 entries, 7 reached useActionQueue, use(state) suspended during hits 3-7 (~999-1046ms)
7. Server action path trace | ReadSource on app-call-server.ts, app-router-instance.ts | Server actions → dispatchAppRouterAction → setState(promise) on Router
8. Trigger identification | ReactRenders (trigger-detail, fiber-cause) | Commit #1 triggered by setState from effects fired after commit #0
9. App source review | Source files on disk | SendAnalyticsData calls server action from useEffect, present in root layout + users layout
10. Timing correlation | Logpoint on sendAnalyticsData calls | "root-layout" fires at 972ms, "users-layout" fires at 1005ms, the same window as NEXT_REDIRECT
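
The timing correlation in the last row of that table can be sketched as a small overlap check. This is an illustrative helper, not a Replay MCP API: TimedEvent and eventsWithinWindow are hypothetical names, the timestamps are copied from Agent #2's report, and the 50ms window is an assumption chosen only for the example.

```typescript
// One observed moment in the recording, in milliseconds from the start.
interface TimedEvent {
  label: string;
  atMs: number;
}

// Returns every other event that occurs within windowMs of the anchor event.
function eventsWithinWindow(
  events: TimedEvent[],
  anchorLabel: string,
  windowMs: number,
): TimedEvent[] {
  const anchor = events.find((event) => event.label === anchorLabel);
  if (anchor === undefined) {
    throw new Error(`No event labelled ${anchorLabel}`);
  }
  return events.filter(
    (event) => event !== anchor && Math.abs(event.atMs - anchor.atMs) <= windowMs,
  );
}

// Timestamps taken from Agent #2's report.
const recordingEvents: TimedEvent[] = [
  { label: "sendAnalyticsData(root-layout)", atMs: 972 },
  { label: "sendAnalyticsData(users-layout)", atMs: 1005 },
  { label: "NEXT_REDIRECT caught", atMs: 1005 },
  { label: "hooks error", atMs: 1050 },
  { label: "redirect completes", atMs: 3050 },
];

// Assumed 50ms window around the redirect being caught.
const nearRedirect = eventsWithinWindow(recordingEvents, "NEXT_REDIRECT caught", 50);
console.log(nearRedirect.map((event) => event.label));
// [ 'sendAnalyticsData(root-layout)', 'sendAnalyticsData(users-layout)', 'hooks error' ]
```

Both analytics calls and the hooks error land inside the window, while the completed redirect two seconds later does not. An overlap like this is only a hint about where to look, not proof of cause.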

Bug #3 Agent #3: App Context and Skills

Agent #3’s instructions added some basic context about the app setup, and access to the same React / Next / TanStack skill files we used earlier.

And yet it did worse than Agent #2!

It started by reading the skills files, checked RecordingOverview and ReactRenders and some screenshots to establish a timeline, and read through the app sources. It did read <SendAnalyticsData> and looked for places that component was used.

However, it then got lost in the weeds of Router, useActionQueue, and use hook semantics, down to investigating whether use adds a new hook entry to the rendering fiber. It did at least start to question whether this bug was at the app level or internals level :)

Ultimately, its solution described the sequence of events fairly well, but its conclusion was to move the redirect into middleware, which isn’t correct.

This one’s a bit puzzling - maybe the extra skills gave it too much info on Next and React? :)

Total time: 16 minutes

Bug #3 Agent #3 Prompt

💡 You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Follow this structured 8-step investigation process:

  1. Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

  2. Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

  3. Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.

  4. Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?

  5. Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.

  6. Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.

  7. Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?

  8. Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?

Architecture Context

The following facts about the app's architecture are known from static analysis:

  • Framework: Next.js 14+ with App Router

  • Data fetching: TanStack Query for client-side data management

  • Component model: Mix of React Server Components (RSC) and Client Components ('use client')

  • Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts

  • Validation: Zod schemas for API response validation

  • Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components

  • Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]

  • Components in layouts: Both the root layout and users layout include utility components like analytics trackers

Technology Skills

Before investigating, read the following skill files for relevant framework patterns. Scan the headers first, then read sections that seem relevant to the bug (hooks error + redirect + error boundary):

  • ~/projects/temp/react-skills/skills/react-best-practices.md — React patterns and anti-patterns (large file — scan headers first, read selectively)

  • ~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns

  • ~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js

  • ~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration

Deliverable

Write a detailed bug report with:

  1. Bug Description — What the user sees

  2. Root Cause Analysis — What exactly causes the error, traced through the code

  3. Suggested Fix — Concrete code changes to fix the root cause

  4. Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #3 Solution

💡 Bug Report: "Rendered more hooks than during the previous render" on /users redirect

1. Bug Description

What the user sees:

When navigating to /users, the page briefly shows a loading skeleton (~400ms), then flashes a full-page error ("Application error: a client-side exception has occurred") for ~1.5 seconds, then recovers and loads /users/1 successfully.

The redirect to /users/1 is intentional behavior (non-admin users should only see their own profile). The error flash is the bug.

Timeline from recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2:

Time | What's visible
661ms | Loading skeleton (Suspense fallback)
1077ms | Full-page error: "Application error: a client-side exception..."
2561ms | Loading skeleton again (redirect target loading)
4595ms | User profile page for John Doe (success)

2. Root Cause Analysis

The proximate cause

The error "Rendered more hooks than during the previous render" is thrown by React's hook reconciliation system inside the Next.js Router component (app-router.tsx:207). It is caught by ErrorBoundaryHandler and displayed as a full-page error via DefaultGlobalError.

The chain of events

Step 1: Server-side redirect triggers client-side navigation

src/app/users/page.tsx is a Server Component that calls Next.js redirect():

export default async function UsersPage() {  
  const user = await getUser();  
  redirect(`/users/${user.id}`);  
}  

This throws a NEXT_REDIRECT error that's caught by RedirectErrorBoundary, which renders HandleRedirect. HandleRedirect's useEffect dispatches a router.replace() action to the Router's action queue.

Step 2: Router state becomes a thenable

The router.replace() action causes the Router's state (managed by useActionQueue) to become a thenable (a Promise-like object representing the pending navigation). Inside useActionQueue:

return isThenable(state) ? use(state) : state;  

When use(state) is called on a pending thenable, it suspends the Router component. During suspension, React records the hooks executed so far:

  • Hook #1: useState (useActionQueue L28)

  • Hook #2: useTransition (useAppDevRenderingIndicator L7) — dev-only

  • Hook #3: useEffect (useAppDevRenderingIndicator L9) — dev-only

Total: 3 hooks recorded on the fiber.

Step 3: Thenable resolves, but isThenable() still returns true

The navigation thenable resolves externally. On the next render, the state object is the same thenable — now resolved with {status, value}. But isThenable() only checks typeof promise.then === 'function', and a resolved thenable still has a .then method. So use(state) is called again.

Step 4: use() returns synchronously — no suspension

use() on an already-resolved thenable returns the resolved value immediately without suspending. This means useActionQueue returns normally, and the Router continues executing past it.

Step 5: Next hook triggers the error

The Router's next statement after useActionQueue is useMemo at line 207 (hook #4). React's updateWorkInProgressHook compares: the previous fiber (from the suspended render) had 3 hooks, but the current render is now requesting hook #4. React throws:

"Rendered more hooks than during the previous render."

Why this is fundamentally a timing/framework issue

The bug occurs because React's Suspense mechanism should normally re-render a subtree from scratch after suspension resolves — meaning the suspended fiber's hook count shouldn't be the baseline. However, the specific interaction between:

  1. Next.js's useActionQueue calling use() on a resolved thenable

  2. The dev-only useAppDevRenderingIndicator adding 2 extra hooks (increasing the hook count gap from 1→2 to 3→4+)

  3. React's hook reconciler comparing against the suspended fiber's hooks

...creates a race condition where the hook count mismatch manifests as a user-visible error.

In production, useActionQueue has only 1 hook (useState). The same mechanism could theoretically apply (1 hook → 2 hooks), but production builds may handle the Suspense fiber lifecycle differently, or the timing may prevent the race.

The app's contribution to the bug

The app uses redirect() inside a Server Component page — a documented Next.js pattern. However, this pattern pushes the redirect through the full React rendering pipeline (RSC → Error Boundary → HandleRedirect → Router state update → suspension → resolution → crash). A middleware-level redirect would bypass React entirely.

3. Suggested Fix

Primary fix: Move redirect to middleware

Replace the Server Component redirect with a Next.js middleware redirect. This avoids React rendering entirely:

Create src/middleware.ts:

import { NextResponse } from 'next/server';  
import type { NextRequest } from 'next/server';  
  
export function middleware(request: NextRequest) {  
  // Redirect /users to /users/1 (non-admin users see only their profile)  
  // In a real app, get user ID from session/auth token  
  if (request.nextUrl.pathname === '/users') {  
    const userId = 1; // TODO: extract from auth session  
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));  
  }  
}  
  
export const config = {  
  matcher: '/users',  
};  

Simplify src/app/users/page.tsx:

// This page should never render — middleware redirects /users to /users/[id]  
// Keep as a fallback in case middleware is bypassed  
import { redirect } from 'next/navigation';  
import { getUser } from '@/services/user';  
  
export default async function UsersPage() {  
  const user = await getUser();  
  redirect(`/users/${user.id}`);  
}  

Alternative fix: Add a loading.tsx at the /users level

If the middleware approach isn't suitable, adding a src/app/users/loading.tsx file would provide a proper Suspense boundary that might alter the suspension/resolution timing enough to avoid the race condition. However, this is a mitigation, not a fix — it papers over the framework-level timing issue:

// src/app/users/loading.tsx  
export default function Loading() {  
  return <div>Redirecting...</div>;  
}  

Framework-level note

This appears to be a bug in Next.js's useActionQueue (or possibly in React's Suspense hook reconciliation). The useAppDevRenderingIndicator hook being called conditionally based on NODE_ENV with an eslint-disable-next-line react-hooks/rules-of-hooks comment is a red flag — it means the hook count differs between dev and production, making dev more susceptible to this class of bugs. This could be worth reporting to the Next.js team.

4. Key Evidence Trail

Step | Tool | Finding
1. Establish UX | RecordingOverview | 2 errors: NEXT_REDIRECT (expected) + "Rendered more hooks" (bug). 4 React commits.
2. Visual timeline | Screenshot at 661ms, 1077ms, 2561ms, 4595ms | Loading → Error flash → Loading → Success. Error visible for ~1.5s.
3. Error source | ReactRenders commit analysis | Commit #1 (1.05s): ErrorBoundaryHandler caught the hooks error from Router component.
4. Console error | ConsoleMessages message-detail | Stack trace shows error in Router during commitLayoutEffects phase.
5. Router source | ReadSource + grepika_get on app-router.tsx | Mapped all 11+ hooks in Router. Identified useActionQueue at L204 as the first hook.
6. useActionQueue source | grepika_get on use-action-queue.js | Found isThenable(state) ? use(state) : state pattern. Found dev-only useAppDevRenderingIndicator adding 2 extra hooks.
7. use() is not a hook | grepika_get on react-dom-client.development.js | useThenable uses separate thenableIndexCounter, NOT the hook linked list. Confirmed use() doesn't create hook entries.
8. Hook count verification | Logpoint on useActionQueue L53 | Tracked hasThen across 7 renders: false→false→true(suspend)→true(suspend)→true(resolved, no suspend)→...
9. Render progression | Logpoint on Router L207, L222, L228 | Confirmed: suspended renders hit 3 hooks; post-resolution render hits hook #4 (useMemo) → error.
10. Hit count validation | ReadSource execution annotations | L204: 7 hits, L205: 4 hits (3 suspended), L207: 4 hits, L222: 2 hits; confirms 2 renders errored at hook #4.
11. App code review | grepika_get on all app source files | Confirmed redirect in Server Component, no middleware, no loading.tsx at /users level.
12. Error mechanism | grepika_get on react-dom updateWorkInProgressHook | Confirmed: null === nextCurrentHook → "Rendered more hooks than during the previous render" when fiber.alternate has fewer hooks.
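
Agent #3's Step 3 rests on the observation that a check for a callable .then cannot tell a pending promise from a resolved one. A minimal sketch of that point follows; the isThenable function here is a simplified stand-in written for illustration, not the actual Next.js helper.

```typescript
// Simplified stand-in: treats any object with a callable `then` as a thenable.
function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof (value as { then?: unknown }).then === "function"
  );
}

// A promise that never settles, standing in for an in-flight navigation.
const pendingNavigation = new Promise<string>(() => {});
// A promise that is already resolved with the redirect target.
const resolvedNavigation = Promise.resolve("/users/1");

console.log(isThenable(pendingNavigation)); // true
console.log(isThenable(resolvedNavigation)); // true
console.log(isThenable({ tree: "/users/1" })); // false
```

Both promises pass the check, so code that branches on it takes the same path whether or not the promise has settled.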

Bug #3: Initial Results

After three sessions, I had mixed results for this bug. The good news was that the Replay recording was enough for Agent #2 to actually find the real no-kidding issue! That’s a huge step forward over Nadia’s results. Still, I wanted to see if there was anything else I could do to improve the analysis here.

Let’s say this was a purpose-built debugging agent, running against a known CI failure. In that case, we’d be able to provide some additional context to describe the app and its structure. That might be useful.

Beyond that, the two agents that didn’t get the answer spent too much time diving down into the internals of Next and React. There was useful information there, but that’s also a rabbit hole that trapped them. What if we gave directions to limit any internals investigation to “20%” effort, and focus 80% on the app itself? Would that curb the tendencies to dive too far?

Bug #3 Agent #4: App Context and 80/20 Guidance

I kicked off Agent #4 with a key update to the previous instructions:

💡 Follow this structured investigation process. Budget your time: spend ~80% on app code, component tree, effects, and state flow. Spend at most ~20% reading framework internals. If you find yourself multiple levels deep in Next.js router source or React reconciler code, stop and ask: "Do I already know enough to identify which app component is the trigger?"

I also refined the investigative process steps to encourage it to focus on actionable results, and to lead the report with a focus on the trigger.

As usual, Agent #4 started by calling RecordingOverview, saw the redirect error, checked some screenshots, used ReactRenders to check some of the render causes, and also read ReactException to get more details on the error.

However, it then focused on reading the source code for the components… and immediately identified that <SendAnalyticsData> was calling a Server Action inside of a useEffect!

From there, it dug further into the dependency graph triggers for the renders. Then it examined the network requests, found the analytics POSTs, and confirmed the timing compared to the redirect error. It did start to dive into React, but actually stopped and explicitly said “I’ve been spending too much time on React internals - let me step back and focus on the app-level mechanism”. Clearly that part of the instructions worked :)

The final solution and fix were exactly right! It pointed to the Server Action in a useEffect running while Next was mid-redirect, gave the interleaved timeline of events, and specifically recommended using fetch or navigator beacons instead of a Server Action.
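To make the "fetch or navigator beacons" recommendation concrete, here's a rough sketch of a beacon-first sender that falls back to fetch. The `/api/analytics` endpoint, the `Transport` shape, and the `sendAnalytics` name are assumptions for illustration, not code from Nadia's app. The transport is passed in so the function can run outside a browser.

```typescript
// Minimal structural types so the sketch runs in a browser or under test.
type BeaconFn = (url: string, data: string) => boolean;
type FetchFn = (
  url: string,
  init: { method: string; body: string; keepalive: boolean }
) => Promise<unknown>;

interface Transport {
  sendBeacon?: BeaconFn;
  fetch: FetchFn;
}

// Hypothetical endpoint; the real app may not expose this route.
const ANALYTICS_URL = "/api/analytics";

// Tries the beacon first; falls back to fetch if the beacon is missing or refused.
function sendAnalytics(value: string, transport: Transport): "beacon" | "fetch" {
  const body = JSON.stringify({ name: value });

  // sendBeacon returns true when the browser has queued the payload.
  if (transport.sendBeacon && transport.sendBeacon(ANALYTICS_URL, body)) {
    return "beacon";
  }

  // Fire and forget: analytics failures should never surface to the user.
  void transport
    .fetch(ANALYTICS_URL, { method: "POST", body, keepalive: true })
    .catch(() => {});
  return "fetch";
}

// In a client component's effect, this could be called as:
//   sendAnalytics(value, {
//     sendBeacon: navigator.sendBeacon.bind(navigator),
//     fetch: window.fetch.bind(window),
//   });
```

Neither path goes through the Server Action machinery, which is the point of the recommendation. The `bind` calls in the usage comment matter because `sendBeacon` throws if it's invoked detached from `navigator`.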

Total time: 10 minutes

Bug #3 Agent #4 Prompt

💡 You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.

Bug Report

When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."

The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.

Server logs are clean, so this is a frontend issue.

Recording

The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.

Source Code

The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.

Investigation Methodology

Follow this structured investigation process. Budget your time: spend ~80% on app code, component tree, effects, and state flow. Spend at most ~20% reading framework internals. If you find yourself multiple levels deep in Next.js router source or React reconciler code, stop and ask: "Do I already know enough to identify which app component is the trigger?"

Step 1: Establish user experience

Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.

Step 2: Map data flow

Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.

Step 3: Build render timeline

Use React analysis tools to see the sequence of component renders, mounts, unmounts, and effects. Look for unexpected render patterns, error boundaries catching, Suspense transitions. Pay special attention to what effects fire and what state changes they trigger — effects that kick off async work (fetches, server calls, state updates) during error recovery or redirect handling are common sources of timing bugs.

Step 4: Diff from working

Compare what happens on the broken route (/users → error flash → redirect) vs a working route (directly loading /users/1). What components, effects, or actions fire on the broken path that don't fire on the working path? That delta is where the trigger lives. Use React render commits, effect traces, and network requests to build both timelines and compare them.

Step 5: Find the trigger, then explain the mechanism

Bugs often have two layers:

  • The trigger: What in the app's code causes this to happen? Which component, which hook, which effect, which function call?

  • The mechanism: Why does the framework/runtime react badly to that trigger?

Find the trigger first. Identify the specific app-level code whose presence or behavior causes the error. Then briefly explain the framework mechanism — but don't get lost in framework internals before you've identified the app-level trigger.

Step 6: Systematic elimination

If the trigger isn't clear from tracing, reason about which components are unique to the broken path. The "hooks" error means a component is rendering with a different number of hooks between renders. Look at:

  • Layout components that wrap the route

  • Providers and context components

  • Utility components (analytics, tracking, feature flags) that appear in layouts

  • Components that fire effects during mount

Which of these could be interacting with the redirect/error recovery in a way that changes the hook count?

Step 7: Consider framework idioms

Is the code using Next.js/React patterns correctly? Are there known problematic interactions between features (Suspense + redirects, Server Actions + error boundaries, effects during error recovery, etc.)?

Important: If the app code is using a documented framework pattern correctly and the error is caused by a framework-internal mechanism, say so explicitly. But still identify which specific app code triggers the framework bug and propose an app-level change that avoids the trigger.

Step 8: Red-team your fix

Before finalizing, critically evaluate:

  • Actionable fix test: If your fix is "move to middleware," "change the framework," or "restructure routing" — keep looking. There's probably an app-level change that removes the trigger without restructuring. Framework workarounds are plan B, not plan A.

  • Does your fix address the root trigger, or just avoid the code path?

  • Could your fix introduce new problems?

  • Are there other components or pages with the same pattern?

Architecture Context

The following facts about the app's architecture are known from static analysis:

  • Framework: Next.js 14+ with App Router

  • Data fetching: TanStack Query for client-side data management

  • Component model: Mix of React Server Components (RSC) and Client Components ('use client')

  • Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts

  • Validation: Zod schemas for API response validation

  • Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components

  • Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]

  • Components in layouts: Both the root layout and users layout include utility components like analytics trackers

Technology Skills

Before investigating, read the following skill files for relevant framework patterns. Scan the headers first, then read sections that seem relevant to the bug (hooks error + redirect + error boundary):

  • ~/projects/temp/react-skills/skills/react-best-practices.md — React patterns and anti-patterns (large file — scan headers first, read selectively)

  • ~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns

  • ~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js

  • ~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration

Deliverable

Write a detailed bug report with:

  1. Bug Description — What the user sees

  2. Root Cause Analysis — Structure as: Trigger (app-level code) then Mechanism (framework behavior). Lead with the trigger.

  3. Suggested Fix — Concrete code changes targeting the app-level trigger. If a framework workaround is also needed, present it as a secondary option.

  4. Key Evidence Trail — Table of tools used and findings at each step

Bug #3 Agent #4 Solution

💡 Bug Report: "Rendered more hooks than during the previous render" on /users redirect

1. Bug Description

When navigating to /users, the user sees a "Something went wrong" error page flash for ~2 seconds before being redirected to /users/1. The console logs: "Rendered more hooks than during the previous render."

Visual timeline:

| Time | What the user sees |
| --- | --- |
| 895ms | Blank/loading state |
| 1015ms | Loading state transitioning |
| 1077ms | "Something went wrong" error flash (DefaultGlobalError) |
| ~3000ms | Navigation to /users/1 begins |
| 4595ms | User profile page renders correctly |

2. Root Cause Analysis

Trigger (app-level code)

The SendAnalyticsData component in src/components/SetupAnalytics.tsx calls a Server Action (sendAnalyticsData) inside a useEffect on mount. This component is rendered in two layouts:

  • src/app/layout.tsx:24: <SendAnalyticsData value="root-layout" />

  • src/components/UserLayout.tsx:6: <SendAnalyticsData value="users-layout" />

When /users loads, the Server Component src/app/users/page.tsx calls redirect('/users/1'), which throws a NEXT_REDIRECT error. Next.js catches this via RedirectErrorBoundary. However, the layout components above the page still mount and commit during this render — including both SendAnalyticsData instances.

Their useEffects fire immediately after commit, calling the sendAnalyticsData Server Action. These Server Action calls return RSC flight data that triggers a setState on the Next.js Router component — while the Router is already mid-redirect.

Mechanism (framework behavior)

The sequence is:

  1. Commit #0 (1.00s): /users page renders. NEXT_REDIRECT is thrown and caught by RedirectErrorBoundary. Both SendAnalyticsData components mount in the layout tree above the error boundary.

  2. Effects fire (1.005–1.008s): Both SendAnalyticsData useEffects execute, calling sendAnalyticsData() Server Action. This produces two POST requests (to /users and /users/1).

  3. Server Action responses arrive (~1.05s): The responses contain RSC flight data. Next.js processes this and calls setState on the Router component to update the RSC tree.

  4. Commit #1 (1.05s): The Router re-renders, but it is now in an inconsistent state — the redirect has partially torn down the component tree, and the Router's hook count changes between renders. React throws "Rendered more hooks than during the previous render".

  5. ErrorBoundaryHandler catches this, rendering DefaultGlobalError — the error flash the user sees.

  6. Commit #2 (3.05s): The redirect to /users/1 eventually completes, the full tree re-renders, and the app recovers.

In short: Calling a Server Action from a useEffect in a layout that wraps a redirecting page causes an RSC state update that corrupts the Router during redirect processing.

3. Suggested Fix

Primary fix — don't call Server Actions from layout effects during redirect-sensitive renders.

The simplest fix is to replace the Server Action call with a plain fetch() (or navigator.sendBeacon()) for analytics. This avoids the RSC flight data response that corrupts the Router state:

// src/components/SetupAnalytics.tsx  
'use client';  
  
import { useEffect } from 'react';  
  
export function SendAnalyticsData({ value }: { value: string }) {  
  useEffect(() => {  
    // Use fetch() or sendBeacon() instead of a Server Action.  
    // Server Actions return RSC flight data that can interfere  
    // with the Next.js Router during redirect processing.  
    fetch('/api/analytics', {  
      method: 'POST',  
      body: JSON.stringify({ name: value }),  
      keepalive: true, // ensures delivery even on navigation  
    });  
  }, [value]);  
  
  return <></>;  
}  

With a corresponding API route:

// src/app/api/analytics/route.ts  
import { NextResponse } from 'next/server';  
  
export async function POST(request: Request) {  
  const { name } = await request.json();  
  console.log('analytics data sent', name);  
  return NextResponse.json({ ok: true });  
}  

Why this works: A plain fetch() response is just JSON — it doesn't carry RSC flight data and doesn't trigger setState on the Router. The analytics fires the same way, but the response can't corrupt React's component tree.

Alternative fix — if you want to keep using a Server Action, you can suppress the return value by wrapping it so it doesn't trigger Router updates:

// src/components/SetupAnalytics.tsx  
'use client';  
  
import { useEffect, useTransition } from 'react';  
import { sendAnalyticsData } from '@/actions/analytics';  
  
export function SendAnalyticsData({ value }: { value: string }) {  
  const [, startTransition] = useTransition();  
  
  useEffect(() => {  
    startTransition(async () => {  
      await sendAnalyticsData(value);  
    });  
  }, [value, startTransition]);  
  
  return <></>;  
}  

However, this still goes through the Server Action machinery and may still carry risk in edge cases. The fetch()/sendBeacon() approach is more robust.

4. Key Evidence Trail

| Step | Tool | Finding |
| --- | --- | --- |
| Screenshot timeline | replay_Screenshot at 895ms, 1015ms, 1077ms, 4595ms | Error flash visible at 1077ms, resolves by 4595ms |
| Recording overview | replay_RecordingOverview | 1 console error at 1050ms: "Rendered more hooks". 2 error boundary catches: NEXT_REDIRECT at 1005ms, hooks error at 1050ms |
| React render commits | replay_ReactRenders (commits mode) | Commit #0: redirect caught + layouts mount. Commit #1: Router throws hooks error 50ms later |
| Trigger-detail for commit #1 | replay_ReactRenders (trigger-detail) | Dependency chain: effect → setState → Router re-render → crash |
| Network requests at error time | replay_NetworkRequest (detail) | Two POST Server Action calls at 1008-1010ms with next-action headers. Bodies: ["root-layout"], ["users-layout"] |
| Effect execution timing | replay_Logpoint on SetupAnalytics.tsx:9 | 5 hits total. Hits at 972ms ("root-layout") and 1005ms ("users-layout") fire during redirect processing window |
| Dependency chain analysis | replay_DescribePoint with dependencyChain: true | Confirmed: effect from commit #0 → Server Action call → setState on Router → hooks error |
| Source code review | File reads of layout.tsx, UserLayout.tsx, SetupAnalytics.tsx, analytics.ts | SendAnalyticsData in both root and users layouts calls Server Action in useEffect |

5. Why Other Components Aren't the Problem

The SendAnalyticsData in UserPage.tsx:221 is not part of this bug — UserPage only renders after the redirect completes (commit #3 at 3.89s). The problem is specifically the two instances in layout components that persist across the redirect boundary — they mount during commit #0 when the redirect is being processed, and their effects fire Server Actions whose responses corrupt the Router.
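The heart of that evidence trail is an ordering check: did any Server Action POSTs land between the redirect being caught and the hooks error? Here's a small sketch of the check using the timestamps from the report. The `TimelineEvent` shape and the `kind` names are hypothetical, and the report only gives "1008-1010ms" for the two POSTs, so which body went out first is my assumption.

```typescript
type TimelineEvent = {
  atMs: number;
  kind: "redirect-caught" | "server-action-post" | "hooks-error";
  label: string;
};

// Timestamps come from the report's evidence trail.
// Assumption: the root-layout POST is the earlier of the two.
const timeline: TimelineEvent[] = [
  { atMs: 1005, kind: "redirect-caught", label: "NEXT_REDIRECT" },
  { atMs: 1008, kind: "server-action-post", label: "root-layout" },
  { atMs: 1010, kind: "server-action-post", label: "users-layout" },
  { atMs: 1050, kind: "hooks-error", label: "Rendered more hooks" },
];

// Returns labels of Server Action POSTs that started after the redirect
// was caught and before the hooks error was thrown.
function postsBetweenRedirectAndError(events: TimelineEvent[]): string[] {
  const redirect = events.find((e) => e.kind === "redirect-caught");
  const error = events.find((e) => e.kind === "hooks-error");
  if (!redirect || !error) return [];

  return events
    .filter(
      (e) =>
        e.kind === "server-action-post" &&
        e.atMs > redirect.atMs &&
        e.atMs < error.atMs
    )
    .map((e) => e.label);
}

const suspects = postsBetweenRedirectAndError(timeline);
```

Both layout POSTs fall inside that 45ms window, which is the same interleaving the agent pieced together from the network and render data.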

Bug #3 Conclusions

This one was fascinating! Let’s see how the results looked using that updated comparison rubric:

Run 1 (baseline) Run 2 (+methodology) Run 3 (+ctx+skills) Run 4 (best CI)
Did it fix the problem?
Did it identify the root cause correctly?
Did it investigate beyond the first fix? ⚠️ Wrong direction ⚠️ Wrong direction
Did it check the rest of the codebase? Partial
Did it produce the correct architectural fix?

With Bug #2, we saw a pretty consistent stairstep of improvements as we added more context and skills. Here, the agent either nailed everything or got lost in the weeds.

Agent #3 is still the one that confuses me the most. It had the same process instructions as #2, but with added app context and React skills. Somehow that led it off into the guts of React and it never actually figured out the app-level issue.

I was very pleased to see that the instructions to focus on app concerns had the desired results! In fact it was hilarious to read the LLM actually saying “oops I’m going too deep, back up!”

Overall Results

Nadia’s LLM runs got partial credit on Bug #2 (the double loading skeleton bug), and failed completely on Bug #3 (the broken redirect + Server Actions call).

By adding Replay recordings and improving the prompts with better investigative instructions, I saw my agents go from a partial fix of Bug #2 to nailing the exact cause and solution, and two of the four runs for Bug #3 came up with the exact cause and solution as well!

This was very exciting! Of course I hoped that adding Replay would produce better results - that’s why I tried doing this in the first place. But I had no way of knowing for sure without actually running the experiment and seeing what happened.

Replay MCP’s Tools Were Critical

In all these explorations, the agents heavily used Replay MCP to understand both the app runtime behavior and Next + React’s internals. They consistently oriented themselves with RecordingOverview to surface app details, errors, and renders; used Screenshot to visualize what the app looked like at various points; and dug into behavior details with ReactRenders and NetworkRequests.

For the deep-dive exploration, the agents used SearchSources to find relevant files and ReadSource to view sources with hit counts. Then they dug deeper with numerous Logpoint and Evaluate calls to dynamically introspect behavior at key lines and hits.

As we saw from Nadia’s post and a couple of my example agent runs, just having the source code available wasn’t enough to solve these issues! Reading the app source was important, and it helped the agent understand the structure of the app codebase and find things like the Server Action. Both parts were necessary. But without the runtime data from the Replay recording, the agents could never have understood the full picture of the app’s behavior, and wouldn’t have been able to nail the results.

Prompting and Context Still Matter

As I talked about in the previous post on Dan’s useDeferredValue bug: I really was surprised at just how much the prompt and instructions mattered here, and how much of a difference they made!

I’m sure there are plenty of academic studies out there that talk about all these aspects in more scientific detail. I’m just one dev who tried running some experiments locally :) But this really has impressed upon me the importance of providing good context and instructions for any autonomous agent investigative work. (And, for that matter, it helps explain why my own personal hands-on agentic development workflow seems to produce good results for me.)

Replay MCP: The Time-Travel Superpower Your Agents Need

As I said in the last post: I joined Replay because I saw the potential and promise of time-travel debugging and how it can make the debugging process drastically faster and better. It enables solving bugs that are otherwise impossible, and provides insights into real runtime behavior that no other monitoring or analysis tool can provide.

Replay DevTools gave humans the ability to investigate and solve bugs with time travel.

Now, Replay MCP gives your agents the same time-travel superpowers.

And, even better: we’ve updated our Replay GitHub bot to automatically trigger time-travel analysis on failed E2E tests, and provide a root cause and a recommended fix as a comment on your PR! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity. One step closer to self-healing CI/CD!

Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!