Production Error Triage (Sentry)
Systematic production error investigation: Sentry error analysis (group errors by fingerprint; identify the 10 errors with the highest occurrence rate and impact; examine breadcrumbs, the sequence of events leading to the error; read stack traces with source map resolution, which yields readable TypeScript stack traces from minified production builds; review context: user, URL, browser, OS, and app version at error time), reproduction (reproduce the error in a staging environment from the Sentry breadcrumb trail), root cause analysis (trace the error to its origin, often a data state that should have been validated, a null reference where non-null was assumed, or a race condition between async operations), fix implementation, and a regression test.
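The grouping and ranking step can be sketched as follows. This is a minimal sketch over events that have already been exported from Sentry, not a use of the Sentry API: the `TriageEvent` shape, the `topErrorGroups` function, and the ranking rule (occurrences first, affected users as a tie-breaker) are assumptions made for illustration.

```typescript
// Hypothetical shape of an exported error event; not a Sentry SDK type.
interface TriageEvent {
  fingerprint: string;
  userId: string;
  message: string;
}

interface ErrorGroup {
  fingerprint: string;
  message: string;
  occurrences: number;
  affectedUsers: number;
}

// Group events by fingerprint, then rank by occurrence count,
// using the number of distinct affected users as a tie-breaker.
function topErrorGroups(events: TriageEvent[], limit: number = 10): ErrorGroup[] {
  const groups = new Map<string, { message: string; occurrences: number; users: Set<string> }>();

  for (const event of events) {
    let group = groups.get(event.fingerprint);
    if (!group) {
      group = { message: event.message, occurrences: 0, users: new Set<string>() };
      groups.set(event.fingerprint, group);
    }
    group.occurrences += 1;
    group.users.add(event.userId);
  }

  return Array.from(groups.entries())
    .map(([fingerprint, g]) => ({
      fingerprint,
      message: g.message,
      occurrences: g.occurrences,
      affectedUsers: g.users.size,
    }))
    .sort((a, b) => b.occurrences - a.occurrences || b.affectedUsers - a.affectedUsers)
    .slice(0, limit);
}
```

Ranking by raw count alone can hide errors that hit many distinct users, which is why the sketch also tracks affected users; a real triage pass might weight the two differently.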
Memory Leak Investigation
Node.js and browser memory leak diagnosis: heap snapshot comparison (Node.js `--inspect` mode plus the Chrome DevTools Memory panel: take a heap snapshot, perform the operation suspected of leaking, take another snapshot, and compare the two to identify the objects that accumulated), allocation timeline (the Chrome Memory panel's allocation timeline identifies which code paths allocate objects that are never garbage collected), common Node.js memory leak patterns (closures holding references to large objects, event listeners never removed, accumulation in global variables, streams not properly closed, `setTimeout`/`setInterval` callbacks referencing the outer scope), fix implementation (release references that long-lived objects such as caches, globals, and closures hold on data that is no longer needed, remove event listeners and clear timers, implement object pooling for high-allocation paths, close streams and database connections correctly).
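The "event listeners never removed" pattern and its fix can be illustrated with Node's built-in `EventEmitter`. This is a minimal sketch, assuming a long-lived emitter and a large per-subscriber payload; the function names `subscribeLeaky` and `subscribe` and the `"tick"` event are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Leaky: the anonymous handler closes over `payload` and is never removed,
// so the long-lived emitter keeps every payload reachable.
function subscribeLeaky(bus: EventEmitter, payload: Uint8Array): void {
  bus.on("tick", () => {
    void payload.byteLength;
  });
}

// Fixed: keep a reference to the handler and return a disposer that removes it,
// which lets the garbage collector reclaim the handler and its captured payload.
function subscribe(bus: EventEmitter, payload: Uint8Array): () => void {
  const handler = (): void => {
    void payload.byteLength;
  };
  bus.on("tick", handler);
  return () => {
    bus.off("tick", handler);
  };
}

const bus = new EventEmitter();

subscribeLeaky(bus, new Uint8Array(1024));
const dispose = subscribe(bus, new Uint8Array(1024));
const listenersWhileSubscribed = bus.listenerCount("tick");

dispose();
// Only the leaky listener is still registered at this point.
const listenersAfterDispose = bus.listenerCount("tick");
```

Watching `listenerCount` for a given event grow over time is a cheap first check before reaching for heap snapshots.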
Race Condition Debugging
Identify and fix concurrency bugs: race condition patterns (TOCTOU, Time-of-Check to Time-of-Use: check whether a resource exists, then use it; between the check and the use, another process modifies the resource), async/await misuse (forgetting `await`, one of the most common Node.js concurrency bugs: the function returns before the async operation completes), database concurrency bugs (two transactions update the same row simultaneously and one update is lost without `SELECT FOR UPDATE` or optimistic locking), fix strategies (database-level locks for critical sections: `SELECT FOR UPDATE` for explicit row locking, advisory locks for application-level critical sections; idempotency keys for duplicate prevention; database constraints as the final safety net: rely on UNIQUE constraints to reject duplicate inserts rather than on application-level duplicate checks).
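The lost-update problem and the optimistic-locking fix can be shown with a small in-memory model. This is a sketch under stated assumptions: `VersionedStore` is a hypothetical stand-in for a database row with a version column, and the interleaving of writers A and B is scripted by hand so that the outcome is deterministic. In SQL the same idea is typically an `UPDATE` whose `WHERE` clause includes the version that was read, followed by a check of the affected row count.

```typescript
interface Versioned<T> {
  value: T;
  version: number;
}

// Hypothetical in-memory stand-in for a row with a version column.
class VersionedStore<T> {
  private current: Versioned<T>;

  constructor(initial: T) {
    this.current = { value: initial, version: 0 };
  }

  read(): Versioned<T> {
    return { value: this.current.value, version: this.current.version };
  }

  // Write only if nobody else has written since `expectedVersion` was read.
  compareAndSet(expectedVersion: number, next: T): boolean {
    if (this.current.version !== expectedVersion) {
      return false; // stale read: the caller must re-read and retry
    }
    this.current = { value: next, version: expectedVersion + 1 };
    return true;
  }
}

// Scripted interleaving: A and B both read before either writes.
const balance = new VersionedStore<number>(100);
const seenByA = balance.read();
const seenByB = balance.read();

const aCommitted = balance.compareAndSet(seenByA.version, seenByA.value + 10);
// B's write is based on a stale read, so it is rejected instead of
// silently overwriting A's update.
const bCommitted = balance.compareAndSet(seenByB.version, seenByB.value + 5);

// B re-reads and retries against the fresh version.
const freshForB = balance.read();
const bRetryCommitted = balance.compareAndSet(freshForB.version, freshForB.value + 5);
const finalBalance = balance.read().value;
```

Without the version check, B's write of 105 would have replaced A's 110 and the +10 would have been lost; with it, both increments survive.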
Performance Regression Investigation
Find and fix unexpected performance degradation: git bisect (binary search through commit history to identify the specific commit that introduced the regression, automated via `git bisect run` with a performance test script that fails on regressed behaviour), dependency changelog analysis (when a library upgrade introduced the regression, identify performance-affecting changes in its changelog), query plan change analysis (a PostgreSQL query plan can change after a statistics update or table growth, so the same query now uses a different, slower plan; force the original plan with `pg_hint_plan` while the underlying cause is addressed), memory growth profiling (heap snapshot comparison before and after a soak test: identify the objects that accumulated and trace them to their allocation source).
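A performance test script for `git bisect run` only needs to turn a measurement into an exit code: 0 marks the commit good, 125 tells bisect to skip it, and any other code from 1 to 127 marks it bad. The core of such a script might look like the sketch below; the function names, the use of the median, and the time budget are assumptions for illustration, and `Date.now()` gives only millisecond resolution.

```typescript
// Exit codes understood by `git bisect run`.
const GOOD = 0;
const BAD = 1;
const SKIP = 125;

// Median is less sensitive than the mean to a single slow outlier run.
function median(samples: number[]): number {
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Map timing samples to a bisect verdict against a hypothetical time budget.
function bisectVerdict(samplesMs: number[], budgetMs: number): number {
  if (samplesMs.length === 0) {
    return SKIP; // nothing measured, e.g. the commit could not be exercised
  }
  return median(samplesMs) <= budgetMs ? GOOD : BAD;
}

// Time a synchronous workload several times.
function timeRuns(workload: () => void, runs: number): number[] {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    workload();
    samples.push(Date.now() - start);
  }
  return samples;
}
```

In a real script the result of `bisectVerdict(timeRuns(workload, 5), budgetMs)` would be passed to `process.exit`, and the script would be handed to `git bisect run` after `git bisect start` with a known bad and a known good commit. Picking a budget well between the old and the regressed timings keeps noisy runs from mislabelling commits.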
Third-Party Integration Bugs
Debug bugs in integrations with external APIs and services: webhook delivery failures (for Stripe, GitHub, or Twilio webhooks: examine the webhook delivery logs in the provider dashboard, verify webhook signature validation, check endpoint availability, identify idempotency failures on retry), OAuth flow failures (inspect the OAuth state parameter, PKCE code verifier/challenge matching, the token endpoint response, token storage and refresh), API version deprecation (silent breaking changes in third-party API responses: implement strict response validation with Zod to catch unexpected schema changes), timeout handling (API timeouts not handled gracefully: implement circuit breakers, appropriate timeout values, and fallback behaviour for degraded downstream services).
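The circuit breaker mentioned above can be sketched as a small state machine. This is one possible minimal design, not a reference implementation: the class name, the consecutive-failure threshold, the cooldown, and the injectable clock are all assumptions made for illustration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failureThreshold: number;
  private cooldownMs: number;
  private now: () => number;
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";

  // `now` is injectable so the cooldown can be tested without waiting.
  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Ask before calling the downstream service; false means "use the fallback".
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: let probe requests through until one reports back.
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // Treat timeouts the same as errors when reporting failures.
  recordFailure(): void {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

A caller would check `canRequest()` first, return the fallback response when it is false, and otherwise wrap the downstream call so that a timeout or error calls `recordFailure()` and a normal response calls `recordSuccess()`.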