Circle of Wizards

ResearchRecon

Turn any company URL into structured competitive intelligence — company profiles, pricing tiers, hiring signals, Reddit sentiment, and page-change monitoring, all returned as clean JSON.

ResearchRecon scrapes the messy public web and hands you back structured data. Point it at a company homepage, a SaaS pricing page, a careers page, or a Reddit thread, and an LLM extraction pipeline parses the raw HTML into typed JSON fields you can drop straight into a dashboard, CRM, or sales-intel workflow. It solves the part nobody wants to build: fetching pages reliably, stripping the markup, and coaxing consistent structure out of wildly inconsistent layouts.

Endpoints

All endpoints except GET /health are POST and take a JSON body. Every POST endpoint accepts url (required) and an optional use_browser boolean (default false) that switches on a headless Chromium render for JavaScript-heavy pages.
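
The request shape described above can be sketched with Python's standard library. This is a minimal sketch, not an official client. The X-RapidAPI-Key and X-RapidAPI-Host headers follow RapidAPI's usual convention, but the host name researchrecon.p.rapidapi.com and the helper names build_request and call are hypothetical. Copy the real host and your key from the API's RapidAPI page.

```python
import json
import urllib.request

# Hypothetical placeholders: take the real host and key from your RapidAPI dashboard.
RAPIDAPI_HOST = "researchrecon.p.rapidapi.com"  # assumption, not confirmed by the docs
RAPIDAPI_KEY = "YOUR_RAPIDAPI_KEY"


def build_request(path: str, url: str, use_browser: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for one of the extraction endpoints."""
    body = json.dumps({"url": url, "use_browser": use_browser}).encode("utf-8")
    return urllib.request.Request(
        f"https://{RAPIDAPI_HOST}{path}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-RapidAPI-Key": RAPIDAPI_KEY,
            "X-RapidAPI-Host": RAPIDAPI_HOST,
        },
    )


def call(path: str, url: str, use_browser: bool = False, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body into a dict."""
    request = build_request(path, url, use_browser)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call("/extract/company", "https://example.com", use_browser=True) would return the parsed company profile as a dict, provided the host and key are filled in correctly. The generous timeout is a guess that allows for the slower headless-browser path.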

  • POST /extract/company: structured company profile with company_name, description, founders[], funding, industry, headquarters, employee_count, and products[]. Also returns url and raw, the unparsed model output.
  • POST /extract/pricing: structured pricing with pricing_model, tiers[] (each with name, price, features[]), free_trial, and notes. Also returns url and raw.
  • POST /extract/jobs: hiring intelligence with company, total_openings, jobs[] (each with title, location, department, url, description_snippet), and hiring_signals[] (strategic insights inferred from the openings). Also returns url and raw.
  • POST /extract/reddit: analysis of the Reddit discussion at the given Reddit URL, with subreddit, posts[] (each with title, url, score, comments, summary), sentiment (positive, negative, neutral, or mixed), and themes[]. Also returns url and raw.
  • POST /monitor/competitor: change detection for a page, with status (changed or no_change), current_hash, previous_hash, and, when the content has changed, a plain-text change_summary describing what differs from the last snapshot.
  • GET /health: service liveness check, returning status and service.
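
Because any extracted field can come back null, it can help to map responses onto typed objects at the boundary of your code. The sketch below does this for /extract/pricing. The Tier and Pricing classes and parse_pricing are hypothetical names, and the types of price and free_trial are assumptions because the listing does not document them.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Tier:
    name: Optional[str]
    price: Any  # string or number; the exact type is not documented
    features: list[str] = field(default_factory=list)


@dataclass
class Pricing:
    url: str
    pricing_model: Optional[str]
    tiers: list[Tier]
    free_trial: Any  # type not documented, kept exactly as returned
    notes: Optional[str]
    raw: Any = None  # unparsed model output, useful for debugging


def parse_pricing(payload: dict) -> Pricing:
    """Map an /extract/pricing response onto typed objects, tolerating null fields."""
    tiers = [
        Tier(
            name=tier.get("name"),
            price=tier.get("price"),
            features=list(tier.get("features") or []),  # null features become []
        )
        for tier in (payload.get("tiers") or [])  # null tiers become []
        if isinstance(tier, dict)  # skip malformed entries instead of crashing
    ]
    return Pricing(
        url=payload.get("url") or "",
        pricing_model=payload.get("pricing_model"),
        tiers=tiers,
        free_trial=payload.get("free_trial"),
        notes=payload.get("notes"),
        raw=payload.get("raw"),
    )
```

Normalizing nulls to empty lists here means downstream code can loop over tiers and features without repeated None checks.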

Why this API

  • One call, structured output. No HTML parsing, no selectors to maintain, no LLM prompt engineering on your end — send a URL, get typed JSON fields back.
  • Reddit sentiment from the real source. The Reddit endpoint fetches Reddit's own .json API (not a scraped HTML mirror), reads up to 20 posts with their real scores and comment counts, then summarizes sentiment and recurring themes.
  • JavaScript rendering on demand. Set use_browser: true and the page is loaded in headless Chromium (waiting for network idle) so single-page apps and lazy-loaded content extract correctly.
  • Built-in change monitoring. /monitor/competitor stores a SHA-256 hash and snapshot of each page; on the next call it detects whether anything changed and, if so, returns an LLM-written summary of the diff.
  • Cached and fast. Extraction results are cached so repeat lookups of the same URL come back quickly, and page snapshots are retained for a rolling week to power change detection.
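
The hash-and-snapshot approach behind /monitor/competitor can be illustrated locally. This sketch is not the service's implementation: SnapshotStore is a hypothetical in-memory class, and a difflib unified diff stands in for the LLM-written change_summary. It does mirror the documented behaviour that the first call for a URL only establishes a baseline and returns no_change with a null previous_hash.

```python
import difflib
import hashlib


class SnapshotStore:
    """In-memory illustration of SHA-256 change detection (not the service's code)."""

    def __init__(self) -> None:
        self._snapshots: dict[str, tuple[str, str]] = {}  # url -> (hash, page text)

    def check(self, url: str, text: str) -> dict:
        current_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = self._snapshots.get(url)
        self._snapshots[url] = (current_hash, text)  # always keep the latest snapshot
        if previous is None:
            # First call: store the baseline and report no change.
            return {"status": "no_change", "current_hash": current_hash, "previous_hash": None}
        previous_hash, previous_text = previous
        if previous_hash == current_hash:
            return {"status": "no_change", "current_hash": current_hash, "previous_hash": previous_hash}
        # The service returns an LLM summary; a plain line diff stands in for it here.
        diff = "\n".join(
            difflib.unified_diff(previous_text.splitlines(), text.splitlines(), lineterm="", n=0)
        )
        return {
            "status": "changed",
            "current_hash": current_hash,
            "previous_hash": previous_hash,
            "change_summary": diff,
        }
```

Comparing hashes first is cheap, so the more expensive diff (or, in the service, the LLM summary) only runs when the content has actually changed.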

Typical use cases

  • Enrich a CRM or lead list: feed company homepages to /extract/company for founders, funding, HQ, and product lines.
  • Competitive pricing tracking: pull a rival's pricing page into structured tiers and feature lists.
  • Talent-signal scouting: read a competitor's careers page to see what roles they're filling and what that implies about their roadmap.
  • Brand and product sentiment: analyze Reddit threads for how a product is actually being received.
  • Change alerts: call /monitor/competitor on a competitor's homepage or pricing page on a schedule, and alert your team with the returned change_summary whenever status is changed.
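
For the CRM-enrichment use case, a company profile usually has to be flattened into a single row. The sketch below assumes founders[] and products[] are lists of strings and that the other fields are strings or numbers. The column names and the to_crm_row and to_csv helpers are hypothetical.

```python
import csv
import io
from typing import Iterable

# Illustrative column set; map these onto your own CRM's fields.
CRM_COLUMNS = [
    "source_url", "company_name", "industry", "headquarters",
    "employee_count", "funding", "founders", "products",
]


def _text(value) -> str:
    """Render a possibly-null scalar field as text (None becomes an empty cell)."""
    return "" if value is None else str(value)


def _joined(values) -> str:
    """Join a possibly-null list field such as founders[] into one cell."""
    return "; ".join(str(v) for v in (values or []) if v)


def to_crm_row(profile: dict) -> dict:
    """Flatten one /extract/company response into a CRM-ready row."""
    return {
        "source_url": _text(profile.get("url")),
        "company_name": _text(profile.get("company_name")),
        "industry": _text(profile.get("industry")),
        "headquarters": _text(profile.get("headquarters")),
        # Kept as text so approximate values like "~8,000" keep their "~" marker.
        "employee_count": _text(profile.get("employee_count")),
        "funding": _text(profile.get("funding")),
        "founders": _joined(profile.get("founders")),
        "products": _joined(profile.get("products")),
    }


def to_csv(profiles: Iterable[dict]) -> str:
    """Write many profiles as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CRM_COLUMNS)
    writer.writeheader()
    for profile in profiles:
        writer.writerow(to_crm_row(profile))
    return buffer.getvalue()
```

Keeping estimated figures as text rather than coercing them to numbers preserves the ~ marker, so a reviewer can still see which values were approximated.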

Good to know

  • url is required on every extraction endpoint. The /extract/reddit endpoint expects a Reddit URL (e.g. https://reddit.com/r/python or a thread/search URL), not a free-text company name — it fetches that URL's .json feed directly.
  • Extraction is LLM-driven, so any field can be null when the page doesn't contain that information or the model can't determine it. Each response also includes a raw field with the model's unparsed output for debugging.
  • The company endpoint enriches beyond the page. For well-known companies it may fill founders, funding, and employee_count from public knowledge; such estimates are prefixed with ~ (e.g. ~$8.7B raised, ~8,000) and should be treated as approximate. For obscure companies these fields return null rather than invented figures.
  • Monitoring needs two calls. The first /monitor/competitor call on a URL establishes the baseline and returns no_change with a null previous_hash; subsequent calls compare against the stored snapshot. Snapshots are kept for about a week.
  • Page content is truncated before analysis (roughly the first several thousand characters per page), so very long pages are summarized from their leading content.
  • use_browser: true is slower but necessary for JS-rendered pages; the default fast path uses a direct HTTP fetch.
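
Since estimated figures carry a ~ prefix and any field can be null, a small normalizer can separate the number from the approximation flag before the value reaches a dashboard. This is a heuristic sketch: parse_estimate and Estimate are hypothetical names, and the K/M/B/T suffix handling is an assumption based only on the ~$8.7B example above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Magnitude suffixes as in "~$8.7B"; an assumption about how values are written.
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
# First number in the text, plus an optional upper-case suffix not followed by a letter.
_NUMBER = re.compile(r"(\d[\d,]*(?:\.\d+)?)(?:([KMBT])(?![A-Za-z]))?")


@dataclass
class Estimate:
    value: Optional[float]  # None when the field is null or contains no number
    approximate: bool       # True when the value was marked with "~"
    text: Optional[str]     # original value, kept for display


def parse_estimate(raw) -> Estimate:
    """Split a field such as "~$8.7B raised" into a number and an approximation flag."""
    if raw is None:
        return Estimate(None, False, None)
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return Estimate(float(raw), False, str(raw))
    text = str(raw).strip()
    approximate = text.startswith("~")
    match = _NUMBER.search(text)
    if match is None:
        return Estimate(None, approximate, text)
    value = float(match.group(1).replace(",", ""))  # "8,000" -> 8000.0
    if match.group(2):
        value *= _SUFFIXES[match.group(2)]  # "8.7" with "B" -> 8.7e9
    return Estimate(value, approximate, text)
```

Keeping the original text alongside the parsed value lets a UI show "~$8.7B" while still sorting or charting by the numeric estimate.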