JobSpec
Turn any messy job posting into clean, structured JSON — from a URL, raw HTML, or plain text.
JobSpec is a job-posting parser. Point it at a job ad and it returns a normalized record: title, company, location, salary range, employment type, experience level, responsibilities, requirements, skills, benefits, and more. It works whether you have a live job URL, a saved HTML page, or a block of pasted text, so you can stop writing brittle per-site scrapers and HTML-regex hacks and just get fields you can store in a database.
Endpoints
| Endpoint | What it returns |
|---|---|
| `POST /extract` | A normalized JobRecord parsed from a single posting. Send a JSON body with one of `url`, `html`, or `text`. Returns: `title`, `company` (`name`, `url`, `industry`, `size`), `location` (`city`, `state`, `country`, `remote`), `employment_type`, `experience_level`, `salary` (`min`, `max`, `currency`, `period`), `description`, `responsibilities[]`, `requirements[]`, `nice_to_have[]`, `skills[]`, `benefits[]`, `posted_date`, `apply_url`, `source_url`, `job_id`, and `cached`. |
| `GET /health` | Simple liveness check. Returns `{"status": "ok"}`. |
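
As a rough illustration of calling `POST /extract`, the sketch below builds the JSON body from exactly one of `url`, `html`, or `text` and sends it with the standard library. The base URL `https://api.example.com` and the helper names `build_extract_request` and `extract` are assumptions for illustration only. The source does not state the real host or any authentication scheme, so none is shown.

```python
import json
import urllib.request

# Hypothetical base URL: the real host is not given in this document.
BASE_URL = "https://api.example.com"


def build_extract_request(*, url=None, html=None, text=None):
    """Build (but do not send) a POST /extract request with exactly one input."""
    candidates = {"url": url, "html": html, "text": text}
    # Keep only the inputs that were actually supplied (non-empty).
    provided = {key: value for key, value in candidates.items() if value}
    if len(provided) != 1:
        raise ValueError("provide exactly one of url, html, or text")
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(provided).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(**inputs):
    """Send the request and return the parsed JobRecord as a dict."""
    request = build_extract_request(**inputs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real host substituted for the placeholder, `extract(text="Senior Python Engineer at Acme, remote")` would return the record as a dictionary. Checking the input count on the client side avoids a round trip that would otherwise end in a `422`.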
Why this API
- Three input modes, one schema. Pass a live `url` (JobSpec fetches and cleans the page for you), raw `html` you already have, or plain `text` you pasted. The output shape is identical every time.
- One consistent record across every job board. Greenhouse, Lever, company career pages, PDFs you've turned into text: they all collapse into the same predictable JSON instead of a different DOM per site.
- Salary parsing built in. Ranges like "$180k–$230k" come back as structured `salary.min`, `salary.max`, `salary.currency`, and `salary.period` fields you can filter and sort on.
- Normalized enums and skill tags. `employment_type` (`full_time`, `part_time`, `contract`, `internship`, `temporary`, `volunteer`) and `experience_level` (`entry`, `mid`, `senior`, `lead`, `executive`) are normalized, and `skills[]` are short tags like `Python`, `AWS`, `React` rather than full sentences.
- Cached by default. Identical requests are served from a 24-hour cache, so repeat lookups are fast and don't re-run extraction.
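
To show what filtering on the structured salary fields might look like, here is a minimal sketch. It assumes `salary.min` and `salary.max` are plain numbers (for example `180000` and `230000` for "$180k–$230k"), that `salary.currency` is a code such as `"USD"`, and that `salary.period` is a string such as `"year"`. None of those exact values are confirmed by this document, and the helper name `in_salary_range` is hypothetical.

```python
def in_salary_range(record, *, min_pay, currency="USD", period=None):
    """Return True if the posting's top advertised pay is at least min_pay.

    Assumes numeric salary.min / salary.max and string currency / period
    values; the exact value formats are assumptions, not documented facts.
    """
    salary = record.get("salary") or {}  # the salary object may be missing or null
    if salary.get("currency") != currency:
        return False
    if period is not None and salary.get("period") != period:
        return False
    low, high = salary.get("min"), salary.get("max")
    top = high if high is not None else low  # fall back to min when max is null
    return top is not None and top >= min_pay


# Illustrative records, not real API output.
records = [
    {"title": "Staff Engineer",
     "salary": {"min": 180000, "max": 230000, "currency": "USD", "period": "year"}},
    {"title": "Intern", "salary": None},
]
well_paid = [r["title"] for r in records if in_salary_range(r, min_pay=200000)]
```

Postings with no salary information are excluded rather than raising an error, which fits the nullable-field behavior of the record.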
Typical use cases
- Build a job board or aggregator by ingesting postings from many sources into one uniform schema.
- Power an ATS or recruiting tool that needs structured requirements, skills, and salary out of free-form ads.
- Normalize salary data across thousands of postings for market/compensation analysis.
- Feed a job-matching engine with clean `skills[]` and `requirements[]` arrays instead of raw HTML.
- Clean up scraped or pasted listings without maintaining a parser for every site layout.
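
As one possible illustration of the job-matching use case, the sketch below scores a candidate against a record by the fraction of the posting's `skills[]` tags the candidate also lists. This scoring rule and the function name `skill_overlap` are assumptions chosen for brevity. JobSpec itself does not provide matching.

```python
def skill_overlap(candidate_skills, record):
    """Fraction of the posting's skill tags that the candidate also has.

    Comparison is case-insensitive; returns 0.0 when the posting lists no skills.
    """
    wanted = {skill.casefold() for skill in (record.get("skills") or [])}
    have = {skill.casefold() for skill in candidate_skills}
    if not wanted:
        return 0.0
    return len(wanted & have) / len(wanted)


# Illustrative record, not real API output.
posting = {"title": "Backend Engineer", "skills": ["Python", "AWS", "React"]}
score = skill_overlap(["python", "aws", "Docker"], posting)
```

Because the tags are short and normalized, a simple set intersection goes a long way. A production matcher would likely also handle synonyms and weight required skills differently.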
Good to know
- You must send exactly one input. Provide `url`, `html`, or `text` in the JSON body. An empty request returns `422`. If you send a `url`, JobSpec fetches it server-side, strips scripts/styles/images, and parses the visible text.
- Extraction is LLM-based. Job records are produced by a language model from the posting content, not from an official structured feed. It is accurate on standard postings but, like any model, can occasionally miss or misread fields. Treat output as high-quality structured guesses, not a system of record.
- Most fields are nullable. Any scalar the posting doesn't mention comes back as `null`; list fields (`responsibilities`, `requirements`, `nice_to_have`, `skills`, `benefits`) come back as `[]`. Always code defensively.
- Results are cached for ~24 hours. A repeated identical request returns the same record with `cached: true`. Vary your input to force a fresh parse.
- URL fetching has limits. JobSpec follows up to 5 redirects, uses a 15-second timeout, and reads up to ~500 KB of HTML. Pages that require login, heavy JavaScript rendering, or aggressive bot protection may not fetch. In those cases pass the `html` or `text` yourself.
- One posting per call. `/extract` parses a single job at a time; there is no bulk/search endpoint. (This is a structured-extraction API; it does not search or list jobs for you.)
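
The advice to code defensively can be sketched as follows. The helper reads a record without assuming any scalar is present, and it also tolerates the nested `company`, `location`, and `salary` objects being `null`. That last point is an assumption, since this document only states that scalars may be `null`. The function name `summarize` and the output keys are hypothetical.

```python
def summarize(record):
    """Flatten a JobRecord into a small row, tolerating null or missing fields."""
    # Nested objects are treated as possibly null (an assumption, see lead-in).
    company = record.get("company") or {}
    location = record.get("location") or {}
    salary = record.get("salary") or {}
    return {
        "title": record.get("title") or "(untitled)",
        "company": company.get("name"),
        "remote": bool(location.get("remote")),
        "salary_min": salary.get("min"),
        # List fields come back as [] per the notes above; still guard against null.
        "skills": list(record.get("skills") or []),
        "from_cache": bool(record.get("cached")),
    }


# Illustrative sparse record, not real API output.
row = summarize({"title": None, "company": None, "skills": [], "cached": True})
```

Using `or {}` and `or []` instead of a default argument to `.get()` matters here, because a key that is present with a `null` value would otherwise bypass the default.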