JobSpec

Turn any messy job posting into clean, structured JSON — from a URL, raw HTML, or plain text.

JobSpec is a job-posting parser. Point it at a job ad and it returns a normalized record: title, company, location, salary range, employment type, experience level, responsibilities, requirements, skills, benefits, and more. It works whether you have a live job URL, a saved HTML page, or a block of pasted text. You can stop writing brittle per-site scrapers and HTML-regex hacks and get fields you can store directly in a database.

Endpoints

Endpoint         What it returns
POST /extract    A normalized JobRecord parsed from a single posting. Send a JSON body with one of url, html, or text. Returns: title, company (name, url, industry, size), location (city, state, country, remote), employment_type, experience_level, salary (min, max, currency, period), description, responsibilities[], requirements[], nice_to_have[], skills[], benefits[], posted_date, apply_url, source_url, job_id, and cached.
GET /health      Simple liveness check. Returns {"status": "ok"}.
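
As a rough illustration of the two endpoints above, here is a minimal Python sketch using only the standard library. The base URL is a placeholder (the real host is not stated here), the helper names are made up for this example, and any authentication headers your provider requires are left out.

```python
import json
import urllib.request

# Hypothetical base URL: replace with the host your JobSpec provider gives you.
BASE_URL = "https://jobspec.example.com"


def build_extract_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /extract request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/extract",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def extract(payload: dict, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and decode the returned JobRecord.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses such as 422.
    """
    req = build_extract_request(payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET /health and check for {"status": "ok"}."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as resp:
        return json.load(resp).get("status") == "ok"
```

A call would then look like extract({"text": "Senior Python Engineer, remote, $180k-$230k"}), with the payload holding one of url, html, or text.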

Why this API

Three input modes, one schema. Pass a live url (JobSpec fetches and cleans the page for you), raw html you already have, or plain text you pasted — the output shape is identical every time.
One consistent record across every job board. Greenhouse, Lever, company career pages, PDFs you've turned into text: they all collapse into the same predictable JSON instead of a different DOM per site.
Salary parsing built in. Ranges like "$180k–$230k" come back as numeric salary.min and salary.max values you can filter and sort on, alongside salary.currency and salary.period.
Normalized enums and skill tags. employment_type (full_time, part_time, contract, internship, temporary, volunteer) and experience_level (entry, mid, senior, lead, executive) are normalized, and skills[] are short tags like Python, AWS, React rather than full sentences.
Cached by default. Identical requests are served from a roughly 24-hour cache, so repeat lookups are fast and don't re-run extraction.
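
To show why structured salary fields and normalized enums are convenient, here is a small sketch that filters already-fetched records by experience level and pay. The function names are hypothetical. The currency code "USD" and the example period value are assumptions, since the exact strings JobSpec emits for currency and period are not listed here. The code tolerates both a null salary object and null fields inside it.

```python
from typing import Iterable, Optional

# Enum values as listed in the text above.
EXPERIENCE_LEVELS = {"entry", "mid", "senior", "lead", "executive"}


def pays_at_least(record: dict, floor: float, currency: str = "USD",
                  period: Optional[str] = None) -> bool:
    """True if the posting's salary range reaches `floor` in the given currency.

    If `period` is given (for example "year", an assumed value), the record's
    salary.period must match it; otherwise the period is not compared.
    """
    salary = record.get("salary") or {}  # salary may be missing or null
    if salary.get("currency") != currency:
        return False
    if period is not None and salary.get("period") != period:
        return False
    # Use the top of the range, falling back to the bottom if max is null.
    top = salary.get("max") if salary.get("max") is not None else salary.get("min")
    return top is not None and top >= floor


def filter_jobs(records: Iterable[dict], floor: float, levels: set[str]) -> list[dict]:
    """Keep records whose experience_level is in `levels` and that reach `floor`."""
    unknown = levels - EXPERIENCE_LEVELS
    if unknown:
        raise ValueError(f"unknown experience levels: {sorted(unknown)}")
    return [
        r for r in records
        if r.get("experience_level") in levels and pays_at_least(r, floor)
    ]
```

Postings with no parseable salary are dropped by this filter rather than treated as zero, which is usually what a compensation query wants.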

Typical use cases

Build a job board or aggregator by ingesting postings from many sources into one uniform schema.
Power an ATS or recruiting tool that needs structured requirements, skills, and salary out of free-form ads.
Normalize salary data across thousands of postings for market/compensation analysis.
Feed a job-matching engine with clean skills[] and requirements[] arrays instead of raw HTML.
Clean up scraped or pasted listings without maintaining a parser for every site layout.

Good to know

You must send exactly one input. Provide one of url, html, or text in the JSON body. An empty request returns 422. If you send a url, JobSpec fetches it server-side, strips scripts/styles/images, and parses the visible text.
Extraction is LLM-based. Job records are produced by a language model from the posting content, not from an official structured feed. It is accurate on standard postings but, like any model, can occasionally miss or misread fields — treat output as high-quality structured guesses, not a system of record.
Most fields are nullable. Any scalar the posting doesn't mention comes back as null; list fields (responsibilities, requirements, nice_to_have, skills, benefits) come back as []. Always code defensively.
Results are cached for ~24 hours. A repeated identical request returns the same record with cached: true. Vary your input to force a fresh parse.
URL fetching has limits. JobSpec follows up to 5 redirects, uses a 15-second timeout, and reads up to ~500 KB of HTML. Pages that require login or heavy JavaScript rendering, or that sit behind aggressive bot protection, may fail to fetch. In those cases, pass the html or text yourself.
One posting per call. /extract parses a single job at a time; there is no bulk/search endpoint. (This is a structured-extraction API — it does not search or list jobs for you.)
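
The notes above on sending exactly one input and on nullable fields can be handled client-side with a couple of small helpers. This is a sketch only: make_payload, dig, and summarize are hypothetical names, and the placeholder defaults are arbitrary choices, not anything JobSpec returns.

```python
from typing import Any, Optional


def make_payload(url: Optional[str] = None, html: Optional[str] = None,
                 text: Optional[str] = None) -> dict:
    """Return a request body containing exactly one of url, html, or text."""
    given = {k: v for k, v in {"url": url, "html": html, "text": text}.items() if v}
    if len(given) != 1:
        raise ValueError(
            f"expected exactly one of url/html/text, got {sorted(given) or 'none'}"
        )
    return given


def dig(record: Optional[dict], *path: str, default: Any = None) -> Any:
    """Walk nested keys, returning `default` if any level is missing or null."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


def summarize(record: dict) -> dict:
    """Pull a few fields out of a JobRecord without assuming any are present."""
    return {
        "title": dig(record, "title", default="(untitled)"),
        "company": dig(record, "company", "name", default="(unknown company)"),
        "remote": bool(dig(record, "location", "remote", default=False)),
        "skills": [s.lower() for s in dig(record, "skills", default=[])],
        "cached": bool(dig(record, "cached", default=False)),
    }
```

Checking the input before the call avoids a round trip that would end in a 422, and routing every nested read through one helper keeps null handling in a single place.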