Search "how much do AI training jobs pay" and you'll get the same answer in fifty different blog posts: around $15 to $25 an hour. That number is technically true, but only narrowly. It's also the most misleading single statistic in the industry.
The real distribution isn't a bell curve clustered around $20. It splits into three tiers: a wide crowd tier between $8 and $25, a narrow specialist tier between $30 and $60, and a small but well-paid expert tier between $75 and $150. The platforms publish the crowd number because that's where most of the volume lives. They quietly pay the expert number to people they can't replace.
This post walks through what each tier actually does, what moves you between them, and why most freelancers stay stuck in the bottom band for years longer than they need to.
Tier 1 — Crowd ($8–25/hr)
The crowd tier is everything that can be done by someone with a stable internet connection, fluent English (or another supported language), and 30 minutes of training. Examples:
- Image labeling for autonomous-vehicle datasets
- Sentiment / topic tagging on short text
- Side-by-side preference comparisons between two model outputs ("which response is more helpful?")
- Transcript correction and basic content moderation
Pay sits at $8–25/hr depending on geography (rate floors are often country-specific), task difficulty, and the platform's current supply-demand balance. The volume is enormous and the entry bar is low — which is exactly why pay is bounded. Anyone can do this work, so no one can charge much for it.
Tier 2 — Specialist ($30–60/hr)
The specialist tier requires at least one of three things: a verifiable skill, a verifiable language pair, or verifiable domain knowledge. Examples:
- Code review for AI assistants (you can read and write idiomatic code in 2+ languages)
- RLHF rubric grading on technical reasoning prompts
- Translation review in mid-tier language pairs (e.g. EN ↔ PT-BR, EN ↔ JA, EN ↔ TR)
- Red-teaming non-safety-critical models for capability discovery
- Multimodal annotation requiring genuine domain literacy (charts, legal documents, scientific figures)
What separates this tier from the crowd is gatekeeping. Platforms screen you in — usually with a written exam, a graded probation period, or an interview — and you keep the rate as long as your output quality stays above their internal threshold. Fall below it, and you get bounced back to the crowd or paused entirely.
Tier 3 — Expert ($75–150/hr)
The expert tier is where the platforms reach for credentials, not just skills. They want a JD, an MD, a PhD, a CFA, or a verifiable track record as a senior individual contributor (IC). Examples:
- Medical Q&A safety review (licensed clinicians)
- Legal-domain rubric grading and contract clause analysis (practicing attorneys)
- Adversarial probing of frontier models for capability uplift in biology, chemistry, cybersecurity (subject-matter PhDs)
- Senior code review for security-sensitive generation (10+ year engineers, ideally with infosec credentials)
- Mathematics and theorem-proving rater work (graduate-level mathematicians)
This work isn't optional for frontier labs. They literally cannot ship a medical assistant without medically licensed people in the eval loop. They cannot ship a legal copilot without practicing lawyers in the rubric loop. So they pay accordingly, and the rate stays high because the supply of credentialed people willing to do part-time evaluation work is small.
What actually moves you between tiers
Three things, in roughly this order of impact:
- A verifiable credential or portfolio. A LinkedIn profile is not enough. Platforms want a PDF transcript, a license number, a published paper, a GitHub track record, or a graded probation period. Until you give them something they can verify, you stay in the crowd tier no matter how good your output is.
- A consistently low rejection rate. Every platform tracks your output rejection rate (the fraction of your submissions a reviewer flags as low-quality). Sub-3% rejection across 100+ submissions is roughly the floor for moving up. Sub-1% is what gets you specialist invitations and harder rubrics.
- Showing up. Employers quietly hand their best briefs to raters with high weekly engagement (10+ hours). If you log in twice a month, you'll see the slow leftover briefs forever, regardless of how good your work is.
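The rejection-rate rule of thumb is simple arithmetic: flagged submissions divided by total submissions, compared against a threshold once you have enough history. Below is a minimal sketch in Python, assuming the rough thresholds stated here (sub-3% to move up, sub-1% for specialist invitations, both over at least 100 submissions). `RaterStats` and `rough_standing` are hypothetical names, and real platforms compute and apply these thresholds in their own, unpublished ways.

```python
from dataclasses import dataclass


@dataclass
class RaterStats:
    submissions: int  # total submissions a reviewer has looked at
    flagged: int      # submissions a reviewer flagged as low-quality

    def __post_init__(self) -> None:
        # Guard against impossible inputs (more flags than submissions, negatives).
        if self.submissions < 0 or not 0 <= self.flagged <= self.submissions:
            raise ValueError("need 0 <= flagged <= submissions")

    @property
    def rejection_rate(self) -> float:
        # Fraction of submissions flagged; 0.0 when nothing has been submitted yet.
        return self.flagged / self.submissions if self.submissions else 0.0


def rough_standing(stats: RaterStats) -> str:
    """Hypothetical helper: map stats onto the post's rough thresholds."""
    if stats.submissions < 100:
        return "not enough history"  # thresholds assume 100+ submissions
    rate = stats.rejection_rate
    if rate < 0.01:
        return "specialist-invitation range"  # sub-1%
    if rate < 0.03:
        return "eligible to move up"  # sub-3%
    return "below the floor"


stats = RaterStats(submissions=150, flagged=3)
print(f"{stats.rejection_rate:.1%}", rough_standing(stats))  # prints: 2.0% eligible to move up
```

Three flags in 150 submissions is a 2% rejection rate: under the 3% floor, but not yet in sub-1% territory.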
Why most freelancers stay stuck
The honest reason: most people doing AI training work never submit their credentials. The platforms make it surprisingly hard, with buried profile fields, opaque tier names, and no acknowledgment that an expert tier even exists. A clinician who joined as a generalist three years ago might be sitting at $22/hr today because nobody ever told them to upload a medical license.
The second-most-honest reason: most freelancers are signed up on platforms that broker work, not employers that hire. Brokers resell you as a lead. Employers stand behind the brief, pay you directly, and keep verified experts on a roster they own. To a worker, the two arrangements look almost identical until payday, and then they diverge sharply.
That second reason is most of why we built OBG. We chose to be the employer. We publish the briefs we contract with frontier labs to deliver, and we pay the expert who delivers them — directly, weekly, with no chain of middlemen taking a cut at every step.