What is WorkRB, and what is it for?

Modern AI is starting to take part in some of the most important decisions in a person’s working life: which job listing they see first, which CVs a recruiter reads, what skill a training programme suggests next. WorkRB is an open, public benchmark for checking how well that AI actually performs.

Think of WorkRB as a standardised test for AI in the world of work. It is a joint effort across academia, industry, and public institutions, three communities that each hold a different piece of the puzzle. Industry partners contribute models and datasets drawn from real-world hiring and talent platforms; academic labs bring methodology and published research; public institutions maintain the multilingual occupational ontologies (such as ESCO from the European Commission and O*NET from the US Department of Labor) that act as the shared vocabulary of work.

Anyone, from a university lab to a job platform to a single curious developer, can run the same test on their system. When everyone sits the same exam, results become comparable, progress becomes visible, and the field can move forward together.

What WorkRB currently covers

WorkRB today bundles 14 tasks (13 ranking tasks plus one classification task), grouped into four families of work that an AI system for the world of work may be asked to do: extraction, normalisation, similarity, and matching.

📝

Extraction

Given a job ad written by a human, can the AI pick out the actual skills being discussed? For example, spotting that “built dashboards in Tableau” involves data visualisation. WorkRB tests this on three different skill-extraction datasets.

🏷️

Normalisation

People describe the same skill or job title in countless ways. WorkRB checks whether the AI can map those free-form phrases onto a shared vocabulary (such as ESCO), so that two systems can talk about the same thing. It covers normalisation of both skills and job titles.

🔎

Similarity

Is a “Data Engineer” close to a “Big Data Developer”? Are two phrasings of a skill really the same? WorkRB tests whether the AI sees past surface wording, for both job titles and skills.

🤝

Matching

Which skills does a given job typically need, and the other way around? Which candidate profiles best fit a freelance project or a search query? WorkRB groups together these “given one thing, rank the other” questions across jobs, skills, and candidate profiles.

Many of these tasks are evaluated in multiple languages, because work doesn’t happen in English alone. Tasks built on ESCO cover up to 28 languages, the full ESCO coverage; others are narrower (for example, candidate-matching tasks cover 5 languages, and one skill-similarity dataset is English-only). The cross-lingual setting (say, French queries against English targets) is supported alongside the monolingual one where the data allows it. The exact language coverage per task is listed in the repository.

WorkRB is under continuous development, driven by community contributions. Future directions include occupational activities, additional ontologies, temporal evaluation, career-path recommendation, and multi-branch architectures such as large language models and re-ranking approaches. If you have an idea for something WorkRB should test, or a project you’d like to work on together, open an issue on GitHub or email us at workrb@techwolf.ai.

Why a shared yardstick matters

AI in HR and the labour market sits close to decisions that affect real lives: hiring, learning, pay, and access to opportunity. Today, the systems that drive those decisions are evaluated mostly behind closed doors, by the same teams that build them. Claims are hard to compare. Progress is easy to overstate. And outsiders almost never get the chance to spot blind spots.

WorkRB is a joint attempt to fix that, built across academia, industry, and public institutions. By being open about the tasks, the data, and the scoring, it gives:

Researchers a place to test new ideas against a serious, public reference.
HR teams and platforms a way to ask vendors “how does your AI score on this?” and get a number they can compare to other vendors.
Public institutions and policy makers a clearer view of what these systems can and can’t do, and a basis for compliance and standardisation work (such as the EU AI Act’s requirements for high-risk employment AI).
Jobseekers and workers, indirectly, fairer and more accurate tools downstream.

How it works, briefly

WorkRB is a free, open-source software package. Anyone with an AI model for jobs or skills can plug it in and get a scorecard within minutes, on the same tasks everyone else is using. The package comes with reference models out of the box, so every new result has a baseline to compare against.

We deliberately keep the technical surface small on this page. If you build or run these systems, the best place to start is the GitHub repository, where you’ll find the installation steps, the full list of tasks, the baselines, and the code you need to add your own model.


Get involved

WorkRB is open source and sustained by its three-pillar community: academia, industry, and public institutions. Wherever you sit, there’s a useful way to take part.

Academic researchers: share datasets, propose new tasks, or run your model and report results against the public reference.
Industry partners: contribute models, baselines, and real-world task formulations from hiring, talent management, and workforce analytics.
Public institutions and employment services: help keep the multilingual occupational ontologies aligned with how the labour market actually evolves.
HR teams, recruiters, and career professionals: tell us which real-world questions are missing from the test, so the benchmark stays grounded in what actually matters at work.
Developers: contribute baselines, fix issues, or extend the framework through the GitHub repository.
Educators and writers: help us explain the project to a wider audience and improve the documentation.

For research collaborations, partnerships, or any conversation that doesn’t fit in a code review, please email us at workrb@techwolf.ai. For anything code-related, the GitHub repository is the best starting point.

Cite

If you write about WorkRB in academic work, please cite the framework paper below. The companion Unified Work Embeddings paper is the reference for the bidirectional multi-task ranker used as a baseline.

WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain
De Lange, Veys, Retyk, Deniz, Jouanneau, Zhang, Bielinski, Jouffroy, Clobes, Baranowska, Graus, Palyart, Zbib, Gkatzia, Demeester, De Bie, Bogers, Decorte, Van Hautte. 2026.
arXiv:2604.13055

@article{delange2026workrb,
  title   = {WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain},
  author  = {De Lange, Matthias and Veys, Warre and Retyk, Federico and Deniz, Daniel
             and Jouanneau, Warren and Zhang, Mike and Bielinski, Aleksander
             and Jouffroy, Emma and Clobes, Nicole and Baranowska, Nina
             and Graus, David and Palyart, Marc and Zbib, Rabih and Gkatzia, Dimitra
             and Demeester, Thomas and De Bie, Tijl and Bogers, Toine
             and Decorte, Jens-Joris and Van Hautte, Jeroen},
  journal = {arXiv preprint arXiv:2604.13055},
  year    = {2026}
}

Companion baseline paper:

Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker
Matthias De Lange, Jens-Joris Decorte, Jeroen Van Hautte. 2025.
arXiv:2511.07969

Challenges powered by WorkRB

WorkRB is the evaluation backbone for community challenges in the HR-AI space.

RecSys-HR 2026 Challenge (live)

ESCO skill extraction and normalisation in free-form text, scored with graded relevance through WorkRB. Registration is open on CodaBench, starter code is on GitHub, and €5000 in prizes is up for grabs (with a dedicated student track). Top teams get a pitch slot at the RecSys-HR workshop.
