The challenge

Do you want to push the limits of large-scale skill intelligence? Or are you just excited to take home a share of the €5000 prize money?

🏁 Register on CodaBench: sign up and submit your rankings.
🚀 Get the starter code: baselines, notebook, ready-to-run setup.

Skill intelligence still isn’t solved. ESCO has more than 13,000 labels, annotation in this space is far from complete, and the binary yes/no labels used by most published skill-extraction benchmarks routinely miss true positives and treat a near-miss the same as pure nonsense. The result: lexical shortcuts and overfitted classifiers can score as well as systems that genuinely understand the skill space, and the field ends up rewarding the wrong behaviour.

This challenge gives you a sharper instrument. We take the popular open-source skill-extraction and skill-normalisation benchmarks (House, Tech, TechWolf, SkillSkape, SkillNorm) and re-annotate them with graded relevance across five interpretable levels, then release the new annotations to the community (see Evaluation for the label definitions). Your task: recommend the right ESCO skills from free-form text, and normalise extracted skills to the right ESCO concepts.

Everything runs through the open-source WorkRB toolbox: it standardises ranking outputs, generates and scores every submission, and is the same evaluation backbone you can keep using for your own work long after this challenge ends.

Anyone can join

The starter repository contains detailed instructions to quickly bring you up to speed with state-of-the-art baselines and the contributions of this challenge. The development environment lowers the barrier to entry, so anyone can start experimenting with their own ideas (though basic familiarity with machine learning helps 😉).

Task & datasets

The task is skill extraction and skill normalisation against ESCO, scored with graded relevance through the WorkRB toolkit. The starter repository ships a hello-world training setup, baselines, and a knowledge notebook so you can get to a first submission quickly.

Training

Any open-source dataset is allowed, except for the ones used in the test set (see Rules). The starter scripts already include TechWolf’s synthetic ESCO skill sentences. The SOTA notebook lists further useful skill-extraction datasets, but don’t let the starter files limit your imagination. Maybe regularising with data from outside skill extraction helps semantic understanding 👀

Validation

Open-source skill-extraction and skill-normalisation validation sets, enriched with graded-relevance annotations, are available now as new tasks inside WorkRB. The validation phase has no submission cap, so use it freely to debug your pipeline. Not all datasets have a validation set, so this signal is more limited than the final evaluation. Feel free to adjust the validation scoring code in your local setup!

Test

Skill extraction: House, Tech, TechWolf, and SkillSkape test sets.
Skill normalisation: ESCO skill normalisation.

The graded-relevance test annotations are kept hidden during the challenge and will be released through WorkRB after submissions close.

Evaluation & metrics

A new evaluation dimension for skill recommenders.
Binary yes/no scoring misses true positives and treats a near-miss the same as pure nonsense. Graded relevance is designed to replace it: every (query, skill) pair gets one of five interpretable levels. These levels surface insights about state-of-the-art skill extraction that binary metrics have been hiding, and more broadly inform how recommender systems should be evaluated when the label space is large, noisy, and only partially observed.

The challenge uses a ranking metric, which means your model must be able to return an ordered list of predictions. This does not restrict you to similarity-based solutions: a classifier can return skills ordered by logits or output probabilities.
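To make the classifier case concrete, the sketch below turns per-skill scores (logits or probabilities) into an ordered list of predictions. The skill identifiers and the `rank_skills` helper are hypothetical and only for illustration; generating the actual submission file is handled by WorkRB and is not shown here.

```python
def rank_skills(scores, skill_ids, top_k=None):
    """Return skill ids ordered from highest to lowest score.

    `scores` can be logits or probabilities: only their order matters.
    """
    if len(scores) != len(skill_ids):
        raise ValueError("scores and skill_ids must have the same length")
    # sorted() is stable, so tied scores keep their original relative order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    if top_k is not None:
        order = order[:top_k]
    return [skill_ids[i] for i in order]


# Hypothetical classifier output for one query over four candidate skills.
logits = [0.2, 2.5, -1.0, 1.1]
skills = ["skill_a", "skill_b", "skill_c", "skill_d"]
ranking = rank_skills(logits, skills)
```

Because only the ordering is used, logits and softmax probabilities from the same classifier head yield the same ranking.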

Metric: nDCG

$$\mathrm{DCG}@k \;=\; \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)}$$

nDCG normalises $\mathrm{DCG}@k$ by the ideal $\mathrm{DCG}@k$, giving a score in $[0, 1]$. Given the sparse annotations, we decided to define k as the total target space size per task.
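As a rough illustration of the formula above, here is a minimal Python sketch of nDCG@k for a single query. It assumes the ranking is passed as the list of graded relevance levels in predicted order, that any unannotated skill counts as level 0, and that the ideal ordering is obtained by sorting those same levels. The function names are illustrative, and this is not the WorkRB implementation.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k with exponential gain 2^rel - 1 and log2(i + 1) discount."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k):
    """nDCG@k for one query.

    Assumption: `ranked_relevances` covers every annotated skill for the
    query, so sorting it in descending order gives the ideal ranking.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant skill at all for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A ranking that places a level-3 skill above the level-4 skill.
score = ndcg_at_k([3, 4, 0, 1], k=4)
```

Swapping the top two items costs more than swapping items lower in the list, because the discount $\log_2(i + 1)$ grows slowly while the gain doubles with each relevance level.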

Relevance levels

Every (query, skill) pair is assigned one of five graded relevance levels. The levels are defined as follows:

Level	Meaning
4	Correct: originally a positive in the binary-labelled dataset, or a direct replacement of it.
3	Strongly relevant: the skill is clearly implied by the query, even if not literally named.
2	Adjacent: the skill could reasonably be recommended but is not core to the query (granularity off).
1	Plausible: the skill fits the broader domain but is not mentioned or implied (activity off).
0	Nonsense: wrong domain entirely.
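To see how these levels feed into the metric, the short sketch below applies the gain term $2^{\mathrm{rel}} - 1$ from the DCG formula to each level. The dictionary and helper name are illustrative only.

```python
# Short names for the five levels, taken from the table above.
RELEVANCE_LEVELS = {
    4: "Correct",
    3: "Strongly relevant",
    2: "Adjacent",
    1: "Plausible",
    0: "Nonsense",
}


def gain(level):
    """Numerator of the DCG term for a given relevance level."""
    return 2 ** level - 1


gains = {level: gain(level) for level in RELEVANCE_LEVELS}
# gains == {4: 15, 3: 7, 2: 3, 1: 1, 0: 0}
```

The gain roughly doubles with each level, so at the same rank a Correct skill contributes 15 times as much as a Plausible one, and a Nonsense skill contributes nothing.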

Queries & new annotations

Queries are drawn from popular open-source skill-extraction and skill-normalisation benchmarks: House, Tech, TechWolf, SkillSkape, and SkillNorm. We re-annotate them with the graded-relevance levels above and release the new annotations to the community through WorkRB: the validation set is available now, and the test set counterpart is published after the challenge ends.

Final score

The final score is a macro-average of the nDCG@100 scores.
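A macro-average gives every score the same weight, regardless of how many queries sit behind it. The sketch below assumes the average is taken over one nDCG score per test dataset; the dataset keys and the numbers are made-up placeholders, not real results.

```python
from statistics import fmean


def macro_average(per_dataset_scores):
    """Unweighted mean over per-dataset scores (each dataset counts once)."""
    if not per_dataset_scores:
        raise ValueError("need at least one dataset score")
    return fmean(per_dataset_scores.values())


# Placeholder values for illustration only; these are not real results.
example_scores = {
    "house": 0.50,
    "tech": 0.60,
    "techwolf": 0.70,
    "skillskape": 0.40,
    "skillnorm": 0.80,
}
final_score = macro_average(example_scores)
```

Under this assumption, a small dataset influences the final score as much as a large one, so a system cannot win by doing well on the largest test set alone.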

Timeline

When	What	Description
2 Jun 2026 (now live)	Challenge resources launch	Rules, training data, hello-world training setup, and knowledge notebook are out. Registration is open on CodaBench. The WorkRB validation tasks and submission system are live, with no submission cap during the validation phase.
15 Jun 2026	Test submissions open	You can submit test rankings. The leaderboard goes live on CodaBench.
31 Jul 2026	Submissions close	Every team submits their training code (kept confidential, not judged on quality) so we can collect insights from every approach, not only the winners. The organisers then begin work on the summary paper. Submitters whose approach surfaces something new will be invited to either publish an arXiv version themselves or be cited under a self-chosen name.
Sep 2026	Summary paper on arXiv	Summary paper published, citing the arXiv works of top performers and naming the systems they describe.
RecSys-HR workshop	Workshop & awards	Top-3 teams get a 5-minute pitch slot at RecSys-HR, one of the largest recommender-systems workshop tracks. Other teams with strong results may be invited to present in the poster session: a chance to put your work in front of the field.

How to participate

Register on CodaBench. Provide your name, email, and affiliation, and indicate whether you’re a student (this is verified later if you finish in a prize position on the student track). Registration counts as consent to the challenge rules and to the public release of your team name and leaderboard score.
Clone the starter repository. It contains the hello-world training setup, the baselines, a knowledge notebook with the context you need to get going, and the exact code to generate submission-ready files. Recommended starting point: git clone and run the example notebook end-to-end.
Submit ranking files to CodaBench. Ranking files are produced by the WorkRB package so the format is consistent across participants. Validation submissions are unlimited; the test phase allows up to 5 submissions per team.

Stuck on something or want to flag an issue? Email recsys-hr-challenge@techwolf.ai.

Rules

Any violation excludes you from both the main and the student prize tracks.

The training set may be enriched with any open-source dataset, except for the ones used for testing. Top performers will be requested to submit their training code.
Your submission may not be an exact reproduction of an existing open-source skill-extraction training strategy. Most of them are already on the leaderboard as baselines anyway.
The WorkRB package must be used to generate the standardised output rankings of your model.
Submissions during the validation phase are unlimited. Submissions during the test phase are capped at 5 per team.
The student competition is, surprisingly, only for students. 😉 The top student team will be requested to prove their student status.
If the best student team finishes in the top 2 overall, the 3rd-place team receives the €1000 prize and the 4th-place team receives €500.
By registering, participants consent to the public release of their name and leaderboard scores, which may be discussed in the challenge summary paper. The organisers reserve the right to modify or add terms and conditions at any time.

RecSys-HR 2026: WorkRB Challenge