🐝 The challenge is live. Rules, datasets, starter code, and the validation submission system are all open. Register on CodaBench, grab the starter repo, and start submitting.

RecSys-HR 2026: WorkRB Challenge

Push the limits of large-scale skill intelligence, and take home a share of €5000.

Part of the workshop series. Organised by with .

The challenge

Do you want to push the limits of large-scale skill intelligence? Or are you just excited to take home a share of the €5000 prize money?

Skill intelligence still isn’t solved. ESCO has more than 13,000 labels, annotation in this space is far from complete, and the binary yes/no labels used by most published skill-extraction benchmarks routinely miss true positives and treat a near-miss the same as pure nonsense. The result: lexical shortcuts and overfitted classifiers can score as well as systems that genuinely understand the skill space, and the field ends up rewarding the wrong behaviour.

This challenge gives you a sharper instrument. We take the popular open-source skill-extraction and skill-normalisation benchmarks (House, Tech, TechWolf, SkillSkape, SkillNorm) and re-annotate them with graded relevance across five interpretable levels, then release the new annotations to the community (see Evaluation for the label definitions). Your task: recommend the right ESCO skills from free-form text, and normalise extracted skills to the right ESCO concepts.

Everything runs through the open-source WorkRB toolbox: it standardises ranking outputs, generates and scores every submission, and is the same evaluation backbone you can keep using for your own work long after this challenge ends.

Anyone can join

The starter repository contains detailed instructions to quickly bring you up to speed with state-of-the-art baselines and the contributions of this challenge. The development environment lowers the barrier to entry, so anyone can start experimenting with their own ideas (though basic familiarity with machine learning helps 😉).

Prizes

[Figure: three-step prize podium; the main-track winner and the student-track winner share the first-place step.]

🥇 1st place: €2000
🥈 2nd place: €1000
🥉 3rd place: €500
🎓 Best student team: €1000

If the best student team finishes in the top 2, the 3rd-place team receives €1000 and the 4th-place team receives €500.

Task & datasets

The task is skill extraction and skill normalisation against ESCO, scored with graded relevance through the WorkRB toolkit. The starter repository ships a hello-world training setup, baselines, and a knowledge notebook so you can get to a first submission quickly.

Training

Any open-source dataset is allowed, except for the ones used in the test set (see Rules). The starter scripts already include TechWolf’s synthetic ESCO skill sentences. The SOTA notebook lists further useful skill-extraction datasets, but don’t let the starter files limit your imagination. Maybe regularising with non-skill-extraction data helps semantic understanding 👀

Validation

Open-source skill-extraction and skill-normalisation validation sets, enriched with graded-relevance annotations, are available now as new tasks inside WorkRB. The validation phase has no submission cap, so use it freely to debug your pipeline. Not all datasets have a validation set, so the validation signal is more limited than the final evaluation. Feel free to adjust the validation scoring code in your local setup!

Test

The graded-relevance test annotations are kept hidden during the challenge and will be released through WorkRB after submissions close.

Evaluation & metrics

A new evaluation dimension for skill recommenders.
Binary yes/no scoring is what graded relevance is designed to replace. Under graded relevance, every (query, skill) pair gets one of five interpretable levels. These levels surface insights about state-of-the-art skill extraction that binary metrics have been hiding, and more broadly inform how recommender systems should be evaluated when the label space is large, noisy, and only partially observed.

The challenge uses a ranking metric, which means your model must be able to return an ordered list of predictions. This does not restrict you to similarity-based solutions: a classifier can return skills ordered by logits or output probabilities.
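As a minimal sketch of the point above, the snippet below turns classifier scores into an ordered list of skills. The function name `rank_skills`, the skill labels, and the logit values are all hypothetical; this is not the WorkRB submission format, which is produced by the WorkRB package itself.

```python
import numpy as np

def rank_skills(scores, skill_ids, top_k=None):
    """Return skill ids ordered from highest to lowest score."""
    scores = np.asarray(scores, dtype=float)
    # Stable sort on negated scores: descending order, ties keep input order
    order = np.argsort(-scores, kind="stable")
    if top_k is not None:
        order = order[:top_k]
    return [skill_ids[i] for i in order]

# Hypothetical logits for four made-up skill labels
skill_ids = ["python", "sql", "welding", "project management"]
logits = [2.1, 0.3, -1.7, 0.9]

ranking = rank_skills(logits, skill_ids)
print(ranking)
```

Any monotonic transform of the scores (softmax probabilities instead of raw logits, for example) yields the same ordering, so only the relative order matters for a ranking metric.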

Metric: nDCG

$$\mathrm{DCG}@k \;=\; \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i + 1)}$$

nDCG normalises $\mathrm{DCG}@k$ by the ideal $\mathrm{DCG}@k$, giving a score in $[0, 1]$. Given the sparse annotations, we decided to define k as the total target space size per task.
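The formula can be sketched in a few lines of Python. This is an illustrative implementation of the definition above, not the WorkRB scoring code: the function names are hypothetical, `k` is left as a parameter, and the sketch assumes the caller supplies a relevance level for every ranked item (how unannotated pairs are handled is not covered here).

```python
import math

def dcg_at_k(relevances, k=None):
    """DCG with gain 2^rel - 1 and discount log2(i + 1), where i is 1-based."""
    if k is not None:
        relevances = relevances[:k]
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg_at_k(ranked_relevances, k=None):
    """DCG of the submitted order divided by DCG of the ideal order."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # assumption of this sketch: no relevant items scores 0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg

# Relevance levels (0 to 4) of the skills, in the order the system ranked them
ranked = [3, 4, 0, 2]
score = ndcg_at_k(ranked)
print(round(score, 3))
```

Because the gain is exponential, a level-4 skill contributes 15 before discounting while a level-2 skill contributes only 3, so placing the top-graded skills first matters far more than the order among weakly relevant ones.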

Relevance levels

Every (query, skill) pair is assigned one of five graded relevance levels. The levels are defined as follows:

| Level | Meaning |
|---|---|
| 4 | Correct: originally a positive in the binary-labelled dataset, or a direct replacement of it. |
| 3 | Strongly relevant: the skill is clearly implied by the query, even if not literally named. |
| 2 | Adjacent: the skill could reasonably be recommended but is not core to the query (granularity off). |
| 1 | Plausible: the skill fits the broader domain but is not mentioned or implied (activity off). |
| 0 | Nonsense: wrong domain entirely. |

Queries & new annotations

Queries are drawn from popular open-source skill-extraction and skill-normalisation benchmarks: House, Tech, TechWolf, SkillSkape, and SkillNorm. We re-annotate them with the graded-relevance levels above and release the new annotations to the community through WorkRB: the validation set is available now, and the test set counterpart is published after the challenge ends.

Final score

The final score is a macro-average of the nDCG@100 scores.
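A macro-average is an unweighted mean, so every score counts equally no matter how many queries sit behind it. The sketch below assumes the averaging is done over per-task scores; the task names and values are made up for illustration.

```python
# Hypothetical per-task nDCG scores (names and values are illustrative only)
per_task_ndcg = {"task_a": 0.62, "task_b": 0.48, "task_c": 0.70}

# Macro-average: plain mean over tasks, ignoring the number of queries per task
scores = list(per_task_ndcg.values())
final_score = sum(scores) / len(scores)
print(round(final_score, 3))
```

Under this assumption, a gain on a small task moves the final score as much as the same gain on a large one.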

Timeline

| When | What | Description |
|---|---|---|
| 2 Jun 2026 (now live) | Challenge resources launch | Rules, training data, hello-world training setup, and knowledge notebook are out. Registration is open on CodaBench. The WorkRB validation tasks and submission system are live, with no submission cap during the validation phase. |
| 15 Jun 2026 | Test submissions open | You can submit test rankings. The leaderboard goes live on CodaBench. |
| 31 Jul 2026 | Submissions close | Every team submits their training code (kept confidential, not judged on quality) so we can collect insights from every approach, not only the winners. The organisers then begin work on the summary paper. Submitters whose approach surfaces something new will be invited to either publish an arXiv version themselves or be cited by a self-chosen name. |
| Sep 2026 | Summary paper on arXiv | Summary paper published, citing the arXiv works of top performers and naming the systems they describe. |
| RecSys-HR workshop | Workshop & awards | Top-3 teams get a 5-minute pitch slot at RecSys-HR, one of the largest recommender-systems workshop tracks. Other teams with strong results may be invited to present in the poster session: a chance to put your work in front of the field. |

How to participate

  1. Register on CodaBench. Provide your name, email, and affiliation, and indicate whether you’re a student (this is verified later if you finish in a prize position on the student track). Registration counts as consent to the challenge rules and to the public release of your team name and leaderboard score.
  2. Clone the starter repository. It contains the hello-world training setup, the baselines, a knowledge notebook with the context you need to get going, and the exact code to generate submission-ready files. Recommended starting point: git clone and run the example notebook end-to-end.
  3. Submit ranking files to CodaBench. Ranking files are produced by the WorkRB package so the format is consistent across participants. Validation submissions are unlimited; the test phase allows up to 5 submissions per team.

Stuck on something or want to flag an issue? Email recsys-hr-challenge@techwolf.ai.

Rules

Any violation excludes you from both the main and the student prize tracks.

Support & organisers

Challenge-specific questions: recsys-hr-challenge@techwolf.ai
WorkRB toolkit, integrations, research collaborations: workrb@techwolf.ai
