How to run a Scoring Pilot with
Written by Benqa
Updated over a week ago

A Scoring Pilot lets you test and refine our latest AI Scoring model on a subset of your data before full deployment.

The main goals of running a scoring pilot are:

  • Evaluating the scoring model's performance on real data

  • Identifying any data issues or gaps

  • Laying the groundwork for broader deployment

This guide covers best practices for customers looking to implement AI-powered recruiting.

STEP 0: Introducing Cerebro, our latest scoring model

We recently announced our most advanced AI scoring algorithm yet: Cerebro, a 4-billion-parameter model that extends our scoring capabilities to 43 languages.

Cerebro represents a significant leap forward in Profile and Job Matching algorithms. Not only has it surpassed all previous performance benchmarks, it also outperforms our previous algorithm generation, GemNet, in accuracy, speed, and efficiency.

STEP 1: Confidentiality agreement

Before undertaking an AI scoring pilot, both parties sign a mutual non-disclosure agreement (MNDA) to protect data confidentiality. This MNDA should clearly outline the types of data being shared, the purposes for which it can be used, and the measures that prevent unauthorized access or leaks.

Any sensitive personnel data or profiles provided for training should be properly anonymized and handled according to applicable regulations. Responsible data sharing and governance practices are key to developing ethical AI tools.
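
As an illustration of the anonymization step, the sketch below drops direct identifiers from a structured profile and replaces its ID with a keyed hash (HMAC-SHA256), so the same person still maps to the same token in the trackings. The PII field names are hypothetical and should be adapted to your own schema. Keyed hashing is pseudonymization rather than full anonymization, and this sketch does not touch identifiers embedded in free text or raw CV files.

```python
import hashlib
import hmac

# Hypothetical names of direct-identifier fields; adapt to your own schema.
PII_FIELDS = {"first_name", "last_name", "email", "phone", "address"}


def pseudonymize_id(raw_id: str, secret: bytes) -> str:
    """Map an internal ID to a stable keyed hash (HMAC-SHA256, hex-encoded).

    Reuse the same secret for profiles and trackings so that the
    profile/job links survive; keep the secret on your side.
    """
    return hmac.new(secret, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_profile(profile: dict, secret: bytes) -> dict:
    """Return a copy of a structured profile without direct identifiers."""
    cleaned = {key: value for key, value in profile.items() if key not in PII_FIELDS}
    cleaned["id"] = pseudonymize_id(str(profile["id"]), secret)
    return cleaned
```

Applying `pseudonymize_id` with the same secret to the profile IDs stored in your trackings keeps the joins intact after anonymization.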

STEP 2: Preparing training data

Data is the linchpin of effective retraining. The training data consists of:

  • Profiles: raw files (PDF, PNG, JPEG, etc.) or structured objects (JSON, YAML), each with a unique ID.

  • Jobs: Each job listing has a unique identifier and includes a detailed description.

  • Trackings (application history): the key data points for each application, including the profile ID, the job ID, and the application status.
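
The three kinds of records and how they relate can be sketched as plain Python dataclasses, together with a quick check that IDs are unique and that every tracking points to a profile and a job actually present in the export. The field names and status labels are illustrative assumptions, not the platform's schema.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    id: str      # unique profile ID
    source: str  # raw file path (PDF, PNG, JPEG...) or a serialized JSON/YAML object


@dataclass
class Job:
    id: str           # unique job ID
    description: str  # detailed job description


@dataclass
class Tracking:
    profile_id: str
    job_id: str
    status: str  # application status, e.g. "hired" or "rejected" (example labels)


def find_issues(profiles, jobs, trackings):
    """List basic consistency problems before the data is shared."""
    issues = []
    profile_ids = [p.id for p in profiles]
    job_ids = [j.id for j in jobs]
    if len(set(profile_ids)) != len(profile_ids):
        issues.append("duplicate profile IDs")
    if len(set(job_ids)) != len(job_ids):
        issues.append("duplicate job IDs")
    known_profiles, known_jobs = set(profile_ids), set(job_ids)
    for t in trackings:
        if t.profile_id not in known_profiles or t.job_id not in known_jobs:
            issues.append(f"tracking {t.profile_id}->{t.job_id} references an unknown ID")
    return issues
```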

We recommend sharing at least:

  • 10k profiles

  • 1k jobs

  • 10k trackings
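
Before exporting, a quick pre-flight check can compare your record counts with these recommended minimums. The function name and the shape of the `counts` dictionary are assumptions for illustration.

```python
# Recommended minimum volumes for a scoring pilot, as listed in this guide.
RECOMMENDED_MINIMUMS = {"profiles": 10_000, "jobs": 1_000, "trackings": 10_000}


def shortfalls(counts: dict) -> dict:
    """Return {dataset: missing_records} for each dataset below its recommended size."""
    return {
        name: minimum - counts.get(name, 0)
        for name, minimum in RECOMMENDED_MINIMUMS.items()
        if counts.get(name, 0) < minimum
    }
```

For example, `shortfalls({"profiles": 12_000, "jobs": 800, "trackings": 10_000})` returns `{"jobs": 200}`, and an empty result means every dataset meets its minimum.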

Because data distributions vary from client to client, a tailored approach is necessary, so we use a multistage training process:

We start from a general foundation model that can easily be retrained on client-specific data. This approach keeps our AI in sync with each client's ever-evolving data distribution.

Two fundamental points should be taken into consideration when preparing your training data:

  • The data should be selected randomly. Any selection bias introduced during this selection process will reduce the model's performance.

  • The tracking data should not be heavily skewed (for example, a positive matching rate between profiles and jobs of 5% or less). If it is, we strongly recommend sending more data so that each application status is well represented.
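
Both recommendations can be checked before export. The sketch below draws a uniform random sample without replacement and computes the share of positive outcomes in the trackings. Which statuses count as "positive" is an assumption you will need to map to your own pipeline stages; the 5% threshold mirrors the example above.

```python
import random

# Hypothetical statuses counted as a positive profile/job match; use your own labels.
POSITIVE_STATUSES = {"hired", "interview"}


def random_sample(records: list, k: int, seed: int = 0) -> list:
    """Draw k records uniformly at random, without replacement.

    No filtering by date, team or outcome happens before the draw,
    which keeps selection bias out of the sample.
    """
    return random.Random(seed).sample(records, k)


def positive_rate(trackings: list, positive_statuses=POSITIVE_STATUSES) -> float:
    """Share of trackings whose status counts as a positive match."""
    if not trackings:
        raise ValueError("no trackings to evaluate")
    positives = sum(t["status"] in positive_statuses for t in trackings)
    return positives / len(trackings)


def is_heavily_skewed(trackings: list, threshold: float = 0.05) -> bool:
    """True when the positive rate is at or below the threshold (5% by default)."""
    return positive_rate(trackings) <= threshold
```

Fixing the seed makes the sample reproducible, which helps if the same export has to be regenerated later.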

STEP 3: Sharing training data

Clients have two convenient methods for sharing their data with us:

Method 1 - Upload Notebook

Upload the data directly through our APIs, using the following resources:

  1. Parsing Notebook (if using raw data): This Notebook helps you parse your raw data and turn it into structured objects.

  2. Upload Notebook: This Notebook helps you upload your structured data.

However, this approach is usually slower and more error-prone, which is why we recommend the second method.

Method 2 - Email a secured Zip file

Send a password-protected zip file to the following email address: support+[Customer_Subdomain_name]. This is our recommended method because it is faster and more reliable.
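
If your records are already structured, a sketch like the following can package them into a single archive before you password-protect it. The JSON Lines layout and the file names it produces (for example `jobs.jsonl`) are assumptions for illustration, not a documented format. Python's standard `zipfile` module can extract password-protected archives but cannot create them, so the password has to be added with a separate tool.

```python
import json
import zipfile
from pathlib import Path


def bundle_datasets(datasets: dict, archive_path: Path) -> Path:
    """Write each dataset as a JSON Lines file inside one zip archive.

    The archive is NOT encrypted: zipfile cannot create password-protected
    archives, so add the password with a separate tool before sending.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, records in datasets.items():
            # One JSON object per line, e.g. "jobs" -> jobs.jsonl
            lines = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
            zf.writestr(f"{name}.jsonl", lines)
    return archive_path
```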

Supporting diverse data formats and an adaptable training process allows us to tailor our solutions to each client's unique needs.

STEP 4: Training and Deployment of the model

After receiving the client's data, the R&D team starts with an Exploratory Data Analysis (EDA) step, then runs the Processing, Training, Evaluation, and Deployment steps.

Thanks to our optimized, state-of-the-art pipelines, the whole process takes between 1 and 5 working days, depending on the load.

After deployment, the client receives an algorithm key, which is the name of their newly trained model. By creating a new algorithm with this key in the AI studio, the client can access all of the model's metrics and inspect its fairness (where this feature is available in their region).

This new algorithm is then used to create a Recruiter Copilot, a Talent Copilot, or both in AI Widgets.
