Efficiently standardizes scraped job titles to Bureau of Labor Statistics (BLS) titles using a high-performance TF-IDF algorithm.
Maintainer(s):
martin-conur
Installing and Loading
INSTALL title_mapper FROM community;
LOAD title_mapper;
Example
-- Standardize every title in a column (scraped_title_column and your_table are placeholders)
SELECT standardize_title(scraped_title_column) FROM your_table;
-- Standardize tech job titles
SELECT standardize_title('Sr. Software Eng') AS standardized_title;
-- Result: 'Software Engineer - Software Developers'
-- Standardize healthcare titles
SELECT standardize_title('RN - Emergency Room') AS standardized_title;
-- Result: 'Registered Nurse - Registered Nurses'
About title_mapper
DuckDB Title Mapper
duckdb-title-mapper is a highly optimized DuckDB extension written in Rust. It standardizes scraped job titles to BLS (Bureau of Labor Statistics) standard titles using a fast TF-IDF implementation.
What It Does
This extension transforms messy, inconsistent job titles from various sources into standardized BLS titles:
| Scraped Title (Input) | Standardized Title (Output) |
|---|---|
| Sr. Software Eng | Software Engineer |
| Registered Nurse - ICU | Registered Nurse |
| Accountant III | Accountant |
| Sales Rep (B2B) | Sales Representative |
| Elementary School Teacher - 3rd Grade | Elementary School Teacher |
| Exec. Chef | Executive Chef |
| Marketing Coordinator/Specialist | Marketing Specialist |
| Licensed Practical Nurse (LPN) | Licensed Practical Nurse |
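To make the matching idea concrete, the following is a minimal Python sketch of TF-IDF title matching. It is not the extension's Rust implementation. Everything in it is an assumption made for illustration: the short `REFERENCE_TITLES` list (the real extension ships its own BLS title data), the use of character trigrams as tokens, the smoothed IDF formula, and the helper names `char_ngrams`, `build_index`, and `standardize`.

```python
import math
from collections import Counter

# Hypothetical reference list for illustration only.
REFERENCE_TITLES = [
    "Software Engineer",
    "Registered Nurse",
    "Accountant",
    "Sales Representative",
    "Executive Chef",
]


def char_ngrams(text, n=3):
    """Lowercase, replace punctuation with spaces, return padded character n-grams."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    padded = " " + " ".join(cleaned.split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalize(vec):
    """Scale a sparse vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


def build_index(titles, n=3):
    """Compute smoothed IDF weights and unit-length TF-IDF vectors for the reference titles."""
    docs = [Counter(char_ngrams(t, n)) for t in titles]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    total = len(docs)
    idf = {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in doc_freq.items()}
    vectors = [normalize({g: tf * idf[g] for g, tf in doc.items()}) for doc in docs]
    return idf, vectors


def standardize(raw, titles, idf, vectors, n=3):
    """Return the reference title with the highest cosine similarity, plus its score."""
    counts = Counter(char_ngrams(raw, n))
    # N-grams never seen in the reference titles are ignored.
    query = normalize({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    scores = [sum(w * vec.get(g, 0.0) for g, w in query.items()) for vec in vectors]
    best = max(range(len(titles)), key=scores.__getitem__)
    return titles[best], scores[best]


if __name__ == "__main__":
    idf, vectors = build_index(REFERENCE_TITLES)
    for raw in ["Sr. Software Eng", "Exec. Chef", "Accountant III"]:
        print(raw, "->", standardize(raw, REFERENCE_TITLES, idf, vectors))
```

In this sketch an input that shares no trigrams with any reference title scores 0.0 against all of them and falls back to the first title, so a caller would likely want a minimum-score threshold. Whether the extension applies such a threshold is not stated here.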
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| standardize_title | scalar | Returns the BLS standard title using TF-IDF | NULL | [SELECT standardize_title(scraped_title_column) FROM your_table;] |