How to use kalyan-ks/ettin-68m-nemotron-pii with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("token-classification", model="kalyan-ks/ettin-68m-nemotron-pii")
# Load model directly
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("kalyan-ks/ettin-68m-nemotron-pii")
model = AutoModelForTokenClassification.from_pretrained("kalyan-ks/ettin-68m-nemotron-pii")

Lightweight PII Detection Model | Open Source | 68M Parameters | 96.27 F1 Score | Blog Post
Ettin-68m-nemotron-pii is based on the ettin-encoder-68M model and fine-tuned on the Nemotron PII dataset. It detects 55 PII entity types in both structured and unstructured text across domains such as healthcare, finance, legal, and cybersecurity. With just 68M parameters, the model achieves an F1 score of 96.27.
This model can detect the following 55 PII entity types:
| Entity | Description |
|---|---|
| account_number | Account Number |
| age | Age |
| api_key | API Key |
| bank_routing_number | Bank Routing Number |
| biometric_identifier | Biometric Identifier |
| blood_type | Blood Type |
| certificate_license_number | Certificate or License Number |
| city | City |
| company_name | Company Name |
| coordinate | Geographic Coordinate |
| country | Country |
| county | County |
| credit_debit_card | Credit or Debit Card Number |
| customer_id | Customer ID |
| cvv | Card Verification Value (CVV) |
| date | Date |
| date_of_birth | Date of Birth |
| date_time | Date and Time |
| device_identifier | Device Identifier |
| education_level | Education Level |
| email | Email Address |
| employee_id | Employee ID |
| employment_status | Employment Status |
| fax_number | Fax Number |
| first_name | First Name |
| gender | Gender |
| health_plan_beneficiary_number | Health Plan Beneficiary Number |
| http_cookie | HTTP Cookie |
| ipv4 | IPv4 Address |
| ipv6 | IPv6 Address |
| language | Language |
| last_name | Last Name |
| license_plate | Vehicle License Plate |
| mac_address | MAC Address |
| medical_record_number | Medical Record Number |
| national_id | National Identification Number |
| occupation | Occupation |
| password | Password |
| phone_number | Phone Number |
| pin | Personal Identification Number (PIN) |
| political_view | Political View |
| postcode | Postcode / Zip Code |
| race_ethnicity | Race or Ethnicity |
| religious_belief | Religious Belief |
| sexuality | Sexuality / Sexual Orientation |
| ssn | Social Security Number |
| state | State |
| street_address | Street Address |
| swift_bic | SWIFT / BIC Code |
| tax_id | Tax Identification Number |
| time | Time |
| unique_id | Unique Identifier |
| url | URL / Web Address |
| user_name | Username |
| vehicle_identifier | Vehicle Identification Number (VIN) |
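
Many applications only care about a subset of the entity types listed above. The following is a minimal sketch of filtering detections by label, assuming the input is a list of dicts with an `entity_group` key, as the Transformers pipeline returns when `aggregation_strategy="simple"` is set. The helper name `keep_labels` and the sample detections are hypothetical and not part of the model card.

```python
def keep_labels(entities, wanted):
    """Return only the detections whose label is in `wanted`."""
    wanted = set(wanted)
    return [ent for ent in entities if ent["entity_group"] in wanted]

# Hypothetical detections, shaped like the pipeline's aggregated output.
sample = [
    {"entity_group": "first_name", "start": 0, "end": 6, "word": "Kalyan"},
    {"entity_group": "country", "start": 18, "end": 23, "word": "India"},
]

# Keep only a few contact-related entity types from the table above.
contact_only = keep_labels(sample, {"first_name", "email", "phone_number"})
print(contact_only)
```

The label strings must match the `Entity` column exactly, since they are compared as plain strings.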
# First install Hugging Face transformers library
!pip install transformers
# Initialize and run the PII detection pipeline to extract PII entities
from transformers import pipeline
## Initialize the PII detection pipeline
ner = pipeline("ner", model="kalyan-ks/ettin-68m-nemotron-pii", aggregation_strategy="simple")
input_text = "Kalyan KS is from India. His email id is [email protected]"
## Run the PII detection to extract PII entities
pii_entities = ner(input_text)
## Process the extracted PII entities
def format_pii_entities(entities, original_text):
    if not entities:
        return []
    merged_entities = []
    entities = sorted(entities, key=lambda x: x['start'])
    current_entity = {
        'start': entities[0]['start'],
        'end': entities[0]['end'],
        'label': entities[0]['entity_group'],
        'text': entities[0]['word']
    }
    for next_ent in entities[1:]:
        is_same_label = next_ent['entity_group'] == current_entity['label']
        is_adjacent = next_ent['start'] <= current_entity['end'] + 1
        if is_same_label and is_adjacent:
            current_entity['end'] = max(current_entity['end'], next_ent['end'])
            current_entity['text'] = original_text[current_entity['start']:current_entity['end']]
        else:
            merged_entities.append(clean_entity(current_entity))
            current_entity = {
                'start': next_ent['start'],
                'end': next_ent['end'],
                'label': next_ent['entity_group'],
                'text': next_ent['word']
            }
    merged_entities.append(clean_entity(current_entity))
    return merged_entities

def clean_entity(ent):
    raw_text = ent['text']
    stripped_text = raw_text.strip()
    leading_spaces = len(raw_text) - len(raw_text.lstrip())
    return {
        'start': ent['start'] + leading_spaces,
        'end': ent['start'] + leading_spaces + len(stripped_text),
        'text': stripped_text,
        'label': ent['label']
    }
# Display the extracted PII entities
formatted_entities = format_pii_entities(pii_entities, input_text)
print(formatted_entities)
# Output
[{'start': 0, 'end': 9, 'text': 'Kalyan KS', 'label': 'first_name'}, {'start': 18, 'end': 23, 'text': 'India', 'label': 'country'}, {'start': 41, 'end': 60, 'text': '[email protected]', 'label': 'email'}]
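
A common next step is to mask the detected spans in the original text. The sketch below is one possible approach, not something the model card prescribes: it assumes entities in the `start`/`end`/`label` format shown in the output above, assumes the spans do not overlap, and uses a hypothetical helper `redact` with an arbitrary `[LABEL]` placeholder style.

```python
def redact(text, entities):
    """Replace each detected span with an upper-cased [LABEL] placeholder."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['label'].upper()}]"
        text = text[:ent["start"]] + placeholder + text[ent["end"]:]
    return text

# Illustrative input and spans in the same format as the output above.
text = "Kalyan KS is from India."
spans = [
    {"start": 0, "end": 9, "text": "Kalyan KS", "label": "first_name"},
    {"start": 18, "end": 23, "text": "India", "label": "country"},
]
print(redact(text, spans))  # [FIRST_NAME] is from [COUNTRY].
```

Processing spans in reverse order avoids recomputing offsets, because replacing a later span never shifts the character positions of an earlier one.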
The model was evaluated on a 10k-sample test set from the Nemotron PII dataset and achieved the following results:
| Metric | Score |
|---|---|
| F1 | 96.27 |
| Precision | 96.35 |
| Recall | 96.19 |
| Accuracy | 99.26 |
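
As a quick sanity check, the reported F1 can be recomputed from the reported precision and recall. This sketch assumes F1 is the harmonic mean of those two aggregate numbers; the card does not state which averaging method was used, and `f1_score` is a hypothetical helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Aggregate precision and recall from the table above, in percent.
overall = f1_score(96.35, 96.19)
print(round(overall, 2))  # 96.27
```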

Entity types with the highest F1 scores (per-entity scores are on a 0 to 1 scale):

| Entity | Precision | Recall | F1 |
|---|---|---|---|
| biometric_identifier | 0.9963 | 0.9966 | 0.9964 |
| date_of_birth | 0.9952 | 0.9963 | 0.9957 |
| api_key | 0.9932 | 0.9978 | 0.9955 |
| mac_address | 0.9929 | 0.9965 | 0.9947 |
| email | 0.9942 | 0.9942 | 0.9942 |
| ipv4 | 0.9950 | 0.9933 | 0.9941 |
| medical_record_number | 0.9952 | 0.9904 | 0.9928 |
| health_plan_beneficiary_number | 0.9924 | 0.9925 | 0.9924 |
| vehicle_identifier | 0.9867 | 0.9977 | 0.9922 |
| bank_routing_number | 0.9967 | 0.9862 | 0.9914 |

Entity types with the lowest F1 scores:

| Entity | Precision | Recall | F1 |
|---|---|---|---|
| occupation | 0.7200 | 0.5493 | 0.6232 |
| time | 0.8094 | 0.7781 | 0.7934 |
| age | 0.8333 | 0.9273 | 0.8778 |
| political_view | 0.8533 | 0.9247 | 0.8876 |
| state | 0.9077 | 0.8792 | 0.8932 |
| fax_number | 0.9047 | 0.9013 | 0.9030 |
| company_name | 0.9048 | 0.9072 | 0.9060 |
| national_id | 0.8995 | 0.9224 | 0.9108 |
| education_level | 0.9269 | 0.8973 | 0.9118 |
| race_ethnicity | 0.9027 | 0.9388 | 0.9204 |
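
For entity types with lower precision in the table above, such as occupation and time, one option is to require a higher confidence score before accepting a detection. The sketch below assumes each detection carries the `score` and `entity_group` keys returned by the Transformers pipeline. The helper `apply_thresholds`, the cut-off values, and the sample detections are all hypothetical and would need tuning on held-out data.

```python
def apply_thresholds(entities, thresholds, default=0.0):
    """Drop detections whose score is below the minimum for their label."""
    return [
        ent for ent in entities
        if ent["score"] >= thresholds.get(ent["entity_group"], default)
    ]

# Hypothetical cut-offs for the weaker entity types; other labels use `default`.
min_score = {"occupation": 0.90, "time": 0.80}

# Hypothetical detections, shaped like the pipeline's aggregated output.
detections = [
    {"entity_group": "occupation", "score": 0.62, "word": "engineer"},
    {"entity_group": "occupation", "score": 0.97, "word": "nurse"},
    {"entity_group": "email", "score": 0.55, "word": "user@example.com"},
]

kept = apply_thresholds(detections, min_score)
print([d["word"] for d in kept])  # ['nurse', 'user@example.com']
```

Raising a threshold trades recall for precision, so it is a poor fit where missed detections matter more than false positives. Recall for occupation is already low, so a stricter cut-off there would lower it further.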

The occupation entity type has the lowest F1 score (0.6232), mainly because of its low recall (0.5493).

@misc{ettin-68m-pii-2026,
title = {ettin-68m-nemotron-pii-2026: PII Detection Model},
author = {Kalyan KS},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/kalyan-ks/ettin-68m-nemotron-pii}
}