Company
Founded in Switzerland.
Artificial Intelligence Suisse SA, PO 280, Delemont, Switzerland.
ai4privacy
Collection
New release
The world's largest open multilingual PII masking corpus: 3M+ synthetic examples across 30 languages spanning Europe, the Americas, and Asia-Pacific, built for training privacy-preserving NLP models.
pii-masking-openpii-1.5m
The expanded open-source core: 30 languages, now covering Asia-Pacific alongside Europe and the Americas.
pii-masking-work-pwi-400k
Work & HR Information (PWI): job titles, organisations, salaries, document numbers, and employment identifiers.
pii-masking-financial-pfi-400k
Financial Information (PFI): IBAN, account & card details, balances, crypto wallet addresses, and insurance policy numbers.
pii-masking-location-pli-400k
Location & Travel Information (PLI): geo-coordinates, addresses, airport & station codes, and vehicle and travel identifiers.
pii-masking-health-phi-400k
Health & Medical Information (PHI): diagnoses, medications, test results, allergies, hospital names, and medical record numbers.
pii-masking-digital-pdi-350k
Digital Information (PDI): usernames, passwords, API keys, MAC addresses, device IMEIs, OTPs, and user agents.
7 new languages
23 locales
North & South
{
"source_text": "本日の集合場所は 射水市 円池 の 昼場 6-28-20、郵便番号は 520-2111 です。",
"masked_text": "本日の集合場所は [CITY_1] の [STREET_1] [BUILDINGNUM_1]、郵便番号は [ZIPCODE_1] です。",
"privacy_mask": [ { "value": "射水市 円池", "label": "CITY" }, { "value": "昼場", "label": "STREET" }, { "value": "6-28-20", "label": "BUILDINGNUM" }, { "value": "520-2111", "label": "ZIPCODE" } ],
"language": "ja", "region": "JP", "script": "Jpan"
}
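To make the relationship between the fields in the sample concrete, here is a minimal Python sketch that rebuilds masked_text from source_text and privacy_mask. The function name apply_privacy_mask is hypothetical, and the sketch assumes that entities are listed in reading order and that placeholders are numbered per label starting at 1, as the sample suggests. The sample shows no character offsets, so the sketch locates each value by searching the text.

```python
from collections import defaultdict

# The sample record shown above, reproduced so the sketch is self-contained.
record = {
    "source_text": "本日の集合場所は 射水市 円池 の 昼場 6-28-20、郵便番号は 520-2111 です。",
    "masked_text": "本日の集合場所は [CITY_1] の [STREET_1] [BUILDINGNUM_1]、郵便番号は [ZIPCODE_1] です。",
    "privacy_mask": [
        {"value": "射水市 円池", "label": "CITY"},
        {"value": "昼場", "label": "STREET"},
        {"value": "6-28-20", "label": "BUILDINGNUM"},
        {"value": "520-2111", "label": "ZIPCODE"},
    ],
}


def apply_privacy_mask(source_text, privacy_mask):
    """Replace each entity value with a [LABEL_n] placeholder, left to right.

    Hypothetical helper: assumes entities appear in reading order and that
    placeholders are numbered per label, starting at 1.
    """
    counters = defaultdict(int)
    parts = []
    cursor = 0
    for entity in privacy_mask:
        # Search from the cursor so an earlier occurrence is never matched twice.
        start = source_text.find(entity["value"], cursor)
        if start == -1:
            raise ValueError(f"value not found in text: {entity['value']!r}")
        counters[entity["label"]] += 1
        parts.append(source_text[cursor:start])
        parts.append(f"[{entity['label']}_{counters[entity['label']]}]")
        cursor = start + len(entity["value"])
    parts.append(source_text[cursor:])
    return "".join(parts)


rebuilt = apply_privacy_mask(record["source_text"], record["privacy_mask"])
print(rebuilt)
print(rebuilt == record["masked_text"])
```

For the sample record, the rebuilt string matches the masked_text field. Whether repeated values share a placeholder number in the full corpus is not stated here, so treat the numbering rule as an assumption.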
Train NER models to detect and classify PII entities with pre-computed mBERT-compatible BIO labels.
Build production-grade anonymization pipelines that support compliance with GDPR, the EU AI Act, PDPA, APPI, and PIPA.
Fine-tune large language models for privacy-aware text generation and redaction across languages.
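The NER use case above relies on BIO labels, which the datasets are described as shipping pre-computed. As a rough illustration of how entity spans map to such labels, the Python sketch below tags the sample sentence. It is a sketch under stated assumptions, not the datasets' labelling procedure: the function name bio_tags is hypothetical, and a simple regex tokenizer stands in for mBERT's subword tokenizer.

```python
import re


def bio_tags(text, entities):
    """Return (token, tag) pairs for text, given entities in reading order.

    Hypothetical helper: uses a regex tokenizer as a stand-in for a real
    subword tokenizer, so token boundaries will differ from mBERT's.
    """
    # Locate each entity's character span, scanning left to right.
    spans = []
    cursor = 0
    for entity in entities:
        start = text.find(entity["value"], cursor)
        if start == -1:
            raise ValueError(f"value not found in text: {entity['value']!r}")
        end = start + len(entity["value"])
        spans.append((start, end, entity["label"]))
        cursor = end

    tagged = []
    # Tokens are runs of word characters and hyphens, or single punctuation marks.
    for match in re.finditer(r"[\w-]+|[^\w\s]", text):
        tag = "O"
        for start, end, label in spans:
            if start <= match.start() and match.end() <= end:
                # B- marks the first token of an entity, I- any later token.
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


text = "本日の集合場所は 射水市 円池 の 昼場 6-28-20、郵便番号は 520-2111 です。"
entities = [
    {"value": "射水市 円池", "label": "CITY"},
    {"value": "昼場", "label": "STREET"},
    {"value": "6-28-20", "label": "BUILDINGNUM"},
    {"value": "520-2111", "label": "ZIPCODE"},
]

tagged = bio_tags(text, entities)
for token, tag in tagged:
    print(token, tag)
```

With this tokenizer, the two-token city name receives B-CITY followed by I-CITY, and every token outside an entity is tagged O. With a real subword tokenizer, the same span-to-tag rule would apply to the tokenizer's own offsets.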
Get access to the full 3M dataset including all industry-specific components and Asia-Pacific coverage, with commercial licensing for your organization.