Foreign Names and Surnames Dataset

Overview

This repository contains structured datasets of the most popular given names and surnames in several languages, currently Ukrainian, Russian, Belarusian, and Vietnamese. Each dataset includes Polish transliterations, Polish and English equivalents, informal diminutives, and other variants. The primary goal is to support accurate transliteration, typo detection, and cross-linguistic studies.

Contents

The repository currently includes:

  1. Ukrainian Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  2. Russian Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  3. Belarusian Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  4. Vietnamese Names and Surnames

    • Female Names
    • Male Names
    • Surnames
  5. Planned Additions (not yet included)

    • Frequency of occurrence of each given name and surname.

Dataset Structure

Each dataset contains approximately 50 female names, 50 male names, and 50 surnames. The data is organized into tables with the following columns:

  • Original Name: The name or surname in its original script.
  • Polish Transliteration: Transliteration based on Polish orthography.
  • Alt Polish Trans: Alternative Polish transliterations.
  • Polish Equivalent: Direct equivalent in Polish.
  • English Equivalent: Standard English equivalent.
  • Informal Diminutive: Common informal or diminutive forms.
  • Other Variants: Additional regional or historical forms.
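As a minimal sketch of how one table row could be modeled in code, the snippet below maps the seven columns above onto a Python dataclass. The class name `NameRecord`, the `from_row` helper, and the assumption that each row arrives as a dictionary keyed by these column headers are illustrative only and are not part of this repository; the actual headers in the XLSX files may differ.

```python
from dataclasses import dataclass

# Column headers as listed in this README (assumed to match the XLSX files).
COLUMNS = (
    "Original Name",
    "Polish Transliteration",
    "Alt Polish Trans",
    "Polish Equivalent",
    "English Equivalent",
    "Informal Diminutive",
    "Other Variants",
)


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical in-memory form of one dataset row."""

    original: str
    polish_transliteration: str
    alt_polish_trans: str
    polish_equivalent: str
    english_equivalent: str
    informal_diminutive: str
    other_variants: str

    @classmethod
    def from_row(cls, row: dict) -> "NameRecord":
        # Fail early if the row does not carry every expected column.
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        # Empty cells (None or "") become empty strings.
        return cls(*(str(row[c] or "").strip() for c in COLUMNS))


# Example row taken from the Ukrainian female names table in this README.
row = {
    "Original Name": "Олена",
    "Polish Transliteration": "Olena",
    "Alt Polish Trans": "Olona",
    "Polish Equivalent": "Helena",
    "English Equivalent": "Helen",
    "Informal Diminutive": "Olenka",
    "Other Variants": "Alyona",
}
record = NameRecord.from_row(row)
print(record.original, record.polish_equivalent, record.english_equivalent)
```

Any XLSX reader that yields one dictionary per row could feed `from_row`; the validation step simply makes a renamed or missing column visible immediately.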

Example File Structure

Each file is formatted as an XLSX table. Example rows are shown below:

Female Names (Ukrainian Example)

| Original Name | Polish Transliteration | Alt Polish Trans | Polish Equivalent | English Equivalent | Informal Diminutive | Other Variants |
|---|---|---|---|---|---|---|
| Олена | Olena | Olona | Helena | Helen | Olenka | Alyona |
| Наталія | Natalia | Nataliya | Natalia | Natalie | Nata | Natasha |

Surnames (Ukrainian Example)

| Original Name | Polish Transliteration | Alt Polish Trans | Polish Equivalent | English Equivalent | Informal Diminutive | Other Variants |
|---|---|---|---|---|---|---|
| Шевченко | Szewczenko | Szevczenko | Szewczyk | Shevchenko | | Chevtchenko |
| Мельник | Melnyk | Melnik | Mielnik | Melnyk | | Melnykov |
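The sketch below illustrates one way the variants could be used for transliteration lookup and typo detection: it builds an index from every Latin-script variant to the original name, then falls back to fuzzy matching with the standard library's `difflib`. The function names `build_index` and `resolve`, the `0.8` similarity cutoff, and the hard-coded sample rows (copied from the example tables above) are assumptions for illustration, not part of the dataset.

```python
import difflib

# Sample rows copied from the example tables in this README:
# the original name followed by its listed Latin-script variants.
SAMPLE_ROWS = [
    ("Олена", ["Olena", "Olona", "Helena", "Helen", "Olenka", "Alyona"]),
    ("Наталія", ["Natalia", "Nataliya", "Natalie", "Nata", "Natasha"]),
    ("Шевченко", ["Szewczenko", "Szevczenko", "Szewczyk", "Shevchenko", "Chevtchenko"]),
    ("Мельник", ["Melnyk", "Melnik", "Mielnik", "Melnykov"]),
]


def build_index(rows):
    """Map each case-folded variant to its original-script name."""
    index = {}
    for original, variants in rows:
        for variant in variants:
            index[variant.casefold()] = original
    return index


def resolve(name, index, cutoff=0.8):
    """Return the original name for a variant, tolerating small typos.

    Exact (case-insensitive) matches are tried first; otherwise the
    closest known variant above the similarity cutoff is used.
    Returns None when nothing is similar enough.
    """
    key = name.strip().casefold()
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


INDEX = build_index(SAMPLE_ROWS)
print(resolve("Nataliya", INDEX))    # exact variant
print(resolve("Shevchenco", INDEX))  # one-letter typo, resolved by fuzzy match
print(resolve("Kowalski", INDEX))    # not in the sample data
```

The cutoff trades recall for precision: a lower value catches more typos but risks mapping unrelated names to a dataset entry, so it would need tuning against the full tables.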