MBI-KG: A knowledge graph of structured and linked economic research data extracted from the book "Die Maschinen-Industrie im Deutschen Reich" written by Herbert Patschan in 1937

Repo structure

MBI-KG/
├── docs/
│   ├── talks/
│   │   ├── README_talks.md
│   │   ├── 2023.05.05_EURHISFIRM-Workshop-Kamlah-Shigapov.pdf
│   │   └── 2022.11.23_NFDI-Workshop-Research-Data-Maschinenindustrie-EN.pdf
│   ├── sparql_examples/
│   │   └── README_sparql_examples.md
│   └── README_docs.md
├── data/
│   ├── structured_data/
│   │   ├── README_structured_data.md
│   │   └── MBI_1937_structured.csv
│   ├── scanned_images/
│   │   └── README_scanned_images.md
│   ├── ocr_output/
│   │   └── README_ocr_output.md
│   ├── models/
│   │   ├── mbi-1937_print.mlmodel
│   │   ├── mbi-1937_layout.mlmodel
│   │   └── README_models.md
│   ├── kg_dataset/
│   │   ├── README_kg_dataset.md
│   │   ├── MBI_KG_bulk_cli_v1.0.ttl
│   │   ├── MBI_KG_bulk_cli_v1.0.json
│   │   ├── MBI_KG_bulk_api_v1.0.ndjson
│   │   └── MBI_KG_bulk_api_v1.0.csv
│   └── README_data.md
├── code/
│   ├── semantify.py
│   ├── requirements.txt
│   ├── entities2kg.py
│   ├── create_bulk_files_cli.sh
│   ├── create_bulk_files_api.py
│   ├── book2entities.py
│   └── README_code.md
├── README.md
├── LICENSE.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── CITATION.cff

Data

The folder data contains the data used in this project:

  • structured_data contains the structured data in CSV, JSON and RDF formats, representing various entities such as companies, individuals, and administrative entities.
  • scanned_images contains the scanned images of the original book pages in JPEG format at 400 dpi.
  • ocr_output contains the raw text output from the Optical Character Recognition (OCR) process, saved in plain text files.
  • models contains the OCR models.
  • kg_dataset contains bulk data exported via the Wikibase API (in CSV and NDJSON formats) and via command-line PHP scripts (in Turtle and JSON formats).
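
As an illustration of how the NDJSON bulk export could be consumed, here is a minimal sketch in Python. NDJSON stores one JSON object per line, so the file can be streamed record by record. The helper name `iter_ndjson` and the sample records are hypothetical; the actual field names in the export are not described here and may differ.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical two-record sample; the real export's fields may differ.
sample = io.StringIO(
    '{"id": "Q1", "label": "example A"}\n'
    "\n"
    '{"id": "Q2", "label": "example B"}\n'
)
records = list(iter_ndjson(sample))
print(len(records), "records parsed")

# To read the real file instead of the sample (path taken from the repo tree):
# with open("data/kg_dataset/MBI_KG_bulk_api_v1.0.ndjson", encoding="utf-8") as f:
#     for record in iter_ndjson(f):
#         ...
```

Streaming line by line keeps memory use flat even for a large export, which is the main reason to prefer NDJSON over a single JSON array for bulk data.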

Data availability statement: Data used in this project are freely available under the CC BY license.

Docs

The folder docs contains the documentation for this project, including talks and SPARQL examples.

Code

The folder code contains the code used in this project:

  • book2entities.py

Code availability statement: The code used in this project is openly available under the MIT license.

How to contribute

Thank you for your interest in contributing to the MBI knowledge graph. All contributions are welcome.

To get started, please follow these steps:

  1. Fork the repository and clone your fork to your local machine.
  2. Create a new branch for your changes.
  3. Make your changes and commit them with clear commit messages.
  4. Push your changes to your forked repository.
  5. Submit a pull request to the main repository.

See CONTRIBUTING.md for more information.

License

This work is licensed under the MIT license (code) and Creative Commons Attribution 4.0 International license (for everything else). You are free to share and adapt the material for any purpose, even commercially, as long as you provide attribution (see Attribution).

Attribution

Dataset (replication package):

  • Shigapov, R., Schmidt, T., Kamlah, J., Schumm, I., Streb, J., & Lehmann-Hasemeyer, S. (2024). MBI-KG: Replication package for a knowledge graph of structured and linked economic research data extracted from the 1937 book "Die Maschinen-Industrie im Deutschen Reich". MADATA, [Dataset]. https://doi.org/10.7801/467.

Paper:

  • Shigapov, R., Schmidt, T., Kamlah, J., Schumm, I., Streb, J., & Lehmann-Hasemeyer, S. (2024). MBI-KG: A knowledge graph of structured and linked economic research data extracted from the 1937 book "Die Maschinen-Industrie im Deutschen Reich". Data in Brief, 111238. https://doi.org/10.1016/j.dib.2024.111238