Rework website source to incorporate repository name change #73

Merged: 5 commits, Jul 30, 2024
Changes from 2 commits
12 changes: 6 additions & 6 deletions README.md
@@ -1,23 +1,23 @@
# Intrusion Detection Datasets
# COMIDDS

A comprehensive overview of datasets for research in host-based and/or network-based intrusion detection, with a focus on enterprise networks.

**The content of this repository is intended to be viewed through its [github.io site](https://fkie-cad.github.io/intrusion-detection-datasets/)!**
**The content of this repository is intended to be viewed through its [github.io site](https://fkie-cad.github.io/COMIDDS/)!**

## Content and Goals

This repository contains the website for *Intrusion Detection Datasets*, an overview of datasets for research in intrusion detection.
This repository contains the website for *COMIDDS*, an overview of datasets for research in intrusion detection.
Our goal is to provide a comprehensive and detailed list of relevant datasets along with descriptions and links, aiding researchers in finding and selecting suitable data to work with.
Beyond the [table of all datasets](https://fkie-cad.github.io/intrusion-detection-datasets/content/all_datasets/), each dataset has a separate page, listing key features and describing the underlying environment, activity, contained data, etc.
Beyond the [table of all datasets](https://fkie-cad.github.io/COMIDDS/content/all_datasets/), each dataset has a separate page, listing key features and describing the underlying environment, activity, contained data, etc.

We mainly focus on datasets suited for developing and evaluating methods for intrusion detection in enterprise networks, i.e., common office environments involving applications such as browsing, emailing, or text processing as well as services such as web, email, or database servers.
We intentionally omit datasets from very different environments such as industrial control systems or Internet exchange points.

## Contributing

Any kind of contribution is most welcome, both in the form of adding new entries and improving existing ones!
For more information, please refer to the [Contribution Guide](https://fkie-cad.github.io/intrusion-detection-datasets/content/contributing/).
For more information, please refer to the [Contribution Guide](https://fkie-cad.github.io/COMIDDS/content/contributing/).

## Further Information

For more information, please see the [About page](https://fkie-cad.github.io/intrusion-detection-datasets/content/about/).
For more information, please see the [About page](https://fkie-cad.github.io/COMIDDS/content/about/).
4 changes: 2 additions & 2 deletions _config.yml
@@ -10,7 +10,7 @@
############################

# Name of website
title: Intrusion Detection Datasets
title: COMIDDS

# Your name to show in the footer
author: Fraunhofer FKIE, Philipp Bönninghausen
@@ -101,7 +101,7 @@ share-links-active:

# How to display the link to your website in the footer
# Remove this if you don't want a link in the footer
url-pretty: "Intrusion Detection Datasets"
url-pretty: "COMIDDS"

# Excerpt word length - Truncate the excerpt of each post on the feed page to the specified number of words
excerpt_length: 50
2 changes: 1 addition & 1 deletion _includes/gh_buttons.html
@@ -1,7 +1,7 @@
<!-- Made with https://buttons.github.io/ -->

<div class="text-center">
<a class="github-button" href="https://github.com/fkie-cad/intrusion-detection-datasets/subscription" data-color-scheme="no-preference: light; light: light; dark: dark;" data-icon="octicon-eye" data-size="large" data-show-count="true" aria-label="Watch fkie-cad/intrusion-detection-datasets on GitHub">Watch</a>
<a class="github-button" href="https://github.com/fkie-cad/COMIDDS/subscription" data-color-scheme="no-preference: light; light: light; dark: dark;" data-icon="octicon-eye" data-size="large" data-show-count="true" aria-label="Watch fkie-cad/intrusion-detection-datasets on GitHub">Watch</a>
<a class="github-button" href="https://github.com/fkie-cad/intrusion-detection-datasets" data-color-scheme="no-preference: light; light: light; dark: dark;" data-icon="octicon-star" data-size="large" data-show-count="true" aria-label="Star fkie-cad/intrusion-detection-datasets on GitHub">Star</a>
Maspital marked this conversation as resolved.
</div>
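
In the hunk above, the Watch button's `href` names `fkie-cad/COMIDDS` while its `aria-label` still carries the old repository name. A minimal sketch of a check that would flag this kind of mismatch follows. The helper `mismatched_buttons` and its regular expressions are hypothetical and not part of the repository; they assume the attribute layout shown in `_includes/gh_buttons.html`.

```python
import re

# Hypothetical helper, not part of the repository. It assumes each button is an
# <a class="github-button" ...> tag with an href of the form
# https://github.com/<owner>/<repo>[/...] and an aria-label of the form
# "<Verb> <owner>/<repo> on GitHub", as in the hunk above.
ANCHOR = re.compile(r'<a\s+class="github-button"[^>]*>', re.IGNORECASE)
HREF = re.compile(r'href="https://github\.com/([^/"]+/[^/"]+)')
LABEL = re.compile(r'aria-label="\w+ ([^ "]+/[^ "]+) on GitHub"')


def mismatched_buttons(html: str) -> list[tuple[str, str]]:
    """Return (href_repo, label_repo) pairs where the two names differ."""
    mismatches = []
    for tag in ANCHOR.findall(html):
        href = HREF.search(tag)
        label = LABEL.search(tag)
        # Only compare when both attributes were found in the expected shape.
        if href and label and href.group(1) != label.group(1):
            mismatches.append((href.group(1), label.group(1)))
    return mismatches
```

Run against the Watch button as it appears in this two-commit view, the function would report one pair: the new name from the `href` and the old name from the `aria-label`.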

10 changes: 5 additions & 5 deletions _posts/2024-01-23-initial-post.md
@@ -1,6 +1,6 @@
---
layout: post
title: First release of Intrusion Detection Datasets
title: First release of COMIDDS
subtitle: 43 datasets described in detail, with more to come!
gh-repo: fkie-cad/intrusion-detection-datasets
gh-badge: [star, fork, follow]
@@ -9,14 +9,14 @@ comments: true
author: Philipp Bönninghausen
---

This post marks the beginning of the "Intrusion Detection Datasets" collection.
This post marks the beginning of COMIDDS.
It is intended to be a comprehensive resource for anyone looking for a dataset suitable for IDS development and evaluation.
However, with research requirements often being complex (and dataset documentation often being lacking), this collection aims to be more than just a list of names and one-line descriptions.

### Features
All datasets are summarized in a [table](/intrusion-detection-datasets/content/all_datasets), which lists some relevant information for each entry - helpful when you want to quickly determine which of them might me useful to you.
All datasets are summarized in a [table](/COMIDDS/content/all_datasets), which lists some relevant information for each entry - helpful when you want to quickly determine which of them might me useful to you.

For every dataset, there is a separate entry (for example [this one](/intrusion-detection-datasets/content/datasets/ait_log_dataset)) describing the following characteristics of a given dataset:
For every dataset, there is a separate entry (for example [this one](/COMIDDS/content/datasets/ait_log_dataset)) describing the following characteristics of a given dataset:
- Overview (A general description of the dataset, giving a brief overview over origin, intended usage and some properties of the dataset)
- Environment (A description of the environment the dataset originated from, including networks, operating systems, running services, etc.)
- Activity (What kind of activity, benign and malicious, was performed during the period of data collection)
@@ -32,4 +32,4 @@ Additional information includes:
As there are certainly more than 43 IDS-adjacent datasets out there, any help in documenting them in the level of detail outlined above is more than welcome.
Alternatively, although I tried to be as thorough as possible during my research (while spending a reasonable amount of time per dataset), it is of course likely that I have missed some information, or made slight mistakes.
Any help in this regard, as in improving existing entries, is also much appreciated.
For more information, visit the [Contribution Guide](/intrusion-detection-datasets/content/contributing)
For more information, visit the [Contribution Guide](/COMIDDS/content/contributing)
6 changes: 3 additions & 3 deletions _posts/2024-02-21-related-work.md
@@ -9,7 +9,7 @@ comments: true
author: Philipp Bönninghausen
---

This update adds a new subpage for "Related Work", intended to provide additional source material and accessible via the navbar (or [this link](/intrusion-detection-datasets/content/related_work)).
This update adds a new subpage for "Related Work", intended to provide additional source material and accessible via the navbar (or [this link](/COMIDDS/content/related_work)).
Contents are divided into "Publications" and "Collections", where the former is any academic work that at least partially covers the topic of available IDS datasets.
Entries of this category, which are usually surveys, consist of the following:
- Publication title
@@ -19,7 +19,7 @@ Entries of this category, which are usually surveys, consist of the following:
- List of referenced collections

Referenced datasets link to their respective entries on this webpage, if available.
Those that are not (which are quite a few) will be looked at and possibly be added to the Intrusion Detection Datasets collection in the future.
Those that are not (which are quite a few) will be looked at and possibly be added to COMIDDS in the future.

The latter category, "Collections", simply features dataset collections not backed by a scientific publication.
These are maintained by individuals or organizations, and cover different types of datasets, ranging from "only pcaps" to "anything cybersecurity-related".
@@ -29,5 +29,5 @@ Entries consist of:
- Date of last update, i.e., the last time a new entry was added
- Short description of the focus of this collection

There is of course a significant overlap between the different publications/collections, for example, almost every survey references the age-old [KDD Cup 1999 dataset](/intrusion-detection-datasets/content/datasets/kdd_cup_1999).
There is of course a significant overlap between the different publications/collections, for example, almost every survey references the age-old [KDD Cup 1999 dataset](/COMIDDS/content/datasets/kdd_cup_1999).
The diversity of collections might nevertheless prove useful, as each resource provides a slightly different viewpoint upon the topic of IDS datasets.
16 changes: 8 additions & 8 deletions _posts/2024-04-04-new-datasets.md
@@ -12,16 +12,16 @@ author: Philipp Bönninghausen
This update adds a couple of new dataset entries, changes some existing ones, and adds information on how to cite this website.

New dataset entries:
- [CIC DoS](/intrusion-detection-datasets/content/datasets/cic_dos)
- [CIC-DDoS2019](/intrusion-detection-datasets/content/datasets/cic_ddos)
- [gureKDDCup](/intrusion-detection-datasets/content/datasets/gure_kddcup)
- [User-Computer Associations in Time](/intrusion-detection-datasets/content/datasets/user_computer_associations)
- [CIC DoS](/COMIDDS/content/datasets/cic_dos)
- [CIC-DDoS2019](/COMIDDS/content/datasets/cic_ddos)
- [gureKDDCup](/COMIDDS/content/datasets/gure_kddcup)
- [User-Computer Associations in Time](/COMIDDS/content/datasets/user_computer_associations)

Changes to existing entries:
- Modified all datasets which are derived from the [DARPA'98 dataset](/intrusion-detection-datasets/content/datasets/darpa98) to refer to it instead of containing the same copy/pasted description
- Updated download source for [NSL-KDD](/intrusion-detection-datasets/content/datasets/nsl_kdd_dataset) as the old link was deprecated
- Modified all datasets which are derived from the [DARPA'98 dataset](/COMIDDS/content/datasets/darpa98) to refer to it instead of containing the same copy/pasted description
- Updated download source for [NSL-KDD](/COMIDDS/content/datasets/nsl_kdd_dataset) as the old link was deprecated

Other changes:
- Added info on how this website can be cited in [About](/intrusion-detection-datasets/content/about)
- Updated links in [Related Work](/intrusion-detection-datasets/content/related_work) to point to new entries
- Added info on how this website can be cited in [About](/COMIDDS/content/about)
- Updated links in [Related Work](/COMIDDS/content/related_work) to point to new entries
- Minor changes to the repositories README
4 changes: 2 additions & 2 deletions _posts/2024-04-17-csv-download.md
@@ -10,11 +10,11 @@ author: Philipp Bönninghausen
---

This update adds the possibility to download a `.csv` file containing information automatically parsed from all currently existing dataset entries.
It can be used to sort and filter data in a spreadsheet program or generate statistics and plots - access the file via the navbar (or [this link](/intrusion-detection-datasets/content/csv_download)).
It can be used to sort and filter data in a spreadsheet program or generate statistics and plots - access the file via the navbar (or [this link](/COMIDDS/content/csv_download)).
There, you will also find explanation for all contained fields.

New dataset entries:
- [TUIDS](/intrusion-detection-datasets/content/datasets/tuids)
- [TUIDS](/COMIDDS/content/datasets/tuids)

Other changes:
- Attempted to "normalize" several descriptions (e.g., "Packet captures", "Pcaps" and "pcaps" are now all called "pcaps")
12 changes: 6 additions & 6 deletions _posts/2024-06-04-version-1-4.md
@@ -18,7 +18,7 @@ TL;DR:
- Added related work
- Updated/improved some entries

This update adds a new *Statistics* subpage, accessible via the navbar or [this link](/intrusion-detection-datasets/content/statistics).
This update adds a new *Statistics* subpage, accessible via the navbar or [this link](/COMIDDS/content/statistics).
There, you can find plots visualizing various aspects of the surveyed datasets, along with detailed explanations.
Plots are automatically generated from the CSV file added in v1.3.0.

@@ -34,13 +34,13 @@ Secondly, the three-class label for "Labeled?" has been changed from [Labeled, G
The original naming was unclear, since labels itself are also a form of ground truth.

New dataset entries:
- [ISOT Botnet](/intrusion-detection-datasets/content/datasets/isot_botnet)
- [UNIBS](/intrusion-detection-datasets/content/datasets/unibs)
- [UWF-ZeekData22](/intrusion-detection-datasets/content/datasets/uwf_zeekdata22)
- [ISOT Botnet](/COMIDDS/content/datasets/isot_botnet)
- [UNIBS](/COMIDDS/content/datasets/unibs)
- [UWF-ZeekData22](/COMIDDS/content/datasets/uwf_zeekdata22)

Added related work:
- [Kenyon et al.: Are public intrusion datasets fit for purpose characterising the state of the art in intrusion event datasets (2020)](/intrusion-detection-datasets/content/related_work/#are-public-intrusion-datasets-fit-for-purpose-characterising-the-state-of-the-art-in-intrusion-event-datasets-2020)
- [Yang et al.: A systematic literature review of methods and datasets for anomaly-based network intrusion detection (2022)](/intrusion-detection-datasets/content/related_work/#a-systematic-literature-review-of-methods-and-datasets-for-anomaly-based-network-intrusion-detection-2022)
- [Kenyon et al.: Are public intrusion datasets fit for purpose characterising the state of the art in intrusion event datasets (2020)](/COMIDDS/content/related_work/#are-public-intrusion-datasets-fit-for-purpose-characterising-the-state-of-the-art-in-intrusion-event-datasets-2020)
- [Yang et al.: A systematic literature review of methods and datasets for anomaly-based network intrusion detection (2022)](/COMIDDS/content/related_work/#a-systematic-literature-review-of-methods-and-datasets-for-anomaly-based-network-intrusion-detection-2022)

Changed entries (major):
- *All Entries*: Normalized description of benign user activity
10 changes: 5 additions & 5 deletions content/about.md
@@ -12,9 +12,9 @@ We intentionally omit datasets from very different environments such as industri

### Features

All datasets are summarized in a [table](/intrusion-detection-datasets/content/all_datasets), which lists some relevant information for each entry - helpful when you want to quickly determine which of them might me useful to you.
All datasets are summarized in a [table](/COMIDDS/content/all_datasets), which lists some relevant information for each entry - helpful when you want to quickly determine which of them might me useful to you.

For every dataset, there is a separate entry (for example [this one](/intrusion-detection-datasets/content/datasets/ait_log_dataset)) describing the following characteristics of a given dataset:
For every dataset, there is a separate entry (for example [this one](/COMIDDS/content/datasets/ait_log_dataset)) describing the following characteristics of a given dataset:
- Overview (A general description of the dataset, giving a brief overview over origin, intended usage and some properties of the dataset)
- Environment (A description of the environment the dataset originated from, including networks, operating systems, running services, etc.)
- Activity (What kind of activity, benign and malicious, was performed during the period of data collection)
@@ -32,10 +32,10 @@ If you would like to cite this overview in your (academic) work, we recommend to
<!-- {% raw %} -->
```
@misc{idd100,
author = {{Intrusion Detection Datasets} contributors},
title = {{Intrusion Detection Datasets v1.0.0 -- GitHub}},
author = {{COMIDDS} contributors},
title = {{COMIDDS v1.0.0 -- GitHub}},
year = {2024},
howpublished = {\url{https://github.com/fkie-cad/intrusion-detection-datasets/releases/tag/v1.0.0}},
howpublished = {\url{https://github.com/fkie-cad/COMIDDS/releases/tag/v1.0.0}},
note = {[Online; accessed DD-MMM-YYYY]},
}
```
6 changes: 3 additions & 3 deletions content/contributing.md
@@ -9,14 +9,14 @@ Any help in this regard, as in improving existing entries, is also much apprecia

You can contribute in one of two ways:

- Open a [new issue](https://github.com/fkie-cad/intrusion-detection-datasets/issues/new/choose) in our repository, describing your suggestions for improvement.
- Open a [new issue](https://github.com/fkie-cad/COMIDDS/issues/new/choose) in our repository, describing your suggestions for improvement.
- Fork this repository, implement your suggested changes, and then open a pull request.

If you want to contribute a new dataset entry, please use this [template](https://raw.githubusercontent.com/fkie-cad/intrusion-detection-datasets/main/docs/new_entry_template.md) from the documentation.
If you want to contribute a new dataset entry, please use this [template](https://raw.githubusercontent.com/fkie-cad/COMIDDS/main/docs/new_entry_template.md) from the documentation.
A new entry should consist of said template filled out and named appropriately, placed in `/content/datasets/`.
Additionally, a new row should be added to the list of all datasets in `/content/all_datasets.md`, adding information to each cell as needed.

You can find a list of datasets that we are aware of, but which do not have an entry yet, in [this issue](https://github.com/fkie-cad/intrusion-detection-datasets/issues/13)
You can find a list of datasets that we are aware of, but which do not have an entry yet, in [this issue](https://github.com/fkie-cad/COMIDDS/issues/13)

On every page you will also find an "Edit Page" button at the bottom leading you to GitHub, where you will be prompted to fork this repository - saving you a few clicks when you want to edit an existing entry.
While contributions should generally be aimed towards datasets, suggestions regarding the underlying structure (like the website itself) are of course also welcome.
2 changes: 1 addition & 1 deletion content/datasets/adfa_ld.md
@@ -86,7 +86,7 @@ Note: Download links will most likely expire by 15.12.2023, as the storage provi

### Related Entries

- [ADFA-WD](/intrusion-detection-datasets/content/datasets/adfa_wd)
- [ADFA-WD](/COMIDDS/content/datasets/adfa_wd)

### Data Examples

2 changes: 1 addition & 1 deletion content/datasets/adfa_wd.md
@@ -89,7 +89,7 @@ Note: Download links will most likely expire by 15.12.2023 as the storage provid

### Related Entries

- [ADFA LD](/intrusion-detection-datasets/content/datasets/adfa_ld)
- [ADFA LD](/COMIDDS/content/datasets/adfa_ld)

### Data Examples

2 changes: 1 addition & 1 deletion content/datasets/ait_alert_dataset.md
@@ -95,7 +95,7 @@ The epoch timestamps resemble the time a given alert was generated, which happen

### Related entries

- [AIT Log Dataset](/intrusion-detection-datasets/content/datasets/ait_log_dataset)
- [AIT Log Dataset](/COMIDDS/content/datasets/ait_log_dataset)

### Example Data

2 changes: 1 addition & 1 deletion content/datasets/ait_log_dataset.md
@@ -117,7 +117,7 @@ A circled checkmark indicates that labels exists for that file.

### Related entries

- [AIT Alert Dataset](/intrusion-detection-datasets/content/datasets/ait_alert_dataset)
- [AIT Alert Dataset](/COMIDDS/content/datasets/ait_alert_dataset)

### Example Data

2 changes: 1 addition & 1 deletion content/datasets/asnm_datasets.md
@@ -120,7 +120,7 @@ Data is available in the form of `.csv` files.

### Related Entries

- [CDX CTF 2009](/intrusion-detection-datasets/content/datasets/cdx_2009)
- [CDX CTF 2009](/COMIDDS/content/datasets/cdx_2009)

### Data Examples

2 changes: 1 addition & 1 deletion content/datasets/cic_dos.md
@@ -70,4 +70,4 @@ I would assume data is labeled, but obviously have no way to confirm this.
- [Homepage](https://www.unb.ca/cic/datasets/dos-dataset.html)

### Related Entries
- [ISCX Intrusion Detection Evaluation Dataset](/intrusion-detection-datasets/content/datasets/iscx_ids_2012)
- [ISCX Intrusion Detection Evaluation Dataset](/COMIDDS/content/datasets/iscx_ids_2012)
2 changes: 1 addition & 1 deletion content/datasets/cic_ids2017.md
@@ -91,7 +91,7 @@ event originated from.

### Related Entries

- [CSE CIC IDS2018](/intrusion-detection-datasets/content/datasets/cse_cic_ids2018)
- [CSE CIC IDS2018](/COMIDDS/content/datasets/cse_cic_ids2018)

### Data Examples

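
Most hunks in this diff apply the same substitution: the path segment `intrusion-detection-datasets` becomes `COMIDDS` in internal links and GitHub URLs. A minimal sketch of how such a bulk rewrite could be scripted is given below. It is an illustration only, not the method used in this PR; the function names and the chosen file suffixes are assumptions. It deliberately leaves the display title "Intrusion Detection Datasets" alone, since that string differs from the URL slug and needs a separate, reviewed change.

```python
from pathlib import Path

# Assumptions for illustration: the slug values are taken from the diff above,
# and the suffix set covers the file types this diff touches.
OLD_SLUG = "intrusion-detection-datasets"
NEW_SLUG = "COMIDDS"
SUFFIXES = {".md", ".yml", ".html"}


def rename_in_text(text: str, old: str = OLD_SLUG, new: str = NEW_SLUG) -> str:
    """Replace every occurrence of the old slug with the new one."""
    return text.replace(old, new)


def rename_in_tree(root: Path) -> list[Path]:
    """Rewrite matching files under root in place; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SUFFIXES:
            original = path.read_text(encoding="utf-8")
            updated = rename_in_text(original)
            if updated != original:
                path.write_text(updated, encoding="utf-8")
                changed.append(path)
    return changed
```

A plain string replace also rewrites attributes such as `aria-label` and front-matter keys such as `gh-repo`, so the resulting diff should still be reviewed before committing.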