Commit 0f8c843

Merge pull request #483 from uhh-lt/import
updated README
2 parents 64fc497 + 73caf90 commit 0f8c843

11 files changed

+367
-19
lines changed

README.md

Lines changed: 241 additions & 19 deletions
@@ -1,30 +1,252 @@
1+
<div align="center">
2+
3+
![DATS Logo](assets/banner.png)
4+
5+
**The AI-powered platform for multi-modal discourse analysis.**
6+
7+
<br/>
8+
9+
<p>
10+
<a href="#quick-start">Quick start</a> •
11+
<a href="#why-dats">Why DATS?</a> •
12+
<a href="https://dats.ltdemos.informatik.uni-hamburg.de">Demo</a> •
13+
<a href="https://github.com/uhh-lt/dats/wiki/User-Guide">User Guide</a> •
14+
<a href="https://github.com/uhh-lt/dats/wiki/FAQ">FAQ</a> •
15+
<a href="https://github.com/uhh-lt/dats/wiki">Wiki</a> •
16+
<a href="https://www.dwise.uni-hamburg.de/">D-WISE</a>
17+
</p>
18+
19+
<p>
20+
<a href="https://dats.ltdemos.informatik.uni-hamburg.de"><img alt="Demo" src="https://img.shields.io/website?url=https%3A%2F%2Fdats.demo.hcds.uni-hamburg.de&label=Demo"></a>
21+
<a href="https://github.com/uhh-lt/dats/blob/main/LICENSE"><img alt="Licence" src="https://img.shields.io/github/license/uhh-lt/dats.svg?color=blue"></a>
22+
<a href="https://results.pre-commit.ci/latest/github/uhh-lt/dats/mwp_v1"><img alt="Pre-commit" src="https://results.pre-commit.ci/badge/github/uhh-lt/dats/mwp_v1.svg"></a>
23+
<a href="https://github.com/uhh-lt/dats/releases"><img alt="Release" src="https://img.shields.io/github/release/uhh-lt/dats.svg"></a>
24+
<a href="https://aclanthology.org/2023.acl-demo.31/"><img alt="DOI" src="https://img.shields.io/badge/DOI-10.18653%2Fv1%2F2023.acl--demo.31-blue"></a>
25+
</p>
26+
27+
</div>
28+
29+
---
30+
131
# Discourse Analysis Tool Suite (DATS)
232

3-
![DATS Logo](assets/DATS_colour.png)
33+
DATS is a machine-learning powered web application for multi-modal discourse analysis.
34+
It provides tools for the typical workflow of a discourse analysis project, including data collection, data management, exploration, annotation, qualitative & quantitative analysis, interpretation, and reflection.
35+
See the [Features](#features) section to learn more about the various functionalities.
36+
37+
## Why DATS?
38+
39+
- Multi-modal: Support for 📝 text, 🖼 image, 🎵audio, and 🎞 video documents
40+
- Multi-lingual: Support for 🇺🇸 English, 🇩🇪 German, 🇮🇹 Italian and more
41+
- ⚙️ Extensive pre-processing (e.g. automatic transcription, entity identification, keyword extraction, ...) eases data management
42+
- 🤖 AI Assistance: state-of-the-art machine-learning and large language models assist with time-consuming tasks
43+
- 👥 Collaborate with your team in shared projects
44+
- 📥 Export data to continue your project with other tools
45+
- 💻 No software installation or special hardware is required
46+
- 🔓 Free open source software
47+
48+
## Quick start
49+
50+
The best way to get started is to watch our [Tutorial Video Series](https://www.youtube.com/), read the [User Guide](https://github.com/uhh-lt/dats/wiki/User-Guide) and play with DATS on our [Demo Instance](https://dats.ltdemos.informatik.uni-hamburg.de/).
51+
52+
<details>
53+
<summary>Host it yourself</summary>
54+
55+
#### 0. Requirements
56+
57+
- Machine with NVIDIA GPU
58+
- Docker with NVIDIA Container Toolkit
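
Before cloning, it can help to confirm that the tools implied by these requirements are on your `PATH`. The following is a minimal sketch and not part of the DATS repository: the function name `missing_tools` and the list of checked commands (`git`, `docker`, `nvidia-smi`) are assumptions derived from the requirements above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DATS): print every given command
# that cannot be found on the PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list, derived from the requirements above.
missing="$(missing_tools git docker nvidia-smi)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All required tools found."
fi
```

This only checks that the commands exist. It does not verify that the NVIDIA Container Toolkit is actually configured for Docker.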
59+
60+
#### 1. Clone the repository
61+
62+
```bash
63+
git clone https://github.com/uhh-lt/dats.git
64+
```
65+
66+
#### 2. Run setup scripts
67+
68+
```bash
69+
./bin/setup-envs.sh --project_name dats --port-prefix 101
70+
```
71+
72+
```bash
73+
./bin/setup-folders.sh
74+
```
75+
76+
#### 3. Start docker containers
77+
78+
```bash
79+
docker compose -f compose.ollama.yml up -d
80+
```
81+
82+
```bash
83+
docker compose -f compose.yml -f compose.production.yml up --wait
84+
```
85+
86+
#### 4. Open DATS
87+
88+
Open [https://localhost:10100/](https://localhost:10100/) in your browser.
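
If the page does not load right away, the containers may still be starting. The sketch below is a hedged illustration, not something shipped with DATS: the helper names `dats_url` and `wait_for_url` are hypothetical, the rule "port = prefix followed by `00`" is inferred from `--port-prefix 101` and port `10100` above, and `--insecure` assumes a locally generated certificate.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with DATS).

# Build the local URL from a port prefix.
# Assumption: the web port is the prefix followed by "00" (101 -> 10100).
dats_url() {
  printf 'https://localhost:%s00/\n' "$1"
}

# Poll a URL until it answers successfully or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
# --insecure is an assumption for a locally generated certificate.
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --insecure --max-time 5 --output /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

echo "DATS should be reachable at $(dats_url 101)"
```

To block until the instance responds, you could run `wait_for_url "$(dats_url 101)" && echo "DATS is up"`.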
89+
90+
</details>
91+
92+
<details>
93+
<summary>Ask for a hosted instance</summary>
94+
95+
#### Hosted instance @ HCDS
96+
97+
We may be able to host DATS for your research institute.
98+
Please contact the House of Computing and Data Science (HCDS) [here](https://www.hcds.uni-hamburg.de/en/hcds.html).
99+
100+
</details>
101+
102+
## Further reading
103+
104+
- **User Guide**: If you want to use DATS, we recommend starting with the [Features](#features) below and playing around with the tool. If you have questions, you may find help in the [User Guide](https://github.com/uhh-lt/dats/wiki/User-Guide) or in the [FAQ](https://github.com/uhh-lt/dats/wiki/FAQ). If you encounter problems or bugs, please leave us some [feedback](#feedback).
105+
- **Admin Guide**: See the [quick start](#quick-start) guide above. For more information on how to configure DATS on a server, please see the [Admin Guide](https://github.com/uhh-lt/dats/wiki/Admin-Guide).
106+
- **Developer Guide**: DATS is open source software. If you want to contribute to the project, please start with the [Developer Guide](https://github.com/uhh-lt/dats/wiki/Developer-Guide).
107+
108+
## Feedback?
109+
110+
DATS is still under development, so please feel free to give us feedback, tell us your wishes or report bugs:
111+
112+
- For feedback, please write to [us](https://www.inf.uni-hamburg.de/en/inst/ab/lt/people/tim-fischer.html)
113+
- To report bugs, please open an issue on [GitHub](https://github.com/uhh-lt/dats/issues)
114+
115+
## Features
116+
117+
### Data collection
118+
119+
DATS can handle most data formats for text, image, audio, and video documents.
120+
You can easily upload your files to DATS.
121+
It also offers an integrated crawler, implemented with Scrapy and Beautiful Soup, to scrape websites and their images in case additional material is required.
122+
123+
### Data pre-processing
124+
125+
<!-- <div align="center" style="margin: 10px">
126+
<img src="./assets/feature-preprocessing.png" alt="Pre-processing" />
127+
</div> -->
128+
129+
![Pre-processing](assets/feature-preprocessing.png)
130+
131+
DATS automatically pre-processes documents as they are uploaded.
132+
This process extracts metadata and enriches the material with additional information, including:
133+
134+
- Named entity recognition (people, organizations, locations, etc.)
135+
- Object detection in images and videos (cars, people, buildings, etc.)
136+
- Image captioning
137+
- Automatic speech recognition (transcription)
138+
139+
This powerful feature enables you to precisely filter documents by keywords, entities, and other criteria later on.
140+
141+
### Data management
142+
143+
![Data-management](assets/feature-data-management.png)
144+
145+
<div style="width: 67%; float: left">
146+
147+
DATS makes it easy to organize and analyze your data.
148+
Each document can be assigned metadata – some of which DATS detects automatically – to help you categorize and find what you need.
149+
You can also add your own tags to documents.
150+
151+
Powerful filtering and search options let you quickly sift through your data.
152+
Find documents containing specific keywords, entities (like people, organizations, or locations), or other criteria.
153+
This flexible system keeps you in control of your data and ensures you can quickly find the information that matters most to your research.
154+
155+
DATS offers an AI Assistant that can help you streamline your data management tasks.
156+
The AI Assistant can suggest tags and extract metadata for your documents, making it even easier to organize data.
157+
158+
Read more about LLM Assistance in our publication [Exploring Large Language Models for Qualitative Data Analysis](https://aclanthology.org/2024.nlp4dh-1.41/).
159+
160+
</div>
161+
162+
<div style="width: 33%; float: left;">
163+
164+
![LLM-Assistance-Tags](assets/feature-llm-assistance-tagging.png)
165+
166+
</div>
167+
168+
### Exploration
169+
170+
DATS makes exploring your data easy and intuitive.
171+
Its powerful similarity search allows you to quickly find related documents, even across different modalities.
172+
173+
Found an interesting article? DATS can instantly find others like it.
174+
Discovered a key image? DATS can locate similar images, or even text documents that relate to the same concept.
175+
This cross-modal capability unlocks new ways to explore connections within your data.
176+
This feature may help you to uncover hidden connections between documents and gain a deeper understanding of your data.
177+
178+
Further, when viewing search results, DATS presents an overview of the most frequent keywords, tags, and entities found within those documents.
179+
This frequency analysis feature allows you to:
180+
181+
- Spot key themes: Quickly grasp the main topics being discussed.
182+
- Discover new avenues for research: Identify potentially relevant keywords or entities you hadn't previously considered.
183+
- Refine your searches: Use the frequency list to add new search terms or filters, leading you to new documents and a deeper understanding of your corpus.
184+
185+
### Annotation
186+
187+
DATS provides tools for text (span & sentence) and image annotation.
188+
Annotating audio and video documents directly is not (yet) supported.
189+
Instead, the automatically generated transcript can be used.
190+
191+
For example, the sentence annotator enables you to:
192+
193+
- Highlight important passages: Easily mark key sections of the text.
194+
- Develop a code hierarchy: Create a structured taxonomy of codes and sub-codes to organize your analysis. DATS's interface allows you to easily manage and update this code hierarchy as your research evolves.
195+
- Collaborate with others: Codes and annotations are shared with colleagues, fostering teamwork and discussion.
196+
197+
![Sentence-Annotation](assets/feature-sentence-annotation.png)
198+
199+
The AI Assistant integrated in DATS can also help you with the annotation process.
200+
It can suggest relevant text annotations, which you can then review and accept or reject.
201+
This can save you a lot of time and effort, especially if you are working with a large dataset.
202+
203+
Read more about the Sentence Annotation feature in our publication [Semi-automatic Sequential Sentence Classification in the Discourse Analysis Tool Suite](https://www.inf.uni-hamburg.de/en/inst/ab/lt/people/tim-fischer.html).
204+
205+
### Analysis
206+
207+
DATS offers various tools for qualitative and quantitative analysis, including word-frequency, code-frequency, and timeline analyses.
208+
The more advanced Concept Over Time Analysis is explained below.
209+
210+
#### Concept-over-time Analysis
211+
212+
![Concept-Over-Time-Analysis](assets/feature-cota-timeline.png)
213+
214+
DATS includes Concept Over Time Analysis, a powerful feature that allows you to visualize how concepts evolve over time within your data.
215+
With the Concept Over Time Analysis feature, you can:
216+
217+
- Define and refine your concepts of interest.
218+
- Visualize the occurrence of concepts over time.
219+
- Uncover patterns, trends, and shifts in discourse.
220+
- Gain a deeper understanding of how concepts change.
221+
222+
To use Concept Over Time Analysis, you first define the concepts you are interested in.
223+
For example, if you are interested in the concept of "democracy", you would provide a short description of what you mean by "democracy".
224+
DATS uses this description to identify relevant sentences in your data.
225+
You can then review these sentences and provide feedback to DATS, which helps to refine the concept and improve the accuracy of the analysis.
226+
Finally, the occurrences of the concept over time are visualized.
227+
228+
DATS's Concept Over Time Analysis is a valuable tool for qualitative data analysis, providing a unique perspective on the dynamics of discourse.
229+
Read more about COTA in our publication [Concept Over Time Analysis: Unveiling Temporal Patterns for Qualitative Data Analysis](https://aclanthology.org/2024.naacl-demo.15/).
4230

5-
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/uhh-lt/dats/mwp_v1.svg)](https://results.pre-commit.ci/latest/github/uhh-lt/dats/mwp_v1)
231+
### Interpretation
6232

7-
This is the repository for the Discourse Analysis Tool Suite (DATS) - an outcome of
8-
the [D-WISE Project](https://www.dwise.uni-hamburg.de/)
233+
![Whiteboard](assets/feature-whiteboard.png)
9234

10-
_Please also have a look at our [Wiki](https://github.com/uhh-lt/dats/wiki) for more information and How-To's_
235+
DATS features interactive Whiteboards that provide a customizable graph-based interface to organize and manipulate your research objects and analyses.
236+
With Whiteboards, you can:
11237

12-
## Try it out!
238+
- Visualize your data and analyses in a flexible and customizable way.
239+
- Organize and refine your code taxonomies.
240+
- Keep track of your research process and findings.
241+
- Create a variety of visualizations, including sampling maps and actor networks, to gain new insights into your data.
13242

14-
- test the online demo at [https://dats.ltdemos.informatik.uni-hamburg.de/](https://dats.ltdemos.informatik.uni-hamburg.de/)
15-
- host it on your own machine with `docker compose`
16-
- clone this repository: `git clone https://github.com/uhh-lt/dats.git`
17-
- navigate to the docker directory: `cd dats/docker`
18-
- create a copy of the .env.example file: `cp .env.example .env`
19-
- edit the .env example file and put in correct values for `UID`, `GID`, and `JWT_SECRET`
20-
- run `docker compose -f docker-compose-ollama.yml up -d` to start Ollama
21-
- run `docker compose up -d` to start the Tool Suite
22-
- visit [http://localhost:13100/](http://localhost:13100/) in your browser
243+
To use Whiteboards, you simply drag and drop your research objects onto the canvas. You can then connect them with edges to represent relationships between them. You can also add text, shapes, and images to your Whiteboards to further annotate your data. Whiteboards are a powerful way to interact with your data, making it easier to conduct qualitative data analysis and uncover hidden connections.
23244

24-
## Tech Stack
245+
Read more about the Whiteboards in our publication [Extending the Discourse Analysis Tool Suite with Whiteboards for Visual Qualitative Analysis](https://aclanthology.org/2024.lrec-main.615/).
25246

26-
![TechStack](assets/DATS_Arch-backend-techstack.drawio.png)
247+
### Reflection
27248

28-
## License
249+
DATS provides tools for reflection and documentation that are seamlessly integrated into your workflow, helping you to capture and organize your thoughts throughout the research process:
29250

30-
Apache 2.0 - See [license file](LICENSE) for details
251+
- Memos: Capture your thoughts and ideas as you work by attaching notes to documents, annotations, codes, and tags. This ensures that valuable insights are not lost and provides a rich record of your evolving interpretations.
252+
- Logbook: Summarize your findings and document your research process in a logbook. You could use it to track your progress, identify patterns in your analysis, or ensure the transparency and reproducibility of your research.
-283 KB
Binary file not shown.

assets/DATS_colour.png

-37.9 KB
Binary file not shown.

assets/banner.html

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
<!DOCTYPE html>
2+
<html lang="en">
3+
<head>
4+
<link rel="preconnect" href="https://fonts.googleapis.com" />
5+
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
6+
<link
7+
href="https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100..900;1,100..900&display=swap"
8+
rel="stylesheet"
9+
/>
10+
<meta charset="UTF-8" />
11+
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
12+
<title>Discourse Analysis Tool Suite</title>
13+
<style>
14+
body {
15+
display: flex;
16+
justify-content: center;
17+
align-items: center;
18+
height: 100vh;
19+
background-color: #111;
20+
font-family: "Roboto", serif;
21+
}
22+
23+
/* colors from https://mui.com/material-ui/customization/default-theme/ */
24+
.banner {
25+
display: flex;
26+
align-items: center;
27+
justify-content: center;
28+
width: 800px;
29+
height: 140px;
30+
border-radius: 20px;
31+
background: linear-gradient(to top right, #42a5f5, #7b1fa2);
32+
padding: 20px;
33+
color: white;
34+
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.2);
35+
}
36+
37+
.logo {
38+
width: 60px;
39+
height: 60px;
40+
position: relative;
41+
margin-right: 20px;
42+
}
43+
44+
.dot {
45+
position: absolute;
46+
background: white;
47+
border-radius: 50%;
48+
}
49+
50+
/* Different sized dots positioned randomly */
51+
.dot:nth-child(1) {
52+
width: 16px;
53+
height: 16px;
54+
top: 0px;
55+
left: 10px;
56+
}
57+
.dot:nth-child(2) {
58+
width: 12px;
59+
height: 12px;
60+
top: 16px;
61+
left: 30px;
62+
}
63+
.dot:nth-child(3) {
64+
width: 20px;
65+
height: 20px;
66+
top: 22px;
67+
left: 6px;
68+
}
69+
.dot:nth-child(4) {
70+
width: 14px;
71+
height: 14px;
72+
top: 34px;
73+
left: 36px;
74+
}
75+
.dot:nth-child(5) {
76+
width: 18px;
77+
height: 18px;
78+
top: 46px;
79+
left: 16px;
80+
}
81+
.dot:nth-child(6) {
82+
width: 10px;
83+
height: 10px;
84+
top: 46px;
85+
left: 0px;
86+
}
87+
.dot:nth-child(7) {
88+
width: 6px;
89+
height: 6px;
90+
top: 14px;
91+
left: 2px;
92+
}
93+
94+
.text {
95+
text-align: left;
96+
}
97+
98+
.title {
99+
font-size: 42px;
100+
font-weight: bold;
101+
}
102+
103+
.subtitle {
104+
font-size: 16px;
105+
opacity: 0.8;
106+
}
107+
</style>
108+
</head>
109+
<body>
110+
<div class="banner">
111+
<div class="logo">
112+
<div class="dot"></div>
113+
<div class="dot"></div>
114+
<div class="dot"></div>
115+
<div class="dot"></div>
116+
<div class="dot"></div>
117+
<div class="dot"></div>
118+
<div class="dot"></div>
119+
</div>
120+
<div class="text">
121+
<div class="title">Discourse Analysis Tool Suite</div>
122+
<div class="subtitle">Developed by LT</div>
123+
</div>
124+
</div>
125+
</body>
126+
</html>

assets/banner.png

124 KB

assets/feature-cota-timeline.png

259 KB

assets/feature-data-management.png

478 KB
97.1 KB

assets/feature-preprocessing.png

128 KB
533 KB
