NTU NLP Lab

49 repositories

Induct-Learn
Public
Jupyter Notebook
•0•0•0•0•Updated Oct 2, 2024
ConvLogRecaller-dataset
Public
ConvLogRecaller-dataset
0•0•0•0•Updated Apr 12, 2024
Self-ICL
Public
Python
•1•2•1•0•Updated Dec 3, 2023
FECS
Public
Python
•0•6•0•0•Updated Nov 28, 2023
ZARA
Public
0•1•1•0•Updated Nov 5, 2023
contrastive-debate-representation
Public
Contrastively learning participant representations per round in thread-based debates.
Python
•0•2•0•0•Updated Oct 25, 2023
ContributionSum
Public
The ContributionSum Dataset
GNU General Public License v3.0
•0•1•0•0•Updated Aug 14, 2023
Citation-Intent-Classification-Evidence-Extraction-
Public
Citation Intent Classification and Its Supporting Evidence Extraction for Citation Graph Construction
0•1•1•0•Updated Aug 13, 2023
AMDRD
Public
Analysis Model of Discourse Relations within a Document (AMDRD)
Python
•
GNU General Public License v3.0
•1•2•0•0•Updated Aug 11, 2023
LifeEventDialog
Public
Life Event Dialog contains fine-grained personal life event annotations on DailyDialog.
0•8•0•0•Updated May 2, 2023
NTUNLP-ImageGallery
Public
Provides images for the NTU AI Center shared platform.
Apache License 2.0
•0•0•0•0•Updated Apr 29, 2023
traditional-chinese-alpaca
Public
A Traditional-Chinese instruction-following model with datasets based on Alpaca.
Python
•
Apache License 2.0
•18•133•2•0•Updated Mar 28, 2023
C2RC2
Public
Categorizing Citation Relations in Scientific Papers Based on the Contributions of Cited Papers
MIT License
•0•2•0•0•Updated Nov 21, 2022
Lifelog-SentiLiveKB-Dataset
Public
0•0•0•0•Updated Nov 14, 2022
Lifelog-FPImgCapHRC
Public
0•0•0•0•Updated Nov 3, 2022
tw-eH
Public
Learning to Generate Explanation from e-Hospital Services for Medical Suggestion
Python
•
MIT License
•0•3•0•0•Updated Nov 3, 2022
Lifelog-PKBQAC2.0-Dataset
Public
0•0•0•0•Updated Oct 17, 2022
ICDA
Public
Interactive Clinical Diagnostic Assistant for Medical Interview
Python
•0•2•0•0•Updated Sep 7, 2022
SEEN
Public
SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
natural-language-processing language-model personal-knowledge-base lifelog graph-neural-networks information-recall
Python
•
MIT License
•0•3•0•0•Updated Aug 21, 2022
PRRCA
Public
Peer Review and Rebuttal Counter-Arguments Dataset
0•1•0•0•Updated Aug 13, 2022
NTUSD
Public
Sentiment words are used to compute the sentiment tendency of a sentence, and then of a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from poor coverage. A method to learn sentiment words and their strengths from multiple resources is developed i…
MIT License
•2•7•0•0•Updated Jun 20, 2022
Dialogue-MPDD
Public
A dialogue dataset is an indispensable resource for building a dialogue system. Additional information like emotions and interpersonal relationships labeled on conversations enables the system to capture the emotion flow of the participants in the dialogue. However, there is no publicly available Chinese dialogue dataset with emotion and relatio…
0•4•0•0•Updated Jun 1, 2022
WSD-MSD-1030
Public
A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluations of sense embeddings.
0•2•0•0•Updated Jun 1, 2022
Finance-FinNum
Public
Numerals are a crucial part of financial documents. To understand the details of opinions in financial documents, we should not only analyze the text but also examine the numeric information in depth. Because of its informal writing style, social media data is more challenging to analyze than news and official documents. …
1•3•0•0•Updated Jun 1, 2022
Finance-FinProLex
Public
FinProLex provides expert-like scores for 5,162 tokens drawn from professional analysts' reports and financial social media platform posts. The expert-like scores are calculated from pointwise mutual information (PMI).
1•4•0•0•Updated Jun 1, 2022
Finance-Numeracy-600K
Public
Numerals are a crucial part of narratives, especially in financial documents. We should not only analyze the text but also examine the numeric information in depth. Numeracy-600K is a dataset for testing the numeracy of machines.
1•3•0•0•Updated Jun 1, 2022
Finance-NTUSD-Fin
Public
NTUSD-Fin provides various scoring methods for tokens, including frequency, CFIDF, chi-squared value, market sentiment score, and word vector. Only tokens that appear at least ten times and show a significant difference between expected and observed frequency under the chi-squared test remain in our dictionary. The predetermined significanc…
2•4•1•0•Updated Jun 1, 2022
NTU-English-Tense-Predictor
Public
A rule-based English tense predictor based on the output of a dependency parser such as Stanford CoreNLP.
Python
•0•2•0•0•Updated Jun 1, 2022
Chinese-Word-Ordering-Errors-Detection-and-Correction-Corpus
Public
Word Ordering Errors (WOEs) are the most frequent type of sentence-level grammatical error among non-native learners of Chinese. Learners taking Chinese as a foreign language often place character(s) in the wrong places in sentences, which results in wrong word(s) or ungrammatical sentences. Besides, there are no clear word boundaries…
0•4•0•0•Updated Jun 1, 2022
NTU-Irony-Corpus
Public
The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase, (2) context, and (3) rhetoric element.
0•4•0•0•Updated Jun 1, 2022