    Repositories list

    • Jupyter Notebook
      0000 Updated Oct 2, 2024
    • ConvLogRecaller-dataset
      0000 Updated Apr 12, 2024
    • Self-ICL

      Public
      Python
      1210 Updated Dec 3, 2023
    • FECS

      Public
      Python
      0600 Updated Nov 28, 2023
    • ZARA

      Public
      0110 Updated Nov 5, 2023
    • Contrastively learning participant representations per round in thread-based debates.
      Python
      0200 Updated Oct 25, 2023
    • The ContributionSum Dataset
      GNU General Public License v3.0
      0100 Updated Aug 14, 2023
    • Citation Intent Classification and Its Supporting Evidence Extraction for Citation Graph Construction
      0110 Updated Aug 13, 2023
    • AMDRD

      Public
      Analysis Model of Discourse Relations within a Document (AMDRD)
      Python
      GNU General Public License v3.0
      1200 Updated Aug 11, 2023
    • Life Event Dialog contains fine-grained personal life event annotations on DailyDialog.
      0800 Updated May 2, 2023
    • Provides images for the NTU AI Center shared platform.
      Apache License 2.0
      0000 Updated Apr 29, 2023
    • A Traditional-Chinese instruction-following model with datasets based on Alpaca.
      Python
      Apache License 2.0
      1813320 Updated Mar 28, 2023
    • C2RC2

      Public
      Categorizing Citation Relations in Scientific Papers Based on the Contributions of Cited Papers
      MIT License
      0200 Updated Nov 21, 2022
    • 0000 Updated Nov 14, 2022
    • 0000 Updated Nov 3, 2022
    • tw-eH

      Public
      Learning to Generate Explanation from e-Hospital Services for Medical Suggestion
      Python
      MIT License
      0300 Updated Nov 3, 2022
    • 0000 Updated Oct 17, 2022
    • ICDA

      Public
      Interactive Clinical Diagnostic Assistant for Medical Interview
      Python
      0200 Updated Sep 7, 2022
    • SEEN

      Public
      SEEN: Structured Event Enhancement Network for Explainable Need Detection of Information Recall Assistance
      Python
      MIT License
      0300 Updated Aug 21, 2022
    • PRRCA

      Public
      Peer Review and Rebuttal Counter-Arguments Dataset
      0100 Updated Aug 13, 2022
    • NTUSD

      Public
      Sentiment words are employed to compute the tendency of a sentence, and then a document. To detect sentiment words in Chinese documents, a Chinese sentiment dictionary is indispensable. However, a small dictionary may suffer from the problem of coverage. A method to learn sentiment words and their strengths from multiple resources is developed i…
      MIT License
      2700 Updated Jun 20, 2022
    • A dialogue dataset is an indispensable resource for building a dialogue system. Additional information like emotions and interpersonal relationships labeled on conversations enables the system to capture the emotion flow of the participants in the dialogue. However, there is no publicly available Chinese dialogue dataset with emotion and relatio…
      0400 Updated Jun 1, 2022
    • A word similarity dataset with a high proportion of multi-sense words, designed to facilitate more reliable evaluation of sense embeddings.
      0200 Updated Jun 1, 2022
    • Numerals are a crucial part of financial documents. To understand the details of opinions in financial documents, we should not only analyze the text but also examine the numeric information in depth. Because of the informal writing style, analyzing social media data is more challenging than analyzing news and official documents. …
      1300 Updated Jun 1, 2022
    • FinProLex provides 5,162 tokens from professional analysts' reports and financial social media posts, each with an expert-like score. The expert-like scores are calculated from pointwise mutual information (PMI).
      1400 Updated Jun 1, 2022
    • Numerals are a crucial part of narratives, especially in financial documents. We should not only analyze the text but also examine the numeric information in depth. Numeracy-600K is a dataset for testing the numeracy of machines.
      1300 Updated Jun 1, 2022
    • NTUSD-Fin provides various scoring methods for the tokens, including frequency, CFIDF, chi-squared value, market sentiment score, and word vector. Only tokens that appear at least ten times and show a significant difference between expected and observed frequency under a chi-squared test are retained in our dictionary. The predetermined significanc…
      2410 Updated Jun 1, 2022
    • A rule-based English tense predictor that works on the output of a dependency parser such as Stanford CoreNLP.
      Python
      0200 Updated Jun 1, 2022
    • Word Ordering Errors (WOEs) are the most frequent type of sentence-level grammatical error among non-native Chinese language learners. Learners taking Chinese as a foreign language often place character(s) in the wrong positions in a sentence, which results in wrong word(s) or ungrammatical sentences. Besides, there are no clear word boundaries…
      0400 Updated Jun 1, 2022
    • The NTU Irony Corpus consists of more than 1,000 microblog messages collected from the Plurk website. All the messages in the corpus are in Traditional Chinese and have been confirmed to be ironic. They are marked with three types of labels: (1) ironic word/phrase, (2) context, and (3) rhetoric element.
      0400 Updated Jun 1, 2022