After a thorough review of your article and some preliminary experiments with the 20newsgroup dataset, as suggested, a few points remain unclear to me. I hope you can clarify them so that my work is as precise and sound as possible:
Your paper gives the dataset sizes D_1 = 11,314 and D_2 = 7,532, which match the sizes of the standard 20newsgroup training and test splits. Am I correct that you used the training set as D_1 and the test set as D_2?
In the GitHub repository (https://github.com/Crisp-Unimib/ContrXT), under the directory tests/test_data, there are two DataFrames: df_time_1.csv with 8,486 entries, and df_time_2.csv with 4,533 entries. Could you please elaborate on how these specific subsets were generated from the original 20newsgroup data?
I am keen to understand which specific subsets of data were used to train the text classification model for generating the experimental results.
Your example on GitHub trains two independent models, one on D_1 and one on D_2. I am trying to understand whether this setup really captures the change in feature importance between the two training phases, t_1 and t_2. As I understand it, training on D_1 at t_1 and then using those weights as the starting point for training on D_2 at t_2 might better reflect how the features evolve. Could you share your view on this approach?
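To make the question concrete, here is a minimal sketch of the two setups I am comparing. It is not taken from ContrXT: the toy data, the plain-Python logistic regression, and the names `train_logreg`, `predict`, `w_t2_independent`, and `w_t2_warm` are all assumptions of mine, chosen only to show the difference between fitting the t_2 model from scratch and continuing from the t_1 weights.

```python
import math

def train_logreg(X, y, weights=None, epochs=500, lr=1.0):
    """Full-batch gradient descent for logistic regression.

    Passing `weights` continues training from those values (warm start);
    leaving it as None starts from zeros (independent model).
    """
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    """Predict 1 when the linear score is positive, else 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

# Hypothetical toy data: columns are [feature_0, feature_1, bias].
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
y_t1 = [1, 1, 0, 0]  # at t_1 the label follows feature_0
y_t2 = [0, 1, 0, 1]  # at t_2 the label follows feature_1

w_t1 = train_logreg(X, y_t1)                     # model for D_1
w_t2_independent = train_logreg(X, y_t2)         # D_2 model trained from scratch
w_t2_warm = train_logreg(X, y_t2, weights=w_t1)  # D_2 model continued from t_1

print("t_1 weights:            ", w_t1)
print("t_2 weights, independent:", w_t2_independent)
print("t_2 weights, warm start: ", w_t2_warm)
```

Both t_2 models fit the toy D_2 labels, but they start from different points, so their weights need not coincide after a finite number of epochs. My question is whether that difference matters for the explanations.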
The training data used in your models appears to retain metadata such as headers, footers, and quotes. The sklearn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_20newsgroups.html) recommends stripping these, for example with `remove=('headers', 'footers', 'quotes')`, so that the classifier does not overfit on them. Would more thorough data cleaning make the model's explanations more meaningful?
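For texts that are already stored in CSV files, the kind of cleaning I have in mind could look roughly like the sketch below. It is a simplified heuristic of my own, not sklearn's implementation and not something from the repository: `strip_header`, `strip_quotes`, and `clean` are hypothetical helpers, and footer removal is left out.

```python
import re

# Lines that look quoted ("> ...", "| ...") or that introduce a quote
# ("... wrote:", "... writes:"). This pattern is an assumption, not a standard.
QUOTE_RE = re.compile(r"^\s*(>|\|)|(writes:|wrote:)\s*$")

def strip_header(text):
    """Drop everything up to the first blank line, if there is one."""
    _before, sep, after = text.partition("\n\n")
    return after if sep else text

def strip_quotes(text):
    """Drop lines that match the quote heuristic above."""
    return "\n".join(
        line for line in text.split("\n") if not QUOTE_RE.search(line)
    )

def clean(text):
    """Apply header stripping first, then quote stripping."""
    return strip_quotes(strip_header(text))

raw = "From: a@b.c\nSubject: test\n\nJohn wrote:\n> old text\nnew text"
print(clean(raw))
```

On this toy message, only the line `new text` survives the cleaning.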
I am grateful for your pioneering work in this field and eagerly anticipate your guidance to refine my research further.