On the use of LLMs (e.g. ChatGPT) in a JOSS submission #1297
Comments
I think that asking reviewers to spend their time reviewing code or documentation that the authors didn't even take the time to write would be a really bad thing for JOSS. I don't want JOSS to become a source of free labor where someone throws together a package carelessly with an LLM and reviewers are then asked to pick through and fix all the bugs the LLM created. We can't have a policy like "no LLMs" because it would be unenforceable, but I think reviewers should have the ability to signal to the editor that "I think most of this is just LLM junk and I don't want to review it," and for that to terminate the review. On the author side, I think requiring disclosure of LLM-generated docs and code would be a really good idea, so that reviewers know what they are reviewing.
I think the problem here is that some authors use LLMs to improve text, such as for translation, smoothing, joining, etc. I don't think we should disallow this. I'm less sure about code. Maybe we should just reemphasize/reinforce that authors are responsible for everything they submit, no matter what tools were used to produce it.
Yes, I think that the use of LLMs to improve text, specifically in the case of translation, is completely fine, and I agree we shouldn't disallow it. I do think a disclosure statement would be good: if an author uses an LLM for translation, I would hope that our reviewers and editors would understand that in context as a legitimate need, and in that case (if the target language was English and the source language was not) I would want to spend extra time helping with the text. If the text feels odd without a disclosed reason, I would not want that to become a source of suspicion for the reviewer. So in both cases I think disclosure, without an unenforceable and undesirable blanket ban, would be a good idea.
A challenge is which LLMs we ask people to disclose. The text suggestions in Google Docs and on phones come from language models too, for example. I lean towards not asking what tools people used, as this seems to me to be a bottomless pit I don't really want to jump into.
I see your point, but I also think there is a qualitative distinction to draw between predictive text in general and the kinds of tools now being thrust into the world that generate masses of plausible-seeming text, specifically in their capacity to create meaningful labor imbalances that exploit the good-will nature of JOSS's review system. The revival of this issue was prompted by (being vague because I'm not sure what's public) the attempted use of a software artifact that does not exist and never did, so reviewers are being asked to spend time evaluating code that the authors didn't spend the time to write. This turns JOSS into a free labor farm for correcting LLM-generated bugs instead of a system for improving the health of FOSS. I appreciate the thought of falling back on holding authors accountable for whatever they submit, because ideally that would be enough, but without a specific policy I think we might be missing an adversarial case that could meaningfully affect the operation of the journal.
In my opinion: for well-intentioned users, which tools they used shouldn't be an issue, and I don't want to potentially denigrate people who use LLMs for language over those who don't, as I think this would hurt non-English speakers (writers). For the adversarial case, I don't think that asking authors to volunteer their use of LLMs will work, as I suspect anyone adversarial will simply not disclose it. I think, though I am less sure, that we will see more code written with the help of LLMs over time, and I don't see this as problematic as long as it's an aid and the code is both understood and tested by the author. So, while I understand the case you raise and agree that we want to avoid it and similar ones in the future, I don't see a way to do so that will both succeed and not harm those who are behaving well.
The expectation is that well-intentioned people will describe what they did so that the reviewers are aware of what they are being asked to evaluate. So hopefully our reviewers, when seeing a disclosure that says LLMs were used to help with translation, see that as not a problem and also as something they could help with in their review. In the case that folks (for whatever reason) lie about their use of LLMs, the reviewers then have something to point to and say, "You say this code was not LLM-generated, and yet this has few other explanations than being LLM-generated code; please explain." In any case we need a policy here, because a lack of one both threatens to poison the reviewer pool and makes us unresponsive as a journal to a pretty important development in FOSS, regardless of whether we are "for it" or "against it."
Let's see what other people think/say. I have given my opinion, but I'm OK with being overruled by others.
I agree with everything @sneakers-the-rat has been saying. And I should add that the policy should not just consist of disclosure requirements for authors, but rather something that enables editors and reviewers to expeditiously challenge or walk away from submissions that have apparently been submitted in bad faith without such disclosure, or perhaps even in good faith but without the level of scholarly effort needed to understand and verify the text/code being submitted.

This second part is very important because I have personally spoken with the editor-in-chief of two established (30- to 50-year-old) scholarly journals that have effectively been destroyed by a flood of low-quality submissions. Coinciding with the availability of LLMs such as ChatGPT, the journals saw a tremendous increase in submissions, most of which did not stand up to close scrutiny. In some cases the problems were obvious, such as nonsensical text and fabricated references, but in other cases the deficiencies required more careful examination to expose. Regardless, the sheer number of submissions overwhelmed the editorial apparatus of the journals. The editors did not have enough time to check all the papers carefully enough to determine whether they should be desk-rejected, nor to find reviewers for all the papers that passed their inadequate checks. Of the papers that did get assigned to reviewers, a higher proportion were low-quality submissions that the reviewers ended up (sometimes angrily) rejecting. The overall quality of published papers was therefore lowered, and everyone involved in the journals, from editors to reviewers to good-faith authors to subscribers, was in some way overwhelmed or let down. Finding no support from the publisher – one of the big names in scholarly publishing – the editor-in-chief ended up resigning, along with the editorial assistant and nearly all the associate editors and editorial advisory board members.

I would not want anything like this to happen to JOSS, and so would argue that we should be prepared to deal efficiently with negligent or bad-faith submissions on a large scale. Though I don't think the journals I mentioned above had any policy concerning the use of LLMs, a policy that merely required disclosure would not have helped. JOSS's more collegial publishing model is particularly vulnerable to abuse, since we see reviewing as a process to collaboratively improve submissions. But this process needs to begin with a submission of a certain minimum level of quality! People who submit LLM-generated material to us, and who are themselves not willing or able to check that it meets this minimum level of quality, should not expect our editors and reviewers to do so for them, let alone help bring the material up to publication quality. Since reviewers are already pretty much free to refuse or abandon reviews, our policy should probably focus on what measures editors can take to avoid assigning problematic submissions to reviewers in the first place.
I wrote this earlier on Slack:
It's true that there is no sharp line between grammar-checking and generating paragraphs of text. The pattern of human perception and behavior is that even when people say they have proofread/edited paragraphs of generated text, it is still likely to contain errors and false implications. Elsevier has papers published months ago with the text "I'm very sorry, [...], as I am an AI language model" that have been publicly called out, yet remain unedited and unretracted. (I'm delighted that such "papers" are uncitable in Science, and suggest they apply that standard to the entire journal.)

As a reviewer thinking I'm providing feedback to a person, I would feel violated to learn I was reviewing generated text. The same applies if I were to learn that as a reader, and it would cast doubt on the journal's integrity and practices that such a thing could pass review. The intent and value of a publication has to be more than the bean it adds to someone's count. I think grammar-checking is fine, as well as narrow use in translation, but writing a paper in one language and bulk-translating it presents a range of problematic second-order effects, as @oliviaguest noted on Slack (besides the increased chance of factual errors).

LMs promise a shortcut to forming a coherent mental model and communicating it. That is very much the incentive for plagiarism, but there is a perception that LM-generated content is somehow victimless while plagiarism is theft from the original author. I would stipulate that they are of the same cloth, and that both also subject the reader without consent. If a human took verbatim text/code from one or more sources and applied token-level obfuscation (changing variable names, loop structures, synonym substitution, etc.), they would still be plagiarizing and the resulting code would be considered a derivative work. (It is hard to prove this without historical records, thus courts will scrutinize instances in which humans were sloppy.) LMs automate this obfuscation while shredding the records (which would be evidence of intent in a human system) and promising plausible deniability. Clean-room design is the way to ensure clean IP, but LMs do not and cannot work this way. I consider it poisoning the fabric of society when LM boosters attempt to reduce human cognition to token manipulation or claim that the distinction doesn't matter.

TL;DR: We need to consider the broader ethical and social context as well as JOSS's reputation, not just current law and a blanket statement. Specific affirmations sort out the accidentally sloppy from the malicious.
Just a thought or two: this seems very important for us to get ahead of, but it is also different from the typical examples being considered, since as a journal we are not as focused on written text, though we still include and review some. Do we need different rules for text and code, or can they be treated the same? That is, what might be the guideline for using ChatGPT to assist in checking one's writing vs. GitHub Copilot assisting in writing one's code? Also, regarding the example given by @logological: given that we work with codebases that have at least some history (and often a long one), wouldn't we be less likely to receive random AI-generated repositories? Or do you think that is the potential future of the submissions we will see?
I obviously cannot know whether we will get random, obviously silly codebases created by AI tools (which are likely plagiarised, or even stolen, or at minimum not crediting the authors) that we would likely desk-reject for reasons of quality (as well as history, as you hint at: no sensible commit history)... but you raise a very good question. In my teaching, with student work such as essays and other situations where such tools can be used to cheat (I am teaching them to write an essay, and they are not writing an essay), my experience has been that it is glaringly obvious and that they freely admit to it. I am sharing this not to argue for a specific outcome or action, just so others know what (fellow) educators have been seeing.
I recently encountered documentation in a JOSS submission that was clearly written by a large language model (LLM). I wondered what JOSS's policy is on the use of LLMs, on authorship, etc.
Personally, I'm not against the use of LLMs in writing either code or documentation. I think we can assume that many submissions (will eventually) make use of LLMs such as GitHub Copilot. This will not always be clear or detectable. But could we request that authors state in the paper that an LLM has been used, and for which purpose?
Policies on LLMs at Nature and Science:
- Nature
- Science