Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug for running tasks twice #1804

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

libre-man
Copy link

This fixes #1498 by keeping track of the namespaces of a task in an array, instead of just the last one.

@libre-man
Copy link
Author

libre-man commented Sep 12, 2024

I'm not entire sure what the test failure is here, can someone shed some light on that? Locally all tests pass.

It seems like a race condition, but running even master with the race detector on gives many errors.

@pd93
Copy link
Member

pd93 commented Sep 13, 2024

It's not clear to me what is being fixed here since #1498 is already resolved. Could you please provide an example that reproduces your specific issue. Please also read the discussion in #1655 as a decision was made to not change some behaviour on purpose.

@libre-man
Copy link
Author

I created this PR before that ticket was closed. The issue here is when we have a structure like this

FileA |
      Includes -> FileB
                | Includes -> FileC
                            | -> Has Task A
      Includes -> FileC
                | -> Has Task A

If now we call Task A through FileA -> FileB -> FileC and through FileA -> FileC we get Task A to run multiple times. This is especially and issue when Task A is not idempotent or not safe to run concurrently. A prime example of this is installing something with npm ci.

@pd93
Copy link
Member

pd93 commented Sep 14, 2024

Here is a reproduction of your issue as I understand it:

# Taskfile.yml
version: "3"

includes:
  b: ./b.yml
  c: ./c.yml

tasks:
  default:
    deps:
      - abc
      - ac

  abc:
    cmds:
      - task: b:c:task-a

  ac:
    cmds:
      - task: c:task-a
# b.yml
version: "3"

includes:
  c: ./c.yml
# c.yml
version: "3"

tasks:
  task-a:
    run: once
    sources:
      - ./input
    generates:
      - ./output
    cmds:
      - touch output
❯ task -d ./tmp/1804 abc
task: [b:c:task-a] touch output

❯ task -d ./tmp/1804 ac
task: [c:task-a] touch output          # Runs task again

This is currently working as intended (See discussion in #1655), although we have left the door open to discussion. We made the decision that we would not prevent these tasks from running since they have different calling paths. It is possible that the tasks can contain different data etc and it is safer to run a task twice than risk it not running when it should have.

This needs more discussion before it can be merged, but I have left a couple of comments. @vmaerten, @andreynering, do you have any additional thoughts beyond what has been previously discussed? Note that this implementation appears to follow option 3 as described in #1655 i.e. Whether the task is run or not is only affected when run: once is enabled. The behavior does not appear to change if this is omited.

@libre-man please feel free to speak up if you disagree with any of the comments made previously. Also the PR could probably do with a new test that replicates this issue to ensure that we don't get regressions in the future it this gets merged.

Comment on lines +40 to +51
func (sb *SyncBuffer) String() string {
sb.mu.Lock()
defer sb.mu.Unlock()
return sb.buf.String()
}

func (sb *SyncBuffer) Reset() {
sb.mu.Lock()
defer sb.mu.Unlock()
sb.buf.Reset()
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these are necessary. You can just call buff.buf.Reset() like you have done elsewhere. The only operation that needs a mutex is Write().

@@ -219,7 +221,7 @@ func (t *Task) DeepCopy() *Task {
Platforms: deepcopy.Slice(t.Platforms),
Location: t.Location.DeepCopy(),
Requires: t.Requires.DeepCopy(),
Namespace: t.Namespace,
Namespace: append([]string{}, t.Namespace...),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change isn't necessary.

Suggested change
Namespace: append([]string{}, t.Namespace...),
Namespace: t.Namespace,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is not an array, which would mean that a deep copy would need to deep copy the array too right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes sorry, you are right. We have deepcopy.Slice() for this.

Suggested change
Namespace: append([]string{}, t.Namespace...),
Namespace: deepcopy.Slice(t.Namespace),

@libre-man
Copy link
Author

libre-man commented Sep 14, 2024

The main issue is that run: once as I see it is used to say "I only want this test to be run once", which it currently doesn't do. I would even say that for run: when_changed the current behaviour of saying that different calling paths are different tasks don't make a lot of sense, however I might be wrong of course. The task, in my opinion, should care where it is called from, only which arguments it gets.

My use case is this: I have a file Taskfile.setup.yml that is imported by many other taskfiles. It is responsible for setting up stuff like Python, pnpm, smithy. Making all those scripts safe to be run concurrently is not really feasible, I would need to use file system locks and it would get way too complicated very quickly. With the current implementation there is no way around this, and it kinda breaks my encapsulation that every task is responsible for setting up their own dependencies.

If the change gets the OK I'll polish the PR with tests too, I'm waiting to do that until the OK so I don't waste any time on that if it won't get merged anyway.

One thing about your reproduction: it also runs the task twice if you would do task --parallel -d ./tmp/1804 abc ac.

@erictooth
Copy link

Just to add to the discussion, I'm running into what sounds like a similar (the same?) situation where some taskfiles get included by multiple other taskfiles, and despite them having run: once, the same task (pkg_a:build) runs multiple times in parallel:

pkg_b:pkg_a:build
pkg_c:pkg_b:pkg_a:build
pkg_d:pkg_c:pkg_b:pkg_a:build

The only workaround I've been able to think of is renaming the task pkg_a_build and then including the pkg_a taskfile with flatten: true. This causes the task to be properly deduplicated across the chained includes and then it runs only once.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

run: once not working for a task included multiple times
3 participants