fix(sftp): make sure to delete last file when watch and delete_on_finish are enabled #3037

Open · wants to merge 5 commits into main from sftp-delete-last-file

Conversation

@ooesili (Contributor) commented on Nov 26, 2024:

Fixes #2435

Questions

I believe I have fixed the underlying issue, but I am not sure how to write an integration test to verify the fix. I have created a new integration test function with a TODO comment at the point where I got stuck. My questions are:

  • My plan was to start a pipeline with watch and delete_on_finish enabled, then use an SFTP client directly to inspect which files exist on the server and make sure they are all deleted after the pipeline runs. However, I'm not sure how to actually run the pipeline. Is this too specific a test to run using integration.StreamTests(), and if not, could you point me in the right direction?
  • The other pattern I've seen would be to call newSFTPReaderFromParsed() directly from the tests, then use Connect() and ReadBatch() to interact with the plugin. However, this plugin appears to be unusually structured in the way it progresses through the input files: Connect() finds the first file and sets up the scanner for it, and when the file is exhausted, ReadBatch() returns service.ErrNotConnected, which causes the engine to re-run Connect(), which advances to the next file. If the plugin only required Connect() to be called once, I would be happy to drive it directly in the tests, but because of the reconnection logic required, I was hesitant to reimplement the reconnection loop in the tests (see the sketch after this list). Is there a utility somewhere that I can use from a test that implements the reconnect logic?
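
For reference, a minimal sketch of what driving the input directly could look like, mirroring the engine loop described in the second bullet. The drainInput helper is hypothetical, not an existing utility, and the exact end-of-input signalling is an assumption.

```go
import (
	"context"
	"errors"
	"testing"

	"github.com/redpanda-data/benthos/v4/public/service"
	"github.com/stretchr/testify/require"
)

// drainInput drives a BatchInput the way the engine does: it keeps reading
// and re-runs Connect whenever the input reports service.ErrNotConnected.
func drainInput(ctx context.Context, t *testing.T, in service.BatchInput) []service.MessageBatch {
	t.Helper()
	require.NoError(t, in.Connect(ctx))
	var batches []service.MessageBatch
	for {
		batch, ack, err := in.ReadBatch(ctx)
		switch {
		case err == nil:
			batches = append(batches, batch)
			require.NoError(t, ack(ctx, nil))
		case errors.Is(err, service.ErrNotConnected):
			// The sftp input "disconnects" between files, so reconnect and
			// move on; stop once the input signals that nothing is left
			// (treating ErrEndOfInput from Connect as that signal is an
			// assumption for this sketch).
			if cErr := in.Connect(ctx); errors.Is(cErr, service.ErrEndOfInput) {
				return batches
			} else {
				require.NoError(t, cErr)
			}
		case errors.Is(err, service.ErrEndOfInput):
			return batches
		default:
			t.Fatalf("ReadBatch: %v", err)
		}
	}
}
```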

@ooesili added the bug and inputs labels on Nov 26, 2024
@ooesili self-assigned this on Nov 26, 2024
@ooesili force-pushed the sftp-delete-last-file branch from 83668cd to 2e47f2a on November 26, 2024 19:50

@rockwotj (Collaborator) commented:

I don't think there is a utility so either you need to do option 1 or implement the retry logic - which I don't think should be too bad?

Here's the code that drives this in benthos AFAIK: https://github.com/redpanda-data/benthos/blob/dad70374cd8fb323f0c7f47452498ea94c2ed7aa/internal/component/input/async_reader.go#L115

The pipeline option (number 1) might be the best route, but I'm not too familiar with that test helper myself.

This commit reduces the scope of critical sections guarded by scannerMut to remove a deadlock that causes the last file to not be deleted when the SFTP input is used with watching enabled.

`(*watcherPathProvider).Next()` currently uses recursion to loop until a path is found. This commit refactors that function to use a for loop instead, which is more straightforward to read.
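
A generic illustration of that refactor; the signature and the tryNextPath/waitForNextPoll helpers below are placeholders rather than the real watcherPathProvider code:

```go
// Before (simplified): recurse until a path turns up.
//
//	func (w *watcherPathProvider) Next(ctx context.Context) (string, error) {
//		if path, ok := w.tryNextPath(ctx); ok {
//			return path, nil
//		}
//		if err := w.waitForNextPoll(ctx); err != nil {
//			return "", err
//		}
//		return w.Next(ctx) // tail recursion
//	}

// After (simplified): the same logic as a flat loop.
func (w *watcherPathProvider) Next(ctx context.Context) (string, error) {
	for {
		if path, ok := w.tryNextPath(ctx); ok {
			return path, nil
		}
		if err := w.waitForNextPoll(ctx); err != nil {
			return "", err
		}
	}
}
```
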
This integration test makes sure that when `delete_on_finish` is true and watching is enabled, every file is deleted.

@ooesili force-pushed the sftp-delete-last-file branch from 1bbf6fa to ab133f4 on December 3, 2024 21:34
@ooesili marked this pull request as ready for review on December 3, 2024 22:01

Comment on lines +155 to +156
builder := service.NewStreamBuilder()
require.NoError(t, builder.SetYAML(config))

@ooesili (Contributor, Author) replied, quoting @rockwotj:

> I don't think there is a utility so either you need to do option 1 or implement the retry logic - which I don't think should be too bad?
>
> Here's the code that drives this in benthos AFAIK: https://github.com/redpanda-data/benthos/blob/dad70374cd8fb323f0c7f47452498ea94c2ed7aa/internal/component/input/async_reader.go#L115
>
> The pipeline option (number 1) might be the best route, but I'm not too familiar with that test helper myself.

Knowledge from the great and powerful @mihaitodor
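
For context, a rough sketch of how that StreamBuilder-based test can run the pipeline and then assert against the server. It assumes `config` is a YAML string defining only the sftp input under test, and `listRemoteFiles` is a hypothetical helper that lists the files on the test SFTP server:

```go
builder := service.NewStreamBuilder()
require.NoError(t, builder.SetYAML(config))
// Drain the pipeline's output; the assertion below only looks at which
// files remain on the server.
require.NoError(t, builder.AddConsumerFunc(func(_ context.Context, _ *service.Message) error {
	return nil
}))

stream, err := builder.Build()
require.NoError(t, err)

ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
defer cancel()
go func() { _ = stream.Run(ctx) }()

// With watch and delete_on_finish enabled, every uploaded file should
// eventually be removed, including the last one.
require.Eventually(t, func() bool {
	return len(listRemoteFiles(t)) == 0
}, time.Minute, 100*time.Millisecond)

require.NoError(t, stream.StopWithin(10*time.Second))
```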

A collaborator replied:

Great tests thanks :)

// path from the cache then we skip that path because the
// watcher will eventually poll again, and the cache.Get
// operation will re-run.
if v, err := cache.Get(ctx, path); errors.Is(err, service.ErrKeyNotFound) || (!w.followUpPoll && string(v) == "!") {

A collaborator commented:

nit: separately it would be nice to use constants for the pending symbol and other cache values :)
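
For example, a named constant for the sentinel (the name here is made up; only the "!" value appears in the code above):

```go
// cacheValuePending is the pending marker currently written as a bare "!".
const cacheValuePending = "!"
```

The check above would then compare against cacheValuePending instead of the literal "!".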

	s.scannerMut.Lock()
	defer s.scannerMut.Unlock()

	if s.scanner != nil {
-		return nil
+		skip = true

@rockwotj (Collaborator) commented on Dec 13, 2024:

Why do we skip if there is a scanner (and what are we skipping)? Can you add a comment?

outErr = fmt.Errorf("remove %v: %w", nextPath, outErr)
}
}
s.scannerMut.Unlock()

A collaborator commented:

Since we always return after this block, we could defer this, right? I just get worried if the unlock is not right after the lock.
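
Generically, the suggested shape looks like this; closeScanner is a hypothetical method used only to illustrate pairing the lock with a deferred unlock so every return path releases it:

```go
func (s *sftpReader) closeScanner(ctx context.Context) error {
	s.scannerMut.Lock()
	defer s.scannerMut.Unlock()

	if s.scanner == nil {
		return nil
	}
	err := s.scanner.Close(ctx)
	s.scanner = nil
	return err
}
```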


details := service.NewScannerSourceDetails()
details.SetName(nextPath)
if s.scanner, err = s.scannerCtor.Create(file, func(ctx context.Context, aErr error) (outErr error) {

A collaborator commented:

I think the assignment of s.scanner needs to be under the mutex as well based on the ReadBatch function right?
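
A sketch of that suggestion, constructing the scanner first and only publishing it under the lock (ackFn stands in for the anonymous ack closure in the real call):

```go
scanner, err := s.scannerCtor.Create(file, ackFn, details)
if err != nil {
	return err
}

s.scannerMut.Lock()
s.scanner = scanner
s.scannerMut.Unlock()
```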

@@ -242,11 +241,22 @@ func (s *sftpReader) seekNextPath(ctx context.Context) (file *sftp.File, nextPat
s.pathProvider = s.getFilePathProvider(ctx)
}

return s.client, s.pathProvider, false, nil

A collaborator commented:

As a newcomer to this code, this feels unsafe.

If the only important part of this code is that all these variables are accessed and set together atomically, I do wonder if an atomic is better suited. You can use Swap to set the new value and destroy it in Close, Store in Connect, and Load in ReadBatch. I don't quite understand the higher-level contract here, i.e. why it's only required that these accesses are individually atomic and we don't have to worry about Close clobbering something ongoing in Connect or ReadBatch.

@rockwotj (Collaborator) commented on Dec 13, 2024:

And when I talk about using atomics, I mean using `typed.AtomicValue` from our `internal/typed` package and wrapping all this state into a struct so it becomes `typed.AtomicValue[*sftpReaderState]`.
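
A rough sketch of that shape, using the standard library's atomic.Pointer as a stand-in for typed.AtomicValue; the field names and the pathProvider placeholder are illustrative, not the real code:

```go
import (
	"context"
	"sync/atomic"

	"github.com/pkg/sftp"
)

// pathProvider is a placeholder for the input's existing path provider type.
type pathProvider interface {
	Next(ctx context.Context) (string, error)
}

// sftpReaderState bundles everything that must be swapped as one unit.
type sftpReaderState struct {
	client       *sftp.Client
	pathProvider pathProvider
	// scanner, current path, etc. would also live here.
}

type sftpReader struct {
	state atomic.Pointer[sftpReaderState]
	// ... static config fields ...
}

func (s *sftpReader) Close(ctx context.Context) error {
	// Swap the state out atomically: concurrent Connect/ReadBatch callers see
	// either the old, still-valid state or nil, never a half-torn-down one.
	if st := s.state.Swap(nil); st != nil && st.client != nil {
		return st.client.Close()
	}
	return nil
}
```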

@mihaitodor mentioned this pull request on Dec 17, 2024

Successfully merging this pull request may close: SFTP input - last file not deleted (#2435).