Getting error while creating a client in a multithreaded use case: "Failed to create SFTP client: error receiving version packet from server: server unexpectedly closed connection: unexpected EOF" #576

Closed
aashakabra opened this issue Jan 24, 2024 · 8 comments

Comments

@aashakabra

Hello,

I create the SSH connection and close it in methods that are each called only once, but I create a new client for every new request.
The method that creates the new client is called in a goroutine by the engine. Do I need to synchronize client creation? If so, how? I am getting the error "error receiving version packet from server: server unexpectedly closed connection: unexpected EOF" while creating the client.

// create the connection
conn, err := ssh.Dial("tcp", addr, &config)
if err != nil {
	// return the error instead of discarding it (assumes the enclosing method returns an error)
	return fmt.Errorf("failed to dial: %w", err)
}
// close the connection
if s.conn != nil {
	if err := s.conn.Close(); err != nil {
		logCache.Infof("Error closing SSH client: %s", err.Error())
	}
}

Please note: I cannot use defer conn.Close() here, because start and stop are each called only once, when the engine starts and stops.

Between those two calls, upload requests arrive against the same connection, one call per request.
I create a client at this point:

sc, err := sftp.NewClient(conn)
if err != nil {
	// return here so the deferred sc.Close() never runs on a nil client
	return fmt.Errorf("failed to create SFTP client: %w", err)
}
defer sc.Close()

I get the error only when this method is called by multiple requests at once.

Can you please help me proceed so that client creation does not fail when multiple parallel requests come in from the engine?

@puellanivis
Collaborator

I’m not clear on why you are generating multiple clients.

The sftp.NewClient() should work the same as the ssh.Dial() for the duration of the life of the program. (Unless there is a disconnect event, but you should handle that case with a complete reconnect with a new ssh.Dial())

@aashakabra
Author

Thank you for the reply, @puellanivis.
In my initial code, I called ssh.Dial() and sftp.NewClient() just once, both in the Start method, and used that client in code (e.g. client.OpenFile()) that the engine calls multiple times, depending on the number of jobs spawned. If only one job is spawned, client.OpenFile() works fine, but when two parallel jobs are spawned I get a connection lost error.

It looks like some error caused the client to close the connection (the connection was not closed by my Stop method). Why exactly am I unable to use the same client for two parallel requests? Or do I need to handle this in some different way?

Let me also explain this as a use case:

I have a start method that calls ssh.Dial and NewClient()
I have a stop method where I close the client and the SSH connection

Case 1: I have one flow -> an SFTP download operation followed by an SFTP upload operation
This works fine.

Case 2: I have two flows

Flow1 has -> SFTP download followed by SFTP upload
Flow2 is an exact copy of Flow1

Start and stop are each called just once, but the two flows that use client.OpenFile are called in parallel, not one after another. In that case I get a Connection Lost error. Some error must have caused this connection to close (and not my stop method).

How can I handle this?

@puellanivis
Collaborator

Your Case 2 scenario should work just fine. I would, however, have to see some source code of a reproduction to know what you might be doing wrong.

As an example, #572 is opening 1000 files all at the same time for writing, and while it does hit a bug, that’s only because it calls the file close operation twice, and we’re addressing it.
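A reproduction of the kind asked for here could look roughly like the sketch below: a small harness that starts many uploads at once against one shared handle and collects any errors. Everything named in it is an assumption for illustration. The `uploader` interface, the in-memory `memUploader` fake, and `runParallel` do not come from this thread or from the sftp package; a real reproduction would back `Upload` with a shared `*sftp.Client` and a live server.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// uploader is a hypothetical interface. A real implementation would wrap
// one shared *sftp.Client and call OpenFile followed by io.Copy.
type uploader interface {
	Upload(name string, src io.Reader) (int64, error)
}

// memUploader is an in-memory fake so the sketch runs without a server.
type memUploader struct {
	mu    sync.Mutex
	files map[string][]byte
}

// Upload reads src fully and stores it under name.
func (m *memUploader) Upload(name string, src io.Reader) (int64, error) {
	data, err := io.ReadAll(src)
	if err != nil {
		return 0, err
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	m.files[name] = data
	return int64(len(data)), nil
}

// runParallel starts n uploads at once against the same uploader and
// returns every error that occurred.
func runParallel(u uploader, n int) []error {
	var wg sync.WaitGroup
	errs := make(chan error, n) // buffered so no goroutine blocks on send
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("file-%d.txt", i)
			src := bytes.NewReader([]byte("payload"))
			if _, err := u.Upload(name, src); err != nil {
				errs <- fmt.Errorf("%s: %w", name, err)
			}
		}(i)
	}
	wg.Wait()
	close(errs)

	var out []error
	for err := range errs {
		out = append(out, err)
	}
	return out
}

func main() {
	u := &memUploader{files: make(map[string][]byte)}
	errs := runParallel(u, 100)
	fmt.Println("errors:", len(errs), "files:", len(u.files))
}
```

The harness receives the handle as an interface value holding a pointer, so every goroutine talks to the same underlying object. That is the property a shared client relies on.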

@aashakabra
Author

aashakabra commented Jan 30, 2024

Hello @puellanivis, I wasn't opening the client with ClientOptions such as UseConcurrentReads and UseConcurrentWrites. But even after adding them, I still get the same error.

Let me share the snippet:

//start method

config := ssh.ClientConfig{
	User: s.User,
	Auth: []ssh.AuthMethod{
		ssh.PublicKeys(signer),
	},
	HostKeyCallback: ssh.InsecureIgnoreHostKey(),
}

addr := fmt.Sprintf("%s:%d", s.Host, s.Port)
conn, _ := ssh.Dial("tcp", addr, &config)
// defer conn.Close() is commented out on purpose; the close is moved to the Stop method, called when the engine shuts down

client, _ := sftp.NewClient(conn)
// defer client.Close() is commented out on purpose; the close is moved to the Stop method, called when the engine shuts down

//stop method

client.Close()
conn.Close()

// upload operation -> file-to-file transfer; the engine calls this method in parallel if I spawn more than one job

srcFile, _ := os.Open(localFileName)
defer srcFile.Close()

//file on sftp server
var dstFile *sftp.File
if overwrite {
	dstFile, err = client.OpenFile(remoteFileName, (os.O_TRUNC | os.O_CREATE | os.O_WRONLY))
} else {
	dstFile, err = client.OpenFile(remoteFileName, (os.O_APPEND | os.O_CREATE | os.O_WRONLY))
}
if err != nil {
	fmt.Println(err.Error())
}
defer dstFile.Close()

bytes, err := io.Copy(dstFile, srcFile)
if err != nil {
	fmt.Println(err.Error())
}
fmt.Println(bytes)

The error I get in the upload operation is: connection lost at client.OpenFile

The error when I shut down the engine and the stop method is called:
Error closing SFTP client : EOF

What am I doing incorrectly?

@puellanivis
Collaborator

The UseConcurrentReads and UseConcurrentWrites options were irrelevant to the bug in the linked issue, as they were only ever transferring 14 bytes at a time, well below the maximum size of an SFTP data packet.

I still can’t really say what is going wrong. I need to see more code, specifically the parallelizing code, particularly since you only see issues when you call it in parallel.

What server are you trying to talk to?

@aashakabra
Author

The parallel calling code is part of the engine, and I only get a callback from it. I do not have access to that code.

I am connecting to a JSCAPE SFTP server and a Rebex Tiny SFTP Server.

@puellanivis
Collaborator

I am unsure how I could proceed towards helping you without a grasp of how the parallel calling code works… there is nothing wrong with the code you have posted, and there is no reason why it should act the way you describe, unless it’s closing the client?

Although maybe, 🤔 does the same behavior persist if you try it against an openssh sftp server? Perhaps there is a subtle incompatibility of our packet encodings/decodings?

@aashakabra
Author

Thank you, @puellanivis, for your replies and explanations throughout this thread.

I did some more debugging by writing a Go test with goroutines and found that I was passing the *sftp.Client by value instead of by reference. This was breaking the flow in the multithreaded case.

Thank you for your time. Next time I will make a note of sharing the method signatures as well, to catch such mistakes earlier.
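The offending method signature is not shown in the thread, so the following is only a minimal illustration of the general pitfall, using a stand-in type. `fakeClient`, `byValue`, and `byPointer` are hypothetical names, not part of the sftp package. The sketch shows that a stateful client copied by value stops sharing its state with the original, while a pointer keeps every caller on the same state.

```go
package main

import "fmt"

// fakeClient is a hypothetical stand-in for a stateful client such as
// *sftp.Client; it is not the library's type.
type fakeClient struct {
	nextID int // per-connection request counter
}

// request hands out the next request ID and advances the counter.
func (c *fakeClient) request() int {
	c.nextID++
	return c.nextID
}

// byValue receives a copy, so the counter update is lost to the caller.
func byValue(c fakeClient) int { return c.request() }

// byPointer shares the caller's client, so the counter stays consistent.
func byPointer(c *fakeClient) int { return c.request() }

func main() {
	c := &fakeClient{}

	// Each call works on a fresh copy, so both see the same starting state.
	fmt.Println(byValue(*c), byValue(*c)) // prints "1 1"

	// Both calls work on the same client, so the state advances.
	fmt.Println(byPointer(c), byPointer(c)) // prints "1 2"
}
```

With a real client the diverging state would be things like in-flight request bookkeeping rather than a simple counter, but the cause is the same: pass the pointer through every layer instead of dereferencing it.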
