Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random SFTP connection attempts gets stuck #329

Closed
alexndr79 opened this issue May 23, 2023 · 20 comments
Closed

Random SFTP connection attempts gets stuck #329

alexndr79 opened this issue May 23, 2023 · 20 comments

Comments

@alexndr79
Copy link

alexndr79 commented May 23, 2023

We're seeing random SFTP connection attempts that get stuck forever at:

"ftp-download-task-9" - Thread t@80
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Object.wait(Native Method)
        - waiting on <759b6fec> (a com.jcraft.jsch.Channel$MyPipedInputStream)
        at java.io.PipedInputStream.read(PipedInputStream.java:326)
        at java.io.PipedInputStream.read(PipedInputStream.java:377)
        at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2863)
        at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2891)
        at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:233)
        at com.jcraft.jsch.Channel.connect(Channel.java:158)
     .....

We've set the session timeout (using session.setTimeout(60000)) and we use timeout when opening the connection ( session.connect(60000)). We're opening several parallell connections to same server, different sessions, if that can add some value to understanding the issue.

We've looked on this similar issue #54 but we don't have anything going on in the session thread so it feels unrelated:

"Connect thread sftp.company.com session" - Thread t@50080
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at com.jcraft.jsch.IO.getByte(IO.java:93)
        at com.jcraft.jsch.Session.read(Session.java:1074)
        at com.jcraft.jsch.Session.run(Session.java:1643)
        at com.jcraft.jsch.Session$$Lambda$110/1385229225.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:750)

   Locked ownable synchronizers:
        - None

We're using latest release 0.2.8. Problem appears a couple of times every month for a job scheduled to run every minute 24-7.

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

  1. Are you creating multiple SFTP channels from a single session when making parallel connections to the same server, or do you create a different session for each parallel connection to the same server?
  2. For the multiple parallel connections to the same server, are you driving the SFTP transfers all from within the same thread, or do you use a different thread to drive each connection?

Thanks,
Jeremy

@alexndr79
Copy link
Author

Hi @norrisjeremy!

Thanks for getting involved.

  1. New session for each new channel.
  2. Different thread for each channel (and session as defined by 1.)

/Alex

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

Are you calling session.setTimeout() before or after you call session.connect()?

Thanks,
Jeremy

@alexndr79
Copy link
Author

alexndr79 commented May 23, 2023

We're setting session.setTimeout(60000) after session.connect(60000).

/Alex

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

Do you see any Java thread besides the ftp-download-task-X thread that is stuck on any sort of TIMED_WAITING state?

Also, would it be possible for you to share some code snippets of depicting how you are creating sessions and interacting with the SFTP channels?

Also, when these hangs occur, can you capture output of netstat -tn command to see how much data is in the TCP send/receive queues for the SFTP sessions that are hanging?

Thanks,
Jeremy

@alexndr79
Copy link
Author

Hi @norrisjeremy !

I will gladly pull what I can the beginning of next week. Thanks for your support.

.. one additional piece of information that could be useful, we're using a quite old JRE, OpenJDK8 jre8u345.

/A

/Alex

@alexndr79
Copy link
Author

alexndr79 commented May 29, 2023

No, other threads stuck in TIMED_WAITING state which is not obviously related something else.

Here are some snippets describing our usage.

We have wrapped session/channel creation in an FtpConnection- class (presented here somewhat as pseudo- code, I have removed some parts to make everything a little more clear).

This class is instantiated for every FTP connection attempt, and we can potentially do several of these to the same server concurrently (different ftp user though). And I earlier mentioned that we use 60s timeout but seems to be 30s.

public class FtpConnection implements Closeable {

   ...

    private Session session;
    private final ChannelSftp channel;
    private FtpConfiguration ftpConfiguration;
    private final List<ChannelSftp.LsEntry> lsEntries = new ArrayList<>();

    protected static final int CONNECT_TIMEOUT_MS = 30000;
    protected static final int SOCKET_TIMEOUT_MS = 30000;
    protected static final int CHANNEL_TIMEOUT_MS = 30000; 

   protected FtpConnection(FtpConfiguration ftpConfiguration) throws JSchException, SftpException {
       this.ftpConfiguration = ftpConfiguration;
       this.session = openSession(ftpConfiguration);
       this.channel = openChannel(session);
       
       channel.cd("/" + ftpConfiguration.getDirectory());
   }

   private ChannelSftp openChannel(Session session) throws JSchException {
        Channel channel = session.openChannel(CHANNEL_TYPE);
        channel.connect(CHANNEL_TIMEOUT_MS); //  <-- This is where the connect randomly gets stuck. 
        return (ChannelSftp) channel;
   }

   private Session openSession(FtpConfiguration ftpConfiguration) throws JSchException {
        JSch ssh = new JSch();
        Session session = ssh.getSession(ftpConfiguration.getUsername(), ftpConfiguration.getHostname(), ftpConfiguration.getPort());
        session.setPassword(ftpConfiguration.getPassword());
        session.setConfig(CONFIG_HOST_KEY, CONFIG_HOST_KEY_VAL);

        session.setConfig(SERVER_HOST_KEY, session.getConfig(SERVER_HOST_KEY) + "," + SSH_RSA);
        session.setConfig(PUBKEY_ACCEPTED_ALGORITHMS, session.getConfig(PUBKEY_ACCEPTED_ALGORITHMS) + "," + SSH_RSA);

        session.setServerAliveInterval(SOCKET_TIMEOUT_MS);
        session.connect(CONNECT_TIMEOUT_MS);
        session.setTimeout(SOCKET_TIMEOUT_MS);
        return session;
   }


   @Override
   public void close() {
     if (isConnected()) {
        disconnect();
     }
   }

   public boolean isConnected() {
        return (session != null && channel != null) && (session.isConnected() || channel.isConnected());
   }

   public List<ChannelSftp.LsEntry> getFiles() {
        return lsEntries;
   }

   public void listFiles() throws BatchProcessingException {
            channel.ls(ftpConfiguration.getFilePattern(), getFileOnlySelector());
    }

    public void disconnect() {
        if (channel != null) {
            try {
                channel.disconnect();
            } catch (Exception ex) {
                /* NOP */
            }
        }

        //If session is active and not cached, close it.
        if (session != null && sshSessionCache == null) {
            try {
                session.disconnect();
            } catch (Exception ex) {
                /* NOP */
            }
        }
    }

    private ChannelSftp.LsEntrySelector getFileOnlySelector() {
        return entry -> {
            if (!entry.getAttrs().isDir()) {
                lsEntries.add(entry);
            }
            return ChannelSftp.LsEntrySelector.CONTINUE;
        };
    }

}

We then have another class initiating the connection and doing the actual downloads

public class FtpDownloadTask {
	
     public void download(FtpConfiguration ftpConfiguration) {

        FtpConnection ftpConnection = null;

        try {

            //Connect to FTP.
            ftpConnection = new FtpConnection(ftpConfiguration);

            //List files
            ftpConnection.listFiles();
            List<ChannelSftp.LsEntry> lsEntryList = ftpConnection.getFiles();

            //For each file found
            for (ChannelSftp.LsEntry lsEntry : lsEntryList) {
               //Download file
               File localFile = ftpConnection.download(lsEntry);
               //Delete file
               ftpConnection.delete(lsEntry);
               ...
            }

        } finally {
            if (ftpConnection != null) {
                //Disconnect from FTP server
                ftpConnection.disconnect();
            }
        }

    }


@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

So just to confirm, the stuck connections you see always occur when calling channel.connect()?
Do you know what kind of SSH server you are connecting to?
And do you have access to the SSH server to see if it reports anything unusual when these connections get stuck?

Thanks,
Jeremy

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

Also, have you attempted to enable JSch logging to see if anything interesting is logged whenever these connections get stuck?

Thanks,
Jeremy

@alexndr79
Copy link
Author

alexndr79 commented May 30, 2023

Hi! @norrisjeremy !

Correct, always at channel.connect().

We haven't seen the issue for 2 weeks now but back then we had it once per day for three straight days so very random.

I'll try to get hold of the other information requested.

No, we have not enabled extended logging yet but it sounds like a good idea. I guess it's just a matter of defining a proper log level for com.jcraft.jsch?

/A

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

We have several built-in loggers (depending upon the type of logging backend you use):

  1. com.jcraft.jsch.JulLogger (java.util.logging)
  2. com.jcraft.jsch.Log4j2Logger
  3. com.jcraft.jsch.Slf4jLogger

You can simply add a single call to JSch.setLogger(new XYZLogger()); somewhere before you start making any connections (such as in your application startup). Then you would just need to capture logs for com.jcraft.jsch.JSch.

Thanks,
Jeremy

@alexndr79
Copy link
Author

We're using AWS SSH SFTP. Remote string says SSH-2.0-AWS_SFTP_1.1. We don't have much further details than that but I guess we can ask AWS- support if needed.

We have checked SFTP server logs but couldn't see any apparent issues.

We have enabled detailed JSch- logging now so let's see if that will give us anything.

Thanks!

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

How many simultaneous connections to the same server do you max at out?

Thanks,
Jeremy

@alexndr79
Copy link
Author

Hi @norrisjeremy !

We make max 10 concurrent connections to the same server from this JVM.

/Alex

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

And just to confirm, you are using the AWS Transfer Service based SFTP server? And do you back this with S3 or EFS?

Sorry for all the detailed questions: I'm trying to determine if there could be a a way to easily replicate this issue.

Thanks,
Jeremy

@alexndr79
Copy link
Author

Hi @norrisjeremy!

Yes, AWS Transfer Service based SFTP server, backed by S3.

/A

@norrisjeremy
Copy link
Contributor

Hi @alexndr79,

Ok, thanks.
FYI, this is likely something that will be difficult to track down, and unless I can find a way to easily reproduce it, I'm loathe to attempt applying any changes for fear of causing regressions.

The call to channel.connect(CHANNEL_TIMEOUT_MS) ultimately only applies a timeout to the initial SSH_MSG_CHANNEL_OPEN message that gets sent: after the SSH server responds with a SSH_MSG_CHANNEL_OPEN_CONFIRMATION response, we proceed with sending an SSH_FXP_INIT message and then wait indefinitely for an SSH_FXP_VERSION response (which is where you are seeing the intermittent hangs).

Additionally, in a bizarre twist, the SSH server appears to remain responsive in so much as that it would seem to still be responding to periodic keepalives being generated by the ServerAliveInterval, else the Connect thread sftp.company.com session thread would disconnect the session and you would ultimately see the call to channel.connect() throw an Exception of some sort.

Your best bet may simply be to wrap the call to channel.connect() into some sort of async wrapper in which you can apply some sort of timeout to terminate it if it hangs for an extended period of time.
For example, something like:

CompletableFuture<Void> future = CompletableFuture.runAsync(() -> channel.connect(CHANNEL_TIMEOUT_MS));
try {
  future.get(CHANNEL_TIMEOUT_MS, TimeUnit.MILLISECONDS);
} catch (CancellationException e) {
  ...
} catch (ExecutionException e) {
  ...
} catch (InterruptedException e) {
  ...
} catch (TimeoutException e) {
  ...
}

Thanks,
Jeremy

@alexndr79
Copy link
Author

Ok, thanks for you support @norrisjeremy. We will check the logs if we see the hang again and maybe, in the end, go for the solution suggested above. Thanks!

@alexndr79
Copy link
Author

Hi again,

This weekend we faced the issue again. This is what we could see in the logs:

...
20:09:09 [ftp-download-task-1] INFO com.jcraft.jsch.JSch:47 Authentications that can continue: password
20:09:09 [ftp-download-task-1] INFO com.jcraft.jsch.JSch:47 Next authentication method: password
20:09:10 [ftp-download-task-1] INFO com.jcraft.jsch.JSch:47 Authentication succeeded (password).
20:09:10 [ftp-download-task-1] INFO com.myorg.ftpservice.FtpConnection:198 Connecting to channel.
..

Our own logging the revealed at 20:09:10 that we tried to connect to the channel and it then got stuck. Nothing more logged from JSch after 20:09:10.

We will now try the async wrapper approach which should work I guess.

Thanks.

/A

@alexndr79 alexndr79 reopened this Jun 19, 2023
@alexndr79
Copy link
Author

The suggested work around seems to work well for us. Thanks for your support @norrisjeremy!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants