Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interrupted transfers when exceeding max simultaneous SMB operations #492

Open
mmakassikis opened this issue Dec 5, 2024 · 11 comments
Open

Comments

@mmakassikis
Copy link

Hello,

There is an issue when transferring large files when the "ksmbd: check outstanding simultaneous SMB operations" d0e6ce1
commit is applied and "smb2 max credits" is set to a small value.

My test environment has the following config:

[global]
        workgroup = WORKGROUP
        server string = filesrv
        guest user = nobody
        netbios name = filesrv
        follow symlinks = no
        map to guest = never
        share:fake_fscaps = 0
        smb2 max credits = 64
        smb2 max read = 64K
        smb2 max write = 64K
        smb2 max trans = 64K

[testshare]
        comment = testshare
        path = /media/share
        writeable = yes

Copying a large file (~25GB) fails and there is the following console log:

[ 5980.324541] ksmbd: Cannot handle request

On the client side I see:

$ smbclient //192.168.1.254/testshare -U user%pass 
Try "help" to get a list of possible commands.
smb: \> put file.bin 
cli_push returned NT_STATUS_USER_SESSION_DELETED
putting file file.bin as \file.bin smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
NT_STATUS_INTERNAL_ERROR closing remote file \file.bin
smb: \> smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed

The connection is closed because checking conn->mux_smb_requests returns -ENOSPC in queue_ksmbd_work() and this breaks out of the loop in ksmbd_conn_handler_loop().

I tried handling the error with the following patch:

diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c
index ea1cd9dc9cd1..7d0c40551b50 100644
--- a/fs/smb/server/connection.c
+++ b/fs/smb/server/connection.c
@@ -345,6 +345,8 @@ int ksmbd_conn_handler_loop(void *p)
 	conn->last_active = jiffies;
 	set_freezable();
 	while (ksmbd_conn_alive(conn)) {
+		int ret;
+
 		if (try_to_freeze())
 			continue;
 
@@ -420,7 +422,8 @@ int ksmbd_conn_handler_loop(void *p)
 			break;
 		}
 
-		if (default_conn_ops.process_fn(conn)) {
+		ret = default_conn_ops.process_fn(conn);
+		if (ret && ret != -ENOSPC) {
 			pr_err("Cannot handle request\n");
 			break;
 		}

This doesn't break the connection, but the transfer still fails:

$ smbclient //192.168.1.254/testshare -U user%pass
Try "help" to get a list of possible commands.
smb: \> put file.bin
cli_push returned NT_STATUS_IO_TIMEOUT
putting file file.bin as \file.bin smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
NT_STATUS_INTERNAL_ERROR closing remote file \file.bin
smb: \> smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed
smb: \> smb2cli_req_compound_submit: Insufficient credits. 0 available, 1 needed

I see the same behaviour using Windows as a client (fails with error 0x8007003b).

I am not sure the "Insufficient credits" message is relevant (or event right).

Without the "check outstanding simultaneous SMB operations" patch, the file transfer completes with no errors.
I tested doing a file copy to a samba server (4.15.13+dfsg-0ubuntu1.6 and 4.20.6-r1 on alpinelinux) with "smb2 max credits = 64" and read/write/trans set as 64K and smbclient succeeded.

I tried the following quick and dirty patch to try to apply backpressure to the client, i.e.: stop reading from the socket until there is room for new smb operations:

diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c
index ea1cd9dc9cd1..f4419483bfc6 100644
--- a/fs/smb/server/connection.c
+++ b/fs/smb/server/connection.c
@@ -79,6 +79,7 @@ struct ksmbd_conn *ksmbd_conn_alloc(void)
 
 	init_waitqueue_head(&conn->req_running_q);
 	init_waitqueue_head(&conn->r_count_q);
+	init_waitqueue_head(&conn->smb_req_q);
 	INIT_LIST_HEAD(&conn->conns_list);
 	INIT_LIST_HEAD(&conn->requests);
 	INIT_LIST_HEAD(&conn->async_requests);
@@ -345,6 +347,8 @@ int ksmbd_conn_handler_loop(void *p)
 	conn->last_active = jiffies;
 	set_freezable();
 	while (ksmbd_conn_alive(conn)) {
+		int ret;
+
 		if (try_to_freeze())
 			continue;
 
@@ -420,9 +424,15 @@ int ksmbd_conn_handler_loop(void *p)
 			break;
 		}
 
-		if (default_conn_ops.process_fn(conn)) {
-			pr_err("Cannot handle request\n");
-			break;
+		ret = default_conn_ops.process_fn(conn);
+		if (ret) {
+			if (ret == -ENOSPC)
+				wait_event_interruptible(conn->smb_req_q,
+							 atomic_read(&conn->mux_smb_requests) < conn->vals->max_credits);
+			else {
+				pr_err("Cannot handle request\n");
+				break;
+			}
 		}
 	}
 
diff --git a/fs/smb/server/connection.h b/fs/smb/server/connection.h
index 8ddd5a3c7baf..3aa0e60bad27 100644
--- a/fs/smb/server/connection.h
+++ b/fs/smb/server/connection.h
@@ -63,6 +63,7 @@ struct ksmbd_conn {
 	spinlock_t			credits_lock;
 	wait_queue_head_t		req_running_q;
 	wait_queue_head_t		r_count_q;
+	wait_queue_head_t		smb_req_q;
 	/* Lock to protect requests list*/
 	spinlock_t			request_lock;
 	struct list_head		requests;
diff --git a/fs/smb/server/server.c b/fs/smb/server/server.c
index 0305b549a5f9..a955390f54c7 100644
--- a/fs/smb/server/server.c
+++ b/fs/smb/server/server.c
@@ -270,7 +270,9 @@ static void handle_ksmbd_work(struct work_struct *wk)
 
 	ksmbd_conn_try_dequeue_request(work);
 	ksmbd_free_work_struct(work);
-	atomic_dec(&conn->mux_smb_requests);
+	if (atomic_dec_return(&conn->mux_smb_requests) < conn->vals->max_credits &&
+	    waitqueue_active(&conn->smb_req_q))
+		wake_up(&conn->smb_req_q);
 	/*
 	 * Checking waitqueue to dropping pending requests on
 	 * disconnection. waitqueue_active is safe because it

This works better (more data is transferred) but it still fails:

$ smbclient //192.168.1.254/testshare -U user%pass 
Try "help" to get a list of possible commands.
smb: \> put file.bin 
cli_push returned NT_STATUS_IO_TIMEOUT
NT_STATUS_CONNECTION_DISCONNECTED closing remote file \file.bin
smb: \> putting file file.bin as \file.bin SMBecho failed (NT_STATUS_CONNECTION_DISCONNECTED). The connection is disconnected now

Server side ksmbd sees the connection was reset:

[14779.229810] ksmbd: sock_read failed: -104
[14907.230304] ksmbd: sock_read failed: -104
[14907.230375] ksmbd: Failed to send message: -32
[14907.233349] ksmbd: Failed to send message: -32
[14907.238406] ksmbd: Failed to send message: -32
[14907.242199] ksmbd: Failed to send message: -32
[14907.246738] ksmbd: Failed to send message: -32
[14907.250959] ksmbd: Failed to send message: -32
[14907.255513] ksmbd: Failed to send message: -32
[14907.259904] ksmbd: Failed to send message: -32
[14907.264477] ksmbd: Failed to send message: -32
[14907.268781] ksmbd: Failed to send message: -32
[14907.273176] ksmbd: Failed to send message: -32
[14907.277830] ksmbd: Failed to send message: -32

The NT_STATUS_IO_TIMEOUT makes me think smbclient starts a timer when it has finished sending a request, so I modified the patch to check mux_smb_requests counter before reading from the socket:

diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c
index ea1cd9dc9cd1..fa5b545b53ce 100644
--- a/fs/smb/server/connection.c
+++ b/fs/smb/server/connection.c
@@ -79,6 +79,7 @@ struct ksmbd_conn *ksmbd_conn_alloc(void)
 
 	init_waitqueue_head(&conn->req_running_q);
 	init_waitqueue_head(&conn->r_count_q);
+	init_waitqueue_head(&conn->smb_req_q);
 	INIT_LIST_HEAD(&conn->conns_list);
 	INIT_LIST_HEAD(&conn->requests);
 	INIT_LIST_HEAD(&conn->async_requests);
@@ -351,6 +353,12 @@ int ksmbd_conn_handler_loop(void *p)
 		kvfree(conn->request_buf);
 		conn->request_buf = NULL;
 
+		if (atomic_inc_return(&conn->mux_smb_requests) >= 64) {
+			atomic_dec_return(&conn->mux_smb_requests);
+			wait_event_interruptible(conn->smb_req_q,
+						 atomic_read(&conn->mux_smb_requests) < 64);
+		}
+
 		size = t->ops->read(t, hdr_buf, sizeof(hdr_buf), -1);
 		if (size != sizeof(hdr_buf))
 			break;
diff --git a/fs/smb/server/connection.h b/fs/smb/server/connection.h
index 8ddd5a3c7baf..3aa0e60bad27 100644
--- a/fs/smb/server/connection.h
+++ b/fs/smb/server/connection.h
@@ -63,6 +63,7 @@ struct ksmbd_conn {
 	spinlock_t			credits_lock;
 	wait_queue_head_t		req_running_q;
 	wait_queue_head_t		r_count_q;
+	wait_queue_head_t		smb_req_q;
 	/* Lock to protect requests list*/
 	spinlock_t			request_lock;
 	struct list_head		requests;
diff --git a/fs/smb/server/server.c b/fs/smb/server/server.c
index 0305b549a5f9..8a758b292525 100644
--- a/fs/smb/server/server.c
+++ b/fs/smb/server/server.c
@@ -270,7 +270,9 @@ static void handle_ksmbd_work(struct work_struct *wk)
 
 	ksmbd_conn_try_dequeue_request(work);
 	ksmbd_free_work_struct(work);
-	atomic_dec(&conn->mux_smb_requests);
+	if (atomic_dec_return(&conn->mux_smb_requests) < 64 &&
+	    waitqueue_active(&conn->smb_req_q))
+		wake_up(&conn->smb_req_q);
 	/*
 	 * Checking waitqueue to dropping pending requests on
 	 * disconnection. waitqueue_active is safe because it
@@ -300,11 +302,6 @@ static int queue_ksmbd_work(struct ksmbd_conn *conn)
 	if (err)
 		return 0;
 
-	if (atomic_inc_return(&conn->mux_smb_requests) >= conn->vals->max_credits) {
-		atomic_dec_return(&conn->mux_smb_requests);
-		return -ENOSPC;
-	}
-
 	work = ksmbd_alloc_work_struct();
 	if (!work) {
 		pr_err("allocation for work failed\n");

Since conn->vals is not initialized in the loop, I hardcoded the max to 64 for testing purposes.
With the patch above, smbclient and windows don't fail when transferring a large file.

I'd like to clean this up for inclusion, but there are a few questions:

  • should we conflate max_credits and max simultaneous requests ? Perhaps it can be set in server_conf so that userland can override it if necessary. This would solve the problem with the value not being known early in ksmbd_conn_handler_loop()
  • i added a waitlist, but i wonder if req_running_q can be used instead. How is mux_smb_requests different from conn->req_running ?
    Related: why is conn->req_running not incremented on SMB Cancel requests ? ksmbd_conn_try_dequeue_request() always decrements conn->req_running
@namjaejeon
Copy link
Owner

Looks great work!

should we conflate max_credits and max simultaneous requests ? Perhaps it can be set in server_conf so that userland can override it if necessary. This would solve the problem with the value not being known early in ksmbd_conn_handler_loop()

Yes. we can use max_credits for max simultaneous requests. and we can move it to server_conf.

i added a waitlist, but i wonder if req_running_q can be used instead. How is mux_smb_requests different from conn->req_running ?

As you checked, ksmbd don't count smb2 cancel for req_running. because it doesn't add it to request list. Please check smb2_cacnel function. ksmbd need to find the request using message id of smb2 cancel. If you want to use req_running, we need to add if condition to skip smb2 cancel type when finding a request in smb2 cancel.
and ksmbd_conn_try_dequeue_request check() check if request entry is empty. so it doesn't decrement.

@mmakassikis
Copy link
Author

I was thinking we can get away with always updating req_running in ksmbd_conn_enqueue_request() / ksmbd_conn_try_dequeue_request(), and keeping the behaviour around not adding smb2 cancel to conn->requests. smb2_cancel handler traverses the conn->requests list without checking req_running so that would work.

The current code doesn't account for a smb2 cancel that is in flight when either ksmbd_conn_wait_idle() or ksmbd_conn_wait_idle_sess_id() is called. Is that correct ?

@namjaejeon
Copy link
Owner

Right, If we do that as you said, we can use req_running instead of mux_smb_requests.

@mmakassikis
Copy link
Author

Tentative implementation:

https://github.com/mmakassikis/ksmbd/tree/private/marios/wip-max-inflight-req

Comments & tests welcome

@namjaejeon
Copy link
Owner

Totally looks good except missing wake_up event. I have added the comment in your patch. Please update it. I will test it.

@mmakassikis
Copy link
Author

updated

@namjaejeon
Copy link
Owner

reconnect1 test in smbtorture failed.

test: reconnect1
time: 2024-12-06 16:13:22.371112
time: 2024-12-06 16:13:29.675636
failure: reconnect1 [
../../source4/torture/smb2/session.c:110: status was NT_STATUS_CONNECTION_DISCONNECTED, expected NT_STATUS_USER_SESSION_DELETED: smb2_getinfo_file returned unexpected status
]
Error: Process completed with exit code 1.

@namjaejeon
Copy link
Owner

I have sent the testcase to your mail. Can you check it?

@namjaejeon
Copy link
Owner

I have checked testcases I sent.

You need to move atomic_dec to under wait_event_interruptible(). and no need to use XXX_return. use _dec().

		if (atomic_inc_return(&conn->req_running) >= max_req) {
			atomic_dec_return(&conn->req_running);
			wait_event_interruptible(conn->req_running_q,
				atomic_read(&conn->req_running) < max_req);
		}

So..

		if (atomic_inc_return(&conn->req_running) >= max_req) {
			wait_event_interruptible(conn->req_running_q,
				atomic_read(&conn->req_running) < max_req);
                        atomic_dec(&conn->req_running);
		}

Can you test on your side and your testcases(smbclient) also?

@namjaejeon
Copy link
Owner

This is correct fix. Please check.

recheck:
                if (atomic_inc_return(&conn->req_running) > max_req) {
                        atomic_dec(&conn->req_running);
                        wait_event_interruptible(conn->req_running_q,
                                atomic_read(&conn->req_running) < max_req);
                        goto recheck;
                }

@namjaejeon
Copy link
Owner

I have updated your patch. Please check it.
08fd7f5

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants