Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suddenly, the unsent data size becomes zero. Why? #651

Open
kiyotsu opened this issue Feb 24, 2022 · 15 comments
Open

Suddenly, the unsent data size becomes zero. Why? #651

kiyotsu opened this issue Feb 24, 2022 · 15 comments

Comments

@kiyotsu
Copy link

kiyotsu commented Feb 24, 2022

I am using usrsctp as the SCTP stack for WebRTC.
In our environment, a strange phenomenon occurs.
That is, while continuing to deliver the video stream, the delivery suddenly stopped.

Wanting to figure them out, we enabled some of the logging in "sctp_lower_sosend()".

From around 23:03:14.069558
・"prw" has not changed.
・The value of unsent keeps increasing.

23:03:13.199946  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.200028  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.200095  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.200154  USR Send complete qo:0 prw:129664 unsent:117 tf:378 cooq:4 toqs:378 err:0
23:03:13.269830  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.269924  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.269989  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.270047  USR Send complete qo:0 prw:129664 unsent:120 tf:381 cooq:4 toqs:381 err:0
23:03:13.379646  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.379751  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.379823  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.379883  USR Send complete qo:0 prw:129668 unsent:115 tf:376 cooq:4 toqs:376 err:0
23:03:13.489684  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.489817  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.489885  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.489958  USR Send complete qo:0 prw:129664 unsent:118 tf:379 cooq:4 toqs:379 err:0
23:03:13.569716  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.569810  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.569891  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.569948  USR Send complete qo:0 prw:129676 unsent:107 tf:368 cooq:4 toqs:368 err:0
23:03:13.689677  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.689766  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.689832  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.689890  USR Send complete qo:0 prw:129668 unsent:115 tf:376 cooq:4 toqs:376 err:0
23:03:13.789856  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.789958  USR Send complete qo:0 prw:130404 unsent:65 tf:151 cooq:2 toqs:151 err:0
23:03:13.790062  USR Send complete qo:0 prw:130108 unsent:24 tf:191 cooq:3 toqs:191 err:0
23:03:13.790174  USR Send complete qo:0 prw:129616 unsent:219 tf:426 cooq:4 toqs:426 err:0
23:03:13.790354  USR Send complete qo:0 prw:127672 unsent:1397 tf:1855 cooq:6 toqs:1855 err:0
23:03:13.790532  USR Send complete qo:0 prw:125664 unsent:1462 tf:3349 cooq:8 toqs:3349 err:0
23:03:13.790696  USR Send complete qo:0 prw:123656 unsent:1462 tf:4843 cooq:10 toqs:4843 err:0
23:03:13.790725  USR Send complete qo:1 prw:123656 unsent:1462 tf:4843 cooq:10 toqs:6305 err:0
23:03:13.790747  USR Send complete qo:1 prw:123656 unsent:2678 tf:4843 cooq:10 toqs:7521 err:0
23:03:13.879793  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.879889  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.879967  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.880041  USR Send complete qo:0 prw:129472 unsent:309 tf:570 cooq:4 toqs:570 err:0
23:03:13.969602  USR Send complete qo:0 prw:130744 unsent:54 tf:70 cooq:1 toqs:70 err:0
23:03:13.969701  USR Send complete qo:0 prw:130448 unsent:24 tf:110 cooq:2 toqs:110 err:0
23:03:13.969792  USR Send complete qo:0 prw:130056 unsent:119 tf:245 cooq:3 toqs:245 err:0
23:03:13.969876  USR Send complete qo:0 prw:129632 unsent:151 tf:412 cooq:4 toqs:412 err:0
23:03:14.069558  USR Send complete qo:0 prw:131072 unsent:54 tf:0 cooq:0 toqs:54 err:0
23:03:14.069614  USR Send complete qo:0 prw:131072 unsent:24 tf:0 cooq:0 toqs:24 err:0
23:03:14.069639  USR Send complete qo:0 prw:131072 unsent:143 tf:0 cooq:0 toqs:143 err:0
23:03:14.109509  USR Send complete qo:0 prw:131072 unsent:270 tf:0 cooq:0 toqs:270 err:0
23:03:14.189516  USR Send complete qo:0 prw:131072 unsent:324 tf:0 cooq:0 toqs:324 err:0
23:03:14.189566  USR Send complete qo:0 prw:131072 unsent:348 tf:0 cooq:0 toqs:348 err:0
23:03:14.189592  USR Send complete qo:0 prw:131072 unsent:467 tf:0 cooq:0 toqs:467 err:0
23:03:14.189615  USR Send complete qo:0 prw:131072 unsent:594 tf:0 cooq:0 toqs:594 err:0
23:03:14.299552  USR Send complete qo:0 prw:131072 unsent:648 tf:0 cooq:0 toqs:648 err:0
23:03:14.299594  USR Send complete qo:0 prw:131072 unsent:672 tf:0 cooq:0 toqs:672 err:0
23:03:14.299618  USR Send complete qo:0 prw:131072 unsent:791 tf:0 cooq:0 toqs:791 err:0
23:03:14.299639  USR Send complete qo:0 prw:131072 unsent:906 tf:0 cooq:0 toqs:906 err:0
23:03:14.389695  USR Send complete qo:0 prw:131072 unsent:960 tf:0 cooq:0 toqs:960 err:0
23:03:14.389735  USR Send complete qo:0 prw:131072 unsent:984 tf:0 cooq:0 toqs:984 err:0
23:03:14.389761  USR Send complete qo:0 prw:131072 unsent:1103 tf:0 cooq:0 toqs:1103 err:0
23:03:14.389784  USR Send complete qo:0 prw:131072 unsent:1220 tf:0 cooq:0 toqs:1220 err:0
23:03:14.499542  USR Send complete qo:0 prw:131072 unsent:1274 tf:0 cooq:0 toqs:1274 err:0
23:03:14.499583  USR Send complete qo:0 prw:131072 unsent:1298 tf:0 cooq:0 toqs:1298 err:0
23:03:14.499608  USR Send complete qo:0 prw:131072 unsent:1417 tf:0 cooq:0 toqs:1417 err:0
23:03:14.499629  USR Send complete qo:0 prw:131072 unsent:1533 tf:0 cooq:0 toqs:1533 err:0
23:03:14.589555  USR Send complete qo:0 prw:131072 unsent:1587 tf:0 cooq:0 toqs:1587 err:0
23:03:14.589604  USR Send complete qo:0 prw:131072 unsent:1611 tf:0 cooq:0 toqs:1611 err:0
23:03:14.589629  USR Send complete qo:0 prw:131072 unsent:1730 tf:0 cooq:0 toqs:1730 err:0
23:03:14.589653  USR Send complete qo:0 prw:131072 unsent:1842 tf:0 cooq:0 toqs:1842 err:0
23:03:14.699553  USR Send complete qo:0 prw:131072 unsent:1896 tf:0 cooq:0 toqs:1896 err:0
23:03:14.699594  USR Send complete qo:0 prw:131072 unsent:1920 tf:0 cooq:0 toqs:1920 err:0
23:03:14.699620  USR Send complete qo:0 prw:131072 unsent:2039 tf:0 cooq:0 toqs:2039 err:0
23:03:14.699643  USR Send complete qo:0 prw:131072 unsent:2153 tf:0 cooq:0 toqs:2153 err:0
23:03:14.779539  USR Send complete qo:0 prw:131072 unsent:2207 tf:0 cooq:0 toqs:2207 err:0
23:03:14.779588  USR Send complete qo:0 prw:131072 unsent:2272 tf:0 cooq:0 toqs:2272 err:0
23:03:14.779612  USR Send complete qo:0 prw:131072 unsent:2296 tf:0 cooq:0 toqs:2296 err:0
23:03:14.779634  USR Send complete qo:0 prw:131072 unsent:2515 tf:0 cooq:0 toqs:2515 err:0
23:03:14.779661  USR Send complete qo:0 prw:131072 unsent:3912 tf:0 cooq:0 toqs:3912 err:0
23:03:14.779686  USR Send complete qo:0 prw:131072 unsent:5374 tf:0 cooq:0 toqs:5374 err:0
23:03:14.779709  USR Send complete qo:0 prw:131072 unsent:6836 tf:0 cooq:0 toqs:6836 err:0
23:03:14.779731  USR Send complete qo:0 prw:131072 unsent:8298 tf:0 cooq:0 toqs:8298 err:0
23:03:14.779755  USR Send complete qo:0 prw:131072 unsent:9509 tf:0 cooq:0 toqs:9509 err:0
23:03:14.889627  USR Send complete qo:0 prw:131072 unsent:9563 tf:0 cooq:0 toqs:9563 err:0
23:03:14.889683  USR Send complete qo:0 prw:131072 unsent:9587 tf:0 cooq:0 toqs:9587 err:0
23:03:14.889707  USR Send complete qo:0 prw:131072 unsent:9706 tf:0 cooq:0 toqs:9706 err:0
23:03:14.889741  USR Send complete qo:0 prw:131072 unsent:10002 tf:0 cooq:0 toqs:10002 err:0
23:03:14.969608  USR Send complete qo:0 prw:131072 unsent:10056 tf:0 cooq:0 toqs:10056 err:0
23:03:14.969669  USR Send complete qo:0 prw:131072 unsent:10080 tf:0 cooq:0 toqs:10080 err:0
23:03:14.969693  USR Send complete qo:0 prw:131072 unsent:10199 tf:0 cooq:0 toqs:10199 err:0
23:03:14.969718  USR Send complete qo:0 prw:131072 unsent:10362 tf:0 cooq:0 toqs:10362 err:0
23:03:15.089532  USR Send complete qo:0 prw:131072 unsent:10416 tf:0 cooq:0 toqs:10416 err:0
23:03:15.089589  USR Send complete qo:0 prw:131072 unsent:10440 tf:0 cooq:0 toqs:10440 err:0
23:03:15.089616  USR Send complete qo:0 prw:131072 unsent:10559 tf:0 cooq:0 toqs:10559 err:0

Also, at 23:04:34.589521, the value of unsent

23:04:33.869666  USR Send complete qo:0 prw:131072 unsent:863950 tf:0 cooq:0 toqs:863950 err:0
23:04:33.869689  USR Send complete qo:0 prw:131072 unsent:864243 tf:0 cooq:0 toqs:864243 err:0
23:04:33.989552  USR Send complete qo:0 prw:131072 unsent:864297 tf:0 cooq:0 toqs:864297 err:0
23:04:33.989627  USR Send complete qo:0 prw:131072 unsent:864321 tf:0 cooq:0 toqs:864321 err:0
23:04:33.989656  USR Send complete qo:0 prw:131072 unsent:864440 tf:0 cooq:0 toqs:864440 err:0
23:04:33.989679  USR Send complete qo:0 prw:131072 unsent:864573 tf:0 cooq:0 toqs:864573 err:0
23:04:34.069649  USR Send complete qo:0 prw:131072 unsent:864627 tf:0 cooq:0 toqs:864627 err:0
23:04:34.069699  USR Send complete qo:0 prw:131072 unsent:864651 tf:0 cooq:0 toqs:864651 err:0
23:04:34.069724  USR Send complete qo:0 prw:131072 unsent:864770 tf:0 cooq:0 toqs:864770 err:0
23:04:34.069745  USR Send complete qo:0 prw:131072 unsent:864888 tf:0 cooq:0 toqs:864888 err:0
23:04:34.189493  USR Send complete qo:0 prw:131072 unsent:864942 tf:0 cooq:0 toqs:864942 err:0
23:04:34.189552  USR Send complete qo:0 prw:131072 unsent:864966 tf:0 cooq:0 toqs:864966 err:0
23:04:34.189602  USR Send complete qo:0 prw:131072 unsent:865085 tf:0 cooq:0 toqs:865085 err:0
23:04:34.189641  USR Send complete qo:0 prw:131072 unsent:865192 tf:0 cooq:0 toqs:865192 err:0
23:04:34.309529  USR Send complete qo:0 prw:131072 unsent:865246 tf:0 cooq:0 toqs:865246 err:0
23:04:34.309581  USR Send complete qo:0 prw:131072 unsent:865270 tf:0 cooq:0 toqs:865270 err:0
23:04:34.309605  USR Send complete qo:0 prw:131072 unsent:865389 tf:0 cooq:0 toqs:865389 err:0
23:04:34.309626  USR Send complete qo:0 prw:131072 unsent:865491 tf:0 cooq:0 toqs:865491 err:0
23:04:34.389695  USR Send complete qo:0 prw:131072 unsent:865545 tf:0 cooq:0 toqs:865545 err:0
23:04:34.389761  USR Send complete qo:0 prw:131072 unsent:865569 tf:0 cooq:0 toqs:865569 err:0
23:04:34.389794  USR Send complete qo:0 prw:131072 unsent:865688 tf:0 cooq:0 toqs:865688 err:0
23:04:34.389843  USR Send complete qo:0 prw:131072 unsent:865782 tf:0 cooq:0 toqs:865782 err:0
23:04:34.509533  USR Send complete qo:0 prw:131072 unsent:865836 tf:0 cooq:0 toqs:865836 err:0
23:04:34.509574  USR Send complete qo:0 prw:131072 unsent:865860 tf:0 cooq:0 toqs:865860 err:0
23:04:34.509600  USR Send complete qo:0 prw:131072 unsent:865979 tf:0 cooq:0 toqs:865979 err:0
23:04:34.509624  USR Send complete qo:0 prw:131072 unsent:866083 tf:0 cooq:0 toqs:866083 err:0
23:04:34.589521  USR Send complete qo:0 prw:131072 unsent:54 tf:0 cooq:0 toqs:54 err:0
23:04:34.589566  USR Send complete qo:0 prw:131072 unsent:78 tf:0 cooq:0 toqs:78 err:0
23:04:34.589589  USR Send complete qo:0 prw:131072 unsent:197 tf:0 cooq:0 toqs:197 err:0
23:04:34.589612  USR Send complete qo:0 prw:131072 unsent:298 tf:0 cooq:0 toqs:298 err:0
23:04:34.709538  USR Send complete qo:0 prw:131072 unsent:352 tf:0 cooq:0 toqs:352 err:0
23:04:34.709579  USR Send complete qo:0 prw:131072 unsent:376 tf:0 cooq:0 toqs:376 err:0
23:04:34.709607  USR Send complete qo:0 prw:131072 unsent:495 tf:0 cooq:0 toqs:495 err:0
23:04:34.709631  USR Send complete qo:0 prw:131072 unsent:610 tf:0 cooq:0 toqs:610 err:0
23:04:34.789578  USR Send complete qo:0 prw:131072 unsent:664 tf:0 cooq:0 toqs:664 err:0
23:04:34.789625  USR Send complete qo:0 prw:131072 unsent:729 tf:0 cooq:0 toqs:729 err:0
23:04:34.789710  USR Send complete qo:0 prw:131072 unsent:753 tf:0 cooq:0 toqs:753 err:0
23:04:34.789756  USR Send complete qo:0 prw:131072 unsent:972 tf:0 cooq:0 toqs:972 err:0

I assume that the reason why the number of unsent data keeps increasing is that SACK is not returned from the destination.
Why is the value of unsent set to 0?

OS: Linux 32bit
SendBufferSize:1.5MB
Destination:Chrome browser,Edge

@tuexen
Copy link
Member

tuexen commented Feb 24, 2022

Is this the complete logging you get while this happens? The interesting part is that the asoc.total_output_queue_size drops. Are you using PR-SCTP? If yes, which policy and which parameter are you using?

I find it also strange that the asoc.total_flight is 0.

@kiyotsu
Copy link
Author

kiyotsu commented Feb 24, 2022

@tuexen

The opening of the data channel is done by Chrome.
Reliability and order guarantees seem to be in effect.

I am running the following usrsctp system call.
usrsctp_sysctl_set_sctp_sendspace(1500*1024);
usrsctp_sysctl_set_sctp_add_more_threshold(0);
usrsctp_sysctl_set_sctp_system_free_resc_limit(0);
usrsctp_sysctl_set_sctp_asoc_free_resc_limit(0);
usrsctp_sysctl_set_sctp_nrsack_enable(1);

Also, I have enabled non-blocking and SCTP_NODELAY in socket options.

@kiyotsu
Copy link
Author

kiyotsu commented Feb 24, 2022

Only the following logs are output.

SCTPDBG(SCTP_DEBUG_OUTPUT1, "USR Send complete qo:%d prw:%d unsent:%d tf:%d cooq:%d toqs:%d err:%d\n",
queue_only, stcb->asoc.peers_rwnd, un_sent,
stcb->asoc.total_flight, stcb->asoc.chunks_on_out_queue,
stcb->asoc.total_output_queue_size, error);

Only the log of the thread where the problem occurred is extracted.

@tuexen
Copy link
Member

tuexen commented Feb 24, 2022

@tuexen

The opening of the data channel is done by Chrome. Reliability and order guarantees seem to be in effect.

I am running the following usrsctp system call. usrsctp_sysctl_set_sctp_sendspace(1500*1024); usrsctp_sysctl_set_sctp_add_more_threshold(0); usrsctp_sysctl_set_sctp_system_free_resc_limit(0); usrsctp_sysctl_set_sctp_asoc_free_resc_limit(0); usrsctp_sysctl_set_sctp_nrsack_enable(1);

Also, I have enabled non-blocking and SCTP_NODELAY in socket options.

Does it mean that the sender is a program you have written and is using usrsctp and the receiver is Chrome?

@tuexen
Copy link
Member

tuexen commented Feb 24, 2022

Only the following logs are output.

SCTPDBG(SCTP_DEBUG_OUTPUT1, "USR Send complete qo:%d prw:%d unsent:%d tf:%d cooq:%d toqs:%d err:%d\n",
queue_only, stcb->asoc.peers_rwnd, un_sent,
stcb->asoc.total_flight, stcb->asoc.chunks_on_out_queue,
stcb->asoc.total_output_queue_size, error);

Only the log of the thread where the problem occurred is extracted.

I would be interested in a complete log. If you don't want to post it here, you can send it to me privately: [email protected].

@kiyotsu
Copy link
Author

kiyotsu commented Feb 24, 2022

Does it mean that the sender is a program you have written and is using usrsctp and the receiver is Chrome?

Yes.

Get logs if possible.
This phenomenon is reproduced when the delivery is continued for a long time.
However, time is an indictment: 3 hours, 12 hours, 24 hours over...

The memory usage of the process keeps increasing during the occurrence.
It looks like usrsctp is allocating memory beyond the send buffer (1.5MB).

@kiyotsu
Copy link
Author

kiyotsu commented Feb 25, 2022

@tuexen
Is there a required log level?
Do I only need output?

/* SCTP DEBUG Switch parameters */
#define SCTP_DEBUG_TIMER1	0x00000001
#define SCTP_DEBUG_TIMER2	0x00000002	/* unused */
#define SCTP_DEBUG_TIMER3	0x00000004	/* unused */
#define SCTP_DEBUG_TIMER4	0x00000008
#define SCTP_DEBUG_OUTPUT1	0x00000010
#define SCTP_DEBUG_OUTPUT2	0x00000020
#define SCTP_DEBUG_OUTPUT3	0x00000040
#define SCTP_DEBUG_OUTPUT4	0x00000080
#define SCTP_DEBUG_UTIL1	0x00000100
#define SCTP_DEBUG_UTIL2	0x00000200	/* unused */
#define SCTP_DEBUG_AUTH1	0x00000400
#define SCTP_DEBUG_AUTH2	0x00000800	/* unused */
#define SCTP_DEBUG_INPUT1	0x00001000
#define SCTP_DEBUG_INPUT2	0x00002000
#define SCTP_DEBUG_INPUT3	0x00004000
#define SCTP_DEBUG_INPUT4	0x00008000	/* unused */
#define SCTP_DEBUG_ASCONF1	0x00010000
#define SCTP_DEBUG_ASCONF2	0x00020000
#define SCTP_DEBUG_OUTPUT5	0x00040000	/* unused */
#define SCTP_DEBUG_XXX		0x00080000	/* unused */
#define SCTP_DEBUG_PCB1		0x00100000
#define SCTP_DEBUG_PCB2		0x00200000	/* unused */
#define SCTP_DEBUG_PCB3		0x00400000
#define SCTP_DEBUG_PCB4		0x00800000
#define SCTP_DEBUG_INDATA1	0x01000000
#define SCTP_DEBUG_INDATA2	0x02000000	/* unused */
#define SCTP_DEBUG_INDATA3	0x04000000	/* unused */
3#define SCTP_DEBUG_CRCOFFLOAD	0x08000000	/* unused */
#define SCTP_DEBUG_USRREQ1	0x10000000	/* unused */
#define SCTP_DEBUG_USRREQ2	0x20000000	/* unused */
#define SCTP_DEBUG_PEEL1	0x40000000
#if defined(__Userspace__)
#define SCTP_DEBUG_USR 		0x80000000
#else
#define SCTP_DEBUG_XXXXX	0x80000000	/* unused */
#endif
#define SCTP_DEBUG_ALL		0x7ff3ffff
#define SCTP_DEBUG_NOISY	0x00040000

@kiyotsu
Copy link
Author

kiyotsu commented Feb 28, 2022

@tuexen
hi, I have reproduced it.

The environment of my application is as follows.
・Client PC and 7 P2P connections.
・7 data channel streams as well.

To me, it looks like 2 of the 7 streams stop receiving SACK.
After that, there is a phenomenon that unsent is set to 0.

Sending thread - A: 729FF470
Receiving Thread-A: 6F5BA470

 (5490283): 10:59:08.426262 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 1397
 (5490284): 10:59:08.426284 -1/ -1|[729FF470][USRSCTP] m-c-o put out 0
 (5490285): 10:59:08.426295 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 0 chunks
 (5490286): 10:59:08.426306 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:612838 tf:0 cooq:0 toqs:61283
 (5490287): 10:59:08.426321 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 1462
 (5490288): 10:59:08.426339 -1/ -1|[729FF470][USRSCTP] m-c-o put out 0
 (5490289): 10:59:08.426350 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 0 chunks
 (5490290): 10:59:08.426362 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:614300 tf:0 cooq:0 toqs:61430
 (5490291): 10:59:08.426376 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 792
 (5490292): 10:59:08.426393 -1/ -1|[729FF470][USRSCTP] m-c-o put out 0
 (5490293): 10:59:08.426403 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 0 chunks
 (5490294): 10:59:08.426414 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:615092 tf:0 cooq:0 toqs:61509
 (5490533): 10:59:08.505880 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 55
 (5490534): 10:59:08.505937 -1/ -1|[729FF470][USRSCTP] m-c-o put out 0
 (5490535): 10:59:08.505953 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 0 chunks
 (5490536): 10:59:08.505967 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:55 tf:0 cooq:0 toqs:55 err:0

Sending thread - B: 729CD470
Receiving thread-B: 70737470

 (5509445): 10:59:16.445962 -1/ -1|[729CD470][USRSCTP] Send called addr:(nil) send length 55
 (5509446): 10:59:16.446082 -1/ -1|[729CD470][USRSCTP] m-c-o put out 0
 (5509447): 10:59:16.446109 -1/ -1|[729CD470][USRSCTP] Ok, we have put out 0 chunks
 (5509448): 10:59:16.446125 -1/ -1|[729CD470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:1187690 tf:0 cooq:0 toqs:1187690 err:0
 (5509449): 10:59:16.446148 -1/ -1|[729CD470][USRSCTP] Send called addr:(nil) send length 28
 (5509450): 10:59:16.446168 -1/ -1|[729CD470][USRSCTP] m-c-o put out 0
 (5509451): 10:59:16.446182 -1/ -1|[729CD470][USRSCTP] Ok, we have put out 0 chunks
 (5509452): 10:59:16.446195 -1/ -1|[729CD470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:1187718 tf:0 cooq:0 toqs:1187718 err:0
 (5509453): 10:59:16.446212 -1/ -1|[729CD470][USRSCTP] Send called addr:(nil) send length 176
 (5509454): 10:59:16.446227 -1/ -1|[729CD470][USRSCTP] m-c-o put out 0
 (5509455): 10:59:16.446237 -1/ -1|[729CD470][USRSCTP] Ok, we have put out 0 chunks
 (5509456): 10:59:16.446249 -1/ -1|[729CD470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:1187894 tf:0 cooq:0 toqs:1187894 err:0
 (5509739): 10:59:16.555868 -1/ -1|[729CD470][USRSCTP] Send called addr:(nil) send length 55
 (5509740): 10:59:16.555915 -1/ -1|[729CD470][USRSCTP] m-c-o put out 0
 (5509741): 10:59:16.555930 -1/ -1|[729CD470][USRSCTP] Ok, we have put out 0 chunks
 (5509742): 10:59:16.555942 -1/ -1|[729CD470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:55 tf:0 cooq:0 toqs:55 err:0

The logs are attached.
The "[HEX]" in front of "[USRSCTP]" is the return value of pthread_self().

usrsctp.log

@tuexen
Copy link
Member

tuexen commented Feb 28, 2022

Just to be clear: usrsctp is running on one machine, having 7 SCTP association talking to 7 endpoints in Chrome.

When you write looks like 2 of the 7 streams stop receiving SACK do mean that no SACKs are received from the peer anymore or do you mean the peer still sends SACKs but they are not processed locally anymore?

@kiyotsu
Copy link
Author

kiyotsu commented Feb 28, 2022

I guessed from the log.

Thread 0x6F5BA470 is receiving SACKs until 10:57:26.908609.
However, after that, there is only a log of HEARTBEAT reception.

@tuexen
Copy link
Member

tuexen commented Mar 2, 2022

Some remarks:

On the sending side we have:

(5376663): 10:57:26.906661 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4717568 unsent:120 tf:252 cooq:3 toqs:252 err:0
(5376664): 10:57:26.906677 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 58
(5376665): 10:57:26.906737 -1/ -1|[729FF470][USRSCTP] m-c-o put out 1
(5376666): 10:57:26.906752 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5376667): 10:57:26.906763 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4717236 unsent:58 tf:326 cooq:4 toqs:326 err:0

Now a SACK is processed, which acknowledges all outstanding data:

(5376684): 10:57:26.908547 -1/ -1|[6F5BA470][USRSCTP] Ok, Common input processing called, m:0x6f74e228 iphlen:0 offset:12 length:28 stcb:0x6f729ca0
(5376685): 10:57:26.908564 -1/ -1|[6F5BA470][USRSCTP] stcb:0x6f729ca0 state:8
(5376686): 10:57:26.908577 -1/ -1|[6F5BA470][USRSCTP] sctp_process_control: iphlen=0, offset=12, length=28 stcb:0x6f729ca0
(5376687): 10:57:26.908588 -1/ -1|[6F5BA470][USRSCTP] sctp_process_control: processing a chunk type=3, len=16
(5376688): 10:57:26.908598 -1/ -1|[6F5BA470][USRSCTP] SCTP_SACK
(5376689): 10:57:26.908609 -1/ -1|[6F5BA470][USRSCTP] SCTP_SACK process cum_ack:be2bd940 num_seg:0 a_rwnd:4718592
(5376690): 10:57:26.908631 -1/ -1|[6F5BA470][USRSCTP] Timer type 1 stopped: inp=0x6f7295a0, stcb=0x6f729ca0, net=0x6f71d418.
(5376691): 10:57:26.908646 -1/ -1|[6F5BA470][USRSCTP] Check for chunk output prw:4718592 tqe:1 tf=0

That is why we start a retransmission timer on the next send() call:

(5376979): 10:57:27.026291 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 55
(5376983): 10:57:27.026335 -1/ -1|[729FF470][USRSCTP] Timer type 1 started: ticks=1000, inp=0x6f7295a0, stcb=0x6f729ca0, net=0x6f71d418.
(5376989): 10:57:27.026391 -1/ -1|[729FF470][USRSCTP] m-c-o put out 1
(5376990): 10:57:27.026405 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5376992): 10:57:27.026417 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4718264 unsent:55 tf:71 cooq:1 toqs:71 err:0

The next two send() calls seems OK:

(5376996): 10:57:27.026434 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 29
(5376999): 10:57:27.026486 -1/ -1|[729FF470][USRSCTP] m-c-o put out 1
(5377000): 10:57:27.026500 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5377001): 10:57:27.026512 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4717960 unsent:29 tf:116 cooq:2 toqs:116 err:0
(5377002): 10:57:27.026527 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 120
(5377003): 10:57:27.026579 -1/ -1|[729FF470][USRSCTP] m-c-o put out 1
(5377004): 10:57:27.026591 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5377005): 10:57:27.026603 -1/ -1|[729FF470][USRSCTP] USR Send complete qo:0 prw:4717568 unsent:120 tf:252 cooq:3 toqs:252 err:0

The logging of the next send() call starts OK, but is not complete. The final USR Send complete is missing.

(5377006): 10:57:27.026617 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 68
(5377007): 10:57:27.026671 -1/ -1|[729FF470][USRSCTP] m-c-o put out 1
(5377008): 10:57:27.026685 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5377047): 10:58:20.815835 -1/ -1|[729FF470][USRSCTP] Send called addr:(nil) send length 55

In addition to that, there is a gap of 53 seconds.

Investigating the gap, we see

(5377008): 10:57:27.026685 -1/ -1|[729FF470][USRSCTP] Ok, we have put out 1 chunks
(5377009): 10:57:27.026689 -1/ -1|[70737470][USRSCTP] RSCTP] Ok, we have put out 4 chunks
(5377010): 10:58:20.790351 -1/ -1|[6DEBA470][USRSCTP] chunk OUTPUT returns

So line 5377009 is incomplete, and there is a 53 seconds gap.

I agree, that things are strange. However, logging seems to stop for 53 seconds, and we have no idea what is happening during that time. When logging starts again, send() calls still happen, but nothing is put on the wire anymore.

What to do next? I would suggest to compile usrsctp with INVARIANTS enabled. This activates some plausibility checks and terminates the program as soon as the first check fails. That might allow us to catch the situation where things go off the road.

You also might to apply

diff --git a/netinet/sctp_output.c b/netinet/sctp_output.c
index d826a01..2ec2a36 100755
--- a/netinet/sctp_output.c
+++ b/netinet/sctp_output.c
@@ -10762,7 +10762,7 @@ do_it_again:
                        }
                        break;
                }
-               SCTPDBG(SCTP_DEBUG_OUTPUT3, "m-c-o put out %d\n", num_out);
+               SCTPDBG(SCTP_DEBUG_OUTPUT3, "m-c-o put out %d, reason %d\n", num_out, reason_code);
 
                tot_out += num_out;
                burst_cnt++;

which possibly gives an insight why no data is put on the wire after the time gap...

Final note: the amount of data buffered in the stack during the time gap could come from the application continuing to call send() at the rate it was doing before the gap. So the question is: what happens such that even the logging does not work for 53 seconds...

@kiyotsu
Copy link
Author

kiyotsu commented Mar 7, 2022

@tuexen
Thanks for the advice.

I have enabled INVARIANTS, recompiled and changed the logs as advised.
This phenomenon has been reproduced, but there is a gap in the log time again.
Please check the logs to be sure.

First, it occurs at 02:59:42.
The immediately preceding log is 02:58:51, so there is a 51 second gap.

usrsctp_22030307_1.log

(5255714): 02:58:51.776210 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 92
(5255715): 02:58:51.776263 -1/ -1|[728A1470][USRSCTP] m-c-o put out 1, reason 5
(5255716): 02:58:51.776274 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 1 chunks
(5255717): 02:58:51.776286 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4717204 unsent:92 tf:360 cooq:4 toqs:360 err:0
(5255988): 02:59:42.375879 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 55
(5255990): 02:59:42.375941 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9
(5255991): 02:59:42.375960 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 0 chunks
(5255992): 02:59:42.375975 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:475566 tf:0 cooq:0 toqs:475566 err:0
(5255993): 02:59:42.375995 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 29
(5255994): 02:59:42.376015 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9

A short time later, unsent is initialized.

usrsctp_220307_2.log

03:00:32.155886 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 120
03:00:32.155903 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9
03:00:32.155913 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 0 chunks
03:00:32.155925 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:1043505 tf:0 cooq:0 toqs:1043505 err:0
03:00:32.155938 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 87
03:00:32.155952 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9
03:00:32.155963 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 0 chunks
03:00:32.155973 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:1043592 tf:0 cooq:0 toqs:1043592 err:0
03:00:32.238768 -1/ -1|[6ECBA470][USRSCTP] Ok, Common input processing called, m:0x74dbd348 iphlen:0 offset:12 length:56 stcb:0x74dd9bd0
03:00:32.238788 -1/ -1|[6ECBA470][USRSCTP] stcb:0x74dd9bd0 state:8
03:00:32.238801 -1/ -1|[6ECBA470][USRSCTP] sctp_process_control: iphlen=0, offset=12, length=56 stcb:0x74dd9bd0
03:00:32.238815 -1/ -1|[6ECBA470][USRSCTP] sctp_process_control: processing a chunk type=5, len=44
03:00:32.238826 -1/ -1|[6ECBA470][USRSCTP] SCTP_HEARTBEAT_ACK
03:00:32.238842 -1/ -1|[6ECBA470][USRSCTP] Check for chunk output prw:4718592 tqe:1 tf=0
03:00:32.266012 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 55
03:00:32.266184 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9
03:00:32.266672 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 0 chunks
03:00:32.266729 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:55 tf:0 cooq:0 toqs:55 err:0
03:00:32.266786 -1/ -1|[728A1470][USRSCTP] Send called addr:(nil) send length 29
03:00:32.266819 -1/ -1|[728A1470][USRSCTP] m-c-o put out 0, reason 9
03:00:32.266831 -1/ -1|[728A1470][USRSCTP] Ok, we have put out 0 chunks
03:00:32.266843 -1/ -1|[728A1470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:84 tf:0 cooq:0 toqs:84 err:0

If this failure persists, the system will run out of memory.
Is there any way to detect or prevent this condition from happening in my application?

usrsctp_220307_1.log
usrsctp_220307_2.log
usrsctp_220307_3.log

@boivie
Copy link

boivie commented Mar 9, 2022

I just want to add that you can also get Chrome to log more - including a packet trace that can be converted to a PCAP file and opened in Wireshard:

/path/to/chrome --enable-logging=stderr --v=4 2>log.txt

The file can be converted to a pcap (requires text2pcap that comes with Wireshark):

cat log.txt| grep SCTP_PACKET | text2pcap -n -i 132 -D -t '%H:%M:%S.' - sctp.pcap

Obviously, Chrome will run slower with these debug logs enabled...

@kiyotsu
Copy link
Author

kiyotsu commented Mar 29, 2022

@tuexen
We succeeded in logging no gaps.
The thread number where this occurred is 0x75B71470

17:14:59.552463 has been successfully send

(1408127): 17:14:59.552463 -1/ -1|[75B71470][USRSCTP] Send called addr:(nil) send length 40
(1408128): 17:14:59.552515 -1/ -1|[75B71470][USRSCTP] m-c-o put out 1, reason 5
(1408129): 17:14:59.552527 -1/ -1|[75B71470][USRSCTP] Ok, we have put out 1 chunks
(1408130): 17:14:59.552539 -1/ -1|[75B71470][USRSCTP] USR Send complete qo:0 prw:4716625 unsent:40 tf:427 cooq:6 toqs:427 err:0

From the next transmission,tf 0.
Also, the "m-c-o put out" log is not being output after the first transmission when tf 0.

(1408288): 17:14:59.702171 -1/ -1|[75B71470][USRSCTP] Send called addr:(nil) send length 55
(1408289): 17:14:59.702301 -1/ -1|[74FFF470][USRSCTP] Timer type 5 goes off.
(1408290): 17:14:59.702746 -1/ -1|[75B71470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:55 tf:0 cooq:0 toqs:55 err:0
(1408291): 17:14:59.702817 -1/ -1|[75B71470][USRSCTP] Send called addr:(nil) send length 30
(1408292): 17:14:59.702857 -1/ -1|[74FFF470][USRSCTP] Timer type 5 started: ticks=30964, inp=0x72d7a520, stcb=0x72d801a0, net=0x72d80760.
(1408293): 17:14:59.702889 -1/ -1|[75B71470][USRSCTP] m-c-o put out 0, reason 9
(1408294): 17:14:59.702902 -1/ -1|[75B71470][USRSCTP] Ok, we have put out 0 chunks
(1408295): 17:14:59.702915 -1/ -1|[75B71470][USRSCTP] USR Send complete qo:0 prw:4718592 unsent:30 tf:0 cooq:0 toqs:30 err:0

no195_usrsctp.txt

@kiyotsu
Copy link
Author

kiyotsu commented Mar 29, 2022

@boivie
It seems that "--enable-logging=stderr" is not available in Chrome on Windows hosts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants