Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Division-by-Zero and Memory Allocation Error #53

Open
spearo2 opened this issue Nov 28, 2024 · 1 comment
Open

Potential Division-by-Zero and Memory Allocation Error #53

spearo2 opened this issue Nov 28, 2024 · 1 comment

Comments

@spearo2
Copy link

spearo2 commented Nov 28, 2024

Dear authors,
It seems that there exists potential division by zero error in vector_quantize function in src/programs/cdcn_train/vq.c in line 284.

Update_codebook(bin, vector, Nvecs, codes, Ncodes, Ndim);
distortion =
Distortion_and_cluster(bin, vector, Nvecs, codes,
Ncodes, Ndim);
improvement = (prevdist - distortion) / distortion;

When an arbitrary input comes from

corpus_get_generic_featurevec(&buff, &length, Ndim);

Distortion_and_cluster function can return 0

distortion = 0;
for (i = 0; i < Nvecs; ++i) {
mindist = 1.0e+32;
nearest = 0;
mindist =
prune_distance(vector[i], codes[0], Ndim, mindist);
for (j = 1; j < Ncodes; ++j) {
codedist =
prune_distance(vector[i], codes[j], Ndim,
mindist);
if (codedist < mindist) {
mindist = codedist;
nearest = j;
}
}
bins[i] = nearest;
distortion += mindist;

t = *x++ - *y++;
dist = t * t;
for (i = 1; i < Ndim; ++i) {
t = *x++ - *y++;
dist += t * t;
if (dist > mindist)
break;
}
return (dist);

The scenarios could be when input is homogeneous or when the number of codewords matches unique vectors.

A simple possible patch would be

+    if (! distortion) {
+      printf("division by zero: ...");
+      exit(...);
+    }
284    improvement = (prevdist - distortion) / distortion;

Also, there exists a potential memory allocation error causing the program to crash
at function areadfloat_part in sphinxtrain/src/libs/libio/s3io.c

When an arbitrary value r_len is given as an argument withsizeof(float) to calloc, it can return NULL

r_buf = calloc(r_len, sizeof(float));

When an arbitrary value for cur_ctl_sf comes from

if (fgets(li->buf + li->len, li->bsiz - li->len, li->fh) == NULL) {

li = lineiter_start_clean(ctl_fp);
if (li == NULL) {
E_ERROR("Must be at least one line in the control file\n");
return S3_ERROR;
}
parse_ctl_line(li->buf,
&next_ctl_path,

cur_ctl_path = next_ctl_path;

The following trace can cause the program to crash

ret = areadfloat_part(mk_filename(DATA_TYPE_MFCC, cur_ctl_path),
cur_ctl_sf * veclen,
(cur_ctl_ef + 1) * veclen - 1,
cptr, (int *)&n_c);

r_len = e_coeff - s_coeff + 1;

r_buf = calloc(r_len, sizeof(float));

if (fread(r_buf, sizeof(float), r_len, fp) != r_len) {

A simple possible patch would be

600  r_buf = calloc(r_len, sizeof(float)); 
+       if (r_len > INT_MAX / sizeof(float)) {
+           return -1;
+       }

These error was discovered by static analysis results

@mbait
Copy link
Contributor

mbait commented Dec 3, 2024

Thank you for your report. Divising by zero is not a thing for floating-point computations, as in the worst case you'll get Inf which means you fed some garbage to the func. As for the second part of the issue - if calloc() failed then you're out of memory and even if you catch that error, the program will crash soon anyways as the graceful shutdown hasn't been implemented in CMUSphinx in general. You're, however, are welcome to make a patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants