The diarization and word assignment to the speakers are different with every run. Does anyone know how to solve this? #245

Open
IGaganpreetSingh opened this issue Oct 11, 2024 · 4 comments

Comments

@IGaganpreetSingh

For example, I got these transcriptions from 3 runs on the same audio file:

1st run:

Speaker 1: In case number one, two, three, eight, or ten, who should be occurring in the general court number two today, the fourth day of September 2020, before the general magistrate for the state, Mr. . However, Mr. is standing in while . This is the matter that was on the road for the purposes of application for leave to appeal. My colleague, Mr. Chuma, was ready to proceed. However, the court had noted that if the matter is not proceeding, the court I am of the opinion that no need to proceed with the inquiry unless that was not foreseeable.

Speaker 0: Thank you. Advocate?

Speaker 2: Well, these staters just found out themselves now. Yes, of the AVR, yes. Well, we can find out again. Call the official.

Speaker 1: Yes. Who?

Speaker 2: Give him the full names.

Speaker 1: Oh, yeah. He's not in our list, sorry. He's not in quarantine, but he is not in our list. He's what? He is not on quarantine. But he just said he's on quarantine.

Speaker 2: Yes, he did.

Speaker 1: He is not on our list. No, we were off. Yesterday we were off. They didn't prepare our job.

Speaker 0: But now, when I'm looking on the list, he's not on our list. Sorry.

Speaker 2: But ask him, did he not just say that he pleased?

Speaker 1: But did he not say earlier that he was on quarantine? I did so, but now when I am checking on the list here, those who prepared the recording didn't do their proper handover.

Speaker 2: Can we just see who's speaking, please?

Speaker 1: Yes.


2nd run:

Speaker 1: In case number 1, 2, 3, 8, or 26, who should be occurring in the general court number 2 today, the 4th day of September 2020, before the general magistrate, Mr. Zeke Richner, for the State, Mr. Bonan Naftali Mukuma, however, Mr. Takalani Mukuma is standing in, while Advocate Bowen represents that case. This is the matter that was on the road for the purposes of application for leave to appeal. My colleague, Mr. Chuma, was ready to proceed. However, the court had noted that if the matter is not proceeding, the court I am of the opinion that no need to proceed with inquiries unless that's what's not foreseeable. Thank you. Advocates?

Speaker 2: Well, these staters just found out themselves now. Yes, of the AVR, yes. Well, we can find out again. Call the official.

Speaker 1: Yes. Who?

Speaker 2: Give him the full names.

Speaker 1: Oh, yeah. He's not in our list, sorry. He's not on quarantine, but he is not in our list. He is what? He is not on quarantine. There is a problem. But he just said he's on quarantine.

Speaker 2: Yes, he did.

Speaker 1: He is not on our list. No, we were off. Yesterday we were off. They didn't prepare our job. But now, when I'm looking on the list, he's not on our list. Sorry.

Speaker 2: But ask him, did he not just say...

Speaker 1: But did he not say earlier that he was on quarantine? I did so, but now the... When I am checking on the list here, those who prepared the recording didn't do the proper handover.

Speaker 2: Can we just see who's speaking, please?

Speaker 1: Yes, can we just...


3rd run:

Speaker 1: In case number one, two, three, eight, or twenty-six individuals who should be occurring in the general court number two. today, the fourth day of September 2020, before the general court magistrate, Mr. Zeke Richner, for the State, Mr. Bonan Naftali Mukuma, however, Mr. Takalani Mukuma is standing in while advocating. Bowen, represent that. This is the matter that was on the road for the purposes of application for leave to appeal. My colleague, Mr. Chuma, was ready to proceed. However, the court had noted that if the matter is not proceeding, the court I am of the opinion that no need to proceed with the inquiry unless that was not foreseeable. Thank you. Advocate? It's the first word that we heard about this problem. It wasn't conveyed to us earlier. I heard it now for the first time.

Speaker 2: Well, these staters just found out themselves now. Yes, of the AVR, yes. Well, we can find out again. Call the official.

Speaker 1: Yes. Who?

Speaker 2: Give him the full names.

Speaker 1: Oh, yeah. He's not in our list, sorry. He's not on quarantine, but he is not in our list. He's what? He is not on quarantine. But he just said he's on quarantine.

Speaker 2: Yes, he did.

Speaker 1: He is not on our list. No, we were off. Yesterday we were off. They didn't prepare our job. But now, when I'm looking on the list, he's not on our list. Sorry.

Speaker 2: But ask him, did he not just say?

Speaker 1: Did he not say earlier that he was on quarantine? I did so, but now when I am checking on the list here, those who prepared the recording didn't do the proper handover.

Speaker 2: Can we just see who's speaking, please?

Speaker 1: Yes.

@MahmoudAshraf97
Owner

The diarization involves clustering, which is not deterministic. Set the seed to a fixed number and see if it helps.

@IGaganpreetSingh
Author

Thanks for your reply, @MahmoudAshraf97, but can you elaborate on where to set the seed to a fixed number? I am just a newbie.

@MahmoudAshraf97
Owner

Add this right after the imports in diarize.py:

import random
random.seed(0)

@IGaganpreetSingh
Author

Thanks for your response. I tried this as you said, but it does not solve the issue. I still get different outputs on every run, including the speaker identification.
Sometimes it detects 2 speakers, sometimes 3 (there are actually 3 speakers).
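
One possible reason the fix had no effect, offered as an assumption rather than a confirmed diagnosis: `random.seed(0)` only seeds Python's built-in `random` module. NumPy and PyTorch keep their own random number generators, so if the clustering draws from either of them, it would be unaffected. A minimal sketch that seeds all three is below. The helper name `seed_everything` is hypothetical and not part of this repository, and it is not confirmed that this makes the diarization reproducible.

```python
import random


def seed_everything(seed: int = 0) -> list[str]:
    """Seed each RNG that is importable and return the names of those seeded.

    Hypothetical helper for illustration; not part of this repository.
    """
    seeded = ["random"]
    random.seed(seed)  # Python's built-in generator only

    try:
        import numpy as np

        np.random.seed(seed)  # NumPy's legacy global generator
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch

        torch.manual_seed(seed)  # PyTorch generators
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

    return seeded


if __name__ == "__main__":
    print("Seeded:", seed_everything(0))
```

If it is tried, the call would go in the same place as the snippet above, right after the imports in diarize.py. Even with every generator seeded, some GPU operations can remain non-deterministic, so run-to-run differences may shrink without disappearing.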
