Implement an "unmunger" (second attempt) #84

Ronan-H · 2023-03-09T16:13:25Z

For issue #45, and a complete rewrite of my first attempt a few years ago: #55

Instead of splitting by regex, I'm trying out using a trie here. Going character by character, that lets you identify possible substitutions cleanly in linear time. I think it's also more complete in terms of edge cases (ie. where a substitution key is a substring of another sub key)

Example part of the overall trie:

Performance

In order to exhaust all possible substitutions though you do still have to build all combinations out recursively, so I think you can't escape the exponential runtime there. That's why the hard limit on total number of combinations is important. I've noticed though for very long password lengths, if you just copy paste some of the string substitution keys over and over, there's a slowdown. It can take more than a full second to estimate long passwords on my machine. But I suspect the source of the problem might be that this alg is running for every possible substring of the password on top. I'm not fully familiar with the overall structure and intentions of the project but maybe the generation of substrings and this new alg should be merged into one overall alg rather than multiplying eachother, if that makes sense.

MrWook · 2023-04-03T14:34:00Z

@Tostino you mentioned this implementation in the typescript repo.
I converted it to typescript, and it was really slow on my side at the beginning.

With the string "P4$$w0rd" it takes over 1000ms before it was a few ms for the same string.
The difference is that the previous implementation only took the complete translation and not the individual options as in the new implementation.
This way we got 16 translation for the new implementation and only 1 for the old one. This is mainly because the levenshtein matcher is kind of heavy which makes this really slow

I got around this issue a little bit by reversing the finalPasswords array and stop the loop if we got a full password match.
In this case the password is P4$$w0rd and the last entry of the finalPasswords array is password which is direct match in the password dictionary. But this only works for single word passwords.

This whole performance drop doesn't matter for random strings like 4@8({[</369&#!1/|0$5+7%2/4@8({[</369&#!1/|0$5+7%2/" or y9CxMeBod*qx@xoPFaLYE9yk_sJGMY-RdxbiGVjEK9g9!o.97!qENngzN8gbMYgHZD9dEu@[email protected]@jd.!D*hTm7JnVFFxLX but it is getting really bad with real multi word passwords like |-|or5e5+ableB4++ery.
This one is getting 320 translation which will be used for the normal dictionary matcher. I already use the levenshtein distance on a full password only but as the translations are full passwords it will be triggered 320 times.
So the last thing that I did, was using the levenshtein distance only on the first entry of the reverse array from finalPasswords which is a full substituted string. So the levenshtein distance will be only triggered on horsestablebattery.

Those are a few learnings with the new implementation. Maybe this is just the JS implementation, I don't know anything about the java port.
But maybe this will help you a little bit

TLTR:

reverse finalPasswords list => i think it is more likely to find a full substituted string instead of of a string where only the first char was changed.
skip others entries if we already found a full password
For example if we revert the list for P4$$w0rd, the first entry is password which can be found in the password dictionary. If we already have this we don't need to search for others.
use levenshtein distance only on the full substituted entry => with P4$$w0rd only use password

@Ronan-H thanks for this awesome implementation!

Implement an "unmunger" to solve GoSimpleLLC#45

066bd79

Tostino mentioned this pull request Mar 22, 2023

Change return value of feedback.warning when there is no warning given zxcvbn-ts/zxcvbn#148

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement an "unmunger" (second attempt) #84

Implement an "unmunger" (second attempt) #84

Ronan-H commented Mar 9, 2023

MrWook commented Apr 3, 2023

Implement an "unmunger" (second attempt) #84

Are you sure you want to change the base?

Implement an "unmunger" (second attempt) #84

Conversation

Ronan-H commented Mar 9, 2023

Performance

MrWook commented Apr 3, 2023