free_electron:
(Regarding the response from apis)
That has been described in detail earlier, but here is a short explanation: “dictionary” in “dictionary password” refers to a class of attacks, not to a language corpus or dictionary. In fact, a few very short English words may form a very good password that is also easy to remember (1): see diceware.
Note: I’m using the generic “you” below.
While s·log₂n does indeed calculate the entropy of something, that “something” is not a sequence of translations. Unfortunately, copying equations without understanding what they mean is not going to work.
s·log₂n is the entropy of a sequence of s randomly(!) (2) chosen symbols from an alphabet of size n, under the conditions that the probability of choosing each symbol is the same and the choices are independent of each other. That is not the case here. The choice of languages is not random, the probabilities are not equal, and the choices are not independent.
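To illustrate why the equal-probability condition matters, here is a minimal sketch comparing a uniform choice among 4 languages with a skewed one. The skewed probabilities are made up for illustration; they are not measurements of how anyone actually picks languages.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a single choice drawn from the given distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform choice among 4 languages: this is the case where log2(n) applies.
uniform = [0.25, 0.25, 0.25, 0.25]

# Hypothetical skewed choice: one language is strongly preferred.
skewed = [0.70, 0.20, 0.05, 0.05]

print(shannon_entropy(uniform))            # 2.0
print(round(shannon_entropy(skewed), 2))   # 1.26
```

Any deviation from the uniform distribution can only lower the result, which is why s·log₂n is an upper bound here rather than an estimate.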
Assuming for a moment that the languages were chosen randomly and there were no other issues, the equation would be:
log₂(d · (d − 1) · … · (d − s + 1)), where d is the number of languages and s is the number of words. That comes from the fact that each language is used only once, so each successive word has one language fewer to choose from. So for 3 words and, let’s say, 4 languages: log₂(4 · 3 · 2) = log₂ 24 ≈ 4.6 bits. If you improved the method and allowed reuse of languages, it would be (3) log₂(4³) = 6 bits. But that is a lot of work for little gain: for comparison, adding a single short English word provides an additional 13 bits. In other words, transforming “birdfeedbox” into “oiseaufutterdoos” is worse than doing “birdfeedbox” → “birdfeedbox cat”.
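The arithmetic above can be checked with a short sketch. The function names are mine, and the 7776-word list size is an assumption: it is the size of the standard five-dice Diceware list, which I take to be the source of the “13 bits” figure.

```python
import math

def entropy_without_reuse(d, s):
    """log2(d * (d-1) * ... * (d-s+1)): s of d languages chosen at random, each used once."""
    return math.log2(math.perm(d, s))

def entropy_with_reuse(d, s):
    """log2(d**s) = s * log2(d): s independent, uniform choices among d languages."""
    return s * math.log2(d)

# Assumption: one extra word drawn uniformly from a 7776-word Diceware list.
DICEWARE_LIST_SIZE = 7776

print(round(entropy_without_reuse(4, 3), 2))     # 4.58
print(entropy_with_reuse(4, 3))                  # 6.0
print(round(math.log2(DICEWARE_LIST_SIZE), 2))   # 12.92
```

Note that `math.perm` requires Python 3.8 or newer.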
But that’s not all, because the choices do not have equal probability and are unlikely to be independent (4). Unless you are a polyglot (5) who knows many languages very well, you will not be able to easily translate an arbitrary word into another language. That limits which language may be used at each position and hence affects the probabilities. You may try using a dictionary, but then you are introducing more things to memorize. More likely, you’ll start taking shortcuts, decreasing entropy.
The second problem is more subtle and harder to imagine, because every one of us is nearly sure that we’re choosing symbols randomly. We’re not. Unfortunately there is no valid method for estimating entropy in that case, at least to my knowledge (someone correct me if someone has found one). For years there was the famous NIST publication on that matter, but it has been disproved. It was also based on how the brain processes language, not arbitrary symbols. However, using it as a general reference point and applying it to the proposed method, we end up with an appalling result. For 3 symbols taken from an alphabet of 94, NIST (over)estimated the entropy as 8 bits. Our “alphabet” of languages is 23 times smaller, so… um… the ballpark estimate is around 0 bits of entropy. Of course this is probably an exaggeration, but it gives some taste of what to expect. From the hypothetical 4.6 bits we’re moving to a much lower value. And this is where the 3–6 bits estimate came from.
And this is only about the translation phase. It is not the only problem. The words you are choosing are not independent either. There aren’t many phrases that conform to the proposed scheme, and a large number of choices is everything. I would not be surprised if a single(!) diceware word performed better.
Of course, as has been said multiple times in this thread, and this is the reason why password managers are recommended, you should always have different passwords for different services. Even the strongest password will be useless if you use it more than once.
You do not need to believe me, or any authority. Just try it and experiment yourself! See what happens when you change the alphabet, the number of symbols in a password (just remember what a symbol is in a given method), how dependencies between positions affect the outcome, etc. If you can, consider looking at a dictionary used for actual attacks, just to get rid of the misconception about what a “dictionary word” is.
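As a possible starting point for such an experiment, here is a small sketch that enumerates every allowed sequence of 3 languages and measures the entropy directly. The language codes and the “never repeat the previous language” habit are hypothetical, chosen only to show how a dependency between positions shrinks the result.

```python
import math
from collections import Counter
from itertools import product

def joint_entropy(outcomes):
    """Entropy in bits when one element of `outcomes` is picked uniformly at random.

    Duplicated entries make that outcome proportionally more likely.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

languages = ["fr", "de", "af", "pl"]  # hypothetical set of 4 languages

# Independent positions: every combination of 3 languages is allowed (4**3 = 64).
independent = list(product(languages, repeat=3))

# Dependent positions: hypothetical habit of never repeating the previous language.
dependent = [c for c in independent if c[0] != c[1] and c[1] != c[2]]

print(joint_entropy(independent))          # 6.0
print(round(joint_entropy(dependent), 2))  # 5.17
```

Replacing `dependent` with a list that reflects your own habits (including duplicates for sequences you would pick more often) shows how quickly the number drops.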
This could all be a theoretical, academic dispute if the cost of applying the right methods were high. But nowadays having good security is practically costless.
____
(1) Though one should have to remember as few passwords as possible.
(2) Or close enough to be considered random, for example by using a CSPRNG.
(3) Which, BTW, is what apis has supplied, but in the original form: log₂(n^s) = s·log₂(n).
(4) Say “thanks” to how borked the human brain is.
(5) Yes, I am making an assumption here. But since I think we’re talking about methods useful for most people, and most people can’t easily speak even one foreign language, the assumption seems justified.