Draconisaurus wrote:
Alright so I'm still confused, how do you choose what number of random characters to use? Since for example in this case we have no idea how many characters come in-between... If you select 3, doesn't that mean all your results would be "SPEC-@@@RUN"? Should a high number like 9 be selected, allowing for a range of 0 to 9 in-between characters, for example?
Actually, if you choose 3, it would be "SPEC-******* RUN" (you forgot the space at the beginning of " RUN"), since 4 characters are calculated automatically in addition to the 3 random ones being tested...
The thing is: for each and every 32-bit (8 bits per byte, 4 bytes) CRC-32 value, there is exactly one 32-bit (4 bytes) "string" that matches it (that is, if you calculate the CRC-32 hash for that 4-byte "string", you'll get that exact unique value). Those 4 bytes are the ones displayed in the 4 text boxes next to the label "ASCII".
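To make the "exactly one 4-byte string per CRC-32 value" point concrete, here is a rough Python sketch that works the 4 bytes out backwards from a target hash. Big assumption: it uses the standard zlib CRC-32 (reflected polynomial 0xEDB88320), which may or may not be the exact variant Trespasser or the tool uses, and the names four_bytes_for, TABLE and TOP_BYTE_TO_INDEX are made up for this example.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by zlib (assumed variant)

def make_table():
    """Build the usual 256-entry lookup table for byte-at-a-time CRC-32."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table

TABLE = make_table()
# Every table entry has a different top byte, so each byte step can be undone.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}

def four_bytes_for(target_crc, prefix=b""):
    """Return the 4 bytes X such that zlib.crc32(prefix + X) == target_crc."""
    state = zlib.crc32(prefix) ^ 0xFFFFFFFF  # internal register after the prefix
    work = target_crc ^ 0xFFFFFFFF           # internal register we must end on
    for _ in range(4):                       # undo the 4 byte steps, last one first
        idx = TOP_BYTE_TO_INDEX[work >> 24]
        work = (((work ^ TABLE[idx]) << 8) | idx) & 0xFFFFFFFF
    # 'work' is now the 4 wanted bytes XORed with the starting register
    return (work ^ state).to_bytes(4, "little")

if __name__ == "__main__":
    target = zlib.crc32(b"bone")
    print(four_bytes_for(target))            # hash of a bare 4-character string
    print(four_bytes_for(target, b"spec-"))  # same hash, different 4 bytes after a prefix
```

Whatever prefix you pass in, there is always one 4-byte answer, but most of the time those bytes fall outside the valid character range.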
But we are interested only in the combinations made up of valid characters. What does that mean? Well, for starters, byte values over 127 (128 to 255) aren't standard ASCII characters, so we ignore them. Then, from the remaining group from 0 to 127, Trespasser ignores uppercase letters (because it first converts everything to lowercase) and some other characters, so the ranges 0-31 (non-printable control characters, mostly) and 64-95 (uppercase letters and some more, including the @ symbol, [, ], \, ^ and _) are discarded too. That leaves us with 64 valid characters out of the 256 possible values for a single byte, and that's why there isn't always text visible in the box labeled "String" below the 4 "ASCII" ones: the string is displayed only when all 4 values are in the valid range. That's a 64/256 = 1/4 chance for a single character, or (1/4)^4 = 1/256 chance for 4 characters in a row. CRC-32 values like 2FE21360 (corresponding to the 4-character string "va16"), C9508DDA ("vh10"), 3546BF77 ("bone") or E4C41E16 ("amb1") will display results immediately, without needing to test combinations of random characters.
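As a quick illustration of the valid range described above, here is a small Python sketch. The names VALID and is_valid are just for this example, and the ranges are taken straight from the description (32-63 and 96-127), not from the tool's actual source.

```python
from fractions import Fraction

# Byte values left after discarding 128-255, 0-31 and 64-95.
VALID = set(range(32, 64)) | set(range(96, 128))

def is_valid(four_bytes):
    """True only when every byte falls in the valid range."""
    return all(b in VALID for b in four_bytes)

p_one = Fraction(len(VALID), 256)  # chance for a single byte
p_four = p_one ** 4                # chance for 4 bytes in a row

print(len(VALID), p_one, p_four)             # 64 1/4 1/256
print(is_valid(b"bone"), is_valid(b"Bone"))  # True False
```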
Now let's say we add a single character in front of our 4-byte calculated string, testing 64 different alternatives (actually, we are only using 41 possible characters by default, but let's assume it's 64 to simplify things). Then our 1/256 chance becomes [(64/256)^4]*64, or 1/4. With 2 additional characters to test, each one cycling through 64 possibilities, it becomes [(64/256)^4]*(64^2) = 16, which means we should find about 16 valid matching strings on average. With 3, it becomes [(64/256)^4]*(64^3) = 1024, meaning we'd expect a list of around 1024 possible matching strings, each one 7 characters long. And so on.
The discrepancies in the results are due to the fact that we are not testing 64 possible characters for each position, but a lower number (41 by default, as I mentioned, or fewer if we further limit the allowed characters in the "Charset" text box). With 1 character we get [(64/256)^4]*41 or ~0.16 = 16.016% chance; with 2 characters, [(64/256)^4]*(41^2) or ~6.566 = 6 strings plus a 56.641% chance of a 7th; with 3 characters, [(64/256)^4]*(41^3) or ~269.22 (meaning we should get around 269 results, with a 22.27% chance of a 270th), etc. With each character added, the expected number of valid results is multiplied by the size of the charset, so roughly 41-fold with the default settings.
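The arithmetic in the last two paragraphs can be checked with a few lines of Python. This is only a sketch of the expected-value formula used above; expected_matches is a made-up helper name, and the results are averages, so the count for any particular CRC-32 value will vary.

```python
from fractions import Fraction

# Chance that the 4 calculated bytes are all valid characters: (64/256)^4 = 1/256.
P_FOUR = Fraction(64, 256) ** 4

def expected_matches(charset_size, extra_chars):
    """Average number of valid strings when testing charset_size^extra_chars prefixes."""
    return P_FOUR * charset_size ** extra_chars

for n in (64, 41):
    for k in (1, 2, 3):
        e = expected_matches(n, k)
        print(f"charset={n} extra={k}: {float(e):.3f}")
```

Each extra character multiplies the expected count by the charset size, which is why the list grows so quickly.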
There may be an error somewhere in my math, but that's more or less it...