LtSten wrote:
Right, I've finally added some proper command line arguments to my tool (and done a little bit of optimisation). I've also written a couple of scripts to extract updated word lists and CRC values, rather than doing it manually each time. All the fun stuff is in the attached zip.
Firstly, there are two Python scripts (you'll need xlrd to run them, however). Both operate on all xls files in the current working directory:
- ExtractWords.py goes through each sheet of each workbook and extracts values from the columns containing "names", and outputs the results to words.csv (which is just one entry per line) - the result of running it on all of machf's xls files as of today gives the attached words.csv.
- ExtractCRCs.py is similar, extracting all name-CRC pairs (where the CRC is the "ID" column - that is, the one that isn't CRCrev), outputting to "crc.csv". I've included the result of this as above.
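As a rough illustration of what the word extraction amounts to (a sketch only, not the actual ExtractWords.py): assume each sheet has already been read into a list of rows, with the first row being the header. The function names, the header-matching rule and the in-memory row format are all assumptions made for this example; the real script reads the xls files through xlrd.

```python
def extract_names(rows, header_keyword="name"):
    """Return the non-empty values of every column whose header mentions header_keyword.

    ASSUMPTION: rows[0] is the header row and a column qualifies when its
    header contains the keyword (case-insensitive). The real script may differ.
    """
    if not rows:
        return []
    header = rows[0]
    columns = [i for i, cell in enumerate(header)
               if header_keyword in str(cell).lower()]
    names = []
    for row in rows[1:]:
        for i in columns:
            if i < len(row):
                value = str(row[i]).strip()
                if value:
                    names.append(value)
    return names


def to_words_csv(names):
    """Format the names as one entry per line, as described for words.csv."""
    return "\n".join(names) + "\n"
```

Calling extract_names on each sheet and writing to_words_csv of the combined result would give a file in the one-entry-per-line shape described above.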
Now comes the C++ CRCFinder, which is what I've been using to find these CRCs. It's an entirely command-line tool (x64 binary included) and likely requires the latest VC++ runtime available from here. You can run "CRCFinder.exe --help" to get an idea of what the possible parameters are. An example use is
Code:
CRCFinder.exe -t 8 -d TresWordsTrim.txt -c crc.csv -s "-| | - |" -e "1|01|a" -a 3
CRCFinder.exe --threads 8 --dictionary TresWordsTrim.txt --crc crc.csv --separators "-| | - |" --endings "1|01|a" --appendcount 3
where I've also given the somewhat more understandable long-form version, equivalent to the first, and you should choose the thread count to match the number of threads you have available (or want to use).
As a minimum, you should specify:
- -d / --dictionary, the text file containing one word per line, each of which is used as an allowable word (these should be lowercase)
- -c / --crc, a CSV of crc,text pairs, one per line. Each text string equal to [unknown] is used to populate the list of CRCs to find.
- -s / --separators, a string delimited by vertical pipes (i.e. the "|" character) of separators to place between words
- -e / --endings, as with separators, except these are appended only at the very end (i.e. never between words)
- -a / --appendcount, the maximum number of words to try and stick together. 2 or 3 is a sensible choice - I'd only recommend 4 with a very short word list. For four-word searches, it's often better to specify something like --prefix "gun - " and get three words appended to these combinations.
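To make the roles of these parameters concrete, here is a minimal Python sketch of the kind of search they describe. It is not the CRCFinder source. The names load_targets, stand_in_hash, candidates and search are made up for this example, and zlib.crc32 is only a placeholder hash: whether it matches the CRC the game uses is not something this sketch claims (the real routine is in Source/CommonCore/CRC32.h). Trying every separator independently in each gap, and using the endings list exactly as given, are also assumptions about the tool's behaviour.

```python
import csv
import io
import itertools
import zlib


def load_targets(csv_text):
    """Collect the CRCs whose text column is "[unknown]" from crc,text rows."""
    targets = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[1].strip() == "[unknown]":
            targets.add(int(row[0], 16))  # ASSUMPTION: CRC column is hexadecimal
    return targets


def stand_in_hash(text):
    # ASSUMPTION: placeholder only; swap in the real CRC routine to get real matches.
    return zlib.crc32(text.encode("ascii")) & 0xFFFFFFFF


def candidates(words, separators, endings, max_words, prefix=""):
    """Yield prefix + word(+separator+word)... + ending for 1..max_words words."""
    for count in range(1, max_words + 1):
        for combo in itertools.product(words, repeat=count):
            # One separator choice per gap between consecutive words.
            for seps in itertools.product(separators, repeat=count - 1):
                body = combo[0] + "".join(s + w for s, w in zip(seps, combo[1:]))
                for ending in endings:
                    yield prefix + body + ending


def search(words, separators, endings, max_words, targets,
           prefix="", hash_fn=stand_in_hash):
    """Return (crc_hex, text) for every candidate whose hash is a target."""
    hits = []
    for text in candidates(words, separators, endings, max_words, prefix):
        value = hash_fn(text)
        if value in targets:
            hits.append((format(value, "08x"), text))
    return hits
```

In this sketch the pipe-delimited arguments would be split with "-| | - |".split("|"), which leaves an empty string as the last separator (words joined directly), and an empty string has to be added to the endings list if candidates with no ending should be tried.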
There's also a "--generatewordlist filename" option, which takes a name list generated by ExtractWords.py and splits them up into individual words on spaces and dashes (so the name "GUN - Pistol - Drop - Wood" would get converted into four words: "gun", "pistol", "drop", and "wood"). This is how I generated the attached TresWords.txt. This often needs a little bit of a manual pass (it tries to remove duplicates, those ending in numbers, but a few anneammophrase get through since they end in letters instead) to filter a few remaining irrelevant entries.
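The splitting described here can be sketched in a few lines of Python. This is only an approximation of what --generatewordlist does: the function name is hypothetical, and the exact rules (lowercasing, splitting on runs of spaces and dashes, dropping tokens that end in a digit, keeping first occurrences only) are my reading of the description rather than the tool's actual code.

```python
import re


def generate_word_list(names):
    """Split names into lowercase words, skipping duplicates and tokens ending in a digit."""
    seen = set()
    words = []
    for name in names:
        # Split on any run of spaces and/or dashes.
        for token in re.split(r"[ \-]+", name.lower()):
            if not token or token[-1].isdigit() or token in seen:
                continue
            seen.add(token)
            words.append(token)
    return words
```

With this sketch, generate_word_list(["GUN - Pistol - Drop - Wood"]) gives the four words "gun", "pistol", "drop" and "wood", in that order.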
Running the CRC finder generates an output file (defaults to "output.csv", but you can override this with --output, and I've included the results of running the above too) containing matches for CRCs provided in the CRC list. Usually most of these are not correct, but if you've specified a good enough word list (just the right balance between possibilities and final results), then you should have a reasonable number of results to sift through manually. If one looks plausible, it's often worth finding it in machf's spreadsheet and checking what the nearby strings are, since usually it's part of a relevant group.
Going to look at all that later. Thanks!
Quote:
If you run the above, then there's a few it finds that I think are new, too:
- be298162,"testambient1" (present in PCGamerTPAs, E3TPAs, V55TPAs only - Ambient.tpa). 2720D0D8 is then given by "testambient2".
- 86fa9db4, "spec-car squeak" (present in TresTPAs, DemoTPAs, and Beta 96/97/99/103)
- f22b45c0, "spec-generator motor" (V55, Beta 96, 97, 99, 103)
So... they used "car" instead of "vehicle". I should have guessed it. It's even simpler.
"Testambient" makes a lot of sense... AND it's also buried among my results for 8 random characters (crackcrc2-8char.txt), if you search for it. It's just that that list is SO long I didn't get to go through all of it yet...
"Generator"... hmmm.. I always thought it was some other piece of the GeoThermal Plant...
Quote:
372D5110 and AB624FEB sound like they pair with "spec-car squeak" - I had a very quick go but I haven't been able to crack them yet.
Well, using as a reference the "trailer fall" and "trailer squeak", my guess is that they're some variation of "car fall", then...
(And indeed, judging by your later post, I see that it was.)
Quote:
As well as a Trespasser word list, I've included the two other word lists I've been using: commonwords.txt, which has the 10000 most commonly used words in English, and words.txt, which is a huge list of 466550 words. commonwords.txt works best with 2 appendings, and words.txt with just 1.
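A back-of-the-envelope way to see why the bigger lists want fewer appendings: count the candidate strings and compare that with the 2^32 possible CRC values. The sketch below assumes the hash spreads candidates uniformly over 32 bits, which is only an approximation, and both function names are hypothetical.

```python
def candidate_count(n_words, max_words, n_separators=1, n_endings=1):
    """Number of strings built from 1..max_words words, one separator choice per gap."""
    return sum(n_words ** k * n_separators ** (k - 1) * n_endings
               for k in range(1, max_words + 1))


def expected_false_matches(n_candidates, n_targets):
    """Expected chance hits, ASSUMING hash values are uniform over 2**32."""
    return n_candidates * n_targets / 2 ** 32
```

For commonwords.txt with 2 appendings and a single separator, candidate_count(10000, 2) is 100,010,000, a little over 2% of 2^32, so chance matches stay rare. Going to 3 appendings pushes that to about 10^12 candidates, which under the same assumption means a couple of hundred chance matches for every unknown CRC.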
Lastly, I've included the source code under the Source_CRCFinder_Standalone folder in the zip. It's a Visual Studio (2019) solution, requires reasonable C++20 support (concepts, mostly - I'm building against 16.8.3, I think you'll require at least 16.

and I'd recommend building it in Release for now. The main bits of interest will be Source/CRCFinder/CRCFinder.cpp and Source/CommonCore/CRC32.h.
Going to look at that, as I said.
LtSten wrote:
Ran "spec-car " through using commonwords.txt with 2 appendings. This gave me 372d5110 as "spec-car fall final". I then went and guessed "spec-car fall impact" for ab624feb. These definitely fit the sounds!
See? They follow the same pattern...
Quote:
Update: Combined the Tres list with the common words. c7d81246 seems to be "spec-workstation boot".
8e748be0 (non-retail) could be "spec-water pipes01". I'm not very convinced though - I've checked the sound and it's not obviously correct (at least to me).
Ah, well, "workstation" coming after steam makes sense, and it WAS something starting.
As for "water pipes"... I'll listen to the sound. It does fit with the previous ones, though...
EDIT: well, it's steaming hot water, after all. Doesn't sound like regular cold water. I'm including it as the name is too fitting not to be it. And that was one of the parts of the GeoThermal Plant for which I was looking for a possible sound... but I was trying things like "production well" (as opposed to "re-injection well").
Quote:
Update 2: 2720d0d8 could be "amb-large forest - test-a" from the E3, PCGamer, and V55 TPAs. This gives "amb-large forest - test-b" for be298162.
Nah, those are most likely the previous ones, "testambient1" and "testambient2" as the earlier builds have several names like that...