First, let's start by describing what a CRC is. CRC stands for "Cyclic Redundancy Code". In data transmissions, CRCs are used as a way to detect possible errors during the transmission (there are other error detection codes aside from CRCs, though). The CRC for a data packet is calculated at the source and transmitted together with the data packet, then at the receiving end the CRC is recalculated and compared to the transmitted one. If they don't match, it means either the data or the CRC were corrupted during transmission, and they need to be retransmitted. It is possible for an error to go undetected, though, if the data and the CRC are corrupted in such a way that the received CRC matches the CRC recalculated from the received data; the probability of this kind of undetected error happening depends on the CRC polynomial used. The simplest CRC is a single parity bit. Commonly used ones are called CRC-16 and CRC-32, though technically ther are several possible ones. Take a look at this article on Wikipedia for some more details, I don't want to get to deep into it here.
Andres James had determined some years ago that Trespasser used the standard CRC-32 algorithm to calculate hash values for texture and sound names, though he didn't give details. I ran some tests and found out that it was first converting the strings to lowercase, before calculating the standard CRC-32 value.
So, why does Trespasser use CRC-32 hashes to store the sound names, instead of the actual names themselves? The answer appears to be: to save space on the audio data tables (the .tpa files). A CRC-32 value is only 4 bytes (32 bits) long, while apparently, Trespasser would use up to 32 characters for a sound name.
Some sound names themselves appear on the different levels where they are used. But, since Trespasser only stores the CRC-32 values in the tables and not all the sounds from the tables are used in the retail levels, we don't know the actual names of the unused sounds in the table. There have been trial-and-error attempts to find more sound names, which have been successful to some degree, but still several one remained unknown.
That's when I started to analyze the CRC-32 algorithm to see if there was any way to obtain the original string back from the CRC-32 value. At first sight, the answer seemed to be "no", since the CRC-32 algorithm discards data over the process...
Taking the table-driven algorithm, which processes the data 8 bits at a time and uses a 256-entry table of precalculated 32-bit values to obtain the CRC faster, I noticed the topmost 8 bits of each entry in that table were unique. So, if you start from the final CRC value, you can find which table entry was used to obtain it, and as you go back from the last character, you can do the same for a total of 4 entries. After that, you obtain a 4-byte sequence which has the same CRC-32 value you started with - it's not necessarily the original string (unless it was 4 characters long), but it can be used in its place. Of course, if they aren't 'printable' characters, they won't bee too useful for our Trespasser-specific purpose, but never mind... we'll deal with that later.
The 'lookback' process can be made faster by using a second table, instead of having to look through the entries on the original table for one with the matching upper 8 bits. Here's some JavaScript code that generates both tables simultaneously:
Code: Select all
crc_32_poly=0xedb88320;
crc_table = new Array(256);
crcRev_table = new Array(256);
for (var i=0; i<256; i++) {
crc=i;
for (j=8; j>0; j--) {
if ((crc & 1) == 1)
crc = (crc >>> 1) ^ crc_32_poly;
else
crc >>>= 1;
}
crc_table[i]=crc;
j = crc_table[i] >>> 24;
crcRev_table[j] = i + ((crc_table[i] & 0x00ffffff) << 8);
}
http://www.woodmann.com/fravia/crctut1.htm
Basically, he's arriving at the same conclusions I did, although he implements a somewhat different calculation algorithm. So, in order to find a string of more than 4 bytes, you'd have to resort to cycling through possible charcater combinations and checking the resulting CRC-32 value. More on that later.
What if you know (or suspect, rather) the last N characters of the original string? Well, you can go back from the last one and obtain a new CRC-32 value corresponding to the original string minus those N final characters, basically following the same operations used to calculate the 4-byte sequence mentioned previously. Here's the function I wrote in C++ to handdle both at the same time:
Code: Select all
unsigned long CRC32rev(wxString anyTextString, unsigned long Crc_value) {
int StringLength=anyTextString.Length();
for (int i=0; i<StringLength+4; i++) {
Crc_value = Crc_value << 8 ^ crc32rev_table[Crc_value >> 24] ^ ((i<StringLength)?anyTextString.GetChar(StringLength - i -1):0);
}
return Crc_value;
}
Similarly, what if you know (suspect) the first M characters of the original string? Well, let's see. Usually, the standard CRC-32 algorithm is described as having an initial value of 0xFFFFFFFF, a polynomial of 0x04C11DB7 (0xEDB88320 if reflected) and a final XOR value of 0xFFFFFFFF. But I prefer to see it as an initial value of 0x00000000, and an initial and final XOR value of 0xFFFFFFFF. Why? Because then you can concatenate CRC calculation blocks nicely. Since applying two concatenated CRC operations will have a final and initial XOR value of 0xFFFFFFFF, they cancel each other, and the same happens with any further blocks. This means you can omit the XOR operations and just make it so each consecutive block is getting the previous block's result as it initial value - you only need to XOR the first block's initial value with 0xFFFFFFFF and the result of the last block likewise. So, here's my C++ function for that:
Code: Select all
unsigned long CRC32(wxString anyTextString, unsigned long Crc_value) {
int StringLength=anyTextString.Length();
for (int i=0; i<StringLength; i++) {
Crc_value = Crc_value >> 8 ^ crc32_table[(anyTextString.GetChar(i) ^ Crc_value) & 0xFF];
}
return Crc_value;
}
Here are the functions used to check the input data (leading and trailing strings) and perform the described operations correspondingly:
Code: Select all
/*
* LeadStringUpdateUI
*/
void wxCRCrevNonFrm::LeadStringUpdateUI(wxUpdateUIEvent& event)
{
//
// calculate the CRC-32 for the initial string
//
wxString anyTextString=LeadString->GetValue();
crc0 = CRC32(anyTextString,0xFFFFFFFF);
CRC_buf[0]=crc0;
revCRC();
}
/*
* TrailStringUpdateUI
*/
void wxCRCrevNonFrm::TrailStringUpdateUI(wxUpdateUIEvent& event)
{
//
// calculate the CRC-32 without the final string
//
wxString anyTextString=TrailString->GetValue();
crc1=CRC32rev(anyTextString,~hex2dec(CRCinput->GetValue()));
revCRC();
}
Code: Select all
void wxCRCrevNonFrm::revCRC(void) {
unsigned long crc;
//
// find the 4-byte string
//
crc = crc0 ^ crc1;
//
// display the results
//
wxString CRCstring=dec2hex(crc);
std::string sOutput;
Code1->ChangeValue(CRCstring.Mid(6,2));
Code2->ChangeValue(CRCstring.Mid(4,2));
Code3->ChangeValue(CRCstring.Mid(2,2));
Code4->ChangeValue(CRCstring.Mid(0,2));
sOutput+=(crc&0xff);
sOutput+=((crc&0xff00)>>8);
sOutput+=((crc&0xff0000)>>16);
sOutput+=((crc&0xff000000)>>24);
Code->ChangeValue(sOutput);
}
So, this covers the initial part of my CRC-32 reversing program. I'll add the details and story of the rest in a later post, this one is already too long and I'm getting tired of typing.