Page 1 of 1

Reversing CRC-32 values

Posted: Fri Sep 21, 2007 8:21 pm
by machf
So, you may have already seen my app for reversing the CRC-32 values used for the sound names in Trespasser. Here's a description of the algorithms used on it and how they work.

First, let's start by describing what a CRC is. CRC stands for "Cyclic Redundancy Code". In data transmissions, CRCs are used as a way to detect possible errors during the transmission (there are other error detection codes aside from CRCs, though). The CRC for a data packet is calculated at the source and transmitted together with the data packet, then at the receiving end the CRC is recalculated and compared to the transmitted one. If they don't match, it means either the data or the CRC were corrupted during transmission, and they need to be retransmitted. It is possible for an error to go undetected, though, if the data and the CRC are corrupted in such a way that the received CRC matches the CRC recalculated from the received data; the probability of this kind of undetected error happening depends on the CRC polynomial used. The simplest CRC is a single parity bit. Commonly used ones are called CRC-16 and CRC-32, though technically ther are several possible ones. Take a look at this article on Wikipedia for some more details, I don't want to get to deep into it here.

Andres James had determined some years ago that Trespasser used the standard CRC-32 algorithm to calculate hash values for texture and sound names, though he didn't give details. I ran some tests and found out that it was first converting the strings to lowercase, before calculating the standard CRC-32 value.

So, why does Trespasser use CRC-32 hashes to store the sound names, instead of the actual names themselves? The answer appears to be: to save space on the audio data tables (the .tpa files). A CRC-32 value is only 4 bytes (32 bits) long, while apparently, Trespasser would use up to 32 characters for a sound name.

Some sound names themselves appear on the different levels where they are used. But, since Trespasser only stores the CRC-32 values in the tables and not all the sounds from the tables are used in the retail levels, we don't know the actual names of the unused sounds in the table. There have been trial-and-error attempts to find more sound names, which have been successful to some degree, but still several one remained unknown.

That's when I started to analyze the CRC-32 algorithm to see if there was any way to obtain the original string back from the CRC-32 value. At first sight, the answer seemed to be "no", since the CRC-32 algorithm discards data over the process...

Taking the table-driven algorithm, which processes the data 8 bits at a time and uses a 256-entry table of precalculated 32-bit values to obtain the CRC faster, I noticed the topmost 8 bits of each entry in that table were unique. So, if you start from the final CRC value, you can find which table entry was used to obtain it, and as you go back from the last character, you can do the same for a total of 4 entries. After that, you obtain a 4-byte sequence which has the same CRC-32 value you started with - it's not necessarily the original string (unless it was 4 characters long), but it can be used in its place. Of course, if they aren't 'printable' characters, they won't bee too useful for our Trespasser-specific purpose, but never mind... we'll deal with that later.

The 'lookback' process can be made faster by using a second table, instead of having to look through the entries on the original table for one with the matching upper 8 bits. Here's some JavaScript code that generates both tables simultaneously:

Code: Select all

crc_32_poly=0xedb88320;
crc_table = new Array(256);
crcRev_table = new Array(256);
for (var i=0; i<256; i++) {
	crc=i;
	for (j=8; j>0; j--) {
		if ((crc & 1) == 1)
			crc = (crc >>> 1) ^ crc_32_poly;
		else
			crc >>>= 1;
	}
	crc_table[i]=crc;
	j = crc_table[i] >>> 24;
	crcRev_table[j] = i + ((crc_table[i] & 0x00ffffff) << 8);
}
I searched the Internet for possible alternative methods that would allow me to find a sequence of more than 4 bytes. What I found was this article:

http://www.woodmann.com/fravia/crctut1.htm

Basically, he's arriving at the same conclusions I did, although he implements a somewhat different calculation algorithm. So, in order to find a string of more than 4 bytes, you'd have to resort to cycling through possible charcater combinations and checking the resulting CRC-32 value. More on that later.

What if you know (or suspect, rather) the last N characters of the original string? Well, you can go back from the last one and obtain a new CRC-32 value corresponding to the original string minus those N final characters, basically following the same operations used to calculate the 4-byte sequence mentioned previously. Here's the function I wrote in C++ to handdle both at the same time:

Code: Select all

unsigned long CRC32rev(wxString anyTextString, unsigned long Crc_value) {
	int StringLength=anyTextString.Length();
	for (int i=0; i<StringLength+4; i++) {
		Crc_value = Crc_value << 8 ^ crc32rev_table[Crc_value >> 24] ^ ((i<StringLength)?anyTextString.GetChar(StringLength - i -1):0);
	}
    return Crc_value;
}
It takes two parameters, a string (the known N last characters) and the known CRC-32 values, and returns a new 4-byte sequence which has that CRC-32 value when added in front of the given string.

Similarly, what if you know (suspect) the first M characters of the original string? Well, let's see. Usually, the standard CRC-32 algorithm is described as having an initial value of 0xFFFFFFFF, a polynomial of 0x04C11DB7 (0xEDB88320 if reflected) and a final XOR value of 0xFFFFFFFF. But I prefer to see it as an initial value of 0x00000000, and an initial and final XOR value of 0xFFFFFFFF. Why? Because then you can concatenate CRC calculation blocks nicely. Since applying two concatenated CRC operations will have a final and initial XOR value of 0xFFFFFFFF, they cancel each other, and the same happens with any further blocks. This means you can omit the XOR operations and just make it so each consecutive block is getting the previous block's result as it initial value - you only need to XOR the first block's initial value with 0xFFFFFFFF and the result of the last block likewise. So, here's my C++ function for that:

Code: Select all

unsigned long CRC32(wxString anyTextString, unsigned long Crc_value) {
int StringLength=anyTextString.Length();
for (int i=0; i<StringLength; i++) {
                Crc_value = Crc_value >> 8 ^ crc32_table[(anyTextString.GetChar(i) ^ Crc_value) & 0xFF];
}
return Crc_value;
}
It also takes two parameters, a string (the known M first characters) and an initial value, and returns a new CRC-32 value which would be the initial value for the given CRC-32 if the given string was separated from the rest.

Here are the functions used to check the input data (leading and trailing strings) and perform the described operations correspondingly:

Code: Select all

/*
 * LeadStringUpdateUI
 */
void wxCRCrevNonFrm::LeadStringUpdateUI(wxUpdateUIEvent& event)
{
//
// calculate the CRC-32 for the initial string
//
    wxString anyTextString=LeadString->GetValue();
    crc0 = CRC32(anyTextString,0xFFFFFFFF);
    CRC_buf[0]=crc0;
    revCRC();
}

/*
 * TrailStringUpdateUI
 */
void wxCRCrevNonFrm::TrailStringUpdateUI(wxUpdateUIEvent& event)
{
//
// calculate the CRC-32 without the final string
//
	wxString anyTextString=TrailString->GetValue();
    crc1=CRC32rev(anyTextString,~hex2dec(CRCinput->GetValue()));
    revCRC();
}
The function they both call at the end is this one:

Code: Select all

void wxCRCrevNonFrm::revCRC(void) {
unsigned long crc;
//
// find the 4-byte string
//
	crc = crc0 ^ crc1;
//
// display the results
//
	wxString CRCstring=dec2hex(crc);
	std::string sOutput;
	Code1->ChangeValue(CRCstring.Mid(6,2));
	Code2->ChangeValue(CRCstring.Mid(4,2));
	Code3->ChangeValue(CRCstring.Mid(2,2));
	Code4->ChangeValue(CRCstring.Mid(0,2));
	sOutput+=(crc&0xff);
    sOutput+=((crc&0xff00)>>8);
    sOutput+=((crc&0xff0000)>>16);
    sOutput+=((crc&0xff000000)>>24);
	Code->ChangeValue(sOutput);
}
It first XORs the results of both of them, and then displays the resulting 4-byte sequence as both 4 bytes in hexadecimal and a set of 4 characters. That's the 4-byte sequence we obtain that gives us the desired CRC-32 value.

So, this covers the initial part of my CRC-32 reversing program. I'll add the details and story of the rest in a later post, this one is already too long and I'm getting tired of typing.

Posted: Fri Sep 21, 2007 11:24 pm
by machf
As I said previously, you can only find a 4-byte sequence the way I've described so far. If you want a longer string (for instance, if the 4-byte sequence contains 'non-printable' characters), you have no choice but to cycle through all possible character combinations and test each one's CRC value for a match.
(Although right now I'm thinking of a possible way of finding not 4 but 8 bytes. Or 12, or 16... more on that later.)

Performing a search on the Internet, I came across this little program:

http://w-nz.com/projects/reversecrc/
http://files.seriouszone.com/download.p ... -win32.zip

which originated from here:

http://blog.w-nz.com/archives/2005/07/15/reversing-crc/
http://blog.w-nz.com/archives/2005/07/3 ... rsing-crc/

What it does is brute-force a given number of characters, and calculate a further 4 ones in the way already described. It was used for a very similar purpose, finding matching strings for CRC values from some game (forgot which right now)... but it didn't allow you to indicate a leading or trailing string, so you have to cycle through the characters for the entire string even if you know some of them. Beyond 6 random characters, it starts to take too long for practical purposes.

So, I decided to go ahead and integrate a looping function into my program. I designed a recursive function which would go through all the 'random' characters, increasing the last one progressively until it gets back to the first character in the valid character set (for Trespasser, that character set apparently comprises the lowercase letters a-z, the numbers 0-9, and a few characters more, for a total of 41), then increasing the second-to-last character once and cycling through all possibles for the last one again, increasing the third-to-last once the second has looped through all possible characters, etc. (BTW, that other program does it in the opposite order, which makes the results a bit confusing... my way allows to have them sorted alphabetically, which provides better readability IMO, and also allowe a further improvement which I'll mention later). The function looks like this, after some further additions I'll explain later:

Code: Select all

void wxCRCrevRecFrm::perm(unsigned int z) {
unsigned long crc;
unsigned long b;
wxString anyTextString;

	for (b=ValidChars.Find(TestString.GetChar(z)); b<numChars; b++) {
			TestString.SetChar(z,ValidChars.GetChar(b));
            CRC_buf[z+1]=CRC32((z==LChars-1)?TestString.GetChar(z)+MidString:TestString.GetChar(z),CRC_buf[z]);
		if (z<RNDLength-1) {
			perm(z+1);
		} else {
			crc = crc1 ^ CRC_buf[z+1];
	        if ((ValidChars.Find(crc&0xff)>= 0) & (ValidChars.Find((crc&0xff00)>>8)>= 0) &(ValidChars.Find((crc&0xff0000)>>16)>= 0) &(ValidChars.Find((crc&0xff000000)>>24)>= 0)){
            anyTextString=TestString.Left(LChars)+MidString+TestString.Right(RChars);
				fputs(anyTextString,fp);
            	fputc(crc&0xff,fp);
                fputc((crc&0xff00)>>8,fp);
                fputc((crc&0xff0000)>>16,fp);
                fputc((crc&0xff000000)>>24,fp);
				fputs("\n",fp);
            }
        }
            if (TestString==EndString) return;
    }
			TestString.SetChar(z,ValidChars.GetChar(0));
    return;
}
I also decided to alternatively implement a non-recursive loop that would do the same as the recursive function. I thought I could use a single counter and modulo arithmetics to change the characters of the 'random' string based on the number of valid characters, but the numbers involved would have been to big for the datatypes available in C++. SO, in the end, I used a "while" loop structure very similar to the one used by the reversecrc program (but again, in the opposite direction):

Code: Select all

//
// start the non-recursive loop to go through the random characters
//
    while (true) {
            z=RNDLength-1;
			crc = crc1 ^ CRC_buf[z+1];
	        if ((ValidChars.Find(crc&0xff)>= 0) & (ValidChars.Find((crc&0xff00)>>8)>= 0) &(ValidChars.Find((crc&0xff0000)>>16)>= 0) &(ValidChars.Find((crc&0xff000000)>>24)>= 0)){
            anyTextString=TestString.Left(LChars)+MidString+TestString.Right(RChars);
				fputs(anyTextString,fp);
            	fputc(crc&0xff,fp);
                fputc((crc&0xff00)>>8,fp);
                fputc((crc&0xff0000)>>16,fp);
                fputc((crc&0xff000000)>>24,fp);
				fputs("\n",fp);
            }
            if (TestString==EndString) break;
            do {
                b=ValidChars.Find(TestString.GetChar(z));
                b++;
                b = b % numChars;
                TestString.SetChar(z,ValidChars.GetChar(b));
                for (i=z; i<RNDLength; i++) {
                    CRC_buf[i+1]=CRC32((i==LChars-1)?(TestString.GetChar(i)+MidString):TestString.GetChar(i),CRC_buf[i]);
                }
                z--;
            } while ((b==0) && (z>=0) );

    }
And the function that allows you to change the character set you ant to loop though is this one:

Code: Select all

/*
 * CharSetUpdateUI
 */
void wxCRCrevRecFrm::CharSetUpdateUI(wxUpdateUIEvent& event)
{
	if (ValidChars!=CharSet->GetValue()) {
    ValidChars=CharSet->GetValue();
	numChars=ValidChars.Length();
    reCalcStrings();
    reCalcCRCs();
    }
}
I also added a few buttons that would allow you to easily select some predetermined charcter sets, but those are unimportant. Here are the corresponding functions, anyway:

Code: Select all

/*
 * fullAlphaNumCharSetClick
 */
void wxCRCrevRecFrm::fullAlphaNumCharSetClick(wxCommandEvent& event)
{
CharSet->ChangeValue("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
}

/*
 * AlphaNumCharSetClick
 */
void wxCRCrevRecFrm::AlphaNumCharSetClick(wxCommandEvent& event)
{
CharSet->ChangeValue("0123456789abcdefghijklmnopqrstuvwxyz");
}

/*
 * AlphaCharSetClick
 */
void wxCRCrevRecFrm::AlphaCharSetClick(wxCommandEvent& event)
{
CharSet->ChangeValue("abcdefghijklmnopqrstuvwxyz");
}
So, now I had what I wanted, a program that would allow me to indicate a leading string, a trailing string, and a number of 'random' characters to be brute-forced (it has no point that said number were 0, as then we'd just have the previous case where we only calculated the 4-byte sequence). But then I realized it could be further improved, if we knew a part in the middle of the string we were looking for. Particularly, I had come across a number of entries which had a common string before the 4 final characters. So, I added the possibility to indicate a "middle string", a known string that would go between the 'random' characters and the calculated 4-byte sequence. Now you see how it's so useful to be able to concatenate CRC-32 calculations? But also, there was another series of entries which had a common string before the 6 final characters, not 4. So, I made a further addition, where you could indicate how many of the 'random' characters go AFTER the "middle string". So, if you have indicated X random characters and theen indicate that R should go after (or to the right) of the "middle string", you'd have L=X-R before (or to the left) of it. That's what this line is about:

Code: Select all

anyTextString=TestString.Left(LChars)+MidString+TestString.Right(RChars);
The functions which read both the middle string, total number of 'random' characters and number off characters to the 'right' of the middle string are these:

Code: Select all

/*
 * MidString2UpdateUI
 */
void wxCRCrevRecFrm::MidString2UpdateUI(wxUpdateUIEvent& event)
{
	MidString=MidString2->GetValue();
    reCalcCRCs();
}

/*
 * RNDcharsUpdateUI
 */
void wxCRCrevRecFrm::RNDcharsUpdateUI(wxUpdateUIEvent& event)
{
    RNDLength=wxAtoi(RNDchars->GetValue());
	RChars=wxAtoi(RightChars->GetValue());
	LChars=RNDLength-RChars;
	if (TestString.Length()!=RNDLength || EndString.Length()!=RNDLength) {
    reCalcStrings();
    reCalcCRCs();
    }
}
You surely noticed tha call to the reCalcStrings() function. That one initializes the 'test' string (the one that's going to havee its characters looped through all possible combinations) and the 'end' string, against which the 'test' string is compared to see if the loop is finished (the recursive version didn't really need the 'end' string originally, but it was very convenient to have it anyway as you'll see when i explain some further changes I made later). So, this is it:

Code: Select all

void wxCRCrevRecFrm::reCalcStrings(void) {
unsigned int i;
//
// prepare everything to start going through the random characters
//
    TestString="";
    EndString="";
    for (i=0; i<RNDLength; i++) {
	   TestString+=ValidChars.GetChar(0);
	   EndString+=ValidChars.GetChar(numChars-1);
    }
    if (StartString->GetValue()!=TestString) StartString->SetValue(TestString);
    if (FinalString->GetValue()!=EndString) FinalString->SetValue(EndString);
}
I'll explain those two "if" sentences later.

You've also noticed the call to the reCalcCRC() function. That's another idea I had to further optimize the speed of the calculations. As I already mentioned, it's possible to concatenate CRC calculations. And since we are looping the last character while the previous remain unchanged... why not precalculate the initial values for each position of the 'random' string and store them in a buffer instead of calculating the CRC for the whole string each time a character is changed? That way, when a character is changed, we only need to recalculate the initial values for the following charcaters in the 'random' string. Here is the function:

Code: Select all

void wxCRCrevRecFrm::reCalcCRCs(void) {
unsigned int i;
//
// prepare everything to start going through the random characters
//
    for (i=0; i<RNDLength; i++) {
       CRC_buf[i+1]=CRC32((i==LChars-1)?(TestString.GetChar(i)+MidString):TestString.GetChar(i),CRC_buf[i]);
    }
}
You'll notice it updates the values starting at index 1, not 0. The value for index 0 was already calculated when we got the CRC value from the trailing string. Remember this line:

Code: Select all

    CRC_buf[0]=crc0;
Now you see why it's there...

Finally we come to those to "if" lines from a while earlier. The final addition to the program was to add the possibility to manually indicate a starting and ending string for the loop, so that it wouldn't go necessarily from, for example, "aaa" to "zzz" (for 3 random characters and a character set from 'a' to 'z'), but, say, "bud" to "you" (or whatever you want). This is useful if you already know the string you're looking for lies alphabetically between another to strings, or if you don't want to leave the computer on al the while and prefer to do the loop in steps. Here are the functions that read those values:

Code: Select all

/*
 * StartStringUpdateUI
 */
void wxCRCrevRecFrm::StartStringUpdateUI(wxUpdateUIEvent& event)
{
    TestString=StartString->GetValue();
}


/*
 * FinalStringUpdateUI
 */
void wxCRCrevRecFrm::FinalStringUpdateUI(wxUpdateUIEvent& event)
{
    EndString=FinalString->GetValue();
}
The "if" lines simply reset the input fields of the starting and ending strings every time the character set or the number of 'random' characters are changed.

So, that's it. As I mentioned, I've made two non-Trespassser-specific versions of my program, too. Here they are:

Recursive version:
http://rapidshare.com/files/57330213/wxCRCrevRec.zip

Non-recursive version:
http://rapidshare.com/files/57331217/wxCRCrevNon.zip

Posted: Sun Sep 23, 2007 7:10 pm
by Remdul
Cool stuff Machf! 8)

Posted: Sun Sep 23, 2007 10:33 pm
by machf
Hmmm... there are a few things I forgot to mention, like the purpose of that ?: operator in the buffer calculations - it accounts for the middle string, if present.

I was also thinking of maybe mentioning other articles on CRCs which I read, though I didn't apply their principles on this program. Who knows, maybe later I'll do something with them...

And I started thinking of a method to find an 8-byte sequence instead of a 4-byte one... it's simpler to implement than a character loop, but it may take longer. Consider the following: as I already mentioned, CRC calculations can be concatenated, with the final CRC of a 'block' becoming the initial value of the next one. So... let's say we use a loop through all possible 32-bit values, 0x00000000 to 0xFFFFFFFF and find the corresponding 4-byte sequence that has each of those values as its CRC-32. Then, if a given 4-byte sequence is composed of characters from our valid set, use that value as the initial value to find another 4-byte sequence with our desired CRC-32 value. If the resulting sequence is also composed of valid characters, add the complete string (both 4-byte sequences) to our results. Of course, this can be further expanded to find strings of not just 8 characters but 12, 16 or more multiples of 4, by adding more 'blocks' to the process. Since on each block several results are discarded and the following blocks won't be processed for them, the total number of iterations won't be (2^32)^(N-1) -where N is the number of 4-byte blocks-, but it will decrease proportionally as N increases. As I said, it probably may take longer than a conventional character loop, but the advantage is that it's simpler to program, since you only have to run a series of numerical loops using 32-bit integers (unsigned long) and check if the results match your valid character set...

I also wonder if by using a 65536-entry table instead of a 256-entry one you could find 8 bytes at once instead of 4... I'll take a look at that too.

Re: Reversing CRC-32 values

Posted: Fri Apr 04, 2008 9:28 pm
by machf
Here are two updated versions:

Recursive version:
http://rapidshare.com/files/104906017/wxCRCrevRec.zip

Non-recursive version:
http://rapidshare.com/files/104907635/wxCRCrevNon.zip

basically, what I've added is a check so that if the 4-byte string contains non-ASCII characters, it doesn't get displayed on the screen, they are replaced by 4 empty spaces instead (because some characters were making the computer slow down if displayed, apparently):

Code: Select all

    if ((crc & 0x80808080)== 0) {
	sOutput+=(crc&0xff);
    sOutput+=((crc&0xff00)>>8);
    sOutput+=((crc&0xff0000)>>16);
    sOutput+=((crc&0xff000000)>>24);
    } else sOutput+=0x20202020;
and the other addition is a couple of text boxes next to the input boxes for the Leading and Trailing strings which display the corresponding (equivalent) new Init and CRC-32 values when they are present. The latter is particularly useful because you may recognize the CRC-32 value for an already known string in it.

Another thing I'm wondering now is whether it's mathematically possible to calculate a CRC-32 value going though the input data in reverse order (not just bitwise, but bytewise, I mean).