Archive infectors : Generalities
By Unknown MnemoniK/iKx


Introduction :

Archive infectors aren't common in the wild. Still, scanners do scan archive files, because archives are at the center of software and data exchange: they travel on diskettes, CD-ROMs, networks and of course the Internet. You see why they are a good way of spreading? There are already some archive infectors, like Zhengxi (the best one, I think), and there's also a zip infector disassembled in 29A#2. A few months ago I took an interest in the subject, so I began to study archives. In fact, there's nothing complex in it: you just need to understand how the packets are stored and how the CRC32 works.

How do compression formats work in general ?

In 99% of archivers, the data is organized as in this scheme :

Now let's take a look at a compressed data block. It is divided into two parts: the header data and the compressed zone. I am only interested in the header part; the compressed zone will simply be the virus. Let's see how it is generally organized (for the common case; I will detail how it works for ZIP, RAR and ARJ later)

NB: After the compressed data there is sometimes extra data. It is not used for the file itself but for comments, or for some shadow data such as zip for mac

This is a common header; sometimes a few extra fields are added that are not really important. To fake a header, you just need to pick one or use a model. After that, you create the name and compute the new CRC32 of the virus, drop the header, drop the virus and it's done.

It appears really too simple, yeah. In RAR or ARJ you use this scheme, but for ZIP you have more complex things to resolve

The CRC32

The CRC is one method of checking the integrity of a file. Imagine that your message transits over a non-secure network: you can't be sure of the message you have received, so you make a calculation :

The host has received this message: I _ L O C K _ Y O U. Ermmm... the safe way to check whether this is the right message is to see if the CRCs are the same. In this case the CRC equals 687, understand ?
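The toy calculation above can be reproduced in a few lines of Python. This is only a sketch under an assumption: the checksum is taken to be the plain sum of the ASCII values of the message, with the underscores standing for spaces, which is what gives 687. The name simple_checksum is hypothetical, and this is not a real CRC.

```python
def simple_checksum(message: bytes) -> int:
    """Toy checksum: the sum of all byte values (an assumption, not a real CRC)."""
    return sum(message)


sent = b"I LOCK YOU"      # what the sender wrote (checksum sent along with it)
received = b"I LOCK YOU"  # what arrived intact
damaged = b"I LOOK YOU"   # one byte changed on the way

print(simple_checksum(sent))                               # 687
print(simple_checksum(received) == simple_checksum(sent))  # True
print(simple_checksum(damaged) == simple_checksum(sent))   # False
```

A plain sum misses many kinds of damage (two swapped bytes give the same total, for instance), which is one reason real archivers use the CRC32 instead.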

In reality things are more complex, but they are the same for all kinds of compressors. The CRC32 calculation is done in two steps: first the creation of a table, second the calculation over each byte of data

The Table

The table is 1024 bytes long (256 dwords); it is built with a 32-bit algorithm. The idea of the table is great because it will be the same on every computer. Let's see the code; I picked this one from PKUNZJR.COM, which I disassembled.

After that, you will have a zone of 1024 bytes which holds the same fixed data on all computers. The method of calculation is the same for the majority of archive formats (and the most common ones)
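For readers who prefer a high-level language, here is a minimal Python sketch of how such a table can be built. It is not the PKUNZJR.COM routine; it is the usual bit-by-bit construction, assuming the reflected polynomial EDB88320h used by the ZIP-style CRC32. The names CRC32_POLY and build_crc32_table are hypothetical.

```python
import struct

CRC32_POLY = 0xEDB88320  # reflected polynomial of the ZIP-style CRC32 (assumed)


def build_crc32_table():
    """Return the 256 table entries, one 32-bit value per possible byte."""
    table = []
    for index in range(256):
        value = index
        for _ in range(8):
            # Shift right one bit; fold in the polynomial when a 1 falls out.
            if value & 1:
                value = (value >> 1) ^ CRC32_POLY
            else:
                value >>= 1
        table.append(value)
    return table


table = build_crc32_table()
raw = struct.pack("<256I", *table)  # 256 little-endian dwords

print(len(raw))              # 1024
print(f"{table[1]:08X}")     # 77073096
print(f"{table[255]:08X}")   # 2D02EF8D
```

Since nothing in the construction depends on the machine or on the data, the 1024 bytes come out identical everywhere, so the table can be either stored as-is or rebuilt at run time.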

The Calculation

The calculation works with the table: you get the current byte from the buffer, XOR it with the low byte of the CRC, and get the dword in the table that corresponds to the location of the result. Then you XOR the CRC, shifted right by 8 bits, with this dword and, when all the bytes are done, you NOT the CRC. Note that you start with FFFFFFFFh in the CRC counter. If you don't understand, don't panic and see the code
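The same steps can be sketched in Python. This is an illustration of the standard table-driven CRC32, not a transcription of any archiver's code; the names build_crc32_table, CRC32_TABLE and crc32 are hypothetical, and the result is compared against zlib.crc32 from the Python standard library as a sanity check.

```python
import zlib


def build_crc32_table():
    """Build the 256-entry table with the reflected polynomial EDB88320h."""
    table = []
    for index in range(256):
        value = index
        for _ in range(8):
            value = (value >> 1) ^ 0xEDB88320 if value & 1 else value >> 1
        table.append(value)
    return table


CRC32_TABLE = build_crc32_table()


def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF                           # start value of the CRC counter
    for byte in data:
        index = (crc ^ byte) & 0xFF            # low byte of the CRC XOR current byte
        crc = (crc >> 8) ^ CRC32_TABLE[index]  # shift, then XOR with the table dword
    return crc ^ 0xFFFFFFFF                    # final NOT, kept to 32 bits


print(f"{crc32(b'123456789'):08X}")                          # CBF43926
print(crc32(b"I LOCK YOU") == zlib.crc32(b"I LOCK YOU"))     # True
```

The final XOR with FFFFFFFFh plays the role of the NOT instruction: Python integers are not limited to 32 bits, so a plain bitwise NOT would give a negative number.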

In some words :

Infection

We can build algorithms of infection :

Common Algorithm :

These methods are quite okay. Now let's see how to infect the most common archives like ZIP, ARJ and RAR. Their structures differ in two or three things, but this is a good occasion to see and understand code.

Go to the next chapter !