Encryption, Scan Strings & You
by Feathered Serpents


Intro:

So, you've written the perfect polymorphic engine, have you? You've spent countless hours perfecting it so that utilizes most of the instructions available to the CPU, make the junk instructions look real, create working subprocedures, encrypt in multiple layers, use self modifying code, and to vary the algorithm and registers used for decryption. You've really put a lot of work into this to make it as hard as possible for the anti-virus guys to find you. Unfortunately, you never were that good at encryption, thus you stuck with the basic methods of encrypting (xor, add, sub, etc). Well, guess what. You might as well not have bothered writing the polymorphic engine in the first place! That's right! The anti-virus chaps can scan string you as if your poly engine was not there at all.

How can this be? Well let's look at some examples, starting with the most lame and working our way up.

SCAN STRINGS

NOT/NEG:

Some people, which will remain nameless, think that the "not" and "neg" instructions are a form of encryption. So let's see what happens.

Virus Body:
	01 80 65 2A 31 ...
Not "Encrypted" Virus:
	FE 7F 9A D5 CE ...

Since there in no possible variation in the "encrypted" pattern, why should the AVs bother to scan for "01 80 65..." when the "FE 7F 9A..." number sequence is sitting right in front of them? They just will use "FE 7F 9A..." as the scan string.

ROL/ROR:

Other people like to use ror and a number from 0 to 7 (when working with bytes). There are two ways this can be tackled. Since there are only 8 possible variants to ror/rol, the AVs can add 8 signatures to their data files and claim another 8 viruses that they detect (how do you think we've got so many viruses?). The other method I'll get to later,

Of course, other methods are not so easy. This led AVs to develop a way look at the scrambled sections of code and find patterns that do not change. Relax. It's not that difficult. Watch:

XOR:

Ah, everyones favorite method of encryption. So, let's say you have your virus, and you encrypt it with any old number.

Encrypted with:  7A
Virus Body:
	01 80 65 2A 31 7B ...
Xor Encryption:
	7B FA 1F 50 4B 09 ...

So, what does this mean? We have (where ^ = xor, and K= key):

	(01^K) (80^K) (65^K) (2A^K) (31^K) (7B^K) ...

An interesting phenomenon was noticed. Let's xor the first and second terms of our encrypted virus. We get:

	(01^K)^(80^K)

Now, xors can be done in any order. So we can remove the brackets:

	= 01^K^80^K

Let's rearrange:

	= 01^80^K^K

But since any number xored with itself is 0, K^K=0. Thus, this is:

	= 01^80
	= 81

Oh oh. The Key value is now totally irrelevant! Let's make a series of numbers by taking the first and second, second and third, etc.

	81 (80^01)
	E5 (65^80)
	4F (2A^65)
	1B (31^2A)
	4A (7B^31)

So we get:

	81 E5 4F 1B 4A

This can be used as a scan string. Unclear? Let's go back to the example:

(Number Encrypted with:  7A)
Virus Body:
	01 80 65 2A 31 7B ...
Xor Encyption:
	7B FA 1F 50 4B 01 ...
Scan String:
	81 E5 4F 1B 4A ...	(notice it's one shorter than the above)

So, without knowing the key, let's see if "7B FA..." can be scan stringed. To start xor the first and second numbers:

	7B^FA= 81

And the next two:

	1F^FA= E5

Next three sets:

	50^1F= 4F
	4B^50= 1B
	01^4B= 4A

Scan strings match. Game over.

To sum up what happened, the AV guys figured out that you don't need to be looking for specific bytes, but rather a relationship between certain bytes. This relation does not vary with the key used in the encryption. This RELATION is then used as a scan string. That means they go along calculating the xors of consectutive bytes and store it someplace. If they get a match in what they are storing to a known relation of a virus, it's flagged. It's not quite as accurate as a regular scan string, but it's pretty close.

So, would upgrading to a 16 or 32 bit key save the XOR method? Afraid not. This is why:

(16-bit Number Encrypted with:  7A 21)
Virus Body:
	01 80 65 2A 31 7B ...
Encrypt by:
	7A 21 7A 21 7A 21 ...
Encrypted:
	7B A1 1F 0B 4B 5A ...

Now notice that the 1st and 3rd entries are encrypted with the same byte. Thus, a scan string is generated from every second byte of the encrypted virus in the same manner as above (1F^7B= 64, 4B^1F= 54, ...). This causes the area in the virus that is to be a scan string to have to be twice as long. But since a scan string is only like 15-30 bytes (?), that's not all that much. Even a 32-bit encryption key can be easily scan stringed by simply taking the first, fifth, ninth... bytes.
Since the first, fifth, ninth... bytes are always going to be encrypted by the same byte value regardless of whether an 8-bit, 16-bit or 32-bit key is used, it is a good bet that the AVs use every forth byte to scan string all xor encrypted sections.

Since increasing the key length does not help, maybe xor/adding the previous value of the plaintext together before xoring might help. I call it the "Previous Value XOR". It doesn't work either - watch:

Virus Body:
	01 80 65 2A 31 7B ...
Xor Encyption:
	(00^01)^K= 01^7A= 7B
	(01^80)^K= 81^7A= FB
	(80^65)^K= E5^75= 90
	...

The problem here is that the stuff in the brackets is a constant. It does not change, hence for all this algorithm cares, there may as well be an 81 instead of 80 in position 2 of the virus (likewise with the E5 instead of the 65). Therefore, the same principle as above works to scan string this.

ADD/SUB:

So, does the add/sub method fare any better? Nope. Let's look at an add example.

Add Encyption:

	= (7A+01) (7A+80) (7A+65) (7A+2A) (7A+31) (7A+7B)
	= 7B FA DF A4 AB F5 ...

Now, let's see what happens when you subtract the first and second elements.

	= (7A+01) - (7A+80)
	= (7A+01) - 7A - 80		[Carry minus thru brackets]
	= 7A + 01 - 7A - 80		[Remove useless brackets]
	= 01 - 80 + 7A - 7A		[Can rearrange now]
	= 01 - 80			[Cancel the two keys - Oh oh]
	= 81

Yes, that last "subtraction" looks weird, but this math is modulo 100h. That means that once you add past FFh, you start again at 0 and once you subtract past 0, you start again at FFh.

	81 1B 3B F9 B6 ...

And if we test this, we see that indeed it does indeed catch every possible encryption value for add.
Sub encryption suffers the same flaw, but instead of subtracting to get the scan string, you add the bytes.

So, does a longer key help us here? Not really. This process easily applies to 16-bit and 32-bit keys by reading every 2nd or 4th byte to get the scan strings. This makes it hard to scan for the parts encrypted by the upper half or 3/4 of the key. However, it has the same flaw as the xor - you can always apply this to every 4th bytes, and thus be able to scan all the add/sub encrypted parts with one pass (remember: speed is an issue to scanners - I doubt they can afford to make useless passes of every file they scan).

So, with that, there go the 3 most classic methods of simple encryption. Anything else that creates these scan strings? Yup!

ROR/ROL:

Back to bit rotation. There is another method, that, despite probably being slower and less accurate, is worth mentioning. For each of the bytes, rotate by some number. Choose the number of bits to rotate by to make the largest number. For example:

Virus Body:
	01 80 65 2A 31 7B ...

Since 01h = 00000001b, the largest number by rotation is 10000000b = 80h Similarly:

        80 = 10000000b -> 10000000b = 80h
        65 = 01100101b -> 11001010b = CAh
        2A = 00101010b -> 10101000b = A8h
        31 = 00110001b -> 11000100b = C4h
        7B = 01111011b -> 11110110b = F6h

Thus the scan string is:

	80 80 CA A8 C4 F6 ...

If you check against any rotation of the virus body, you will see that this method does indeed catch it.

Note 1) This scan is more likely to capture other strings as well as this scan string will just as easily flag "20 08 56 2A 4C F6" as being a match. This string could not result from the rotation encryption. This necessitates longer strings and/or checking if the string is really present by other methods.

Note 2) This scan is not actually as cumbersome as it may seem. It can be sped up drastically by generating a look up table for each number's maximum attainable value resulting from rotation.

Well, that sums up how the most popular methods can be overcome. So, is there a solution? Of course there is! If all cryptography was this easy to play with, PGP would be out of business.

SOLUTION?

A few simple things that don't scan string easily are as follows:

Look Up Table:

The classic idea of replacing one thing with another. For example, if you replace all the letter 'A's in a text with 'S's and all 'Z's with 'Y's, etc. To do this, you need a simple look up table:

LookUpTable:
	db	0
	db	1
	db	2
	...
	db	0FFh

A lot of typing? Lazy bastard! Define this macro then:

	LUmacro macro
	  cntr = 0
	  rept 100h
	    db cntr
	    cntr = cntr + 1
	  endm
	endm

then define your look up table like this:

LookUpTable:
	LUmacro

Now with everything so nicely defined, how would you go about using it?

Decryption:
	xor	eax, eax		; Clear top of eax
	mov	esi, Source		; Set up the pointers
	mov	edi, Destination
	mov	ecx, SizeToDecrypt	; Size of encrypted part
DecryptLoop:
	lodsb				; Load the encypted byte
	mov	edx, offset LookUpTable	; Point to start of look up table
	mov	al, byte ptr [eax+edx]	; Look up what the byte should be
	stosb				; Store it
	loop	DecryptLoop		; Next!

The encryption process is a little trickier. You must do the reverse process and scan the lookup table for the value, and write out the offset from the lookup table.

Encryption:
	xor	eax, eax		; Clear top of eax
	mov	esi, Source		; Set up Source & Dest
	mov	edx, Dest
	mov	ebx, SizeToEncrypt	; Size
EncryptLoop:
	lodsb				; read in value to encrypt
	mov	edi, offset LookUpTable	; edi = LookUp Table
	mov	ecx, 100h		; 256 entries in table
	repnz	scasb			; Find value in table
	not	cl			; Find offset in table(quick and dirty)
	mov	byte ptr [edx], cl	; Store it
	inc	edx			; Point to next
	dec	ebx			; Decrement counter
	jnz	EncryptLoop		; Repeat if more to encrypt

Changing the lookup table from generation to generation is fairly easy. There are two possible ways:

  1. random, but make sure no entry is there two or more times.
  2. randomly start exchanging entries in the look up table. Think of this like shuffling a deck of cards :-)

Important Reminders:

  1. DO NOT ENCRYPT THE LOOK UP TABLE! Remember, the lookup table changes from run to run, so despite it being in the open, it cannot be scan stringed.
  2. Each entry in the look up table has to be distinct and all the possible offsets into the table must be used. In math terms, the relation must be 1-1. Since you're usually going to be using a look up table with 256, this simply means that NONE of the entries in the table is allowed to match with any other entry in the table AND all values 00 through 0FFh must be used, lest decryption is not possible.

Previous Value Jumping Encryption:

Remember the section above, where I showed that just xoring in the previous value you encrypted into the next encryption? Well, if you add one little trick to that, it will not be scan stringable. The idea comes from group theory in math.

Lets say you have your virus:

	10 20 05 FE 87 CA 21 09

As you can see, this "virus" is 8 bytes long. :-) The idea is that you encrypt the current byte with the previously encrypted byte, then jump past a few bytes and repeat the process. Once you get past the end of the virus, you go back and start again. So, in this case, let's say we use the a jump value of 3 and start at the 20. That means you encrypt the 20 using the previous byte (in this case some initial value), and then you "jump" 3. This takes you to the 87. You encrypt the 87 with the 20, and jump 3 more. This takes you to the 09 which you encrypt with the 87. Now you jump 3 more, but you when you get to the end of the file, you keep on counting from the beginning. This takes you to the 05 which you encrypt with the 09. You repeat this process for the number of bytes you wish to encrypt (8 times in this case).

There are a few very important things to note about this however:

  1. The jump value (the number of bytes to jump - 3 in my example) MUST vary with each encryption, otherwise this suffers the same flaw as the previous value xor. Just think of the jump value as the encryption key. The starting position does not have to vary, but it can.
  2. The jump value MUST be relatively prime to the size of the thing you are encrypting. That means that the largest number that evenly divides both the jump value and the size HAS to be 1. If it isn't, then you are not going to cover the entire region you are trying to encrypt (and the example code I give, will probably not work). A good way to ensure this, is to make the size of the part being encrypted a prime number (get the next largest prime from a list then add a few bytes to your code to make it that big - you'll probably use less bytes then if you try to do the division tests).
  3. The jump value cannot be 0 (duh!), and it works best if it's less than the size of the region you are encrypting.

So, with that out of the way, let's do an example:
[Note: I'm using XORs, but it can be anything]

The virus:
	10 20 05 FE 87 CA 21 09
Jump Value: 3
Initial 'Last Value':  10
Starting position:  The 05

So, the 05 would become:
	10 (LastValue) xor 05 (Position)= 15
	...and last value would become 05, and we jump to the CA
CA->    05 xor CA = CF (Last Value becomes CA.  Goto 10)
10->    CA xor 10 = DA (Last Value becomes 10.  Goto FE)
FE->    10 xor FE = EE (Last Value becomes FE.  Goto 21)
21->    FE xor 21 = DF (Last Value becomes 21.  Goto 20)
20->    21 xor 20 = 01 (Last Value becomes 20.  Goto 87)
87->    20 xor 87 = A7 (Last Value becomes 87.  Goto 09)
09->    87 xor 09 = 8E (Since we did this 8 times, we are now done.)

Thus, the encrypted virus:
	DA 01 15 EE A7 CF DF 8E

Simple, no? Well, anyways, lets see some code:

Decryption:
	mov	esi, Source			; Init Source & Dest
	mov	edi, Destination
	mov	ah, InitialValue		; Initialize Previous value
	mov	ecx, SizeToDecrypt		; Init Size
	mov	edx, StartingPosition		; Init Starting Position
DecryptLoop:
	mov	al, byte ptr [esi+edx]		; Read byte
	xor	ah, al				; Decrypt with last byte
						; Last byte= unencrypted byte
	mov	byte ptr [edi+edx], ah		; Write it out
	add	edx, JumpValue			; Jump by some value
	cmp	edx, SizeToDecrypt		; Is [esi+edx] in bounds?
	jb	short DecryptSkipSub		; If not,
	sub	edx, SizeToDecrypt		;  then make it in bounds
DecryptSkipSub:
	loop	DecryptLoop

Encryption:
	mov	esi, Source			; Init Source & Dest
	mov	edi, Destination
	mov	ah, InitialValue		; Initialize Last Value
	mov	ecx, SizeToEncrypt		; Init Size
	mov	edx, StartingPosition		; Init Starting Position
EncryptLoop:
	mov	al, byte ptr [esi+edx]		; Read in byte to Encrypt
	xor	ah, al				; Encrypt with last byte
	mov	byte ptr [edi+edx], ah		; Store encrypted
	mov	ah, al				; Save last byte (unencrypted)
	add	edx, JumpValue			; Jump by some value
	cmp	edx, SizeToEncrypt		; See if esi+edx is in bounds
	jb	short EncryptSkipSub		; If not,
	sub	edx, SizeToEncrypt		;   then make it in bounds
EncryptSkipSub:
	loop	EncryptLoop

There are a lot of variations on the above code. The trickiest part is to make sure that the numbers are relatively prime. I don't feel like writing out the code, so I'll only give an outline (this can all be avoided by SizeToEncrypt being prime):

	Pick a number X to try for Jump Value between 1 and the size
	In a loop Y of 2 upto and including X:
		divide X by Y.
		If X does not divide Y evenly (remainder not 0)
			then continue in loop
		else
			Divide The Size by Y
			If this divides evenly, pick a new X and start over
			If not, continue.
If you finish the inner loop for any X, X is usable.

Multiple Encryption Instructions:

This is perhaps one of the simplest. Instead of using just one xor or add, it uses multiple instructions. It is doubtful that AV companies will go to the trouble and expense - not to mention slowing down their scanners - by testing all files with some weird algorithm for a single virus. That means that combinations such as:

	xor	al, SomeValue
	add	al, SomeOtherValue

while, theoretically, scan stringable are probably not going to be.

That said, there are a few things that must be said:

1) The instruction order for an encryption has to be the reverse of a decryption. For example, if the encryptor is:

		xor	al, SomeNum
		add	al, SomeNum2

Then the Decryptor has to be:

		sub	al, SomeNum2
		xor	al, SomeNum

2) The keys for all encryption instructions have to be random and independent.

3) Having 2 consecutive instructions of the same type is worthless as instructions of the same type can be effectively combined into one. For example a ror by 1 followed by a rol by 3 is in essence a rol by 2.

If you're making a polymorphic engine, then it's not too difficult to expand on this idea. Make the encrypting instructions themselves random. To accomplish this you do something like this:

	- ...
	- create encryption/decryption instruction pair
	- create junk
	- loop on the previous two until you are satisfied there are enough
	- ...

Creating the encryption and corresponding decryption instruction is easy with an accurate opcode listing. The problem, however, is with Rule #1 above. The instructions to encrypt must be executed in the reverse order than the ones used to decrypt. That said, there is an elegant solution. You create your own little "stack". Just define:

Stack		db	MaxStackSize dup (?)	; 30 for size should be enough
EndOfStack	equ	$
StackPtr	dd	EndOfStack

Before you start, simply "push" a RET onto the stack

	dec	StackPtr
	mov	edi, StackPtr
	mov	byte ptr [edi], 0C3h		; Put on the ret instruction

Now, whenever you add a decryption instruction to the decryptor, simply "push" the corresponding encryption instruction onto this stack. For example, if you added an "add cl, 45h" (80 C1 45) to your decryptor, then you add: "sub AL, 45h" (2C 45) to your stack. Please note, since the encryptor is in the encrypted part of the virus, you can make your life easy on yourselves and not have to worry about using different registers for your encryptor. Keep it simple and use AL. So, the code would be something like:

	; ...
	; write "80 C1 45" (add cl, 45h) to decryptor
	; ...
	sub	StackPtr, 2			; Push two bytes onto "stack"
	mov	edi, StackPtr
	mov	word ptr [edi], 452Ch		; "sub al, 45" in rev. order

When you are done creating your encryption instructions, your "stack" might look something like this:

	add	al, 0B2h
	xor	al, 97h
	ror	al, 2
	sub	al, 45
	ret

Do I really have tell you that for each byte you are encrypting you should just call this "function"? Remember, your stack pointer is neatly pointing to the first instruction. How convinient.

BTW - I fully realize that the code presented is unoptimized, but that's not the aim of this tutorial now, is it?

Shuffling:

This isn't really encryption, but it does evade scan strings. You can combine this method with just about any other, to add in encryption if you so desire.

The problem with shuffling, is not so much how to shuffle the bytes, but rather how to un-shuffle them. To do this, one of many ways is to take one byte at a time and swap it with a byte pointed to by some relation. This relation has to be random from generation to generation. If you don't vary the relation, then the shuffled part will not change from generation to generation allowing use of a simple scan string to pick it up. I recommend using the same idea as the "Previous Value Jumping Encryption" mentioned above. For example:

JumpKey: 3
Starting offset: 2 (pointing to the 03)
Encrypted Virus (call this array T):
	01 02 03 04 05
To decrypt, do:
        T[0] <-> T[2]
	= 03 02 01 04 05
        T[1] <-> T[2+3] = T[1] <-> T[0]
	= 02 03 01 04 05
        T[2] <-> T[0+3] = T[2] <-> T[3]
	= 02 03 04 01 05
        T[3] <-> T[3+3] = T[3] <-> T[1]
	= 02 01 04 03 05
        T[4] <-> T[1+3] = T[4] <-> T[3]
	= 02 01 04 05 03

All clear? There is one small caveat however. The shuffling for Encryption and Decryption have to be done in the opposite order, lest it will not un-shuffle right. I hope it's clear from the example code:

Decryption:
;----------
	mov	esi, offset Source		; Place to De-shuffle
	mov	ecx, Size
	mov	edx, StartingPosition		; Same idea as above
	xor	ebx, ebx			; Clear Current position

DecryptLoop:
	mov	al, [esi+ebx]			; Grab byte
	xchg	[esi+edx], al			; Exchange it with the other
	mov	[esi+ebx], al			; Finish the change

	add	edx, JumpValue			; Jump Around the file.
	cmp	edx, Size			; If edx is not out of bounds
	jb	SkipDecryptSub			;  then ok
	sub	edx, Size			;  else make it in bounds
SkipDecryptSub:
	inc	ebx				; Shuffle Next Byte with the
	loop	DecryptLoop			;  byte at [esi+edx]


Encryption:
;----------
	mov	esi, Source			; First you're probably going
	mov	edi, Destination		;  to have to make a copy
	mov	ecx, Size			;  (Can't scramble original)
	pusha					; These regs. we'll need later
	rep	movsb
	popa
	mov	ebx, StartingPosition		; Which register the first
						; register will be swapped with
EncryptLoop:
	sub	ebx, JumpValue			; Jump Value same idea as above
						;  but must go backwards
	jns	SkipEncryptAdd			; Jump if not Negative
	add	ebx, Size			;  Make positive

SkipEncryptAdd:
	mov	al, [edi+ebx]			; Grab byte at Jump Value
	xchg	[edi+ecx-1], al			; Replace it with current byte
						; Remember:  we need to shuffle
						;  backwards.  Thus we count
						;  down (1 less than ecx).
	mov	[edi+ebx], al			; Finish the swap
	loop	EncryptLoop

Summary:

Well, that's about all to this article. I'm sure there are many more methods than the few I described. Be creative. Combine. The possibilities really are endless. Just don't let me see and more simple XOR/ADD/SUB encryption routines...


This article brought to you by your dear friends at Feathered Serpents.

Disclaimers:
We are not responsible for anything you do from the information in this article. It's YOUR life, YOUR choices. What you choose to do with this information is something YOU have to live with. The information in this article is as accurate as possible, but due to the nature of the subject, there can be no guarantees about correctness.