New Instructions Tutorial
by Qozah


Objectives on this article

Watching most virus writers' code, I've noticed a lack of imagination when using ASM instructions: there is a cool bunch of new of them since 486 and since PPro implementations, even not keeping in mind that the new MMX is here. You should use at least the new 486 instructions. Who is going to have win32 with a 386 ? CE ? <g>. 486+ instructions work in all AMD processors, and even the Pentium ones also do ( not the Pentium Pro and above ones... but they will ).

So, I'm trying to describe some of them, as they can be very useful to you for code optimization, in polymorphism ( I'm telling :) ) and why not, to fool AVs.

Thanks to Intel for not beeing the Microsoft way hiding what these stuff does ( though I had to make research and testing to get this working ) providing me info to make this article, and to all the people I tested CPUID on.

Format used

    - rX  : "register, X bytes", be X 8, 16 or 32.
    - rmX : "register or memory, X bytes", be X 8, 16 or 32.
    - immX: "immediate, X bytes", be X 8, 16 or 32

  INSTRUCTIONS
  ~~~~~~~~~~~~

   Bxx
   ~~~
    Format: Bxx rm16,r16
            Bxx rm32,r32
            Bxx rm16,imm8
            Bxx rm32,imm32

    Processors: 486+

Description: These are single bit operations, which can be really useful when writing a polymorphic engine: for example, if you have a table which stores anything in 10 bit chunks, this is the mode to access it.

        There are these ones:

        * BSWAP: Bit Swap
        * BT: Bit Test
        * BTR: Bit Test and Reset
        * BTS: Bit Test and Set

These are some examples:

           - BTS [esi],15h 

The processor goes to ESI address, adds it 15h bits and tests that specific bit. If it's a 1, it sets the Carry Flag, otherwise it clears it. Then, as it's "Bit Test and SET", sets that bit to 1 ( Bit Test just tests it and keeps result in CF, and BTR does the same but making the specified bit be 0 instead of 1 ).

           - BSWAP [edi],ebx 

Processor goes to EDI address, adds EBX to it and swaps that bit: if it was 0 it's now 1, otherwise it's now a 0.

So, you can see the first value ( that can of course be a register plus an offset ) is where we begin, then oount X bits from there. Forget all the 8/16/32 limitations :P

Don't you begin realizing how useful is this ?

CMOVcc

    Format: As mov, but cc has to be substituted by a condition, same asconditional jumps

    Processors: Pentium Pro +

Description: Fuck, conditional movs ! Have you ever thought about the possibilities we have with that stuff ? One thing to keep in mind, check the CPUID instruction before to know if the processor admits them. Right now I think I won't recommend it's use as they would only work in Pentium Pro and Pentium II processors, but as soon as people start to get them as a standard and AMD adds them to it's processors, you should change your mind.

CMPXCHG

    Format: CMPXCHG rm8,r8
            CMPXCHG rm16,r16
            CMPXCHG rm32,r32

    Processors: 486+

Description: Looking at the name there's no further explanation to be added. Compares them and exchanges them.

CPUID

    Format: CPUID

    Processors: 486+

Description: Forget all that flag checking and stuff to know which processor do you have. CPUID ( CPU IDentification ) gives you info in registers EAX, EBX, ECX and EDX, depending on the value in EAX. If Intel tells us the truth, this is:

      EAX = 0
      -------

EAX: Maximum CPUID value for EAX ( it's 2 now in PPro, while normal value is 1 )

Intel gives us this:

      EBX = 'Genu'
      ECX = 'ineI'
      EDX = 'ntel'

AMD comes with this, making the order EBX-EDX-ECX:

      EBX = 'Auth'
      EDX = 'enti'
      ECX = 'cAMD'

Genuine wouldn't fit :)

      EAX = 1
      -------

This is the important stuff. EAX holds version, while EDX holds information.

      EAX = Version Information:

        31-14: Useless
        13-12: Processor Type
        11-8: Processor Family
        7-4: Processor Model
        3-0: Stepping ID

Let's watch how does this work. Of course, thanks to all the people who helped me to test this :). This are some value to that EAX information.

                      type  family    Model         Stepping ID
       PPro:    000   00    110       0001          ?
       K6-2:    000   00    101       1000          0000
       P133:    000   00    101       0010          1100
       P75:     000   00    101       0010          0100

This is some info I researched by means of various people and complemented by some Intel little extracts I could get. It's not complete, but DX2 486 for example doesn't use CPUID instruction, and the other models are just OverDrive enhanced versions of these ones.

    .---------------------------------------------------------------.
    | Family      |  Model | Processor                              |
    |-------------+--------+----------------------------------------|
    |             |        |                                        |
    |   INTEL     |        |                                        |
    |   -----     |        |                                        |
    |             |        |                                        |
    | 0100        |  1000  | Intel DX4                              |
    | 0101        |  0001  | Pentium processors ( 60 or 66 MHZ )    |
    | 0101        |  0010  | Pentium processors ( 75 to 200 MHZ )   |
    | 0101        |  0100  | Pentium MMX ( 166, 200)                |
    | 0110        |  0001  | Pentium II processor                   |
    | 0110        |  0011  | Pentium II processor, model 3          |
    | 0110        |  0101  | Pentium II model 5, and Celeron        |
    |             |        |                                        |
    |   AMD       |        |                                        |
    |   ---       |        |                                        |
    |             |        |                                        |
    | 0101        |  0110  | K6 processors                          |
    | 0101        |  1000  | K6-2 processors                        |
    |             |        |                                        |
    '---------------------------------------------------------------'

Type: There are four kinds: 00 means a normal processor, as 01 is an OverDrive Processor, 10 is a dual processor, and 11 is still reserved.

Family: Indicates if it's a 486, 586 or 686; for a Pentium, it will be 101, for a 486 it will be 100, Pentium Pro and II gets 110. K6 models have the same as Pentium ( 101 )

Model: Models in the same family. You can check out the results, as for example P75 and P133 are the same Model for example.

Stepping ID: Revisions in the same model.

        EDX = Feature Information:

I'm showing just the most interesting.

        bit    feature

        00     If there is an FPU ( co-processor )
        04     If RDTSC instruction works
        15     If CMOVcc instructions work.

Anyway, there are lots of fields ( most of em useless for us ), specially awaiting to be filled as new instructions arise.

ICEBP

        Format: db 0f1h

Description: This undocumented opcode works on every Intel processor, but not in AMD or Cyrix ones. It just generates an int1h instruction. Cool as Vecna said for polymorphic engines, as it fooled even Softice non recognizing it as a valid opcode.

Jcc

        Format: As usual conditional jumps

        Processors: 486+

Description: You just forget all that shit about only using short relative jumps. You can easily jump to rel16 or rel32 offsets, so it seems the big bad stuff about relative jumping is completely eliminated. Code optimization and very big decryptors in your polymorphic code.

MOVZX,MOVSX

        Format: MOV?X r16,r/m8
                MOV?X r32,r/m8
                MOV?X r32,r/m16

        Processors: 486+

Description: MOVe with Zero eXtend or MOVe with Sign eXtend. This instruction lets you move a r/m to a register from a bigger size, zero extending the high part or sign extending it. This means you won't need to do a xor cx,cx/mov cl,bl if you for example got a random number in bl to make a loop, and some other applications you would think.

RDTSC

    Format: RDTSC

    Processors: Pentium or +. AMD K6 also supports it.

Description: Forget all the ways on getting a random value: this instruction gets the processor's time-stamp counter into EDX:EAX, a 64 bit long random number ( it increments in each clock cycle ). It's problem is that if the TSD flag ( time stamp disable, in register CR4 ) is set, you only can exec this from ring 0.

I recommend anyway just getting the value in eax if you want values that fill completely the register, as edx lasts a lot to be filled ( well, maybe for slow poly it could be great ). Of course, you can modify the CR4 register as the other Control Registers, but you must be ring0 again to perform that :)

Who would want to hide us the Stamp Counter ???

XADD

    Format: XADD rm8,r8
            XADD rm16,r16
            XADD rm32,r32

    Processors: 486+

Description: It exchanges the source and destiny operands, loading the sum into the rmX ( the destiny ). Cool for optimizing code or just making it "another way".

VERR/VERW

    Format: VERX rm16

    Processors: Pentium Pro+

Description: Checks if a segment is readable or writable ( with VERR or VERW ) from the current prilege level. This segment can be code or data, and the operand is an address ( register or immediate ) that contains the segment selector for the one you want to check. If you can read or write it ( depending on if it's VERR or VERW ), the ZF will be = 1, otherwise it will be = 0. The cool thing on this is that this won't generate any exception, of course :), so you could stop having to use the SEH for searching the kernel32.dll base address.

Bad news are that it's not supported by AMD processors and Pentium ones; just a bet for the future.