Computer Virus Collecting:
Fun or Folly?

[by Cicatrix - March 1999]


Why does someone collect small series of 1's and 0's, also known as files, which are known to have the ability of completely ruining an otherwise perfectly fine day? At least someone who collects stamps, Barbie dolls or other assorted knick-knacks can show off his collection to anyone that is interested without creating havoc in his albums or showcases.
What is the attraction of having thousands of these little files spread around your hard disk with the ever present possibility of "double-clicking" the, Oops Godd*mn, wrong file? For many virus collectors this underlying sense of relatively "harmless" danger is a large part of the intense fascination with these gremlins of the computer world, the computer viruses. Other factors that can create this fascination with computer viruses are the inherent underground image ("the forbidden fruit") and the pure technical features ("artificial life") of these mostly tiny files. Underlying this fascination is the pure human habit of collecting anything that can be collected.

With this generic little essay I am going to give you my views and philosophies on virus collecting and my personal experiences of collecting viruses for the last 6 to 7 years. Since Vesselin Bontchev has written a very nice essay on collecting / researching computer viruses I am not going to cover every possible detail but just give you my view on certain topics. Subjects I intend to cover are as follows:

  1. My personal history of collecting computer viruses. (Info only, skip if not interested)
  2. Why collect computer viruses?
  3. How to collect computer viruses?
  4. Collection content
  5. Philosophies
  6. Information

1. My personal history of collecting computer viruses

As I have already mentioned in a couple of interviews my virus collection started with one single virus. My personal attempts at mastering a computer and the rise of the computer virus happened almost at the same time. Seeing many 10 year olds around me using computer keyboards as if they were a part of their bodies was pretty frustrating initially. I had always been interested in anything technical or futuristic and the acceptance of the personal computer as an integral part of society was something I did not want to miss.

My first computer was an Acorn Electron with a whopping 32K and a cassette recorder as a storage device and it was on this machine that I attempted my first QWERTY induced programs. BASIC was the language to go with and at the time it looked pretty advanced to me. I had never even heard of other languages like Cobol, Fortran or assembly and to be honest all the programming stuff was way over my head at the time.

Around the same time one computer in particular was becoming very popular, the Commodore C-64. This computer could be equipped with a disk drive and it was this computer that sparked the first rumors about a new phenomenon, the computer virus. Initially this phenomenon did not really interest me too much since my Acorn was not really susceptible to it. Also it was mostly associated with illegal software and I did not have any illegal software, so for me the problem was nonexistent.

About a year later I had tried out a second-hand ZX Spectrum, which was OK, but through my employer I got the opportunity to buy a relatively cheap IBM compatible PC. This "amazing" 8086 machine got me started in the world of Microsoft, floppies and hard disks. In the meantime the available software skyrocketed and I totally skipped trying to program myself. I heard many more stories about the computer virus issue but the technical side did not interest me too much. Although I thought it was probably a nuisance I would never encounter I still decided to run AV software on my "new" 386.

Then one day I borrowed a piece of software to try something new and just for the hell of it I scanned the borrowed diskettes with the McAfee AV software. To my amazement the scanner told me one of the diskettes was "INFECTED" with the Cascade virus. Now, this really got my attention. Something that I thought would never touch me had found me. This brush with a computer virus told me I could no longer avoid the issue if I was to continue my computer hobby.

After many hours browsing BBS's (Bulletin Boards for those not familiar with the pre-Internet period) in the area I collected a lot of information about computer viruses, the problems they could cause and the best ways to defeat/combat them. Reading all that material sparked my interest and I started looking for more material. In my search for this material I ran across a BBS that had about 20 viruses online in a limited access area. I wrote the sysop of the BBS requesting access to that area and I sent him my single Cascade virus as proof of my "expertise". I guess he was convinced because I promptly got access to the virus area. This really started my virus collection because these viruses again got me access to a bigger VX BBS with even more viruses.

In the beginning I was sort of paranoid about computer viruses; I thought they would infect everything by just copying them from disk to disk. I used separate floppy disks marked with red marker to store them and I ran a virus scanner just about every ten minutes. Something else I did not really appreciate in the beginning was the existence of variants of a particular virus. I thought that when I had one copy/variant of a particular virus I had enough for collecting purposes. Boy, was I wrong! Nowadays variants account for most viruses necessary to complete a virus collection.

With my meager virus collection of about 200 viruses I departed for the US early 1994. Soon I was back to visiting BBS's and within no time I found several VX BBS's. My 200 viruses got me access to most of them and now my virus collection was really growing fast. Names like Chiba City, Black Axis and Arrested Development are nostalgia to me and some of the other early virus collectors that are still around. Then all of a sudden the ultimate VX BBS popped up: the West Coast Institute of Virus Research (WCIVR) operated by Falcon and Polt. The number of viruses online was intimidating and caused many collectors to start salivating in front of their monitors. Like so often with BBS's dependent on outside contributions the users leeched more than they contributed, but it was hard to contribute something to such a vast collection. After a while WCIVR also made its appearance on the WWW which made leeching even easier. Ultimately this caused Falcon to fall back on limited access through monetary donation and in the end even that could not prevent the downfall of the greatest VX collection site ever.

When the number of viruses in a collection passes a certain limit any logical structure tends to disappear and a form of organization is paramount. There are many ways to organize a collection and I will get back to some of those later on. When I determined I had to start organizing my collection I picked what I thought was the easiest way: I used a popular virus scanner that was known to ID in an organized manner. F-Prot, a well-known DOS-based freeware virus scanner/remover, is still the main utility I use to organize my collection today.

Separate from the scanner based organization I started using VSUM, a much maligned product by Patti Hoffman, to create small subcollections based on identical backgrounds of some viruses. It was easy to search the VSUM database for a certain phrase and then look for and research the found viruses. This way of organization produced collections based on a particular virus writer, a virus writing group or any other common feature that was interesting. Since I believed (and still do) in sharing I made these collections and updates to them available through VX BBS's and later through the Internet to anyone that visited them. Initially I re-released updated collections but that caused a lot of unnecessary downloading so ultimately I just released the updated part.
Although I later found out that VSUM was not the accurate product I once thought it to be it was a big help in organizing my collection and ultimately it was the spark that got me started on VDAT. Today, although still updated at irregular intervals, VSUM is desperately behind in the number of viruses covered and is still inaccurate in many ways.

While looking for new viruses I ran across many computer virus tools and utilities. Although I had no clue what some of them did I decided to collect them anyway. Now I will collect anything that touches the subject of computer viruses: virus creators, polymorphic/mutation engines, e-zines, tutorials and essays/papers. The number of diskettes containing viral material started approaching a hundred, and it really started to piss me off whenever I wanted to read or find something I knew I had but couldn't find it. Mostly I ended up downloading the file in question again and storing it on a new diskette. Thus the number of diskettes increased and the level of organization decreased.

While browsing through VSUM one evening the idea came. Wouldn't it be cool to have all my collected information in one easily accessible, hypertext based, database? It would certainly save a lot of searching and frustration. After some searching I found a DOS-based hypertext development tool and in August 1995 I released the first attempt at VDAT.

I am the first to acknowledge that the first couple of releases were pretty lame but I think I managed to improve the product with every release. Somewhere during this process I decided that the DOS hypertext version I was releasing was just not good enough for some graphic features I wanted to include. Also it was a pain in the butt to update.

When I started looking for an equivalent Windows based hypertext tool I came across a small database created with InfoCourier. After trying out the trial version for some time I concluded that using HTML was not only easy but a lot of HTML editing tools were available which made updating VDAT a lot less complicated. So I bought (no kidding) the full version of InfoCourier and started the conversion of VDAT for DOS to VDAT for Windows.

Version 1.9 for DOS and version 1.0 for Windows were each other's equivalent and although some people asked me to maintain a DOS version I soon noticed keeping both updated was just not feasible and I decided to discontinue the DOS version. In the meantime the "phenomenon" of VDAT was finding its way to more users, both in the VX scene and among general computer users. Most reactions I got were supportive and many people supplied me with material I could use for VDAT.

While keeping VDAT updated I was still very much into collecting viruses. When I started making my collections and their updates available through a limited access web site a lot of people contacted me with either requests or offers for trade. All this created a nice flow of new stuff that allowed me to release updates to my collections every once in a while. Although these collections and their updates were created for my own purposes many fellow collectors and maybe some newbie virus writers seem to enjoy them too. After a period of limited access I decided to make everything I produced available to whoever wanted it although I found that 10 Mb was not enough to run a satisfying web site. My ISP during that period was very "open" about all this and I never had any complaints or requests to remove my virus files even though I had expected them.

Since I had a "normal" job too and all this computer virus collecting was just a hobby not a lot of time was left to give my web site the look I wanted it to have. All over the Web new virus sites popped up and died out, most with better looks than mine had. For a while the WCIVR had a gigantic number of computer viruses available for download, initially for free; later they started using ratios. These were the fun times for collecting viruses.

All my efforts of keeping my collections, my web site and VDAT updated were decreasing the available time for keeping my virus collection up to date. Although I still collected viruses I could find no time to ID, scan and organize them properly. Also the increasing numbers of different viruses played havoc with my initial method of organizing my collection; it was all very time consuming.

Although most reactions to my efforts were positive the number of negative reactions was increasing. Apparently some people felt that their intellectual property was violated when I decided to include some of their material in one of my collections. The lack of time, the collection that was in disarray, the effort of updating an unsatisfying web site, the lack of available web space and the increasing number of negative reactions caused me to rethink my goals. After an interim period of a web site with only VDAT available I decided to fully pull the plug. The only thing I was going to release publicly in the future was VDAT and that would have its own web site. Contrary to what most people thought the negative reactions were not the sole reason for me quitting my site; they were just the last straw.

In the wake of all this I was flooded with e-mail and although I must say all of them were supportive some thought I was "giving in" and that I should "stand up" to the negative reactions. Like I mentioned above, these reactions were not the main reason, just the straw that broke the camel's back.

After the demise of my main web site I finally had some time to rearrange my virus collection. While doing so I found out that the lack of attention to my collection had caused me to fall behind in the numbers of viruses I had collected and I frantically looked for the missing material. Luckily some of my more solid contacts from the past came through and helped me out bringing my collection back up to speed.

The main emphasis of my hobby is again collecting viruses and a by-product is the continued release of VDAT. My subcollections and their updates have stopped, maybe not forever but certainly for the near future. If they ever are reinstated they will not be available to the general collector but only to a select few.

In the years I have been collecting viruses I have never been a member of a group. I always thought that maintaining the "middle-of-the-road" and being an independent produced the best results. Being a member of a particular group tends to limit access to work from another group. After the demise of my web site I found that at a certain point in one's virus collection it is impossible to add a substantial number of new viruses without being part of a certain traders world. That is why I decided for, and requested, membership of VTC (Virus Trading Center), the only such group around. Many of the members I knew from the past and although some have been members of virus writing groups none are actively producing viruses. I must say that it was a good move, the exchange of new virus material is solid and worthwhile.

In the meantime I have moved back from the US to Europe where I sometimes have my collecting attempts rudely interrupted by my employer who insists on sending me to weird courses and places. With the addition of a new hard disk and CD writer my collection slowly gets more breathing room. Exchanges of regular CD updates to our collections between VTC members greatly enhance collection stability and prevent possible data loss.

2. Why collect computer viruses?

When asked about my hobbies the answer "I collect computer viruses" produces reactions that range from "Are you insane?" to "Haven't you got anything better to do?" to "Cool, tell me more!" Like I said before, I tend to compare collecting viruses to collecting other wide ranging categories of objects. Throughout the ages people have been known to collect anything they could get their hands on. The physical form of an object, or the lack thereof in this case, should not limit someone in collecting it. Most collections will center around someone's fascination with a certain subject and the same holds true for a computer virus collection. This fascination can range from the illicit nature of viruses and the sometimes-innovative programming techniques used to a fascination with so called "artificial life". The "why" does not really matter, the satisfaction it generates does.

3. How to collect computer viruses?

I have told you how I started collecting viruses a couple of years ago but with the rapid growth of the Internet there are now far better ways to start a collection. It is now a lot easier to find a large number of viruses; it just takes a little effort and a little patience. With the demise of BBS's the Internet is now the best choice to start a new collection. Getting access to the Internet is therefore paramount and a basic knowledge of how to surf the Web is a prerequisite. A lot of viruses, necessary utilities and AV programs can be found on the World Wide Web and IRC. Since many virus oriented channels on IRC can be newbie unfriendly the WWW is presently the best bet to start a virus collection.

When you start collecting viruses you have to set some goals and determine your area of interest. Just like with most collections determining certain boundaries will prevent an overwhelming chaos. Like I mentioned earlier I initially just collected one variant per family. After some time I found that that was too limited since the bulk of viruses are variants of another virus. Another limitation I initially imposed was to collect viruses and only viruses. The underground scene of which computer viruses are considered a part also has areas of hacking, cracking and phreaking. The products of these areas get mixed and this creates confusion for some collectors.

One of the most important features of a computer virus is its replicative nature. Something that is not meant to replicate is not a virus, so a Trojan Horse is not a virus. Many Trojan Horses have virus like behavior or payloads but since they do not replicate they do not belong in a pure virus collection. Initially the well-known AV software F-Prot did not even support detection and removal of Trojan Horse software. With the recent rise of many "remote access" Trojan Horses most AV products will now detect and remove this "malware".

Another limitation some collectors have adopted is the technical background of a virus. Many do not consider macro viruses or script viruses "real" viruses since they are so "easy" to produce and the author does not need a large programming background. This is a choice some make but personally I think such a limitation is not realistic. Macro viruses replicate and are now one of the bigger virus problems for the average computer user and should be included in a virus collection. A grey area in virus collection is the so-called "intended" virus. It is not really a virus because for some reason, programming error or compiling error, it malfunctions and does not replicate. Most AV software will detect these samples as such.

Virus collection samples exist in many forms: the original source code, the assembled object file, the linked first generation binary, an infected victim binary, an infected goat file, debug script, UUE encoded script, boot sector images, disk images (Teledisk TD0), incorrectly compiled code and disassembled source code. So what to collect? One of the lessons learned in the years I have been collecting viruses is to save everything you get your hands on. Fill up those floppies (yeah, I know), ZIP disks or CDR's with all the files you get even if you think you are collecting duplicates. Some day one of the files in your archive will scan as a new virus; it still happens to me every now and then. Also I still sometimes delete stuff I should not delete and it is always good to have a backup somewhere.

So even when I basically collect everything, meaning everything ends up on some kind of storage medium, the "real" collection is limited by my boundaries. Basically I will include every sample that is detected by a scanner (primarily F-Prot) in my collection. This includes macro viruses, script viruses, "intended" viruses, some virus construction kits and now also "Trojan horses" detected by that scanner. Since most scanners will scan inside ZIP files I alphabetically sort every sample into a ZIP file after I have generically renamed the files (FA000001.EXE to FA000002.COM etc.). I have created separate ZIP files for Macro, Batch, Script and Windows viruses since these viruses represent readily identifiable viral techniques.
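
The generic renaming step can be sketched in a few lines of Python. This is only an illustration, not the tool actually used for the collection: the function name, the "FA" prefix default and the choice to number files in sorted order are assumptions made for the example. It only builds a rename plan and does not touch any files.

```python
from pathlib import PurePath


def plan_generic_names(filenames, prefix="FA", start=1):
    """Return (old_name, new_name) pairs using 8.3-style generic names.

    The new base name is the prefix followed by a zero-padded counter,
    8 characters in total, and the original extension in upper case.
    Hypothetical helper: the numbering order (sorted) is an assumption.
    """
    width = 8 - len(prefix)  # digits left over after the prefix
    plan = []
    for counter, old in enumerate(sorted(filenames), start=start):
        ext = PurePath(old).suffix.upper()
        plan.append((old, f"{prefix}{counter:0{width}d}{ext}"))
    return plan


if __name__ == "__main__":
    for old, new in plan_generic_names(["unknown2.com", "sample1.exe"]):
        print(f"{old} -> {new}")
```

Keeping the plan separate from the actual renaming makes it easy to review the mapping (or save it as a log) before anything on disk changes.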

As I mentioned before I collect anything that touches on the subject of computer viruses. That includes virus creation software, specific polymorphic and mutation engines and any specific tool created in the development of computer viruses (tunneling tools, boot sector dropper tools, routines and other engines). I think this software is essential to a proper virus collection and should be included in the organization of such a collection.
Many of the virus creation kits are very popular in the "underground" scene, are very easy to find and are available through many Internet sources. Due to the nature of the software and the fact that these files "travel" around, accidental or even intentional infection of virus creation files is very possible. Like with all files handled while collecting viruses care has to be taken to avoid infection but special care has to be taken to include "clean" samples of virus creation software in a collection.

Not all samples collected will be of the same quality. Like I mentioned before, virus samples exist in many forms and many of these samples have, to put it mildly, been "around". While going through all these stations many virus samples will lose some of their quality, some will be corrupted and many will be in their umpteenth generation. Arguably the cleanest virus sample is the one generated with the original source code, a so-called first generation sample. Some first generation samples will show different behavior than subsequent generations and many newer and sophisticated viruses will change with every generation (polymorphic). As long as these different generations are detected as one and the same virus variant they will be treated as duplicates in a collection even though they might not be. Some viruses change so much between generations that they will be detected as different variants of that virus (WM.CAP being a good example).

Ideally a virus collection should have the original source code, a first generation sample, an infected goat file and an infected "real life" victim of every virus. You do not need me to tell you that this is just not feasible. So how much are you willing to let your standards slide? Personally I will accept any sample of a virus that is not in my collection even if it is identified as being "damaged". Some of the earlier VX BBS's had the annoying habit of "stamping" their virus samples with their BBS "addy". This could cause virus scanners to ID the sample as a new variant or even totally miss it. After deleting the "addy" from the sample with a hex editor the sample will scan fine but it remains an unnecessary hassle. So when given the choice between 2 similar samples I will keep the one that is as close as possible to the original.

Compared to a "normal" virus, a boot sector virus requires an extra step to get from the source code to its binary form. A "dropper" will drop a compiled image of the boot sector virus to the target boot sector. I consider these droppers legitimate virus collection material. Ideally they should be combined with images from the infected boot sector.
Poorly compiled samples (by persons unfamiliar with TASM/MASM) will often be identified as a "dropper" or an "unknown" and until correctly identified will muddy a collection.

Sometimes collectors, mistakenly or on purpose, have the nasty habit of renaming object files to either .COM or .EXE files. Some scanners will identify these samples as "unknowns". One way of determining the value of this sample to a collection is simply linking the object file and then scanning the resulting code. The majority of files treated this way are known virus samples. You can also use any hex viewer to quickly view the header of the file, which will often show the file name of the original source code file. After seeing hundreds of these source codes the name often is familiar. Since they are just an intermediate stage of a virus, object files, renamed or not, are not material for a "clean" virus collection.
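
The hex viewer check can also be done programmatically. The sketch below assumes the object file is in the OMF format produced by DOS assemblers such as TASM and MASM, where the file normally starts with a THEADR record: a type byte of 80h, a two-byte little-endian record length, then a length-prefixed module name and a checksum byte. The function name is made up for this example, and anything that does not match that layout is simply reported as "not recognized".

```python
import struct

THEADR = 0x80  # OMF "translator header" record type


def omf_module_name(data):
    """Return the module name from a leading OMF THEADR record, or None.

    Hypothetical helper: it only inspects the first record and does not
    verify the checksum, so treat the result as a hint, not as proof.
    """
    if len(data) < 4 or data[0] != THEADR:
        return None
    (record_length,) = struct.unpack_from("<H", data, 1)
    name_length = data[3]
    # The record body is the length byte, the name and one checksum byte.
    if name_length + 2 > record_length or len(data) < 4 + name_length:
        return None
    return data[4:4 + name_length].decode("ascii", errors="replace")


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(omf_module_name(handle.read(300)) or "no OMF header found")
```

A file that really is a .COM or .EXE program will not start with 80h followed by a plausible record, so a returned name is a good sign that the sample is a renamed object file.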

There are many ways of cleaning a file infected by a prepending or appending virus. Depending on the virus and on the software used to clean the file the resulting code can range from "not functional" to "functional" to "pristine". Sometimes a file can be recovered to its original state but often it is just the functionality that is recovered. Remnants of a virus can be left behind and, depending on the AV software, be incorrectly identified as a new "unknown" variant or even as the original infecting virus. Determining if this "hit" by a scanner really is a virus needs more in depth research e.g. disassembling the virus. Most virus collectors do not have the expertise and the time, nor feel the need to put in this extra effort. Since the majority of the virus researchers and AV companies do have the required expertise and manpower they will research these questionable samples further. The result is that virus collections of researchers and the AV community not only tend to be larger but also of a higher quality than collectors' collections.

Just as there is a wide range of personalities and characters in the general public the same is true for virus writers. Many virus authors name their viruses and if you believe the interviews with these authors many of them think it is one of the hardest parts of virus writing. Other authors do not really care about naming their creations and emphasize the technical features of their viruses. AV scanner IDs often differ from names given to viruses by their creators and it appears that this is many times done on purpose. In this continuous ritual between AV and VX some virus authors even include suggestions for names to AV companies in their viruses. Since the origin of viruses does not have a high priority in naming computer viruses AV companies will group viruses with similar structures, layout or operating ways in families even though these viruses might have different authors and might have originated from different parts of the world. Variants of the Jerusalem virus might be given totally different names by their authors but if the only difference is found in the text contained in the virus an AV product will ID the viruses in the same family / group and will give them similar names.

Virus naming is one of the more confusing parts of virus collecting. During the early period of computer viruses, the late '80's, there was no consensus between AV companies on how to name detected computer viruses and different names were used for the same virus depending on the AV product used. Around the change of the decade an organization known as CARO (Computer Anti-virus Researchers' Organization) formed a committee with the objective of reducing the confusion in virus naming. The virus naming convention that was chosen can be found in a paper called "The CARO way of naming viruses" available on the WWW and included in VDAT.

The convention that was chosen certainly decreased the naming confusion and that was especially useful for a virus collection based on scanner IDs. Although most well known AV companies are members of CARO the virus naming habits of many of these companies have remained confusing and have never reached a CARO structure. There are still major differences in naming between many AV products and even companies like Frisk Software International (F-Prot) and Kaspersky Lab (AVP) that have strong liaisons and technology exchange have widely differing naming conventions. For collection purposes I recommend using the shareware (freeware for private use) DOS version of F-Prot. It detects and cleans a wide-ranging number of viruses of all kinds (DOS, Windows, Macro, and Script) and a limited number of Trojan Horses. The F-Prot scanner is especially useful to structure a virus collection because it detects a large number of variants and names these variants the most consistently of all AV products. Another well known product that detects a large number of the same viruses will e.g. name most viruses produced with the IVP construction kit as "IVP based" while F-Prot will ID specific variants of IVP viruses.

Like I mentioned before I mainly use F-Prot to organize my collection. I have established alphabetized ZIP archives with F-Prot ID'd viruses. I have also created separate archives for batch, macro and Windows viruses. The resulting scan log of the archives is then processed by my favorite, VS2000. This successor to the classic Virsort processes scan logs by major AV products and builds database files that can be used to look for new additions to the collection. Whenever I get a bunch of "new" viruses I will scan them with F-Prot, process the log with VS2000 and then have VS2000 produce an ASCII listing of the collection. Looking at the list I can determine what viruses are new to my collection and I will move these new files to my \NEWSCAN directory. I will rename all these new files to a new format FNEWXXXX.XXX and lastly I will scan the \NEWSCAN directory with F-Prot. After one of these minor updates I end up with two logs, my main collection FPXXXXXX.LOG and my \NEWSCAN log NEW.LOG. These logs processed by VS2000 produce a current collection inventory. Once every two months or so I will incorporate the files in the \NEWSCAN directory in my main archives and after scanning the archives again with F-Prot create a new FPXXXXXX.LOG. My intent is to build a similar archive database for the other popular scanner, AVP. This is because AVP will detect many viruses that F-Prot does not detect and vice versa.
All the actual or suspected virus samples are kept on a hard disk partition that I rescan with every new scanner release or new signature definition. This will produce newly detected viruses in many "old" virus samples after a new scanner release.
After every new release of VDAT I create a CDR with all the computer virus related material that I have collected. Not only is it a valuable backup in case of system failures or an accidental infection but it is also easy to have everything together on one single medium.
This is basically the way I sort my collection but that does not mean it is THE way to do it. I know some collectors that have a totally different approach to a similar collection. Another fine tool to organize a virus collection is Tally's VirusKeeper available on his web site. My experiences with it are limited but I am thinking about using it. I suggest you use the method you feel happy with, that lets you keep the collection organized and that keeps the required maintenance to a minimum.
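
The core of the log comparison described above, finding which scanner IDs in a batch of incoming files are not yet in the main collection, can be sketched as follows. This is not how VS2000 works internally, and the log format is an assumption: the sketch expects one detection per line, with the file path, the marker text "Infection:" and then the scanner's name for the virus. Real scanner logs differ between products and versions, so the parsing would have to be adapted.

```python
def parse_log(lines, marker="Infection:"):
    """Map file path -> scanner ID for every line containing the marker.

    The "<path>  Infection: <name>" layout is assumed for illustration.
    """
    ids = {}
    for line in lines:
        path, found, name = line.partition(marker)
        if found:
            ids[path.strip()] = name.strip()
    return ids


def new_to_collection(collection_log, incoming_log):
    """Return {path: name} for incoming files whose ID is not yet known."""
    known = set(parse_log(collection_log).values())
    incoming = parse_log(incoming_log)
    return {path: name for path, name in sorted(incoming.items())
            if name not in known}


if __name__ == "__main__":
    main_log = ["A.ZIP->FA000001.COM  Infection: Cascade.1701.A"]
    new_log = [
        "FNEW0001.COM  Infection: Cascade.1701.A",
        "FNEW0002.COM  Infection: Jerusalem.1808.Standard",
    ]
    for path, name in new_to_collection(main_log, new_log).items():
        print(f"{path}: {name}")
```

Comparing on the scanner ID rather than on the file name is what makes this useful: the same virus arriving under a different file name is recognized as already present.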

Weeding

You would be amazed at how many times a particular virus sample has been spread, renamed, spread and renamed again. The resulting glut of virus material is immense and to keep unnecessary duplication to a minimum weeding the viral material is a must. Many programs to weed any kind of file base are available, most will use some kind of CRC method to find duplicate files. The best known among virus collectors is TBWEEDER, which is still available all around the scene. Other suitable programs are PGWEED, VWEED and Rose's File Weeder (also capable of file renaming). I suggest collectors start using file-weeding software from the outset of their collection. It saves considerable disk space and more importantly, scan time.
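
A minimal sketch of CRC based weeding is shown below. It is not the algorithm of TBWEEDER or any of the other tools named above, whose internals are not described here; it only illustrates the general idea. Files are grouped by size and CRC32, and because two different files can share a CRC32, each candidate is compared byte for byte with the file that is kept before being reported as a duplicate. The function name and the choice to work on an in-memory mapping of name to content are assumptions made to keep the example short.

```python
import zlib
from collections import defaultdict


def find_duplicates(samples):
    """Return (kept_name, duplicate_name) pairs.

    `samples` maps a file name to its content as bytes. The first name in
    sorted order within each group of identical files is the one kept.
    """
    groups = defaultdict(list)
    for name in sorted(samples):
        data = samples[name]
        groups[(len(data), zlib.crc32(data))].append(name)

    duplicates = []
    for names in groups.values():
        keeper = names[0]
        for other in names[1:]:
            # Same size and CRC32 is only a hint; confirm before weeding.
            if samples[other] == samples[keeper]:
                duplicates.append((keeper, other))
    return duplicates


if __name__ == "__main__":
    demo = {"a.com": b"\x90\x90", "b.com": b"\x90\x90", "c.com": b"\xcc"}
    for kept, dup in find_duplicates(demo):
        print(f"{dup} duplicates {kept}")
```

For a large collection the contents would be read from disk one file at a time instead of being held in memory, but the grouping logic stays the same.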

Unpacking / Decrypting

File packing software like PKLITE, Diet, LZEXE etc. will considerably change the binary image of a file. Some virus authors incorporate the packing technique in their creations, which will result in a "packed" virus. These "packed" viruses should be detected by software that has "unpacking" support. Sometimes viruses are, intentionally or unintentionally, packed after their initial compilation. This will produce varying results: some scanners will properly detect the sample, some will detect variants and some samples will be completely missed. For collection purposes it sometimes helps to unpack a packed sample just to see if any telling ASCII strings can be found in the sample. Packing a virus should not change the functionality of a virus, but sometimes it does change its detectability.
Encrypting a virus sample with software like Protect, ProtEXE etc. has similar effects on functionality and detectability but the objective of using this software is totally different. While packing software is used to decrease the size of virus samples or to limit the file size increase when using prepending or appending viruses, file encryption software is mainly used to hide the "tricks" the virus uses. Similar to incorporating packing features in the virus source many viruses today use an internal encryption scheme to avoid detection. Many mutation and polymorphic engines used in viruses today have been developed to make viruses harder to detect and if detected harder to unravel.
Many tools to unpack or decrypt popular schemes are available, the ones I use are UNP and GTR (Generic Tracer) available as free/shareware.

Disassembling

Disassembling virus samples is a time-consuming task that requires some knowledge of assembly language. Quite a few virus authors use disassembling software to understand a particular virus and to learn new tricks. Certainly virus researchers of AV companies will use the disassembling method to find ways to detect specific viruses. For a virus collector the task of disassembling viruses is far too time-consuming to do effectively. With the number of new viruses found every week it is impossible for one or two persons to disassemble all of them, and a collector's knowledge of assembly language is not always up to the task. Best to leave it to the professionals.

Goat files

One of the hurdles of researching or dissecting a new virus is that a victim file often blurs the exact specifics of that virus. Victim files are often large and the exact binary image of the virus is often hard to find since the exact image of the clean victim file might not be known. For the purpose of exactly determining what a virus looks like researchers have developed so called goat files. These files have an exactly known layout and length and when infected by a particular virus the difference between the clean and the infected goat file will tell researchers a lot about the virus. Infected goat files can also be used to determine the differences between certain virus generations.
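The comparison between a clean and a modified goat file comes down to an ordinary binary diff. The sketch below is only an illustration with a hypothetical function name, working on byte strings already read from two files. It reports how much the file grew and which byte ranges inside the original length were altered. It is not taken from any of the research tools mentioned in this essay.

```python
def changed_ranges(clean, modified):
    """Compare two byte strings.

    Returns (growth, ranges), where growth is the size difference and
    ranges is a list of (start, end) offsets, end exclusive, at which
    the two differ within their common length.
    """
    overlap = min(len(clean), len(modified))
    ranges = []
    start = None
    for offset in range(overlap):
        if clean[offset] != modified[offset]:
            if start is None:
                start = offset  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, overlap))
    return len(modified) - len(clean), ranges
```

Because the clean goat file has a known layout, every reported range and every added byte can be attributed to the modification rather than to the host file.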
Many goat files are specific to the file structure (COM, EXE, NE, PE, SYS, Word document etc.) and mostly contain instructions to create a certain file length (NOPs, etc.). Goat file sets will have multiple pre-determined file lengths since some viruses have instructions to avoid target files smaller or bigger than a certain length. Many viruses have anti goat routines programmed into them causing goat file programmers to create even more sophisticated goat file generators.
Popular goat file packages are Goat File Creator Package (COM, EXE, NE, PE, SYS), MakeGoat (Word) and RoseGoat (several formats). A considerable part of most virus collections consists of infected goat files, which are often easily identified by their structure and some inserted strings.

So how do you start a virus collection? With the popularity of the Internet it has become a lot easier to find virus samples but even now the cry "I need a virus, where can I get one?" can be heard many times. The simple use of a search engine with "virii" as a search condition will produce hundreds of hits. This completely incorrect and dyslexic plural form of the word "virus" (how about the plural plural "viriis" that I have seen being used?) will filter out many legitimate hits concerning human or animal viruses. Looking for "computer virus" or "computer viruses" will most often produce links to AV sites. When one pro-virus site has been found the available links will most likely produce more and then it is just a matter of following links and downloading material. A shortcut to finding your own links is to visit Tally's Link Reference, the largest collection of computer virus related links available on the web. A VDAT version is available too.
Now it is just a matter of getting the large and available virus archives and the application of method and organization.

Trading

Just like with any other collection (stamps, dolls, you name it) at a certain point the collection stagnates due to lack of new material. The best way to increase the quality and quantity of your collection is to start trading. I know it must seem pretty hopeless to start out as a new collector and hear about all these collections of tens of thousands of viruses. Perseverance and luck should help. Sometimes you come across something everyone has missed and this can be traded for other material. Having the right contacts also helps, but don't start yelling at them or making demands. You'd be amazed how much goodwill a little politeness and sincerity will get you. Do not expect to go from 10 to 10000 viruses within a day either; not only is it not feasible but you would also skip part of the fun of collecting.
Even after collecting for years a certain "dead zone" can be reached. One has collected the majority of the available viruses and the collection seems to stagnate. When this happens to several collectors with similar objectives it is time to organize the trading. It is time to find new sources and to find especially hard to get samples. The resulting collective of virus collectors (in this case the Virus Trade Center) seems to produce better results than the single members put together. Here perseverance and luck, added to the reputations and connections that have been built throughout the years, help. The result is a collection of viruses hard to equal in quality and quantity outside AV archives.

Self protection

Handling computer viruses can be hazardous, even though I think it is less hazardous than most people would like to make you believe. The majority of computer viruses are harmless and can be easily detected and cleaned if necessary but some can definitely ruin a perfectly fine day.
All precautions you can imagine will not eliminate the risks of a computer virus creating havoc on your own (collector's) computer. If you are not prepared to handle this eventuality DO NOT start collecting viruses. I have had many requests for viruses from people who did not even know computer basics ("How do I start an ASM file?" etc.). Do not collect viruses because you think it is cool or for malicious purposes. Even without a virus collection you run a small risk of being infected by a computer virus but inviting them on your computer certainly increases this risk.
There are many ways to minimize the risk of infection and not all of them are expensive. Here are a couple of my suggestions:

  1. Use a separate computer for all your virus material. (Not cheap but feasible)
  2. Use separate hard disks / partitions for all your virus material.
  3. Use several (more than one) AV programs. Resident and/or on-demand.
  4. Use a behavior blocker.
  5. Use ZIP or other archives; this helps prevent accidental execution while still allowing the collection to be scanned.
  6. Have a clean boot disk with an AV program and any needed tools and drivers.
  7. Perform regular clean-boot scans.
  8. Limit any form of auto-execution.

4. Collection contents

Code

This should of course be the main emphasis of any virus collection since this is the essence and ultimate form of a computer virus. As mentioned before computer virus code (binary) can have a varying degree of quality depending on where the sample has come from and where the sample has been. Ideally the sample you collect should be as close to the first generation as possible but as long as the sample is identical it is good enough. With the large number of new viruses detected every week it is very time consuming to compare multiple samples of the same virus and to select the one with the highest quality. Personally I save every sample I come across, maybe someday I will have time to do a quality comparison of viruses in my collection.
Most collectors today will measure their collection using the number of hits they get in their scan results. Many of these scan results contain hits that show "unknown", "damaged" or "dropper" viruses. While detected as viruses they might not be fully functional or even new. Quite a few "unknown" viruses will later be detected as "known" viruses when an updated signature file is used. Until other samples of these viruses can be found you should include these samples of lesser quality but whenever possible they should be exchanged for better samples.
While scanners like F-Prot are known for detecting a multitude of variants they are not perfect. My original collection contained virus samples straight from the source (the author) that were never detected by F-Prot. Maybe these samples have never reached the AV researchers, maybe they are not viruses (I doubt it since they replicate). Some samples I have in my collection have been detected as "new or modified" and "could be" for several years without ever being positively identified. I also have some samples of similar viruses that are still different (when disassembled or hex viewed) but that are ID'd as one and the same virus.
The very good scanner AVP will identify many viruses it detects based on its origin. Most viruses created with virus creation kits, while different, will be identified with the same ID. (NRLG-based, DREG-based etc.). This system is not always used consistently since some viruses created by these kits will be identified in detail. If used for collection purposes this lack of detail in identifying samples will hamper the sorting of a virus collection and necessitate the use of multiple scanners to produce multiple sortable scan lists.
This leaves the collector with the dilemma on how and what to collect. Personally I mainly collect using F-Prot scan results while building a parallel collection using AVP scan results and one of material which I know to be viral but that is either not detected or not identified to the necessary detail. I'll leave the choice for your collection up to you.

Sources

Source codes are the recipes of computer viruses and will tell a lot about the techniques used in and features of viruses. They exist in many forms ranging from the original (author's) code to many (commented or not) disassemblies. While not all source codes will allow for (re)creation of the virus (quite a few will produce errors) many will give a collector the opportunity to compile a clean virus sample. The availability of these source codes has aided the development of many virus variants, some virus author "wannabes" just change some ASCII strings to produce their "own" viruses resulting in virus Xxxx.ZZ. The lack of technical know how with most of these "wannabes" has led to the large number of garbage and intended viruses available in most collections. Knowing how to use an assembler and linking software is a prerequisite to creating viruses and the improper use of this software is a contributing factor to some of the virus junk around.
While certainly a part of my virus collection source codes are not the main emphasis of it. I will save all I can get my hands on and I used to pair source code and binary for my subcollections when I still released those. Nowadays I simply don't have the time to actively research all source codes I have and I limit myself to using them for research for VDAT.

Zines

One of the most important sources of information for collecting viruses is the continuous release of new VX e-zines. Not only do most contain code and sources of many new viruses, they also offer the collector a lot of important insider news about the scene. Even if the information offered is not important or even rehashed old information the zines themselves are collector's items. The diversity of origins is large and finding a new or unknown zine from some new source is always fun. The quality of e-zines runs from the basic few Kb ASCII based text file to the large, graphically complex and user friendly (mouse and/or menu driven) professional product. A disadvantage of the large diversity of languages used is that a lot of them need advanced language skills to read. While the majority is written in English, zines in Spanish, Russian, Korean, Mandarin, Portuguese and Italian are known.

Tools & engines

Next to the pure binary code and source code of a virus any tool that is used to improve or create a virus can be a part of a virus collection, it depends on where the collector in question draws the line. Many of these tools have been developed as part of creating a computer virus but quite a few have been developed as a separate tool, which can be used in any virus by either incorporating it, or by applying it after the initial virus development.
Most tools and engines originate from the continuous intellectual battle between AV and VX. Any techniques developed by one side will often cause the development of opposing techniques from the other side. The VX aim to avoid detection has caused the emergence of many polymorphic and mutation engines. The ability of anti virus companies to detect the viruses using these engines has led to even more sophisticated engines.
Other tools for a virus collection are so-called boot sector injectors that will allow for the insertion of specific boot sector viruses on selected boot sectors. Although called virus creation kits by some they are just virus manipulation tools since no new viruses are created in the process.
For collection purposes I use VDAT as a guideline on what to collect and I update VDAT with collection material I find that has not been included. I have heard of quite a few other collectors that also use VDAT as a collection guideline so it is a good place to start if you are looking for guidance.

Creation kits

In collecting virus creation kits for a computer virus collection you will notice some ambiguity. One of the first items a new collector or newbie author will try to get is a virus creation kit. While most virus authors denounce the use of these kits as being "lame" almost all of them are created by these same authors. Most will see them as an exercise in programming created for and to be used by "lamers". Some will acknowledge that although they can be a good tool to "learn" how to make viruses they lack the originality and technological prowess many virus authors pride themselves on.
Started by NuKE's Virus Creation Laboratory (VCL) and Phalcon Skism's Mass Produced Code Generator (PS-MPC) they were quickly followed by others. Most viruses produced with virus creation kits can be detected and cleaned by any self respecting AV software but the simple fact that you did not have to have any knowledge of computer viruses to create one has always been an attraction. Due to the ease of virus production with a construction kit a large percentage of viruses detected have been created by such kits. Although not inherently dangerous when using a proper AV program the mere number of them creates slower scan times, bigger signature files and just a nuisance in general.
The virus creation software I have seen ranges from the simple, generic virus cloning tool that just reproduces existing viruses to relatively sophisticated software that produces original code with many user selectable features.

With the popularity of Windows 95/98 many DOS viruses became limited in their functionality and since most popular virus creation kits produce DOS viruses their use was similarly limited. The relative ease with which authors started creating macro viruses caused some of them to try their luck at creating a macro virus creation kit. It appeared that the creation of such kits was just as easy as creating viruses. The popularity of Microsoft Word and the ignorance of its users and Microsoft caused the macro virus to be the most encountered virus of today's computer world. Futile AV patches from MS Word 95 to MS Word 97 did not even slow down the construction of macro viruses or of macro virus construction kits. On the eve of the release of MS Office 2000 the first virus compatible with it has already been released. The relative ease with which these viruses are developed and the multiple features of VBA have caused the release of many macro virus generators, including some for Excel and Access.
The first kit creating Windows viruses was supposedly released by the Hackerz Networkx but the group disappeared before this could be confirmed. No doubt someone else will pick up where they left off and we will see a Windows virus creation kit in the near future.
For more detail on virus creation software I suggest you browse through VDAT's Creators section.

Tutorials & Essays

Tutorials and essays can be considered pillars of the pro virus community. Most virus creators are only too happy to share their knowledge with the rest of the world and will write extensive tutorials on a wide-ranging number of pro virus subjects. A lot of virus authors are self-taught and use many of these tutorials as learning material. The main source of these tutorials is the ever-present e-zines released by many groups in the VX community.
As a virus collector these tutorials can be collection subjects and sources of information. While I do not collect them separately (I keep them with their source / e-zine) I definitely use them in creating every new release of VDAT and as such VDAT is my complete tutorial / essay collection.

5. Philosophies

Codes of conduct

To put it mildly computer viruses are frowned upon by the majority of computer users, professional or amateur. The main cause for this attitude I think is the popular (mis)conception that all computer viruses are destructive. Depending on the exact language used computer viruses can be considered invasive and an uninvited waste of valuable computer resources like CPU cycles, storage space, memory or bandwidth. That is why being anything but opposed to computer viruses is at least considered poor judgement. Opinions range from "bad taste" to "criminal" and some countries and some US states have included computer viruses and their usage in computer crime laws. Since computer viruses don't observe geographical or political borders they appear and originate all over the world. Collecting viruses can range from legal in one country, to not illegal in another to punishable by law in a third country/state. Regardless of the legalities around computer viruses I think certain codes of conduct have to be observed to be able to successfully start, maintain, improve and enlarge a virus collection:

  1. Screen requests for virus samples:
    • Do not purposely spread viruses to unsuspecting computer users.
    • Do not honor requests for viruses with the purpose of destroying data.
    • Determine level of know how of requester and act accordingly.
    • Honor requests to keep samples private / within a certain scope.
  2. Protect identities of virus sample suppliers when requested to do so.
  3. When asked to trade observe realistic trading quota/ratios.
  4. Do not "pad" scanner logs or trades with junk.

AV connections

Connections and exchange of files / know how within the triangle of the anti-virus community (commercial or otherwise), virus collectors and the pro-virus community are generally more an exception than a rule. Most members of the AV community tend to be hesitant to show any such connection publicly although I'm sure these connections exist at least at some levels. Rumors that AV companies and personalities monitor and visit the WWW and IRC using pseudonyms/handles can not be confirmed. Personally I've been approached a couple of times by members of the AV community with some general questions but never with specific requests.

Chances of getting virus samples from AV researchers / companies are slim. The agreed codes of conduct within the AV community generally preclude exchange of virus samples with someone outside this AV community.

Collector vs. researcher

A virus collector is not by default a virus researcher although some would like to think they are. The reverse does not have to be true either but often is. Both touch on certain similar areas of interest but they have different goals. While the collector does not always care how a particular virus works as long as it is a "new" virus for his collection the researcher's main objective should be to research the specific techniques used in a virus and as a result possibly find a cure for the virus. Some researchers are only interested in the workings of a virus as a computer program or as a perceived form of artificial life.

Collector vs. writer

Similar to the previous relationship a virus collector is not by default a virus writer but many virus writers tend to collect viruses at some level. Some writers will collect for the sake of collecting but most collect only certain viruses with the intent to learn / leech techniques for future use. Outside the general archives of viruses available on the WWW virus writers are a primary source for new viruses and relationships between collectors and writers should be carefully cultured. These often close relationships will cause the anti virus community to think of collectors and writers as part of one and the same "underground" faction. Most virus collectors though are not virus writers and should not be labeled with the same morals as the standard "disgruntled, frustrated, revenge motivated teen".

Legalities

Depending on what part of the world you are from any act involving a computer virus might be illegal. Sometimes laws are straightforward and specifically mention the term or phenomenon of a computer virus but more often the laws become a matter of interpretation. Depending on the exact terminology used in a law, computer viruses might be considered part of computer crime or there might be a definite legal difference between e.g. creating or spreading a virus. Some laws will allow for consensual virus exchange as long as all parties are aware of the material being exchanged. Most laws that address the phenomenon will prohibit the malicious or subversive use of a computer virus.
Unless you are looking for an exciting hassle with your local (federal, national or state) law enforcement I suggest you read up on the latest computer crime / legalities / laws of your area.

Common views

Only a small part of people using computers knows more than average about computer viruses. The majority considers them dangerous and an invasion of their "computer privacy". The facts about viruses most known to the world not involved with viruses are the facts that are the least common to those that are involved. The majority of virus creators consider damaging payloads unnecessary, lame or even unethical (yes, even they have ethics even though some others might not agree). There will always be a certain tension between what all parties that are involved consider to be true and what really is true. This will also lead to a lot of hype, some harmless, some pretty stupid, and some intentional. Many a scare has been proved harmless but somewhere someone will have benefited from the scare while the general audience is getting more and more confused.
Another feature of these virus hypes is the virus hoax. The ignorance and gullibility of many people have created waves of these virus hoaxes. For more information on these hoaxes check out The Hoax FAQ, It's the end of the world as we know it, Hoaxes and Hypes and visit Rob Rosenberger's site on the subject.
The less than positive views on computer viruses lead to many knee jerk reactions. Most web space providers will not allow viral code on their members' web sites and some will not allow any material that could be considered pro computer virus (tutorials, ezines etc.). Books written about the subject immediately receive the "underground" stamp and Mark Ludwig's book Little Black Book of Computer Viruses has been banned in France because it is too specific in explaining computer viruses.
For me this entire hubbub adds to the joy of collecting viruses; it definitely never is boring. Others don't like this part of collecting viruses and just accept it, since there is no way to avoid it if you want to collect viruses.

6. Information

Quite a lot of detailed information can be found to help shape a virus collection. Most AV companies maintain online virus databases that can be a great help when initially structuring or when maintaining a collection. I personally find the online database AVPVE by AVP (http://www.avp.ch/avpve) the most useful. Many fellow collectors find my virus scene database VDAT a helpful tool to use as a guideline. Even though VSUM has lost a lot of its value it still can be of use to find small details that help collecting. It is still updated at irregular intervals (http://www.vsum.com).

As mentioned before Vesselin Bontchev, a well-known AV guru, has written a virus collecting / research essay that can be very helpful to a virus collector although it is aimed more at the virus researcher. Called "Analysis and Maintenance of a Clean Virus Library" it can be found on several web sites and in VDAT.

Closing remarks

I hope this little essay will be of help to others starting or maintaining a virus collection. It is by no means meant as the sole solution; many other techniques are available and specific personal preferences will lead to variants of this solution.

Additionally I hope that I have also explained and confirmed the position I think I have in the sometimes-murky scene of computer viruses.

As always I'm open to any positive or negative suggestions or comments in general.

Cicatrix
March 1999