Mid-Infection on relocations
by b0z0/iKX, Padania 1998


1) The story so far:

Midfile infection is undoubtedly one of the most interesting but not yet fully explored topics in virus writing. There aren't actually many midfile infectors around, and there are even fewer "real" midfile infectors. By "real" I mean viruses that get control at a random moment during host execution, not viruses that just place themselves physically in the middle of the infected file but still gain control immediately at the start of execution, or that gain control at the beginning of execution after jumps or pieces of code placed somewhere in the middle of the host. This latter kind of midinfection has been more developed and is easier to implement. Of course it is also easier to detect such a virus with a good code emulator, since it gets control directly, or quite soon after some garbage code. The other kind of midfile infectors, those that gain control somewhere in the middle of the host execution, are undoubtedly more complicated and more interesting. For example there are a few viruses (Nexiv_Der, Sailor_Moon and The_Bugger) that put a CALL in the infected COM file that passes control to the virus from a random position in the file. Of course they have to execute the host step-by-step to be sure that a valid instruction really starts there, check that the COM file isn't self-modifying and so on. Of course the CALL to the virus may not always be executed, depending on the actual program execution. For an antivirus it is very hard to find such a virus, since to get to its entry point the antivirus would have to emulate the entire COM file, trying all the possible paths of program execution. And this is practically impossible. At most some antiviruses will examine some number of bytes at the end of the file, even without being sure they will ever be executed, to scan for viruses. But this lame scanning can simply be avoided by the virus by using a good poly engine.
But COM files are fading away; there are just a few of them around on newer operating systems and programs. Under both DOS and Windows (3.x/95/98/NT/...) you hardly find COM files anymore, but rather several types of EXE files. The format and organization of the DOS EXE and the Windows executables are more complex, and midfile infection is not as easy as in COM files. But more complex doesn't mean impossible to implement, and some features, especially when working under Windows in protected mode, can come to our help.

Ah, from this point on some knowledge of EXE infection and EXE structure is required, but I'll try to give some references to other material and to explain the main concepts as simply as possible.

2) Infection on relocations:

Since on all kinds of EXEs, from DOS to the various Windows ones, we no longer have just one 64k segment to work in like with COMs, we can't just put a CALL to our code, since it would be too far away (with DOS EXE) or would be in another code segment (for example when dealing with NE files). So to get control from somewhere in the host code we have to use a CALL FAR or a JMP FAR that jumps to a given segment:offset pair. Of course, since we don't know where the operating system will load the various segments in memory, we need to use a relocation, that is, to correct the address we are going to jump to depending on where the stuff has been loaded. In some cases this address in memory could be calculated manually with a few calculations, but it is always calculated at load time by the operating system. In fact, when the operating system loads the executable that is going to be run, it checks its tables (I'll be more precise about the various EXEs later) for addresses that need to be corrected, or in other words need to be relocated.
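
As a rough illustration of what "relocated at load time" means, here is a minimal Python sketch of the loader side only. It assumes the simplest DOS-style case, where the segment word stored in the file is just increased by the segment the image was loaded at (NE files are handled differently by Windows). The function name `apply_segment_fixup` and the sample bytes are made up for the example.

```python
import struct


def apply_segment_fixup(image: bytearray, fixup_offset: int, load_segment: int) -> int:
    """Mimic a DOS-style loader fix-up on an in-memory image (hypothetical helper).

    The 16-bit word at fixup_offset holds a segment value relative to the
    start of the image; the loader adds the real load segment to it.
    """
    (stored,) = struct.unpack_from("<H", image, fixup_offset)
    fixed = (stored + load_segment) & 0xFFFF  # segment arithmetic wraps at 16 bits
    struct.pack_into("<H", image, fixup_offset, fixed)
    return fixed


# A far call as it sits on disk: opcode 9Ah, offset 0100h, segment 0010h.
code = bytearray([0x9A, 0x00, 0x01, 0x10, 0x00])

# The relocation table would list offset 3 (the segment word) as needing a fix-up.
# Assume the image ends up loaded at segment 2000h.
fixed_segment = apply_segment_fixup(code, 3, 0x2000)
print(hex(fixed_segment))  # 0x2010
```

The offset part of the pointer is left untouched; only the segment word changes, which is why the relocation entry has to name the exact position of that word.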

So if you would like to call your code from the original host you should put a JMP/CALL FAR using the physical offset (that is, the offset expressed as a segment:offset pair in the real file on disk) and then add an entry in the relocation table that will make sure the real address of the code loaded in memory is put in place of the physical one. Nothing prevents you from adding a relocation entry; you will just have to mess around with various structures and look for available space, depending on the type of EXE. While you can get some ideas from this, I'll now focus on a method for NE infection based on relocations that is slightly different from the concept introduced above.

3) Implementation in the NewEXE:

It's well known that Windows programs get bigger and bigger every day, so as you can imagine jumps from one NE segment to another (each has a limit of 64k) are very common. Even more common under Windows is the use of APIs to perform various tasks. In both cases the programmer has to code an address that will be dynamically updated at load time with the actual position of the needed piece of code in memory.

In this method of infection we modify one of those entries to point to our virus code. This way, when the original host calls that API (or jumps somewhere, or whatever), control passes to our virus. The virus does its work and then performs the call (or whatever) that the host was executing before infection.

It's not so hard, so let's go step-by-step, assuming we already have our file to infect. I won't present code here since it would become messy, and you can anyway find well-commented code in the Free_Padania virus that comes with Xine, which uses this method. Now go on:

So shortly this restoration part:

Of course there is just one minor thing: the address after the JMP FAR needs to be relocated like the old function! So we can simply add a relocation that is just the same as the old infected one, with the only difference that we have to put the right offset to it at offset 02h.

And that's it. Now when the host makes the infected call (very often an API under Windows) it will pass control to the virus. The virus will do its work, execute the original call and then continue the host code. Pay attention: the virus may be called more than once during the execution of the program! So for example if your virus is polymorphic, after the first execution you must change the code in memory so the decryptor won't be executed again, or the result will be crashes and the like, since the virus was already decrypted in the previous call! This is anyway easy to solve with a JMP over the decryptor. This will also make execution faster, since only the first time the virus is called does it need to check whether to activate. The rest of the times the virus is called (if you infected a heavily used call it could be called twenty or more times!) it can just jump directly to the JMP FAR to the old code without losing time on the residency check and such, because we already know we are resident.

Before discussing the aspects of this way of infection I would like to point out some things I found about relocations when coding this method. I haven't found this in a lot of specifications and books, but it should be interesting at least for those who, like me, don't have this documentation :) Maybe this will be of use to someone in future projects.

You'll find it written that to make a call to an API, for example, in pure assembly you have to do something like:

        call far ptr 0000h:0ffffh       (err, db 9a,ff,ff,00,00 of course)

and then state in the relocation entry at the bottom of the file that the address where the ffffh is must be relocated (as a 32bit PTR). This is true, but it isn't the only way it is done. While studying NExes I found that with one relocation entry you can relocate as many pointers as you want. In fact, if instead of the 0ffffh you put the offset (calculated from the start of the segment) of another pointer, then that one will be relocated too. So:

By putting just one relocation entry on 0001h:0101h (the first of the "chain") all the other ones in the example will also be relocated, since the first points to the second, the second to the fourth and the fourth to the third, which, since it has 0ffffh as offset, is the last to be relocated with that relocation entry. Of course this works only when the relocations are in the same segment (module).
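
The chaining described here can be followed with a few lines of Python, seen from the side of a tool that only reads the segment. This is a sketch under assumptions: only the starting offset 0101h comes from the text, the other three offsets are invented so that the order matches (first, second, fourth, third), and `walk_fixup_chain` is a hypothetical helper name.

```python
import struct

END_OF_CHAIN = 0xFFFF  # the 0ffffh offset that marks the last pointer


def walk_fixup_chain(segment: bytes, first_offset: int) -> list[int]:
    """Return every offset in the segment covered by one chained relocation entry."""
    visited = []
    offset = first_offset
    while offset != END_OF_CHAIN:
        # Guard against damaged data: loops and links pointing outside the segment.
        if offset in visited or offset + 2 > len(segment):
            raise ValueError(f"broken chain at offset {offset:#06x}")
        visited.append(offset)
        # The word stored at this position is the offset of the next pointer.
        (offset,) = struct.unpack_from("<H", segment, offset)
    return visited


# Hypothetical segment with four chained pointers (offsets other than 0101h are made up).
segment = bytearray(0x400)
links = [(0x0101, 0x0180), (0x0180, 0x0300), (0x0300, 0x0240), (0x0240, END_OF_CHAIN)]
for here, following in links:
    struct.pack_into("<H", segment, here, following)

chain = walk_fixup_chain(bytes(segment), 0x0101)
print([hex(o) for o in chain])  # ['0x101', '0x180', '0x300', '0x240']
```

A scanner or disassembler that looks only at the offset named in the relocation entry, without following the chain, would miss three of the four call sites in this example.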

Of course this is a great advantage for this method of infection, since by infecting the first relocation entry in the previous example the virus will be called from four different points, so the probability of getting to the virus code is even bigger. Since all four points originally wanted to be relocated as the same call, the restoration code in the virus will also be just one for all of them. Of course we also don't have to change (and that's better, one job less :)) ) the code that is going to be relocated, since if we put a typical call far 0:ffff to the virus it would break the chain and some code wouldn't be relocated, thus making Windows crash (and this is where I initially had problems, so I had to investigate WTF those values different from ffffh are, even if the fucked manual doesn't talk about them :) ).

4) Pro and cons about this way of infection:

First let's look at the good aspects of this type of infection. It is easy to see that the virus gets control at a quite random moment of the execution. The virus may trigger only on some special events, so it is not so simple to spot. For example it could activate only when a dialog box is about to be drawn (if it infected a call to COMMDLG.xx), or only when the user exits the application, or when the user uses the cos function in the calculator, or something else. The time the virus needs to do its work is no longer concentrated at the beginning of program execution, so this should also reduce possible user suspicion.

On the other hand, this way the virus will not always get activated. If it infects a rarely used call it may never get control. But of course this can be solved by infecting the file more than once or by scanning for some typical APIs to infect, like an initialization or exit procedure (this is anyway harder, since you first have to find the module reference number and then look for the typical function. But with a few given assumptions it can be done quite easily).

Another "bad" aspect that many will notice is that there is anyway a reference to our virus from at least one relocation, unlike with the COM infection mentioned before where absolutely nothing points to the CALL. Well, yes, an antivirus could definitely scan all the relocations of all the segments of the executable and analyze the code the calls point to. But, man, even the most banal NExe has hundreds of relocations (for ex. CLOCK has about 90 and CHARMAP about 120... and they are just 20kb files... try to give a look to something like Word ;)) ) and scanning up and down the entire EXE would make the user waste a lot of time and wouldn't be practical. In fact if the virus has a reasonably good poly engine then the antivirus would need to emulate kbs of code for every relocation. Current AVs don't do anything like that, and I don't think AVs are going to implement this sort of scanning over all possible relocations as entry points to the virus. It would take too much time on the scan of every file. Heh, no, the user will eat the AV producer's hat ;))

AVs could anyway dumbly try to check the last segment (or the segments that are shorter than a predefined value) for viruses. Well, yea, this would work fine with non-polymorphic viruses, but with a little work from the VW this method can easily be made ineffective. Just put some bytes of nonsense code at the start of the virus segment, then the poly decryptor (where the call will point to, of course not to the nonsense bytes before :)) ) that decrypts the body, and maybe again some nonsense bytes after it. Of course you'll anyway need to keep clear the dwords to be relocated, but that's not a big flaw. The Free_Padania sample virus uses the scheme described above, so the virus segment looks like this:

.----------------.-----------------------------------.----------------.-------.
| nonsense bytes | poly decryptor and encrypted code | nonsense bytes | reloc |
'----------------'-----------------------------------'----------------'-------'

Where the reloc is the relocation item entry. The dword to be relocated (the 0:ffff) is put somewhere in the first chunk of the nonsense bytes and the offset to it is stored in the encrypted code. The nonsense bytes have the job of making the scan for the entry point of the poly harder, so just trying to begin emulating from the beginning of the segment won't work and a scan of all the relocations in the executable would be needed. Of course this must be used with a good poly.

5) Possible add-ons:

Some things that should be done to make the stuff better:

There's a lot you can do in NExes, just let your mind go wild :) Even if NExes are not used much anymore, the format is very interesting. It is quite complicated, so a lot of strange things can be done to make the life of AVs hell :) Even in the newest Windowses (like 98) there are many NExes (some of them heavily used, like SOL.EXE :)) ), so at least they could be used to keep the system permanently infected with a good infection scheme.

6) Implementation in other executables:

I explained and brought the code for a NE infector on relocations. What about normal DOS executables? Things there are a lot worse. The general idea of the relocation would work, but the problem is that DOS isn't in protected mode, so you must also make sure that the original program doesn't overwrite your code! This is the big advantage of working in protected mode: we don't have to worry that someone will accidentally overwrite our code, since if someone tries to, a general protection fault will occur. Protected mode applications are a lot more correct than DOS ones. In fact a DOS application can write anywhere without any limitation. And it is very common for DOS programs to just assume that after their body (the original one, not the infected one) there isn't anything vital, so they just write there (and would corrupt our virus). This of course can be prevented by working like Nexiv_Der, that is, by putting a known char after the loaded program in memory and at the end of the execution looking for a place big enough that hasn't been modified. But, from personal tests (I still have a sort of prototype of an infector on relocations for normal DOS exes that I wrote some time ago somewhere), the number of possible victims will be very limited, since a lot of them use memory (at least for stack) after the body. Yea, you could just search for space a lot later in memory (thus leaving a hole between the host and the virus code where the original host can write) and hope all will be ok. But it is definitely not a very safe way of infection even if many checks are done (since the code may execute differently from the time when the infection was done), and a lot of code is needed to implement it. But of course it can be done. As for midinfection in PE, since the methods and theories behind it are quite different I'll talk about that in another article, look for it!

7) Appendix:

Here are some tables and such things that I referred to in the text. For more (especially about the NE header, which you'll need to understand too to add the virus code) check out Ralf Brown's Interrupt List (AX=4b00) and Qark and Quantum's article about NewEXE infection in VLAD #5.

Format of new executable segment table record: