Foreword

Many information and cyber security professionals share overlapping responsibilities regardless of their title. That makes it worth breaking down the one skill every technical security professional should have: the art of High Impact Security Analysis and Communication.

Intro: Improvise, Adapt, Overcome

Bear Grylls - Adapt Improvise Overcome

“Once you know how to do something it’s easy, but the moment something changes in how you do it, it becomes hard.”

Think of the skills you have, including those you take for granted (e.g. walking, cooking, driving, communicating). Did you learn them overnight? Of course not! You spent time and energy mastering them in a way that was natural to you before they became second nature.

Now what happens when you fundamentally change your ability to perform these actions:

  • Walking: Imagine you break your ankle and cannot walk, so you need to use crutches.
  • Cooking: Imagine you have no power available or your stove/oven breaks.
  • Driving: Imagine you jump in someone’s manual vehicle when you have only driven an automatic car.
  • Communicating: Imagine you need to resort to sign language or another form of communication because you lose your voice or are trying to communicate with someone in another language.

Whilst still possible, without some extra training or tools to get you started the task becomes exponentially more difficult; the same applies to being a security analyst.

Defining an Analyst

Charlie Conspiracy - Always Sunny in Philadelphia

The definition of an analyst is someone who performs analysis, and the definition of analysis according to the Cambridge dictionary is:

“The act of studying or examining something in detail, in order to discover or understand more about it, or your opinion and judgment after doing this”

Because security of a system, identity, or an organisation’s information relates to everything that poses a risk to it, the idea of a security analyst is akin to that of a miracle worker, where the scope of what you are responsible for and trying to protect can be as small or as large as someone makes it.

With a seemingly endless amount of software, systems, and threats posed to them, it's no wonder that security analysts can experience burnout and struggle to understand exactly what they need to know to be successful in their role. Various roles with unique titles have spawned from this ambiguity, but at the end of the day they simply indicate a niche area in which someone is to focus their time performing analysis.

Some examples are:

  • Vulnerability Analyst
  • Malware Analyst
  • SOC Analyst
  • EDR/MDR Analyst
  • Intelligence Analyst
  • Threat Analyst
  • Forensic Analyst
  • Information Security Analyst

So, what generally happens in these roles? Their required skill sets become blurred, and the people in them generally wind up pursuing similar forms of training and certifications.

On the Job Training and Certifications

Often when beginning in a new role you receive some on-the-job training that may teach you about workflows, the tools available, and how to retrieve the information required to perform in your role. Sometimes a business may also put forward money to get you industry-recognised certifications. Although these may teach you some technical skills and tools that can be used to solve certain problems, they don't necessarily teach you how to perform analysis on something you have never seen before.

Coming back to the definition of analysis, an analyst’s job is to study or examine something in detail to understand more about it and make a judgement on it. If you have seen something before then your unique experience means you will rapidly be able to make a decision and know what analysis needs to be done to come to an outcome; however, if you haven’t seen it before then how do you perform effective analysis?

Fast, Good, and Cheap - Pick Any 2

In project management there's a concept known as the iron triangle or project management triangle. A common derivative used in many businesses today is simply that you cannot have something that is fast, good, and cheap; everything comes with a trade-off. This same premise can easily be applied to working as a security analyst, but it comes with some caveats we'll touch on in just a second.

From here on out I'm going to use the example of a Security Analyst who works in a Security Operations Centre (SOC) or a managed EDR/MDR function, and use these terms interchangeably. At the time of writing I have spent the majority of my career in Senior, Principal, and Manager roles in this space, and something I frequently get asked is "what skills do I need to excel in this field", so I figure this is a good opportunity to shed some light.

Imagine you're a business that hires a new SOC analyst, let's call him Jack, and there's already a senior SOC analyst, let's call her Sally. Both analysts are working a shift together and see an alert come through from a server:

EDR Alert: Mimikatz was detected - A command commonly associated with Mimikatz was run on the system
Process Executable: C:\Users\Gavin\Music\1\mimikatz.exe
Process Command Line: C:\Users\Gavin\Music\1\mimikatz.exe privilege::debug
Signature: Unsigned
User: Gavin (SID: S-1-5-21-1001356378-1477238915-642007331-1000)
Logon Type: 10
Host Services: RDP (3389), HTTPS (8080)

In this instance Jack picks up the alert and begins to perform analysis on what he is seeing.

  • He first looks up the executable's hash on VirusTotal, but it hasn't been seen before.
  • He begins to google Mimikatz, as this isn't something he's familiar with, and starts reading up on it.
  • Whilst Jack is reading into this, Sally notices the alert and immediately isolates the server from other systems in the environment, locks Gavin’s user account, and logs him off of the system.
  • Sally begins working with Jack to explain what this is and write up a report on what they are seeing and further actions that can be taken by the impacted organisation.

Why did Sally take an action to lock the system down when Jack didn’t? Fast, Good, and Cheap - Pick Any 2.

In this instance Sally had previous experience that allowed her to confidently take action and respond fast and good, but her salary as a senior means it won't be as cheap for the business, and the aggressive response actions taken mean it may not be cheap for the impacted business either, given a server has now been taken offline. Inversely, Jack would have been more thorough and had a good and cheap outcome, but it wouldn't have been fast, which would have been detrimental to the impacted business.

This analogy does break down in that many analysts don't come to a good outcome even after considerable time is put in, and as we know, time is money, so you can't have a good and cheap analyst because they would be slower and so fundamentally not cheap. As an analyst though, we can have a good outcome for cheap; it just takes time to learn and get exposure to different technologies.

It’s also unlikely that you’ll have all the information required to make a decision in this single pane of glass, but for the purpose of simplification we’re assuming all of these details are in the alert.

Time is not Experience, but to Have Experience You Need Time

“Time doesn’t equal experience, but you can’t have experience without time.”

I've interviewed anywhere from 50-100 candidates in my career at the time of writing and have gone through hundreds of CVs. One thing that remains true is that the time spent in information security does not determine your experience in a given role; it's all about your exposure to various technologies, tools, and tradecraft. Not only this, but experience can come from your own time spent learning about a concept. No, seriously, this can make you a much better analyst than someone who has done the bare minimum in a working role for a year or two.

The primary reason time can go in without a good outcome is that an analyst doesn't have enough foundational knowledge to build on from what they are looking at, or they haven't learnt how to perform the art of High Impact Security Analysis and Communication.

In the previous example with Jack and Sally, there are multiple approaches Jack can now take to move forward:

  • A: Jack could let imposter syndrome kick in for not knowing what he was looking at and sheepishly continue trying to work.
  • B: Jack could make a note to read more into Mimikatz and understand it when he gets some free time after work.
  • C: Jack could get a mentoring session from Sally to learn from her experience.
  • D: Jack could read Sally’s report sent to the impacted organisation to learn from it.

Only one of these options doesn't result in growth (Option A); the other three all result in growth, and doing all of them often results in the most growth (but takes the most time). This is where, as analysts, regardless of the time we have available in a working day, if we want to grow we need to spend time on self-learning and understanding.

Taking hypotheticals away, the way I grew as an analyst was often to take approaches B and D. I would read, experiment, read some more, and write to confirm my understanding of a particular concept. As an analyst working in a fast-paced environment you will often not be granted the time to do this during work, so study in your free time becomes increasingly important.

Pattern Recognition and the Stool

In the above example Sally recognised a number of patterns she had seen before based on her unique knowledge of techniques known to be used by threat actors:

  • Gavin was running an executable from the Music directory
  • Gavin had a logon type of 10 which meant it was via Remote Desktop Protocol (RDP)
  • A command known to be used by Mimikatz (privilege::debug) was seen in the command line
  • Mimikatz was the name of a tool publicly available that is used for stealing plaintext passwords from system memory
  • The executable was unsigned so not verified by a trusted software vendor
  • Threat actors often use Mimikatz to obtain passwords to move laterally
  • The server was exposing RDP to the internet which is commonly brute forced by threat actors
  • The account identifier (RID) for Gavin ended in 1000, which indicates it is likely a long-standing, legitimate user rather than a newly created one, as user RIDs begin at 1000 and increment as accounts are created.
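Two of those legs come straight from the SID in the alert. As an illustrative sketch (not a production tool), the final sub-authority of a SID, the RID, can be pulled out and compared against well-known values:

```python
def parse_rid(sid: str) -> int:
    """Return the RID (the final sub-authority) of a Windows SID string."""
    return int(sid.rsplit("-", 1)[1])

def describe_rid(rid: int) -> str:
    # Well-known RIDs: 500 is the built-in Administrator, 501 the Guest;
    # ordinary user accounts are assigned RIDs starting at 1000.
    if rid == 500:
        return "built-in Administrator"
    if rid == 501:
        return "built-in Guest"
    if rid >= 1000:
        return "standard user account"
    return "other well-known/built-in principal"

# Gavin's SID from the alert above
sid = "S-1-5-21-1001356378-1477238915-642007331-1000"
print(describe_rid(parse_rid(sid)))  # standard user account
```

The same check immediately distinguishes Gavin's account from an account ending in -500, which would be the built-in Administrator.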

As an analyst we can use the premise of a stool: if there are three or more unique, clear-cut attributes (legs) to stand on, then this generally indicates something may be malicious, based on deviations from what you see as normal.
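To make the stool concrete, here is a toy sketch of the idea applied to the Mimikatz alert above. The specific legs, directory list, and threshold are illustrative assumptions, not a real detection rule:

```python
# A toy "stool" triage check: count clear-cut suspicious attributes (legs)
# and escalate when three or more are present.
SUSPICIOUS_DIRS = ("\\Music\\", "\\Downloads\\", "\\AppData\\Local\\Temp\\")
KNOWN_BAD_TOKENS = ("privilege::debug", "sekurlsa::logonpasswords")

def stool_legs(alert: dict) -> list:
    legs = []
    if any(d in alert["path"] for d in SUSPICIOUS_DIRS):
        legs.append("executable run from an unusual user directory")
    if alert["signature"] == "Unsigned":
        legs.append("unsigned executable")
    if alert["logon_type"] == 10:
        legs.append("remote interactive (RDP) logon")
    if any(t in alert["cmdline"] for t in KNOWN_BAD_TOKENS):
        legs.append("command line token tied to a known offensive tool")
    return legs

alert = {
    "path": "C:\\Users\\Gavin\\Music\\1\\mimikatz.exe",
    "cmdline": "mimikatz.exe privilege::debug",
    "signature": "Unsigned",
    "logon_type": 10,
}
legs = stool_legs(alert)
print(f"{len(legs)} legs -> {'escalate' if len(legs) >= 3 else 'keep digging'}")
# 4 legs -> escalate
```

Sally effectively ran this check in her head in seconds; the point is that each leg is a deviation from normal, and several legs together justify a fast, confident response.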

You could spend thousands of dollars gaining industry recognised certifications that teach you technical concepts, where to look for information on a system, what tools to use and so forth, but if you don’t know what data you have available then you’ll be paralysed trying to think of what to do with a detection or situation you’re unfamiliar with.

Regardless of Recognition and Data, Context is Key

Every single decision and data point you have as an analyst should guide you towards establishing context. Too often analysts will see a detection they're unfamiliar with, and if they can't determine it to be malicious by looking at the hash of the executable or some other basic checks, they'll close it out as a false positive. THIS IS NOT HOW WE PERFORM HIGH IMPACT SECURITY ANALYSIS.

High Impact Security Analysis is all about having enough foundational knowledge to be able to establish context in unfamiliar situations. Take for example another alert below, this time on a user workstation:

AV Alert: Malware:CS:Hueristic
Process Executable: C:\Windows\System32\svchost.exe
Process Command Line: C:\Windows\System32\svchost.exe -k UnistackSvcGroup -s WpnUserService
Signature: Signed (Microsoft)
User: Amanda (SID: S-1-5-21-1001356378-1477238915-642007331-500)
Logon Type: 2
Host Services: N/A

In this instance Sally picks up the alert and begins investigating it. She sees a user who is interactively logged onto a system with a logon type of 2, and an Antivirus alert pointing to an executable signed by Microsoft, svchost.exe. She also sees from its SID that this is the local administrator account on the system.

  • Sally looks up the executable hash on VirusTotal and finds it has been seen many times with no AV vendors marking it as malicious; this looks to be a legitimately signed svchost.exe executable
  • Sally looks back through their case management system and finds other analysts have been closing similar signals off as false positives for a few days now

Antivirus products have long been known to produce some false positives, and Sally can't see exactly why the AV product raised the alert, so she begins to think this could be a false positive based on the analysis of others. However, unlike many junior analysts before her, Sally is going to investigate further, because contextually this is the only system they're seeing the alert on, and the Antivirus product is deployed on many systems.

  • Sally begins to formulate potential hypotheses on what she is looking at:
    • A: The Antivirus product is causing a false positive
    • B: The legitimate svchost process is in some way running malicious code

At this point Sally is in an unfamiliar situation; however, she gathers enough foundational knowledge through online searches to kickstart her analysis.

  • First Sally knows that a process is made up of a number of threads, and any one of those threads could be performing malicious actions whilst the others perform legitimate actions
  • Sally finds out that svchost.exe is an executable which runs service DLLs
    • Sally finds out that the service specified in the command line (WpnUserService) loads a service DLL from the Windows Registry at HKLM\SYSTEM\CurrentControlSet\Services\WpnUserService\Parameters
    • Sally finds the relevant value in this key ServiceDll - REG_EXPAND_SZ - %SystemRoot%\System32\WpnUserService.dll
    • Sally understands that system variables are indicated by % and that the default variable here means the DLL is located at: C:\Windows\System32\WpnUserService.dll
    • Sally retrieves the DLL from disk and runs a hash search only to find it is known, signed, and a valid Microsoft DLL
  • At this point Sally takes a look at active network connections on the system and finds that the svchost process has an active network connection to an IP address
    • Examining the IP address on VirusTotal reveals it is an IP tied to VULTR, a provider of Virtual Private Servers which is commonly abused by threat actors
    • Sally retrieves a memory dump of this process for analysis
    • Looking at the strings in this memory dump and examining in WinDbg, Sally identifies the string MZARUH followed by what looks to be a PE file
    • A search for this reveals that MZARUH is the default byte prologue of a 64-bit Cobalt Strike beacon
  • Suddenly the assumption begins to shift: maybe Malware:CS:Hueristic indicates a likely Cobalt Strike (CS) beacon being identified in memory
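The ServiceDll lookup in the middle of that chain can be sketched in a few lines. On a live system you would read the value with Python's winreg module from HKLM\SYSTEM\CurrentControlSet\Services\WpnUserService\Parameters; here the value and the variables are supplied directly so the expansion logic is visible:

```python
import re

# Expand a REG_EXPAND_SZ value the way Windows would: each %VAR% is
# replaced from the environment, case-insensitively. The environment
# mapping below is supplied for illustration.
def expand_reg_sz(value: str, env: dict) -> str:
    lowered = {k.lower(): v for k, v in env.items()}
    return re.sub(
        r"%([^%]+)%",
        lambda m: lowered.get(m.group(1).lower(), m.group(0)),
        value,
    )

service_dll = expand_reg_sz(
    "%SystemRoot%\\System32\\WpnUserService.dll",
    {"SystemRoot": "C:\\Windows"},
)
print(service_dll)  # C:\Windows\System32\WpnUserService.dll
```

This is exactly the reasoning Sally applied: the % markers indicate a system variable, and expanding %SystemRoot% with its default value points at the DLL on disk that she then hashed.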

There’s a lot that goes into the above, and without enough foundational knowledge to build upon it’s unlikely Sally would have come to the correct conclusion as to why this AV product was raising an alert.

Even though a lot of analysis has been done at this stage to confirm Sally's hypothesis, the cause of the Cobalt Strike beacon in memory is still unknown, and it's likely to have taken a fair amount of time to get to this point given Sally was essentially learning on the fly. When you consider that most analysts are aiming to resolve an alert in 30-60 minutes, it's no wonder that many turn to the fallacy of believing that if they can't confirm something is malicious, it must be a false positive, rather than gathering context and validating findings.
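The string hunt itself is the simplest part of Sally's work and can be sketched as a byte-pattern scan over the process memory dump. The buffer below is fabricated to stand in for a real dump:

```python
# Scan a byte buffer for the MZARUH prologue seen at the start of a
# default 64-bit Cobalt Strike beacon's reflective DLL in memory.
PATTERN = b"MZARUH"

def find_pattern(dump: bytes, pattern: bytes = PATTERN) -> list:
    """Return every offset at which the pattern occurs in the dump."""
    offsets, start = [], 0
    while (idx := dump.find(pattern, start)) != -1:
        offsets.append(idx)
        start = idx + 1
    return offsets

# Fabricated buffer standing in for a process memory dump
dump = b"\x00" * 64 + b"MZARUH\x89\xe5" + b"\x00" * 32
print(find_pattern(dump))  # [64]
```

In practice tools like strings and WinDbg do this for you, and obfuscated or sleep-masked beacons won't expose the pattern this cleanly, but the principle of pivoting from one recognisable byte sequence to a hypothesis is the same.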

If you fail to identify malice where malice is present, then at best you didn't have the skills, time, tools, experience, or knowledge to identify it, and at worst you didn't do your due diligence and were negligent. The Cambridge definition of incompetent is as follows:

“Lacking the skills or knowledge to do a job or perform an action correctly or to a satisfactory standard”

I don’t know of any analyst that would like to be incompetent or negligent in their analysis, and yet when we only go as far as to do basic checks without context, we wind up falling into this bucket.

Even though analysis can become faster with experience, your process and technology are equally important in reducing the time spent performing triage and analysis and coming to the right outcome.

People, Process, Technology

Without people, process, and technology your security operations will suffer, and security analysts will have a hard time getting to the right decision when faced with unknown alerts or situations. In the above example Sally was able to use deductive reasoning and context to infer that the svchost process had been injected into and was running a Cobalt Strike beacon. What if there was more context, would this have changed the time required to get to the same outcome?

Take for example the below alert instead:

EDR Alert: Possible Cobalt Strike Bytes Reflectively Loaded in Memory
Process Executable: C:\Windows\System32\svchost.exe
Process Command Line: C:\Windows\System32\svchost.exe -k UnistackSvcGroup -s WpnUserService
Signature: Signed (Microsoft)
Source Executable Command Line: C:\Program Files\Intel\Chipset.exe
Signature: Unsigned
Source Executable Parent Command Line: C:\Windows\System32\svchost.exe -k netsvcs -p -s Schedule
User: Amanda
LogonType: 2
Host Services: N/A
Byte Pattern: MZARUH....<snip>

Suddenly Sally starts this investigation with far more context than she had previously. Not only does the EDR Alert contain more clarifying information (and ideally an investigation guide), but it also paints a picture of where she should begin her investigation. In the above example it’s clear a process has been injected into and the source of this injection appears to be C:\Program Files\Intel\Chipset.exe which is unsigned. This seems to be run from a scheduled task as indicated by the Schedule service.

The sheer fact that the technology provides more context means less time has to be spent by the analyst gathering it.

Now with all of the scenarios mentioned above, how could you possibly be expected to perform High Impact Security Analysis without having at least some knowledge of the following concepts, or the ability to quickly find it:

  • Processes
  • Threads
  • Scheduled Tasks
  • Command Line Arguments
  • DLL and Executable Metadata
  • Memory
  • Logon Types
  • SIDs/RIDs
  • Reflective Loading and Injection
  • Signed and Unsigned Executables
  • RDP
  • Mimikatz and LSASS
  • Svchost and Service DLLs

This is where documenting enough of your investigation pays dividends over time. In the initial instance Sally had to do a lot of investigation; however, this is now knowledge she has which can be shared with others, and the report she sent the impacted business can be examined in the future to understand what investigation Sally performed, based on her notes. This creates a cycle of continuous improvement and knowledge sharing. It takes time in the initial instance, after which it becomes exponentially swifter if the same activity is seen again.

Without a process to follow it can be hard to perform enough analysis to determine if something is malicious or not, and even harder to know when to stop going down the analysis rabbit hole.

With the above let’s look at a generic process that would speed up analysis:

Step 1: Get context on the alert and form a hypothesis

An alert has gone off, how can you begin your investigation if you don’t know what the alert is supposed to be for? Every professional who creates detections has an obligation to guide an analyst in what they should look for and why the detection exists.

In the first example there was only ambiguous information on why the detection was raised, and not enough context on what caused it. In the second example there was far more context given. Try to gather as much context as you can on why an alert may have been raised and what it is trying to detect, and form several possible hypotheses for why it fired. This doesn't need to be formally documented every step of the way like a PhD thesis would be; it's simply keeping in the back of your mind that a number of possible causes could be present.

Step 2: Have we seen this before?

You have a certain amount of data at your fingertips; use it to check whether you've seen anything similar before in your case management system by pivoting on an indicator. Anything is fair game here, but for some inspiration try searching for key terms based on:

  • Alert Name/ID
  • Command Line
  • IP address
  • User
  • Signing Certificate
  • Byte pattern
  • DLL/Executable metadata
  • Domain/URL

If you don't have the data then you need to acknowledge where your data limits are and work from what you have or can get (this is where having the ability to analyse different data sets is important). As you see more of what is normal as an analyst, you can begin to identify when something deviates from that baseline and becomes a little unusual. This is where you're able to use pattern recognition to make swift decisions with high accuracy. At the end of the day, multiple analysts looking at a single alert will present different viewpoints, which may be enough to comprehensively infer what has happened on a system.
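As a minimal sketch of that pivot, imagine your case history as a list of records and an indicator you want to sweep it for. The records, field names, and query style here are fabricated; a real case management system exposes its own search API:

```python
# Illustrative pivot search: return IDs of past cases where any field
# contains the indicator (case-insensitive substring match).
CASES = [
    {"id": 101, "alert": "Mimikatz was detected", "user": "Gavin", "ip": "203.0.113.7"},
    {"id": 102, "alert": "Possible Cobalt Strike", "user": "Amanda", "ip": "198.51.100.9"},
    {"id": 103, "alert": "Possible Cobalt Strike", "user": "Bob", "ip": "198.51.100.9"},
]

def pivot(cases: list, indicator: str) -> list:
    return [c["id"] for c in cases
            if any(indicator.lower() in str(v).lower() for v in c.values())]

print(pivot(CASES, "198.51.100.9"))  # [102, 103]
```

One shared IP address linking two otherwise separate cases is exactly the kind of prior analysis that turns an unfamiliar alert into a familiar one.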

Step 3: Get context on what is involved in the alert

If you’ve got no analysis that has taken place previously that helps infer what you’re looking at, this is when you really need to get your hands dirty.

Starting with the data you have available you may wish to answer the following questions as a starting point:

  • What user is involved, and how are they logged onto the system?
  • Is the system a workstation or a server? Does it have any services or applications exposed to the internet that could have been exploited?
  • What processes are involved? Are any of them suspicious? Are any of them known to be abused by other processes?
  • What activity occurred in the lead-up to, and after the alert? Are there any unusual processes which ran?
    • Look at processes that ran anywhere from 5 minutes to even days before the alert occurred if required. You should be able to identify what likely caused the alert, and what other actions have been taken.
  • Do any of the executables or DLLs involved show suspicious characteristics?
  • What persistence has been established on the system?

How are you supposed to do any of these things properly and swiftly without broad knowledge of the type of alert you're dealing with, the possible threats to systems, and the tools or data required to perform analysis? You can't, and this is part of the reason it all comes back to Fast, Good, and Cheap - Pick Any 2; or put another way, you get what you pay for.

Step 4: Confirm Hypothesis and Take Action

What needs to be done to stop the activity and restore normal operations? If it's malicious then clear actions will need to be taken to respond, one of which may be to involve a devoted DF/IR firm. If it's a false positive then working with the vendor who created the alert or adding exclusions may be required.

Where do I Stop?

I'm commonly asked by analysts how much analysis is enough, and when to stop. Quite simply, it depends on your scope, what you're hoping to achieve from your analysis, and what outcomes you're going to deliver.

For example, in the above scenarios where Sally identified a Cobalt Strike beacon in memory, she could have continued her analysis. Using the byte pattern which has been matched, Sally may also be able to extract the configuration of the Cobalt Strike beacon using free, publicly available tools. This is important because it can not only give more context on what IP address it uses for command-and-control, but also provides a watermark (license) that can be used to track unique licenses assigned to Cobalt Strike, and even information about what process will be injected into as part of normal operations.

So, should she get this information or not? Once again, it depends. The command-and-control IP may be useful for blocking across an environment or finding other infections, and the watermark may come in handy for tracking future Cobalt Strike infections and threat actors.

When in doubt, your priority is always Containment, Eradication, and Recovery, the standard process put out in NIST SP 800-61R2, the National Institute of Standards and Technology guide to handling computer security incidents.

In the above scenario containing the situation may involve:

  • Isolating the system
  • Taking memory dumps and killing the identified malicious processes in memory
  • Identifying and removing persistence on the system
  • Restarting the system to clear everything from memory

Eradication may involve:

  • Removing all malware still present
  • Disabling user accounts involved and changing credentials
  • Identifying other compromised systems and identities and containing those entities

Recovery may involve:

  • Confirming no malware is still present
  • Restoring from backups
  • Hardening systems and educating users to prevent the root cause from occurring again
  • Installing patches
  • Implementing security tools and firewalls etc

In short, for Sally to initially contain the situation, she doesn’t need to extract the configuration of the Cobalt Strike beacon, and she already has an IP address the process connected back to. To properly eradicate and recover though, extracting the Cobalt Strike beacon configuration may be useful in pinpointing other infected systems, gaining threat intelligence, and identifying other C2 infrastructure she may have missed.

Taking ownership of an alert and performing the best analysis you can from start to finish will always benefit you, other analysts, and any organisation impacted by the alert. The sad reality is that much of the information that was shown above may not be available without looking at forensic artifacts, taking actions on an endpoint, or consolidating OSINT and other information sources, and so the balance comes with how quickly you can use the data at your fingertips to gain the necessary context required.

Communication is Essential

Now that we know the art of High Impact Security Analysis, the most essential part of it comes into play; Communication.

It doesn't matter how smart and technical you are if you aren't able to convey risk or benefit to others. The problem is that in a managed SOC/MDR capability you often won't know who you're communicating with or what their technical knowledge is. You may be able to get this context by looking them up on LinkedIn or talking with someone who knows them, but this isn't always practical or possible. That's why communication should be distilled down into 3 main sections:

  • What happened
  • What we did about it
  • What you need to do about it or follow up on.

If you can break down all of your analysis into these sections, with actions that others can relate to, then you'll be able to adequately communicate to a wide range of stakeholders. Sometimes the individual involved may want more information, and that's okay, because the notes associated with your investigation should help convey the technical details of the analysis you performed.

It's hard being an analyst, as everything comes from experience or continuous education. I've met a lot of very intelligent analysts in my time, and I've met a lot of lacking or inexperienced analysts, but every single person had something unique to offer and knowledge that others didn't. Because analysts often don't understand the value of High Impact Security Analysis, are unable to communicate their findings, or just don't undertake the due diligence required given their skills and the time available, this can lead to an outside perception that being an analyst means you are inexperienced and in an entry-level role, which is simply not the case.

No matter your experience or role, you need to know when to be humble, and how to communicate in a way that doesn't assume others don't know what they're talking about or invalidate their ideas. This in itself can be a challenging concept for many technical professionals to come to terms with, and is something better reserved for its own blog.

A Drop in the Water

In the above scenarios we touched on only a couple of alert types that could come through and what they could indicate, but the landscape continuously evolves as threat actor tradecraft changes and you come across different threats during your analysis.

Some other areas that would be good to brush up on are the following:

  • Web Shells (Not only what they are, but what technologies they can exist on e.g. .NET/ASPX, PHP etc)
  • Tunnels (Not only what they are used for, but what common ones are used by threat actors and what protocols do they often tunnel?)
  • Information Stealers (Not only what they are used for, but what extra actions should be taken if one were to successfully be run in an environment)
  • Ransomware Threat Actors (Not only what ransomware there is in the wild, but what is commonly done BEFORE ransomware is run in an environment? How can you detect this?)
  • DLL Sideloading/Search Order Hijacking (Not only what it is, but what it’s commonly used for, how do you identify when it may be occurring?).
  • Rogue RMM Tools (Not only what ones are commonly used, but what artifacts show when they were installed, who is taking an action on a system, and from what IP)
  • MITRE ATT&CK Tactics (Not only what they mean, but what is commonly used during these phases? How would you spot them?)
  • Commonly abused externally facing services (Not only what they are e.g. SQL, RDP, SMB, IIS/Web Applications etc, but how do you identify exploitation of these services? What artifacts do you have available?)
  • Networking and System Internals (This is something often skipped over and can become incredibly complicated the more you know, but if you don’t understand concepts such as packets and their types, processes, scripts, executables, threads, drivers, trust levels, event logs (ETW), the registry, LOLBAS, COM objects, WMI, AMSI, UAC, SIDs, DNS, IP addresses, ARP, File hashes, signing certificates it will lead to gaps in your knowledge that need filling during analysis)
  • Tools for accomplishing tasks (Not just what tools are available, but when to use them and what information they may provide)

The secret to being a well-rounded analyst is to try to gain a basic to intermediate understanding of as many technologies and systems as you can, and tie this back to known threat actor tradecraft.

In Summary

  • Look Out: When you get an alert don’t reinvent the wheel if it isn’t required, use existing knowledge to respond swiftly.
  • Look In: Gather context of what is involved in the alert and what activity has happened around that time to try and determine what may have caused it.
  • Look Out: Using newly found intelligence and indicators look at other systems and binaries involved to try and get a feeling of the scope of the incident, any persistence, or otherwise.
  • Look In: Document your learning outcomes and analysis process; this will help everyone in the long run.

In closing, mistakes and malicious activity will be missed due to brief oversights, individual bias, individual experience, and competing priorities to be fast and meet SLAs/SLOs, so remain humble, help each other improve every single day, and continue to increase your skills as a security analyst.