Soaring with SOAR: An Interview with Kim Watson, IACD Technical Director, Johns Hopkins Applied Physics Laboratory

Among security organizations, confusion exists around SOAR (Security Orchestration, Automation and Response)—and this confusion impacts how security leaders think about these technologies, often limiting the potential benefit of deployments to their organizations.

To clarify the basic aspects of orchestration, automation, and response, the NTSC chatted with Kim Watson. She is a member of the Senior Staff at the Johns Hopkins Applied Physics Laboratory and is a Technical Director for Integrated Adaptive Cyber Defense (IACD).

In this interview, Kim talks about SOAR from a business perspective, evaluating response actions in terms of benefit and regret, and how organizations can start introducing automation and orchestration into their environments.

Why does SOAR make sense from a business perspective? For example, why should a CFO pay attention to SOAR?

Consistency may be the main reason CFOs should pay attention to SOAR. Cybersecurity automation and orchestration create consistency in process, measurement, decision making, action, and reporting.

The efficiency gains are well publicized, and we are starting to hear about increases in both the efficiency and the effectiveness of processes, particularly in the Security Operations Center (SOC). Some big-name companies say that SOAR has allowed them to retrain their Tier 1 and/or Tier 2 analysts because the platform can implement a majority of Tier 1 analysis functions and decisions. They claim that these products consistently triage and prioritize a large portion of their daily alerts, allowing their analysts to focus on higher-risk items than they could before. The SOC is not just processing more alerts; its analysts are now working on the more advanced investigative tasks associated with the highest potential risks.

These platforms can consistently collect data about how often certain products and services are used, how often certain decisions are made, how often certain actions are taken, and the conditions associated with each of these counts to consistently support audit and process improvement.

No matter where a risk is initially detected, the investigation and action processes can be consistently applied across your organization, and business risk management principles can be consistently prioritized and executed across cybersecurity operations.

How would you clarify the definition of security automation vs. security orchestration? And why is that distinction important?

In the simplest terms, automation is about machine-based implementation of tasks and orchestration is about machine-based implementation of processes (or synchronization of decisions and tasks). The reason the distinction is important is that cybersecurity automation can make security tasks more efficient, whereas orchestration can make security processes more efficient. When the two are combined with business objectives, they can make security operations more effective.
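To make the task/process distinction concrete, here is a minimal Python sketch. The functions `enrich_ip`, `block_ip`, and `open_ticket` are hypothetical stubs standing in for real tool integrations; each one automates a single task. The `orchestrate` function plays the orchestration role: it sequences those tasks and encodes the decision that connects them. It is an illustration under those assumptions, not any SOAR product's API.

```python
# Hypothetical threat-intel set; a real deployment would query an intel source.
KNOWN_BAD_IPS = {"203.0.113.7"}

def enrich_ip(alert: dict) -> dict:
    """Automation of one task: tag the alert with an intel verdict (stubbed)."""
    return {**alert, "malicious": alert["ip"] in KNOWN_BAD_IPS}

def block_ip(alert: dict) -> dict:
    """Automation of one task: record a firewall block (stubbed, touches no device)."""
    return {**alert, "action": "blocked"}

def open_ticket(alert: dict) -> dict:
    """Automation of one task: record a ticket for an analyst (stubbed)."""
    return {**alert, "action": "ticketed"}

def orchestrate(alert: dict) -> dict:
    """Orchestration: chain the tasks and make the decision that links them."""
    enriched = enrich_ip(alert)
    if enriched["malicious"]:
        return block_ip(enriched)
    return open_ticket(enriched)

print(orchestrate({"ip": "203.0.113.7"})["action"])    # blocked
print(orchestrate({"ip": "198.51.100.20"})["action"])  # ticketed
```

Each task function is useful on its own; the efficiency of the overall process comes from the orchestration layer deciding which task runs next.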

I say that orchestration improves the efficiency of processes because the first thing that happens when you try to deploy orchestration workflows is that you find all the disconnects and inconsistencies in your current processes. For example, your SOC can prioritize a task to reimage a compromised device and hand it off to IT operations, but if the task is not aligned with the process that drives the prioritization of IT tasks for the day, then it won’t get done outside of the natural cycle of asset management. If this is happening at your organization, you will find out as soon as you use SOAR to put your SOC-prioritized task into the IT support ticketing system. The two halves of the process may work great, with the SOC identifying and prioritizing devices for reimaging and IT operations creating and executing a prioritized list of devices to reimage, but the overall process is broken because the SOC-produced tickets never make it to the top of the prioritized IT ticket stack.

In the long run, cybersecurity automation needs to include and integrate all your cybersecurity operations. Wherever people are managing, protecting, or monitoring cyber assets in an organization, they're all seeing parts of the same picture, so they need to be unified in their prioritization and in their understanding of the business's risk tolerance. Think about the power of that. You gain efficiency of process and, even more importantly, effectiveness of process. Suddenly, the right tasks jump to the top of the right queues in a manner that is directly tied to business objectives. That’s what cybersecurity orchestration can give you that automation alone cannot.
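One way to picture unified prioritization is a single business-risk score that both the SOC and IT operations use to order work. The Python sketch below assumes a hypothetical scoring rule (asset criticality multiplied by threat severity) and a shared heap-based queue; real risk models will differ. With a shared score, a SOC reimage ticket like the one in the earlier example outranks routine IT work instead of waiting for the normal asset-management cycle.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Ticket:
    # Negated risk, so heapq (a min-heap) pops the highest-risk ticket first.
    sort_key: int
    title: str = field(compare=False)
    source: str = field(compare=False)

def business_risk(asset_criticality: int, threat_severity: int) -> int:
    """Hypothetical shared scoring rule used by every team that feeds the queue."""
    return asset_criticality * threat_severity

def submit(queue: list, title: str, source: str, criticality: int, severity: int) -> None:
    """Add a ticket to the shared queue, ranked by the common business-risk score."""
    heapq.heappush(queue, Ticket(-business_risk(criticality, severity), title, source))

shared_queue: list = []
submit(shared_queue, "Install printer driver", "IT", criticality=1, severity=1)
submit(shared_queue, "Patch build server", "IT", criticality=3, severity=2)
submit(shared_queue, "Reimage compromised laptop", "SOC", criticality=2, severity=5)

while shared_queue:
    ticket = heapq.heappop(shared_queue)
    print(ticket.source, ticket.title)
# SOC Reimage compromised laptop
# IT Patch build server
# IT Install printer driver
```

The design choice that matters is not the heap but the shared scoring function: as long as every team ranks work with the same business-risk rule, a SOC-originated ticket competes on equal terms with IT's own queue.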

Talk about how companies can use the benefit/regret matrix to strengthen their security posture.

The benefit-regret matrix came into being because many people noted that few organizations were taking response actions on threat intelligence about malware and malicious activity. Actually, people are taking response actions every day. They just don't know it because, rather than letting their SOC do it, they're letting a vendor do it. Their vendors are quarantining or blocking threats in an automated fashion all the time because organizations believe those vendors know with a high degree of certainty that a specific threat is malware and should never be on the network. In this situation, the potential for regret is very low and the benefit is very high.
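The benefit-regret matrix can be expressed as a simple lookup from quadrant to handling policy. In this minimal Python sketch, the four policy labels are assumptions for illustration, not IACD guidance: the high-benefit, low-regret quadrant (the vendor quarantine case described above) is automated, while high-regret actions go to a human or are not taken.

```python
from enum import Enum

class Level(Enum):
    LOW = "low"
    HIGH = "high"

# Hypothetical policy for each (benefit, regret) quadrant.
POLICY = {
    (Level.HIGH, Level.LOW): "automate",                # e.g. quarantining known malware
    (Level.LOW, Level.LOW): "automate opportunistically",
    (Level.HIGH, Level.HIGH): "require human approval",
    (Level.LOW, Level.HIGH): "do not act",
}

def recommend(benefit: Level, regret: Level) -> str:
    """Return the handling policy for a response action's benefit/regret quadrant."""
    return POLICY[(benefit, regret)]

print(recommend(Level.HIGH, Level.LOW))  # automate
```

Each organization would fill in the quadrants according to its own risk tolerance; the value of writing it down is that the decision is made once and applied consistently.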

Sometimes, an organization thinks the SOC shouldn't take a specific response action because the liability or the potential risk to the business is too high. When this is the case, how can it gain the trust necessary to start allowing its operations staff to take specific response actions? Basically, the number one concern is that operations staff will take the wrong response action, where wrong means the result will negatively impact business operations. In this situation, looking for low-regret actions becomes the key. You don't have to be right, as long as you are not sorry. For example, if no device on your network has ever attempted to resolve or connect to a particular IP address, then blocking access to that address would be a low-regret action if that address was thought to be associated with malware command and control (C2). Even if the intelligence turned out to be wrong, the likelihood that blocking that address would negatively impact the business is very low.
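The C2 blocking example reduces to a set-membership test against your own network history. The Python sketch below assumes you can export the IP addresses that internal devices have resolved (for example, from DNS logs) and connected to (for example, from firewall or flow logs); the function names and inputs are hypothetical. Suspected C2 addresses never seen in either set are treated as low-regret blocks, and the rest go to an analyst.

```python
def is_low_regret_block(ip: str, resolved_ips: set, connected_ips: set) -> bool:
    """Low regret if no internal device has ever resolved or connected to this IP."""
    return ip not in resolved_ips and ip not in connected_ips

def plan_blocks(intel_c2_ips: list, resolved_ips: set, connected_ips: set):
    """Split suspected C2 addresses into automatic blocks and analyst review."""
    auto_block, needs_review = [], []
    for ip in intel_c2_ips:
        if is_low_regret_block(ip, resolved_ips, connected_ips):
            auto_block.append(ip)
        else:
            needs_review.append(ip)
    return auto_block, needs_review

# Example data using documentation address ranges (RFC 5737).
resolved = {"198.51.100.10"}
connected = {"192.0.2.44"}
intel = ["203.0.113.5", "198.51.100.10", "192.0.2.44"]
print(plan_blocks(intel, resolved, connected))
# (['203.0.113.5'], ['198.51.100.10', '192.0.2.44'])
```

Note that the check does not try to decide whether the intelligence is correct; it only asks whether acting on it could plausibly hurt the business.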

The best place to start adding automation, and particularly automated response, in an environment is with low-regret response actions. It will look different for different risk tolerances, but most organizations should be able to identify conditions where taking a particular action is aligned with their business objectives and not likely to inappropriately impact operations.

One of the main lessons learned in helping organizations implement SOAR workflows is “Don't build your processes around what you do today. Build them to optimize your risk decisions quickly.” Let me explain by way of an analogy. An organization was redesigning its customer engagement process because it was getting far more requests than it could support. The people rebuilding the triage process had a flawed assumption: they were building their process to get to the best “yes.” Their process should have been designed to get to “no” fast: triage, prioritize, and then get to the best “yes.” Applying this same logic to SOAR workflows, don’t implement investigative processes to get to the most accurate answer. Instead, figure out what it takes to determine whether the risk is acceptable or a response is low regret, and do that first. Then determine what it takes to prioritize what remains and do that next. The “full” process should only be invoked for the high-priority items that remain.
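The “get to no fast” ordering can be sketched as a staged pipeline. In this Python example, the alert fields (`risk_accepted`, `low_regret`, `priority`) and the `full_review_threshold` cutoff are hypothetical stand-ins for whatever checks an organization actually uses. Cheap checks dismiss or auto-respond first, the remainder is prioritized, and only high-priority items reach the full investigative process.

```python
def triage(alerts: list, full_review_threshold: int) -> dict:
    """Stage 1: get to "no" fast; stage 2: prioritize; stage 3: full process."""
    outcome = {"accepted": [], "auto_response": [], "queued": [], "full_investigation": []}
    remaining = []
    for alert in alerts:
        if alert["risk_accepted"]:      # risk is acceptable: stop here
            outcome["accepted"].append(alert["id"])
        elif alert["low_regret"]:       # low-regret response: act and move on
            outcome["auto_response"].append(alert["id"])
        else:
            remaining.append(alert)
    # Highest priority first among whatever survived the fast checks.
    remaining.sort(key=lambda a: a["priority"], reverse=True)
    for alert in remaining:
        high = alert["priority"] >= full_review_threshold
        outcome["full_investigation" if high else "queued"].append(alert["id"])
    return outcome

alerts = [
    {"id": "A1", "risk_accepted": True,  "low_regret": False, "priority": 2},
    {"id": "A2", "risk_accepted": False, "low_regret": True,  "priority": 5},
    {"id": "A3", "risk_accepted": False, "low_regret": False, "priority": 9},
    {"id": "A4", "risk_accepted": False, "low_regret": False, "priority": 3},
]
print(triage(alerts, full_review_threshold=7))
# {'accepted': ['A1'], 'auto_response': ['A2'], 'queued': ['A4'], 'full_investigation': ['A3']}
```

The expensive investigation runs for one alert out of four here; the ordering of the checks, not their sophistication, is what saves analyst time.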

One way to do this is to understand how much you have already offloaded to vendors, and why, because that will help you identify high-benefit, low-regret response actions. Can that same low-regret action apply under other circumstances? Are there other high-benefit response actions that your vendors don’t identify, but you can? The answers to these questions will help you identify places to bring automated responses into your environment in an appropriate manner. You will start to develop trust in both automation and the process.

The next step is to figure out what one piece of information, if you knew it, would move a response action from high regret to low regret. For example, consider reimaging devices. You're not going to reimage a production server just because you think it's compromised or being used as a watering hole. Instead, you’re probably going to take that action during the next scheduled maintenance window while deploying enhanced monitoring and limiting access until then. But you can probably reimage a compromised laptop during the day. In this example, it is an attribute of the asset, its role as a production server or a laptop, that makes the response low regret. So think about when an action can be low regret, and what information is needed to make that determination.
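The reimaging example shows a single asset attribute flipping an action from high regret to low regret. This Python sketch assumes a hypothetical `role` field pulled from an asset inventory; the response steps mirror the example above, and a real policy would likely consult more attributes.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    hostname: str
    role: str  # hypothetical inventory attribute, e.g. "laptop" or "production_server"

def reimage_plan(asset: Asset) -> list:
    """Pick response steps for a suspected compromise based on the asset's role."""
    if asset.role == "laptop":
        # Low regret: a compromised laptop can be reimaged during the day.
        return ["reimage now"]
    # Anything else is treated conservatively as high regret:
    # contain the risk now and defer the reimage.
    return [
        "deploy enhanced monitoring",
        "limit access",
        "reimage at next scheduled maintenance window",
    ]

print(reimage_plan(Asset("lt-0142", "laptop")))  # ['reimage now']
print(reimage_plan(Asset("db-prod-01", "production_server")))
```

Defaulting unknown roles to the high-regret path keeps the automation safe when the one piece of deciding information is missing.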

How does an organization ramp up their SOAR strategy if they lag? What steps can they take?

IACD has developed a readiness framework to help organizations get started, and it is available on our website (www.iacdautomate.org). Once an organization has invested in SOAR, vendors offer communities and services to help it mature its use of cybersecurity automation. But honestly, your peers are one of your greatest resources. Participate in communities where you can hear about what others have done and what they’ve learned. Maybe they started where you are, or maybe they’ve made a step toward where you want to go. Learning those lessons from organizations that share your general risk posture and threat environment helps to identify solutions more closely aligned with your organizational needs and constraints. Often, you already have relationships with people in these organizations that make you more willing to share and listen.

NTSC Blog