How to fix alarm overload with cloud security automation

So you’ve got your new security monitoring system set up. You have alarms that send you emails when the root user logs in. You get a Slack message when someone launches an RDS instance open to the internet. And everyone gets a text message when an S3 bucket is created without encryption.

This is great but, like pretty much everyone, you’re swamped with notifications all day long.

At CloudSheriff, we believe it’s cruel and lazy to set up a security  monitoring system just to send notifications. Many products on the  market do a great job of telling you what is wrong – look, this bad S3  bucket has a red label on it! – and that’s it.

We want to take care of your security so you can innovate and sleep well at night. That’s why we offer automated remediation.

What is automated remediation? Let’s break it down. In cloud  security, “remediation” is just a fancy word for “fixing something”.  When you find something broken, you fix it, or remediate it. If you find  an S3 bucket with public access, you remediate it by changing the  access policies to be private.

Automated remediation is just fixing security problems automatically. That means when problems pop up (and pop up they will) they get fixed in the background. And the only notification you get is that the problem was fixed. Now that’s a notification anyone would like to get!
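To make that concrete, here is a minimal sketch of the public S3 bucket example in Python. It assumes the fix you want is S3 Block Public Access, applied through the boto3 call put_public_access_block; your own policy may call for something different. The function name remediate_public_bucket and the notify callback are made up for illustration.

```python
from typing import Any, Callable

# Settings that switch on all four S3 Block Public Access protections.
PRIVATE_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def remediate_public_bucket(
    s3_client: Any, bucket: str, notify: Callable[[str], None]
) -> None:
    """Block public access on one bucket, then report that it was fixed.

    s3_client is expected to behave like boto3.client("s3"); it is passed
    in so the function can be exercised without touching a real account.
    notify is a hypothetical callback (Slack, email, print, ...).
    """
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=PRIVATE_ACCESS_BLOCK,
    )
    # The only message anyone receives is the good news.
    notify(f"Fixed: public access blocked on S3 bucket {bucket}")
```

With real credentials this might be called as remediate_public_bucket(boto3.client("s3"), "example-bucket", print). Nobody hears about the problem; they only hear that it was fixed.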

CloudSheriff’s best practices for avoiding security alarm overload with automated remediation:

  • Build a remediation plan for a wide variety of common problems. For example, when you use AWS Security Hub, it launches with dozens of Center for Internet Security (CIS) recommendations already implemented as checks in AWS Config. The checks only detect problems, though, so you will have to code the remediation actions yourself.
  • Use human-readable policies so your team has a fighting chance at compliance. At CloudSheriff, we use Cloud Custodian, an open-source project that lets you build guardrails with human-friendly YAML. We give you 100 such policies out of the box so you don’t have to reinvent the wheel.
  • Save automated remediation for the most egregious policy  violations (like a Relational Database Service cluster open to the  public Internet) that you know for sure you will want to fix, with no  hesitation.
  • For policy violations that are more of a gray area, take a “human-in-the-loop” approach to remediation. These are errors that are easily fixed, but you’d like to know about them before you fix them. For example, if a port that isn’t normally authorized was opened on a Security Group, you might want to inspect it or ask around the team before you take action. It could be a valid change.
  • For  the least cognitive load on your team, offer a choice in resolution –  we can fix it by doing x or y, or do nothing. Which do you prefer? Let  the team member just click a button instead of planning out what needs  to be done.
  • Plan remediation in advance. Nothing kills  productivity like having to context-switch completely away from  value-producing work to think about how to fix a policy violation.
  • To plan in advance, you will end up thinking about baseline rules that everyone should follow. It’s very common for teams to require that everything be tagged according to a simple tagging policy. CloudSheriff automates this by enforcing global tag policies. What this does for you is give a clean view of your entire inventory, a necessity for a healthy security posture (you can’t secure resources that you don’t know about).
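The split between "fix it right away" and "ask a person first" can be written down ahead of time as a small lookup table. The Python sketch below is one way to do that. The rule names, the fix descriptions and the triage function are all hypothetical, and a real setup would call actual remediation code instead of returning strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    AUTO_FIX = "auto_fix"    # egregious: fix now, report afterwards
    ASK_HUMAN = "ask_human"  # gray area: offer choices and wait for a click


@dataclass(frozen=True)
class Plan:
    action: Action
    options: Tuple[str, ...]  # for AUTO_FIX, the first option is the fix


# Hypothetical plans, decided in advance so nobody has to improvise.
REMEDIATION_PLANS = {
    "rds-publicly-accessible": Plan(
        Action.AUTO_FIX, ("disable public access",)
    ),
    "security-group-unexpected-port": Plan(
        Action.ASK_HUMAN,
        ("close the port", "revert to last approved rules", "do nothing"),
    ),
}


def triage(violation: str) -> str:
    """Return what should happen for a violation, based on the plan table."""
    plan = REMEDIATION_PLANS.get(violation)
    if plan is None:
        return f"No plan for '{violation}': write one before alerting anyone"
    if plan.action is Action.AUTO_FIX:
        return f"Auto-fixing '{violation}': {plan.options[0]}"
    # Present the pre-planned choices so the team member only has to pick.
    choices = " / ".join(plan.options)
    return f"Approval needed for '{violation}': {choices}"
```

Calling triage("security-group-unexpected-port") returns a message listing the three choices, which could be rendered as buttons in a chat message. A violation with no entry in the table is a sign that the remediation plan has a gap.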

With the wide variety of security offerings in AWS today, it is possible to become so overloaded with alarms that you become numb to them. We call that “notification fatigue” or “alarm overload”. If everything is urgent, then nothing is urgent. A better approach is to automate fixes for common and obvious policy violations. This can save tens if not hundreds of developer hours per year that are better used to push the business forward. Do yourself and your business a favor: think through the security policies you want to enforce, and automate the remediation of policy violations.