Best Practices

The high-performing team's guide to AWS Identity and Access Management

CloudSheriff Security Engineers

10 Jan 2020 • 6 min read

Getting deep into IAM is a great way to end up with analysis paralysis. Here are some common-sense, high-leverage tips to get secure but keep rolling.

Intimidated or annoyed by the complexity of IAM? Worried that if you dive into IAM, you’ll never get out because the details go on and on and on? Well, I have some news for you. It’s a legitimate fear. The details can be overwhelming. The number of decisions available to you is very high.

Without some experience, you can easily find yourself spending hours troubleshooting roles, principals, policy definitions, resource names, and so on, all the while surprising and angering your team as access they depend on breaks unpredictably. And you’ll lose sight of the bigger picture in the process.

Throwing a novice into IAM management is a fantastic way to bring chaos to your team and bring their productivity to a halt.

But all of our best practices line up under one piece of battle-tested advice: most cloud security is basically about preventing the wrong people from seeing data and doing the wrong things.

Backstory

AWS’ IAM service is a powerful and precise system that works consistently across the entire global infrastructure. It is actually ingenious, because with accessible, readable JSON, you can set policies that are not only easily understood by you and your team, but also enforced globally and automatically by AWS.

You can trust it to “just work.”

However, its precision and power are a double-edged sword. A large portion of the hiccups and interruptions in developing a product on AWS come from misconfigured IAM policies.

For example, you might be launching a Lambda function and try to dial down its permissions, only to find that some process is not working, or the function times out, or something else goes wrong. You have to sort through the logs to look for the root cause, and you’re not always thinking it’s an “access denied” issue. And of course, CloudWatch logs can take a minute or so to show up. A misconfigured Lambda execution role can throw errors that are hard to track down.

Or, let’s say you’ve set up a data pipeline in Glue. You are moving data from one data lake zone to another. The data job takes a few minutes to start up, a few minutes to process, and then shuts down. Something breaks in the process, and you wait 5-10 minutes to find out. Then you are digging deep into Glue logs (which are hard to read, in our opinion) and finally discover that some access policy was referencing the wrong S3 bucket.
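
Because the root cause in both stories is a permission failure hiding in a wall of log output, one cheap habit is to scan the logs for authorization errors before reading anything else. The following is a minimal sketch, assuming you have already pulled the log lines into a list of strings; the function name, the sample log lines, and the list of marker strings are illustrative and not exhaustive.

```python
import re

# Substrings that commonly appear in AWS authorization failures.
# Illustrative only: individual services may word their errors differently.
ACCESS_DENIED_PATTERN = re.compile(
    r"AccessDenied|UnauthorizedOperation|is not authorized to perform",
    re.IGNORECASE,
)


def find_access_denied(log_lines):
    """Return (line_number, line) pairs that look like permission failures."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_lines, start=1)
        if ACCESS_DENIED_PATTERN.search(line)
    ]


# Made-up log lines in the general shape of a Lambda log stream.
sample_logs = [
    "START RequestId: 1111 Version: $LATEST",
    "[ERROR] ClientError: An error occurred (AccessDenied) when calling "
    "the GetObject operation: Access Denied",
    "END RequestId: 1111",
]

for number, line in find_access_denied(sample_logs):
    print(f"line {number}: {line}")
```

If this turns up a hit, the message usually names the action and resource that were refused, which is exactly what you need in order to fix the policy rather than widen it.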

What often happens is that the development team, frustrated and under time pressure, will just open up access by attaching “AdministratorAccess” or some other liberal policy to whatever role is involved, and say to themselves, “I’ll come back to this later.”

The problem is, people don’t come back to it later. Now you have Groups, Roles, and Instance Profiles floating around with AdministratorAccess – an undeniable security problem.

How it Works

On every API call to the AWS infrastructure, IAM checks the permissions of the requester against the requested action. The permissions of the requester come from a whole host of sources, and what AWS does is evaluate all policies in play together. The simplest way to think about it is that a User has their own set of policies and permissions. But the User might also be part of a Group, or multiple Groups, each having its own policies. Those policies are combined with the User’s policies. Then there might be policies that cap what anyone or anything can do in an Account. These are called Service Control Policies, and they are set in AWS Organizations; think of them as a hierarchical set of rules that cascade down so that every account “under” them inherits them. A Service Control Policy does not grant access by itself; it only limits what the policies inside the account can grant. Anything that is not explicitly allowed is denied by default.
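
To make that combination concrete, here is a deliberately simplified Python model of the decision: an explicit Deny wins, a Service Control Policy acts as a ceiling, and an Allow from the User or Group policies is needed for anything to go through. This is a sketch under stated assumptions, not AWS's actual evaluation engine: it ignores conditions, resource-based policies, permissions boundaries, and session policies, it treats all Service Control Policies as one flat set, and it uses shell-style wildcard matching as a stand-in for IAM's matching rules. The function names and sample policies are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM accepts either a single string or a list; normalise to a list."""
    return [value] if isinstance(value, str) else list(value)


def _matches(statement, action, resource):
    """True if the statement's Action and Resource patterns cover the request."""
    action_ok = any(
        fnmatchcase(action.lower(), pattern.lower())
        for pattern in _as_list(statement["Action"])
    )
    resource_ok = any(
        fnmatchcase(resource, pattern)
        for pattern in _as_list(statement["Resource"])
    )
    return action_ok and resource_ok


def evaluate(identity_policies, scps, action, resource):
    """Simplified decision: explicit deny, then SCP ceiling, then identity allow."""
    identity = [s for policy in identity_policies for s in policy["Statement"]]
    ceiling = [s for policy in scps for s in policy["Statement"]]

    def any_match(statements, effect):
        return any(
            s["Effect"] == effect and _matches(s, action, resource)
            for s in statements
        )

    if any_match(identity + ceiling, "Deny"):
        return "explicit deny"
    if scps and not any_match(ceiling, "Allow"):
        return "implicit deny (SCP)"
    if any_match(identity, "Allow"):
        return "allow"
    return "implicit deny"


# Hypothetical policies: one on the User, one inherited from a Group, one SCP.
user_policy = {"Statement": [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
]}
group_policy = {"Statement": [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
]}
scp = {"Statement": [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Deny", "Action": "s3:DeleteBucket", "Resource": "*"},
]}

policies = [user_policy, group_policy]
print(evaluate(policies, [scp], "s3:GetObject",
               "arn:aws:s3:::example-bucket/report.csv"))
print(evaluate(policies, [scp], "s3:DeleteBucket",
               "arn:aws:s3:::example-bucket"))
print(evaluate(policies, [scp], "ec2:RunInstances", "*"))
```

The three calls show the three outcomes: the read is allowed, the bucket deletion is blocked by the Deny in the Service Control Policy even though the Group policy allows it, and the EC2 call is refused simply because nothing allows it.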

But how far do you take it?

This is all cool and interesting, but as an engineer friend of mine once wearily said, “it’s just yet another thing to learn.”

Herein lies the problem: where do you stop?

Again, the mechanics of IAM are not the issue; you just write down what you want to happen: this person or group can see this data, that group can create or delete data, that person cannot see that data but can create this data, and so on. The issue is that it’s hard enough to decide what to do, let alone set aside a team member’s time and attention to write, check, and maintain policies.

CloudSheriff’s Best practices for IAM for high-performing teams

Just doing #1-4 alone will put you well ahead of most teams we’ve seen.

1. Assign someone to be the team lead for IAM. This requires real focus (because building policies is a bit like coding), maturity, and diplomacy. It needs maturity and diplomacy because you will need to proactively share information with the team. And likely, you’ll need to restrict access, and this will cause some ripples in the team.
2. Before doing any serious tinkering with your policies, start with AWS Managed Policies, like AmazonEC2ReadOnlyAccess. These are standalone policies managed by AWS. You cannot modify them directly (although you can attach additional policies alongside them if you need to adjust access). These are great for teams that want to use a commonsense policy out of the box. Clear out all your old policies and replace them with Managed Policies.
3. Explore the AWS managed policies for job functions. These are managed policies that capture what is commonly needed for well-understood roles.
4. Be very selective about who gets AdministratorAccess.
5. If your policy starts getting long, see if you can combine statements by using lists of actions and resources inside the policy statements.
6. Avoid inline policies (CIS), “*” statements (CIS), and “NotResource” type statements. All of these are totally legitimate features that AWS allows, but we recommend you avoid them. Inline policies are hard to track. “*” statements, where all actions are allowed, are a problem because they don’t tell the reader which actions are specifically allowed. For EC2, for example, using * authorizes hundreds of individual actions. It’s better to list them out. “NotResource” is just confusing.
7. Don’t let anyone have IAMFullAccess unless you mean it (because they can change their permissions to anything they want). This should be limited to Administrators and, if you have one, an IAM Administrator, although most teams don’t have this role.
8. Monitor access on occasion: “last logged in” for users, and “last used” for policies, groups, and roles. This tells you which policies, groups, roles, or users are “stale” and therefore a security risk.
9. Dial in a precise policy for your most valuable assets. Let’s say you have a lot of customer data in an S3 Bucket. It’s really worth the time to research and get precise on both the Bucket Policy and the policies that govern access to the bucket.
10. Keep “least privilege” in mind, but unless you have a security expert, you will go crazy and slow everyone down if you try to get it perfect out of the gate. It’s better to start talking about security in team meetings and occasionally asking whether the IAM policies are intelligently written.
11. Only use “Deny” when it’s really necessary. Use it when you think there might be a lot of policies in play and you want to make absolutely sure a particular principal doesn’t do something. Use it sparingly, because a Deny statement trumps every other statement.
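
The advice about wildcards, “NotResource”, and combining statements with lists lends itself to a small automated check. The following is a minimal sketch, assuming each policy has already been loaded into a Python dict in the standard IAM JSON shape; the function name, the finding messages, and both sample policies are hypothetical, and the checks cover only a few of the points in the list.

```python
def _as_list(value):
    """IAM accepts either a single string or a list; normalise to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def lint_policy(policy):
    """Return human-readable findings for one policy document (a dict)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be in a list
        statements = [statements]

    findings = []
    for index, statement in enumerate(statements):
        label = statement.get("Sid", f"statement {index}")
        if "NotResource" in statement:
            findings.append(f"{label}: uses NotResource")
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            findings.append(f"{label}: allows every action on every resource")
            continue
        for action in actions:
            if "*" in action:
                findings.append(f"{label}: wildcard action {action!r}")
    return findings


# One statement that uses lists instead of several near-identical statements.
tidy_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadWriteReports",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": [
            "arn:aws:s3:::example-reports/*",
            "arn:aws:s3:::example-archive/*",
        ],
    }],
}

# The kind of policy that tends to appear under time pressure.
loose_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "EverythingEc2", "Effect": "Allow",
         "Action": "ec2:*", "Resource": "*"},
        {"Sid": "AllButLogs", "Effect": "Allow",
         "Action": "s3:GetObject",
         "NotResource": "arn:aws:s3:::example-logs/*"},
    ],
}

print(lint_policy(tidy_policy))
for finding in lint_policy(loose_policy):
    print(finding)
```

A check like this does not decide anything for you; it only points the IAM lead at the statements worth a second look.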

As you can imagine, just checking your existing policies to see how they fare against the advice above would require a lot of focus. Does this still seem overwhelming? Remember, the big picture is stopping the wrong people from doing the wrong things. Keep that in mind, and go through items 1-4 above, and you’ll be well on your way to getting secure while still moving fast.
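
One of the cheaper checks to automate is the stale-access review from the list. The sketch below assumes you have downloaded the IAM credential report as CSV text; the sample is trimmed to four columns whose names follow that report as we understand it, so verify them against your own download. The function name, the 90-day threshold, and the sample users are illustrative choices, not recommendations from AWS.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

LAST_USED_COLUMNS = (
    "password_last_used",
    "access_key_1_last_used_date",
    "access_key_2_last_used_date",
)


def stale_users(report_csv, now, max_idle_days=90):
    """Return users with no recorded password or access key use since the cutoff."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        timestamps = []
        for column in LAST_USED_COLUMNS:
            try:
                timestamps.append(datetime.fromisoformat(row.get(column, "")))
            except (TypeError, ValueError):
                continue  # placeholder such as "N/A", or a missing value
        if not timestamps or max(timestamps) < cutoff:
            stale.append(row["user"])
    return stale


# Made-up report rows, trimmed to the columns this sketch reads.
sample_report = "\n".join([
    "user,password_last_used,access_key_1_last_used_date,access_key_2_last_used_date",
    "alice,2020-01-05T10:00:00+00:00,N/A,N/A",
    "bob,2019-06-01T08:30:00+00:00,2019-07-15T12:00:00+00:00,N/A",
    "ci-bot,N/A,N/A,N/A",
])

now = datetime(2020, 1, 10, tzinfo=timezone.utc)
print(stale_users(sample_report, now))
```

Users that show up here are candidates for a conversation, not for automatic deletion; a service account that runs once a quarter will look stale under a 90-day threshold.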

Conclusion

IAM is a powerful, well-designed service that puts all your access policies in human-readable form. The syntax itself is not hard. But the bigger problem lies in applying judgment to what you should use, and also maintaining those policies. Team members will come and go; new services will be introduced; AWS will change their recommendations for best practices. We think most teams believe that security is job #1 but when the rubber meets the road, it’s a lot of detailed work.

CloudSheriff is there for companies and teams that could provide that focus, but don’t want to. Our managed cloud security service combines great security tools with great expertise, and a cooperative, supportive attitude to come alongside your team. We take control of your security posture and bring a sigh of relief to most clients within a couple of weeks, so you can just focus on building business value.
