The high-performing team's guide to AWS Identity and Access Management

Getting deep into IAM is a great way to end up with analysis paralysis. Here are some common-sense, high-leverage tips to get secure but keep rolling.

Intimidated or annoyed by the complexity of IAM? Worried that if you dive into IAM, you’ll never get out because the details go on and on and on? Well, I have some news for you. It’s a legitimate fear. The details can be overwhelming. The number of decisions available to you is very high.

Without some experience, you can easily find yourself spending hours troubleshooting with roles, principals, definitions, resource names, and so on, all the while surprising and angering your team as unpredictable access patterns break for them. And you’ll lose sight of the bigger picture in the process.

Throwing a novice into IAM management is a fantastic way to bring chaos to your team and bring their productivity to a halt.

But our best practices all line up under one piece of battle-tested advice: most cloud security is basically about preventing the wrong people from seeing data and doing the wrong things.

Backstory

AWS’s IAM service is a powerful and precise system that works extremely well across the entire global infrastructure. It is actually an ingenious system, because with accessible, readable JSON, you can set policies that are not only easily understood by you and your team, but also globally enforced, automatically, by AWS.

You can trust it to “just work.”

However, its precision and power are a double-edged sword. A large portion of the hiccups and interruptions in developing a product for AWS come from misconfigured IAM policies.

For example, you might be launching a Lambda function and try to dial down the Lambda permissions, but find that some process is not working, or the function times out, or something else. You have to sort through the logs to look for the root cause, and you’re not always thinking it’s an “access denied” issue. And of course, CloudWatch logs take a minute or so to show up. A misconfigured Lambda execution role will certainly throw errors that are hard to track down.

Or, let’s say you’ve set up a data pipeline in Glue. You are moving data from one data lake zone to another. The data job takes a few minutes to start up, a few minutes to process, and then shuts down. Something breaks in the process, and you wait 5-10 minutes to figure it out. Then you are digging deep into Glue logs (which are hard to read, in our opinion) and finally discover some access policy was referencing the wrong S3 bucket.

What often happens is that the developer team, frustrated and under time pressure, will just open up access by attaching “AdministratorAccess” or some other liberal policy to whatever role is in the way, and say to themselves, “I’ll come back to this later.”

The problem is, people don’t come back to it later. Now you have Groups, Roles, and Instance Profiles floating around with AdministratorAccess, which is an undeniable security problem.
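To see where AdministratorAccess has already spread, a small script can help. The sketch below assumes a boto3 IAM client is passed in and uses its `list_entities_for_policy` call; the function name `find_admin_attachments` and the `FakeIam` stand-in are ours, for illustration only. It only finds attachments of the managed policy, so an inline policy that grants the same thing will not show up.

```python
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def find_admin_attachments(iam_client, policy_arn=ADMIN_POLICY_ARN):
    """Return the users, groups, and roles a managed policy is attached to.

    iam_client is assumed to behave like boto3.client("iam").
    """
    found = {"users": [], "groups": [], "roles": []}
    kwargs = {"PolicyArn": policy_arn}
    while True:
        page = iam_client.list_entities_for_policy(**kwargs)
        found["users"] += [u["UserName"] for u in page.get("PolicyUsers", [])]
        found["groups"] += [g["GroupName"] for g in page.get("PolicyGroups", [])]
        found["roles"] += [r["RoleName"] for r in page.get("PolicyRoles", [])]
        if not page.get("IsTruncated"):
            return found
        # Results are paginated; pass the marker back to get the next page.
        kwargs["Marker"] = page["Marker"]


class FakeIam:
    """Hypothetical stand-in so the sketch runs without AWS credentials."""

    def list_entities_for_policy(self, **kwargs):
        if "Marker" not in kwargs:
            return {
                "PolicyUsers": [{"UserName": "alice"}],
                "PolicyGroups": [],
                "PolicyRoles": [{"RoleName": "glue-job"}],
                "IsTruncated": True,
                "Marker": "page-2",
            }
        return {
            "PolicyUsers": [],
            "PolicyGroups": [{"GroupName": "devs"}],
            "PolicyRoles": [],
            "IsTruncated": False,
        }


attachments = find_admin_attachments(FakeIam())
print(attachments)
```

Against a real account you would pass `boto3.client("iam")` instead of `FakeIam()`. Instance Profiles wrap Roles, so a Role in this list covers any Instance Profile built on it.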

How it Works

On every API call to the AWS infrastructure, IAM checks the permissions of the requestor against the requested action. Those permissions come from a whole host of sources, and AWS evaluates all the policies in play together to reach a single allow-or-deny decision. The simplest way to think about it is that a User has their own set of policies and permissions. But the User might also be part of a Group, or multiple Groups, each with its own policies. Those policies are combined with the User’s policies. Then there might be Account-level policies that govern what anyone or anything can do in an Account. These are called Service Control Policies, and they are set through AWS Organizations; think of them as a hierarchical set of guardrails that cascade down, so everyone “under” them inherits them. They cap what can be allowed but do not grant permissions on their own.
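Here is a rough sketch of that decision in Python. It is a deliberately simplified model, not AWS’s actual evaluation logic: it ignores conditions, resource-based policies, permissions boundaries, and NotAction/NotResource, and it treats all Service Control Policies as one flat layer. The function names and example policies are our own.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _matches(statement, action, resource):
    # Action names are case-insensitive; "*" and "?" act as wildcards.
    actions = [a.lower() for a in _as_list(statement.get("Action", []))]
    resources = _as_list(statement.get("Resource", []))
    return (any(fnmatchcase(action.lower(), a) for a in actions)
            and any(fnmatchcase(resource, r) for r in resources))


def _statements(policies):
    return [s for p in policies for s in _as_list(p["Statement"])]


def is_allowed(identity_policies, scps, action, resource):
    """Simplified decision: explicit Deny wins, then the SCP cap, then Allow."""
    identity = _statements(identity_policies)  # user + group policies combined
    guardrails = _statements(scps)

    def hit(statements, effect):
        return any(s["Effect"] == effect and _matches(s, action, resource)
                   for s in statements)

    if hit(identity + guardrails, "Deny"):
        return False  # an explicit Deny anywhere overrides every Allow
    if scps and not hit(guardrails, "Allow"):
        return False  # SCPs cap permissions; they never grant them
    return hit(identity, "Allow")  # otherwise denied by default


user_policy = {"Statement": [{"Effect": "Allow", "Action": "s3:GetObject",
                              "Resource": "arn:aws:s3:::reports/*"}]}
group_policy = {"Statement": [{"Effect": "Allow", "Action": "s3:*",
                               "Resource": "*"}]}
no_delete = {"Statement": [{"Effect": "Deny", "Action": "s3:DeleteObject",
                            "Resource": "*"}]}
read_only_scp = {"Statement": [{"Effect": "Allow", "Action": "s3:Get*",
                                "Resource": "*"}]}

policies = [user_policy, group_policy, no_delete]
report = "arn:aws:s3:::reports/q1.csv"
print(is_allowed(policies, [], "s3:GetObject", report))
print(is_allowed(policies, [], "s3:DeleteObject", report))
print(is_allowed(policies, [read_only_scp], "s3:PutObject", report))
```

The group’s broad `s3:*` would allow the delete on its own, but the explicit Deny wins. Likewise, the write is allowed by the group policy but blocked once the read-only SCP is in place.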

But how far do you take it?

This is all cool and interesting, but as an engineer friend of mine once wearily said, “it’s just yet another thing to learn.”

Herein lies the problem: where do you stop?

Again, on paper IAM is simple: you just write down what you want to happen. This person or group can see this data, that group can create or delete data, that person cannot see that data but can create this data, and so on. In practice, it’s hard enough to decide what to do, let alone set aside a team member’s time and attention to write, check, and maintain policies.

CloudSheriff’s Best practices for IAM for high-performing teams

Just doing #1-4 alone will put you well ahead of most teams we’ve seen.

  1. Assign someone to be the team lead for IAM. This requires real focus (because building policies is a bit like coding), maturity, and diplomacy. Maturity and diplomacy matter because you will need to proactively share information with the team. And you’ll likely need to restrict access, which will cause some ripples in the team.
  2. Before doing any serious tinkering with your policies, start with AWS Managed Policies, like AmazonEC2ReadOnlyAccess. These are standalone policies managed by AWS. You cannot modify them directly (although you can attach additional policies alongside them if you need to tweak the result). These are great for teams that want to use a commonsense policy out of the box. Clear out all your old policies and replace them with Managed Policies.
  3. Explore the AWS managed policies for job functions. These are managed policies that capture what is commonly needed for well-understood job roles.
  4. Be very selective about who gets AdministratorAccess.
  5. If your policy starts getting long, see if you can combine policies by using lists inside the policy statements.
  6. Avoid inline policies (CIS), “*” statements (CIS), and “NotResource” type statements. All of these are totally legitimate features that AWS allows, but we recommend you avoid them. Inline policies are hard to track, and “NotResource” is just confusing.
  7. “*” statements, where all actions are allowed, are a problem because they don’t tell the reader what actions are specifically allowed. For EC2, for example, using * authorizes dozens and dozens of individual actions. It’s better to list them out.
  8. Don’t let anyone have IAMFullAccess unless you mean it (because they can change their own permissions to anything they want). This should be limited to Administrators and, if you have one, an IAM Administrator, although most teams don’t have this role.
  9. Monitor access on occasion: “last logged in” and so on for users, and “last used” for policies. This tells you which policies, groups, roles, or users are “stale” and therefore a security risk.
  10. Dial in a precise policy for your most valuable assets. Let’s say you have a lot of customer data in an S3 Bucket. It’s really worth the time to research and get precise on both the Bucket Policy and the policies that govern access to the bucket.
  11. Keep “least privilege” in mind, but unless you have a security expert, you will go crazy and slow down everyone if you try to get it really perfect out of the gate. It’s better to start thinking about security in team meetings and occasionally ask if the IAM policies are intelligently written.
  12. Only use “Deny” when it’s really necessary. Use it when you think there might be a lot of policies in play and you want to make absolutely sure a particular principal doesn’t do something. Use it sparingly because an explicit Deny statement overrides every Allow statement.
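To make tip 5 concrete, here is a minimal sketch of combining statements by using lists. It assumes the simplest case only: statements made up of just Effect, Action, and Resource are merged when they share the same Effect and Resource, and anything with a Condition, Sid, or other keys is left alone. The helper name `combine_statements` and the bucket name are hypothetical.

```python
import json


def _as_list(value):
    return value if isinstance(value, list) else [value]


def combine_statements(statements):
    """Merge plain statements that share the same Effect and Resource."""
    merged = {}      # (effect, resources) -> set of actions
    untouched = []   # statements we do not try to merge
    for statement in statements:
        if set(statement) != {"Effect", "Action", "Resource"}:
            untouched.append(statement)
            continue
        key = (statement["Effect"],
               tuple(sorted(_as_list(statement["Resource"]))))
        merged.setdefault(key, set()).update(_as_list(statement["Action"]))
    combined = [
        {"Effect": effect, "Action": sorted(actions), "Resource": list(resources)}
        for (effect, resources), actions in merged.items()
    ]
    return combined + untouched


long_policy = [
    {"Effect": "Allow", "Action": "s3:GetObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": "s3:PutObject",
     "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Allow", "Action": "s3:ListBucket",
     "Resource": "arn:aws:s3:::example-bucket"},
]

short_policy = combine_statements(long_policy)
print(json.dumps({"Version": "2012-10-17", "Statement": short_policy}, indent=2))
```

Three statements become two, and the object-level actions now sit in one list that is easy to scan. The order of statements does not affect how IAM evaluates a policy, so reordering them like this is safe.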

As you can imagine, just checking your existing policies to see how they fare against the advice above would require a lot of focus. Does this still seem overwhelming? Remember, the big picture is stopping the wrong people from doing the wrong things. Keep that in mind, go through items 1-4 above, and you’ll be well on your way to getting secure while still moving fast.
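Part of that check can be automated. The sketch below flags two of the patterns from the list, wildcard actions and “NotResource”, in a policy document you already have as a Python dict. It is an illustration under our own assumptions: `lint_policy` and the example policy are hypothetical, and it cannot tell whether a policy is inline or managed, since that is not recorded in the document itself.

```python
def _as_list(value):
    return value if isinstance(value, list) else [value]


def lint_policy(policy):
    """Return (statement label, finding) pairs for patterns worth a second look."""
    findings = []
    for index, statement in enumerate(_as_list(policy.get("Statement", []))):
        label = statement.get("Sid", f"statement {index}")
        if "NotResource" in statement:
            findings.append((label, "uses NotResource"))
        if statement.get("Effect") != "Allow":
            continue  # wildcards only widen access in Allow statements
        for action in _as_list(statement.get("Action", [])):
            if action == "*":
                findings.append((label, "allows every action"))
            elif action.endswith(":*"):
                findings.append((label, f"allows every {action[:-2]} action"))
    return findings


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow",
         "Action": "ec2:*", "Resource": "*"},
        {"Sid": "Confusing", "Effect": "Allow",
         "Action": ["s3:GetObject"],
         "NotResource": "arn:aws:s3:::example-bucket/private/*"},
        {"Sid": "Fine", "Effect": "Allow",
         "Action": ["s3:ListBucket"],
         "Resource": "arn:aws:s3:::example-bucket"},
    ],
}

findings = lint_policy(example_policy)
for label, finding in findings:
    print(f"{label}: {finding}")
```

A finding is a prompt for a conversation, not a verdict. Some wildcards are deliberate, and the point is that someone on the team can explain why each one is there.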

Conclusion

IAM is a powerful, well-designed service that puts all your access policies in human-readable form. The syntax itself is not hard. The bigger problem lies in applying judgment to what you should use, and in maintaining those policies. Team members will come and go; new services will be introduced; AWS will change its recommendations for best practices. We think most teams believe that security is job #1, but when the rubber meets the road, it’s a lot of detailed work.

CloudSheriff is there for companies and teams that could provide that focus, but don’t want to. Our managed cloud security service combines great security tools with great expertise and a cooperative, supportive attitude to come alongside your team. We take control of your security posture and bring a sigh of relief to most clients within a couple of weeks, so you can just focus on building business value.