
Cloud Infrastructure Management – Top 5 Tips

January 30, 2020

Following on from an article I contributed to in Information Age, which covered recommendations for managing cloud infrastructure, I’ve had a lot of people ask me for more detail on the top-level tips I shared.

Cloud migration and cloud deployment bring a number of new challenges. These can include managing hybrid infrastructure and multicloud deployments, understanding where your data resides, and maintaining visibility.

In this blog, I’ve put a bit more detail behind my top 5 tips to offer more insight. For anyone who wants to ask specifically about their situation, feel free to get in contact.

Tip 1: Automate as much as possible

This is for those repetitive tasks. 

It’s worth spending some time automating tasks for a number of reasons, primarily error reduction. Once you’ve done your testing and automated the task, whether it be networking, security or deployment of cloud native services, you can press a button to invoke it and the task is done in exactly the same way every time. This improves reliability and can eliminate errors. If you have staff members performing the same tasks over and over, they’re likely to drop the ball at some point; it’s just human nature. This is especially important where you have one team doing the deployment and another team running the day-to-day operations, with a separation of duties between them.

Speed of deployment is another huge benefit you can gain from automation. You can manage both complex infrastructures and smaller, simpler ones without getting into the nitty-gritty of hundreds of mouse clicks for every task. This saves you time, reduces errors, and gives you more control.

You can also put your automation under change control, so you know that it’s correctly replicated and you can retrofit any new changes to older infrastructure.
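As a rough illustration of retrofitting changes to older infrastructure, the sketch below compares a desired definition with what is currently deployed and lists the work needed. The resource names, settings and the `plan_changes` helper are all hypothetical; real tooling would read both sides from your cloud provider or your automation tool of choice.

```python
from typing import Any, Dict, List


def plan_changes(desired: Dict[str, Dict[str, Any]],
                 current: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """List which resources to create, update or delete to match the desired definition."""
    plan: Dict[str, List[str]] = {"create": [], "update": [], "delete": []}
    for name, settings in desired.items():
        if name not in current:
            plan["create"].append(name)
        elif current[name] != settings:
            plan["update"].append(name)
    for name in current:
        if name not in desired:
            plan["delete"].append(name)
    for names in plan.values():
        names.sort()  # stable ordering so repeated runs produce the same plan
    return plan


# Hypothetical definitions, for illustration only.
desired = {
    "web-subnet": {"cidr": "10.0.1.0/24"},
    "app-subnet": {"cidr": "10.0.2.0/24"},
}
older_environment = {
    "web-subnet": {"cidr": "10.0.1.0/24"},
    "app-subnet": {"cidr": "10.0.9.0/24"},
    "test-subnet": {"cidr": "10.0.3.0/24"},
}

plan = plan_changes(desired, older_environment)
print(plan)
```

Running the same comparison against an environment that already matches the definition produces an empty plan, which is the property that makes the automation safe to run repeatedly.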

Tip 2: Create and govern design templates – and stick to them! 

In the old world of infrastructure, we had set architectural patterns and reference architectures, with high-level designs that we could cut and paste. This creates a standardised design which leaves little room for error.

If you have design templates and know a cloud infrastructure is going to be used time and time again, it’s far easier to automate than having your teams build things manually every single time. However, when things do change, especially from a network and cloud infrastructure perspective, you need strict governance in place. Templates let the automation team see very quickly what’s changing and let the security team assess that nothing has changed that shouldn’t have. From a budget perspective, you can monitor any changes when you’re doing something a bit out of the ordinary, so that if it doesn’t materialise you avoid wasting a huge amount of time and money.

There are a lot of architecture design patterns on the internet for common designs such as a DMZ or various tiers of infrastructure. Stick to them, or you’ll end up with a hodgepodge of scenarios where you think something’s happening in the network but it’s not, because someone else made a change with no governance, bringing unpredictable results.
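One way to make "stick to the template" checkable is to compare each deployment against the approved design and report anything that differs. This is only a sketch: the template keys, the example deployment and the `find_deviations` helper are assumptions made for illustration, not a real product’s schema.

```python
from typing import Any, Dict, List

# Hypothetical approved template: the settings every deployment is expected to keep.
APPROVED_TEMPLATE: Dict[str, Any] = {
    "tiers": ["dmz", "app", "data"],
    "internet_facing_tier": "dmz",
    "firewall_between_tiers": True,
}


def find_deviations(template: Dict[str, Any], deployed: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of differences between a deployment and its template."""
    deviations = []
    for key, expected in template.items():
        if key not in deployed:
            deviations.append(f"{key}: missing (template expects {expected!r})")
        elif deployed[key] != expected:
            deviations.append(f"{key}: {deployed[key]!r} differs from template {expected!r}")
    for key in deployed:
        if key not in template:
            deviations.append(f"{key}: not defined in the template")
    return deviations


# Hypothetical deployment where someone made changes outside governance.
deployed = {
    "tiers": ["dmz", "app", "data"],
    "internet_facing_tier": "app",
    "firewall_between_tiers": True,
    "public_ip_on_data_tier": True,
}

for deviation in find_deviations(APPROVED_TEMPLATE, deployed):
    print(deviation)
```

A report like this gives the automation, security and budget owners the same short list of changes to review, rather than each team discovering them separately.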

Tip 3: Have the network/security team manage the networking parts (routing, firewalls, subnets) and give the DevOps teams a “secure play pen” to work within. Separate out the duties based on skill-set

The problem we’ve seen is that the cloud makes it so easy for people to spin networks up and get services online that it’s causing organisations a lot of issues. Traditionally, you would have a network team who specialise in this area. The development and support teams wouldn’t necessarily know what was happening in the network because they were consumers of it. With cloud infrastructure those roles have merged: it’s very easy for non-network people to do more, and for network people to spin up separate servers, so everything is far more blended. That leads to all sorts of problems, and things end up not being done properly.

Networking teams understand routing, subnetting, firewalling, IP address schemes (one of the main issues) and the nuances of networks and cloud infrastructure, which developer and DevOps teams generally don’t have knowledge of. The networking team, on the flip side, may or may not understand the development team and what their needs are. So having a separation of duties, where people who understand security and routing govern and control that area and development teams are given defined boundaries to work within, creates a much more reliable model. We’ve seen many times that if developers aren’t skilled in IP address schemes, they will just use one big network, which brings integration and routing issues further down the line and can create a situation where everything needs rebuilding.
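To make the IP address scheme point concrete, here is a minimal sketch using Python’s standard `ipaddress` module. It carves one block into smaller subnets and checks whether a proposed network would collide with an existing allocation. The `10.20.0.0/16` block, the environment names and the `overlaps_existing` helper are assumptions chosen for illustration.

```python
import ipaddress

# Hypothetical address block owned by the network team.
environment_block = ipaddress.ip_network("10.20.0.0/16")

# Carve the block into /24 subnets and hand out the first few by name.
environment_names = ["dev", "test", "prod"]
subnets = environment_block.subnets(new_prefix=24)
allocations = {name: next(subnets) for name in environment_names}

for name, subnet in allocations.items():
    print(f"{name}: {subnet} ({subnet.num_addresses} addresses)")


def overlaps_existing(candidate: str, existing) -> bool:
    """Return True if the candidate CIDR overlaps any already allocated network."""
    new_network = ipaddress.ip_network(candidate)
    return any(new_network.overlaps(network) for network in existing)


# A request that collides with the "test" subnet, and one that does not.
print(overlaps_existing("10.20.1.128/25", allocations.values()))
print(overlaps_existing("10.20.3.0/24", allocations.values()))
```

Planned this way, each environment gets its own non-overlapping range, which is what keeps later integration and routing manageable compared with one big flat network.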

This is really inefficient for your business, and the biggest risk from a networking and routing perspective is that it makes it very easy to bypass your security. We’ve seen a number of cases where someone thinks that putting a firewall in will solve this, but if your network isn’t correctly routing traffic to that firewall, the traffic bypasses it. You’ve just exposed your entire environment to the internet!
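The firewall bypass problem can be shown with a simplified route lookup. The sketch below picks the next hop for an internet-bound address by longest-prefix match and flags a route table whose internet traffic does not go through the firewall. The route tables, hop names and helper functions are hypothetical, and real cloud route tables have more rules than this models.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, str]  # (destination CIDR, next hop name)


def next_hop_for(destination: str, routes: List[Route]) -> Optional[str]:
    """Return the next hop chosen by longest-prefix match, or None if nothing matches."""
    address = ipaddress.ip_address(destination)
    best_hop = None
    best_prefix_length = -1
    for cidr, hop in routes:
        network = ipaddress.ip_network(cidr)
        if address in network and network.prefixlen > best_prefix_length:
            best_hop, best_prefix_length = hop, network.prefixlen
    return best_hop


def internet_traffic_bypasses_firewall(routes: List[Route],
                                       firewall_hop: str,
                                       probe: str = "203.0.113.10") -> bool:
    """Return True if traffic to an external probe address is routed somewhere other than the firewall."""
    hop = next_hop_for(probe, routes)
    return hop is not None and hop != firewall_hop


# Hypothetical route tables: one sends internet traffic via the firewall, one does not.
routes_via_firewall = [("10.20.0.0/16", "local"), ("0.0.0.0/0", "firewall")]
routes_direct = [("10.20.0.0/16", "local"), ("0.0.0.0/0", "internet-gateway")]

print(internet_traffic_bypasses_firewall(routes_via_firewall, "firewall"))
print(internet_traffic_bypasses_firewall(routes_direct, "firewall"))
```

In the second table the firewall may well exist and be correctly configured, but the default route never sends traffic to it, which is exactly the situation described above.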

Tip 4: Have thoughtful monitoring and alerting. You need a healthy balance between “noise” and needing to take action

This is something you’ll need to tweak continuously to make sure you know what’s going on in your environment and understand what’s important. With cloud native services and network virtual servers, you can have many logs and alerts set up from the underlying cloud fabric, the virtual appliances or the servers themselves. It’s important to aggregate those to understand what really matters. Otherwise you can end up with more alerting than you know what to do with: too much noise, and you’ll miss something important coming through.

Getting this right is critical to figuring out what’s happening in your network from a security perspective, a consumption perspective and a performance perspective, such as very high CPU usage. It gives you an accurate indication of the health of your system. Different people will need to know different things: your network teams need to know if a firewall or UTM has been threatened, while your server team will want to know what’s happening with the database at the back end. Having a centralised view is really important.

It will take time to filter out the noise. But it’s worth the effort. 
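As a small sketch of what aggregating alerts and filtering noise can look like, the example below drops alerts under a severity threshold, groups repeats together and routes the rest to the team that cares about them. The severity levels, source names, team mapping and sample alerts are all assumptions for illustration; a real setup would take these from your monitoring platform.

```python
from collections import Counter
from typing import Dict, List

SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}

# Hypothetical mapping of alert sources to the teams that need to see them.
TEAM_FOR_SOURCE = {"firewall": "network", "database": "server"}


def summarise_alerts(alerts: List[dict],
                     minimum_severity: str = "warning") -> Dict[str, Counter]:
    """Group alerts at or above the threshold by team, counting repeated alerts once per kind."""
    threshold = SEVERITY_RANK[minimum_severity]
    by_team: Dict[str, Counter] = {}
    for alert in alerts:
        if SEVERITY_RANK[alert["severity"]] < threshold:
            continue  # treated as noise under the current threshold
        team = TEAM_FOR_SOURCE.get(alert["source"], "unassigned")
        key = (alert["source"], alert["message"])
        by_team.setdefault(team, Counter())[key] += 1
    return by_team


# Hypothetical alert stream containing repeats and low-value messages.
alerts = [
    {"source": "firewall", "severity": "critical", "message": "intrusion attempt blocked"},
    {"source": "firewall", "severity": "critical", "message": "intrusion attempt blocked"},
    {"source": "database", "severity": "warning", "message": "cpu above 90%"},
    {"source": "database", "severity": "warning", "message": "cpu above 90%"},
    {"source": "database", "severity": "warning", "message": "cpu above 90%"},
    {"source": "firewall", "severity": "info", "message": "config backup complete"},
]

summary = summarise_alerts(alerts)
for team, counts in summary.items():
    for (source, message), count in counts.items():
        print(f"{team}: {source} - {message} (x{count})")
```

The threshold and the team mapping are the parts you would expect to keep adjusting as you learn which alerts actually lead to action.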

Tip 5: Create a pipeline that is suitable for your enterprise and the way you and your teams want to work – don’t feel the pressure to go with what the crowd tell you to use

Each environment, each team and each business will have their own requirements. For example, for a single deployment which is going to be very static, don’t feel like you should spend time automating absolutely everything just because it’s a cool thing to do. That may seem like a contradiction to my first tip, but it’s about making sure that you’re spending time on something because it benefits the business. Conversely, if you have hundreds of deployments, don’t feel like it’s cool to do them manually.

There’s no right or wrong. Some people fall for the ‘fanboyism’ of following trends. At Cloud Gateway, we’ve built a healthy mix of automation and levels of monitoring, but we don’t automate for the sake of automation or over-engineer for the sake of over-engineering. Every environment is different. Don’t feel like you have to follow a trend.

Build infrastructure that gives you more visibility and control. If something isn’t working, don’t be afraid to change it and test it out. You need a sustainable model so you don’t keep spending time reworking your pipeline and governance. Go with what’s best for your business; there’s no one pattern that will fit everyone.

Neil Briscoe