Azure Route Tables Explained

Reading Time: 16 minutes


In this post, I discuss the differences between Azure default routes and user defined routes (UDR), including the configuration options available when creating a user defined route.

When we create Virtual Networks in Azure, the Azure platform automatically creates a route table for each subnet inside a Virtual Network. To simplify, a route table is like a set of instructions. For example, a Virtual Machine asks the route table for directions: “Hello route table, how do I get to this destination?”. The route table replies with the next hop, as in: “if you wish to get to this destination, this is how you get there”.
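To make the idea concrete, here is a minimal sketch in Python (purely illustrative, not how Azure is implemented, and using an example address space we'll meet later in this post):

```python
import ipaddress

# The VNET address space (hypothetical example value).
VNET = ipaddress.ip_network("10.2.0.0/16")

def next_hop(destination):
    # "Hello route table, how do I get to this destination?"
    if ipaddress.ip_address(destination) in VNET:
        return "Virtual Network"   # destination is inside the VNET, keep it local
    return "Internet"              # catch-all default route (0.0.0.0/0)

print(next_hop("10.2.1.5"))   # Virtual Network
print(next_hop("8.8.8.8"))    # Internet
```

The real lookup is of course more involved (we'll see longest prefix matching later on), but the question-and-answer shape is the same.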

The default route table automatically created by Azure includes a number of default system routes. These system routes cannot be modified or deleted; however, you can override them by creating custom routes, also known as User Defined Routes (UDRs). These are routes created by the user, as in you, and can be modified and deleted as needed. I’ll cover UDRs later in this post.

Note: This post is intended to provide a basic understanding of route tables. It is important to carefully plan your routing in your production environment, as incorrect routing configuration could lead to major issues.


How to view the Azure system default routes

One of the ways to view the default routes automatically created by Azure is as follows:

1. Locate a Virtual Machine in the Azure Portal
2. From the left pane, under Settings, click Networking.


3. Click the network interface located under the IP configuration drop down menu as shown below.


4. From the left pane, scroll down and click Effective Routes, located under the Help section.


Allow a few seconds for the route table to load. We’ll study the default routes in the next section.



What is visible inside the Route Table

We see a number of active default routes. I have taken a small snippet and displayed it below.


The routes in the diagram are as follows.

Source: Default
These are default routes created by Azure and cannot be modified or deleted.

State: Active
The state of the route. In the image above the state is Active, meaning the route is enabled.

Address Prefixes: 10.2.0.0/16 and Next Hop of Virtual Network
The address space used for my Virtual Network is 10.2.0.0/16. That’s the address space I configured when I created the Virtual Network, and when I did, Azure automatically created a default route. You may have heard or read that resources inside an Azure VNET (Virtual Network), including resources in different subnets within a VNET, can communicate with each other by default. This is the default route which allows that communication to happen.

For example, the below image displays one Virtual Network named VNET1. The VNET (Virtual Network) includes Subnet1 and Subnet2. Each subnet includes one Virtual Machine (VM1 and VM2). By default, VM1 and VM2 can communicate with each other inside VNET1. If we were to deploy additional Virtual Machines to Subnet1 or Subnet2, all VMs would be able to communicate with each other. This is the default behaviour made possible by this default route.


Address Prefixes: 0.0.0.0/0 and Next Hop of Internet
When we build a new Virtual Machine (VM) in Azure, did you know that the Virtual Machine has outbound internet connectivity enabled by default? Meaning that I could log on to that Virtual Machine and browse the internet, such as cloudbuild.co.uk and other websites. How? This is the default route which allows outbound Internet access. It is a catch-all route: if a more specific route does not exist in the route table, Azure routes traffic for any address not covered by an address range within the Virtual Network to the Internet. In simple terms, “if a route does not exist, send it to the Internet via 0.0.0.0/0”. However, there is one important exception to note: if the destination is one of Azure’s services, Azure routes the traffic directly to the service over Azure’s private backbone network, rather than routing the traffic to the Internet.

You may be thinking, “resources have outbound internet connectivity by default. Is this secure?”. If you have had this thought, it’s good to know that you have security in mind, and you’re right, this is not secure by default. However, this default behaviour is due to change in September 2025. Check out the following post for details, Default outbound access for VMs in Azure will be retired September 2025.

Next Hop: None
Traffic destined to these address prefixes is dropped; it goes nowhere and will not leave the subnet. For example, when I created my VNET, I used an address space of 10.2.0.0/16 and we can see a default route for that in the route table above. I did not configure any other address spaces such as 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16, therefore Azure creates default routes to drop this traffic. Why? Because I am not using these private IP ranges. However, if at a later date I decide to use one of these IP ranges in my VNET, Azure will automatically amend the next hop from None to Virtual Network. So None refers to not allowing traffic destined for that address prefix out of the subnet.

So that is an overview of the default routes; however, there is something to keep in mind. As you enable additional services such as VNET Peering and Virtual Network Gateways, to name a couple, Azure will continue to create default system routes. For example, suppose I were to peer two Virtual Networks, VNET1 and VNET2. Let’s say VNET1 is located in the UK South region and VNET2 in UK West, and I have a requirement to connect the Virtual Networks together to allow resources in VNET1 to communicate with resources in VNET2. Peering would allow me to connect the Virtual Networks, and the Azure platform would automatically create a new default route to allow resources to communicate between the Virtual Networks. Below is a route created by Azure after I peer to another VNET configured with an address space of 10.3.0.0/16. This address space belongs to the VNET I established a peer with, meaning resources in VNET1 know how to get to resources in the peered Virtual Network of VNET2.



What if we want to override a default route or create additional routes?

We know that the default routes cannot be deleted or modified. However, we can create custom routes, also known as User Defined Routes (UDRs), to override the default routes.

When we create custom/user defined routes, they are combined with the default system routes. However, if there are conflicting route assignments, user defined routes override the default routes. For example, we have a default system route of 0.0.0.0/0 with a next hop of Internet. This means traffic from a server is allowed outbound access to the internet. We decide to override this route because we want to route our traffic to a firewall instead of allowing unrestricted access to the Internet. We could create a custom route of 0.0.0.0/0 with a next hop of Virtual Appliance, and specify a firewall IP address. We now have a conflict because we have two routes for 0.0.0.0/0: one route to the Internet and one to a firewall. In this case the default system route would change from active to invalid, and the custom route sending traffic to the firewall would take priority and display a status of active, as shown in the image below.
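As a toy model of this conflict resolution (illustrative only: the firewall IP 10.0.0.4 is made up, and this is not Azure's actual logic), a user route invalidates any default system route that shares the same prefix:

```python
# System and user routes for the same prefix; the user route wins.
system_routes = [{"prefix": "0.0.0.0/0", "next_hop": "Internet", "source": "Default"}]
user_routes = [{"prefix": "0.0.0.0/0", "next_hop": "Virtual Appliance (10.0.0.4)", "source": "User"}]

user_prefixes = {r["prefix"] for r in user_routes}
effective = []
for r in system_routes:
    # A default route sharing its prefix with a user route becomes Invalid.
    state = "Invalid" if r["prefix"] in user_prefixes else "Active"
    effective.append({**r, "state": state})
for r in user_routes:
    effective.append({**r, "state": "Active"})

for r in effective:
    print(r["source"], r["prefix"], "->", r["next_hop"], "|", r["state"])
```

This mirrors what the Effective Routes blade shows: the default 0.0.0.0/0 route flagged as invalid, and the user route to the firewall active.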

We will discuss custom/user defined routes including next hop choices in the next sections.


Azure Custom/User Defined Routes Explained

As explained earlier in this post, we can create custom/user defined routes to override the default system routes or create new routes in addition to the default routes. Let’s dig a little deeper.

Let’s say in our scenario we deploy a simple hub spoke topology as shown in the diagram below.



We know that by default VM1 and VM2 are allowed outbound internet connectivity. Can you remember which default route allows this routing to take place? Remember the default system route 0.0.0.0/0 with next hop of Internet? That’s the one. We also know that Virtual Machines in a VNET can communicate by default, so in VNET1 shown in the image above, VM1 and VM2 both residing in different subnets can communicate with each other. Can you remember which default system route allows this communication to happen within a Virtual Network? We discussed this further up in this post. I’ll let you go and check. 🙂


Creating a custom route table to route traffic to a Firewall
What if there was a requirement to override the default system route of 0.0.0.0/0 with next hop Internet, which allows Virtual Machines to route out to the Internet? As an example, let’s say in the diagram below, we wanted to route traffic from VM1 located in VNET1 to a firewall deployed in HubVNet. How would we do this? We would go into the Azure Portal, create a route table and add a custom/user defined route. We’ll cover how to do this in the next section below.

Note: you could also use other methods to deploy resources in Azure, such as PowerShell, the Azure CLI, ARM Templates, Azure Bicep or third party tools such as Terraform. I am focusing on deploying via the Azure Portal in this post.


Let’s go through the process of creating a route table and custom routes.

1. In the Azure Portal, search for Route tables


2. Click Route tables to create a new one
3. Input the details as required. I have inputted a few details below for demo purposes.

Propagate gateway routes: If you plan to associate the route table with a subnet in a virtual network that’s connected to your on-premises network through a Virtual Network Gateway, and you don’t want to propagate your on-premises routes to the network interfaces in the subnet, set Virtual Network Gateway route propagation to No.

When you disable route propagation, the system doesn’t add your Virtual Network Gateway routes to the route tables of the associated subnets. This applies to both static routes and BGP (Border Gateway Protocol) routes. Connectivity with VPN connections is instead achieved using custom routes with a next hop type of Virtual Network Gateway. Route propagation shouldn’t be disabled on the GatewaySubnet; the gateway will not function with this setting disabled. If you set Propagate gateway routes to No and associate the route table with the spoke subnets, the VMs in those subnets do not receive the Virtual Network Gateway routes.

4. Click Review + Create
5. Once created, open the newly created Route Table. It will be an empty table without any routes.
6. Let’s create a User Defined Route. From the left pane, click Routes


7. Click the +Add button


8. We have a few options as shown in the image below.


In the next section we will go through these user defined route configuration options.


Azure User Defined Route configuration options explained


Route name: give the route a suitable name.

Destination Type:


We have two options available in the drop-down menu, as shown above:

IP Addresses (Address prefixes)
Here you type the destination IP address range. Where does the resource, such as a Virtual Machine, want to go? For example, in the image below, resources in VNET1/Subnet1 are required to communicate with resources located in VNET2/SubnetA. We would specify the address space of VNET2/SubnetA as 10.2.1.0/24, along with some additional configuration we cover later in this post. For now, let’s focus on the text highlighted red in the diagram below: destination 10.2.1.0/24.


Why do we need to create a route for VM1 to communicate with VM3 through HubVNET?
VNET peering is non-transitive by default, which means that traffic from VNET1 won’t be able to reach a resource in VNET2 through a VNET in the middle, as shown in the diagram below.

Another example which may help you to understand non-transitivity. We have three letters: A, B, C. Non-transitive in this example means that A and B can talk to each other, and B and C can talk to each other. However, if A wanted to talk to C, or C wanted to talk to A, they could not go through B to reach each other. The same applies to VNET peering. We would need to create a user defined/custom route for this communication between VNET1 and VNET2 to be possible, as per the diagram below.



Service Tag:
From the destination type drop-down, we have another option of Service Tag. A service tag represents a group of IP addresses from an Azure service. Microsoft manages the addresses included in the service tag and automatically updates the service tag as addresses change, minimising the complexity of frequent updates to user defined routes and reducing the number of routes you need to create. In simple terms, a service tag is a group of IP addresses managed by Microsoft. If new IPs are added or decommissioned, Microsoft manages this process so we don’t have to. If there was a requirement for you to route traffic to an Azure service, you wouldn’t need to specify the IP ranges for that service, as you could use a built-in managed service tag which already includes all the required IP addresses.



Next hop type
The next hop type determines the type of device that traffic is forwarded to when it matches the address prefix/destination type of the route. For example, if the resource wants to reach a specific Virtual Machine in another Virtual Network, how does it get there? Firstly the destination is matched and then the traffic is forwarded to the next hop type such as a Firewall.

Let’s go through the options in the Next hop type drop down list.


Next hop: Virtual Network Gateway


You’ll notice that when you click Virtual Network Gateway, the option to type a next hop address is greyed out. This is by design. Going off topic slightly, but still related: when we configure peering between Virtual Networks, there are configuration options which we must set if we require the VNETs to use an Azure Virtual Network Gateway. I won’t be going into detail on VNET peering, but I have created a post explaining VNET peering configuration at the following link: Azure Virtual Network Peering Options Explained.

Back to the next hop of Virtual Network Gateway: to allow this to work, we need to configure two settings.

1. Configure VNET peering options as documented in the link I shared above.
2. Create a User Defined Route

But why use Virtual Network Gateway as a next hop? You may need to route traffic from a VNET to on-premises through a Virtual Network Gateway. You may already be aware that one of the use cases of a Virtual Network Gateway is to connect your Azure environment to on-premises. Or you could use a Virtual Network Gateway to forward traffic from one VNET to another in a hub and spoke topology. For example, a Virtual Machine in VNET1 needs to communicate with a Virtual Machine in VNET2 via a hub VNET. A gateway located in the hub VNET could be used as a forwarder to route traffic from VNET1 to VNET2.

Next hop: Internet

Next hop of Internet is one we came across earlier: there was a default route of 0.0.0.0/0 with a next hop of Internet. If you don’t override this route, Azure routes all traffic destined to IP addresses not included in the address prefix of any other route to the Internet. You may wish instead to route that traffic to a Network Virtual Appliance, such as a firewall or an inspection/logging appliance.

However, if needed, you can specify Internet as a next hop when you want to explicitly route traffic destined to an address prefix directly out to the Internet, or if you want traffic destined for Azure services with public IP addresses kept within the Azure backbone network. As explained earlier in this post, if traffic is destined for Azure services with public IP addresses, that traffic is kept within the Azure private backbone network and does not route out to the Internet.

Next hop: Virtual Appliance
A virtual appliance is a Virtual Machine that typically runs a network application, such as a firewall. You can use a next hop of Virtual Appliance, also known as a Network Virtual Appliance (NVA), to route traffic to an Azure Firewall, Cisco firewall, Palo Alto firewall, traffic inspection appliance, and more. Azure supports a number of third party appliances available from the Azure Marketplace. You can also access the Marketplace from the Azure portal at portal.azure.com.


For example, as per the image below, we have a route table associated with VNET1/Subnet1 with a destination of 10.2.1.0/24 which belongs to VNET2. We have a HubVNet between both Virtual Networks. We know that VNET peering is non-transitive so we associate a route table to VNET1/Subnet1 instructing resources that if there is a requirement to communicate with resources in VNET2/SubnetA, the next hop is a virtual appliance. The virtual appliance in this case is an Azure Firewall located in HubVNet.



Next hop: None

Specify None when you want to drop traffic to an address prefix, rather than forwarding the traffic to a destination. None is also referred to as a black hole; packets forwarded to a black hole will not be forwarded at all. You will also notice a number of default routes with a next hop of None. This is because Azure creates system default routes for reserved address prefixes with None as the next hop type. If you decide to add another address prefix to your Virtual Network in future, Azure will amend the default route from None to Virtual Network. In the image below, the option to specify a next hop address is greyed out because we’re dropping the traffic; it’s going nowhere, so we don’t need to specify an address.


Next hop: Virtual Network

We came across next hop of Virtual Network earlier. Can you remember where? It was a default system route created automatically which allows resources to communicate with each other inside a Virtual Network. We also have the option available when creating a User Defined Route with the Next hop address greyed out, as shown in the image below.


A question you may have: if there is already a system route which allows resources to communicate inside a Virtual Network, why would you want to create a user defined route with a next hop of Virtual Network?

The default route we visited earlier allows all traffic within the address space of 10.2.0.0/16 to communicate. Azure automatically added this route for all subnets within the VNET, so if I created three subnets in my VNET, each subnet would include a default route for 10.2.0.0/16 with a next hop of Virtual Network. For example, I have the subnets below configured in VNET1,

Subnet1 10.2.1.0/24
Subnet2 10.2.2.0/24
Subnet3 10.2.3.0/24

According to the default route shown in the image below, all resources inside the three Subnets above can communicate with each other within VNET1. This means that traffic sent to any address between 10.2.0.1 and 10.2.255.254 would be routed within the Virtual Network. If I created additional Subnets in future, they would automatically be allowed to communicate with resources in Subnet1, Subnet2 and Subnet3.
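You can verify that each subnet falls inside the VNET address space with a quick check (a sketch using Python's standard ipaddress module, with the example prefixes above):

```python
import ipaddress

vnet = ipaddress.ip_network("10.2.0.0/16")
subnets = ["10.2.1.0/24", "10.2.2.0/24", "10.2.3.0/24"]

# Each subnet sits inside 10.2.0.0/16, so the single default route with a
# next hop of Virtual Network covers traffic between all of them.
for s in subnets:
    print(s, "inside VNET:", ipaddress.ip_network(s).subnet_of(vnet))
```

Any subnet you add later from the same /16 would pass the same check, which is why new subnets can immediately talk to the existing ones.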


But this does not answer why we would want to create a user defined route with next hop of Virtual Network. Let’s move onto a scenario with a possible use case.


Scenario

1. We have a requirement to force all traffic from 10.2.0.0/16 to a Network Virtual Appliance (NVA) for inspection and logging purposes. We must override the default route of next hop of Virtual Network. The virtual appliance is located in the same VNET, but in a different subnet. You may be thinking, we could add a route as follows, destination 10.2.0.0/16 with next hop of the IP address of the Virtual appliance.

Correct, but let’s add a little twist to this scenario,

2. Traffic destined for addresses between 10.2.0.1 and 10.2.0.254 (10.2.0.0/24) should remain within SubnetA, rather than being routed to the virtual appliance, because there is no requirement to inspect or log that traffic. So the aim here is that if machines within SubnetA communicate with each other within the subnet, there is no requirement to route this traffic to a virtual appliance.

The diagram below sums up the above scenario. Traffic from SubnetC needs to be routed to the NVA located in SubnetB for packet inspection/logging. Traffic between VM1 and VM2 inside SubnetA is not routed to the NVA, however, if VM1 or VM2 need to communicate outside of SubnetA, such as communicating with VM3 or VM4 in SubnetC, then that traffic needs to be routed to the NVA first.

How would you accomplish this?


First, we create a User Defined Route with a destination of 10.2.0.0/16 and next hop of virtual appliance 10.2.1.4 as shown in the image below. This route would override the default system route of 10.2.0.0/16 with next hop of Virtual Network.


Below is an image showing what the routes would look like once we apply the route table to SubnetA. As you can see, the default route has become invalid and all traffic for 10.2.0.0/16 now routes through a Network Virtual Appliance which has an IP address of 10.2.1.4.


This routes all traffic inside the Virtual Network 10.2.0.0/16 to a Network Virtual Appliance.

However, we have one more requirement. We don’t require traffic communicating inside SubnetA 10.2.0.0/24 (between Virtual Machines) to be routed to the NVA. But, according to the route we created earlier, all traffic from all subnets will be routed to the NVA.

To resolve the second requirement, we create another route, as shown in the image below. The image shows a User Defined Route with a destination of 10.2.0.0/24 (SubnetA) and next hop of Virtual Network.


This is what the route table looks like now,


We now have two active routes. Traffic for the address prefix 10.2.0.0/16 is routed to the virtual appliance, while traffic for the address prefix 10.2.0.0/24 (SubnetA) stays within the subnet.

So which user defined route applies when traffic meets the criteria of both? It’s the one with the longer prefix. In this case, 10.2.0.0/24 has the longer prefix and takes priority when traffic is destined for IPs in the range 10.2.0.1 – 10.2.0.254, which belongs to SubnetA.

When outbound traffic is sent from a subnet, Azure selects a route based on the destination IP address, using the longest prefix match algorithm. For example, a route table has two routes: One route specifies the 10.2.0.0/24 address prefix, while the other route specifies the 10.2.0.0/16 address prefix. Azure directs traffic destined for 10.2.0.5 to the next hop type specified in the route with the 10.2.0.0/24 address prefix. This process occurs because 10.2.0.0/24 is a longer prefix than 10.2.0.0/16, even though 10.2.0.5 falls within both address prefixes.

So what if there was an IP of 10.2.1.7? Azure directs traffic destined for 10.2.1.7 to the next hop type specified in the route with the 10.2.0.0/16 address prefix. This process occurs because 10.2.1.7 isn’t included in the 10.2.0.0/24 address prefix, making the route with the 10.2.0.0/16 address prefix the longest matching prefix.
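The longest prefix match behaviour described above can be reproduced with a short Python sketch (illustrative only, using the two route prefixes from the scenario):

```python
import ipaddress

# The two user defined routes from the scenario above.
routes = {
    "10.2.0.0/16": "Virtual Appliance (10.2.1.4)",
    "10.2.0.0/24": "Virtual Network",
}

def match(destination):
    dest = ipaddress.ip_address(destination)
    candidates = [ipaddress.ip_network(p) for p in routes if dest in ipaddress.ip_network(p)]
    best = max(candidates, key=lambda n: n.prefixlen)  # longest prefix wins
    return routes[str(best)]

print(match("10.2.0.5"))  # Virtual Network, both routes match and the /24 is longer
print(match("10.2.1.7"))  # Virtual Appliance (10.2.1.4), only the /16 matches
```

Traffic for 10.2.0.5 stays within the subnet, while traffic for 10.2.1.7 goes to the NVA, exactly as described above.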

If you wish to learn more about longest prefix matching, the following article explains the process in detail. Longest Prefix Match Routing (I am not affiliated with this company).


How to associate a route table to a subnet
Once you have created a route table with your user defined routes, you can attach the route table to a subnet. A route table can be associated with zero or more subnets, but route tables aren’t associated with Virtual Networks. You must associate the route table with each subnet you want it applied to.

To associate a route table with a subnet,

1. Search and click Route tables


2. Click the route table you want to associate to a subnet

3. From the left pane, under Settings, click Subnets.


4. Click Associate


5. Locate the subnet and click OK


and that’s it. I hope you found the post useful.

See you at the next one



Azure Virtual Machine Scale Set Duration and Cool Down Explained

Reading Time: 7 minutes


When configuring an Azure Virtual Machine Scale Set (VMSS), there is an option to configure auto scaling rules. Auto scaling is the process of dynamically allocating resources to match performance requirements. As the volume of work grows, an application may need additional resources to maintain the desired performance levels and satisfy service level agreements (SLAs). As demand reduces and the additional resources are no longer needed, they can be automatically removed to minimise costs.

As part of the auto scale configuration inside an Azure Virtual Machine Scale Set, we can set a duration and a cool down period. In this post, I will focus on explaining the differences between both options based on a couple of scenarios.

Want to learn more about scaling in Azure?
If you wish to learn more about Azure VM Scale Sets, visit the following Microsoft Learn link, Azure Virtual Machine Scale Sets Overview.

In addition to Azure VM Scale Sets, you can also configure scaling rules for a number of other Azure services such as Azure App Service Plans. When configuring scaling rules for Azure App Service Plans, you can also set up auto scaling based on metrics such as CPU usage, memory usage, HTTP queue length and more. Basically, the App Service Plan includes a built in VM Scale Set.

For auto scaling best practices and to learn more about the different services in Azure which include built in scaling, visit the following Microsoft Learn link Autoscaling guidance – Best practices for cloud applications.

Note: it is important that you plan and configure your scaling rules correctly to avoid performance issues, unnecessary scaling and costs due to an incorrect configuration. The metrics used in this post are for demo purposes only.


Duration in Azure VM Scale Set

Duration is the time the VM Scale Set will look back at metrics before making a decision to scale.

For example, in the scaling rule below, I have configured

  • If CPU Percentage = greater than 85%
  • for a DURATION of 10 minutes
  • Increase the Instance/VM (Virtual Machine) count by 1

So for this condition to trigger, CPU must be continuously greater than 85% for a duration of 10 minutes. The VM Scale Set will look back at CPU utilisation for the past 10 minutes, and if CPU was constantly greater than 85%, it will add another instance/VM.
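A simplified way to picture the duration check (a sketch of the behaviour described above, not the actual autoscale engine; I'm assuming one CPU sample per minute):

```python
# Scale out only if every sample in the 10 minute look-back window
# exceeded the 85% threshold.
def should_scale_out(cpu_samples, threshold=85.0, duration=10):
    window = cpu_samples[-duration:]
    return len(window) == duration and all(s > threshold for s in window)

print(should_scale_out([90, 92, 95, 91, 88, 93, 97, 90, 89, 94]))  # True
print(should_scale_out([90, 92, 95, 40, 88, 93, 97, 90, 89, 94]))  # False, one dip below 85%
```

A single dip below the threshold inside the window is enough to prevent the scale-out.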

Below is a screenshot of an Azure VM Scale Set rule. I have used a green arrow to highlight the DURATION field.


Cool down in Azure VM Scale Set

The cool down period comes into effect after a scale-in (remove a VM instance) or scale-out (add a VM instance) event is triggered. For example, if I set a COOL DOWN period of 10 minutes, this instructs the scale set not to scale again for another 10 minutes. You’re simply asking the scale set to take a break during the cool down period, to allow the VM Scale Set to stabilise and to check whether the additional VM instance has made a difference to the CPU utilisation.

I have used a blue arrow to highlight the COOL DOWN field in the image below.


Let’s take a look at what the above scale rule configuration looks like on a diagram.

In the diagram below we start with 1 VM in our scale set.



1. At 3pm CPU for VM1 goes above the threshold of 85%, and constantly remains above the threshold for a duration of 10 minutes (until 3.10pm).

2. At 3.10pm the scale set looks back at the last 10 minutes, between 3.00pm and 3.10pm, and because CPU was constantly above 85% for the duration, the scale set adds another VM, totalling two VMs in our scale set.

3. At 3.10pm, the cool down period also kicks in and no further scaling operations take place. However, the cool down period does not pause time or stop the collection of metrics under the hood. As you can see from the diagram above, adding another VM at 3.10pm makes a difference to the CPU as it normalises between 50% and 60% utilisation, but metrics will still be collected and analysed. The cool down period only asks the VM Scale Set to pause temporarily and not add (scale-out) or remove (scale-in) any further VMs during the configured cool down time of 10 minutes.

4. The addition of one VM has stabilised the VM Scale Set and operations resume as normal. CPU is averaging between 50% and 60%.

5. At 3.40pm, CPU utilisation increases and is above 85% for a duration of 10 minutes. At 3.50pm, the VM Scale Set looks back at the duration of 10 minutes and makes a decision to add another VM.
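The timeline above could be written as a minute-by-minute simulation (illustrative only; I'm assuming one CPU sample per minute and the rule from earlier: threshold 85%, duration 10 minutes, cool down 10 minutes):

```python
DURATION, COOLDOWN, THRESHOLD = 10, 10, 85.0

def simulate(cpu_by_minute):
    instances, cooldown_until, events, history = 1, -1, [], []
    for minute, cpu in enumerate(cpu_by_minute):
        history.append(cpu)  # metrics are still collected during cool down
        window = history[-DURATION:]
        breached = len(window) == DURATION and all(s > THRESHOLD for s in window)
        if breached and minute >= cooldown_until:
            instances += 1                       # scale out by one VM
            cooldown_until = minute + COOLDOWN   # pause further scaling
            events.append((minute, instances))
    return events

# 3.00pm-3.10pm: CPU at 90%, then the extra VM brings it down to ~55%.
cpu = [90] * 10 + [55] * 20
print(simulate(cpu))  # [(9, 2)] - one scale-out in the 10th minute (3.10pm)
```

Because CPU settles back below the threshold once the second VM is added, no further scale-out events fire for the rest of the run.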

That’s how duration and cool down periods perform. I hope this helped you understand the differences.


Now that we understand the differences between duration and cooldown in an Azure VM Scale Set (VMSS), let’s move onto another scenario.

What if, after the second VM was added, CPU did not stabilise and constantly remained over 85%? Would the VM Scale Set add another VM straight after the cooldown period, or would it wait another duration of 10 minutes before making a decision to add another VM?

Firstly, if you come across a scenario where the VMSS adds an additional VM and it does not make a difference, for example CPU has not dropped below 85%, you should consider investigating and possibly reconfiguring your VM Scale Set rule.

However, because we’re learning, we want to know what would happen in this scenario, right? Ok, another diagram below to explain.

To test this scenario, I deployed a new Virtual Machine Scale Set and configured a VM Scale Set rule as follows,

  • If CPU Percentage = greater than 0.1% (Yes, a silly number, but it’s for testing purposes only!)
  • for a DURATION of 10 minutes
  • Increase the VM (Virtual Machine) count by 1
  • COOL DOWN period of 10 minutes


The diagram above shows that I start with 1 Virtual Machine at 3pm. Because of the silly CPU threshold of 0.1%, the condition is met instantly: the VM Scale Set looks back at the 10 minute duration between 3pm and 3.10pm and adds another VM instance. A cooldown of 10 minutes is triggered to pause any further scaling operations whilst the VM Scale Set is stabilising. However, as you can see from the diagram above, metrics are still being monitored and recorded under the hood. The additional VM has not made a difference, as CPU utilisation is still greater than 0.1%.

At 3.20pm, the cooldown period of 10 minutes has expired.

But what happens now?

CPU has constantly been greater than 0.1% throughout the cooldown period and the additional VM has not made a difference. What would happen after the cool down period?

Is another scale operation triggered and a third VM/instance added shortly after the COOL DOWN period, at point A (3.20pm) shown in the diagram below? Or does the VM Scale Set analyse metrics for another 10 minutes of DURATION before adding another VM at point B (3.30pm) below?


Answer:
Another VM is added after the cooldown period at around 3.20pm. The VM Scale Set takes into account the last 10 minutes and metrics included in the cooldown period. Remember, the cooldown period only temporarily pauses scaling operations, but under the hood the time and metrics are still being analysed and recorded.
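To see why, here is a small simulation of the test above (illustrative only; one CPU sample per minute, threshold 0.1%, duration and cool down of 10 minutes). Because the look-back window spans the cool down period, the condition is already met the moment the cool down expires:

```python
DURATION, COOLDOWN, THRESHOLD = 10, 10, 0.1

def simulate(cpu_by_minute):
    instances, cooldown_until, events, history = 1, -1, [], []
    for minute, cpu in enumerate(cpu_by_minute):
        history.append(cpu)  # metrics keep being recorded during cool down
        window = history[-DURATION:]
        if len(window) == DURATION and all(s > THRESHOLD for s in window):
            if minute >= cooldown_until:
                instances += 1
                cooldown_until = minute + COOLDOWN
                events.append((minute, instances))
    return events

# CPU never drops below the threshold.
print(simulate([50] * 30))  # [(9, 2), (19, 3), (29, 4)] - a new VM right after each cool down
```

Each scale-out lands exactly when the previous cool down expires; there is no extra 10 minute wait, matching the scaling logs below.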

Below are the scaling logs from my testing,


Let’s zoom in and focus on the time stamp column showing the times the VM Scale Set added a new VM/Instance. Image below.


According to the time stamps above,

  • at 9.07am a scale operation was triggered to add another VM/instance. Why? Because CPU was greater than 0.1% for a DURATION of 10 minutes. The VM Scale Set looked back at the duration from approx 8.57am – 9.07am.

  • when the scale-out triggered, a COOL DOWN period of 10 minutes was also initiated to allow the deployment of the new VM and CPU to stabilise.

  • the 10 minute COOL DOWN period ended at around 9.17am. However, another scale operation was initiated immediately after the COOL DOWN period ended. The VM Scale Set did not wait for another 10 minute duration.

  • because I had a CPU threshold of 0.1% set, the VM Scale Set never stabilised and continuously scaled, adding another VM after each cooldown period of 10 minutes.

    What do we learn from this? When the VM Scale Set looks back at the duration, it will include the cooldown time and metrics to decide whether another scale operation is required.

Note: don’t forget to configure a scale-in rule so the scale set can scale in (remove VMs) as CPU levels reduce at less busy times. The VM Scale Set will only add VMs (scale-out) but won’t know to remove VMs when they are no longer needed (scale-in) unless a scale-in rule exists. In my demo, having a CPU threshold of 0.1% would never allow the VM Scale Set to scale in. Plan and configure your VM Scale Sets as per your requirements.
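As a hedged sketch, a matching scale-in rule in the same autoscale settings JSON might look like the fragment below, assuming a more sensible threshold such as scaling in when average CPU drops below 25% (the resource URI and threshold are illustrative placeholders, not values from my demo):

```json
{
  "metricTrigger": {
    "metricName": "Percentage CPU",
    "metricResourceUri": "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<vmss-name>",
    "timeGrain": "PT1M",
    "statistic": "Average",
    "timeWindow": "PT10M",
    "timeAggregation": "Average",
    "operator": "LessThan",
    "threshold": 25
  },
  "scaleAction": {
    "direction": "Decrease",
    "type": "ChangeCount",
    "value": "1",
    "cooldown": "PT10M"
  }
}
```

Pairing a scale-out rule with a scale-in rule like this lets the scale set both grow under load and shrink back when demand drops.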


And that’s it for now. I hope you found this post useful. Any feedback, please feel free to comment below.

See you at the next post.

Differences between Azure Policy Exclusions, Exemptions and Overrides

Reading Time: 6 minutes


In this blog post I will describe the differences between Azure Policy exclusions, exemptions and overrides.

If you missed out on my previous post on Azure Policy inheritance, you can find the article at the following link: Azure Policy Inheritance explained

Let’s get started and understand the differences and use cases for each.



Azure Policy Exclusions
Azure Policy allows organisations to enforce rules and compliance across resources in Azure. Compliance status is visible within the Azure Policy overview page, which provides a single-pane-of-glass view of compliant and non-compliant resources in your environment. However, there are requirements where organisations may not want Azure Policy to scan all resources. One of the features of Azure Policy is the ability to exclude certain resources from an Azure Policy assignment. This is known as Azure Policy Exclusions. Let’s continue to a couple of examples below where we could use Azure Policy Exclusions.

Example
Your organisation has a policy that audits all storage accounts to ensure the replication for disaster recovery purposes is set to geo-redundant replication.


However, there is one subscription that contains storage accounts used for development and testing purposes that do not need to be scanned by this policy. In this case, you can create a policy assignment that applies to all subscriptions, and exclude the subscription containing the development and testing storage accounts. This way, the policy will apply to all storage accounts except for those in the excluded subscription. Furthermore, those storage accounts from the excluded subscription will not appear on the Azure Policy overview page as non-compliant.

Another Example
Your organisation has a policy that requires all virtual machines to be deployed in a specific region for compliance reasons. However, there is one resource group that contains virtual machines used for disaster recovery purposes that need to be deployed in a different region. In this case, you can create a policy assignment that applies to all subscriptions and resource groups, and exclude the resource group containing the disaster recovery virtual machines. This way, the policy will apply to all virtual machines except for those in the excluded resource group.
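Under the hood, an exclusion is stored on the policy assignment as an entry in the notScopes property. The fragment below is a rough sketch of an assignment excluding a resource group, as in the disaster recovery example above; the management group, subscription, resource group and definition ID are all placeholders:

```json
{
  "properties": {
    "displayName": "Allowed locations for virtual machines",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/<definition-id>",
    "scope": "/providers/Microsoft.Management/managementGroups/<management-group>",
    "notScopes": [
      "/subscriptions/<subscription-id>/resourceGroups/<dr-resource-group>"
    ]
  }
}
```

Anything under a notScopes entry is simply not evaluated by the assignment, which is why excluded resources never appear on the compliance overview page.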

Where is the exclude option?

When assigning an initiative/definition, the option to exclude appears below the field named Scope, as shown in the image below.


Let’s move on to Azure Policy Exemptions in the next section below.


Azure Policy Exemptions
The Azure Policy exemptions feature is used to exempt a resource hierarchy or individual resources from being evaluated. Exempt resources are not evaluated because, for example, a time-bound waiver with an expiration date has been applied by an engineer. A benefit of exemptions is that they are audited, including the reason why an engineer created an exemption, the name of the engineer, and the time the exemption was created. All exemptions can be tracked in the Azure Policy portal. Let’s continue to a use case of an exemption below.

Example
Let’s say we are applying the built-in definition “Storage accounts should disable public network access” with an effect set to audit. We find that the compliance assessment shows that a storage account named “imransstorageaccount” is found to be non-compliant, but it must have public network access enabled for business purposes. How do we get around this non-compliant resource? We can create an exemption and type a reason why “imransstorageaccount” was exempted. Once the exemption is created, “imransstorageaccount” will be shown as exempt in compliance view in Azure Policy.

Note: It is important to regularly review your exemptions to ensure that all eligible items are appropriately exempted and to promptly remove any that no longer qualify for exemption.

Where is the exemption option?
1. Access Azure Policy and click Assignments under the Authoring section, as shown in the image below.


2. Click the ellipsis icon (the three dots) beside the policy assignment you wish to exempt and click Create exemption.


3. Once the exemption window launches you can select an exemption category of Waiver, for example if you are due to delete/decommission the resource. The second option is Mitigated, for when you have resolved the non-compliance issue via a different method. Providing a reason why a resource is exempted allows the reason to be audited.


You can also apply an exemption expiration date.

The policy exemption isn’t deleted when the expiry date is reached. The object is preserved for record keeping, but the exemption is no longer honoured and Azure Policy will scan the resource again and mark it as compliant or non-compliant.
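For those working outside the portal, an exemption is its own Azure resource type (Microsoft.Authorization/policyExemptions). The fragment below is a hedged sketch of what an exemption for the storage account example might look like; the assignment ID, expiry date and description are illustrative placeholders:

```json
{
  "properties": {
    "policyAssignmentId": "/subscriptions/<subscription-id>/providers/Microsoft.Authorization/policyAssignments/<assignment-name>",
    "exemptionCategory": "Waiver",
    "expiresOn": "2025-01-31T00:00:00Z",
    "displayName": "Exempt imransstorageaccount",
    "description": "Public network access required for business purposes. Review before expiry."
  }
}
```

The exemptionCategory accepts Waiver or Mitigated, matching the two options shown in the portal, and expiresOn makes the exemption time-bound.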

Exemption history can be viewed by clicking Exemptions, located under the Authoring section, as shown in the image below.

So what is the difference between Azure Policy exclusions and exemptions?

1. Exemptions can be time-bound, for example, exempt for 1 month and then start scanning and reporting on the compliance status again.
2. Exemptions are audited so it is possible to check why a policy was temporarily or permanently exempted.
3. Exemptions are configured after the resource has been scanned and appears as non-compliant. We can then create an exemption for the non-compliant resource if required.
4. Azure Policy Exclusions are different. When we create an exclusion, Policy does not scan the excluded scope or mark it as compliant or non-compliant because it is ignored. Exclusions cannot be configured to be temporary/time-bound with an expiration date.



Azure Policy Overrides (In preview at the time of writing)
Overrides are a feature which offers a capability different to exclusions and exemptions. The overrides property allows you to change the effect of a policy definition without modifying the underlying policy definition, in turn reducing the management overhead. Let me give you examples of use cases below.

Example
You have assigned an initiative which groups together a number of single definitions. Let’s say that one or more of the definitions inside your initiative has an effect of audit, but you wish to change the effect to deny. An Azure Policy override allows you to change the effect from audit to deny without recreating Azure policies or changing the effect parameter within the policy template/JSON code. I simply edit the existing assignment and add an override that changes the effect from audit to deny. I can add and remove overrides as needed.

Another example
I have a policy initiative named Security that includes several policy definitions, such as RestrictVMSize and RequireSQLDatabaseAuditing. The default effect of these policy definitions is audit. However, I now want to change the effect of these policy definitions to deny without modifying the underlying policy definitions or amending the parameter effect in the policy definitions template.

To achieve this, I can use Azure Policy overrides to change the effect of the policy definitions to deny. I can create an override for each policy definition and set the effect to deny. If needed, I can remove the override at a later date which will revert the effect back to the original effect of audit without me having to change the JSON file.
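In the assignment JSON, this is expressed through the overrides property. The sketch below assumes the two definitions are referenced inside the Security initiative by the reference IDs RestrictVMSize and RequireSQLDatabaseAuditing; reference IDs in a real initiative may differ, and the initiative ID is a placeholder:

```json
{
  "properties": {
    "policyDefinitionId": "/providers/Microsoft.Authorization/policySetDefinitions/<security-initiative-id>",
    "overrides": [
      {
        "kind": "policyEffect",
        "value": "Deny",
        "selectors": [
          {
            "kind": "policyDefinitionReferenceId",
            "in": [ "RestrictVMSize", "RequireSQLDatabaseAuditing" ]
          }
        ]
      }
    ]
  }
}
```

Removing the overrides array from the assignment reverts both definitions to their original effect of audit, with no change to the underlying definition JSON.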

Where is the override option?
An override can be applied to a policy at the time of policy assignment or by editing an existing assignment.

1. Click Assignments under the section Authoring.

2. By the assignment name, click the ellipsis icon (…) and click Edit assignment.

3. Click the advanced tab

4. Click Add override

5. Click override value and change the effect.

6. In my case, I have a policy which does not allow resources to be deployed to any region apart from UK South. My policy effect is currently set to deny. For demo purposes, I am going to add an override with an effect of audit. Click Add to apply the override and then click review and save.

Now the policy will not deny when resources are built outside of the UK South region but will audit only and mark them as non-compliant. I can remove or edit the override when needed. Note that the override was added to this policy for demo purposes only. Plan your overrides accordingly.


That’s it. I hope you found the post useful.

See you at the next one

Azure Policy Inheritance explained

Reading Time: 7 minutes


In this blog post I will explain how Azure Policy inheritance works using different scenarios/demos. If you are not aware of what Azure Policy is, click the following Microsoft Learn link to learn more, Overview of Azure Policy.

When we assign Azure Policies in our Azure environments, we have the option to scope a policy at different levels: Management Groups, Subscriptions and Resource Groups, as shown in the diagram below.


Like RBAC (Role Based Access Control) permissions, policies also inherit from the top down in the Azure hierarchy. So if I scoped/assigned a policy at the Management Group level, that policy would inherit down to the subscription, the resource group and the three resources in my diagram: Virtual Machine 1 (VM1), Virtual Machine 2 (VM2) and Virtual Machine 3 (VM3). I have used virtual machines in my example, but this could be any type of Azure resource.

If I was to assign a policy at the Management Group scope allowing engineers in my organisation to deploy Virtual Machines to the Azure UK South region only, that policy would inherit down to the subscription and resource group. Therefore, my engineers would not be able to deploy Virtual Machines to any region apart from UK South in the existing subscription and resource group, or in any new subscriptions or resource groups deployed under my Management Group in future, as the policy would automatically apply.
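As a rough sketch, an assignment like this in JSON would look something like the fragment below. The management group name and definition ID are placeholders; the parameter name listOfAllowedLocations matches the built-in ‘Allowed locations’ definition:

```json
{
  "properties": {
    "displayName": "Allowed locations - UK South only",
    "policyDefinitionId": "/providers/Microsoft.Authorization/policyDefinitions/<allowed-locations-definition-id>",
    "scope": "/providers/Microsoft.Management/managementGroups/<management-group>",
    "parameters": {
      "listOfAllowedLocations": {
        "value": [ "uksouth" ]
      }
    }
  }
}
```

Because the scope is the management group, every subscription and resource group beneath it inherits the assignment automatically.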


Let’s go through a few commonly asked questions,

Question 1:
If you apply a policy to the Management Group which only allows resources to be deployed in the UK South region, and that policy is inherited by the subscription and resource group, what would we expect to happen if I deployed a new Virtual Machine called VM4 to a different region, such as East US, into the resource group?

Result
The deployment would be prevented and the engineer would not be allowed to deploy to the East US region because the policy scoped to the Management Group is being enforced, as shown in the image below. I tried deploying VM4 to the East US region and was denied due to the policy applied at the Management Group level, which has inherited all the way down, by design.


Question 2:
In the example above, you tried deploying to the resource group named ResourceGroup1 and the policy denied you from building a new VM4 in region East US. What if you create a new resource group named ResourceGroup2 and try deploying VM4 to East US, or any other region, inside the newly created resource group?

Result:
It will be denied. Why? Because the policy applied to the Management Group applies to all new resource groups under that Management Group. The new resource group ResourceGroup2 inherits the policy from above. Again, the below image shows the policy preventing VM4 from being deployed in East US in a new resource group named ResourceGroup2.


Question 3:
What about the resources (VM1, VM2 and VM3) which were deployed and already exist in ResourceGroup1, as shown in the diagram below? What would happen if VM1 was located in UK South and VM2 and VM3 were located outside of the UK South region before you applied the policy? What would happen to these existing resources?


Answer:
If, of the three existing virtual machines, VM2 and VM3 were deployed outside of the UK South region and VM1 was deployed in the UK South region, then Azure Policy would display VM1 as compliant for being deployed in UK South, and the other two, VM2 and VM3, as non-compliant. The Azure Policy overview page would report this.


Question 4:
What would happen if you created another policy and assigned it at the resource group level, but the new policy assigned to the resource group allowed deployment to the East US region only, so denying UK South? You now have one policy applied to the Management Group which allows deployments to the UK South region only (inherited down), and a new policy assigned to the resource group which only allows deployments to East US. Which policy would win? As shown in the image below.


Result
I leave the existing policy assigned to the management group, allowing resources to be deployed in UK South only. I have created a new policy and assigned it to the resource group, which only allows resources to be deployed to the East US region. We now have conflicting policies.

The images below show that I have applied a policy to ResourceGroup1, allowing resources to be deployed inside that resource group but to East US region only.


What was the result?

1. Was I able to deploy a new virtual machine called VM4 in ResourceGroup1 in the UK South Region? No, denied as per the image below.



2. Was I able to deploy a new virtual machine called VM4 in ResourceGroup1 in the East US region? No, denied as per the image below.


Are you confused? 🙂

Are you thinking, I thought that adding a policy to ResourceGroup1 directly would override the Policy assigned to the Management Group level and allow me to build resources in East US region?

Or maybe you are thinking the opposite, that the policy assigned at the management group level would still take precedence/priority and allow a resource to be deployed to UK South but not East US.

Or maybe you’re thinking that it would be possible to build resources in ResourceGroup1 in both UK South and East US regions.

Or maybe you knew what the result was going to be: in this case, I am not allowed to deploy to either region.

Or you were thinking something else. In that case, please do drop a comment towards the end of this post to let us know.

Why was I not allowed to deploy to any of the regions?
This is because both policies apply together and a deny takes precedence/priority. In this case, both policies are denying. The policy at the Management Group level restricts deployments to East US as it only allows deployments to UK South. The policy at the resource group level restricts deployments to the UK South region as it only allows deployments to East US. The most restrictive/deny policy takes effect; in this case both of them are restricting/denying deployments, therefore both policies apply and deny access. As per the diagram below,


Question 5
What would happen if you apply two policies with the same configuration to both the Management Group and the resource group? So, allow deployments to UK South only scoped to the Management Group, and allow deployments to the UK South region only at the resource group, as per the image below.


Result
You would be allowed to deploy to the UK South region as both policies, at Management Group and Resource Group level, allow it. In my case I removed the parameter of East US region at resource group level and added UK South. You may need to wait up to 30 minutes before new changes take effect.

The result, there was no error/warning when I selected UK South, as shown in the image below.



Question 6
Could you exclude policies from being applied to certain resources? For example, VM1, VM2 and VM3, which already exist in a resource group; or what if I want to apply a policy but exclude certain resources from it? Is this possible?


Answer
Yes, you can exclude a subscription, resource group or specific resources so an Azure Policy does not apply. There is an exclusions option.

There are also other options available such as exemptions and overrides. For more information on Azure Policy Exclusions, Exemptions and Overrides, visit my post Differences between Azure Policy Exclusions, Exemptions and Overrides


Question 7
The built-in ‘allowed locations’ policy you used is denying you from deploying resources in the regions not specified in the policy. Is there a deny setting somewhere that you have to enable?

Answer:
The policy I used is already configured to deny; this is known as the effect. We can check this by accessing the JSON (JavaScript Object Notation) code, which is what the policy is using under the hood.

1. Go to Azure Policy
2. Click Assignments from the left pane under the section Authoring
3. Click the three dots … as shown in the image below


4. Click View definition (this will display the JSON code used under the hood; it is how the policy is compiled)
5. Scroll down to “effect”:
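The core of an ‘allowed locations’ style definition looks something like the simplified policyRule fragment below. This is a trimmed illustration rather than the exact built-in JSON (the real built-in also carves out global resources, for example), but it shows where the deny effect lives:

```json
{
  "policyRule": {
    "if": {
      "not": {
        "field": "location",
        "in": "[parameters('listOfAllowedLocations')]"
      }
    },
    "then": {
      "effect": "deny"
    }
  }
}
```

If a resource's location is not in the allowed list, the "then" block fires and the deny effect blocks the deployment.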


You can’t change the effect of a default/built-in policy; the option to edit or delete built-in policies/initiatives is greyed out. However, you can duplicate an existing definition, give it a new name and amend it as required.


A list of effects which can be used in Azure policies, with explanations, is documented at the following Microsoft Learn page, Understand how effects work – Azure Policy


I hope you found the post useful.

See you at the next post.