Windows Azure SDK 2.2 introduces the concept of a topology blast. This blog post will describe how topology changes happen at the fabric level and how you can take advantage of topology blast to build a more robust service.
Definitions:
- Role Topology – The number of instances, the number of internal endpoints, and the composition of those internal endpoints (i.e. the internal IP addresses of instances, also known as DIP addresses).
- Topology Change – Any change in this topology. Typically this is a scale up or down of the number of instances, or a service healing event that causes one VM to move to a new physical server and obtain a new internal IP address.
- Rolling Upgrade – The process the fabric controller uses to make changes to a hosted service. The fabric controller sends the change to Upgrade Domain #0, waits for all instances in UD #0 to return to the Ready state, and then moves to UD #1, continuing until all UDs have been walked. See the Upgrade Domain information at http://msdn.microsoft.com/en-us/library/windowsazure/hh472157.aspx.
- Topology Blast – A new feature that allows topology changes to be sent to all UDs at one time, bypassing the normal UD walk.
- topologyChangeDiscovery – Use this .csdef <ServiceDefinition> attribute to control the type of topology change notification your service receives. Setting topologyChangeDiscovery="Blast" turns on Topology Blast.
- RoleEnvironment.SimultaneousChanged/SimultaneousChanging – These events are raised in your code when a topology change happens and you have set topologyChangeDiscovery="Blast".
* Note that you must have an InternalEndpoint defined in order to receive topology changes. If you turn on RDP for your service, an internal endpoint is implicitly created.
Picture 1 – Hosted service with 3 instances
Picture 1 shows a standard Windows Azure hosted service with 3 role instances. Each instance has a Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment.CurrentRoleInstance.InstanceEndpoints list with the endpoints pointing to the correct DIPs for all other instances.
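To make the endpoint list concrete, here is a minimal sketch of how an instance might enumerate the DIPs of its peers. It assumes peers are looked up through RoleEnvironment.Roles, and the endpoint name "InternalEp" is hypothetical; substitute the InternalEndpoint name declared in your own .csdef.

```csharp
using System;
using System.Net;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class EndpointLister
{
    // "InternalEp" is a hypothetical endpoint name used for illustration only.
    public static void PrintPeerEndpoints(string endpointName = "InternalEp")
    {
        RoleInstance self = RoleEnvironment.CurrentRoleInstance;

        // RoleEnvironment.Roles maps role names to roles; each role lists its instances.
        foreach (RoleInstance peer in RoleEnvironment.Roles[self.Role.Name].Instances)
        {
            if (peer.Id == self.Id)
            {
                continue; // skip the current instance
            }

            RoleInstanceEndpoint endpoint;
            if (peer.InstanceEndpoints.TryGetValue(endpointName, out endpoint))
            {
                IPEndPoint dip = endpoint.IPEndpoint; // internal (DIP) address and port
                Console.WriteLine("{0} -> {1}", peer.Id, dip);
            }
        }
    }
}
```

This only runs inside a role instance (or the compute emulator), because the service runtime supplies the topology data.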
Picture 2 – Standard topology change and rolling upgrade
Picture 2 shows a standard rolling upgrade after a topology change has occurred.
- The server hosting IN_2 (with original DIP 10.31.70.8) has had a failure. The fabric controller automatically detects this failure and recreates IN_2 on a new server. IN_2 receives a new DIP of 10.25.18.2. This constitutes a topology change.
- The fabric controller begins the rolling upgrade process in order to notify the rest of the instances that there has been a topology change. Each instance will receive a RoleEnvironment.Changing and RoleEnvironment.Changed event, with the change of type ServiceRuntime.RoleEnvironmentTopologyChange. The InstanceEndpoints list will be updated with the new DIP(s).
- In Picture 2, the fabric controller is currently processing UD #0, so IN_0 has a correct InstanceEndpoints list. When IN_0 attempts to communicate with IN_2 using an InternalEndpoint, it will connect to the new DIP 10.25.18.2.
- IN_1 is in UD #1 and has not yet been notified of the topology change, so it is still using the old, incorrect InstanceEndpoints list. When IN_1 attempts to communicate with IN_2 using an InternalEndpoint, it will try the old DIP 10.31.70.8 and fail to connect.
Depending on the architecture of your service and how well your code tolerates communication failures, this scenario, in which some instances have a correct InstanceEndpoints list and others have an incorrect one, can cause significant problems in your application.
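The stale window during the UD walk can be illustrated with a small, self-contained simulation. This is only a toy model of the scenario in Picture 2 and does not use the Azure SDK; the SimInstance and UdWalkSimulation types are hypothetical names invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SimInstance
{
    public string Name { get; set; }
    public int UpgradeDomain { get; set; }
    public string KnownDipOfIn2 { get; set; } // this instance's view of IN_2's address
}

public static class UdWalkSimulation
{
    // The two healthy instances from Picture 2, both still holding IN_2's old DIP.
    public static List<SimInstance> Picture2()
    {
        return new List<SimInstance>
        {
            new SimInstance { Name = "IN_0", UpgradeDomain = 0, KnownDipOfIn2 = "10.31.70.8" },
            new SimInstance { Name = "IN_1", UpgradeDomain = 1, KnownDipOfIn2 = "10.31.70.8" }
        };
    }

    // Applies the change to every UD up to and including lastUpdatedUd, then
    // returns the names of the instances whose view of IN_2 is still stale.
    public static List<string> StaleAfter(List<SimInstance> instances, int lastUpdatedUd, string newDip)
    {
        foreach (SimInstance notified in instances.Where(x => x.UpgradeDomain <= lastUpdatedUd))
        {
            notified.KnownDipOfIn2 = newDip;
        }

        return instances
            .Where(x => x.KnownDipOfIn2 != newDip)
            .Select(x => x.Name)
            .ToList();
    }
}
```

Calling StaleAfter(UdWalkSimulation.Picture2(), 0, "10.25.18.2") models the moment shown in Picture 2: UD #0 has been notified, and IN_1 is the only instance still pointing at the old address.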
Picture 3 – Topology change with Topology Blast enabled
Picture 3 shows a topology blast after a topology change has occurred.
- The server hosting IN_2 (with original DIP 10.31.70.8) has had a failure. The fabric controller automatically detects this failure and recreates IN_2 on a new server. IN_2 receives a new DIP of 10.25.18.2. This constitutes a topology change.
- The fabric controller initiates a topology change. Because the service has set topologyChangeDiscovery="Blast", the fabric will initiate a topology blast and send the topology change to all instances at the same time.
- Both IN_0 and IN_1 receive the RoleEnvironment.SimultaneousChanging and RoleEnvironment.SimultaneousChanged events at the same time, so their InstanceEndpoints lists are updated together. Both instances are now able to communicate successfully with IN_2.
Note that your architecture must still be tolerant of the communication failures that will happen from the time that the server hosting IN_2 fails until the fabric recreates IN_2 and sends the topology change.
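One common way to tolerate that window is to retry failed calls and look up the peer's address again before each attempt. The following is a minimal sketch of that idea, not a prescribed pattern; PeerCall, WithRetry, and the resolve and send delegates are hypothetical names, and the retry count and delay are arbitrary defaults.

```csharp
using System;
using System.Net;
using System.Threading;

public static class PeerCall
{
    // Calls send against the endpoint returned by resolve. The endpoint is
    // re-resolved before every attempt, so an updated address is picked up
    // as soon as it becomes visible to this instance.
    public static T WithRetry<T>(Func<IPEndPoint> resolve, Func<IPEndPoint, T> send,
                                 int maxAttempts = 5, int delayMs = 1000)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException("maxAttempts");
        }

        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return send(resolve());
            }
            catch (Exception ex) // a real service should catch only transient network errors
            {
                last = ex;
                if (attempt < maxAttempts)
                {
                    Thread.Sleep(delayMs);
                }
            }
        }

        throw new InvalidOperationException(
            "Peer unreachable after " + maxAttempts + " attempts.", last);
    }
}
```

In a role, resolve would typically read the current endpoint from the service runtime rather than from a value captured at startup, so that the retry benefits from the topology change once it arrives.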
Turning on Topology Blast
Topology Blast is enabled per deployment. To turn it on for a deployment, set topologyChangeDiscovery="Blast" on the <ServiceDefinition> element in the .csdef. Your service will then begin receiving topology blast configuration changes.
<ServiceDefinition name="waTestFramework"
                   topologyChangeDiscovery="Blast"
                   xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"
                   schemaVersion="2013-10.2.2">
Optionally, if you need to run code in response to a topology change, you can handle the RoleEnvironment.SimultaneousChanging and RoleEnvironment.SimultaneousChanged events. With Topology Blast enabled, topology changes raise these events instead of the default Changing/Changed events. Add handlers for both events in OnStart and then implement your code in the appropriate event handler. These new events behave the same as the old ones, with two exceptions:
- With topology blast turned on, only topology changes will fire the Simultaneous* events and all other types of changes will fire the standard Changed/Changing events.
- The SimultaneousChangingEventArgs does not implement a Cancel property. This is to prevent all role instances from recycling at the same time.
public override bool OnStart()
{
    // For information on handling configuration changes
    // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
    RoleEnvironment.SimultaneousChanged += RoleEnvironment_SimultaneousChanged;
    RoleEnvironment.SimultaneousChanging += RoleEnvironment_SimultaneousChanging;
    return base.OnStart();
}

void RoleEnvironment_SimultaneousChanging(object sender, SimultaneousChangingEventArgs e)
{
    // Add code to run before the InstanceEndpoints list is updated
    // WARNING: Make sure you do not call RequestRecycle or throw an unhandled exception
}

void RoleEnvironment_SimultaneousChanged(object sender, SimultaneousChangedEventArgs e)
{
    // Add code to run after the InstanceEndpoints list is updated
}
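As an illustration of what the SimultaneousChanged handler might do, the sketch below rebuilds a cached list of peer addresses whenever a topology blast arrives. It is one possible approach under stated assumptions: the PeerCacheRole class, the peers field, and the "InternalEp" endpoint name are all hypothetical, and the endpoint name must match an InternalEndpoint declared in your .csdef.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using Microsoft.WindowsAzure.ServiceRuntime;

public class PeerCacheRole : RoleEntryPoint
{
    // Hypothetical endpoint name used for illustration only.
    private const string EndpointName = "InternalEp";

    // Snapshot of peer addresses. It is replaced wholesale on each refresh,
    // so readers never observe a partially built list.
    private static volatile IList<IPEndPoint> peers = new List<IPEndPoint>();

    public override bool OnStart()
    {
        // Rebuild the snapshot after the InstanceEndpoints list has been updated.
        RoleEnvironment.SimultaneousChanged += (sender, e) => RefreshPeers();

        RefreshPeers(); // initial snapshot
        return base.OnStart();
    }

    private static void RefreshPeers()
    {
        RoleInstance self = RoleEnvironment.CurrentRoleInstance;

        peers = RoleEnvironment.Roles[self.Role.Name].Instances
            .Where(i => i.Id != self.Id && i.InstanceEndpoints.ContainsKey(EndpointName))
            .Select(i => i.InstanceEndpoints[EndpointName].IPEndpoint)
            .ToList();
    }
}
```

Because every instance runs this handler at roughly the same time under Topology Blast, the handler should stay short and must not recycle the role.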