Sysadminsblog.com Anything for sysadmins!

15 Jul 2010

Full failover with two Exchange 2010 Servers

Every sysadmin runs into this problem at some point: switching to a newer version of Exchange. Hopefully most of you can migrate to Exchange 2010 within the forest. Sometimes, however, it makes more sense to set up a new forest with the new versions of Exchange, SharePoint, etc. That was true in my case. It takes a lot more work, but in the process I’m able to update all the servers to Windows Server 2008 R2 as well, and having all the servers on the same version of Windows saves me time on management. I’m getting side-tracked! Let’s get back to Exchange and the problem at hand: DAG failover with two Exchange 2010 servers.

If you haven’t read up on Database Availability Groups (DAG) and CAS failover in Exchange 2010, you’ll probably think that it’s a breeze. That’s not the case: the new failover isn’t really built for a two-server setup. DAG uses Windows Failover Clustering to provide failover at the mailbox database level. This works really well, but comes with one huge disadvantage. Failover Clustering is not compatible with Network Load Balancing (NLB) on the same server, and NLB is what is normally used for failover of the Client Access Server (CAS) role. As an alternative one could use a hardware or software load balancer that balances TCP/IP traffic, but those don’t come cheap, which rules them out for most smaller shops. But a solution is near!

The solution

After a lot of thinking, discussing and experimenting I came up with a solution. While using the standard Windows Failover Clustering for DAG I can use the Client Access Server Array (Get-ClientAccessArray) without NLB for failover of the CAS role. However instead of having NLB switching the active server I’ll have to script which server is active. My current default answer for scripting and automation applies here: “Let’s PowerShell it!”.

First I tried to change the RpcClientAccessServer directly, but that didn’t have the right effect. A colleague suggested using the CAS array and simply activating the CAS array IP on the active server. This made a lot of sense, as NLB does something similar. So let’s go through the steps!

  1. Create a CAS array
    [Powershell]New-ClientAccessArray -Fqdn "name.domain.local" -Site "AD-Site-Name"[/Powershell]
  2. Create the A record in DNS for your newly created CAS array and have it point to an available IP that can be used by both Exchange servers.
  3. Add the CAS array IP on one of the available network adapters of the active server by using the command
    [Powershell]netsh in ip add address "Adapter Name" 192.168.0.xxx 255.255.255.0[/Powershell]
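
The CAS array only takes effect for Outlook once the mailbox databases point at it, a step that also comes up in the comments on this post. A minimal sketch, assuming the array FQDN from step 1 and that it is run in the Exchange Management Shell (the `$ArrayFqdn` variable is only there for illustration):

```powershell
# Assumption: the same FQDN that was used for New-ClientAccessArray in step 1
$ArrayFqdn = "name.domain.local"

# Point every mailbox database at the CAS array instead of an individual server
Get-MailboxDatabase | Set-MailboxDatabase -RpcClientAccessServer $ArrayFqdn

# Verify: each database should now list the array FQDN
Get-MailboxDatabase | Format-Table Name, RpcClientAccessServer -AutoSize
```

If you only want to move some databases, filter the output of Get-MailboxDatabase before piping it on.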

You’re not done yet! You can connect an Outlook client to a mailbox. AutoDiscover should now use the CAS array DNS as the connection point. You can check the connection point by right-clicking the Outlook taskbar icon while holding the CTRL button and selecting Connection Status.

If you don’t see the right hostname in the server name field, you should check the AutoDiscover results. You can use the option Test E-mail AutoConfiguration in the CTRL + right-click menu of Outlook, or you can use the website testexchangeconnectivity.com to test your AutoDiscover results. You can use this site to test almost any aspect of your Exchange connectivity. You should get something like the following as a result.

<?xml version="1.0" encoding="utf-8"?>
<Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/responseschema/2006">
<Response xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a">
<User>
<DisplayName>Test</DisplayName>

<…>

<DeploymentId>fd337f53-17f0-47a1-b92d-dc549fac3b65</DeploymentId>
</User>
<Account>
<AccountType>email</AccountType>
<Action>settings</Action>
<Protocol>
<Type>EXCH</Type>
<Server>exchange.domain.local</Server>
<ServerDN>/o=<domain netbios>/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=exchange.domain.local</ServerDN>
<ServerVersion>7380827F</ServerVersion>
<MdbDN>/o=<domain netbios>/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=exchange.domain.local/cn=Microsoft Private MDB</MdbDN>

<…>
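
If you save an AutoDiscover response to disk, the server name can also be checked from PowerShell. The snippet below is only a sketch: it uses a trimmed-down, made-up response in a here-string, and `$ExpectedFqdn` is an assumed CAS array name, so substitute your own values.

```powershell
# Assumption: the CAS array FQDN you expect Outlook to be handed
$ExpectedFqdn = "name.domain.local"

# Trimmed sample response; in practice load your own with: [xml]$doc = Get-Content .\autodiscover.xml
[xml]$doc = @'
<?xml version="1.0" encoding="utf-8"?>
<Autodiscover xmlns="http://schemas.microsoft.com/exchange/autodiscover/responseschema/2006">
  <Response xmlns="http://schemas.microsoft.com/exchange/autodiscover/outlook/responseschema/2006a">
    <Account>
      <AccountType>email</AccountType>
      <Action>settings</Action>
      <Protocol>
        <Type>EXCH</Type>
        <Server>name.domain.local</Server>
      </Protocol>
    </Account>
  </Response>
</Autodiscover>
'@

# Pick the EXCH protocol block and compare its Server element with the expected name
$exch = $doc.Autodiscover.Response.Account.Protocol | Where-Object { $_.Type -eq 'EXCH' }
$usesArray = ($exch.Server -eq $ExpectedFqdn)
Write-Output ("EXCH server: {0} (matches CAS array: {1})" -f $exch.Server, $usesArray)
```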

The magic script

Time to implement the PowerShell script. The script takes care of the CAS failover by activating the IP on one of the Exchange 2010 servers. It uses ping to check whether the other host is still reachable. Before doing a CAS failover it also checks that it can ping the gateway, to make sure the problem isn’t a network issue on its own side. Finally, it checks the database copy status (Get-MailboxDatabaseCopyStatus) to make sure that the mailbox database has also failed over to the host running the script.

There are a couple of variables that you need to set in the script before you can run it properly. There should be enough comments in the script to figure it out; otherwise just leave a comment and I’ll be sure to answer! You might also want to change the text of the e-mail that is sent in case of failover.

You can download the script here!

There are a couple of settings that you have to edit in the script to customize it for your environment and preferences. I won’t go into this any further as it’s quite well explained in the script itself. If you do have questions please comment on this post and I’ll get back to you as soon as I can.

$Limit = "10" # Number of failed pings before a failover attempt
$Gateway = "<hostname/IP>" # Gateway of this server
$Hostname = "<hostname/IP>" # Hostname of the other Exchange server
$LocalHostname = "<hostname/IP>" # Hostname of the local Exchange server
$MailTo = "Name <email@address.com>" # E-mail address the failover e-mails will be sent to
$MailFrom = "Name <email@address.com>" # E-mail address shown in the from field
$IP = "<IP>" # Failover IP address that will be added to the server (IP of ClientAccessArray FQDN)
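
To make the checks described above a bit more concrete, here is a rough sketch of how such a monitor could be put together. This is not the downloadable script. It assumes the settings above are already defined and that it runs in the Exchange Management Shell; `$AdapterName`, `$Mask`, `$SmtpServer`, the 5-second interval and both function names are assumptions made up for this example.

```powershell
# Hypothetical sketch of the checks described above, NOT the downloadable script.
# $Limit, $Gateway, $Hostname, $LocalHostname, $MailTo, $MailFrom and $IP are the settings shown above.
# The three values below are extra assumptions for this sketch.
$AdapterName = "Adapter Name"
$Mask        = "255.255.255.0"
$SmtpServer  = "localhost"

function Test-ShouldTakeOver([int]$Fails, [int]$Limit, [bool]$GatewayUp, [bool]$DbMounted, [bool]$IpPresent) {
    # Take over only when the peer has been unreachable long enough, our own network is fine,
    # the DAG has already mounted the databases here, and we do not hold the IP yet.
    (($Fails -ge $Limit) -and $GatewayUp -and $DbMounted -and (-not $IpPresent))
}

function Start-FailoverMonitor {
    $fails = 0
    while ($true) {
        # Count consecutive failed pings to the other Exchange server
        if (Test-Connection -ComputerName $Hostname -Count 1 -Quiet) { $fails = 0 } else { $fails++ }

        # Is our own network path alive?
        $gatewayUp = Test-Connection -ComputerName $Gateway -Count 1 -Quiet

        # Are all database copies on this server mounted (at least one copy must exist)?
        $copies     = @(Get-MailboxDatabaseCopyStatus -Server $LocalHostname)
        $notMounted = @($copies | Where-Object { "$($_.Status)" -ne 'Mounted' })
        $dbMounted  = ($copies.Count -gt 0) -and ($notMounted.Count -eq 0)

        # Does this server already hold the CAS array IP?
        $localIps = @(Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
            ForEach-Object { $_.IPAddress })
        $ipPresent = $localIps -contains $IP

        if (Test-ShouldTakeOver $fails $Limit $gatewayUp $dbMounted $ipPresent) {
            netsh interface ip add address $AdapterName $IP $Mask
            Send-MailMessage -To $MailTo -From $MailFrom -SmtpServer $SmtpServer `
                -Subject "CAS failover to $LocalHostname" `
                -Body "The CAS array IP $IP is now active on $LocalHostname."
            $fails = 0
        }
        Start-Sleep -Seconds 5
    }
}
```

Keeping the decision in its own small function makes it easy to try out combinations by hand, for example `Test-ShouldTakeOver 10 10 $true $true $false`, before letting the loop touch any network settings.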

Creating the script account

Create a new domain user and add it to the View-Only Organization Management role using the Exchange Control Panel (ECP). You can access the ECP by going to https://<servername>/ecp/. This, however, doesn’t give it permission to use remote PowerShell. You can grant that permission by running the following command in the Exchange Management Shell (EMS), where FailMon is the name of the new account:

Set-User FailMon -RemotePowerShellEnabled $True


Also add the domain user to the local administrators group to give it the appropriate permissions to run the task on the right level.

Scheduling the script

To have the task start, you can use the task scheduler. The script can be fired as much as you like, as it will only spawn one instance.

powershell.exe -command "&{C:\Scripts\FailoverMon.ps1}"
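
For the curious, a single-instance check of this kind can be done with Get-WmiObject Win32_Process. The lines below are a sketch of the idea rather than a copy of the script; the `$ScriptName` variable is an assumption based on the file name in the command above.

```powershell
# Assumption: the script file name as used in the scheduled task command
$ScriptName = "FailoverMon.ps1"

# Look for other powershell.exe processes that were started with this script
$others = @(Get-WmiObject Win32_Process -Filter "Name = 'powershell.exe'" |
    Where-Object { $_.CommandLine -like "*$ScriptName*" -and $_.ProcessId -ne $PID })

if ($others.Count -gt 0) {
    # Another instance is already monitoring, so this one can stop right away
    Write-Output "Already running"
    exit
}
```

Placed at the top of the script, a check like this lets the scheduled task fire as often as you like without stacking up copies.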

Make sure that, while creating the scheduled task, you select an account with the appropriate permissions on your Exchange organization.

I've set my task to trigger (as shown) on startup with a repeat of every 10 minutes.

If you want to know more on how to schedule a PowerShell script, please check here.

Testing

When the tasks are scheduled and running on both servers, you can try a failover by shutting down the server that currently has the CAS array IP address. You should see the DAG fail over, and you should also see that the IP address is added to the other server. The client will get a notification in Outlook that the administrator made changes to the configuration and that Outlook needs to restart to work properly.

Failback

Failback is still a manual process if you want to fail back to the previous server. To do so, you'll have to make sure that the previous server is configured with the CAS array IP address and that the DAG indicates that it's healthy. Then it's just a matter of manually removing the CAS array IP address from the server that doesn't need it anymore. Outlook will then automatically detect that it has switched servers, and the same notification is presented to the client again. You can use the following command to remove the IP address from the server.

[Powershell]netsh in ip delete address "Adapter Name" 192.168.0.xxx[/Powershell]
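
Put together, a manual failback could look roughly like the sketch below. The adapter name, address, mask and `<previous server>` are placeholders, and the order used here (release the IP first, then add it on the other server) follows the advice given in the comments to avoid an IP conflict, so treat it as one possible sequence rather than the only one.

```powershell
# Hypothetical failback sketch; all values are placeholders for your own environment
$AdapterName = "Adapter Name"
$IP          = "192.168.0.xxx"
$Mask        = "255.255.255.0"

# 1. In the Exchange Management Shell: confirm the copies on the previous server look sane
Get-MailboxDatabaseCopyStatus -Server "<previous server>" |
    Format-Table Name, Status, CopyQueueLength -AutoSize

# 2. On the server that currently holds the CAS array IP: release it
netsh interface ip delete address $AdapterName $IP

# 3. On the previous server: take the address back
netsh interface ip add address $AdapterName $IP $Mask
```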

Edit: Please note that this setup is not supported by Microsoft.
If you still have questions or when it just doesn't work for you, please let me know by commenting on this post.

Update: The script is available again through this link.


Posted by Mischa Oudhof

Comments (110) Trackbacks (3)
  1. Dear Mischa,

    The article you posted is very valuable for understanding and configuring high availability using just two servers.

    I have seen somebody on the internet comment that we can create a new cluster resource name and IP address in the same snap-in as the DAG cluster, and then create the CAS array with this new IP address. This will not load balance, but it will fail over.

    I haven’t tested this scenario, but I am looking for your thoughts on it.

    Regards,

    Shahid (Bxperts)

    • Load balancing can only be done with hardware or with software that works on the TCP/IP layer. The DAG cluster and the CAS array are 2 separate things. The DAG makes the mailbox databases highly available, and the CAS array makes the Client Access role highly available. It is possible to have the CAS service running on a different server than the DAG.

      If you’re using my scenario, you’ll definitely break things when you use the same IP for both, since you’ll be switching the IP to a different server for the CAS array. The IP address used for the DAG should be an available IP in the MAPI network. Since you’re using the IP for the CAS array, it doesn’t qualify.

  2. Thanks for the nice tricks!

    Can the solution above also be applied if the two Exchange servers are located in different subnets (Production 192.168.1.x & DR site 192.168.3.x)?

    Rgds,
    Jason

  3. Thank you very much for sharing this smart script, that is exactly what I was looking for. I have successfully tested it in my environment and I am going to use it in production. I have just one doubt: why do you schedule the script every 5 minutes? Wouldn’t it be enough to run it once, since it will always be running in the background?
    Thanks in advance

    • Good to hear that you’re going to use the script. I’ve been running the script in production for a while now, and it still hasn’t failed me even once.

      The disadvantage of this kind of script is that it is usually forgotten when doing a reboot or any other maintenance. If you schedule it every 5 minutes without allowing multiple instances it will run only once, but if it’s disrupted in any way, it will fire up again within 5 minutes.

      In short, it’s a precaution.

  4. Thanks for the reply!
    I have one more doubt. Is it necessary to add my mailbox database to the CAS array? This should be done when using software NLB, but should I also do it with your suggested configuration? So, after I created the CAS array caa.mydomain.local, should I do something like

    Get-MailboxDatabase | Set-MailboxDatabase -RPCClientAccessServer "caa.mydomain.local"

    thank you in advance

    • Yes, you need to create the CAS array for the RPC Client Access Server setting to be effective. By adding the mailbox database to the CAS array you create a single DNS access point for your clients to connect to. By switching the caa.mydomain.local IP from server A to server B you force the clients to connect to server B during a failover. That is what the script does. It assigns the IP to the server that is still visible after losing sight of the other server (with, of course, a couple of checks to prevent a split-brain situation).

  5. Great Article
    I have another question. I have two nodes in the primary site and one in the DR site. What is a good third-party solution besides Microsoft? I saw CA ARCserve Replication, but it’s pricey. My current latency is 22 ms.

    • Disaster recovery is of course very important, but you need to test it regularly. Unfortunately most of us don’t have the time to do this. That is why I’ve set up a failover location instead of a disaster recovery location (also because I’ve virtualized everything). Your time to recover is much shorter with a failover than with DR. Failover also allows you to keep using the server and not let all that processing power go to waste. With a latency of 22 ms I would suggest adding the server to the farm and using the Exchange replication for the mailbox databases.

      However, if you still want to use DR you have several options. If you’ve virtualized the Exchange environment you can use snapshots of your Exchange server/mailbox databases with transaction log shipping, and in some cases (EMC) there are built-in tools that allow you to replicate Exchange data between SANs. If this is not the case you can perhaps use your backup suite (Symantec Backup Exec) to do continuous replication of your mailbox databases. CA ARCserve Replication is also a good solution, but I have never used it.

  6. Great Script!! Manual failover (within a reasonable time frame) works for us. If I use split DNS (separate internal and external) and setup a CAS Array, could I manually change the appropriate A record for the CAS Array FQDN to point to the functional server or the server I prefer to have active for all of the CAS services either internal or external? I guess this would be sort of a DNS based failover??

  7. I haven’t set up a CAS array yet so this may be a stupid question. I used .20 as the IP for mail.xxx.com, autodiscover.xxx.com and owa.xxx.com. It is not a server IP; it is a separate IP. I have added that IP to my main server and that is how I have been running all services, including POP and IMAP. My server IPs are .13, .14 and .15. Can I use that same IP (.20) to set up the CAS array, or does the CAS array need to have its own? In your script you say you need to edit $Gateway = "<hostname/IP>" # Gateway of this server. Hostname? The gateway is an IP. The way I do it now, I tried removing the IP from the main server and moving it to the backup one, and it didn’t send mail. I did not use netsh. (Is that why it didn’t work, or is it because I didn’t set up a CAS array?) I am trying to do the upgrade to SP1 and didn’t want to have the system down for an hour. I am hoping I am close. LOL. Thank you very much for your work. Pam

  8. I looked into the script. Very nice. Please forget my question on the gateway, I will just put in the IP. Can this script be used to CAS three servers? I really don’t need to use the third one as a CAS if it doesn’t. It is mainly for the DAG.

    • If you’re using the 3rd server only for the DAG you won’t have to include it in the CAS array. Only include it in the CAS array when it’s assigned the CAS role.
      The script should be capable of using 3 servers, but it’ll need some adjusting.

  9. Hi there Mischa,

    Would just like to thank you for the excellent tuts. I have the exact same problem, plus the two servers are running Remote Desktop Services.

    Just a question on the script. You said that it is a manual process to revert back to the primary DC. What if you added an “if” check to the primary DC’s script, so that when it comes up, it will check to see if it is active? If it’s not, then you can run a command on the second DC, instructing it to netsh itself, releasing the IP to the main DC.

    Just a thought

    Keep up the awesome blog. Cheers

    • That is of course possible. I just didn’t include it to make sure that it wouldn’t switch back and forth all the time. Also when it failed over it did that for a reason, and that needs to be looked into and resolved before it should resume its function. After that you can opt for doing a failback, if needed of course.

  10. One last question. I don’t see in your instruction to attach the databases to your cas array. Do I need to do that? I know you do in NLB but not sure on this. What command would you suggest?

    • You’re correct, you can find it in one of the comments from max70:
      Get-MailboxDatabase | Set-MailboxDatabase -RPCClientAccessServer "caa.mydomain.local"

      This will set the RPCClientAccessServer property of all your Mailbox Databases

  11. This looks like a good solution for redundancy on a budget. Thanks for this. How do you do with certificates? Do you add the same certificate on both servers?

    • Both servers have certificates from our internal CA. Externaly I use a wildcard certificate that was purchased and has been installed on the Forefront Threat Management Gateway (TMG).

  12. It may be a dumb question but does the script need to be installed on both servers?

    • There’s no such thing as a dumb question, and I can see why you ask this. The answer is Yes. They both need to be able to detect when they need to be the active server. A check against the gateway will make sure that the server knows when it’s having network issues itself.

  13. please provide an alternate link for the script

  14. Thanks Mischa Oudhof. The download is working now 🙂

  15. It’s absolutely sad that you can’t do DAG and NLB on the same server for SMB. But even then, MS internally does not recommend NLB anymore (see my links). Their only recommended solution is a hardware load balancer. Your solution is LIKE ROUND ROBIN DNS and not supported.

    When using 2 DAG servers I assume you have a certain budget for Exchange and that it’s one of the most used systems in your company. Why not a network load balancer? They start around EUR 1700 today. Maybe you need two of those, and maybe you need a larger model than the one at that price.

    http://www.load-balancer.info

    The solution you mention is clearly not supported by MS Support. And if you ever have to go deep level because of some failure with DAG or CAS, you are dead. Mail won’t work for some days and the money you saved you lose…. Worst case, someone loses his job because mail is down for one week.

    http://www.stevieg.org/2010/11/exchange-team-no-longer-recommend-windows-nlb-for-client-access-server-load-balancing/
    http://channel9.msdn.com/Events/TechEd/Europe/2010/UNC311

    • The solution is definitely not Round Robin DNS, as you only assign the IP of the CAS array to 1 server. With Round Robin DNS you assign multiple IPs to 1 DNS entry. The script that I wrote is more like a DNS-based failover than anything else. Assigning the IP to a different server allows the client to connect to the same address, but located on a different server. This is also mostly how a hardware LB works (except for switching the IP), but then the checks are also done by the LB and it decides which server the request goes to.

      The budget needed for a hardware load balancer is not just based on the price of the product, but also on things like airco, rack space, power capacity, maintenance and support fees. Also, a lot of load balancers have support contracts for updates and product support. As you’re using DAG you probably also want to make sure that the whole environment is redundant and without single points of failure. Then you need 2 units that can do failover between each other as well. All of this combined makes for quite a costly solution.

      A valid consideration is indeed the cost of downtime. I never had any problems with our downtime. If a DAG fails and I can’t fix it quickly, I can break the DAG down and run all mailboxes from 1 server while I find a solution for the DAG problems. This whole process will take me 10 minutes tops. Whether or not this is acceptable depends on your SLA.

      The solution is not supported by MS, and I neglected to mention that in my post. I added a note that the solution is not supported by MS. Thank you for the headsup!

  16. First of all, Thanks Mischa Oudhof for the script.
    I am having a problem in the steps before the implementation of the script. Sort of in the first step 😛
    I created a new casarray and assigned it to my databases with the command
    set-mailboxdatabase -rpcclientaccessserver "casarrayname"
    I also created an A record in the dns. Now the servername in the outlook connections is still pointing to my old server exchange1.mycompany.com and not to the casarrayname. This only occurs with outlook 2003 and 2007 clients. Outlook 2010 users get updated fine.
    Does anyone have a solution for this?
    THANKS

    • You’re most welcome!

      Unfortunately the older Outlook clients will not pick up the new name from autodiscover like Outlook 2010 does. I did find a couple of ways to get them to update, but they all require you or your users to perform an action in Outlook.

      Delete and recreate the Outlook profile
      Use the repair function in the account settings of Outlook

      The last one is probably the best way to go as this doesn’t reset any settings and takes less actions.

      I haven’t had a chance to test these so please let me know how this works out for you!

  17. Thanks Mischa Oudhof, both ways worked for me. Though i prefer a profile repair over recreation of a profile. Now i need to get my autodiscover to work over the internet for my OA users.

    Thank you again, you saved me a lot of searching.

    • Repair option is indeed the easiest one to use.

      Autodiscover is pretty straight forward. An easy way to check the results is right-clicking on the icon in the notification area while holding CTRL. Then select Test Email Autoconfiguration. Microsoft also created a site for checking external connectivity to Exchange servers which helped me a lot!

      http://testexchangeconnectivity.com/

  18. I did test it through http://testexchangeconnectivity.com/ but i had the internaluri messed up. Changed the internaluri and autodiscover works fine now.
    Thanks again 😀

  19. Excellent post! One small thing.. the name of the network connection is hardcoded to Intra. Not a difficult thing to change, but would be great if it was a variable.

  20. Dear Mischa,Thanks for your article it was very helpful.
    Shahid Mushtaq mentioned in the first comment another way. Once I understood what he was saying, I tried it.
    It works and totally eliminates the need for the PowerShell script. When you create the CAS array and give it a name, let’s say outlook.mydomain.local, point your DNS Host record for “outlook” to the DAG IP address instead of a new one. Make sure you change the RpcClientAccessServer name on the database to match the CAS Array name. Then when DAG failover occurs, DAG just points the IP to the other server once the healthy database becomes mounted.
    The client experiences a brief disconnect and reconnect without any need to restart outlook. I have tested this several times and it has worked flawlessly every time. It also allows for failback without any intervention.
    Works great for me.

    • Now that you explain it like this I can see how it was meant. That is indeed a good suggestion; however, I don’t believe that it will be as flexible as the solution above with the script. The script also sends out an email stating that a failover has occurred. This is quite important in my situation, as the failover location has a higher latency than the onsite server. However, in other situations one could see the script as a fragile part of the failover process.

      Thank you for taking the time to explain!

    • Hi James,

      Do you have some more information on how to configure this?
      I would also like to try and test this solution.

      Please let me know.
      Many thanks.

      • I’ll respond to both comments in this one.

        You can always try it in PowerShell first, but it will definitely work: production Exchange 2010 servers always run on x64 versions of Windows (the x86 versions are not supported for production).

        To monitor the script you could run it while logged in as an account that has enough permissions on exchange. It should produce enough output to let you see how it works.

        There is not much to configure. Just change the parts in the beginning to the correct values. These have been commented in the script.

        If you need more than this, please let me know. We could then schedule some IM time if needed. Anything to help out a fellow dutchy!

        • Hi Mischa,

          I tried to run it in powershell but it only gives back already running.
          When i shutdown the other server nothing happens?

          I also have a problem on the second cas server when connecting to the shell.
          If you like to help me out with a IM or remote session i would appreciate it 🙂

          • Never mind, I found the problem and the script is working manually.
            Will try to get it scheduled and see if it will work.

            thanks

          • It’s probably saying that because it’s already running in the background. But I guess you already figured that out!

    • If you would be so kind, please describe this in a little more detail. I have a feeling this is exactly what I’m looking for. Thank you so very much.

  21. Hi Mischa,
    I take your point about the importance of email monitoring. I notice that event ID 2090 is created whenever a failover happens, so I will create a scheduled task to email on the detection of the event. Fingers crossed…
    So far my only experience with this failover stuff is in test/dev. Yet to try it on my production.

  22. Hi There,

    thanks for this script, it looks like a perfect solution.
    Currently I’m creating the task etc for it to run.

    While doing this i noticed that in the script it uses Get-WmiObject Win32_Process.
    Does this also work with a x64 version of 2008?

    Also is there a way that you can monitor the job to see if works?

    Thanks!

  23. Hi Mischa! Thanks a lot for this script! But I have a problem with it. May be you can help. I have created CAS array, set the database to use this array, made DNS record for this, and add casarray IP to my active CAS server. And it works, I can see that outlook connects using casarray name, but when I assign casarray IP to secondary server, outlook cannot connect. What detail did I missed? How can I manually make secondary server to become active?

    • If you run Get-ClientAccessArray | fl, do you see both servers as members? Are both servers in the same Active Directory site and are they both serving the same mailbox database?

      You can make the switch by switching the IP to the other server. This is just a failover of the CAS role, not the DB. You can switch the DB by going to the Exchange Management Console and selecting the Organization Configuration > Mailbox. Then rightclick the Mailbox Database and select Move Active Mailbox Database.

      Normally the DAG would take care of this, but since you’re doing a manual failover you’ll have to force the mailbox database as well.

      If all is well, and you still can’t connect to the server with Outlook, try Webmail. If that also doesn’t work, it’s probably not configured properly to access client requests.

  24. Hi Mischa,

    Thank you very much for sharing this script.

    Does this require the servers to have 2 nics each?

    When I run the netsh command, it overwrites the primary IP of the server.

    Thanks,

    Will

    • Nope, you can use 1 nic. Although it’s recommended to use 2 nics and have the replication of the mailbox db go over 1 nic.

      I think you’re using:
      netsh in ip set address "Adapter Name" 192.168.0.xxx 255.255.255.0
      in stead of:
      netsh in ip add address "Adapter Name" 192.168.0.xxx 255.255.255.0

      Could you check?

  25. Hi Mischa.

    Does your script work if the two servers each have an active and a passive copy of a database? For example: Server1 has DB1 active and DB2 passive, and Server2 has DB1 passive and DB2 active. When running (Get-MailboxDatabaseCopyStatus).Status with more than one database, the result returned isn’t true, so the failover never happens.

    Another question: After a failover, the remaining active server now has the CAS array IP address as a second IP. When the failed server comes back online, both servers have the CAS array IP bound. Doesn’t this cause an IP address conflict on the network or issues with client connections until the IP address is manually removed from one of the servers?

    Thanks,
    Adrian

    • Q1:
      The way it is now, it doesn’t work, but with some small adjustments it will.

      1. Just make sure that the $LocalHostname = part has the actual hostname without domain behind it.
      2. Add $MailboxDBName = "Mailbox Database Name" under $LocalHostname
      3. Change (Get-MailboxDatabaseCopyStatus).Status to (Get-MailboxDatabaseCopyStatus -Identity ($MailboxDBName + "\" + $LocalHostname)).Status

      It will then only get the status of that DB to make the decision. Lemme know if this solves your problem, or if I overlooked anything.

      Q2:
      It will have a conflict, but all traffic will still go to the server that didn’t fail, as the last working route is the first route it tries. Also, if the failed server returns it won’t be able to activate the IP, as Windows will see that it’s already in use. When you’re using a properly configured DAG, it should also still work, as both servers can be used as a CAS.

      When a failover has happened you should always check why it happened and fail back manually.

      • Thanks Mischa.

        It’s working. I just had to remove the double quote in front of $LocalHostname above.

        I set $MailboxDBName on Server1 to DB2 and vice versa because in a normal-operational state, DB1 is mounted on Server1 and DB2 is mounted on Server2. To test if the DAG failover happened correctly (for example is Server1 failed), I test if DB1 changed to Mounted on Server2 and vice versa.

        I suppose it wouldn’t matter because when one server fails and the DAG failover happened as expected, both databases will be in a Mounted state on the operational server.

        Thanks again for the sharing of the script and your prompt assistance.

      • I know this is a little old, but I would very much like to implement your script. Is there a way to get it to work with an active/active cross-site DAG?

        Site A (MAPI) 192.168.1.x – DB1 active DB2 passive
        (Replication) 10.10.0.x

        Site B (MAPI) 192.168.101.x – DB2 active DB1 passive
        (Replication) 10.10.1.x

        Approx 150 active users per database

        • The script does work with cross-site DAGs, but not with cross-site CAS.

          As you can read at the link below, you can’t use a CAS array across multiple active directory sites.
          http://technet.microsoft.com/en-us/library/ee332317.aspx

          This means that you can have the mailbox dbs replicate cross-site, but you can’t provide the same solution for the client access server.

          If you have little latency between the 2 sites you could just configure it all as if it’s in 1 site (if there are no other dependencies on the site construction).

  26. I completely agree with James’s solution and have implemented it at a customer location. The setup is made up of two Exchange servers with 20 DAG databases. I have configured the CAS array with the DAG cluster IP and it worked flawlessly. So far I have migrated 10 production mailboxes out of 350. Everything looks great from a high availability point of view. No need for any script or expensive hardware NLB.

  27. Failback …. “Then it’s just a matter of manually removing the CAS array IP address on the server that doesn’t need it anymore. Then it’ll automatically detect that it’s switched servers”

    Help…. I am not running the script because I wanted to test the system first. I have set up my CAS array and verified that Autodiscover, OWA, etc. are working. My three Exchange servers have the IPs .13, .14 and .15, and my array has the IP .20. I added the .20 IP to the current server, .13, and mail is working great on .20. Then I added .20 to the .14 server and removed .20 from the .13 server. It should have failed back to the .14 server, but it doesn’t. Do I have to reboot? Is it an ARP problem? Is it DNS? (I don’t think so.) I think it’s just something stupid I’m missing. From what I read, can’t I just put .20 on any server?

    Thanks for all of your hard work on this. I really want to get this working.

    • Did you first remove the .20 from the .13 and then add the .20 to the .14? Then the IP should be picked up. If you add it to the .14 before you remove it from the .13, it will not activate the address because there’s an IP conflict. Perhaps that’s what you’re experiencing right now.

      Just remove the .20 from all servers and then add it to the .14. You should be golden then.
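
The ordering described above (release the array IP everywhere, then bind it to one server) can be sketched as a small planner. This is only an illustration in Python, not part of the failover script: the function name `plan_ip_move`, the server names, the full IP address and mask, and the interface name "Production" are all assumptions, and the generated netsh commands would still have to be run on each server by hand.

```python
def plan_ip_move(array_ip, mask, servers, target, interface="Production"):
    """Return ordered (server, command) steps for moving the CAS array IP.

    The address is released on every server first and only then added to
    the target, so two servers never hold it at the same time.
    """
    if target not in servers:
        raise ValueError(f"{target!r} is not one of the known servers")
    steps = []
    # Step 1: release the address everywhere (netsh reports an error on a
    # server that does not hold it, which can be ignored).
    for server in servers:
        steps.append(
            (server, f'netsh interface ip delete address "{interface}" {array_ip}')
        )
    # Step 2: bind the address to the one server that should be active.
    steps.append(
        (target, f'netsh interface ip add address "{interface}" {array_ip} {mask}')
    )
    return steps


# Hypothetical values standing in for the .13/.14/.15 servers and the .20 array IP.
for server, command in plan_ip_move(
    "192.168.1.20", "255.255.255.0", ["EX13", "EX14", "EX15"], "EX14"
):
    print(f"{server}: {command}")
```

Adding the address before it has been released elsewhere is exactly the conflict case mentioned in the reply, which is why the add step always comes last.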

      • I did try that before, but you are right: last night I added the IP first. I will test again tonight and see if it works. It almost seems like my router or firewall is using the MAC address of the server and not updating it when I move the IP. I will try it and let you know. THANKS for the fast response.

  28. It works great! I was having a problem with OWA on one of my servers. The Microsoft Exchange Forms-Based Authentication service was set to Automatic but wasn’t started, so it looked like the changeover wasn’t working. The next step is to install and test the script. Thanks again.

  29. That is a pretty inventive solution, I’m impressed. There is an easier and more automatic way, however, and it is still economical for small shops. I implemented the following solution using VMs in our cluster, but two ancient workstations could be used to accomplish the same thing.

    Zen Load Balancer is a (free!) Debian Linux-based dedicated load balancer distro. It meets all the requirements of Exchange 2010 CAS/Hub Transport failover, and can be clustered so that the load balancer itself doesn’t become a single point of failure.

    Zen is super easy to install and requires zero knowledge of Linux. It will allow for automatic, instant failover.

    Hope this helps somebody out there.

    • Thank you for providing another solution. One of my reasons for not using load balancers is that I don’t want to maintain even more servers, and also I don’t want to add more possible problems to my network. However this is a recommended solution by Microsoft and is probably one of the best ways to go next to the other solution provided by the commenters!

  30. I think there is no need to use the script either. You can do this via Failover Cluster Manager in Windows.

    1.) Create a new CAS array, for example named CASarray

    2.) Create a DNS entry for CASarray, for example CASarray = 192.168.23.135

    3.) Replace the CAS on each database with the new CAS array name, CASarray

    4.) I noticed in Failover Cluster Manager that only one Exchange server is active at a time, so I configured a “Service and Application”, added an “IP Address Resource” to it and gave it the IP of CASarray, which is “192.168.2.35”

    Now shut down the active Exchange server node (you can see which node is active in Failover Cluster Manager). The CASarray IP will also be switched over to the second node, and Outlook users will only need to restart Outlook (they will get a popup message).

    I tested this in my lab and it’s working 100% fine.

  31. I’ve actually set three customers up using only the following method…with 2 servers…

    1. Create the DAG as normal.
    2. Run the New-ClientAccessArray PowerShell command…adding both servers to the array.
    3. Go into DNS and create a CNAME record matching the CASArray name…pointing to the DAG name.
    4. Set the TTL of the records to 5 minutes (for cross-site IP address notification).
    5. Set the URLs for all the internal and external to point to the CASArray name.
    6. Get a UC cert with the common name matching the CASArray name.
    7. Add autodiscover…and each FQDN of both servers as Subject Alternative Names.

    That’s it. Failover happens within site in about 5 seconds (clients show a brief disconnect/reconnect…that’s all). Cross site failover occurs within ~5 minutes (due to the DAG claiming a new IP and the TTL of the DNS record cache flush occurring). No $20k BigIP boxes needed. No Cisco IOS commands. No complicated scripts or scheduling event. Certainly no need for a separate NLB array costing thousands…and thousands more dollars.

    The only difference in this scenario is that the CASArray doesn’t load balance…it always prefers the current Active DAG member. Which…for most companies <1000 users…is perfectly acceptable. Other than that…it's incredibly less expensive…and much less complicated to setup and/or troubleshoot.

    🙂

    • Thank you for yet another great solution that has again different advantages over the others. I will soon combine all the solutions in an edit of this article!

    • When setting up the CAS Array, where can you see that the array and the members are there?

      • You can check the members with PowerShell using Get-ClientAccessArray.

        • So when I run Get-ClientAccessArray, here is what I get:

          Name   Site    Fqdn         Members
          exch1  ADSite  name of CAS

          • The Members column should contain the servers that are active in the Client Access Array. The name of the Client Access Array should be the Fqdn. This should also be visible when you run Get-MailboxDatabase | fl. The RpcClientAccessServer field should have the same Fqdn as the Get-ClientAccessArray Fqdn.

            You can use Get-ClientAccessServer to check what servers are currently configured with the CAS role.

            I hope this sheds some more light onto the whole situation. If not, let me know!
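
The consistency check described above (every database's RpcClientAccessServer should equal the array Fqdn) can be sketched as follows. This is an illustrative Python helper, not an Exchange cmdlet: the function name `find_mismatched_databases` and the sample names are assumptions, and the inputs are values copied by hand from the output of Get-ClientAccessArray and Get-MailboxDatabase | fl.

```python
def find_mismatched_databases(array_fqdn, rpc_servers):
    """Return the databases whose RpcClientAccessServer is not the array FQDN.

    rpc_servers maps a database name to its RpcClientAccessServer value.
    The comparison ignores case, surrounding spaces and a trailing dot.
    """
    def normalize(name):
        return name.strip().rstrip(".").lower()

    expected = normalize(array_fqdn)
    return sorted(
        db for db, server in rpc_servers.items() if normalize(server) != expected
    )


# Hypothetical values: DB2 still points at a single server instead of the array.
mismatched = find_mismatched_databases(
    "casarray.domain.local",
    {"DB1": "CASARRAY.domain.local", "DB2": "exch02.domain.local"},
)
print(mismatched)
```

Any database the helper reports would still connect clients to a single server rather than to the array, so it would not follow the array IP when that IP moves.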

    • Thanks Stewart, you saved me 🙂

      I just want to know whether the array name alone is fine on the certificate, or whether I have to include the server names as SANs.

    • I have a DAG configured with two servers joined, and currently we have one Exchange server hosting the CAS role and another hosting UM.

      So I need to add another server hosting the CAS role and follow your guide using my already configured DAG name.

      The one server currently hosting the CAS role has a single certificate issued by GoDaddy for all services except UM.

      My question: do I need to purchase another certificate to apply to the new CAS server?

      How does the certificate get handled during failover?

      I hope I don’t sound stupid, but any help with this would be greatly appreciated.

      • The DAG is only for the Mailbox Databases. You’ll have to add the new CAS server to the CAS Array, which is the connection point for Outlook. The certificate should use the CAS array name or, if you’re using a TMG or other web proxy, the name that hosts everything (mail.companyname.com).

        I’m currently using a wildcard certificate for our company domain which allows me to run anything as HTTPS (*.companyname.com). This might be worth looking into if you’re exposing multiple subdomains.

        • Mischa,

          Thank you for the Quick response, it is much appreciated….

          So I will need GoDaddy to change the hostname in the current certificate to the CAS array name.

          I assume this is the same step I would have to take with your solution or Stewart’s.

          Also, I didn’t say it before, but it’s a wonderful script, Mischa. This is a great benefit to so many.

          • There’s an easy answer here. What do your users type to go to webmail? Is that the same as the hostname that Outlook connects to (through Outlook Anywhere)? And is that the same as your CAS Array FQDN?

            If you answer yes to all questions (except the first, of course!), then the CAS Array FQDN is your only connection point and it should be used as the name on your certificate. It’s possible that the webmail and Outlook Anywhere hostnames differ from your CAS Array FQDN if you’re using a TMG or other web proxy software. If you want to communicate a bit faster, and with some more details that you don’t want to put on here, let me know and I’ll send you an email.

          • Definitely. If you have a chance, please send me an email. I’m truly grateful for your help and quick responses. I want to deploy this as soon as I can, as it is a great solution.

  32. Thanks for sharing. I am installing a similar environment, but it seems I can’t make your script run properly.
    I didn’t schedule the script but just ran it directly while logged on to the server. My setup is Windows 2008 R2 Enterprise SP1 plus Exchange 2010 with SP1. When I ran the command, I got a yellow warning message: “WARNING: Some imported command names include unapproved verbs which might make them less discoverable. Use the Verbose parameter for more detail or type Get-Verb to see the list of approved verbs.”

    Then if I shut down exch02, the script does detect the problem and reports the following:
    “Problem with exch02 detected!
    There might be a problem on exch01 or on exch02. Please verify!”

    The message keeps looping.

    So I manually used the command
    netsh in ip add address "Production" 192.168.1.101 255.255.255.0
    to give the CAS array IP to exch01.

    But when I ran Get-MailboxDatabase, I found that RpcClientAccessServer is still pointing to exch02.

    Did I miss anything?

    Thanks!

    • I think you did indeed miss something. The RpcClientAccessServer property should be set to the FQDN of the client access array (Get-ClientAccessArray). The IP that this FQDN resolves to should be the IP that you assign with the netsh command. The server holding that IP then acts as the active CAS for the array.

  33. Hi Mischa,
    First timer here, question about the DAG.
    I would like to use Personal Archives in my Exchange 2010 environment. What would you recommend, two DAGs or one DAG? I would like to separate the DAGs, but I’d like to hear your recommendation.

    • As personal archives are stored in the normal Mailbox Databases, and because there’s no real difference between personal archives and mailboxes except for the usage, I’d say go with 1 DAG. If you want to separate the mailbox servers from the personal archive servers, then you might want to use a separate DAG.

      The recommendation for a personal archive is to treat it like any other mailbox.

      • Thanks for the response.
        So I have 2 DAGs, an Archive DAG and a Production DAG. Everything is working well with the Production DAG. I have 3 mailbox servers for the Production DAG and 2 mailbox servers for the Archive DAG, and they use iSCSI storage for the databases. My issue is that whenever I reboot one of the Archive mailbox servers, the Archive databases go offline, even though I had moved all the active databases to the other server.

        • Do you have a Witness Server in the Archive DAG? Possibly the DAG can’t determine which server is offline because there are only 2 servers in the DAG. In a situation with 2 mailbox servers in a DAG a witness server is required to properly determine what happened when a server goes down.

          If that’s not the case you might find it useful to check the connections that your Outlook is using by holding the CTRL key and right-clicking the taskbar icon, then selecting Connection Status. You also have the option to Test E-mail AutoConfiguration here. Check the XML tab to see what connection details your Outlook is getting, and check if the proper servers are returned.
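
Why a two-member DAG needs a witness comes down to simple majority arithmetic, which the sketch below illustrates. It is a simplified Python model rather than anything Exchange or Failover Clustering exposes: the function name `has_quorum` is an assumption, and the model only covers the basic rule that the witness gets a vote when the member count is even.

```python
def has_quorum(members_online, members_total, witness_online=False):
    """Simplified majority check for a DAG.

    With an even number of members the witness server counts as one extra
    vote; with an odd number only the members vote. Quorum requires more
    than half of all votes to be online.
    """
    witness_votes = members_total % 2 == 0
    total_votes = members_total + (1 if witness_votes else 0)
    online_votes = members_online + (1 if witness_votes and witness_online else 0)
    return online_votes > total_votes // 2


# Two-member DAG with one member down:
print(has_quorum(1, 2, witness_online=False))  # survivor alone has no majority
print(has_quorum(1, 2, witness_online=True))   # survivor plus witness has 2 of 3 votes
```

In this model a lone surviving member of a two-member DAG cannot keep its databases mounted unless it can also reach the witness, which matches the symptom of databases going offline when the other member is rebooted.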

          • I do have a witness server for the Archive DAG (Arc1 and Arc2). When I manually fail over to the Arc2 site and shut down Arc1 for maintenance, all the active databases on Arc2 go offline/dismounted, and I’m unable to mount any of them until Arc1 is back online.
            So I am thinking I can probably suspend the database copies when they are active on Arc2.
            Or, since I have 2 DAGs, I can move the whole Archive DAG into the Production DAG.

          • Before you did a failover, was the database status on Arc2 healthy? If it is healthy and the copy queue is not long the failover should happen quite fast.

            The links below are a good read to find out how the entire process of failover works. You might also be interested in DAC (Datacenter Activation Coordination) mode, which Exchange 2010 SP1 extended to two-member DAGs.

            http://technet.microsoft.com/en-us/library/dd351049.aspx
            http://exchangeinbox.com/article.aspx?i=172
            http://eightwone.com/2010/08/17/dac-active-manager-activate/

            Hope this gives you a better insight of what is exactly going wrong.

          • Well, I did a successful failover, except for one little issue. After reading and researching the KB, these are the steps I took:
            Change the DNS entry IP address to the failover CAS server
            Modify Autodiscover to point to the failover CAS server
            Modify the RPC Client Access setting to the CAS array name
            Use Move-ActiveMailboxDatabase to fail over to the failover mailbox server
            Suspend all database copies
            Stop the cluster service and force quorum on the failover mailbox server
            Run Get-MailboxDatabaseCopyStatus to make sure everything is healthy
            Shut down the local mailbox servers one at a time

          • Good to hear that you got it to fail over. However, these steps should all be automated. Autodiscover should use the CAS array FQDN that you have. The same goes for the RpcClientAccessServer of the mailbox database. Move-ActiveMailboxDatabase does in fact the same thing as clicking Move Active Mailbox Database in the GUI. The suspension of the database copy should then be automatic, and the quorum should pick up that the failover is being done (only if the other server is actually unavailable).

            You mentioned an issue, is there anything else wrong after this procedure?

          • Mischa,
            How familiar are you with ForeFront?
            Now that my Exchange environment is stable, my next project is to get ForeFront installed and running. Below is what I am planning on, but let me know if you have any recommendations.

            What is your recommendation on setting up the following?
            I have 3 CAS/HUB servers and 5 MBX servers. My plan is to install FPSMC as primary with SQL on another server, install FOPE on all of the Exchange servers, and push the FPSMC agent out to all the servers. All of this was successful. My main question is this: it is a closed network, so I have no access to download scan engine and definition updates, and pretty much everything has to be manual. I can download updates from another network and upload them to one CAS/HUB server with FOPE installed, run the scan and update that server, then use PowerShell to export just the configuration settings to the FPSMC server and create a package to distribute to all other CAS/HUB servers.

            I’d like to know if you have any other recommendations.

          • Roger,
            Unfortunately I haven’t had the chance to play around with it yet. I am currently using antispam and antivirus solutions from Halon.se as a VSP on ESX. We feel that this is the best way to go for our mail relay servers. These also provide me a way to retain messages when anything goes horribly wrong with the Exchange environment. It’s very manageable and the support is absolutely amazing. No, I don’t have any shares or other reasons to promote Halon!

            You probably found this article already, but you can use this MS-provided PowerShell script to download the updates to a file share. Depending on how closed the network is, you have the option to access the updates directly or to create a couple of mirrors.

            As I lack experience with ForeFront I’m afraid that I can’t be of more help here.

            http://support.microsoft.com/kb/2292741/en-us

  34. Hello Mischa,

    Thanks for your script! I’m also trying this from another post here:

    According to this reply:
    1.) Create a new CAS ARRAY suppose name CASarray
    2.) Create a DNS entry for CASarray suppose CASarray = 192.168.23.135
    3.) Replace each Database CAS with the new CAS name CASarray
    4.) (I noted that in the Cluster Failover Manager that at a time one Exchange Server is active so I configured a “Service and Application” and add a “IP Address Resource” and add the IP of CASarray which is “192.168.2.35″

    Can you please explain how to do step 2 3 and 4 ?

    Step 1 is easy. I just created cassarray.domain.local.

    Step 2: I made a DNS A record with the name above and pointed it to EX1 (DAG member 1).

    Step 3: where can I do this?

    Step 4: when I open “select service or application”, which service or application do I need to select?

    Thanks!

    • Sorry for the slow response.
      Step 2
      The DNS record should point to the CASarray IP. This DNS record will be the connection point for all clients.

      Step 3
      As the CASarray is your client connection point, this will need to be defined in the Mailbox Database settings. To do this, just run the PowerShell command Set-MailboxDatabase "Database name" -RpcClientAccessServer "cassarray.domain.local"

      Step 4
      The default setup doesn’t require any services or applications, and I think this doesn’t either. Although I’ve never used this method, and therefore cannot properly support it, you might want to try right-clicking Services and applications, going to More Actions and selecting Create Empty Service or Application.

    • Dylan, I set up my environment, which is pretty large, but I didn’t use a cluster resource from Failover Cluster Manager. The DAG is a form of cluster, but it doesn’t use the full cluster functionality.
      What are you trying to do in step 3?
      Normally, once the DAG is created, you’ll need to set up the DAG members for replication.

  35. By the way, I have 2 Exchange servers (EX1 and EX2). It’s a standard installation with all the roles combined.
    After this I created the DAG and the CAS array.

  36. Dear Mischa,
    Thank you for your scripts. I work at a company and would like to propose this solution, but I don’t understand where to execute the scripts. Do we execute them on both servers and change the settings? (For example, for exchange1 we set localhostname: exchange1, and for the script running on exchange2 we set localhostname: exchange2.)

    I tried it in VMware virtual machines and it was successful, but I think what I did is wrong, because I didn’t change the settings for exchange2 and I ran the scripts on DC two. I also noticed that when I add the CAS IP to the MAPI network, exchange1 is added as a host in DNS with the CAS address. Please answer, because my boss is very interested in this solution.

    • You have to run this script on both Exchange servers. The edit you need to do in the script is indeed the $LocalHostname (local server) and the $Hostname (remote server). Also you’ll need to edit the gateway, which is used to identify network outages. If the server can’t reach the gateway it’s assumed that the server itself is down.

      For Exchange1 you change it to:
      $LocalHostname = "Exchange1"
      $Hostname = "Exchange2"

      For Exchange2 you change it to:
      $LocalHostname = "Exchange2"
      $Hostname = "Exchange1"

      Unfortunately it sometimes updates the DNS to refer the hostname to the CAS Array IP. I haven’t found a way to solve this, however it doesn’t break the functionality.

      If you want more information please be specific about what you want to know.

  37. Thank you for your response,
    but I didn’t understand how to edit the gateway, since it is the same IP address on both servers. Do I have to give each server a different gateway IP and put the gateway IP of the local server in the script?
    Thank you again.

    • The gateway can be the same on Exchange1 and Exchange2. It’s actually the gateway that is set in the network settings of the server. If you run ipconfig /all in a cmd box you should see it there. The script uses this IP address to check whether the server can still reach the gateway. If it can’t, it assumes that the server lost its network connection and shouldn’t take the CAS Array IP.
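
The gateway rule described above can be written as a small decision function. This is a sketch in Python of that one rule only, not the actual script (which is PowerShell and, according to the comments, also checks the database mount state); the function name `should_claim_cas_ip` and its two boolean inputs are assumptions.

```python
def should_claim_cas_ip(gateway_reachable, peer_reachable):
    """Decide whether the local server should take the CAS array IP.

    If the gateway cannot be reached, the local server is assumed to have
    lost its own network connection, so it must not take the IP. Otherwise
    it takes the IP only when the other Exchange server is unreachable.
    """
    if not gateway_reachable:
        # The problem is local: leave the IP alone.
        return False
    return not peer_reachable


print(should_claim_cas_ip(gateway_reachable=True, peer_reachable=False))
print(should_claim_cas_ip(gateway_reachable=False, peer_reachable=False))
```

Because both servers ping the same gateway, the setting can be identical on Exchange1 and Exchange2; what differs per server is only which host counts as the peer.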

  38. Can someone please direct me to the download of the script? I can’t find the download link anywhere.

  39. I have done a typical installation of Exchange Server 2010. I have 2 servers, each hosting the Mailbox, Hub and CAS roles. Does your solution work in my environment? Can I have failover for CAS?

  40. The script is missing again. Please fix it. Thanks.

  41. Hi Mischa,

    Thanks for the write-up.

    The script download link appears to be broken.

    “You can download the [download id=”1″] script here!”

  42. Unfortunately it sometimes updates the DNS to refer the hostname to the CAS Array IP. I haven’t found a way to solve this, however it doesn’t break the functionality.

    I’m not using this method and did not go through the script, but would Netsh Int IPv4 Add Address LAN-1 172.16.5.10 SkipAsSource=True fix the above issue?

