Jan 21 2018
 

This weekend I configured Azure AD Connect for pass-through authentication for my on-premises Active Directory domain. This was a first for me and extremely easy to do; however, there were a few issues with my firewall and its SSL content filtering and scanning rules, which were blocking the connection. I figured I’d create a post providing the information you’ll need to get this set up and running quickly.

In my environment, I have a Sophos UTM firewall which provides firewall services (port blocking), as well as HTTP and HTTPS scanning and filtering (web filtering).

The Problem

After running the Azure AD Connect wizard, everything went well; however, there was an error at the end of the wizard notifying me that synchronization was configured but not occurring due to the firewall. It provided a link for more information (which didn’t actually contain the information needed).

While this issue is occurring, you’ll notice:

-Azure AD Connect in the Azure portal is reporting that pass-through authentication is Enabled, however after expanding the item, the Authentication Agent reports a status of Inactive on your internal domain controllers.

-In the Event log, under “Applications and Services Logs”, then “Microsoft”, then “AzureADConnect”, then “AuthenticationAgent”, and finally “Admin”, you’ll see the following error event:

Event ID: 12019

Source: Microsoft Azure AD Connect Authentication Agent (Microsoft-AzureADConnect-AuthenticationAgent)

Event:
The Connector stopped working because the client certificate is not valid. Uninstall the Connector and install it again. Request ID: '{WAJAJAJA-OHYA-YAAA-YAAAA-WAKAKAKAKAKAKAK}'

The event above is caused by the SSL and HTTPS content filtering.

-Azure Pass-Through authentication won’t work

The Fix

After doing some research, I came up with the following list of ports and hosts that need to be allowed through unfiltered.

Ports

The following ports are used by Azure AD Connect:

Port 443 – SSL

Port 5671 – TCP (From the host running the Azure AD Connect to Internet)
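
To rule out plain port blocking before digging into content filtering, a quick TCP check can be run from the server hosting the agent. This is only a minimal sketch in Python and not part of the Azure AD Connect tooling; the function names `can_connect` and `check_ports` are made up for illustration, and the host you pass in is whichever endpoint applies to your environment.

```python
import socket

# Ports listed above for Azure AD Connect pass-through authentication.
REQUIRED_PORTS = (443, 5671)


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host: str, ports=REQUIRED_PORTS) -> dict:
    """Map each port to True/False depending on whether it is reachable."""
    return {port: can_connect(host, port) for port in ports}
```

Keep in mind that a successful TCP connect only shows the port is open. It says nothing about whether SSL scanning is still interfering with the traffic.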

Hosts (DNS Hosts)

Here’s the host list:

*blob.core.windows.net
*servicebus.windows.net
*adhybridhealth.azure.com
*management.azure.com
*policykeyservice.dc.ad.msft.net
*login.windows.net
*login.microsoftonline.com
*secure.aadcdn.microsoftonline-p.com
*microsoftonline.com
*windows.net
*msappproxy.net
*mscrl.microsoft.com
*crl.microsoft.com
*ocsp.msocsp.com
*www.microsoft.com

If you’re running a Sophos UTM like I am, you’ll need to create an HTTP(S) scanning exception and then import this list into a rule “Matching these URLs”:

^https?://([A-Za-z0-9.-]*\.)?blob.core.windows.net/
^https?://([A-Za-z0-9.-]*\.)?servicebus.windows.net/
^https?://([A-Za-z0-9.-]*\.)?adhybridhealth.azure.com/
^https?://([A-Za-z0-9.-]*\.)?management.azure.com/
^https?://([A-Za-z0-9.-]*\.)?policykeyservice.dc.ad.msft.net/
^https?://([A-Za-z0-9.-]*\.)?login.windows.net/
^https?://([A-Za-z0-9.-]*\.)?login.microsoftonline.com/
^https?://([A-Za-z0-9.-]*\.)?secure.aadcdn.microsoftonline-p.com/
^https?://([A-Za-z0-9.-]*\.)?microsoftonline.com/
^https?://([A-Za-z0-9.-]*\.)?windows.net/
^https?://([A-Za-z0-9.-]*\.)?msappproxy.net/
^https?://([A-Za-z0-9.-]*\.)?mscrl.microsoft.com/
^https?://([A-Za-z0-9.-]*\.)?crl.microsoft.com/
^https?://([A-Za-z0-9.-]*\.)?ocsp.msocsp.com/
^https?://([A-Za-z0-9.-]*\.)?www.microsoft.com/
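
Note that the patterns above leave the dots inside each hostname unescaped, so each of those dots matches any character. That still works, but it is slightly looser than intended. The following Python sketch generates the same style of pattern with the dots escaped. It assumes the UTM accepts standard regex escapes in this field, and the helper name `exception_pattern` is made up for illustration.

```python
import re

# Same prefix as the patterns above: optional scheme "s" and optional subdomains.
PREFIX = r"^https?://([A-Za-z0-9.-]*\.)?"


def exception_pattern(host: str) -> str:
    """Build a URL-matching pattern for one host, escaping the dots."""
    return PREFIX + host.replace(".", r"\.") + "/"


# A few hosts from the list above; extend with the rest as needed.
HOSTS = ["blob.core.windows.net", "servicebus.windows.net", "windows.net"]

for host in HOSTS:
    print(exception_pattern(host))
```

A generated pattern can be sanity-checked locally with `re.match(exception_pattern("windows.net"), "https://login.windows.net/common")` before it is pasted into the exception rule.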

The exception I created skips:

  • Authentication
  • Caching
  • Antivirus
  • Extension Blocking
  • MIME type blocking
  • URL Filter
  • Content Removal
  • SSL Scanning
  • Certificate trust check
  • Certificate date check

After creating the exceptions, I restarted the “Microsoft Azure AD Connect Authentication Agent” service. The errors stopped and Azure AD pass-through authentication started to function correctly! The Authentication Agent now also reports a status of Active.

Oct 18 2017
 

Well, it’s October 18th, 2017, and the Fall Creators Update (Feature update to Windows 10, version 1709) is now available for download. In my particular environment, I use WSUS to deploy and manage updates.

Update: It’s now May 2018, and this article also applies to Windows 10 April 2018 update version 1803 as well!

Update: It’s now October 2018, and this article also applies to Windows 10 October 2018 update version 1809 as well!

Update: It’s now May 2019, and this article also applies to Windows 10 May 2019 update version 1903 as well!

I went ahead earlier today and approved the updates for deployment; however, I noticed an issue on multiple Windows 10 machines where the Windows Update client would get stuck at the “Downloading updates 0%” status.

I checked a bunch of things, but noticed that it simply couldn’t download the updates from my WSUS server. Further investigation found that the feature updates are packaged in .esd files and IIS may not be able to serve these properly without a minor modification. I remember applying this fix in the past, however I’m assuming it was removed by a prior update on my Windows Server 2012 R2 server.

If you are experiencing this issue, here’s the fix:

  1. On your server running WSUS and IIS, open up the IIS manager.
  2. Expand Sites, and select “WSUS Administration”
  3. On the right side, under IIS, select “MIME Types”
  4. Make sure there is not already a MIME type for .esd. If there is, you’re having a different issue; if not, continue with the instructions.
  5. Click on “Add” on the right Actions pane.
  6. File name extension will be “.esd” (without quotations), and MIME type will be “application/octet-stream” (without quotations).
  7. Reset IIS or restart the WSUS/IIS server.

You’ll notice the clients will now update without a problem! Happy Updating!
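
If you’d rather verify the result outside of IIS Manager, the mapping can be checked in the site’s configuration. The Python sketch below assumes the MIME type ends up as a `mimeMap` element under `staticContent` in the site’s web.config. Where that file lives on your server is not covered here, and the function name `has_mime_map` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def has_mime_map(web_config_xml: str, extension: str = ".esd") -> bool:
    """Return True if the config text declares a mimeMap for the extension."""
    root = ET.fromstring(web_config_xml)
    for mime_map in root.iter("mimeMap"):
        if mime_map.get("fileExtension", "").lower() == extension.lower():
            return True
    return False


# Sample config showing what the entry would look like after the fix.
SAMPLE = """<configuration>
  <system.webServer>
    <staticContent>
      <mimeMap fileExtension=".esd" mimeType="application/octet-stream" />
    </staticContent>
  </system.webServer>
</configuration>"""

print(has_mime_map(SAMPLE))
```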

Feb 18 2017
 

On VMware vSphere ESXi 6.5, 6.7, and 7.0, a condition exists where one is unable to take a quiesced snapshot. This is an issue that affects quite a few people, and numerous forum threads can be found on the internet from those searching for the solution.

This issue can occur both when taking manual snapshots of virtual machines when one chooses “Quiesce guest filesystem”, or when using snapshot-based backup applications such as vSphere Data Protection (vSphere vDP), Veeam, or other applications that utilize quiesced snapshots.

The Issue

I experienced this problem on one of my test VMs (Windows Server 2012 R2), however I believe it can occur on newer versions of Windows Server as well, including Windows Server 2016 and Windows Server 2019.

When this issue occurs, the snapshot will fail and the following errors will be present:

An error occurred while taking a snapshot: Failed to quiesce the virtual machine.
An error occurred while saving the snapshot: Failed to quiesce the virtual machine.

Performing standard troubleshooting, I restarted the VM, checked for VSS provider errors, and confirmed that the Windows Services involved with snapshots were in their correct state and configuration. Unfortunately this had no effect, and everything was configured the way it should be.

I also tried to re-install VMware Tools, which had no effect.

PLEASE NOTE: If you experience this issue, you should confirm the services are in their correct state and configuration, as outlined in VMware KB: 1007696. Source: https://kb.vmware.com/s/article/1007696

The Fix

In the days leading up to the failure, when things were running properly, I did notice that the quiesced snapshots for that VM were taking a long time to process, but were still functioning correctly before the failure.

This morning during troubleshooting, I went ahead and deleted all the Windows Volume Shadow Copies (VSS Snapshots) which are internal and inside of the Virtual Machine itself. These are the shadow copies that the Windows guest operating system takes on its own filesystem (completely unrelated to VMware).

To my surprise after doing this, not only was I able to create a quiesced snapshot, but the snapshot processed almost instantly (200x faster than previously when it was functioning).

If you’re comfortable deleting all your snapshots, it may also be a good idea to fully disable and then re-enable the VSS Snapshots on the volume to make sure they are completely deleted and reset.

I’m assuming this was causing a high load for the VMware snapshot to process, and a timeout was being hit on snapshot creation, which caused the issue. While Windows volume shadow copies are unrelated to VMware snapshots, they both utilize the same VSS (Volume Shadow Copy Service) system inside of Windows to function and process. One must also keep in mind that the Windows volume shadow copies will of course be part of a VMware snapshot, since they are stored inside of the VMDK (the virtual disk) file.

PLEASE NOTE: Deleting your Windows Volume Shadow copies will delete your Windows volume snapshots inside of the virtual machine. You will lose the ability to restore files and folders from previous volume shadow copy snapshots. Be aware of what this means and what you are doing before attempting this fix.
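
To get a feel for how many shadow copies have piled up inside the guest before deleting anything, the output of `vssadmin list shadows` can be counted. The Python sketch below works on text you have already captured from that command. It assumes an English-language Windows install where each copy is listed on a line starting with “Shadow Copy ID:”, and the sample text with its placeholder IDs is purely hypothetical.

```python
def count_shadow_copies(vssadmin_output: str) -> int:
    """Count shadow copies in text captured from 'vssadmin list shadows'."""
    return sum(
        1
        for line in vssadmin_output.splitlines()
        if line.strip().startswith("Shadow Copy ID:")
    )


# Hypothetical, heavily abbreviated sample with placeholder IDs.
SAMPLE = """Contents of shadow copy set ID: {set-id-1}
   Shadow Copy ID: {copy-id-1}
Contents of shadow copy set ID: {set-id-2}
   Shadow Copy ID: {copy-id-2}
"""

print(count_shadow_copies(SAMPLE))
```

A large count on a volume would be consistent with the slow quiesced snapshots described above, although this alone does not prove it is the cause.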

Sep 23 2016
 

Well, one of the servers I monitor and maintain in a remote oil town recently started throwing out a Windows event log warning:

Event ID: 129

Source: HpCISSs2

Description: Reset to device, \Device\RaidPort0, was issued.

The server is an HP ML350p Gen8 (Windows Server 2008 R2) running latest firmware and management software. It has 2 RAID Arrays (RAID1, and RAID5), and a total of 6 disks.

Researching this error, I read that most people had this occur when running the latest HP WBEM providers, as well as anti-virus software. In our case, I actually tried to downgrade to an older version, but noticed the warning still occurs. While we do have anti-virus, it’s not actively scanning (only weekly scheduled scans).

In the process of troubleshooting, I noticed that under the HP Systems Management Homepage, one of the drives in the RAID1 array, had the following stats:

Hard Read Errors:  150
Recovery Read Errors:  7
Total Seeks:  0
Seek Errors:  0

I found these numbers to be very high in my experience. None of the other drives had anything close to this: in 4 years of running, only one other disk had a read error (a single one), whereas this disk had tons. For some reason the drive was still reporting as operational, when I’d expect it to be marked as a predicted failure, or failed.

While all the online documentation was pointing towards locks on the array by software, from my own experience I think the array was actually waiting on a read operation, and this single disk was causing a threshold to be hit in the driver, which caused a retry to recover the read operation.

I called HPE support and mentioned I’d like to have the drive replaced. The support engineer consulted her senior engineer and reviewed the evidence I presented (along with ADU reports, and Active Monitoring health reports), and the senior engineer concurred that the drive should be replaced.

Replacing the drive resolved the issue. I’m also noticing a performance increase on the array as well.

Make sure to always check the stats on the individual components of your RAID arrays, even if everything appears to be operating soundly.
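
As a rough illustration of that kind of check, the Python sketch below flags disks whose hard read error count stands out. The counters are assumed to be copied by hand from the management page, the disk names are hypothetical, and the threshold of 10 is an arbitrary assumption rather than any vendor guidance.

```python
def suspect_disks(read_errors: dict, threshold: int = 10) -> list:
    """Return names of disks whose hard read error count exceeds threshold."""
    return sorted(name for name, count in read_errors.items() if count > threshold)


# 150 and 1 are the counts mentioned above; the disk names are made up.
counters = {"RAID1 disk 1": 150, "RAID1 disk 2": 0, "RAID5 disk 1": 1}

print(suspect_disks(counters))
```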

Sep 10 2016
 

When initiating manual backups or occasionally when automatic/scheduled backups run, a user may notice that Windows Server Backup may appear to “hang” when the status is reporting: “Preparing media to store backups…”.

In some rare cases, it may actually be in a hang state, however most of the time, it’s actually consolidating and/or checking previous backups on the destination media.

To confirm this:

Open the Task Manager as Administrator, click on the “Performance” tab, then click on “Open Resource Monitor”. Flip over to the “Disk” tab, expand “Disk Activity”, and sort by name. You should see the read requests on the destination media. You’ll also notice that it is slowly progressing consecutively through each backup set (in increments of 1, accessing multiple at a time).

This confirms that the Windows Server Backup services are functioning and it is in fact running. In one case, I had 723 previous backups, and it took around 50 minutes to count from 1 to 723, and then the backup finally proceeded.

I have also seen this occur when a previous backup failed or was cancelled. This occurs with Windows Server Backup on Windows Server 2008, Windows Server 2008 R2, and Windows Server 2012 R2.

Mar 26 2016
 

An issue that’s been making me rip my hair out for some time… And a fix for those of you experiencing the same.

 

Equipment:

HP ProLiant DL360 G6 Server (with a P800 Controller) running Server 2012 R2 and Backup Exec 2014

HP MSL-2024 Tape Library with a single HP SAS LTO-6 Tape Drive

 

Symptoms:

-After a clean restart, a backup job completes successfully. Subsequent jobs fail until the server or services are restarted.

-While the initial backup does complete, errors/warnings can be seen in the adamm.log and the Event Viewer even when successful.

-Subsequent backups failing report that the device is offline. The Windows Device Manager reports everything is fine.

-Windows Server itself does not report any device errors whatsoever.

 

Observations:

[5648] 03/05/16 07:50:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Device error 1167 on “\\.\Tape0”, SCSI cmd 0a, 1 total errors
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle 214, error 0
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Retry logic was engaged on device: HP       Ultrium 6-SCSI
[5648] 03/05/16 07:55:46 Adamm Mover Error: DeviceIo: 00:00:00:00 – Retry Logic: Original settings restored on device: HP       Ultrium 6-SCSI

Event ID 58053
Backup Exec Alert: Storage Error
(Server: “WhatsMySRVRname”) The device state has been set to offline because the device attached to the computer is not responding.

Ensure that the drive hardware is turned on and is properly cabled. After you correct the problem, right-click the device, and then click Offline to clear the check mark and bring the device online.

[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 17, new handle ffffffff, error 32
[09968] 03/05/16 01:42:08.426 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[09968] 03/05/16 01:22:07.867 PvlSession::DismountMedia( 0, 0, 0 )
Job = {JOBHEXNUMBERZZZZZZ} “ServerBackup-Full”
Changer    = {CHANGERZZZZ} “Robotic library 0001”
Drive      = {MYBACKUPDRVXZZZZZ} “Tape drive 0001”
Slot       = 13
Media      = {MEDIAZIDZZZZ} “BARCODEID”
ERROR = 0xE0008114 (E_PVL_CHANGER_NOT_AVAILABLE)

[19812] 03/05/16 01:42:12.613 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 1a, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.129 DeviceIo: 03:07:00:00 – Refresh handle on “\\.\Tape0”, SCSI cmd 00, new handle ffffffff, error 32
[19812] 03/05/16 01:42:13.645 PvlDrive::DisableAccess() – ReserveDevice failed, offline device
Drive = 1007 “Tape drive 0001”
ERROR = 0x0000001F (ERROR_GEN_FAILURE)

[19812] 03/05/16 01:42:13.645 PvlDrive::UpdateOnlineState()
Drive = 1007 “Tape drive 0001”
ERROR = The device is offline!

Event ID 1000
Faulting application name: wmiprvse.exe, version: 6.3.9600.17415, time stamp: 0x54505614
Faulting module name: MSVCR110.dll, version: 11.0.51106.1, time stamp: 0x5098826e

 

Research:

I spent a ton of time researching this… Old support threads were pointing me in all different directions, most of the threads being old, mentioning drivers, etc… Initially I thought it was hardware related, until through testing I got the gut feeling it was software related. There were absolutely no articles covering Backup Exec 2014 running on Windows Server 2012 R2 with this specific issue.

Tried a bunch of stuff, including swapping the P800 controller, for another HP P212. While it didn’t fix the issue, I gained some backup speed! 🙂

Updating the HP software (agents, providers, HP SMH, WBEM) had no effect.

Disabling the HP providers, and disabling the HP Monitoring, Insight, Management services had no effect whatsoever. Tried different firmware versions, also tried different drivers for the Library and Tape drive, no effect. Tried factory resets, no effect. Tried Library and Tape tools, all tests passed.

Disabled other monitoring software we have in place to monitor software/hardware on clients’ servers, no effect.

 

Resolution:

-Uninstalled the HP WBEM Providers and Agents.

-Added a “BusyRetryCount” 32-bit DWORD value of 250 (decimal) to the “Storport” key under “Device Parameters” in all the Tape Library and Tape Drive Registry entries. Example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500\Device Parameters\Storport]
"BusyRetryCount"=dword:000000fa

This needs to be added to all of the Tape Library and Tape Drive device entries, and only those entries, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI\. You will probably have to create the “Storport” key under each device’s “Device Parameters” key.
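
If there are several tape devices, typing each entry by hand gets error-prone. The Python sketch below builds the text of a .reg file for a list of device paths. The helper name `busy_retry_reg` is made up for illustration, and the device path shown is just the example from above; yours will differ and must be looked up under the Enum\SCSI key.

```python
REG_HEADER = "Windows Registry Editor Version 5.00"
ENUM_SCSI = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\SCSI"


def busy_retry_reg(device_paths, retry_count: int = 250) -> str:
    """Build .reg text that sets BusyRetryCount under each device's Storport key."""
    lines = [REG_HEADER, ""]
    for path in device_paths:
        lines.append(f"[{ENUM_SCSI}\\{path}\\Device Parameters\\Storport]")
        # .reg files store DWORD values as 8 hex digits (250 decimal = 000000fa).
        lines.append(f'"BusyRetryCount"=dword:{retry_count:08x}')
        lines.append("")
    return "\n".join(lines)


# Example device path taken from the entry above; replace with your own devices.
devices = [r"Changer&Ven_HP&Prod_MSL_G3_Series\5&334e8424&0&000500"]

print(busy_retry_reg(devices))
```

Review the generated text before importing it, since it should only ever touch the tape library and tape drive entries.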

After doing this, the backups run consecutively with absolutely no issues. The event log is CLEAN, and Adamm.log is clean, and the “Faulting application name: wmiprvse.exe” errors in the event log no longer occur.

Fixed!

 

Additional Notes:

-Both “fixes” were applied at the same time. I believe the WBEM providers/agent caused the Event ID 1000 errors on WMIPRVSE.exe. While the registry keys alone may have possibly resolved the backup issues, I believe there still would have been an underlying issue with WMIPRVSE.exe faulting that could have other consequences.

-I do not believe the original installation of the HP WBEM providers caused the issue. I have a feeling a subsequent Windows Update, Backup Exec update, other module update, or an update to the HP software may have caused the issue to occur at a later time than the original install. I do remember we didn’t have an issue with the backups for months, until one day it started occurring.

-I will be re-installing the HP providers and agents at a later time. I will be uninstalling all of them, and re-installing from scratch the latest versions. I will post an update with my results.

-There is a chance the registry key is needed for the HP software to co-exist with Backup Exec backups for this configuration.

-There is a chance that the registry key isn’t needed if you never load the HP software.

Mar 05 2016
 

Just wanted to write about a couple issues that I’ve seen occur after migrating customers from Microsoft Small Business Server to Microsoft Server 2012 R2 (with Essentials Experience role), with Microsoft Exchange 2013 On-Premise.

The migration documents that were available at the time of migration were used. We still observed these issues after following them. Please note that since these issues occurred, the migration documents may have been updated.

Just an FYI: I provide Small Business Server Migration and consulting services. For more information, click here!

Windows SBS Company Web Connector ServerName

After the migration was complete we started seeing event logs pertaining to a “Windows SBS Company Web Connector ComputerName”, often mentioning it’s referencing an object in the Deleted Items container, also referencing the connector is not being activated due to no routes available.

Event ID: 5016

Microsoft Exchange could not discover any route to connector CN=Windows SBS Company Web Connector SERVERNAME,CN=Connections,CN=Exchange Routing Group (XXXXXXXXXXXXXXXXX),CN=Routing Groups,CN=Exchange Administrative Group (XXXXXXXXXXXXXXXXX),CN=Administrative Groups,CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=domainname,DC=local in the routing tables with the timestamp 3/5/2016 1:55:34 PM. This connector will not be used.  Total source server count: 1; unknown source server count: 1; unrouted source server count: 0; non-active source server count: 0.

What is happening is that a “Foreign Connector” is still present in the Active Directory and Exchange Configuration for the SBS environment’s SharePoint e-mail to web feature. In my client’s environment SharePoint is no longer used, so it is safe for us to delete this connector. Only delete this connector if you know you’re not using it (it is used for the SharePoint e-mail to web feature).

To list and get information on the orphaned connector, open Exchange PowerShell and run:

Get-ForeignConnector | Format-List

To delete the orphaned connector, enter the following command in Exchange PowerShell and update the connector name to match the name shown in the command above:

Remove-ForeignConnector "Windows SBS Company Web Connector SERVERNAME"

This will remove the orphaned connector and clean up these errors from occurring. You can also remove the connector using ADSIEDIT, however I prefer to use ADSIEDIT as a last resort, and find this method not only easier, but cleaner.

SMTP rejected a (P1) mail from ‘[email protected]

Initially post-migration we started observing this event on the server. Mail flow was not affected and everything was functioning properly.

Event ID: 1025

SMTP rejected a (P1) mail from ‘[email protected]’ with ‘Client Proxy EXCHSRVR’ connector and the user authenticated as ‘HealthMailboxXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX’. The Active Directory lookup for the sender address returned validation errors. Microsoft.Exchange.Data.ProviderError

Additionally, on our corporate firewall (that provides anti-spam), we would observe numerous undeliverable bouncebacks on outgoing messages to the e-mail address “[email protected]” with the subject “Inbound proxy probe”. These messages occur on exact 5 minute intervals continuously.

Using Exchange PowerShell to view the active Health Mailboxes, we see that each of these bounce-backs is being stored in a particular health mailbox. Essentially the mailbox will continue to grow, so this issue needs to be resolved to keep the mailbox size in check.

Numerous things can cause this, however in our case looking at transport logs, it is seen that a HealthMailbox is sending e-mail to another HealthMailbox but using an incorrect e-mail address. The Health Mailboxes on the Exchange server have “domain.com” e-mail addresses, while according to the transport logs, the e-mails are being sent to “domain.local”.

Something got mixed up, either with provisioning the Exchange E-Mail address policies, or the domain configured as “default domain”. Either way, Exchange is configured and running, so I wanted to correct this in a manner that would have minimal consequences or changes to the system.

To correct this issue, we need to go into ADSI Edit and modify the “ProxyAddresses” value for the HealthMailbox. Note that any type of mailbox can have numerous aliases and a single default alias. Inside of ADSIEdit, the value/format for “ProxyAddresses” is case-sensitive: uppercase SMTP configures the default e-mail address, while lowercase smtp configures alternative aliases. An example value: “SMTP:[email protected]” for default, or “smtp:[email protected]” for an alternative alias.

Identifying the account from the event log (note the XXXXXXXXXXXXXXXX in the example), we found the account in the Monitoring Mailboxes container inside of ADSIEdit. We right-clicked on the specific HealthMailbox account, went to Properties, and found the “ProxyAddresses” value. We then created a new alias by clicking Edit and adding “smtp:[email protected]” (with lowercase smtp) to the list. We did not modify or delete any existing values; all we did was create an alternative alias.

So now the Health Mailbox is receiving e-mail for both “@domain.com”, and “@domain.local”. Immediately the bounce-backs stopped, and event logs disappeared.
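
To make the uppercase/lowercase convention concrete, here is a small Python sketch that models a “ProxyAddresses” list the way it is described above. It does not talk to Active Directory at all. The function names are made up for illustration, and the mailbox addresses are hypothetical stand-ins built from the “domain.com” and “domain.local” examples.

```python
def split_proxy_addresses(values):
    """Separate the primary SMTP address from the aliases.

    Uppercase 'SMTP:' marks the primary address; lowercase 'smtp:' marks an alias.
    """
    primary = None
    aliases = []
    for value in values:
        if value.startswith("SMTP:"):
            primary = value[len("SMTP:"):]
        elif value.startswith("smtp:"):
            aliases.append(value[len("smtp:"):])
    return primary, aliases


def add_alias(values, address):
    """Return a new list with a lowercase smtp: alias added; existing values are kept."""
    entry = "smtp:" + address
    if any(v.lower() == entry.lower() for v in values):
        return list(values)  # address is already present, nothing to add
    return list(values) + [entry]


# Hypothetical values mirroring the .com / .local mismatch described above.
current = ["SMTP:HealthMailboxEXAMPLE@domain.com"]
updated = add_alias(current, "HealthMailboxEXAMPLE@domain.local")

print(split_proxy_addresses(updated))
```

The point of the sketch is that the existing primary entry stays untouched and only a lowercase alias is appended, which mirrors the manual change made in ADSIEdit.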

PLEASE NOTE: For this fix to work, you MUST confirm that the issue is due to the domain .com and .local mismatch. I’m not quite sure, but this issue may also occur after changing the default domain, or default e-mail address policies, in which case you still could use this technique to resolve the issue.

Hope this helps some of you, cheers!

Sep 30 2014
 

Recently, a new type of error I hadn’t seen before showed up on one of the servers I maintain and manage.

 

Event ID: 513

Source: CAPI2

Event:

Cryptographic Services failed while processing the OnIdentity() call in the System Writer Object.

Details:
AddLegacyDriverFiles: Unable to back up image of binary EraserUtilRebootDrv.

System Error:
The system cannot find the file specified.
.

 

After further investigation, I also noticed that when Windows Server Backup was running, sometimes snapshots on the C: volume wouldn’t “grow in time” and were being automatically deleted.

It was difficult to find anything on the internet regarding this as in my case it was reporting “The system cannot find the file specified”, whereas all other cases were due to security permissions. On the bright side, I was able to identify the software that this file belonged to: Symantec Endpoint Protection.

Ultimately I found a fix. PLEASE ONLY attempt this if you are receiving the “The system cannot find the file specified” error. If you are seeing any “Access Denied” messages under System Error, your issue is related to something else.

 

To fix:

1) Uninstall Symantec Endpoint Protection.

2) Restart the server.

3) Disable VSS snapshots for C: volume (NOTE: This will delete all existing snapshots for the drive.).

4) Re-install Symantec Endpoint Protection.

5) Re-enable VSS snapshots for C: volume.

 

When this issue occurred, I was seeing the event many times every hour. It’s been 4 days since I applied this fix and it has completely disappeared, back to a 100% clean event log!

May 31 2013
 

Back in February, I was approached by a company that had multiple offices. They wanted my company to come in and implement a system that allowed them to share information, share files, communicate, use their line of business applications, and be easily manageable.

Just an FYI, I provide Microsoft Small Business Server consulting services, including migrations! For more information, please visit https://www.stephenwagner.com/2020/02/28/microsoft-small-business-server-migration-upgrade/.

The Solution – Microsoft Small Business Server 2011

The first thing that always comes to mind is Microsoft Small Business Server 2011. However, what made this environment interesting is that they had two branch offices in addition to their headquarters all in different cities. One of their branch offices had 8+ users working out of it, and one only had a couple, with their main headquarters having 5+ users.

Usually when administrators think of SBS, they think of a single server (two servers with the premium add-on) solution that provides a small business of up to 75 users with a stable, enterprise feature packed IT infrastructure.

SBS 2011 Includes:

  • Windows Server 2008 R2 Standard
  • Exchange Server 2010
  • Microsoft SharePoint Foundation 2010
  • Microsoft SQL Server 2008 R2 Express
  • Windows Server Update Services
  • (And an additional Server 2008 R2 license with Microsoft SQL Server 2008 R2 Standard if the premium add-on is purchased)

Essentially this is all a small business typically needs, even if they have powerful line of business applications.

Additional Domain Controller on SBS

One misconception about Windows Small Business Server is the limitation of having a single domain controller. IT professionals often think that you cannot have any more domain controllers in an SBS environment. This actually isn’t true. SBS does allow multiple domain controllers, as long as there is a single forest, and not multiple domains. You can have a backup domain controller, and you can have multiple RODCs (Read Only Domain Controller), as long as the primary Active Directory roles stay with the SBS primary domain controller. You can have as many global catalogs as you’d like! As long as you pay for the proper licenses of all the additional servers 🙂

This is where this came in handy. While I’ve known about this for some time, this was the first time I was attempting to put something like this into production.

The Plan

The plan was to set up SBS 2011 Premium at the HQ along with a second server at the HQ hosting their SQL, line of business applications, and Remote Desktop Services (formerly Terminal Services) applications. Their HQ would be sitting behind an Astaro Security Gateway 220 (Sophos UTM).

The SBS 2011 Premium (2 Servers) setup at the HQ office will provide:

  • Active Directory services
  • DHCP and DNS Services
  • Printing and file services (to the HQ and all branch offices)
  • Microsoft Exchange
  • “My Documents” and “Desktop” redirection for client computers/users
  • SQL DB services for LoB’s
  • Remote Desktop Services (Terminal Services) to push applications out in to the field

The first branch office will have a Windows Server 2008 R2 server, promoted to a Read Only Domain Controller (RODC), sitting behind an Astaro Security Gateway 110. The Astaro Security Gateways would establish a site-to-site branch VPN between the two offices and route the appropriate subnets. At the first branch office, there are issues with connectivity (they’re in the middle of nowhere), so they will have two internet connections with two separate ISPs (one line of sight long range wireless backhaul, and one simple ADSL connection), for which the ASG 110 will provide load balancing and fault tolerance.

The RODC at the first branch office will provide:

  • Active Directory services for (cached) user logon and authentication
  • Printing and file services (for both HQ and branch offices)
  • DHCP and DNS services
  • “My Documents” and “Desktop” redirection for client computers/users.
  • WSUS replica server (replicates approvals and updates from WSUS on the SBS server at the main office).
  • Exchange access (via the VPN connection)

Users at the first branch office will be accessing file shares located both on their local RODC and on the HQ server in Calgary. The main wireless backhaul has more than enough bandwidth to support SMB shares over the VPN connection. After testing, it turns out the backup ADSL connection also handles this fairly well for the types of files they will be accessing.

The second branch office will have an Astaro RED device (Remote Ethernet Device). The Astaro/Sophos RED devices act as a remote ethernet port for your Astaro Security Gateways. Once configured, it’s as if the ASG at the HQ has an ethernet cable running to the branch office. It’s similar to a VPN; however (I could be wrong), I think it uses EoIP (Ethernet over IP). The second branch doesn’t require a domain controller due to the small number of users. As far as this branch office goes, this is the last we’ll talk about it, as there’s no special configuration required for these guys.

The second branch office will have the following services:

  • DHCP (via the ASG 220 in Calgary)
  • DNS (via the main HQ SBS server)
  • File and print services (via the HQ SBS server and other branch server)
  • “My Documents” and “Desktop” redirection (over the WAN via the HQ SBS server)
  • Exchange access (via the Astaro RED device)

Hardware

For all the servers, we chose HP hardware as always! The main SBS server, along with the RODC, were brand new HP ProLiant ML350p Gen8s. The second server at the HQ (running the premium add-on) is a re-purposed HP ML110 G7. I always configure iLO on all servers (especially remote servers) just so I can troubleshoot issues in the event of an emergency if the OS is down.

Implementation

I’ll explain how this was all implemented.

  1. Configure and set up a typical SBS 2011 environment. I’m going to assume you already know how to do this. You’ll need to install the OS, run through the SBS configuration wizards, enable all the proper firewall rules, configure users, install applicable server applications, etc…
  2. Configure the premium add-on. Install the Remote Desktop Services role (please note that you’ll need to purchase RDS CALs as they aren’t included with SBS). You can skip this step if you don’t plan on using RDS or the premium server at the main site.
  3. Configure all the Astaro devices. Configure a Router to Router VPN connection. Create the applicable firewall rules to allow traffic. You probably know this, but make sure both networks have their own subnet and are routing the separate subnets properly.
  4. Install Windows Server 2008 R2 on to the target RODC box. Please note, in my case, I had to purchase an additional Server 2008 license since I was already using the premium add-on at the HQ site. (If you purchase the premium add-on, but aren’t using it at your main office, you can use this license at the remote site.)
  5. Make sure the VPN is working and the servers can communicate with each other.
  6. Promote the target server to a read-only domain controller. You can launch the famous dcpromo. Make sure you check the “Read-only domain controller” option when you promote the server.
  7. You now have a working environment.
  8. Join computers using the SBS connect wizard. (DO NOT LOG ON AS THE REMOTE USERS UNTIL YOU READ THIS ENTIRE DOCUMENT)

I did all the above steps at my office and configured the servers before deploying them at the client site.
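Step 3 above asks that each network has its own subnet. A quick way to sanity-check a subnet plan before configuring the VPN is sketched below. This is only an illustration: the site names and address ranges are hypothetical placeholders, not the values used in this deployment.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(subnets):
    """Return pairs of site names whose subnets overlap.

    `subnets` maps a site name to a CIDR string. All values used in the
    example below are made-up placeholders.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    # Hypothetical plan: Branch2 accidentally sits inside the HQ range.
    plan = {
        "HQ": "192.168.1.0/24",
        "Branch1": "192.168.2.0/24",
        "Branch2": "192.168.1.128/25",
    }
    for a, b in overlapping_subnets(plan):
        print(f"Subnet conflict between {a} and {b}")
```

An empty result means no two sites share address space, which is what the router-to-router VPN needs in order to route between them.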

You essentially have a working basic network. Now to get to the tricky stuff! The tricky part is enabling folder redirection at the branch site to their own server (instead of the SBS server), and getting them their own WSUS replica server.

Now to the fancy stuff!

1. Installing WSUS on the RODC using the add role feature in Windows Server: You have to remember that RODCs are exactly what they say! READ ONLY (as far as Active Directory goes)! Installing WSUS on an RODC will fail right off the bat. It will report that access is denied when trying to create certain security groups. You’ll have to manually create these two groups in Active Directory on your primary SBS server to get it to work:

  • SQLServer2005MSFTEUser$RODCSERVERNAME$Microsoft##SSEE
  • SQLServer2005MSSQLUser$RODCSERVERNAME$Microsoft##SSEE

Replace RODCSERVERNAME with the computer name of your RODC server. You’ll actually notice that two similar groups already exist (with a different server name) for the existing Windows SBS WSUS install; these existing groups are for the main WSUS server. Once you’ve created the new groups, the WSUS install will go through. After this is complete, follow through the WSUS configuration wizard to configure it as a replica of your primary SBS WSUS server.
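If you have more than one RODC to prepare, the group names can be generated rather than typed by hand. The helper below is a small sketch that only fills in the naming pattern listed above; the function name and the example server name are hypothetical, and it does not create anything in Active Directory.

```python
def wsus_group_names(rodc_name):
    """Build the two group names WSUS expects for the given RODC.

    Only substitutes the computer name into the pattern shown above;
    creating the groups in Active Directory is still a manual step.
    """
    name = rodc_name.strip()
    if not name:
        raise ValueError("RODC computer name must not be empty")
    return [
        f"SQLServer2005MSFTEUser${name}$Microsoft##SSEE",
        f"SQLServer2005MSSQLUser${name}$Microsoft##SSEE",
    ]


if __name__ == "__main__":
    # "BRANCH01-DC" is a placeholder computer name.
    for group in wsus_group_names("BRANCH01-DC"):
        print(group)
```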

2. One BIG thing to keep in mind is that with RODCs you need to configure which accounts (both user and computer) are allowed to be “cached”. Cached credentials allow the RODC to authenticate computers and users in the event the primary domain controller is down. If you do not configure this, and the internet goes down or the primary domain controller isn’t available, no one will be able to log in to their computers or access network resources at the branch site. When you promoted the server to an RODC, two groups were created in Active Directory: Allow RODC Cached Logins, and Deny RODC Cached Logins (I could be wrong on the exact names since I’m going off memory; on a default install they show up as “Allowed RODC Password Replication Group” and “Denied RODC Password Replication Group”). You can’t just select and add users to these groups; you also need to add the computers they use, since computers have their own “computer account” in Active Directory.

To overcome this, create two new security groups. One group will be for users of the branch office, the other group will be for computers of the branch office. Make sure to add the applicable users and computers as members of their respective security groups. Now go to the “Allow RODC Cached Logins” group created by the DC promotion, and add those two new security groups to that group. This will allow remote users and remote computers to authenticate using cached security credentials. PLEASE NOTE: DO NOT CACHE YOUR ADMINISTRATIVE ACCOUNT!!! Instead, create a separate administrative account for that remote office and cache that.
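Since a forgotten computer account is an easy mistake here, it can help to compare the branch roster against what is actually in the cache groups. The sketch below works on plain lists of account names that you would export yourself; it does not query Active Directory, and every account name in it is a made-up example.

```python
def cache_gaps(branch_users, branch_computers, cached_accounts,
               protected=("Administrator",)):
    """Compare branch accounts against the accounts allowed to be cached.

    Returns three sorted lists: users missing from the cache list,
    computers missing from the cache list, and protected accounts that
    are cached but should not be. Name matching is case-insensitive.
    """
    cached = {account.lower() for account in cached_accounts}
    missing_users = sorted(u for u in branch_users if u.lower() not in cached)
    missing_computers = sorted(c for c in branch_computers if c.lower() not in cached)
    cached_protected = sorted(p for p in protected if p.lower() in cached)
    return missing_users, missing_computers, cached_protected


if __name__ == "__main__":
    # Hypothetical exports; computer account names end with "$".
    users = ["jsmith", "adoe"]
    computers = ["BR01-PC01$", "BR01-PC02$"]
    cached = ["jsmith", "BR01-PC01$", "Administrator"]
    print(cache_gaps(users, computers, cached))
```

Anything in the first two lists will not be able to log in at the branch if the link to HQ goes down, and anything in the third list should be removed from the cache groups.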

3. One of the sweet things about SBS is all the pre-configured Group Policy objects that enable the automatic configuration of the WSUS server, folder redirection, and a bunch of other great stuff. You have to keep in mind that with the above config, if left alone up to this point, the computers in the branch office will use the folder redirection settings and WSUS settings from the main office. Remote users’ redirected folders (whatever you have selected, in my case My Documents and Desktop) will be stored on the main HQ server. If you’re alright with this and not concerned about the size of the user folders, you can leave this. What I needed to do (for simple disaster recovery purposes) is have the redirected folders for the branch office users stored on their own local branch server. Also, we need to have the computers connect to the local branch WSUS server (we don’t want each computer pulling updates over the VPN connection, as this will use up tons of bandwidth). What’s really neat is that when users open applications via RemoteApp (over RDS), if they export files to their desktop inside of RemoteApp, the files are immediately available on their computer desktop, since the RDS server is using these GPOs.

To do this, we’ll need to duplicate and modify a couple of the default GPOs, and also create some OU (Organizational Unit) containers inside of Active Directory so we can apply the new GPOs to them.

First, under “SBSComputers” create an OU called “Branch01Comps” (or call it whatever you want). Then under “SBSUsers” create an OU called “Branch01Users”. Now keep in mind you want to have this fully configured before any users log on for the first time. All of this configuration should be done AFTER the computer is joined (using the SBS connect) to the domain and AFTER the users are configured, but BEFORE the user logs in for the first time. Move the branch office computer accounts to the new Branch office computers OU, and move the Branch office user accounts to the Branch office users OU.

Now open up the Group Policy Management Console. You want to duplicate 2 GPOs: Update Services Common Settings Policy (rename the duplicate to “Branch Update Services Common Settings Policy” or something), and Small Business Server Folder Redirection Policy (rename the duplicate to “Branch Folder Redirection” or something).

Link the new duplicated Update Services policy to the branch computers OU we just created, and link the new duplicated folder redirection policy to the branch users OU we just created.

Modify the duplicated server update policy to reflect the address of the new branch WSUS replica server. Computers at the branch office will now pull updates from that server.

As for folder redirection, it’s a bit tricky. You’ll need to create a share (with full share access for all users), and then set special file permissions on the folder that you shared (info available at http://technet.microsoft.com/en-us/library/cc736916%28v=ws.10%29.aspx). On top of that, you’ll need a way to actually create each user’s child folder under that share. I did this by going in to Active Directory, opening each remote user, and setting their profile variable to the file share. When I hit apply, this would create a folder with their username and the applicable permissions under that share. After this was done, I would undo that variable setting and the directory created would stay. Repeat this for each remote user at that specific branch office. You’ll also need to do this each time you add a new user if they bring on more staff, and you’ll need to add all new computers and new users to the appropriate OUs and security groups we’ve created above.

FINALLY you can now go in to the GPO you duplicated for Branch Folder redirection. Modify the GPO to reflect the new storage path for the redirection objects you want (just a matter of changing the server name).
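To make the “just change the server name” part concrete, here is a tiny sketch of what that edit does to a UNC path. The server and share names are hypothetical placeholders; use whatever path your duplicated GPO actually shows.

```python
def retarget_unc_path(path, new_server):
    """Swap the server portion of a UNC path, leaving the share and folders alone."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    # Drop the leading "\\" and split off the old server name.
    _old_server, sep, remainder = path[2:].partition("\\")
    return "\\\\" + new_server + sep + remainder


if __name__ == "__main__":
    # Placeholder names: HQ-SBS is the main server, BRANCH01-DC the branch server.
    print(retarget_unc_path(r"\\HQ-SBS\RedirectedFolders", "BRANCH01-DC"))
```

Only the server changes; the share name and everything below it stay the same, which is why the branch share should be created with the same name and permissions as the one on the SBS server.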

4. Configure Active Directory Sites and Services. You’ll need to go in to Active Directory Sites and Services and configure sites for each subnet you have (your main HQ subnet, branch 1 subnet, and branch 2 subnet), and assign the applicable domain controller to those sites. In my case, I created 3 sites, and configured the HQ subnet and second branch to authenticate off the main SBS PDC, and configured the first branch (with their own RODC) to authenticate off their own RODC. Essentially, this tells the computers which domain controller they should be authenticating against.
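The idea behind the site configuration is just a lookup from a client’s subnet to the domain controller it should use. The sketch below models that mapping for the three sites described above; it is an illustration of the concept only, the subnets and server names are hypothetical, and it is not how Active Directory implements DC location internally.

```python
import ipaddress

# Hypothetical site layout: HQ and Branch2 use the main SBS server,
# Branch1 uses its own RODC.
SITES = {
    "HQ": {"subnet": "192.168.1.0/24", "dc": "HQ-SBS"},
    "Branch1": {"subnet": "192.168.2.0/24", "dc": "BRANCH01-RODC"},
    "Branch2": {"subnet": "192.168.3.0/24", "dc": "HQ-SBS"},
}


def dc_for_client(ip, sites=SITES):
    """Return (site, domain controller) for a client IP, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for site, info in sites.items():
        if addr in ipaddress.ip_network(info["subnet"]):
            return site, info["dc"]
    return None


if __name__ == "__main__":
    for client in ("192.168.2.50", "192.168.3.10", "10.0.0.1"):
        print(client, "->", dc_for_client(client))
```

A client that matches no subnet (the `None` case) is the situation to avoid: if a subnet is missing from Sites and Services, those computers may end up authenticating against a domain controller across the VPN.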

And you’re done!

A few things to remember: whenever adding new users and/or computers to the branch, ALWAYS join using the SBS wizard, add the computer to the branch computers OU, add the user to the branch users OU, create the user’s master redirection folder using the profile var in the AD user object, and separately add both the user and computer accounts as members of the security groups we created to cache credentials.

And remember, always always always test your configuration before throwing it out in to production. In my case, I got it running first try without any problems, but I let it run as a test environment for over a month before deploying to production!

We’ve had this environment running for months now and it’s working great. What’s even cooler is how well the Astaro Security Gateway (Sophos UTM) is handling the multiple WAN connections during failures. It’s super slick!

May 112012
 

For the longest time I’ve been dealing with a server that hasn’t been playing nice. Regularly the server will freeze when either creating VSS snapshots, or deleting them!

These usually happen at 6:00AM or 12:00PM (when I have them scheduled) and can sometimes lock the server up for close to 30 minutes. I’ve spent HOURS investigating this and found absolutely no errors, no nothing, except that some services might fail due to the freeze if I’m actually logged in to the server.

Typically, this behavior only starts happening 1-2 weeks after a fresh reboot. Rebooting the server stops this issue for 1-2 weeks. And keep in mind, as I said absolutely no errors in the event log that point to what is causing this.

The Server runs fully updated/patched Windows Server 2008, has 16GB of RAM, 2 X 6-core processors and SAS disks, so it’s nothing related to performance.

Finally, after months, I have found out what the culprit is in my case: it turns out that Symantec Endpoint Protection Manager (not the anti-virus, but the management software) was actually causing or aggravating this issue. When logging in, I noticed that Symantec Endpoint Protection Manager was somewhat sluggish and not functioning properly. I restarted the services, and BAM, out of nowhere the VSS process decided to delete the oldest snapshot for C:. When this happened, the server froze. I repeated this 4 times to confirm, all in the same morning. I’m not sure why it was triggering snapshot removal, but it was odd.

I proceeded to upgrade Symantec Endpoint Protection Manager on that server later that week. During the upgrade (I upgraded to a newer 11.x release, then later to 12.x), I noticed that every time the services were restarted automatically as part of the database upgrade process, the VSS issue would occur and the server would become unresponsive.

We are now running at 12.x on that system, and have not had any reported freeze-ups. It’s been over a week and a half, and it looks like the issue is resolved.