Azure Backup Vault retry failed backup jobs

If you ever need to kick off a retry of multiple failed backups in Azure Recovery Services (VM Backup Vault), there is currently no way of initiating this through the Azure Portal other than via the individual VM or Recovery Services “blades”.  That’s a lot of clicks and room for error if you have a lot of VMs with failed backups.

Fortunately, it is possible to do this operation through PowerShell!

Note: This assumes you have the PowerShell Azure Resource Manager prerequisites set up.

The following script will detect failed backups across all vaults in a subscription and prompt before initiating a backup.
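The original script is not reproduced here, so below is a minimal sketch of the approach, assuming the AzureRM Recovery Services cmdlets (Get-AzureRmRecoveryServicesVault, Get-AzureRmRecoveryServicesBackupJob, Backup-AzureRmRecoveryServicesItem); verify cmdlet names and parameters against your module version:

```powershell
# Sketch: find failed VM backup jobs in every Recovery Services vault
# and prompt before retrying each one. Assumes Login-AzureRmAccount
# has already been run and the subscription context is set.
foreach ($vault in Get-AzureRmRecoveryServicesVault) {
    Set-AzureRmRecoveryServicesVaultContext -Vault $vault
    $failedJobs = Get-AzureRmRecoveryServicesBackupJob -Status Failed
    foreach ($job in $failedJobs) {
        # Look up the backup item matching the failed job's workload name
        $container = Get-AzureRmRecoveryServicesBackupContainer -ContainerType AzureVM -FriendlyName $job.WorkloadName
        $item = Get-AzureRmRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM
        $answer = Read-Host "Retry backup for $($job.WorkloadName) in vault $($vault.Name)? (y/n)"
        if ($answer -eq "y") {
            Backup-AzureRmRecoveryServicesItem -Item $item
        }
    }
}
```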


If you need to do this without manual confirmation, use this version.
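As a hedged sketch (same AzureRM cmdlet assumptions: Get-AzureRmRecoveryServicesVault, Get-AzureRmRecoveryServicesBackupJob, Backup-AzureRmRecoveryServicesItem; check against your module version), the unattended variant simply drops the confirmation prompt:

```powershell
# Sketch: retry every failed VM backup job in every vault without prompting.
foreach ($vault in Get-AzureRmRecoveryServicesVault) {
    Set-AzureRmRecoveryServicesVaultContext -Vault $vault
    foreach ($job in (Get-AzureRmRecoveryServicesBackupJob -Status Failed)) {
        $container = Get-AzureRmRecoveryServicesBackupContainer -ContainerType AzureVM -FriendlyName $job.WorkloadName
        $item = Get-AzureRmRecoveryServicesBackupItem -Container $container -WorkloadType AzureVM
        Backup-AzureRmRecoveryServicesItem -Item $item
    }
}
```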


EC2 based Software VPN AWS Auto-Scaling Group

In some circumstances there may be a requirement to set up an IPSEC Site-to-Site VPN tunnel into an AWS VPC using something other than the AWS VPN Service.  This usually means configuring an EC2 Instance based VPN endpoint and is what AWS refer to as a “Software VPN”.

Unlike the AWS VPN Service (which is managed as a service by AWS) this EC2 Instance is managed by the customer.  As these Site-to-Site VPN connections tend to be mission critical it is a good idea to build a setup with some self-healing, monitoring and configuration backups.

This example uses EC2 Instances running CentOS 7 with LibreSWAN (an open source project based on Openswan).  The focus is only on one side of the VPN; it is assumed that the “remote” side is a generic IPSEC-capable device (Cisco ASA etc.).

This example has the Software VPN running in an EC2 Instance configured in Auto-Scaling Groups, using secured s3 Buckets to hold the configuration files.  These buckets will be accessed by the EC2 Instances using IAM roles.  The config files in the buckets will be protected using Versioning and Lifecycle Rules.

Create the base AMI Template.

Spin up CentOS 7 EC2 instance from the AWS Marketplace and ssh in.


Remove / disable any firewalls (iptables / firewalld).  By default the firewall should be completely open, but in the event it isn’t, do the following:

To disable firewalld, run the following command as root:

To stop firewalld, run the following command as root:

And finally, to check the status of firewalld, run the following command as root:
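The three firewalld commands referenced above are the standard systemd ones:

```shell
# Disable firewalld so it does not start at boot
systemctl disable firewalld

# Stop the currently running firewalld service
systemctl stop firewalld

# Check the status of firewalld
systemctl status firewalld
```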

Disable SELinux

In /etc/selinux/config, change SELINUX=enforcing to SELINUX=disabled.

Update the system:
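As a sketch, the SELinux change and the system update can be done like this (the sed edit is just one way to make the config change):

```shell
# Permanently disable SELinux by editing /etc/selinux/config
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Turn off enforcement for the running session (fully applies after reboot)
setenforce 0

# Update the system
yum update -y
```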

Shutdown the Instance and create an AMI from it.  This AMI will be used as the reference for the Auto-Scaling Group and can also be shared across regions or even different accounts (depending on the use case).



Once the AMI is created, Terminate the EC2 Instance.

Create s3 Bucket(s)

Create the s3 Bucket (or Buckets depending on the use case) to house the config files that the Auto-Scaling Group EC2 Instances will use to configure the VPN.  We will create the Bucket and then place a policy on it to lock down access.

Click Create Bucket and give it a name.


To protect against accidental deletes from this Bucket, enable Versioning.
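If you prefer the CLI to the console, the bucket creation and versioning steps can be sketched with the AWS CLI (bucket name and region below are placeholders):

```shell
# Create the bucket (bucket names are globally unique -- this one is a placeholder)
aws s3api create-bucket --bucket my-vpn-config-bucket --region eu-west-1 \
    --create-bucket-configuration LocationConstraint=eu-west-1

# Enable Versioning to protect against accidental deletes
aws s3api put-bucket-versioning --bucket my-vpn-config-bucket \
    --versioning-configuration Status=Enabled
```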


If there are concerns about cost over the long term then enable Lifecycle rules (although for the size of files and frequency of change this is probably not required).  Do this by clicking Lifecycle > Add rule.


Further info on s3 Lifecycle rules can be found in the AWS documentation.

Configure an access policy for the Bucket(s).  Below is an example for access from a single account and access from multiple accounts.  Make sure to use the AWS Account number in the appropriate field.

Example for single account access:

Example for multiple account access:
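The policy documents themselves are not reproduced above, so here is a hedged single-account example (the account number, bucket name and Sid are placeholders).  For multiple account access, the "AWS" element of the Principal takes a list of account ARNs instead of a single string:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowVPNInstanceRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-vpn-config-bucket",
        "arn:aws:s3:::my-vpn-config-bucket/*"
      ]
    }
  ]
}
```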

Click Permissions > Add bucket policy.  


Copy and paste the appropriate policy into the Bucket Policy Editor and click save.


IAM Role and Policies

Create an IAM role so the Instance in the Auto-Scaling Group can perform some actions automatically without having to store AWS keys on the AMI.  The Instance will need to be able to:

  • Automatically grab an Elastic IP
  • Disable the source/destination check on the Instance (to allow traffic to traverse across it)
  • Alter the route table (so Instances inside the VPC know where to route traffic to the other CIDR block).

IAM Policies

In IAM, create 2 new policies.  The first policy will allow changes to routing tables and also allow access to claim Elastic IP Addresses.

Click IAM > Policies > Create Policies


Click Select on Create Your Own Policy


Create the 2 new policies with the details below.  Take care to change the s3 Bucket references to the Bucket created in the s3 Bucket step.


Policy Name : alternetworking
Description: Allow access to alter routing table and claim EIPs
Policy Document:
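The original policy document is not shown above; a sketch granting the EC2 networking actions described (route table changes, source/dest check, EIP claims) might look like the following.  Tighten the Resource element for production use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAddresses",
        "ec2:AssociateAddress",
        "ec2:ModifyInstanceAttribute",
        "ec2:DescribeRouteTables",
        "ec2:CreateRoute",
        "ec2:ReplaceRoute"
      ],
      "Resource": "*"
    }
  ]
}
```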

Policy Name : accesstovpnbucket
Description : Allow read access to s3 Bucket
Policy Document :
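Again the original document is not shown; a hedged read-only sketch (bucket name is a placeholder and must match the Bucket created earlier) would be:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-vpn-config-bucket",
        "arn:aws:s3:::my-vpn-config-bucket/*"
      ]
    }
  ]
}
```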

IAM Role

Create an IAM Role and associate the 2 new policies with that role.  Click IAM > Role > Create Role.  Give the role a name.


Select AWS Service Roles > Amazon EC2.


Attach the 2 policies we created in the previous step.


The final role should look something like this.


Elastic IP assignment

For the VPN to run in the VPC you will need a static IP which means creating an Elastic IP.  Click on VPC > Elastic IPs and Allocate New Address.  Make a note of the Allocation ID.


LibreSWAN Config files

LibreSWAN requires 2 config files to attempt a connection to a VPN peer: the tunnel .conf and .secrets files, which reside in the /etc/ipsec.d folder.  In this example the tunnel is called AWSVPNTEST1.  Next, the various config files will be uploaded to the s3 Bucket created earlier (this includes a sysctl.conf, which is also required to make the system config changes that enable LibreSWAN to run).

Create the .conf file:

  • conn : The tunnel name (i.e. AWSVPNTEST1)
  • leftid : This is the local externally visible IP (i.e. the Elastic IP Address)
  • leftsubnet : Local subnet (i.e. VPC CIDR Block)
  • right : Remote Peer IP Address
  • rightsubnet : Remote subnet CIDR Block
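The files themselves are not reproduced above, so here is a hedged sketch of what the AWSVPNTEST1 pair might look like.  All IPs, subnets and the pre-shared key are placeholders, and the settings must match the remote device; check the ipsec.conf man page for your LibreSWAN version:

```
# /etc/ipsec.d/AWSVPNTEST1.conf (all addresses are placeholders)
conn AWSVPNTEST1
    type=tunnel
    authby=secret
    auto=start
    left=%defaultroute
    # leftid: the Elastic IP Address
    leftid=52.0.0.10
    # leftsubnet: the VPC CIDR Block
    leftsubnet=10.0.0.0/16
    # right: the Remote Peer IP Address
    right=203.0.113.5
    # rightsubnet: the Remote subnet CIDR Block
    rightsubnet=192.168.0.0/24

# /etc/ipsec.d/AWSVPNTEST1.secrets (PSK is a placeholder)
52.0.0.10 203.0.113.5 : PSK "replace-with-a-strong-psk"
```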

Upload the various config files used by cloud init to the s3 Bucket.

NOTE: I have experienced issues where uploading from a Windows machine may corrupt your config files and cause LibreSWAN to fail to start without throwing any error messages.  If you do run into problems then try to edit and upload the config files from a Linux or Mac machine.




Security Groups

Create the security groups needed for the EC2 Instances that will be running the VPN.

To allow all IPSEC traffic from the Remote Peer, open UDP port 500 (IKE), UDP port 4500 (NAT-T) and IP protocol 50 (ESP):


Allow all traffic from VPC.  This is to allow traffic from the VPC to hit the VPN Tunnel.
Allow SSH from your IP for admin access.



Click EC2 > Launch Configurations > Create launch configuration and select the AMI created from My AMIs:


Choose the Instance type (this really depends on how much traffic is expected to traverse the tunnel).  This example uses a t2.micro, but in Production a larger Instance will be required (e.g. an m4.large).  If high traffic is expected then an Enhanced Networking Instance type may be more appropriate.  More details here:

Give the Launch Configuration a name (a naming convention that allows for version numbers will help to keep track of which one is the latest).

Select your IAM Role from the drop down.


Input the initial EC2 Instance User Data. This is a set of instructions that the Instance runs at boot time, and in this example it runs all the configuration for the tunnel on the fly from the s3 hosted config files.  More information on AWS User Data.

The bits that need to change for your config are the hostname, the Elastic IP Allocation ID, the route table id, the region, the s3 Buckets and the config file names.
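The original User Data is not reproduced above, so here is a hedged sketch of what it might contain.  Every ID, name and CIDR below is a placeholder, and it assumes the AWS CLI is installed in the AMI and that the Instance's IAM Role grants the actions used:

```shell
#!/bin/bash
# Placeholders -- change these for your environment
REGION="eu-west-1"
EIP_ALLOC="eipalloc-xxxxxxxx"
ROUTE_TABLE="rtb-xxxxxxxx"
BUCKET="my-vpn-config-bucket"
REMOTE_CIDR="192.168.0.0/24"

# Discover this instance's ID from the metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

hostnamectl set-hostname vpn01

# Grab the Elastic IP
aws ec2 associate-address --region $REGION --instance-id $INSTANCE_ID --allocation-id $EIP_ALLOC

# Disable the source/destination check so traffic can traverse the instance
aws ec2 modify-instance-attribute --region $REGION --instance-id $INSTANCE_ID --no-source-dest-check

# Point the route for the remote CIDR block at this instance
aws ec2 replace-route --region $REGION --route-table-id $ROUTE_TABLE \
    --destination-cidr-block $REMOTE_CIDR --instance-id $INSTANCE_ID

# Pull the config files from s3 and start LibreSWAN
aws s3 cp s3://$BUCKET/sysctl.conf /etc/sysctl.conf
aws s3 cp s3://$BUCKET/AWSVPNTEST1.conf /etc/ipsec.d/
aws s3 cp s3://$BUCKET/AWSVPNTEST1.secrets /etc/ipsec.d/
sysctl -p
systemctl enable ipsec
systemctl start ipsec
```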

Tick Assign a public IP address to every instance (this allows the EC2 Instance to be able to communicate with the outside world prior to grabbing the Elastic IP).


Next on Add Storage tick the Delete on Termination box so when the Instance is shutdown and terminated it doesn’t leave volumes behind on your account (they cost $).


On Configure Security Groups, Select the VPC from the dropdown and assign the Security Groups created earlier.  Click through to review and finish the creation of the Launch Configuration.  Select the correct ssh keys at the final step.

Autoscaling Group

Click EC2 > Auto Scaling Groups > Create Auto Scaling Group > Create an Auto Scaling group from an existing launch configuration > select the newly created configuration.


Give the group a name, choose the VPC and select the public subnets available in the VPC (in this example there are 2 public subnets configured in different Availability Zones for High Availability).


In Configure scaling policies just select Keep this group at its initial size, as we are only using the ASG for HA (i.e. only a single Instance is required at any one time).


Skip over Add notifications as this will be covered in the next section about monitoring.

In Configure Tags set the Key Name to be something useful so that all Instances started up have the same tag.


Click Review and then Create Auto Scaling Group.  The Instance should now start up.  Check the status from Activity History in the ASG.


Once the Instance is booted it should run all the commands in User Data and attempt to establish the connection with the other side (as long as everything on both sides is configured correctly).


There are a number of elements that can go wrong with this type of setup so it is important to verify configuration settings.

Firewall / Security Groups

The first thing that is usually checked is a ping across the tunnel from devices / VMs on each subnet.  Ensure that on the AWS side the EC2 Instance has a Security Group that allows traffic inbound from the remote subnet.  For example, a very open Security Group may allow ALL traffic from the remote subnet’s CIDR block.  The same needs to happen on the remote end.

Obviously, in the real world you should lock down traffic to what is required rather than opening everything.

EC2 System Log

A common issue is with the commands running on boot via User Data.  Verify Instance startup from the System Log by clicking EC2 > Running Instances, then right-click the EC2 Instance > Instance Settings > Get System Log and check for any obvious errors.  In the example below the EIP allocation failed!  This was probably caused by a typo, or maybe someone deleted the EIP…


Source / Destination Check

If an EC2 Instance needs to route traffic across itself then it needs to have the Source / Destination Check flag disabled.  NOTE: This should be done on the fly at boot by the Instance itself (via the AWS User Data settings).

Click EC2 > Running Instances.  Right click on the EC2 Instance > Networking > Change Source/Dest. Check:

It should look like this:

Route Table check

To verify that the new VPN instance has altered the route table correctly, click VPC > Route Tables > click the appropriate route table and click the Routes tab.  The example below shows the Route Table used by the VPC has a route for the remote CIDR block via the EC2 Instance running LibreSWAN.  NOTE: This should be done on the fly at boot by the Instance itself (via the AWS User Data settings).

LibreSWAN Tunnel Status

To check what the tunnel is doing on the EC2 Instance side ssh in and check the following:
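The commands behind the screenshots below are along these lines (the exact log location is an assumption based on LibreSWAN logging via syslog on CentOS 7):

```shell
# Main log file (LibreSWAN logs via syslog; on CentOS 7 check /var/log/secure)
tail -f /var/log/secure

# Summary status of the tunnels configured
ipsec status

# More detail on the configured tunnels
ipsec auto --status

# Sanity-check the system configuration for IPsec
ipsec verify
```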

First, check the main log file (by default LibreSWAN logs via syslog, so on CentOS 7 look in /var/log/secure).

Next, check the status of the tunnels configured.  In the example above, everything is working.

This gives more detail on the configured tunnels.

This error often indicates a corrupted config file.


More to come on Monitoring the status of your EC2 Instance and tunnel itself ……..

AWS Simple Email Service (SES) Domain Verification in BIND zone file

Recently I had to verify a domain for use with AWS SES.  Creating the initial Domain Verification request is simple in the AWS Console and if your domain is in Route53 then Domain Verification is automatic.  However, the domain I needed to verify was running on DNS servers we manage ourselves (I work for a Service Provider) and these run BIND.  So here is a quick copy / paste format for the entry in your zone file once you create the Domain Verification.

Create the AWS SES Domain Verification request:-

Grab the TXT verification details:-

Now go to your DNS server, edit the zone file and insert the new record in the following format (note the quote marks around the TXT record value):-
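The zone file entry follows the standard _amazonses TXT format; the domain and verification token below are placeholders, so substitute the values from your own verification request:

```
_amazonses.example.com.    IN    TXT    "pmBGN7MjnfhTKUZ06EnqqPeGUaOkw8lGhcfwefcHU="
```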

To check your domain you can either run nslookup or dig (my personal preference is for dig). I also like to check DNS propagation and so bounce the query off an external DNS server (usually Google public DNS servers).

Dig method:-
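A dig query along these lines (domain is a placeholder, bounced off Google public DNS) does the job:

```shell
# Query the TXT record directly against Google's public DNS
dig +short TXT _amazonses.example.com @8.8.8.8
```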

NSlookup method
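The equivalent nslookup query (again with a placeholder domain, via Google public DNS):

```shell
# Same TXT lookup with nslookup
nslookup -type=TXT _amazonses.example.com 8.8.8.8
```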

Google Dig Tool method:-

After some time, when DNS has propagated, a background AWS automated process will check your domain and verify it.  At that point it should look like this:-


vSphere 6 Setting up vFlash Read Cache

This example is being performed on a Dell R630 with the Perc RAID controller running ESXi 6.  I didn’t do anything special with the Flash disk in the RAID controller, I simply created a RAID 0 volume group with the single drive.

A lot of the inspiration for this fix is from this Yellow Bricks post.

Once the Host has ESXi installed, network configured, booted up, added to vCenter (and all the extra gubbins done), you will first need to mark the disk as a Flash disk, as ESXi won’t pick this up automatically.

Manage > Storage > Storage Devices > highlight the disk and click the F button to mark it as Flash.

Now right-click the ESXi host > Storage > Add Virtual Flash Resource Capacity.

At this point I ran into a problem.  As far as I can see you should simply be able to add the Flash drive as Virtual Flash Resource Capacity; however, by default it was greyed out and wasn’t selectable……

…… and it only becomes selectable if you click the “Enable remote flash selection” tickbox.

When you click OK, vCenter tells you it has completed; however, there are no Flash Datastores available for use, and I was getting the following error under the ESXi Host > Monitor > Events:-

Configuration on disk /vmfs/devices/disks/naa.614187704de041001e622fd406f5f19e failed. Reason : A specified parameter was not correct: naa.614187704de041001e622fd406f5f19e

The fix for this is to tag the disk from esxcli, so next SSH into your Host and run:
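The command in question is along these lines (device ID taken from the error above):

```shell
# List the device details -- note the "Is SAS", "Is SSD" and "Is Local" fields
esxcli storage core device list -d naa.614187704de041001e622fd406f5f19e
```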

You will see that ESXi has detected this disk as a remote SAS SSD.

If you try and add the disk as local (as per the comments in the Yellow Bricks post) you will get the following error:-

Error adding SATP user rule: Duplicate user rule found for SATP VMW_SATP_LOCAL matching device naa.614187704de041001e622fd406f5f19e PSP and PSP Options

So to resolve this you will need to remove the rules, re-add them correctly, run a reclaim and then (back in the Web Client) re-tag the disk as flash.
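As a sketch of that esxcli sequence (following the VMW_SATP_LOCAL rule approach from the Yellow Bricks post; verify the options against your ESXi build):

```shell
DISK=naa.614187704de041001e622fd406f5f19e

# Remove the conflicting SATP rule
esxcli storage nmp satp rule remove -s VMW_SATP_LOCAL -d $DISK

# Re-add it with the correct options (local + SSD)
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d $DISK -o "enable_local,enable_ssd"

# Run a reclaim on the device so the new rule takes effect
esxcli storage core claiming reclaim -d $DISK
```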

Once this is done you should be able to add flash correctly.


vCenter 6 Appliance (vCSA) ESXi Net Dump Collector configuration

Login to your vCenter as administrator@vsphere.local (or use an account which has been put into the “Administrators” group from Home > Administration > Single Sign-On > Users and Groups > Groups).

Now go to Home > Administration > Deployment > System Configuration > Services > VMWare vSphere ESXi Dump Collector

Now right-click the service and tick Automatic so the service comes up at boot time.

Now right-click the service and hit Start.

Now, to set your hosts to use the Dump Collector on the vCSA, it’s probably easier to do it via PowerCLI.  The post below has a simple and quick way to do this:-

vSphere Dump / Syslog Collector: PowerCLI Script

So in PowerShell, you can grab the setting for all hosts in a datacenter like this:-
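The snippet is not reproduced above; via Get-EsxCli it can be sketched like this (the datacenter name is a placeholder, and the esxcli object paths may vary by PowerCLI version):

```powershell
# Grab the current coredump network settings for every host in a datacenter
foreach ($vmhost in Get-Datacenter "MyDC" | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost
    $esxcli.system.coredump.network.get()
}
```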

And to set the Dump Collector, you can do something like this:-
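Again as a hedged sketch (the vCSA IP, vmkernel interface and datacenter name are placeholders; the argument order of the v1 Get-EsxCli coredump set method should be verified for your PowerCLI version):

```powershell
# Point each host's Dump Collector at the vCSA and then enable it
foreach ($vmhost in Get-Datacenter "MyDC" | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $vmhost
    # Set the vmkernel interface, vCSA IP and netdump port (default 6500)
    $esxcli.system.coredump.network.set($null, "vmk0", "192.168.1.10", 6500)
    # Enable the network coredump
    $esxcli.system.coredump.network.set($true)
}
```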

One thing I did notice was that if you go to vCenter > Manage > Settings and click on ESXi Dump Collector, I got an “ESXi Dump Collector Service is not running. Enable ESXi Dump Collector and refresh” error.


I double checked that the service is running in the GUI and via the appliance shell:-
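From the appliance shell that check looks something like this (service name as it appears in the vCSA 6.0 service list; verify with service-control --list):

```shell
# Check the Dump Collector service status from the vCSA appliance shell
service-control --status vmware-netdumper
```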

Even a reboot didn’t fix it!

If / when I find a fix for this I will update this post …..



vRealize Automation Windows Blueprint Administrator password set (via gugent)

Here is a very simple way (i.e. a non-Orchestrator-workflow route) of allowing your users to define their own passwords for a Windows VM when they request it via the catalogue.  This method uses the vRA Guest Agent.

An excellent walk through on vCAC 6.2 of how to do this for Linux VMs (and the main inspiration for this post) can be found here:-

A good post on how to prepare your Linux and Windows Templates for the vRA Guest Agent (gugent) operations can be found here:-

vRA7 – Gugent on Linux

vRA7 – Gugent on Windows

So, assuming you have all the pre-reqs in place, you then need to create a small batch file on your Windows template in, say, c:\scripts.

My .bat file is c:\scripts\vCACSetAdministratorPassword.bat…

…. and the contents of the script (where %1 is the argument you pass from the Blueprint Request form):-
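The script contents are not reproduced above; a minimal sketch using the built-in net user command (with %1 being the password passed in from the request form) would be:

```bat
@echo off
rem Set the local Administrator password to the value passed in as %1
net user Administrator %1
```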

You now need to create the Property Group, in vRA 7 this is in Administration > Property Dictionary > Property Groups.

Create a new group with the following script actions:-

Name,Value,Encrypted,Overridable,Show in Request
VirtualMachine.Software0.Name,Generate new root password,No,Yes,No
VirtualMachine.Software0.ScriptPath,c:\scripts\vCACSetAdministratorPassword.bat {InitialRootPassword},No,Yes,No

When you’re done it should look like this:-

Now add your new Property Group to the Blueprint:-


Once you have saved the Blueprint, try a deploy and see if it works.  If you are having problems then the first place to look is the gugent log files which by default are in:-

post upgrade vRA 7 orchestrator endpoint won’t connect

I upgraded vCAC 6.2 to vRA 7 the other day. Everything looked good until I tested some blueprints which call custom workflows configured in the embedded Orchestrator.

I had my endpoint configured as per the documentation, listed here:-

You can test whether your Endpoint is working by either running a refresh or checking the logs. To force the endpoint to run a data collection, login to vCAC as an Infrastructure Admin and go to Infrastructure > Endpoints > Endpoints, then select Data Collection and click Start to force a data collection run.

To check the logs go to Infrastructure > Monitoring > Log and filter on Message for “endpoint”:-

As you can see I am getting the following error:-

Endpoint Data Collection failed for endpoint vCO-Embedded [Workflow Instance Id=588727]
Unable to connect to the remote server
Inner Exception: Unable to connect to the remote server

I checked the DEM log on my IaaS server. By default the logs can be found here:-

C:\Program Files (x86)\VMware\vCAC\Distributed Execution Manager\<DEM Worker>\Logs

And found the following errors:-

Endpoint Data Collection failed for endpoint vCO-Embedded [Workflow Instance Id=588727]
System.Net.WebException: Unable to connect to the remote server —> System.Net.WebException: Unable to connect to the remote server —> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it

I tried a telnet from the IaaS machine and there was no connection available on port 8281.  So, a quick check with the client via:-

https://<vcac app fqdn>/vco/client/client.jnlp

and you are presented with the auto-filled Java client session, which points at 443 rather than 8281:-


So I edited the Endpoint, removed 8281 and ran the Data Collection again and it now works fine!


At the time of writing it looks like the VMware documentation is out of date but in short remove 8281 from your endpoint for the Embedded vCO and it should work fine.

I’ve left all the working out gumpf in this post on the off chance it helps others…..

vSphere 6 Web Client TIPs and TRICKs

This post is just a collection of various TIPs and tricks to make the VMware vSphere web client more bearable.

To be fair to VMware, it sounds like they are working hard to make the web client better, and as of 6.0 U1 and 5.5 U3 there are some big improvements.  This video from last year’s VMworld runs through the various changes they have made and are planning (INF5093):-

However, until the HTML5 version comes out this page will be here as a collection of as many tips as possible…….

Adobe Flash bitching about storage?

Want to see all users’ tasks in the web client? (Be warned, this can have a knock-on effect performance-wise on busy setups.)
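For that tip, the usual approach is a one-line change to webclient.properties (the path below is the vCSA one; verify the location for your deployment, and restart the web client service afterwards):

```
# In /etc/vmware/vsphere-client/webclient.properties add:
show.allusers.tasks = true
```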

Client Integration Plugin not working in Firefox (i.e. can’t upload files etc.)?

If you have more feel free to comment and I will update this page……

vRA 7 Migration Tool

If you’re about to upgrade to vRA 7.x (or already have) and want to make sure your logins all work post-upgrade, then you will need to use the Migration Tool to shift your ID stores from the old SSO setup (whether this was a dedicated ID appliance, integrated with vCenter SSO, etc.) to the new vIDM system. If you don’t migrate the logins then you will have to create the SSO ID Stores manually, which also requires creating a local vIDM user in the Tenant(s).

A good explanation of why VMware have moved from SSO to vIDM can be found here:-

vRealize Automation 7 – Part 5, Identity Management

How you perform the migration depends on which SSO method you were using for your 6.x system.  For example, if you were running off a Windows-based 5.5 vCenter SSO service, you will need to run the migration tool on the Windows server that is hosting it.  Similarly, if you were running a dedicated vCAC 6.x ID appliance, you will need to run the Linux version from the appliance.

In this example we will be running the migration from a dedicated appliance.  As always, make sure you have backups / snapshots etc. before you go any further!

First off you will need to obtain the migration tool zip file URL:-

https://vra-app1.domain.local:5480 > vRA Settings > SSO and copy the migration tools URL.

SSH to your old SSO ID appliance (SSH can be enabled from https://<SSO ID Appliance IP>:5480 > Admin), cd /tmp, download and extract the zip:-
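With the URL copied from the vRA appliance VAMI, the download and extract step looks something like this (the URL path and zip name below are placeholders; use the exact URL you copied):

```shell
cd /tmp

# Download the migration tool zip from the vRA appliance (URL is a placeholder)
wget --no-check-certificate https://vra-app1.domain.local:5480/service/cafe/download/vra-sso-migration.zip

# Extract it
unzip vra-sso-migration.zip
```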


Now cd to the new dir and set your environment:-

Now migrate your ID Stores (you will be prompted at various points for passwords and confirmations):-


The script claimed to add everything but it wasn’t 100% clean. For some reason it failed to sync my groups, and when I got in to take a look it turned out the Group DN was missing.  I didn’t look too deeply into this, as the previous step gave me enough access to start putting my AD logins back in the right place!