While we are on excuses, this is my first attempt at a blog entry so please temper your expectations.
We have a production network segment where we control internet access using a white-list. If you have ever maintained a white-list based web filter configuration, you will know first hand that it is a pretty big pain in the ass. It is rarely as simple as white-listing the website's domain, because many sites store images on a third-party content delivery network (CDN) or reference common/public JavaScript or CSS hosted on third-party sites. Each of these resources needs to be evaluated to determine whether it can safely be white-listed as well.
Our typical process for adding new sites to the white-list is as follows:
- Log in to our Sophos UTM
- Add the website domain to the white-list
- Fire up the web protection live log
- Access the website and review the web protection log for anything that is blocked
- Decide whether it is reasonable to white-list the resource and white-list it if so
We do not receive white-list requests often and the process is not overly cumbersome, just another one of those many sysadmin tasks that contributes to the overall "death by a thousand cuts". I will admit now that I often get the idea in my head to automate a process that, perhaps, really doesn't need to be automated (the ole "just because you can doesn't mean you should"). However, I have a tendency to indulge myself simply as an exercise to improve my scripting skills with full understanding that I may not ever actually use the script.
In planning the script, the following tasks were targeted:
- Verify that the website is not already white-listed
- Scrape the main page for any externally sourced resources
- Determine if any of the external references are already white-listed and display the info for review
- Add the domain to the domain white-list object we maintain in our Sophos UTM via the UTM API
To get started, we needed to be able to access the UTM API and retrieve the current contents of the white-list. To do this, we resurrected a UTM module we had started working on that contained the Invoke-UtmApiCall and Join-Parts functions. The Get-WebDomainRegexObject function was written for this project to actually retrieve the domain list.
Note: I believe that the Join-Parts function was borrowed from here.
Function Join-Parts()
{
    param ([string[]] $Parts, [string] $Seperator = '')
    # Collapse repeated separators, except directly after a colon (keeps "https://" intact).
    $search = '(?<!:)' + [regex]::Escape($Seperator) + '+'
    $replace = $Seperator
    ($Parts | Where-Object {$_ -and $_.Trim().Length}) -join $Seperator -replace $search, $replace
}
Function Invoke-UtmApiCall()
{
    param
    (
        [Parameter(Mandatory=$true)][string]$Uri
        ,[Parameter(Mandatory=$true)][string]$ApiToken
        ,[Parameter(Mandatory=$false)][string]$Method = "GET"
        ,[Parameter(Mandatory=$false)][string]$Data
    )
    if(($Method -eq "post" -Or $Method -eq "patch") -And [string]::IsNullOrEmpty($Data))
    {
        Write-Host "Post and patch methods require data parameter" -ForegroundColor "Red"
    }
    else
    {
        [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
        # Basic authentication header: user name "token", password is the API token.
        $headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
        $base64 = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(("{0}:{1}" -f "token",$ApiToken)))
        $headers.Add("Authorization", ("Basic {0}" -f $base64))
        # Default to https when no scheme was supplied.
        if(-Not ($Uri -Like "http*"))
        {
            $Uri = "https://{0}" -f $Uri
        }
        Try
        {
            if($Method -eq "post" -Or $Method -eq "patch")
            {
                $headers.Add("Accept", 'application/json')
                $results = Invoke-RestMethod -Method $Method -Uri $Uri -Headers $headers -Body $Data -ContentType 'application/json'
            }
            else
            {
                $results = Invoke-RestMethod -Method $Method -Uri $Uri -Headers $headers
            }
            Write-Output $results
        }
        Catch
        {
            # A 404 just means the object does not exist; report anything else.
            if(-Not ($_.Exception.Message -Like "*(404) Not Found*"))
            {
                Write-Error -Exception $_.Exception
            }
        }
    }
}
Function Get-WebDomainRegexObject()
{
    param
    (
        [Parameter(Mandatory=$true)][string]$UtmAddress
        ,[Parameter(Mandatory=$true)][string]$ApiToken
        ,[Parameter(Mandatory=$false)][string]$RefName
        ,[Parameter(Mandatory=$false)][string]$Name
    )
    if([string]::IsNullOrEmpty($RefName) -And ![string]::IsNullOrEmpty($Name))
    {
        # Only a name was given: fetch every object and filter by name.
        $lUri = Join-Parts ($UtmAddress,'api/objects/http/domain_regex/') '/'
        Invoke-UtmApiCall -Uri $lUri -ApiToken $ApiToken | Where-Object {$_.Name -eq $Name}
    }
    elseif(![string]::IsNullOrEmpty($RefName))
    {
        # A reference name was given: fetch that single object directly.
        $lUri = Join-Parts ($UtmAddress,'api/objects/http/domain_regex/',$RefName) '/'
        Invoke-UtmApiCall -Uri $lUri -ApiToken $ApiToken
    }
    else
    {
        # Nothing was given: return every domain_regex object.
        $lUri = Join-Parts ($UtmAddress,'api/objects/http/domain_regex/') '/'
        Invoke-UtmApiCall -Uri $lUri -ApiToken $ApiToken
    }
}
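Join-Parts does the URI assembly in every branch of Get-WebDomainRegexObject, so it is worth seeing what it produces on its own. Below is a quick standalone sketch; it repeats the function so the snippet runs by itself, and the host name and the REF_ExampleList reference name are made-up placeholders rather than values from our UTM.

```powershell
# Copy of Join-Parts from the module above so this snippet runs standalone.
Function Join-Parts()
{
    param ([string[]] $Parts, [string] $Seperator = '')
    $search = '(?<!:)' + [regex]::Escape($Seperator) + '+'
    ($Parts | Where-Object {$_ -and $_.Trim().Length}) -join $Seperator -replace $search, $Seperator
}

# Placeholder values for illustration only.
$listUri = Join-Parts ('utm.domain.com:4444','api/objects/http/domain_regex/') '/'
$itemUri = Join-Parts ('https://utm.domain.com:4444','api/objects/http/domain_regex/','REF_ExampleList') '/'

$listUri
$itemUri
```

The doubled slash that would otherwise appear between the base path and the reference name collapses to a single one, while the `//` after `https:` is left alone.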
At this point we are able to retrieve the current list of white-listed domains as follows:
$UtmAddress = "utm.domain.com:4444"
$ApiToken = "yourutmtoken"
$TargetList = "NAMEOFTHEWHITELISTOBJECT"
$targetListObject = Get-WebDomainRegexObject -UtmAddress $UtmAddress -ApiToken $ApiToken -Name $TargetList
$domainListUtm = $targetListObject.domain
To scrape the website for external links (we only bother checking the default/landing page), we use the DownloadString method of the .NET System.Net.WebClient class to download the page HTML as a string. A simple check first ensures that a full URL is passed and not just the domain, prepending http:// if needed. A regex then searches for src attributes whose value starts with http, ignoring any relative references.
if(-Not ($Website -Like "http*" -Or $Website -Like "ftp*"))
{
    $Website = "http://{0}" -f $Website
}
$webClient = New-Object 'System.Net.WebClient'
$pageContent = $webClient.DownloadString($Website)
#$pageContent
$regexString = @"
(src)=["']((http).*?)["']
"@
$urlmatches = ([regex]$regexString).matches($pageContent)
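To see what that pattern actually captures, here is a small sketch that runs it against a made-up page instead of a live download. The HTML and the example.net / example.org hosts are placeholders, and pulling the URL out of capture group 2 is just one way to read the matches.

```powershell
# Made-up sample page; example.net and example.org are placeholder hosts.
$samplePage = @'
<img src="https://cdn.example.net/logo.png">
<script src='http://static.example.org/app.js'></script>
<img src="/images/local.png">
<img src="https://cdn.example.net/logo.png">
'@

# Same pattern as above: group 2 holds the URL without the surrounding quotes.
$pattern = '(src)=["'']((http).*?)["'']'

$externalUrls = [regex]::Matches($samplePage, $pattern) |
    ForEach-Object { $_.Groups[2].Value } |
    Select-Object -Unique

$externalUrls
```

The relative /images/local.png reference is skipped and the repeated logo URL is listed once. Note that the pattern only looks at src attributes, so stylesheets pulled in through href would not show up.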
When we white-list sites, we normally white-list the entire root domain instead of just the host referenced in the website URL (e.g. site.com is white-listed rather than www.site.com). While this is typically desirable for the target website, it is rarely desirable for external resources. A function built on the .NET System.Uri class simplifies extracting the host and root domain from the URL of the target website and of any external resources.
The default behavior of the script is to white-list the root domain, but a switch parameter is used to override this behavior if desired. Once the primary domain to be white-listed is determined, it is then checked to determine whether it is already white-listed. A similar process is followed for the external resources by iterating through the regex results, running each external source through the function, and updating the returned object with whether the domain is already white-listed.
function Get-UrlInfo()
{
    param (
        [Parameter(Mandatory=$true)][string]$Url
    )
    $urlInfo = "" | Select-Object url,domain,rootdomain,whitelisted
    $UriObject = [System.Uri]$Url
    $domainParts = $UriObject.Host.ToString().Split(".")
    $urlInfo.url = $UriObject.AbsoluteUri
    $urlInfo.domain = $UriObject.Host.ToString()
    # Root domain is taken as the last two labels of the host name.
    $urlInfo.rootdomain = "{0}.{1}" -f $domainParts[$domainParts.Count-2],$domainParts[$domainParts.Count-1]
    $urlInfo
}
if($NoRoot)
{
    $primaryDomain = (Get-UrlInfo -Url $Website).domain
}
else
{
    $primaryDomain = (Get-UrlInfo -Url $Website).rootdomain
}
# Start with an empty array so that += appends instead of failing on the second item.
$secondaryDomains = @()
# Capture group 2 of the regex holds the URL without the src= prefix or quotes.
foreach($url in ($urlmatches | Foreach-Object {$_.Groups[2].Value} | Select-Object -Unique))
{
    $thisDomain = Get-UrlInfo -Url $url
    $thisDomain.whitelisted = ($domainListUtm -Contains $thisDomain.domain -Or $domainListUtm -Contains $thisDomain.rootdomain)
    $secondaryDomains += $thisDomain
}
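One caveat with taking the last two labels of the host: it works for site.com, but for a host under a two-part suffix such as co.uk it would yield co.uk itself. The following is a rough sketch of one way around that, assuming a small hand-maintained suffix list is good enough for your sites. Get-RootDomain and its $MultiPartSuffixes parameter are hypothetical and not part of our script.

```powershell
# Hypothetical helper: last two labels, or last three when the host ends in a
# known two-part suffix. The default suffix list is illustrative, not complete.
function Get-RootDomain
{
    param (
        [Parameter(Mandatory=$true)][string]$HostName
        ,[Parameter(Mandatory=$false)][string[]]$MultiPartSuffixes = @('co.uk','com.au','co.jp')
    )
    $labels = $HostName.ToLowerInvariant().Split('.')
    if($labels.Count -le 2)
    {
        return ($labels -join '.')
    }
    $lastTwo = "{0}.{1}" -f $labels[-2],$labels[-1]
    if($MultiPartSuffixes -contains $lastTwo)
    {
        return ("{0}.{1}" -f $labels[-3],$lastTwo)
    }
    return $lastTwo
}

# System.Uri does the host extraction, as in Get-UrlInfo.
$plainHost  = ([System.Uri]'https://www.site.com/index.html').Host
$suffixHost = ([System.Uri]'https://cdn.assets.example.co.uk/js/app.js').Host

Get-RootDomain -HostName $plainHost
Get-RootDomain -HostName $suffixHost
```

A hard-coded list will miss suffixes sooner or later, so treat the result as a suggestion to review rather than something to white-list blindly.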
Once all of the information is collected, it is then displayed to the user. The information on the external resources is informational only and requires that the script be re-run with the relevant domain/url in order for the resource domain to be white-listed. After displaying the information, the user is prompted to continue or quit using a function we commonly use.
function Display-Prompt
{
    [alias("message")]
    param(
        [Parameter(Mandatory=$false)][AllowEmptyString()][string]$messagetext = ""
        ,[Parameter(Mandatory=$false)][string]$question = "Press <enter> to continue, N to quit."
    )
    Write-Host
    $choices = New-Object Collections.ObjectModel.Collection[Management.Automation.Host.ChoiceDescription]
    $choices.Add((New-Object Management.Automation.Host.ChoiceDescription -ArgumentList '&Yes'))
    $choices.Add((New-Object Management.Automation.Host.ChoiceDescription -ArgumentList '&No'))
    # Index 0 (Yes) is the default, so pressing <enter> continues.
    $decision = $Host.UI.PromptForChoice($messagetext, $question, $choices, 0)
    if ($decision -eq 0)
    {
        $result = $true
    }
    else
    {
        $result = $false
    }
    Write-Host
    return $result
}
if($domainListUtm -Contains $primaryDomain)
{
    Write-Host "The specified domain is already present in the white-list: " -NoNewLine -ForegroundColor "Red"
    Write-Host $primaryDomain
}
else
{
    Write-Host "The specified domain will be added to the white-list: " -NoNewLine -ForegroundColor "Red"
    Write-Host $primaryDomain
}
Write-Host ""
Write-Host ("The following domains are referenced by '{0}' and should be considered for white-listing:" -f $Website) -ForegroundColor "Cyan"
Write-Host ($secondaryDomains | Format-Table -AutoSize | Out-String)
if(-Not (Display-Prompt))
{
    Exit
}
And a screenshot of the output.
The last step in the process is to add the target domain ($primaryDomain) to the UTM white-list object. For this, the domain property of the object returned by the Get-WebDomainRegexObject function is modified, and the object is passed back in to a new function, Set-WebDomainRegexObject.
Function Set-WebDomainRegexObject()
{
    param
    (
        [Parameter(Mandatory=$true)][string]$UtmAddress
        ,[Parameter(Mandatory=$true)][string]$ApiToken
        ,[Parameter(Mandatory=$true)][PSCustomObject]$Data
    )
    # Only objects that carry a _ref (i.e. that came from the UTM) can be patched.
    if($Data._ref)
    {
        $jsonData = $Data | ConvertTo-Json
        $lUri = Join-Parts ($UtmAddress,'api/objects/http/domain_regex/',($Data._ref)) '/'
        Invoke-UtmApiCall -Uri $lUri -ApiToken $ApiToken -Method "PATCH" -Data $jsonData
    }
}
$targetListObject.domain += $primaryDomain
Set-WebDomainRegexObject -UtmAddress $UtmAddress -ApiToken $ApiToken -Data $targetListObject | Out-Null
We then retrieve the updated object and validate that the desired website is present:
$targetListObject = Get-WebDomainRegexObject -UtmAddress $UtmAddress -ApiToken $ApiToken -Name $TargetList
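The line above only fetches the object again; the check itself is a simple membership test on its domain property. Here is a minimal sketch of that check, using a stand-in object in place of a live API response. The domain values are placeholders, and the wording of the messages is my own.

```powershell
# Stand-in for the object returned by Get-WebDomainRegexObject after the update.
$targetListObject = [PSCustomObject]@{
    name   = 'NAMEOFTHEWHITELISTOBJECT'
    domain = @('example.org','site.com')
}
$primaryDomain = 'site.com'

# -contains is case-insensitive by default, which suits domain names.
$isListed = $targetListObject.domain -contains $primaryDomain

if($isListed)
{
    Write-Host ("Confirmed: '{0}' is present in the white-list." -f $primaryDomain) -ForegroundColor "Green"
}
else
{
    Write-Warning ("'{0}' was not found in the white-list after the update." -f $primaryDomain)
}
```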
Those paying close attention may notice a few inconsistencies between the script usage and some of the examples. The actual script includes some additional functionality, such as a section to retrieve the API key from a DPAPI encrypted file using a custom function (UnProtect-DPAPIProtectedKeyFile), adding the target domain to a management database we use, validating the sites listed in the management database against the UTM white-list, and a few other minor tasks.
Also, most of the functions referenced here are actually part of custom modules used to simplify scripting tasks by grouping together some commonly used functions (e.g. the Display-Prompt and UnProtect-DPAPIProtectedKeyFile functions are part of a very generic module, while the UTM functions are part of a UTM-specific module). However, this piece already seemed a bit broad, so I will save some of these other functions for later entries.