There are a number of times when a whitelist is useful to security professionals, such as:
- You are alerting on a list of domains on your network, and don't want to set off thousands of alerts when someone accidentally adds "windowsupdate.com" to the list
- You are reviewing sandbox reports, and don't want common, non-malicious domains cluttering your results
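For the alerting case, the useful behaviour is usually to drop any indicator that is a whitelisted domain *or a subdomain of one*, while not being fooled by look-alikes that merely contain the whitelisted string. The sketch below assumes the whitelist is a plain, lowercase set of domain names and that subdomain matching is wanted (some teams prefer exact matches only); the function names are hypothetical.

```python
def is_whitelisted(domain: str, whitelist: set[str]) -> bool:
    """Return True if domain, or any parent domain of it, is in the whitelist.

    Assumes whitelist entries are already lowercase without a trailing dot.
    """
    # Normalise: lowercase and strip the trailing dot of a fully qualified name.
    labels = domain.lower().rstrip(".").split(".")
    # Check label suffixes, e.g. "a.b.c", then "b.c", then "c".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in whitelist:
            return True
    return False


def filter_indicators(indicators: list[str], whitelist: set[str]) -> list[str]:
    """Drop indicator domains covered by the whitelist before alerting on them."""
    return [d for d in indicators if not is_whitelisted(d, whitelist)]


if __name__ == "__main__":
    whitelist = {"windowsupdate.com"}
    indicators = [
        "download.windowsupdate.com",      # subdomain of a whitelisted domain: dropped
        "evil.example",                    # not whitelisted: kept
        "windowsupdate.com.evil.example",  # look-alike, not a subdomain: kept
    ]
    print(filter_indicators(indicators, whitelist))
```

Matching on whole label suffixes rather than a plain substring check is what keeps "windowsupdate.com.evil.example" in the alert list.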
This coincided with Alexa announcing they would stop publishing a commonly used whitelist: the top 1 million sites. Thankfully Alexa have changed their minds about discontinuing the dataset, for now at least, and there are other similar sources too.
Sources like this aren't well suited to matching against network data, though: sites that are accessed programmatically (e.g. download.windowsupdate.com) often won't be listed in datasets designed to record human traffic. A better choice may be to use the top N domains seen on your own network. However, that requires access to the logs of a large network.
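If you do have such logs, building a "top N" list is mostly counting. A minimal sketch, assuming you have already extracted one requested hostname per log line (for example from proxy or DNS logs; the extraction step depends on your log format and is not shown). `top_domains` is a hypothetical helper name:

```python
from collections import Counter
from typing import Iterable


def top_domains(hostnames: Iterable[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequently requested hostnames with their counts."""
    counts = Counter(
        h.strip().lower().rstrip(".")  # normalise case, whitespace and trailing dot
        for h in hostnames
        if h.strip()                   # skip blank lines
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample: one requested hostname per log line.
    sample = [
        "download.windowsupdate.com",
        "www.gov.uk",
        "download.windowsupdate.com",
        "WWW.GOV.UK.",
        "rare.example",
        "download.windowsupdate.com",
    ]
    for host, count in top_domains(sample, 2):
        print(host, count)
```

This counts full hostnames rather than registered domains. Collapsing hostnames to registered domains correctly needs the Public Suffix List, since naively keeping the last two labels breaks on suffixes such as .gov.uk and .co.uk.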
For this use case, I've used logs from networks that are publicly available online; plenty of people (perhaps inadvertently) publish this kind of data. In this case I've used data from Freedom of Information (FOI) requests for the top sites requested on a number of UK government networks.
Two things to note:
- This data is biased towards the UK
- I'd suggest only using domains seen on more than one network. For example, one of the domains below that was seen on only one network is likely associated with a Chinese APT (yes, they're aware).
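Applying the "more than one network" rule is a simple count of how many separate networks list each domain. The sketch below assumes the per-network top-sites lists have been loaded into a dictionary keyed by a network name; the network names, domains and the `shared_domains` helper are hypothetical and for illustration only.

```python
from collections import Counter
from typing import Iterable, Mapping


def shared_domains(networks: Mapping[str, Iterable[str]], min_networks: int = 2) -> list[str]:
    """Return domains that appear in the lists of at least min_networks networks."""
    seen: Counter = Counter()
    for domains in networks.values():
        # Use a set so a domain listed twice for one network is only counted once.
        for d in {name.strip().lower() for name in domains}:
            seen[d] += 1
    return sorted(d for d, c in seen.items() if c >= min_networks)


if __name__ == "__main__":
    # Hypothetical per-network lists.
    networks = {
        "network_a": ["www.gov.uk", "windowsupdate.com", "only-here.example"],
        "network_b": ["www.gov.uk", "windowsupdate.com"],
        "network_c": ["www.gov.uk"],
    }
    print(shared_domains(networks))
```

Raising `min_networks` trades coverage for confidence: fewer domains survive, but each one is less likely to be specific to (or planted on) a single network.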
You can find the list below, for all your whitelisting needs: