Improving the Security of Software Defined Infrastructures
By Thomas Kraus
The evolution of IT infrastructures towards platforms and high-level (cloud) services greatly benefits from Software Defined Infrastructures (SDI). Apart from these benefits, SDI also introduces new areas for which security requirements should be set and met. In practice, however, we see that specifying and implementing additional measures to deal with challenges that arise specifically from Software Defined Infrastructures is often considered less urgent, or even overlooked.
If you’re not familiar with the concepts and terminology of Software Defined Infrastructure, you may want to read my introduction to SDI first. This post is intended for IT administrators and managers responsible for building and operating software defined infrastructures. If you have a different role, the contents may still interest you.
Conceptually, a Software Defined Infrastructure typically looks as represented in this picture:
Speaking from my experience as a consultant and developer at the Software Improvement Group (SIG), below I will explain 1) why SDI is good for security, 2) how SDI itself can become an attack vector, and 3) how to structurally assure security through SDI.
Why is SDI good for security?
Employed as a security tool, SDI allows you to define a security baseline in code once and deploy, or rather enforce, this baseline on all your servers. Examples of security measures in such a baseline include: access control, disabling broken protocol versions like SSH1 and SSLv3, enforcing local firewall settings, and centralized logging and monitoring. Additional security rules, based on a server’s role, can be expressed in code such that any new or existing machine with a particular role will get the appropriate security rules applied.
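As a toy illustration of what “baseline in code” means (plain Python rather than any particular tool’s language; the setting names below are illustrative), enforcing baseline settings over a server’s local configuration could look like:

```python
# Hypothetical sketch: enforce a security baseline over a server's local
# configuration, the way a configuration management tool would.
# The setting names mimic sshd_config but are purely illustrative.

BASELINE = {
    "Protocol": "2",                  # disable broken SSH protocol 1
    "PermitRootLogin": "no",          # force the use of named accounts
    "PasswordAuthentication": "no",   # keys only
}

def enforce_baseline(config: dict, baseline: dict = BASELINE) -> dict:
    """Return the configuration with every baseline setting enforced."""
    merged = dict(config)
    merged.update(baseline)  # the baseline always wins over local drift
    return merged
```

Role-specific rules would simply be a second dictionary merged on top of the baseline, so every machine with that role picks them up automatically.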
Because of the large degree of automation, not only is deployment of security patches quick, but recovery of a potentially compromised server can also be fast, by simply rebuilding it. Of course this does require state to be separated from application software, e.g. by persisting state to a separate “volume” (a logical disk), “object store”, or remote database, such that it can be reconnected to the new machine.
Furthermore, SDI helps with automated security and compliance testing, e.g. by frequently running tools such as Serverspec and InSpec on all servers. With the right process in place (that is, enforcing that any production change can only be made through an automated process), the auditable history of your version control system makes it easier to demonstrate compliance.
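In the same spirit (plain Python rather than the actual Serverspec or InSpec DSLs; the rules are illustrative), such a compliance check boils down to parsing the deployed configuration and reporting deviations from the expected values:

```python
# Hypothetical compliance check in the spirit of Serverspec/InSpec,
# expressed in plain Python. The required settings are illustrative.

def check_sshd_config(text: str) -> list:
    """Return the names of required settings an sshd_config violates."""
    required = {"PermitRootLogin": "no", "PasswordAuthentication": "no"}
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(" ")
            settings[key] = value.strip()
    return [k for k, v in required.items() if settings.get(k) != v]
```

Running such checks frequently on all servers, and failing loudly on a non-empty result, turns your security baseline into something continuously verified rather than assumed.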
SDI itself as an attack vector?
SDI is a very powerful instrument, and “with great power comes great responsibility”.
Keep in mind that the tools and APIs to build and connect infrastructure, as well as those to install and configure software systems, are the same tools and APIs used for modifying or tearing down these software systems and their infrastructure. Examples of security issues that are often overlooked include:
- Insufficiently protected interfaces
- Insecure handling of secrets
- Inclusion of untrusted code
We’ll elaborate on these examples below.
1. Insufficiently protected interfaces
Although the interfaces for managing your infrastructure (e.g. version control, provisioning APIs) are not necessarily exposed to the public Internet, it is insufficient to rely on your organization’s network firewall alone for security.
For example, an external attacker may find a vulnerability in any of your software systems, compromise that system, and use it as a stepping stone to reach the SDI version control repository, modify the infrastructure code, and gain access to sensitive production data.
Therefore, even for systems considered internal, authentication and authorization are still important to implement for your version control, provisioning and configuration management tools, as well as the infrastructure APIs. This provides “defense in depth”.
To be more specific, authentication for the version control repository could be handled by your company’s LDAP server. Depending on your specific version control tooling, you may even define roles to specify who is allowed to merge changes into the “master” branch. Access to provisioning APIs should be allowed via named accounts only. With this in place, individual access can easily be revoked when necessary, e.g. when an employee leaves the company.
2. Insecure handling of secrets
Configuration management systems separate code from configuration to enable code reuse. The code expresses what needs to be installed and configured (and, for tools that don’t use a declarative language, also the how). Configuration files, on the other hand, specify site-specific or server-specific configuration values. Configuration management tools like Chef and Puppet separate code and configuration as follows:
- Chef: Recipes/cookbooks vs. attributes and data bags
- Puppet: Manifests vs. hiera data, e.g. stored in (e)YAML files
Keeping your secrets as part of your configuration values, however, is bad practice: they end up in version control in plain text!
Many users can typically access the version control system, and since version control repositories maintain history, the secrets remain accessible for a very long time! This also means that after accidentally checking secrets into version control, those secrets should be changed!
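To catch such accidents before they reach the repository, you can scan changes for secret-looking content, e.g. in a pre-commit hook. A hypothetical sketch (the patterns are illustrative and deliberately simple; dedicated scanners ship far more thorough rule sets):

```python
import re

# Hypothetical pre-commit scan for plain-text secrets in configuration files.
# The patterns are illustrative only; real-world rule sets are much larger.

SECRET_PATTERNS = [
    re.compile(r"(password|passwd|secret|api[_-]?key)\s*[:=]\s*\S+", re.I),
    re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
]

def find_secrets(text: str) -> list:
    """Return (line_number, line) pairs that look like checked-in secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

A non-empty result should block the commit, forcing the author to move the value to an encrypted store or a vault (see below for both options).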
One solution is to encrypt secrets in your version control, and manage the encryption/decryption keys through a separate process. For instance, Puppet supports the hiera-eyaml backend, which supports encrypted values in the Hiera configuration.
Another solution is to separate secrets completely from the rest of your configuration and use dedicated tooling (a ‘vault’) to store and expose secrets exactly and only where you need them. Examples include Docker secrets and HashiCorp’s Vault. Dedicated tools can take care of key-management and -rotation for you, and may even periodically expire and renew credentials for you.
Finally, beware of logging when it comes to secrets. Many software systems (both off-the-shelf and bespoke) log environment variables or database connection strings upon start-up, typically in clear text. Also, a change of a (secret) value in a configuration file, made by your configuration management tool, may show up as a diff in the tool’s log output. When these logs are collected on a centralized logging server (which in itself is good practice), the secrets used on all your servers can be “conveniently” browsed on that central logging server.
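One mitigation worth sketching is to redact secret-looking values before log records leave the application. A hypothetical logging filter (the pattern is illustrative and deliberately simple):

```python
import logging
import re

# Hypothetical logging filter that masks secret-looking values before
# records are emitted and shipped to a central logging server.
SECRET_RE = re.compile(r"(password|secret|token)=\S+", re.I)

class RedactSecrets(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the message in place; keep the record itself.
        record.msg = SECRET_RE.sub(r"\1=[REDACTED]", str(record.msg))
        return True
```

Attached to a handler (`handler.addFilter(RedactSecrets())`), this masks matching values in every message that passes through; it does not, of course, help against other systems that log secrets themselves.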
3. Inclusion of untrusted code
When your configuration management depends on resources that are retrieved from the public Internet, special care should be taken. Nowadays it is very convenient to re-use scripts, libraries and even complete Docker images made by others. However, these cannot always be trusted, especially over time. Neither a file (e.g. an installation package or Docker image) nor a script (a Dockerfile or shell script containing installation instructions, which may in turn download additional files) is an immutable resource.
This means that without additional measures there is no guarantee that what you download and install today is the same as what you downloaded and installed last week (that is, with the same command or code).
On the one hand this is great: when installing the same software configuration a second time, you’ll benefit from (security) improvements added in the meantime. On the other hand, the file or script you’re downloading may have been compromised without you knowing it, and you could end up with different results, possibly including a backdoor or virus.
Solutions here can vary, but should at least include hash verification (e.g. SHA-256) of the remote content, letting the installation process come to a grinding halt when the stored and newly computed hashes do not match.
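Such a check is only a few lines. A minimal sketch (the function name and error handling are mine; in a real pipeline `data` would be the bytes fetched from the remote URL):

```python
import hashlib

# Minimal sketch of pinning a downloaded artifact to a known SHA-256 digest.
def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Abort the installation when the digest does not match the pin."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise RuntimeError(f"hash mismatch: expected {expected_hex}, got {actual}")
```

The stored (“pinned”) digest lives in version control next to the infrastructure code, so a silently changed upstream artifact fails the build instead of being installed.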
Additionally, setting up a binary repository for packages is recommended. This binary repository can act as a caching proxy, ensuring that a second installation uses exactly what was downloaded the first time. It also helps you rebuild a software system, container or server when the online resource is (temporarily) unavailable.
How to fix SDI security structurally?
As SDI lets you leverage the benefits of a software development process, security should be built into that process just as in any other software development: adopt a so-called Secure Software Development Lifecycle. Similar to Microsoft’s SDL, we’ll explain what such a secure process for maintaining your infrastructure should look like.
The lifecycle starts with gathering requirements for your infrastructure, which, next to the “functional” requirements, should cover both security and privacy aspects. This doesn’t mean you cannot work in an agile fashion; when more requirements are added or clarified in the future you can address these in a next iteration of your process.
In the design phase, threat modeling and attack surface analysis activities help you gain insight into the risks introduced by the changes or additions you’re making to the IT landscape. Be sure to review threats and the attack surface with a small group of people who know the application as well as its infrastructure surroundings.
Once you’re ready to write code (which of course includes test code), you’ll conduct code reviews and verify that any third party libraries you’ve introduced don’t contain known vulnerabilities.
When the development of code is done (i.e. complete including test code, reviewed, and meeting quality standards), you can deploy it for run-time testing of both functional and non-functional requirements. Scanning the system for well-known vulnerabilities and performing penetration testing are done in this phase. As proper penetration testing requires human skills, effort, and (often external) expertise, it is typically not feasible to perform on a daily or even weekly basis.
Keep in mind: passing a pen-test does not prove the absence of vulnerabilities that can be exploited, but merely that within limited time the system could not be compromised using known exploits or strategies.
While that is fine, do not see this as an excuse to never perform a pen-test. The activity is complementary to other measures you take.
For the final production deployment a last review takes place and you should write down a (brief) incident response plan. The latter could include the use of a pragmatic “panic button”: a single command that halts all automatic propagation of changes throughout your infrastructure. This way, your environments are frozen until the cause for the panic has been resolved.
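A minimal sketch of such a panic button (hypothetical; the flag file path is illustrative) is a flag that every automated deployment run checks before applying changes:

```python
from pathlib import Path

# Hypothetical "panic button": while the flag file exists, all automatic
# propagation of changes is halted. The path below is illustrative.
PANIC_FLAG = Path("/etc/deploy/PANIC")

def may_deploy(flag: Path = PANIC_FLAG) -> bool:
    """Refuse to roll out changes while the panic flag is present."""
    return not flag.exists()
```

The single command from the incident response plan then simply creates the flag file (e.g. `touch /etc/deploy/PANIC`), and removing it unfreezes the environments once the cause has been resolved.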
Your main takeaways
SDI can be used as a tool to improve security, but it can also become an attack vector when the additional measures needed to protect an SDI, compared to a classical infrastructure setup, get insufficient attention.
Common security issues that are often overlooked include:
- Insufficiently protected interfaces: e.g. SDI version control, cloud API
- Insecure handling of secrets: e.g. passwords and keys in configuration
- Inclusion of untrusted code: e.g. run-time dependencies on online scripts or binaries
Your Software Defined Infrastructure should have a Secure Software Development Lifecycle process in place. Threat modeling, secure code review and automated security testing help to address security issues early on, and these are complementary techniques.
If you need help with any of the above, don’t hesitate to reach out. At the Software Improvement Group (SIG) we’re committed to Getting Software Right. This includes the software that defines IT infrastructures, and definitely includes security!