The 9 Pillars of DevOps at Bold Penguin
We are obsessed with security, availability, and performance
DevOps is a bit of a buzzword, and it means something slightly different at each organization. At Bold Penguin, we use a collection of tried-and-true principles to make everyday decisions about what we build and how we build it. These nine principles shape how we approach problems, how we run our technical operations, and how we efficiently scale our team. If you’re wondering what DevOps looks like at Bold Penguin, look no further.
Infrastructure as Code
We manage and configure infrastructure components using human-readable but machine-executable templates. This allows us to apply the same development lifecycle to our infrastructure as to our application codebase. Our infrastructure configuration is versioned in source control and is promoted through the application environments along with our application. Through automation, we make sure that infrastructure resources are uniformly and predictably configured between application environments. This secure baseline makes it easier to find and fix security issues consistently in every environment, and it also plays a critical role in our disaster recovery plan, because the same templates can be used to rebuild our infrastructure.
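As a minimal sketch of the idea, the following shows one shared template rendered for each environment, plus a check that flags any setting that differs unexpectedly. The template keys, environment names, and the `render` and `drift` helpers are hypothetical and only illustrate the principle; they are not our actual tooling.

```python
"""Sketch: one versioned template rendered per environment."""

# Hypothetical shared template; keys and values are illustrative only.
BASE_TEMPLATE = {
    "subnet_type": "private",
    "disk_encrypted": True,
    "availability_zones": 2,
}

# Only these settings are allowed to differ between environments.
ENV_PARAMS = {
    "staging": {"instance_count": 2},
    "production": {"instance_count": 6},
}


def render(env: str) -> dict:
    """Merge the shared template with the per-environment parameters."""
    return {**BASE_TEMPLATE, **ENV_PARAMS[env], "environment": env}


def drift(a: dict, b: dict, allowed: set) -> set:
    """Return keys whose values differ but are not allowed to differ."""
    keys = set(a) | set(b)
    return {k for k in keys if a.get(k) != b.get(k) and k not in allowed}


staging = render("staging")
production = render("production")

# An empty set means the two environments match apart from allowed settings.
unexpected = drift(staging, production, allowed={"instance_count", "environment"})
print(unexpected)
```

Because both environments come from the same template, any drift has to come from a change outside the template, which is exactly what the check is meant to surface.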
Architect and Test for Resiliency
“Everything fails all the time”
We don’t just expect things to fail: we regularly test our ability to deal with failures because simply architecting for resiliency isn’t enough. Even though hardware failures aren’t common, we do our best not to tempt Murphy’s Law. We run scheduled “Game Days” where we actually try to break our application in a controlled environment. Our application is deployed to multiple availability zones and has automated, built-in recovery mechanisms to withstand hardware failures, power outages, localized disasters, and other disruptions.
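A Game Day can be pictured as a small, repeatable experiment: take one availability zone away and confirm that requests are still served. The sketch below is a toy simulation under that assumption; the `Cluster` class and the zone names are hypothetical and stand in for a real controlled environment.

```python
"""Sketch: a simulated Game Day that fails one zone and checks service."""
import random

ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names


class Cluster:
    """Toy model of an application deployed to several zones."""

    def __init__(self, zones):
        self.healthy = {zone: True for zone in zones}

    def fail_zone(self, zone):
        self.healthy[zone] = False

    def recover_zone(self, zone):
        self.healthy[zone] = True

    def serve(self):
        """Return the first healthy zone, or raise if none is left."""
        for zone, ok in self.healthy.items():
            if ok:
                return zone
        raise RuntimeError("no healthy zone available")


def game_day(cluster, rng):
    """Fail one randomly chosen zone, check service, then recover it."""
    victim = rng.choice(list(cluster.healthy))
    cluster.fail_zone(victim)
    served_by = cluster.serve()  # raises if the outage took us down
    cluster.recover_zone(victim)
    return victim, served_by


victim, served_by = game_day(Cluster(ZONES), random.Random())
print(f"failed {victim}, traffic served by {served_by}")
```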
Architect for Zero Downtime Deployments
We never want code and infrastructure changes to delay or otherwise impact our customers. We are resolute about ensuring that our applications can be updated safely, at any time, and with no noticeable impact. This allows us to safely push new features and patches when they are ready, rather than waiting for scheduled deployment windows.
Immutable Infrastructure
Infrastructure resources are provisioned once and never updated in place. If updates need to be applied, we replace resources rather than modify them. As a result, it’s easy to roll back to a known configuration. Additionally, we avoid the security risks posed by opening up administrative access to running VMs.
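The replace-rather-than-modify approach can be sketched with a frozen record and a fleet that only ever appends new generations. The `Server` and `Fleet` names and the image labels are hypothetical; this is an illustration of the principle, not a description of our provisioning tooling.

```python
"""Sketch: resources are replaced, never modified in place."""
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Server:
    """A provisioned resource; frozen, so it cannot be changed afterwards."""
    image: str
    generation: int


class Fleet:
    def __init__(self, initial_image: str):
        self.history = [Server(image=initial_image, generation=1)]

    @property
    def current(self) -> Server:
        return self.history[-1]

    def deploy(self, image: str) -> Server:
        """Provision a replacement instead of updating the current server."""
        new = Server(image=image, generation=self.current.generation + 1)
        self.history.append(new)
        return new

    def roll_back(self) -> Server:
        """Redeploy the previous known image as a fresh generation."""
        if len(self.history) < 2:
            raise RuntimeError("nothing to roll back to")
        return self.deploy(self.history[-2].image)


fleet = Fleet("app-v1")
fleet.deploy("app-v2")
fleet.roll_back()
print(fleet.current)
```

Note that a rollback is itself a replacement: the previous image is provisioned again rather than the running server being edited back.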
Automate Releases Through Delivery Pipelines
We believe in working smart, so we lean heavily on automation to run our deployment operations for us. Because these operations are automated, they’re more reliable and less vulnerable to human error. In fact, we’re so confident in our automation that we treat our production environment as read-only. This means that only pre-approved automated delivery pipelines can modify the production environment. Only in a break-glass scenario do we grant humans elevated access in production.
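The read-only rule can be expressed as a small authorization check. In this sketch the pipeline name, the `actor_kind` values, and the break-glass ticket parameter are all hypothetical; a real setup would enforce this in the cloud provider's access policies rather than in application code.

```python
"""Sketch: production is read-only except for approved pipelines."""
from typing import Optional

# Hypothetical allow-list of pre-approved delivery pipelines.
APPROVED_PIPELINES = {"release-pipeline"}


def may_modify_production(
    actor_kind: str,
    actor_name: str,
    break_glass_ticket: Optional[str] = None,
) -> bool:
    """Decide whether an actor may change the production environment."""
    if actor_kind == "pipeline":
        return actor_name in APPROVED_PIPELINES
    if actor_kind == "human":
        # Humans get write access only during a recorded break-glass event.
        return break_glass_ticket is not None
    return False


print(may_modify_production("pipeline", "release-pipeline"))
print(may_modify_production("human", "alice"))
print(may_modify_production("human", "alice", break_glass_ticket="INC-1"))
```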
Defense in Depth
“Two is one, one is none”
We layer redundant security controls so that no single failure leaves our infrastructure exposed. If one control is misconfigured by mistake or bypassed by a malicious actor, another one is still in place. A good example is how we secure virtual machines in our cloud.
- We deploy any virtual machines to private subnets that have network access controls preventing direct connectivity from the internet.
- We use network security groups to restrict access from within the trusted VPC.
- We purposefully never launch virtual machines with any SSH key installed.
- We immediately flag any SSH sessions, then trigger a security incident for manual intervention.
At the core of this is the principle of least privilege. We use role-based access control groups to ensure that everybody has just enough access to do their jobs, and no more.
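The four virtual machine controls listed above can be sketched as independent checks, each of which is evaluated on its own so that one passing control never hides another failing one. The `VM` fields, the VPC address range, and the control names are hypothetical and chosen only for illustration.

```python
"""Sketch: evaluating layered controls on a virtual machine."""
from dataclasses import dataclass, field
from ipaddress import ip_network

TRUSTED_VPC = ip_network("10.0.0.0/16")  # hypothetical VPC range


@dataclass
class VM:
    name: str
    subnet_is_private: bool
    ingress_sources: list = field(default_factory=list)  # CIDR strings
    ssh_key_installed: bool = False
    active_ssh_sessions: int = 0


def failed_controls(vm: VM) -> list:
    """Return the name of every control the VM violates."""
    failures = []
    if not vm.subnet_is_private:
        failures.append("private-subnet")
    if not all(ip_network(src).subnet_of(TRUSTED_VPC) for src in vm.ingress_sources):
        failures.append("security-group")
    if vm.ssh_key_installed:
        failures.append("no-ssh-key")
    if vm.active_ssh_sessions > 0:
        failures.append("ssh-session-alert")
    return failures


good = VM("api-1", subnet_is_private=True, ingress_sources=["10.0.1.0/24"])
bad = VM("legacy", subnet_is_private=False, ingress_sources=["0.0.0.0/0"],
         ssh_key_installed=True, active_ssh_sessions=1)
print(failed_controls(good))
print(failed_controls(bad))
```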
Auditing, Logging, Monitoring, and Telemetry
We store the log files from all resources, services, and applications in a secure and tamper-proof location. Log files are then ingested into a searchable document store for ad hoc analysis and forensic investigation. Metrics associated with running resources (like number of connections, response times, and number of errors) are also centrally recorded, monitored, and graphed for production assurance purposes. By combining log files, application telemetry, and robust monitoring, we maintain a security information and event management data store whose records cannot be repudiated.
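The text above does not say how tamper-proofing is achieved. One common technique, shown here purely as an assumption, is a hash chain: each log entry stores a hash of the previous entry, so any later modification is detectable. Strictly speaking this makes a log tamper-evident rather than tamper-proof. The `append` and `verify` helpers and the event fields are hypothetical.

```python
"""Sketch: a tamper-evident log built as a hash chain."""
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"service": "api", "status": 500})
append(log, {"service": "api", "status": 200})
print(verify(log))

log[0]["event"]["status"] = 200  # simulate someone rewriting history
print(verify(log))
```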
Encrypt all the things
“Dance like nobody’s watching; encrypt like everyone is”
Simply put, we encrypt all data at rest and in transit wherever possible. Enabling encryption is dead simple for most managed services, and it adds only negligible overhead. There’s almost no reason not to enable it. When handling sensitive data, we combine encryption at rest with a one-time-use data key for each individual row or entity. As a result, sensitive data stays confidential even in the most precarious situations. Our cryptographic keys are protected by IAM policies, and we keep a complete audit trail of every use of our keys for proactive monitoring and forensic purposes.
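The per-row data key pattern is often called envelope encryption: each row gets a fresh data key, and only a wrapped copy of that key is stored next to the ciphertext. The sketch below shows the structure only. To stay within the standard library it uses a stand-in cipher built from SHAKE-256, which is an assumption made for illustration and is not suitable for real use; in practice a vetted authenticated cipher and a managed key service would take its place. All function names are hypothetical.

```python
"""Sketch: envelope encryption with a one-time data key per row."""
import hashlib
import secrets

NONCE_LEN = 16


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Stand-in cipher (XOR with a SHAKE-256 keystream). NOT for real use."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal(key: bytes, data: bytes) -> bytes:
    """Encrypt data under key, prefixing a fresh random nonce."""
    nonce = secrets.token_bytes(NONCE_LEN)
    return nonce + _xor_stream(key, nonce, data)


def unseal(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return _xor_stream(key, nonce, body)


def encrypt_row(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt one row with its own data key, then wrap that key."""
    data_key = secrets.token_bytes(32)  # used for this row only
    return {
        "wrapped_key": seal(master_key, data_key),
        "ciphertext": seal(data_key, plaintext),
    }


def decrypt_row(master_key: bytes, row: dict) -> bytes:
    data_key = unseal(master_key, row["wrapped_key"])
    return unseal(data_key, row["ciphertext"])


master_key = secrets.token_bytes(32)  # would live in a key service
row = encrypt_row(master_key, b"sensitive value")
print(decrypt_row(master_key, row))
```

Because every row has its own data key, exposing one data key reveals one row at most, and access to the master key can be controlled and audited in a single place.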
Automate, simplify judiciously & avoid undifferentiated heavy-lifting
We want to focus on the things that are core to our goal of simplifying commercial insurance. To accomplish this, we prefer to leverage managed services and existing technologies wherever possible. This allows us to benefit from the scale of our technology vendors while remaining focused on the task at hand. Beyond the obvious reasons, this reduces the cognitive burden that comes with operating a lot of things. We try to automate and simplify our operations as much as possible and only add complexity when absolutely needed.
At Bold Penguin, DevOps is at the core of how we work and grow as an organization. That doesn’t mean, however, that there’s a “DevOps” team that owns all of the things. In our minds, building and running a software platform is a shared responsibility across disciplines, and we continually question ourselves and our systems to improve our technology delivery practices. We are stubborn on the principles but flexible on the implementation, which means that an engineer can suggest a change to our infrastructure and expect that this change will be considered as part of the normal software development lifecycle. After all, we are one team at Bold Penguin, and being one team means that we are obsessed with the security, availability, and performance needed for our platform to thrive.