Holistic Approach to Systems - Operations is a Software Problem - Team-Oriented Communication
At DigitalEd, a DevOps Engineer is responsible for ensuring our production systems meet our customers’ uptime and service needs, with software engineering tools and capabilities, not relentless toil. They are pragmatic, objective, and articulate, with strong communication and teamwork capabilities. They create effective tooling and automation that enables our teams to give our customers a compelling and seamless experience with our Mobius platform.
The SRE team designs, deploys, and manages DigitalEd’s internal Private Cloud Infrastructure as well as our customer-facing Google Cloud Platform SaaS application infrastructure. Ideally, the person in this role will spend no more than 30 to 50% of their time on “ops”-related work, and the rest on software development to improve the scalability, reliability, and availability of the Mobius application.
Outcomes & Key Responsibilities: The Impact You’ll Have
- System Design: Engage in and improve the whole lifecycle of our service — from inception and design, through deployment, operation and refinement
- System Support: Support our service through system design consulting, developing software platforms and frameworks, and capacity planning
- System Maintenance: Maintain our service by measuring and monitoring availability, latency and overall system health; support on-call rotations with operational duties that have not been addressed with automation
- Incident Management: Practice sustainable incident response and blameless postmortems
- Automate: Identify areas of opportunity to programmatically automate cloud deployment, administration, and monitoring tasks
- Teamwork: Work cross-functionally with Engineering, Support, and at times other parts of the business to deliver customer features to our cloud platform
- Problem Solve: Investigate and troubleshoot cloud component performance. Leverage your experience to find the root causes of defects and work to proactively address them.
- Knowledge: Never stop learning; stay current with the latest DevOps / SRE technologies and functionality, as well as the ever-evolving needs of our customers and their customers
Measures of Performance: How You Know You’re Doing Well
- Process Execution: Every project, automation task, and incident is executed well and completely. We ensure that all work in our system is done to the best of our ability given our knowledge, tooling, and experience
- Customer Satisfaction: A desire to deliver a high quality of service and the best customer experience, by continually finding the next problem to solve, and solving it well
- Effective Cooperation: Working with Customer Success and Development continually to ensure our customers’ needs are met and exceeded
Competencies & Experience: The Stuff that Makes you Great at This
- An understanding that system failure is normal, and the ability to embrace risk as part of the job
- Demonstrated success in working through blameless postmortem processes, using techniques such as “the infinite hows”
- Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- Ability to debug and optimize code and automate routine tasks
- Ability to see the system as a whole and treat its interconnections with as much attention and respect as the components themselves
- Strong desire to automate
The Technical Piece: The Knowledge and Exposure that this Role can’t Operate Without
- 4-5 years in an Operational role, be it DevOps, SRE, or traditional network/server management
- Advanced expertise with at least one programming language, with a preference for Java and Python; polyglot preferred
- Extensive experience in Linux
- Experience with cloud platforms, preference for GCP
- Experience with containers and orchestration (Docker, Kubernetes)
- Experience with database management (PostgreSQL)
- Experience with incident management and response (Jira)
- Experience with infrastructure as code (IaC) (Terraform, Puppet, Git)
- Experience with general networking concepts and protocols, and storage fundamentals
Tech Stack - What You’ll Be Using
- Apache, CGR, Confluence, Docker, Docker Swarm, Eclipse, GCE, GCP, Git Bash, GKE, HAProxy, Java, Jira, Kubernetes (K8s), Linux, Opsgenie, PostgreSQL, Puppet, Python, Terraform, and yes we felt it was necessary to alphabetize this list.
*Note - We believe that skills are transferable. If you have a lot of experience with MySQL, moving that over to PostgreSQL isn’t much of a stretch. We’ve also just recently built an internal web application; our tech stack there is Firebase, Firestore, Node.js and Vue.js.
We’re ultimately looking for a site reliability generalist with foundations across development, system operations, resiliency testing, security hardening, and performance engineering. We’re on the hunt for someone who’s comfortable with taking on new engineering challenges, defining potential solutions, and implementing designs in a team environment, which means drive, ownership, and tenacity are the key traits of someone who will be super successful in this position and on this team. If you want to be an integral part of DigitalEd’s evolution towards contemporary application and infrastructure management practices, this could be a great role for you to leave your mark on.
The Culture Part - How You Know if this Place is for You
The spirit of our aspirational culture is rooted in the concept of ‘No Deposit, No Return’. If you don’t put anything into your professional experience, you won’t get anything out of it. To bring this to life, we believe in the pillars of our core values: Customer Orientation, Curiosity, Teamwork, Adaptability, Ownership and Coaching (for Leaders). If any of these words strike a chord, then we’ve got something in common.
Read through this posting and not sure if you’re qualified? Apply anyway. You never know where it could go, and we promise to read and review every application that comes through - with a magnifying glass we like to call the ‘Potential Detector’. Everyone has a great story, and we’d love to hear yours.
Sincerely, The DigitalEd People & Culture Team
PS - We know diverse teams make strong teams, so we welcome all individuals of diverse backgrounds, abilities, experiences, and perspectives to apply. If you require accommodation during the application process, simply let us know and we’ll work to ensure it’s a positive experience for you.