Site Reliability Engineer

Olympia, Washington

Job ID: R1908404-1
Date posted: Aug. 13, 2019

Site Reliability Engineer, Cloud Management

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that internally critical and externally visible systems have reliability and uptime appropriate to users' needs and a fast rate of improvement, while keeping an ever-watchful eye on capacity and performance. SRE is a mindset and a set of engineering approaches focused on optimizing existing systems, building infrastructure, and eliminating work through automation. As a Site Reliability Engineer on the Cloud Management team, you will build and operate cloud management solutions for VMware services offered across multiple public and private clouds.

Our team focuses on common service components across the stack. We develop and operate solutions that support public cloud management, CI/CD, container orchestration, security, and monitoring, closing potential gaps between software and service requirements.

We work with various software engineering teams building high-performance, reliable cloud systems. You will tackle a variety of business, infrastructure security, and application problems in a complex ecosystem. You will collaborate with many SaaS teams across all disciplines. These teams will look to you for support and guidance on how to build and operate complex services. Our team is directly responsible for solutions around cloud management, security, reliability, and visibility into cloud systems.

Because the SaaS business runs 24/7, the role requires rotational on-call availability (weekdays at work, plus evenings and weekends for service- and system-related incidents).

Success in this role requires very strong technical skills, a broad background in and understanding of every layer of the software development and cloud ecosystem, and an excellent understanding of the cloud and container management stacks. You should be comfortable working both independently and as part of a specialized team.

Minimum Qualifications

  • 3+ years in various DevOps/SRE roles

  • 3+ years of experience working with AWS

  • Experience administering Linux systems in a production environment

  • Experience in building and running large-scale systems and application architectures

  • Deep understanding of system performance and monitoring

  • Understanding of containers and container orchestration

  • Experience in one or more of the following languages: Python, Java, Go, or Node.js

  • Excellent project management skills and the ability to work in a fast-paced and hectic work environment

  • Demonstrated skills in priority setting, analysis, communication, time management, scheduling, and multitasking

  • Proven verbal and written communication skills

  • BS or MS degree in Computer Science or a related field

  • U.S. citizen able to attain a U.S. government security clearance and pass regular background investigations

Preferred Qualifications

  • Experience with modern container orchestration systems: Kubernetes, Mesos, DC/OS, Swarm

  • Experience with infrastructure configuration and automation processes and tools: Terraform, Puppet, Ansible, Chef, Fabric

  • Experience with security in the cloud: Intrusion, penetration, and vulnerability scanning

  • Experience with monitoring solutions: ELK, Splunk, Sumo Logic, Nagios, Prometheus

  • Experience with various data technologies including relational and non-relational databases and message queues

  • Good working knowledge of build automation and the continuous integration/delivery ecosystem: Git, Gerrit, Maven/Gradle, Jenkins, Docker, Nexus, Artifactory, Selenium
