Sr. Site Reliability Engineer
Austin, Texas
Job ID R1912024
Date posted Sep. 11, 2019
Ensure that VMware Cloud on AWS operates at high reliability and performance at scale with minimum human touch for our customers.
The VMC on AWS Site Reliability Engineering team is looking for quality software developers with a diverse set of experiences and skill sets to design and build the solutions needed for the exciting new VMware Cloud on AWS services. As an SDDCaaS SRE software developer, you will develop extensible services and platforms that provide the service health insight, automated remediation, and service management at scale needed to maintain high service reliability with low touch.
Responsibilities for Monitoring Service Health, Service Management, Orchestration, and Remediation & Troubleshooting
By joining our diverse team, you will be responsible for the VMC on AWS service and all aspects of it in production, including the user experience. This includes designing and developing software solutions for service monitoring, auto remediation, measuring availability and reliability, performance, analytics, and security. You will build services and solutions that enrich our monitoring and automation through data analytics and applied tooling. Through Service Response (incident management, problem management, and participation in the globally staffed Service Watch), you will use metrics and health systems to ensure performance, scalability, and reliability. You will ensure proper metrics are implemented to measure service health and drive error budgeting. Through the partnerships you foster with the development teams, you will support new features, services, and releases, and become an authority on our services.
Requirements and Preferred skills:
- Experience engineering, operating, troubleshooting, administering, and scaling online services
- Proficient in Java
- At least one of the following specialties: storage, networking, systems, virtualization
- Excellent troubleshooting, critical thinking, and data analysis skills
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive
- Able to balance multiple tasks and projects effectively and quickly adapt to new variables
- Be part of a 24x7 Service Watch on-call rotation, using a follow-the-sun model
- BS in Computer Science or related technical field, OR equivalent industry experience.
- 3+ years of experience in DevOps, software development, or SRE (development for large online services)
- 4+ years of experience building and operating highly available and scalable infrastructure solutions
- A tenacious ability to diagnose and fix performance and reliability problems
Bonus Qualifications (Not required but helpful to have experience with 1 or more):
- Experience with container orchestrators (Kubernetes, Docker Native Orchestration, Mesos, Docker Swarm).
- Experience with configuration management tools such as Puppet or Chef
- Experience with VMware products, specifically cloud-related solutions such as vSphere, vCenter, ESXi, and vSAN, or with competing cloud solutions and products.
- Experience with one or more of the following: Data Engineering, A/B testing
- Operational experience with networking (WAN or LAN) and an understanding of network theory
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols
- You are able to document and version APIs
This position is eligible for the DiversifyCPBU referral campaign