Site Reliability Engineer - VMC on AWS
Palo Alto, California
Job ID R1902663
Date posted Feb. 26, 2019
Ensure that VMware Cloud on AWS operates with high reliability and performance at scale for our customers. The VMC on AWS Site Reliability Engineering team is looking for quality SREs with a diverse set of experiences and skill sets to run the exciting new VMware Cloud on AWS services. As an SDDCaaS SRE, you will provide service insight, response, and service management to maintain high service reliability with low touch through extensible services/platforms, standardized processes, data insights, and product input.
- Maintain availability of VMware's global services platform
- Work closely with software engineering teams to improve availability of services
- Handle seamless upgrades of infrastructure and services through automation
- Identify, gather, analyze and automate responses to key performance metrics, logs, and alerts
- Ensure infrastructure security compliance
- Conduct post-mortems to analyze and prevent repeat failures
- Perform periodic on-call duties as needed
- BS in Computer Science or related technical field, or equivalent industry experience
- Domain-level understanding of Public/Private Cloud Infrastructure & Networking
- Experience operating, troubleshooting, and scaling online services
- Strong communication and interpersonal skills
- Systematic problem-solving approach coupled with a strong sense of ownership and drive
- Professional and open-minded attitude
- Experience administering Linux systems in a production environment
- On-call support for high-priority incidents
- Operational experience with networking (WAN or LAN) and an understanding of network theory
- Proficient in a common scripting language: Bash, Python, Go, Ruby, etc.
- Experience with VMware products: vSphere, vCenter, ESX/ESXi, vSAN, and NSX
- Experience with modern container orchestration systems: Kubernetes, Mesos, DC/OS, Swarm
- Experience with infrastructure configuration and automation processes and tools: Terraform, Puppet, Ansible, Chef, Fabric
- Experience with security in the cloud: intrusion detection, penetration testing, and vulnerability scanning
- Experience with monitoring solutions: ELK, Splunk, Sumo Logic, Nagios, Prometheus
- Experience with Atlassian JIRA Service Desk and PagerDuty
- Experience with Change Management processes and functions
- Experience with various data technologies including relational and non-relational databases and message queues
- Good working knowledge of the build automation and continuous integration/delivery ecosystem: Git, Gerrit, Maven/Gradle, Jenkins, Docker, Nexus, Artifactory, Selenium
VMware is a global leader in cloud infrastructure and business mobility. Built on VMware's industry-leading virtualization technology, our solutions deliver a new model of IT that is fluid, instant and more secure. Customers can innovate faster by rapidly developing, automatically delivering and more safely consuming any application. With 2015 revenues of $6.6 billion, VMware has more than 500,000 customers, more than 75,000 partners, and 19,000+ employees in 120+ locations around the world. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Do you dare to do the stuff you've always dreamed about? Dare to explore at careers.vmware.com.
This position is eligible for the DiversifyCPBU referral campaign