Job Description
Position- SRE Engineer
Location- Bellevue, WA and Overland Park, KS
Required Qualification:
- Bachelor’s degree or foreign equivalent required from an accredited institution. Will also consider three years of progressive experience in the specialty in lieu of every year of education.
- At least 2 years of Information Technology experience.
- SRE mindset in production support: Proactive issue identification using observability tools.
- Skilled in using different monitoring & observability tools to track system performance
- Incident commander: Ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
- Experience in Splunk (including Splunk APM and Splunk O11y) and AppDynamics
- Experience in DB, Network, Linux / Unix, Kubernetes
- Experience in APM, nmon, and Wireshark usage and analysis
Preferred Qualification:
- Knowledge of Grafana, RedMetrics, ThousandEyes
- Knowledge of VMs, Load balancers, Firewalls, API Gateways
- Knowledge of Containerization, Docker, AWS, PCF, GCP, ServiceNow (including AIOps, tools for Self-Heal and automated playbooks)
- Experience in UEM and synthetic monitoring tools
- System Administration: Strong knowledge of infrastructure, including command-line tools and system internals (Kubernetes triage, Linux administration).
- Networking: Understanding of network protocols, configurations, and troubleshooting (nmon, Wireshark).
- Cloud Computing: Understanding of cloud architecture (on-prem and public) and cloud services (AWS and Azure).
- Application Management: Familiarity with continuous integration and continuous deployment processes and tools.
- Advanced programming knowledge: Experience triaging issues in application code (Java, Python).
- DB troubleshooting: Familiarity with troubleshooting issues in traditional and NoSQL databases (e.g., Oracle, SQL Server, MySQL, MongoDB, Cassandra).
- Monitoring and Observability: Skills in using monitoring tools to track system performance and detect issues across all backend systems, databases, and APIs (Splunk, AppDynamics, Splunk O11y, OpenTelemetry).
- Problem-Solving: Ability to diagnose and resolve complex issues quickly and efficiently
- Collaboration: Strong communication skills to work effectively with cross-functional teams
- Adaptability: Flexibility to handle changing priorities and technologies
- Attention to Detail: Precision in managing configurations and deployments to avoid errors
- Communication: Excellent communicator who can interact with leadership at the Director/Sr. Director level and above.
- Production support activities including proactive identification of issues leveraging observability tools with the aim of reducing MTTD and MTTR
- Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs, correlating inputs from various dashboards and tools to drive resolution.
- Flexibility to work in a 24x7 environment
Job Tags
Permanent employment