Job Description

The rapid shift to hybrid cloud environments calls for a Site Reliability Engineer (SRE) with expertise in both legacy middleware systems (e.g., JBoss, Apache, WebSphere, IIS) and modern DevOps/DevSecOps pipelines (Terraform, Kubernetes, GitOps). This position focuses on ensuring operational continuity for enterprise platforms by stabilizing traditional systems while driving modernization through Infrastructure-as-Code (IaC) practices and automation. As a bridge between legacy and cloud-native technologies, this role will implement robust SRE practices such as observability, error budgets, and incident response to maintain a highly reliable environment with minimal downtime. Additionally, the SRE will lead knowledge transfer efforts to prevent single points of failure and optimize platform performance through auto-remediation playbooks and chaos engineering. In doing so, the position will help automate standards and contribute to the success of a future-proof Platform as a Product strategy.

Required:

Technical Proficiency & Cognitive Skills:

- Experience as a Site Reliability Engineer with hands-on knowledge of SRE practices and principles, including implementing and managing SLOs, error budgets, observability, incident response, and automation in high-availability environments.
- Proven experience with legacy middleware (JBoss, Apache, WebSphere, IIS) and modern stacks (.NET, Java, NodeJS, Angular).
- Strong knowledge of and working experience with multi-cloud platforms (AWS, Azure, GCP).
- Strong database skills (PostgreSQL, MySQL, other RDBMS/NoSQL).
- Proficiency in DevOps/DevSecOps tools (Terraform, Kubernetes, GitOps, Chef, CI/CD, GitHub/GitLab/Azure Repos).
- Experience with containerization (Docker, Kubernetes/AKS) and web service management.
- Strong scripting/automation skills (Python, PowerShell, Bash, etc.).
- Experience with monitoring/observability tools (Splunk, Prometheus, Grafana, etc.).
- Experience in setting up and managing PaaS and COTS solutions.
- Experience working in Agile environments, with a strong understanding of Agile principles and practices. Exposure to the Scaled Agile Framework (SAFe) is highly desirable.

Selection Criteria:

* Bachelor’s or Master’s degree with at least 4-5 years of relevant experience.
* Experience applying Site Reliability Engineering practices at work. An SRE certification is mandatory.
* Experience working in Agile environments. A SAFe Agile certification is mandatory.
* Strong experience configuring and supporting .NET, Java, NodeJS, and Angular applications.
* Good understanding of multiple middleware technologies and the hosting of custom COTS products.
* Experience with Azure DevOps (as both developer and administrator).
* Solid knowledge of modern DevOps practices, including CI/CD, Git, Docker, and Kubernetes.
* Familiarity with Artifactory solutions (e.g., JFrog).
* Experience with Infrastructure as Code tools (Terraform, Chef, etc.).
* Knowledge of Azure AD authentication and authorization.
* Proficient with monitoring tools and Splunk.
* Demonstrated experience working in Agile environments.
* Hands-on experience with AWS and Azure cloud services. A cloud certification in Azure or AWS is an added advantage.

Source: https://worldbankgroup.csod.com/ux/ats/careersite/1/home/requisition/33873?c=worldbankgroup

Platform Site Reliability Engineer at World Bank Group
