Platform Site Reliability Engineer

World Bank Group

New Delhi, India

Experience: 3 to 5 Years

Skill Required: Other

The rapid shift to hybrid cloud environments calls for a Site Reliability Engineer (SRE) with expertise in both legacy middleware systems (e.g., JBoss, Apache, WebSphere, IIS) and modern DevOps/DevSecOps pipelines (Terraform, Kubernetes, GitOps). This position focuses on ensuring operational continuity for enterprise platforms, stabilizing traditional systems while driving modernization through Infrastructure-as-Code (IaC) practices and automation. Acting as a bridge between legacy and cloud-native technologies, the SRE will implement robust SRE practices such as observability, error budgets, and incident response to maintain a highly reliable environment with minimal downtime. The SRE will also lead knowledge-transfer efforts to prevent single points of failure and will optimize platform performance through auto-remediation playbooks and chaos engineering. Through these practices, the role will build resilience, automate standards, and contribute to the success of a future-proof Platform-as-a-Product strategy.

Required:

Technical Proficiency & Cognitive Skills:

  • Experience as a Site Reliability Engineer with hands-on knowledge of Site Reliability Engineering (SRE) practices and principles, including implementing and managing SLOs, error budgets, observability, incident response, and automation in high-availability environments.
  • Proven experience with legacy middleware (JBoss, Apache, WebSphere, IIS) and modern stacks (.NET, Java, NodeJS, Angular).
  • Strong knowledge of and working experience with multi-cloud platforms (AWS, Azure, GCP).
  • Strong database skills (PostgreSQL, MySQL, and other RDBMS/NoSQL systems).
  • Proficiency in DevOps/DevSecOps tools (Terraform, Kubernetes, GitOps, Chef, CI/CD, GitHub/GitLab/Azure Repos).
  • Experience with containerization (Docker, Kubernetes/AKS) and web service management.
  • Strong scripting/automation skills (e.g., Python, PowerShell, Bash).
  • Experience with monitoring/observability tools (e.g., Splunk, Prometheus, Grafana).
  • Experience setting up and managing PaaS and COTS solutions.
  • Experience working in Agile environments, with a strong understanding of Agile principles and practices. Exposure to the Scaled Agile Framework (SAFe) is highly desirable.

Selection Criteria:

  • Bachelor’s or Master’s degree with at least 4-5 years of relevant experience.
  • Experience putting Site Reliability Engineering practices to work. An SRE certification is mandatory.
  • Experience working in Agile environments; a SAFe Agile certification is mandatory.
  • Strong experience configuring and supporting .NET, Java, NodeJS, and Angular applications.
  • Good understanding of multiple middleware technologies and of hosting custom COTS products.
  • Experience with Azure DevOps (as both developer and administrator).
  • Solid knowledge of modern DevOps practices, including CI/CD, Git, Docker, and Kubernetes.
  • Familiarity with artifact repository solutions (e.g., JFrog Artifactory).
  • Experience with Infrastructure-as-Code tools (Terraform, Chef, etc.).
  • Knowledge of Azure AD authentication and authorization.
  • Proficiency with monitoring tools, including Splunk.
  • Demonstrated experience working in Agile environments.
  • Hands-on experience with AWS and Azure cloud services. An Azure or AWS cloud certification is an added advantage.

Source: https://worldbankgroup.csod.com/ux/ats/careersite/1/home/requisition/33873?c=worldbankgroup