Role and Responsibilities:

The SRE and DevOps Section Head leads the DevOps and SRE teams and manages the delivery of high-quality, client-facing services within a managed services model. The role oversees all DevOps and observability practices, develops strategic frameworks, and aligns both internal processes and customer-facing solutions with industry best practices. The Section Head ensures that the teams deliver resilient, scalable, and high-performing infrastructure and applications, with a focus on customer satisfaction, operational efficiency, and proactive service delivery.

Key responsibilities include:

1. Leadership and Strategic Planning
• Develop and drive the strategic roadmap for both DevOps and SRE practices, aligning services with client requirements and industry standards within the managed services framework.
• Oversee and mentor the DevOps and SRE/Observability teams, promoting a culture of proactive support, innovation, and reliability in line with managed services expectations.
• Collaborate with service delivery and client account teams to understand client objectives, propose solutions, and ensure that services integrate seamlessly with client business goals.

2. Project Delivery and Client Engagement
• Manage the successful delivery of DevOps and SRE projects for multiple clients, ensuring adherence to the timelines, service-level agreements (SLAs), and quality standards that are critical in a managed services environment.
• Work with account managers to prepare client-specific proposals, define project scopes, and establish clear success criteria.
• Act as the primary escalation point for client concerns related to DevOps and SRE services, working with internal teams to resolve issues promptly and maintain high customer satisfaction.

3. Observability Platform Management and Service Reliability
• Lead the management and configuration of observability platforms for client projects, ensuring high system visibility and reliability across all services as required by managed service agreements.
• Guide teams in implementing best practices in monitoring, alerting, and incident management using tools like IBM Instana, Dynatrace, and Splunk.
• Establish guidelines for setting up Service Level Objectives (SLOs) and Service Level Indicators (SLIs) tailored to client requirements, ensuring the operational stability of client environments.

4. DevOps and CI/CD Best Practices Implementation
• Oversee the design and deployment of CI/CD pipelines for clients, focusing on automation, quality, and scalability using tools like Jenkins, GitLab, and Terraform.
• Define and standardize Infrastructure as Code (IaC) practices to enable scalable, secure, and reliable infrastructure for client environments.
• Collaborate with client development teams to optimize deployment processes, reduce cycle times, and deliver continuous improvements aligned with managed services standards.

5. Incident Response, RCA, and Preventative Measures
• Supervise the development of incident response frameworks to ensure efficient detection, triage, and resolution of incidents for client projects.
• Oversee Root Cause Analysis (RCA) efforts to prevent recurring issues, collaborating with clients to implement proactive measures and improve system resilience.
• Ensure that all client-facing observability and incident response documentation adheres to high standards and is readily accessible as part of managed service obligations.

6. Security and Compliance in DevOps and SRE Operations
• Guide the integration of security controls and best practices within DevOps and SRE pipelines, ensuring client infrastructure meets compliance standards.
• Oversee the implementation of DevSecOps practices across client environments, embedding security at each stage of the development lifecycle.
• Monitor adherence to regulatory and industry standards in client engagements, ensuring that solutions meet or exceed compliance requirements.

7. Performance Management and Continuous Improvement
• Establish KPIs that reflect team performance, client satisfaction, and SLA adherence within the managed services context.
• Conduct regular performance reviews with team members, setting goals that align with client expectations and company objectives.
• Identify and implement process improvements and new technologies that enhance service efficiency, quality, and value for clients.

________________________________________

Qualifications and Education Requirements:

• Education: Bachelor’s degree in Computer Science, Information Technology, or a related field.
• Certifications: Certifications in DevOps, SRE, and at least one APM tool (e.g., IBM Instana, Dynatrace, AppDynamics, or Splunk) are preferred.
• Technical Skills:
  o Strong expertise in APM tools in general, with a focus on IBM Instana (preferred) and its application in dynamic environments such as microservices, containers, and Kubernetes.
  o Knowledge of SRE practices, including automation, monitoring strategies, SLO/SLI definitions, and service reliability improvements.
  o Proven ability to design integrated monitoring solutions by connecting APM tools to existing systems and workflows.
  o Experience in scripting and automating observability tasks using Python, Ansible, or other automation tools.
• Professional Experience:
  o 7-10 years of experience in DevOps, SRE, or related roles, with at least 3 years in a leadership capacity.
  o Demonstrated success in managing client-facing DevOps and SRE services, preferably within a managed services model.
  o Strong experience in delivering large-scale, complex projects across cloud and on-premises environments for multiple clients.
• Soft Skills:
  o Leadership and Mentorship: Ability to build, lead, and inspire high-performing teams, promoting a culture of collaboration and accountability.
  o Client-Centric Communication: Excellent communication skills, with the ability to articulate technical concepts to clients and collaborate effectively with stakeholders.
  o Analytical Thinking and Problem-Solving: Strong analytical skills for diagnosing complex technical issues and implementing innovative solutions.