Site Reliability Engineer – Level 3

Company: Granicus
Location: Remote
Closing Date: 19/06/2026
Hours: Full Time
Type: Permanent

Job Description:

  • Provide production support on shifts according to the team's on-call roster
  • Work on tickets raised by customers and by internal engineering/implementation teams when not on call for production support
  • Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure
  • Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
  • Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
  • System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
  • Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
  • Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
  • Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
  • Security: Implement and adhere to security best practices to protect our systems and data

Requirements:

  • 5+ years of experience in site reliability engineering, system administration, or a similar role
  • Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
  • Experience with scripting languages such as Python, Bash, or Ruby
  • Bachelor's or postgraduate degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
  • Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning
  • Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
  • Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
  • Advanced knowledge of monitoring and logging tools (Elastic, Prometheus, Grafana, Splunk), configuration management tools (Ansible, Chef, Puppet), and CI/CD pipelines
  • Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
  • Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
  • Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
  • Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar are a plus

Benefits:

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options