Company: Granicus
Location: Remote
Closing Date: 19/06/2026
Hours: Full Time
Type: Permanent
Job Description:
- Provide production support in shifts according to the team's on-call roster
- Work on tickets raised by customers and by internal engineering and implementation teams when not on call for production support
- Monitor and Maintain Systems: Continuously monitor the health and performance of our services, systems, and infrastructure
- Automate Processes: Develop and maintain automation scripts and tools to streamline operations and reduce manual intervention
- Incident Management: Assist in troubleshooting and resolving incidents, performing root cause analysis, and implementing long-term fixes to prevent recurrence
- System Improvements: Participate in designing and implementing system improvements to enhance reliability, scalability, and performance
- Collaboration: Work closely with software engineers to understand application requirements, provide feedback on design and architecture, and support deployment and release processes
- Documentation: Create and maintain documentation for processes, procedures, and troubleshooting guides to ensure knowledge sharing within the team
- Capacity Planning: Assist in capacity planning activities to anticipate future needs and ensure that our infrastructure can handle growth
- Security: Implement and adhere to security best practices to protect our systems and data
Requirements:
- 5+ years of experience in site reliability engineering, system administration, or a similar role
- Good understanding of Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
- Experience with scripting languages such as Python, Bash, or Ruby
- Bachelor's or postgraduate degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- Familiarity with AI/ML operations, including model lifecycle management, vector databases, and inference performance tuning
- Expertise in Linux/Unix systems, networking, and cloud services (AWS, Azure, or Google Cloud)
- Proficiency in scripting languages (Python, Bash, Ruby) and programming languages (Go, Java, C++)
- Advanced knowledge of monitoring and logging tools (Elastic, Prometheus, Grafana, Splunk), configuration management tools (Ansible, Chef, Puppet), and CI/CD pipelines
- Strong analytical and problem-solving skills with the ability to diagnose and resolve complex issues efficiently
- Excellent verbal and written communication skills, with the ability to convey complex technical concepts to non-technical stakeholders
- Demonstrated ability to lead and mentor a team, drive projects to completion, and manage cross-functional initiatives
- Relevant certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning – Specialty, Google Cloud Professional DevOps Engineer, or similar, are a plus
Benefits:
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options