Full-stack Developer | Front-End Developers | Back-End Developers
Agile | AWS | Ruby | Go | Python
We are looking for Site Reliability Engineers (SREs) to keep all user-facing services and other production systems running smoothly. SREs ensure that iPrice's services are reliable, have uptime appropriate to users' needs, and improve at a fast rate. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
You’ll have the opportunity to manage the complex challenges of scale which are unique to iPrice, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
If you are a blend of pragmatic operator and software craftsperson who applies sound engineering principles, operational discipline, and mature automation to our environments and codebase, then you are the right person!
- Engage in and improve the whole lifecycle of services - from inception and design, through to deployment, operation and refinement.
- Collaborate with engineering teams on their infrastructure needs, and advise them throughout the development lifecycle.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, within our Service Level Objectives.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless post-mortems.
- Debug production issues across services, databases and levels of the stack.
- Design, develop and manage monitoring tools that provide performance dashboards and alerts, and that collect the data required to proactively identify issues and/or recommend improvements.
- A Bachelor's Degree/Diploma in Computer Science, Information Technology or a related subject.
- Minimum of 7 years of experience in provisioning environments, deploying applications, and maintaining infrastructures.
- Professional experience using Python, Go, or Ruby.
- Strong familiarity with deployment automation/configuration management tools like Chef, Ansible, Puppet, or Terraform.
- Possess experience with cloud environments – AWS, GCP or Azure.
- Have extensive experience building scalable platforms leveraging containers in a production environment.
- Great to have: experience operating distributed data storage systems at scale, especially Elasticsearch and SQL Azure.
- Have experience with logging and telemetry services.
- Solid knowledge of continuous integration, continuous delivery, automated testing and all phases of the software development lifecycle.
- Experience working in an agile, multicultural environment across many Scrum teams at the same time.
- A Kaizen mindset: a spirit of continuous personal improvement and a habit of staying up to date with the latest technology trends.
- Ability to identify problems before they happen and implement solutions that detect and prevent outages.
- Expertise in designing, analysing and troubleshooting large-scale distributed systems.
- Ability to debug, optimize code and automate routine tasks.
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
- Understanding of CI/CD principles, Linux fundamentals, networking concepts and Internet protocols.