Site Reliability Engineer

Location:

South Africa (fully remote)

Contract Type:

Full Time

Sector:

Infrastructure & Telecommunications

Salary:

£48,000.00 - £50,000.00 Annual

Reference No.

BBBH479980

Remote Job: Lead Site Reliability Engineer (GCP – Google Cloud) in South Africa

Hi,

I'm excited to share that one of our clients in the UK is hiring a Lead Site Reliability Engineer (GCP – Google Cloud) in South Africa. It's a fully remote role, and the job details are below. If you're interested, please send your CV to apply.

Title: Lead Site Reliability Engineer (GCP – Google Cloud)
Location: South Africa
Duration: Permanent, full-time
Job Type: Fully Remote

As the Lead SRE (GCP – Google Cloud), you will drive reliability and scalability across production environments by leading a high-performing SRE team and implementing robust monitoring, automation, and DevOps practices on Google Cloud Platform.

Business Value
  • Infrastructure Performance, Scaling & Optimization
  • Observability & Incident Management
  • Zero-Downtime Deployments & Rollback Reliability
  • Secret Management & IAM Risk Mitigation
  • Configuration Drift & Environment Parity
  • Application-Level Performance & Engineering Quality

Key Responsibilities
  • Own end-to-end system reliability, from cloud resource planning to code-level instrumentation.
  • Review and improve backend code for performance, resiliency, and observability (e.g., retries, timeouts, connection pools, logging).
  • Architect and scale multi-environment Kubernetes deployments (GKE preferred) for high availability and low drift.
  • Define and enforce zero-downtime deployment strategies (canary, blue-green, progressive delivery).
  • Collaborate with full-stack teams on release readiness, CI/CD quality gates, and infrastructure-aware feature rollouts.
  • Harden secret management, IAM policies, and privilege boundaries across apps and services.
  • Serve as a hands-on lead in incidents, root cause analysis, and long-term reliability improvements.
  • Write and review Terraform modules, Helm charts, or platform tooling (Bash/Python/Go) as needed.
  • Lead design reviews and cross-functional decisions that impact both product and platform reliability.

Requirements
  • 6+ years of experience across full-stack development, SRE, or platform engineering.
  • Proficiency in one or more backend stacks (e.g., Python/Django, Node/NestJS, Go, Java/Spring) and ability to review or contribute code.
  • Strong expertise in Kubernetes (GKE preferred) and Helm, with the ability to optimize, secure, and debug real-world workloads.
  • Strong command of Terraform and IaC workflows, ideally with Terraform Cloud and remote state strategy.
  • Solid understanding of GCP or a similar cloud provider (IAM, VPCs, Cloud SQL, networking, Secret Manager, monitoring).
  • Experience implementing or enforcing progressive delivery practices (ArgoCD, Flux, GitOps, CI/CD patterns).
  • Proven ability to improve system observability using tools like Datadog, Prometheus, OpenTelemetry.
  • Ability to “go deep” into an application repo, identify architectural flaws or infra misuse, and fix or guide others.
  • Calm under pressure and experienced in incident management and postmortem culture.

Tools and Expectations
  • Datadog: Monitor infrastructure health, capture service-level metrics, and reduce alert fatigue through high-signal thresholds.
  • PagerDuty: Own the incident management pipeline. Route alerts by severity and align with business SLAs.
  • GKE / Kubernetes: Improve cluster stability and workload isolation. Define auto-scaling configurations and tune for efficiency.
  • Helm / GitOps (ArgoCD/Flux): Validate release consistency across clusters. Monitor sync status and rollout safety.
  • Terraform Cloud: Support DR planning and detect infrastructure changes through state comparisons.
  • Cloud SQL / Cloudflare: Diagnose DB and networking issues. Monitor latency, enforce access patterns, and validate WAF usage.
  • Secret Management: Audit access to secrets, apply short-lived credentials, and define alerts for abnormal usage.
