We are recruiting on behalf of a leading IT services provider. They are seeking a Cloud MLOps Consultant.
On offer is a long-term opportunity to work with a large European institution that will offer training and development in addition to a competitive rate.
Job type - Contract
Duration - 1 year + extensions (long-term project)
Location - Fully remote from anywhere in Europe
Start Date - ASAP
- Fluent English communication skills.
- MLOps SRE/MLRE (Site Reliability Engineer/ML Reliability Engineer) - CNO (Cloud-native Operations) for MLOps, based upon CNO for DevOps, with knowledge of and hands-on experience with:
- Keeping the infrastructure healthy, ensuring the reliability of infrastructure, apps, services, databases
- Ensuring availability of inference services to products
- Following up on alerts
- Solving operational infrastructure-related issues (including Kubernetes-related issues and issues with the selected frameworks, such as Pachyderm and Kubeflow)
- Linux, including standard shell scripting skills.
- GCP/Cloud knowledge
- Systems and software architecture concepts.
- Arranging and configuring the required infrastructure (incl. storage solutions, GPUs, memory)
- Installing, configuring and upgrading selected frameworks like Pachyderm and Kubeflow
- Configuring and maintaining monitoring dashboards and alerts
- Tracking the quality of predictions
- Setting up automated re-training and redeployment when a quality metric indicates it is needed
- Production systems monitoring. Troubleshooting and incident analysis (first / second level).
- Continuous Integration / Continuous Delivery platforms (Jenkins)
- Source code management systems (Git/SVN/Bitbucket)