MLOps Expert

AI

Anywhere (Remote, UTC+8)

Contract

Job posted:

Jun 16, 2025

About Wubble

Wubble is a pioneering Music AI platform redefining the future of music generation. Our proprietary foundation model pushes the boundaries of AI-driven music creation. We’re proud to work with top-tier global clients, including Disney, Starbucks, Microsoft, HP, and more. Backed by industry giants like Antler, Google, NVIDIA, and others, we are ranked as one of the top 5 startups in Asia!

Role Overview

We are looking for a Contract MLOps Expert with deep experience in Docker, Kubernetes, and scalable inference architectures to enable concurrent inference requests on our advanced music generation model. This is a high-impact role where you will directly influence the scalability and performance of our platform.

Key Responsibilities

  1. Autoscaling Optimization on Managed Kubernetes

    • Design and implement efficient horizontal and vertical autoscaling strategies on a managed Kubernetes service tailored to ML workloads.

    • Monitor and tune resource limits, requests, and scaling policies to ensure high availability, cost-efficiency, and low latency.
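To give a concrete flavor of this responsibility (the names, replica counts, and thresholds below are purely illustrative, not Wubble's actual configuration), a horizontal autoscaling policy for a GPU-backed inference service on managed Kubernetes might be sketched as:

```yaml
# Hypothetical HorizontalPodAutoscaler for a GPU inference deployment.
# All names and numeric targets are illustrative placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: music-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: music-inference
  minReplicas: 2          # keep headroom for latency-sensitive traffic
  maxReplicas: 20         # cap cost exposure under traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before pods saturate
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # avoid thrashing on bursty load
```

In practice, GPU workloads often scale on custom or external metrics (e.g., queue depth or requests in flight) rather than CPU utilization; tuning that choice is exactly the kind of work this role involves.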

  2. Performance-Aware Infrastructure Design

    • Analyze service performance characteristics and align autoscaling policies with GPU/CPU usage patterns, memory constraints, and traffic variability.

    • Introduce node pool optimizations (e.g., GPU vs CPU, spot/preemptible nodes) for balanced cost and performance.

  3. MLOps & CI/CD Integration

    • Collaborate with engineering teams to ensure that autoscaling mechanisms integrate smoothly into existing MLOps pipelines and deployment workflows.

    • Support rollout strategies (e.g., canary, blue-green) for ML model updates and infrastructure changes.

  4. Observability & Benchmarking

    • Set up and maintain monitoring dashboards (e.g., Prometheus, Grafana, Cloud Monitoring) to visualize and alert on key autoscaling metrics.

    • Design and execute stress tests and load simulations to validate autoscaler behavior under production-like conditions.

  5. Documentation & Enablement

    • Write clear and concise documentation on autoscaling architecture, configuration guidelines, and operational runbooks.

    • Facilitate knowledge transfer and training sessions for engineering and DevOps teams.

Qualifications

  • GPU & CUDA Optimization

    • Deep understanding of GPU concurrency, memory hierarchy, and multi-stream execution for efficient resource utilization.

  • MLOps & Scalable Infrastructure

    • Hands-on experience with MLOps pipelines, containerization (Docker), and orchestration using Kubernetes (preferably managed services like GKE, EKS, or AKS).

    • Familiarity with infrastructure-as-code tools (e.g., Terraform, Helm) and deployment automation.

  • Model Serving & Inference Optimization

    • Experience deploying and optimizing high-throughput ML inference pipelines, including batching, quantization, or model parallelism.

    • Strong understanding of resource scaling patterns for real-time and batch workloads.

  • Monitoring, Troubleshooting & Communication

    • Proficient in using observability tools (e.g., Prometheus, Grafana, Cloud Monitoring) to identify performance bottlenecks and optimize autoscaling behavior.

    • Clear communicator, capable of articulating complex technical decisions across teams and stakeholders.

  • Bonus Points

    • Experience with NVIDIA H100 or other next-gen GPU architectures in production.

    • Proven experience writing, profiling, and optimizing CUDA kernels for production workloads.

    • Familiarity with generative models (e.g., LLMs, MusicGen) and streaming inference in media-rich applications.

Why join Wubble?

  • Elite Client Portfolio: Work on solutions that power the creative experiences of world-renowned brands like Disney, Starbucks, Microsoft, and HP.

  • Top-Tier Backing: We’re supported by industry leaders such as Antler, Google, and NVIDIA, offering you the opportunity to collaborate with a well-funded and visionary team.

  • Cutting-Edge Tech: Contribute to an advanced foundation model pushing the envelope in AI-driven music generation.

  • High Impact, High Reward: Your expertise will directly shape the performance and scalability of a groundbreaking platform—this is not your everyday startup gig.

  • Remote Collaboration: Enjoy the flexibility of a fully remote contract, enabling you to collaborate from anywhere while tackling exciting, high-profile challenges.

Contract Details

  • Contract Type: Contract / Consultancy

  • Duration: 1 month, with potential extension based on project needs

  • Compensation: Competitive rate, commensurate with experience


To apply, send us your resume, portfolio, and a brief cover letter highlighting relevant CUDA optimization and MLOps experience using the button below with the subject line: "MLOps Expert – CUDA Kernel Optimization – [Your Full Name]".