About Wubble
Wubble is a pioneering Music AI platform redefining the future of music generation. Our proprietary foundation model pushes the boundaries of AI-driven music creation. We’re proud to work with top-tier global clients, including Disney, Starbucks, Microsoft, HP, and more. Backed by industry giants like Antler, Google, NVIDIA, and others, we are ranked as one of the top 5 startups in Asia!
Role Overview
We are looking for a Contract MLOps Expert with deep experience in Docker, Kubernetes, and scalable inference architectures to enable concurrent inference requests on our advanced music generation model. This is a high-impact role where you will directly influence the scalability and performance of our platform.
Key Responsibilities
- Autoscaling Optimization on Managed Kubernetes - Design and implement efficient horizontal and vertical autoscaling strategies, tailored to ML workloads, on a managed Kubernetes service (see the sketch below).
- Monitor and tune resource limits, requests, and scaling policies to ensure high availability, cost-efficiency, and low latency. 
 
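To give a flavor of the scope, here is a minimal sketch of creating an autoscaling/v2 HPA through the official Kubernetes Python client; the Deployment name, replica bounds, and CPU target are hypothetical placeholders, not our production values.

```python
# Minimal sketch: create an autoscaling/v2 HPA for a hypothetical
# "music-gen-inference" Deployment via the Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a cluster

hpa = client.V2HorizontalPodAutoscaler(
    api_version="autoscaling/v2",
    kind="HorizontalPodAutoscaler",
    metadata=client.V1ObjectMeta(name="music-gen-inference-hpa"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="music-gen-inference"
        ),
        min_replicas=2,   # warm capacity for latency-sensitive traffic
        max_replicas=20,  # cap spend on accelerator nodes
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=60
                    ),
                ),
            )
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```
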
- Performance-Aware Infrastructure Design - Analyze service performance characteristics and align autoscaling policies with GPU/CPU usage patterns, memory constraints, and traffic variability. 
- Introduce node pool optimizations (e.g., GPU vs CPU, spot/preemptible nodes) for balanced cost and performance (see the scheduling sketch below).
 
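For node-pool-aware scheduling, the pattern looks roughly like the pod spec below, which pins an inference container to a spot GPU pool. The label and taint names follow GKE conventions (EKS/AKS differ), and the image and resource figures are placeholders.

```python
# Sketch: a pod spec targeting a GKE-style spot GPU node pool.
from kubernetes import client

pod_spec = client.V1PodSpec(
    node_selector={"cloud.google.com/gke-spot": "true"},  # prefer cheap spot nodes
    tolerations=[
        client.V1Toleration(  # GKE taints spot nodes; tolerate it explicitly
            key="cloud.google.com/gke-spot", operator="Equal",
            value="true", effect="NoSchedule",
        ),
        client.V1Toleration(  # standard taint applied by the NVIDIA device plugin
            key="nvidia.com/gpu", operator="Exists", effect="NoSchedule",
        ),
    ],
    containers=[
        client.V1Container(
            name="inference",
            image="registry.example.com/music-gen:latest",  # hypothetical image
            resources=client.V1ResourceRequirements(
                requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                limits={"nvidia.com/gpu": "1"},  # GPUs are requested in whole units
            ),
        )
    ],
)
```
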
- MLOps & CI/CD Integration - Collaborate with engineering teams to ensure that autoscaling mechanisms integrate smoothly into existing MLOps pipelines and deployment workflows. 
- Support rollout strategies (e.g., canary, blue-green) for ML model updates and infrastructure changes (see the canary sketch below).
 
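As one simple illustration of a canary: run stable and canary Deployments behind the same Service selector and shift replica counts, so traffic splits roughly in proportion. The Deployment names below are hypothetical; in practice a service mesh or Argo Rollouts gives precise weights and metric-gated promotion.

```python
# Sketch: a crude replica-ratio canary. Both Deployments match the same
# Service selector, so the traffic split tracks the replica split.
import time

from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def set_replicas(name: str, replicas: int, namespace: str = "default") -> None:
    apps.patch_namespaced_deployment_scale(
        name, namespace, {"spec": {"replicas": replicas}}
    )

TOTAL = 10
for canary in (1, 3, 5, 10):      # ~10% -> 30% -> 50% -> 100%
    set_replicas("music-gen-canary", canary)
    set_replicas("music-gen-stable", TOTAL - canary)
    time.sleep(300)               # hold each step; gate on metrics in practice
```
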
- Observability & Benchmarking - Set up and maintain monitoring dashboards (e.g., Prometheus, Grafana, Cloud Monitoring) to visualize and alert on key autoscaling metrics. 
- Design and execute stress tests and load simulations to validate autoscaler behavior under production-like conditions (see the load-test sketch below).
 
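For a sense of the benchmarking side, here is a small async load generator that fires concurrent requests and reports latency percentiles while the autoscaler reacts. The endpoint and payload are placeholders, and dedicated tools such as k6 or Locust are the better fit for sustained production-grade tests.

```python
# Sketch: a minimal concurrent load probe with latency percentiles.
import asyncio
import statistics
import time

import aiohttp

URL = "http://inference.example.com/generate"  # hypothetical endpoint
CONCURRENCY, REQUESTS = 50, 500

async def one(session: aiohttp.ClientSession, latencies: list) -> None:
    t0 = time.perf_counter()
    async with session.post(URL, json={"prompt": "lofi beat", "seconds": 10}) as r:
        await r.read()
    latencies.append(time.perf_counter() - t0)

async def main() -> None:
    latencies: list = []
    sem = asyncio.Semaphore(CONCURRENCY)  # cap in-flight requests

    async def bounded(session: aiohttp.ClientSession) -> None:
        async with sem:
            await one(session, latencies)

    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(bounded(session) for _ in range(REQUESTS)))

    latencies.sort()
    print(f"p50={statistics.median(latencies):.3f}s "
          f"p95={latencies[int(0.95 * len(latencies))]:.3f}s")

asyncio.run(main())
```
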
- Documentation & Enablement - Write clear and concise documentation on autoscaling architecture, configuration guidelines, and operational runbooks. 
- Facilitate knowledge transfer and training sessions for engineering and DevOps teams. 
 
Qualifications
- GPU & CUDA Optimization - Deep understanding of GPU concurrency, memory hierarchy, and multi-stream execution for efficient resource utilization (see the multi-stream sketch below).
 
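By multi-stream execution we mean, for example, overlapping independent forward passes on separate CUDA streams so the GPU stays busy across concurrent requests. A minimal PyTorch sketch, where the linear layer stands in for a real model:

```python
# Sketch: two independent inference passes enqueued on separate CUDA
# streams (PyTorch). Real overlap depends on kernel occupancy.
import torch

model = torch.nn.Linear(4096, 4096).cuda().eval()  # stand-in for a real model
a = torch.randn(64, 4096, device="cuda")
b = torch.randn(64, 4096, device="cuda")

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.no_grad():
    with torch.cuda.stream(s1):
        out1 = model(a)   # enqueued on stream 1
    with torch.cuda.stream(s2):
        out2 = model(b)   # enqueued on stream 2; may overlap with stream 1
torch.cuda.synchronize()  # wait for both streams before using the outputs
```
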
- MLOps & Scalable Infrastructure - Hands-on experience with MLOps pipelines, containerization (Docker), and orchestration using Kubernetes (preferably managed services like GKE, EKS, or AKS). 
- Familiarity with infrastructure-as-code tools (e.g., Terraform, Helm) and deployment automation. 
 
- Model Serving & Inference Optimization - Experience deploying and optimizing high-throughput ML inference pipelines, including batching, quantization, or model parallelism (see the batching sketch below).
- Strong understanding of resource scaling patterns for real-time and batch workloads. 
 
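Dynamic batching is the pattern we care about most here: queue incoming requests and flush them as one batched forward pass when the batch fills or a deadline passes, trading a small latency hit for much higher GPU throughput. A self-contained sketch, where `run_model` is a placeholder for the real inference call:

```python
# Sketch: dynamic (micro-)batching with asyncio.
import asyncio

MAX_BATCH, MAX_WAIT_S = 8, 0.02

async def run_model(prompts):        # placeholder for one batched forward pass
    await asyncio.sleep(0.05)        # simulate GPU work
    return [f"audio<{p}>" for p in prompts]

async def batcher(queue: asyncio.Queue) -> None:
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]              # block until the first request
        deadline = loop.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        outputs = await run_model([prompt for prompt, _ in batch])
        for (_, fut), out in zip(batch, outputs):
            fut.set_result(out)                  # wake each waiting caller

async def infer(queue: asyncio.Queue, prompt: str) -> str:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    asyncio.create_task(batcher(queue))
    results = await asyncio.gather(*(infer(queue, f"req{i}") for i in range(20)))
    print(results[:3])

asyncio.run(main())
```
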
- Monitoring, Troubleshooting & Communication - Proficient in using observability tools (e.g., Prometheus, Grafana, Cloud Monitoring) to identify performance bottlenecks and optimize autoscaling behavior. 
- Clear communicator, capable of articulating complex technical decisions across teams and stakeholders. 
 
- Bonus Points - Experience with NVIDIA H100 or other next-gen GPU architectures in production. 
- Proven experience writing, profiling, and optimizing CUDA kernels for production workloads. 
- Familiarity with generative models (e.g., LLMs, MusicGen) and streaming inference in media-rich applications. 
 
Why join Wubble?
- Elite Client Portfolio: Work on solutions that power the creative experiences of world-renowned brands like Disney, Starbucks, Microsoft, and HP. 
- Top-Tier Backing: We’re supported by industry leaders such as Antler, Google, and NVIDIA, offering you the opportunity to collaborate with a well-funded and visionary team. 
- Cutting-Edge Tech: Contribute to an advanced foundation model pushing the envelope in AI-driven music generation. 
- High Impact, High Reward: Your expertise will directly shape the performance and scalability of a groundbreaking platform; this is not your everyday startup gig. 
- Remote Collaboration: Enjoy the flexibility of a fully remote contract, enabling you to collaborate from anywhere while tackling exciting, high-profile challenges. 
Contract Details
- Contract Type: Contract / Consultancy 
- Duration: 1 month, with potential extension based on project needs 
- Compensation: Competitive rate, commensurate with experience 
To apply, please send your resume, portfolio, and a brief cover letter highlighting relevant CUDA optimization and MLOps experience, either by using the "Apply now" button below or by emailing support@wubble.ai (CC: sufi@wubble.ai, muhammad.abdullah@wubble.ai) with the subject line:
"MLOps Expert – CUDA Kernel Optimization – [Your Full Name]"