About Wubble
Wubble is a pioneering Music AI platform redefining the future of music generation. Our proprietary foundation model pushes the boundaries of AI-driven music creation. We’re proud to work with top-tier global clients, including Disney, Starbucks, Microsoft, HP, and more. Backed by industry giants like Antler, Google, NVIDIA, and others, we are ranked as one of the top 5 startups in Asia!
Role Overview
We are looking for a Contract MLOps Expert with deep experience in Docker, Kubernetes, and scalable inference architectures to enable concurrent inference requests against our advanced music generation model. This is a high-impact role where you will directly influence the scalability and performance of our platform.
Key Responsibilities
Autoscaling Optimization on Managed Kubernetes
Design and implement efficient horizontal and vertical autoscaling strategies on a managed Kubernetes service tailored to ML workloads.
Monitor and tune resource limits, requests, and scaling policies to ensure high availability, cost-efficiency, and low latency.
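To give a flavor of this work, here is a minimal sketch using the official kubernetes Python client to create an autoscaling/v2 HorizontalPodAutoscaler for an inference Deployment. The deployment name, namespace, and utilization target are illustrative assumptions, not Wubble's actual configuration.
```python
# Minimal sketch: create an autoscaling/v2 HPA for a hypothetical
# "music-infer" Deployment. Names, namespace, and targets are
# illustrative assumptions, not Wubble's actual configuration.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a pod

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="music-infer-hpa", namespace="serving"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="music-infer",
        ),
        min_replicas=2,   # keep headroom for latency-sensitive traffic
        max_replicas=20,  # cap spend under bursty load
        metrics=[
            client.V2MetricSpec(
                type="Resource",
                resource=client.V2ResourceMetricSource(
                    name="cpu",
                    target=client.V2MetricTarget(
                        type="Utilization", average_utilization=60,
                    ),
                ),
            ),
        ],
    ),
)

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="serving", body=hpa,
)
```
In practice GPU-bound inference often scales on custom or external metrics (e.g., queue depth or GPU utilization) rather than CPU; the CPU target above is only the simplest starting point.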
Performance-Aware Infrastructure Design
Analyze service performance characteristics and align autoscaling policies with GPU/CPU usage patterns, memory constraints, and traffic variability.
Introduce node pool optimizations (e.g., GPU vs. CPU pools, spot/preemptible nodes) to balance cost and performance.
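As one hedged illustration of the node pool point, the sketch below pins an inference pod to a spot GPU pool using the standard GKE spot label and a GPU taint toleration; the pool name, image, and resource sizes are assumptions for illustration.
```python
# Sketch: pin an inference pod to a spot GPU node pool (GKE-style
# labels/taints; pool name, image, and resource sizes are assumptions).
from kubernetes import client

pod_spec = client.V1PodSpec(
    node_selector={
        "cloud.google.com/gke-spot": "true",          # standard GKE spot label
        "cloud.google.com/gke-nodepool": "gpu-spot",  # assumed pool name
    },
    tolerations=[
        client.V1Toleration(
            key="nvidia.com/gpu", operator="Exists", effect="NoSchedule",
        ),
    ],
    containers=[
        client.V1Container(
            name="music-infer",
            image="registry.example.com/music-infer:latest",  # placeholder
            resources=client.V1ResourceRequirements(
                requests={"cpu": "2", "memory": "8Gi", "nvidia.com/gpu": "1"},
                limits={"memory": "8Gi", "nvidia.com/gpu": "1"},
            ),
        ),
    ],
)
```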
MLOps & CI/CD Integration
Collaborate with engineering teams to ensure that autoscaling mechanisms integrate smoothly into existing MLOps pipelines and deployment workflows.
Support rollout strategies (e.g., canary, blue-green) for ML model updates and infrastructure changes.
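One common shape for canary support is a promotion gate that watches a health metric before traffic shifts further. Below is a hedged sketch that polls a Prometheus HTTP API for the canary's error rate; the endpoint, metric names, and threshold are all assumed for illustration.
```python
# Sketch of a canary promotion gate: watch the canary's error rate in
# Prometheus and decide whether to promote. Endpoint, query, and
# threshold are illustrative assumptions.
import time
import requests

PROM_URL = "http://prometheus.monitoring:9090/api/v1/query"  # assumed
ERROR_RATE_QUERY = (
    'sum(rate(http_requests_total{deployment="music-infer-canary",'
    'status=~"5.."}[5m]))'
    ' / sum(rate(http_requests_total{deployment="music-infer-canary"}[5m]))'
)

def canary_error_rate() -> float:
    resp = requests.get(PROM_URL, params={"query": ERROR_RATE_QUERY}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0

def gate(observation_minutes: int = 15, threshold: float = 0.01) -> bool:
    """Return True (safe to promote) if the error rate stays under threshold."""
    for _ in range(observation_minutes):
        if canary_error_rate() > threshold:
            return False  # roll back: canary is unhealthy
        time.sleep(60)
    return True

if __name__ == "__main__":
    print("promote" if gate() else "rollback")
```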
Observability & Benchmarking
Set up and maintain monitoring dashboards (e.g., Prometheus, Grafana, Cloud Monitoring) to visualize and alert on key autoscaling metrics.
Design and execute stress tests and load simulations to validate autoscaler behavior under production-like conditions.
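A minimal load-simulation sketch might look like the following: fire concurrent requests at an HTTP inference endpoint with asyncio and report latency percentiles while the autoscaler reacts. The URL and payload are placeholders, not a real Wubble endpoint.
```python
# Minimal load-simulation sketch: fire concurrent requests at an
# inference endpoint and report latency percentiles. The URL and
# payload are placeholders.
import asyncio
import statistics
import time

import aiohttp

URL = "http://music-infer.serving.svc:8080/generate"  # assumed endpoint
PAYLOAD = {"prompt": "upbeat synthwave", "duration_s": 10}

async def one_request(session: aiohttp.ClientSession) -> float:
    start = time.perf_counter()
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()
    return time.perf_counter() - start

async def main(concurrency: int = 50, total: int = 500) -> None:
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests

    async def bounded(session):
        async with sem:
            return await one_request(session)

    async with aiohttp.ClientSession() as session:
        latencies = await asyncio.gather(*[bounded(session) for _ in range(total)])

    qs = statistics.quantiles(sorted(latencies), n=100)
    print(f"p50={qs[49]:.3f}s p95={qs[94]:.3f}s max={max(latencies):.3f}s")

asyncio.run(main())
```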
Documentation & Enablement
Write clear and concise documentation on autoscaling architecture, configuration guidelines, and operational runbooks.
Facilitate knowledge transfer and training sessions for engineering and DevOps teams.
Qualifications
GPU & CUDA Optimization
Deep understanding of GPU concurrency, memory hierarchy, and multi-stream execution for efficient resource utilization.
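By way of illustration, the multi-stream execution referred to here looks like the sketch below: independent work submitted on separate CUDA streams so kernels can overlap. PyTorch is one assumed toolchain; the tensor sizes are arbitrary.
```python
# Sketch: overlap two independent batches on separate CUDA streams so
# kernel execution can interleave. PyTorch is an assumed toolchain;
# sizes are arbitrary.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

with torch.cuda.stream(s1):
    out1 = a @ a  # runs on stream s1

with torch.cuda.stream(s2):
    out2 = b @ b  # may overlap with s1's work

torch.cuda.synchronize()  # wait for both streams before using results
print(out1.shape, out2.shape)
```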
MLOps & Scalable Infrastructure
Hands-on experience with MLOps pipelines, containerization (Docker), and orchestration using Kubernetes (preferably managed services like GKE, EKS, or AKS).
Familiarity with infrastructure-as-code tools (e.g., Terraform, Helm) and deployment automation.
Model Serving & Inference Optimization
Experience deploying and optimizing high-throughput ML inference pipelines, including batching, quantization, or model parallelism.
Strong understanding of resource scaling patterns for real-time and batch workloads.
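As one example of the batching techniques in view here, a minimal dynamic-batching loop (queue wiring, request shape, and timings all assumed) collects requests until a size or latency budget is hit, then runs them as a single forward pass:
```python
# Sketch of dynamic batching for inference: drain a queue until the
# batch is full or a latency budget expires, then run one forward pass.
# The queue wiring, request dicts, and run_model call are assumptions.
import queue
import time

MAX_BATCH = 8
MAX_WAIT_S = 0.02  # latency budget for forming a batch

def batch_loop(requests: "queue.Queue", run_model) -> None:
    while True:
        first = requests.get()  # block until work arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(requests.get(timeout=remaining))
            except queue.Empty:
                break
        outputs = run_model([r["input"] for r in batch])  # one forward pass
        for req, out in zip(batch, outputs):
            req["reply"](out)  # hand each result back to its caller
```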
Monitoring, Troubleshooting & Communication
Proficient in using observability tools (e.g., Prometheus, Grafana, Cloud Monitoring) to identify performance bottlenecks and optimize autoscaling behavior.
Clear communicator, capable of articulating complex technical decisions across teams and stakeholders.
Bonus Points
Experience with NVIDIA H100 or other next-gen GPU architectures in production.
Proven experience writing, profiling, and optimizing CUDA kernels for production workloads.
Familiarity with generative models (e.g., LLMs, MusicGen) and streaming inference in media-rich applications.
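For the streaming-inference point, one hedged sketch of serving audio as it is produced rather than after the full clip is rendered (FastAPI is an arbitrary framework choice, and generate_chunks is a placeholder for the model):
```python
# Sketch: stream generated audio chunks to the client as they are
# produced instead of waiting for the full clip. FastAPI is an assumed
# framework choice; generate_chunks() is a placeholder for the model.
from typing import Iterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def generate_chunks(prompt: str) -> Iterator[bytes]:
    """Placeholder: yield raw audio bytes chunk by chunk from the model."""
    for _ in range(10):
        yield b"\x00" * 4096  # stand-in for a real PCM/Opus chunk

@app.get("/stream")
def stream(prompt: str) -> StreamingResponse:
    return StreamingResponse(generate_chunks(prompt), media_type="audio/wav")
```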
Why join Wubble?
Elite Client Portfolio: Work on solutions that power the creative experiences of world-renowned brands like Disney, Starbucks, Microsoft, and HP.
Top-Tier Backing: We’re supported by industry leaders such as Antler, Google, and NVIDIA, offering you the opportunity to collaborate with a well-funded and visionary team.
Cutting-Edge Tech: Contribute to an advanced foundation model pushing the envelope in AI-driven music generation.
High Impact, High Reward: Your expertise will directly shape the performance and scalability of a groundbreaking platform—this is not your everyday startup gig.
Remote Collaboration: Enjoy the flexibility of a fully remote contract, enabling you to collaborate from anywhere while tackling exciting, high-profile challenges.
Contract Details
Contract Type: Contract / Consultancy
Duration: 1 month, with potential extension based on project needs
Compensation: Competitive rate, commensurate with experience
To apply, send us your resume, portfolio, and a brief cover letter highlighting relevant CUDA optimization and MLOps experience using the button below, with the subject line: "MLOps Expert – CUDA Kernel Optimization – [Your Full Name]".