MLOps Expert

AI

Anywhere (Remote, UTC+8)

Contract

Job posted:

Jun 16, 2025

About Wubble

Wubble is a pioneering Music AI platform redefining the future of music generation. Our proprietary foundation model pushes the boundaries of AI-driven music creation. We’re proud to work with top-tier global clients, including Disney, Starbucks, Microsoft, HP, and more. Backed by industry giants like Antler, Google, NVIDIA, and others, we are ranked as one of the top 5 startups in Asia!

Role Overview

We are looking for a Contract MLOps Expert with deep experience in Docker, Kubernetes, and scalable inference architectures to enable concurrent inference requests on our advanced music generation model. This is a high-impact role where you will directly influence the scalability and performance of our platform.

Key Responsibilities

  1. Autoscaling Optimization on Managed Kubernetes

    • Design and implement efficient horizontal and vertical autoscaling strategies on a managed Kubernetes service tailored to ML workloads.

    • Monitor and tune resource limits, requests, and scaling policies to ensure high availability, cost-efficiency, and low latency.
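To give a concrete flavor of this responsibility (the names, replica counts, and thresholds below are purely illustrative, not Wubble's actual configuration), a horizontal autoscaling policy for a GPU-backed inference service on managed Kubernetes might be sketched as:

```yaml
# Hypothetical HorizontalPodAutoscaler for a GPU inference deployment.
# All names and numeric targets are illustrative placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: music-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: music-inference
  minReplicas: 2          # keep headroom for latency-sensitive traffic
  maxReplicas: 20         # cap cost exposure under traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before pods saturate
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # avoid thrashing on bursty load
```

In practice, GPU workloads often scale on custom or external metrics (e.g., queue depth or requests in flight) rather than CPU utilization; tuning that choice is exactly the kind of work this role involves.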

  2. Performance-Aware Infrastructure Design

    • Analyze service performance characteristics and align autoscaling policies with GPU/CPU usage patterns, memory constraints, and traffic variability.

    • Introduce node pool optimizations (e.g., GPU vs CPU, spot/preemptible nodes) for balanced cost and performance.

  3. MLOps & CI/CD Integration

    • Collaborate with engineering teams to ensure that autoscaling mechanisms integrate smoothly into existing MLOps pipelines and deployment workflows.

    • Support rollout strategies (e.g., canary, blue-green) for ML model updates and infrastructure changes.

  4. Observability & Benchmarking

    • Set up and maintain monitoring dashboards (e.g., Prometheus, Grafana, Cloud Monitoring) to visualize and alert on key autoscaling metrics.

    • Design and execute stress tests and load simulations to validate autoscaler behavior under production-like conditions.

  5. Documentation & Enablement

    • Write clear and concise documentation on autoscaling architecture, configuration guidelines, and operational runbooks.

    • Facilitate knowledge transfer and training sessions for engineering and DevOps teams.

Qualifications

  • GPU & CUDA Optimization

    • Deep understanding of GPU concurrency, memory hierarchy, and multi-stream execution for efficient resource utilization.

  • MLOps & Scalable Infrastructure

    • Hands-on experience with MLOps pipelines, containerization (Docker), and orchestration using Kubernetes (preferably managed services like GKE, EKS, or AKS).

    • Familiarity with infrastructure-as-code tools (e.g., Terraform, Helm) and deployment automation.

  • Model Serving & Inference Optimization

    • Experience deploying and optimizing high-throughput ML inference pipelines, including batching, quantization, or model parallelism.

    • Strong understanding of resource scaling patterns for real-time and batch workloads.

  • Monitoring, Troubleshooting & Communication

    • Proficient in using observability tools (e.g., Prometheus, Grafana, Cloud Monitoring) to identify performance bottlenecks and optimize autoscaling behavior.

    • Clear communicator, capable of articulating complex technical decisions across teams and stakeholders.

  • Bonus Points

    • Experience with NVIDIA H100 or other next-gen GPU architectures in production.

    • Proven experience writing, profiling, and optimizing CUDA kernels for production workloads.

    • Familiarity with generative models (e.g., LLMs, MusicGen) and streaming inference in media-rich applications.

Why join Wubble?

  • Elite Client Portfolio: Work on solutions that power the creative experiences of world-renowned brands like Disney, Starbucks, Microsoft, and HP.

  • Top-Tier Backing: We’re supported by industry leaders such as Antler, Google, and NVIDIA, offering you the opportunity to collaborate with a well-funded and visionary team.

  • Cutting-Edge Tech: Contribute to an advanced foundation model pushing the envelope in AI-driven music generation.

  • High Impact, High Reward: Your expertise will directly shape the performance and scalability of a groundbreaking platform—this is not your everyday startup gig.

  • Remote Collaboration: Enjoy the flexibility of a fully remote contract, enabling you to collaborate from anywhere while tackling exciting, high-profile challenges.

Contract Details

  • Contract Type: Contract / Consultancy

  • Duration: 1 month, with potential extension based on project needs

  • Compensation: Competitive rate, commensurate with experience


To apply, send us your resume, portfolio, and a brief cover letter highlighting relevant CUDA optimization and MLOps experience using the button below with the subject line: "MLOps Expert – CUDA Kernel Optimization – [Your Full Name]".