Three Vs-Driven MLOps at ThakiCloud: Why You'll Want to Join Us
Growth at ThakiCloud, and Your Career
"Velocity · Validation · Versioning" (the Three Vs): if these three words make your heart race, ThakiCloud is your stage. Here you can work on real full-stack MLOps under actual production traffic, vertically integrated from GPU/NPU infrastructure up to SaaS. With our technology, culture, and colleagues, we're moving faster, safer, and further.
ThakiCloud's MLOps: Here's How It's Different
1. Velocity: From Idea to Production, Before Your Coffee Gets Cold
- IaaS-PaaS-SaaS Vertical Integration: Mixed deployment of GPU and NPU nodes in a Kubernetes NodePool, with zero rescheduling cost when switching from experiment to serving.
- JupyterHub Image Auto-build: Just push a branch and the Helm chart is immediately deployed to the staging cluster.
- Feature Store-based Experiment UI: Combine data and feature versions with one click and launch new experiments within 15 minutes.
2. Validation: Fail Fast, Metrics in Product Language
- Shadow Traffic Funnel: Copy 10% of real-time traffic to evaluate new models without user exposure.
- Click Rate · MAU and ML Metrics Auto-integration: Monitor business KPIs and ML metrics together on a Prometheus + Grafana dashboard.
- Heuristic Safety Layer: Automatically filter predictions with confidence < τ to protect user experience.
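
As a rough illustration of the Heuristic Safety Layer idea, the sketch below drops any prediction whose confidence falls under a threshold and serves a safe default instead. This is not ThakiCloud's actual implementation: the `Prediction` class, the `apply_safety_layer` function, the threshold value of 0.6, and the `"default"` fallback label are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for a model output."""
    label: str
    confidence: float  # model-reported score, assumed to lie in [0, 1]


def apply_safety_layer(
    pred: Prediction,
    threshold: float,
    fallback_label: Optional[str] = None,
) -> Optional[str]:
    """Return the predicted label, or the fallback when confidence < threshold."""
    if pred.confidence < threshold:
        return fallback_label
    return pred.label


# Assumed example values; a real threshold would be tuned on validation data.
preds = [Prediction("recommend_a", 0.92), Prediction("recommend_b", 0.41)]
served = [apply_safety_layer(p, threshold=0.6, fallback_label="default") for p in preds]
print(served)  # ['recommend_a', 'default']
```

Keeping the filter as a pure function makes it easy to unit-test and to run on shadow traffic before it touches real users.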
3. Versioning: Time Travel with One Docker Tag Line
- OCI Model Registry: Manage models, features, and metadata as image tags, with instant rollback by specifying only the `sha`.
- Daily Auto-retraining: When data drift is detected, an Airflow DAG automatically runs retraining, validation, and promotion.
- Fallback Model: A light model is deployed within 1 second when the SLO is violated.
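
The Fallback Model bullet can be pictured as a small router that watches recent latency and switches to a lighter model once the SLO is breached. The sketch below is a minimal illustration under stated assumptions, not the production mechanism: the `FallbackRouter` class, the use of a mean-latency window as the SLO signal, and the toy models are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque


class FallbackRouter:
    """Route requests to a light model while a latency SLO is violated (hypothetical sketch)."""

    def __init__(
        self,
        primary: Callable[[float], str],
        fallback: Callable[[float], str],
        slo_ms: float,
        window: int = 3,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.slo_ms = slo_ms
        self.window = window
        # Sliding window of the most recent observed latencies, in milliseconds.
        self.latencies: Deque[float] = deque(maxlen=window)

    def record_latency(self, ms: float) -> None:
        self.latencies.append(ms)

    def slo_violated(self) -> bool:
        # Assumption: the SLO counts as violated when the window is full
        # and its mean latency exceeds the target.
        if len(self.latencies) < self.window:
            return False
        return sum(self.latencies) / len(self.latencies) > self.slo_ms

    def predict(self, x: float) -> str:
        model = self.fallback if self.slo_violated() else self.primary
        return model(x)


# Toy stand-ins for a heavy primary model and a light fallback model.
router = FallbackRouter(
    primary=lambda x: "heavy",
    fallback=lambda x: "light",
    slo_ms=100.0,
)
for ms in (50.0, 60.0, 70.0):
    router.record_latency(ms)
healthy_choice = router.predict(1.0)   # mean latency 60 ms, within the SLO

for ms in (200.0, 300.0, 400.0):
    router.record_latency(ms)
degraded_choice = router.predict(1.0)  # mean latency 300 ms, SLO violated
print(healthy_choice, degraded_choice)
```

Because the window slides, the router returns to the primary model on its own once recent latencies fall back under the target.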
Pain Points We Solved & Next Chapter
- Dev/Prod Inconsistency → Unified with a single Helm Release.
  Next: Multi-region deployment standardization.
- Alert Flood → Noise reduction with an alert-tuner bot.
  Next: Automatic log-level root-cause analysis with GPT.
- Long-tail Bugs → Reproduction with a Feature Slicing debugger.
  Next: Fully automated reproduction based on data synthesis.
- Slow Deployment → Cut from 30 days to 5 days with Canary + Progressive Delivery.
  Next: Tighter integration of model, ops, and business team OKRs.
What It Means to Work at ThakiCloud: Real Stories
"3 AM, an experimental model crashed, but we rolled it back in 5 minutes!"
Mr. B from the MLOps Platform team says: "Thanks to a culture where failures become assets too, I'm not afraid to experiment. Even discarded logs stay on as team knowledge, and seeing open source PRs meet real traffic directly is ThakiCloud's unique charm."
Mr. C from the Cloud Infra team recalls the sight of GPUs running in the Saudi desert: "Directly designing and operating global-scale infrastructure, and collaborating with my colleagues, is what drives my growth every day."
Open Positions
Team | Mission at a Glance |
---|---|
MLOps Platform | Feature Store redesign, Pydantic schema validation automation |
LLMOps R&D | GPT-based log analysis, Self-Healing serving |
Cloud Infra | GPU/NPU hybrid scheduling, Multi-region HA |
Data Engineering | Real-time CDC + Iceberg Lakehouse construction |
How to Apply
- GitHub / Tech Blog Link: Commits are your cover letter.
- Any-format Project: Jupyter notebooks, Dockerfiles, and Helm charts are all welcome.
- Three Vs Experience in One Line: e.g., "Model crashed at 3 AM but rolled back in 5 minutes."
Waiting to Grow Together with You
If Velocity · Validation · Versioning make your heart beat faster, let's meet at `git push origin thakicloud`.
With ThakiCloud, add a new chapter to your career.