Growth at ThakiCloud, and Your Career 🚀

“Velocity · Validation · Versioning” (the Three Vs): if these three words make your heart race, ThakiCloud is your stage. Here you can work on full-stack MLOps against real production traffic, on a platform that is vertically integrated from GPU/NPU infrastructure up to SaaS. With strong technology, culture, and colleagues, we keep moving faster, safer, and further.


ThakiCloud’s MLOps: Here’s How It’s Different

1. Velocity: From Idea to Production, Before Your Coffee Gets Cold

  • IaaS-PaaS-SaaS Vertical Integration: GPUs and NPUs are deployed together in Kubernetes node pools, so switching from experiment to serving incurs zero rescheduling cost.
  • JupyterHub Image Auto-build: Push a branch and the Helm chart is deployed to the staging cluster immediately.
  • Feature Store-based Experiment UI: Combine data and feature versions with one click and launch a new experiment within 15 minutes.

2. Validation: Fail Fast, Metrics in Product Language

  • Shadow Traffic Funnel: Copy 10% of real-time traffic to evaluate new models without user exposure.
  • Click Rate · MAU ↔ ML Metrics Auto-integration: Monitor business KPIs and ML metrics together on a Prometheus + Grafana dashboard.
  • Heuristic Safety Layer: Automatically filter out predictions with confidence < τ to protect the user experience.
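
To make the shadow-traffic and safety-layer ideas concrete, here is a minimal Python sketch. It is not ThakiCloud's implementation: the function names, the hash-based sampling scheme, and the threshold value `TAU = 0.7` are all assumptions chosen for illustration. Only the 10% mirroring rate and the "confidence < τ" rule come from the text above.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SHADOW_FRACTION = 0.10  # from the text: 10% of traffic is copied to the new model
TAU = 0.7               # hypothetical confidence threshold (the text only says "tau")


@dataclass
class Prediction:
    label: str
    confidence: float


def in_shadow_sample(request_id: str, fraction: float = SHADOW_FRACTION) -> bool:
    """Deterministically select roughly `fraction` of requests by hashing the id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < fraction


def apply_safety_layer(pred: Prediction, tau: float = TAU) -> Optional[Prediction]:
    """Drop predictions whose confidence is below tau; the caller decides the default."""
    return pred if pred.confidence >= tau else None


def handle_request(
    request_id: str,
    features: dict,
    live_model: Callable[[dict], Prediction],
    shadow_model: Callable[[dict], Prediction],
    shadow_log: List[Tuple[str, Prediction]],
) -> Optional[Prediction]:
    """Serve from the live model; mirror sampled requests to the shadow model."""
    if in_shadow_sample(request_id):
        # The shadow result is only logged for offline evaluation,
        # so users never see the candidate model's output.
        shadow_log.append((request_id, shadow_model(features)))
    return apply_safety_layer(live_model(features))
```

Hashing the request id rather than drawing a random number keeps the sample reproducible: the same request always lands in the same bucket, which makes shadow results easier to join back to live results later.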

3. Versioning: Time Travel with a Single Docker Tag

  • OCI Model Registry: Manage models, features, and metadata as image tags; roll back instantly by specifying only the sha.
  • Daily Auto-retraining: When data drift is detected, an Airflow DAG automatically runs retraining, validation, and promotion.
  • Fallback Model: A lightweight model is deployed within 1 second when an SLO is violated.
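
The "rollback by sha" idea can be illustrated with a toy, in-memory registry. This is a sketch under stated assumptions, not an OCI client and not ThakiCloud's registry: the `ModelRegistry` class and its methods are hypothetical. It only shows the underlying principle that artifacts are stored by content digest and a tag is a movable pointer to one of them.

```python
import hashlib
from typing import Dict


class ModelRegistry:
    """Toy content-addressed store: blobs are keyed by digest, tags point at digests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}  # digest -> model artifact
        self._tags: Dict[str, str] = {}     # tag (e.g. "prod") -> digest

    def push(self, tag: str, blob: bytes) -> str:
        """Store an artifact, point `tag` at it, and return its digest."""
        digest = "sha256:" + hashlib.sha256(blob).hexdigest()
        self._blobs[digest] = blob
        self._tags[tag] = digest
        return digest

    def rollback(self, tag: str, digest: str) -> None:
        """Re-point `tag` at a previously pushed digest; no data is copied."""
        if digest not in self._blobs:
            raise KeyError(f"unknown digest: {digest}")
        self._tags[tag] = digest

    def resolve(self, tag: str) -> bytes:
        """Return the artifact the tag currently points at."""
        return self._blobs[self._tags[tag]]
```

Because old artifacts are never overwritten, a rollback is just a pointer update, which is why it can be close to instant regardless of model size.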

Pain Points We Solved & Next Chapter

  • Dev ↔ Prod Inconsistency → Unified under a single Helm release.
    Next: Standardized multi-region deployment.
  • Alert Flood → Noise reduced with an alert-tuner bot.
    Next: Automatic log-level root-cause analysis with GPT.
  • Long-tail Bugs → Reproduced with a feature-slicing debugger.
    Next: Fully automated reproduction based on data synthesis.
  • Slow Deployment → Cut from 30 days to 5 with Canary + Progressive Delivery.
    Next: Tighter integration of model, ops, and business team OKRs.

What It Means to Work at ThakiCloud: Real Stories

“3 AM, the experiment model crashed, but we rolled back in 5 minutes!”

Mr. B from the MLOps Platform team says: “Because our culture treats failures as assets, I’m not afraid to experiment. Even discarded logs stay on as team knowledge, and seeing open-source PRs meet real traffic directly is ThakiCloud’s unique charm.”

Mr. C from the Cloud Infra team recalls the sight of GPUs running in the Saudi desert: “Designing and operating global-scale infrastructure firsthand, and collaborating with colleagues, is what drives my growth every day.”


Open Positions

Team | Mission at a Glance
MLOps Platform | Feature Store redesign, Pydantic schema validation automation
LLMOps R&D | GPT-based log analysis, Self-Healing serving
Cloud Infra | GPU/NPU hybrid scheduling, Multi-region HA
Data Engineering | Real-time CDC + Iceberg Lakehouse construction

How to Apply

  1. GitHub / Tech Blog Link: Your commits are your cover letter.
  2. Any-format Project: Jupyter notebooks, Dockerfiles, and Helm charts are all welcome.
  3. Three Vs Experience in One Line: e.g., “Model crashed at 3 AM but rolled back in 5 minutes 🏃‍♂️”.

We’re Waiting to Grow Together with You

If Velocity · Validation · Versioning make your heart beat faster, let’s meet at git push origin thakicloud.
With ThakiCloud, add a new chapter to your career.