<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://thakicloud.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://thakicloud.github.io/" rel="alternate" type="text/html" /><updated>2026-06-21T21:17:32+09:00</updated><id>https://thakicloud.github.io/feed.xml</id><title type="html">Thaki Cloud Tech Blog | ThakiCloud | 다키클라우드 기술 블로그</title><subtitle>Thaki Cloud (ThakiCloud, 다키클라우드, thaki cloud, THAKI CLOUD, ثاكي كلاود)는 AI/ML Engineering, LLMOps, DevOps 분야의 최신 기술과 실무 경험을 공유하는 전문 기술 블로그입니다. 머신러닝 모델 운영, 쿠버네티스, 클라우드 인프라, AI 엔지니어링 커리어, 인공지능 기술 블로그, 다키클라우드 개발 팀의 깊이 있는 인사이트를 제공합니다. مدونة تقنية متخصصة في هندسة الذكاء الاصطناعي والحوسبة السحابية.</subtitle><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><entry xml:lang="ar"><title type="html">خفض تكلفة الرموز بنسبة 34-71% عبر الضغط القابل للعكس: تقرير ميداني عن Headroom ونظافة السياق في ThakiCloud</title><link href="https://thakicloud.github.io/ar/dev/headroom-reversible-context-compression/" rel="alternate" type="text/html" title="خفض تكلفة الرموز بنسبة 34-71% عبر الضغط القابل للعكس: تقرير ميداني عن Headroom ونظافة السياق في ThakiCloud" 
/><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ar/dev/headroom-reversible-context-compression</id><content type="html" xml:base="https://thakicloud.github.io/ar/dev/headroom-reversible-context-compression/"><![CDATA[<p><img src="/assets/images/headroom-reversible-context-compression-hero.png" alt="صورة تجريدية لتكثّف البيانات" />
<em>Context is not free. Condensing scattered tokens without loss is what Headroom does.</em></p>

<h2 id="نظرة-عامة">Overview</h2>

<p>Any team that runs AI coding agents daily knows where the largest hidden cost comes from: context. Tool outputs, RAG results, logs, files, and conversation history pile up on every turn, and those tokens become the bill. In multi-agent workflows this cost grows multiplicatively rather than linearly, because every time a subagent drops a large JSON output into the context, the cache-read tokens grow with it.</p>

<p>This article is more than a tool introduction. ThakiCloud already runs Headroom in its production toolchain, and this time we pulled three real JSON tool outputs from our repository and ran Headroom on them directly. We document the install command, the integration code, and the measured token reductions in a reproducible form. The short version: the more repetition in the JSON structure, the greater the savings, and on our data token reduction reached 71.2%. Every number was measured in a real isolated environment, with no estimates mixed in.</p>

<h2 id="ما-هو-headroom">What Headroom is</h2>

<p>Headroom (package name on PyPI <code class="language-plaintext highlighter-rouge">headroom-ai</code>, on GitHub <code class="language-plaintext highlighter-rouge">chopratejas/headroom</code>) is a context compression tool open-sourced by former Netflix engineer Tejas Chopra. Its stated goal is clear: compress tool outputs, logs, files, and RAG chunks before they reach the LLM, cutting tokens while keeping the answer the same.</p>

<p>Most existing context-reduction tools are irreversible. Once you cut, you cannot get the original back. Headroom's core distinction is that it runs locally, covers multiple content types, and is reversible: the original can be restored within a set time-to-live (TTL) via tracking hashes. This structurally prevents the classic failure, “we compressed and the agent lost the detail.” You work on the compressed version by default and restore the original only when a specific section is needed.</p>
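<p>The restore-within-TTL idea can be sketched as a small store keyed by a content hash. This is an illustrative sketch only, not Headroom's implementation: the class <code class="language-plaintext highlighter-rouge">OriginalStore</code>, its methods, and the 16-character SHA-256 key are hypothetical names and choices made for the example.</p>

```python
import hashlib
import time


class OriginalStore:
    """Keeps originals keyed by a content hash until a TTL expires (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self.entries = {}   # hash -> (expiry time, original text)

    def put(self, original):
        """Store the original and return the tracking hash that can restore it."""
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
        self.entries[key] = (self.clock() + self.ttl_seconds, original)
        return key

    def restore(self, key):
        """Return the original text, or None if the key is unknown or expired."""
        expiry, original = self.entries.get(key, (0.0, None))
        if original is None or self.clock() >= expiry:
            self.entries.pop(key, None)
            return None
        return original


store = OriginalStore(ttl_seconds=300)
key = store.put('{"id": 1, "status": "ok"}')
print(store.restore(key) is not None)
```

<p>The compressed context would carry only the short key, and the agent would call <code class="language-plaintext highlighter-rouge">restore</code> for the few records whose full detail it needs before the TTL runs out.</p>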

<p>There are three ways to wire it in: as a library you call directly, as a proxy, or as an MCP server. It detects the content type and compresses selectively, keeping only the outliers in JSON or only the failure lines in logs.</p>

<h3 id="البنية-الداخلية-smartcrusher-هو-الجوهر">Internal architecture: SmartCrusher is the core</h3>

<p>Headroom routes each content type to a different compressor. In this experiment the transforms actually applied showed up in the router log as <code class="language-plaintext highlighter-rouge">router:protected:user_message</code> and <code class="language-plaintext highlighter-rouge">router:mixed:...</code>, meaning it protects the user message and compresses only the JSON payload in tool messages.</p>

<ul>
  <li><strong>SmartCrusher</strong>: a general JSON compressor that handles arrays of dictionaries, nested objects, and mixed types. For repetitive JSON tool outputs (search results, log rows, record lists) it folds duplicated keys and infers the schema to shrink the payload deterministically. It accounted for most of the savings in our measurement.</li>
  <li><strong>Code compressor</strong>: structure-aware compression of source code.</li>
  <li><strong>Image compression</strong>: image payloads are reduced as well.</li>
</ul>
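<p>To make “folding duplicated keys” concrete, here is a minimal sketch of the general idea: an array of same-shaped records is rewritten as one key list plus rows of values, and can be rebuilt exactly. It is an assumption-laden illustration, not SmartCrusher's actual algorithm; <code class="language-plaintext highlighter-rouge">fold_records</code> and <code class="language-plaintext highlighter-rouge">unfold_records</code> are hypothetical helpers, and the sample records are made up.</p>

```python
import json


def fold_records(records):
    """Fold a non-empty list of same-shaped dicts into one key list plus value rows."""
    keys = list(records[0].keys())
    for record in records:
        if list(record.keys()) != keys:
            raise ValueError("records do not share one schema")
    return {"keys": keys, "rows": [[record[k] for k in keys] for record in records]}


def unfold_records(folded):
    """Inverse of fold_records: rebuild the original list of dicts."""
    return [dict(zip(folded["keys"], row)) for row in folded["rows"]]


# Made-up records standing in for a repetitive tool output.
records = [
    {"id": i, "source": "search", "score": round(1.0 / (i + 1), 3)}
    for i in range(50)
]
folded = fold_records(records)
before = len(json.dumps(records, separators=(",", ":")))
after = len(json.dumps(folded, separators=(",", ":")))
print(before, "->", after, "characters")
print(unfold_records(folded) == records)
```

<p>Because the key names are written once instead of once per record, the saving grows with the number of records, which matches the pattern in the measurements: the more repetitive the structure, the larger the reduction.</p>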

<p>The diagram below is the data flow we observed. The tool output passes through the router to SmartCrusher, and while the compressed context goes to the LLM call, the original is stored separately for reversible restoration when needed.</p>

<p><img src="/assets/images/headroom-reversible-context-compression-diagram.png" alt="Headroom pipeline diagram" />
<em>Tool output → content-type router → SmartCrusher → compressed context → LLM. The original is stored with a tracking hash and a TTL to keep a reversible restore path. (Labels in the displayed image are in Korean.)</em></p>

<h2 id="التثبيت-والدمج">Installation and integration</h2>

<p>Our Python runtime is standardized on a single interpreter (3.12.8) inside <code class="language-plaintext highlighter-rouge">.venv</code>. Installation is one line.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="s2">"</span><span class="nv">$PWD</span><span class="s2">/.venv"</span> uv pip <span class="nb">install</span> <span class="s2">"headroom-ai[code,relevance]"</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">[code,relevance]</code> extras enable structure-aware code compression and relevance-based filtering. Semantic compression of plain text needs an additional model (about 261 MB), but the highest-impact JSON path works with this base install alone.</p>

<p>The simplest integration is to pass a message list directly. The core of the wrapper we actually use (<code class="language-plaintext highlighter-rouge">scripts/headroom_compress.py</code>) is below. Put the tool output in as the content of a message with role <code class="language-plaintext highlighter-rouge">tool</code> and call <code class="language-plaintext highlighter-rouge">compress</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">headroom</span> <span class="kn">import</span> <span class="n">compress</span>

<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Summarize this tool output</span><span class="sh">"</span><span class="p">},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
     <span class="sh">"</span><span class="s">tool_calls</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">,</span>
                     <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">arguments</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">{}</span><span class="sh">"</span><span class="p">}}]},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">tool_call_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">raw_json_string</span><span class="p">},</span>
<span class="p">]</span>

<span class="n">result</span> <span class="o">=</span> <span class="nf">compress</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">claude-sonnet-4-5-20250929</span><span class="sh">"</span><span class="p">)</span>
<span class="n">compressed</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">messages</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">tokens_before</span><span class="p">,</span> <span class="sh">"</span><span class="s">-&gt;</span><span class="sh">"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">tokens_after</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">transforms_applied</span><span class="p">)</span>
</code></pre></div></div>

<p>The object returned by <code class="language-plaintext highlighter-rouge">compress</code> carries the fields <code class="language-plaintext highlighter-rouge">tokens_before</code>, <code class="language-plaintext highlighter-rouge">tokens_after</code>, and <code class="language-plaintext highlighter-rouge">transforms_applied</code>, so downstream code can verify what the compression actually did. The key point is that these are values measured by the library, not numbers self-reported by the model. On top of that, we verified them again with a separate tokenizer (tiktoken).</p>

<h2 id="نتائج-التجربة-الفعلية">Results of the actual experiment</h2>

<p>The experiment ran in an isolated environment via git worktree. This setup does not touch the main working tree and keeps only the results in an evidence directory. The test data is three real outputs from our repository with clearly repetitive JSON structure.</p>

<ol>
  <li><strong>skill_index.json</strong>: the BM25 index for skill search. Records with an identical schema repeat at scale.</li>
  <li><strong>seedance-prompts/raw-prompts.json</strong>: a catalog of 605 prompts. Natural-language text makes up the bulk.</li>
  <li><strong>twitter timeline archive</strong>: 1,385 timeline records. An array of objects with an identical key structure.</li>
</ol>

<p>Token counts were measured with the <code class="language-plaintext highlighter-rouge">cl100k_base</code> tokenizer. We recorded bytes and tokens together because compression is judged not by raw byte savings but by how much it helps in the actual billing unit, the token. The results are below.</p>

<table>
  <thead>
    <tr>
      <th>Test data</th>
      <th>Original tokens</th>
      <th>Tokens after compression</th>
      <th>Token reduction</th>
      <th>Byte reduction</th>
      <th>Time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>skill_index (BM25 index)</td>
      <td>1,618,287</td>
      <td>465,445</td>
      <td><strong>71.2%</strong></td>
      <td>64.9%</td>
      <td>2.08s</td>
    </tr>
    <tr>
      <td>twitter-timeline (array of records)</td>
      <td>399,926</td>
      <td>192,465</td>
      <td><strong>51.9%</strong></td>
      <td>57.0%</td>
      <td>0.24s</td>
    </tr>
    <tr>
      <td>seedance-prompts (prompt catalog)</td>
      <td>1,085,592</td>
      <td>713,210</td>
      <td><strong>34.3%</strong></td>
      <td>38.5%</td>
      <td>0.57s</td>
    </tr>
  </tbody>
</table>

<p><img src="/assets/images/headroom-reversible-context-compression-results.png" alt="Measured compression chart" />
<em>Measured reduction ratios for three JSON tool outputs from the ThakiCloud repository. Bytes and tokens are shown together.</em></p>

<p>How you read the numbers matters. <strong>The more the structure repeats, the greater the savings.</strong> skill_index is an index densely packed with identical-schema records, so SmartCrusher's key folding peaked and cut tokens by a full 71.2%. The twitter timeline, also a regular array of objects, shrank by more than half. By contrast, seedance-prompts, where natural-language prompt text makes up most of each record, left little room for structural reduction and settled at 34.3%. This gap directly supports the design intent that “JSON is where it works best.”</p>
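<p>The token-reduction column can be re-derived from the two token columns of the table. The short sketch below does only that arithmetic; the token counts are copied from the table, while the dictionary layout and the helper name <code class="language-plaintext highlighter-rouge">reduction_percent</code> are ours for illustration.</p>

```python
# Token counts copied from the results table: (original, after compression).
measurements = {
    "skill_index": (1_618_287, 465_445),
    "twitter-timeline": (399_926, 192_465),
    "seedance-prompts": (1_085_592, 713_210),
}


def reduction_percent(before, after):
    """Share of tokens removed, as a percentage rounded to one decimal."""
    return round(100.0 * (before - after) / before, 1)


for name, (before, after) in measurements.items():
    print(f"{name}: {reduction_percent(before, after)}%")
```

<p>Running it gives 71.2%, 51.9%, and 34.3%, the same values as the table, so the reported percentages are consistent with the raw counts.</p>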

<p>The timing is also worth noting. It processed a 1.6-million-token index in about two seconds and the other two datasets in under a second each. That is fast enough to insert right before a tool output enters the context with almost no perceptible delay. And because the compression is deterministic, the same input always yields the same output, which also makes it cache-friendly.</p>

<p>One honest caveat. The numbers above are single-run measurements on three datasets. On other kinds of JSON, especially data with mostly unique values and few repeated keys, the reduction may be lower. Still, within the scope of our measurement, a token reduction range of 34 to 71% is a clearly meaningful result, at least for tool outputs with repetitive structure.</p>

<h2 id="التطبيق-على-منصة-thakicloud-لخدمات-aiml-على-k8s">Applying it to ThakiCloud's AI/ML service platform on K8s</h2>

<p>The point where we adopted Headroom is exactly what the experiment above shows: <strong>large JSON tool outputs with repetitive structure.</strong> Our context-hygiene rule (<code class="language-plaintext highlighter-rouge">ecc-token-strategy</code>) spells it out: repetitive JSON array outputs are compressed deterministically with SmartCrusher before entering the context; JSON, not plain text, is the target; and the priority is subagent summarization first, then headroom compression.</p>

<p>The reason this matters so much in multi-agent orchestration on K8s is the cost structure. In a workflow with many subagents, context hygiene means three things at once. First, controlling token cost. Second, managing the cache hit rate: deterministic compression guarantees identical output for identical input, so it does not break the prompt cache. Third, managing latency: the smaller the context, the faster the model responds.</p>

<p>Our LLM service schedules GPU workloads via Kueue on K8s, and many inference requests flow in parallel. In this environment a bloated context costs more than a single request; it eats into overall throughput. Headroom lets us insert this layer with almost no code changes. We compress an array of search results or logs in one line right before it enters the context, and restore reversibly only when a specific section is needed.</p>

<p>It is practical from the data scientist's perspective too. In a RAG pipeline where retrieved chunks arrive loaded with repetitive metadata (identical keys such as source title, timestamp, and score), that metadata is precisely the region SmartCrusher shrinks best. Because it keeps the body and trims only the structural overhead, you secure context budget without sacrificing retrieval accuracy.</p>

<h2 id="القيود-والاعتراضات">Limitations and objections</h2>

<p>We do not recommend this tool uncritically. Here are the limitations and objections, stated honestly.</p>

<p><strong>First, local execution is a prerequisite.</strong> Headroom needs a local process to run, so it does not fit fully sandboxed execution environments. Some deployment shapes are incompatible with this constraint.</p>

<p><strong>Second, the effect on plain text is limited.</strong> As the seedance-prompts result shows, data with a high share of natural-language text has little room for structural reduction. Shrinking plain text semantically requires an additional model, and that path gives up some determinism and speed.</p>

<p><strong>Third, it may be overkill for single-provider teams.</strong> If the provider's native compression for a single model is enough and you do not need memory across agents, the operational overhead of a separate compression layer may outweigh the gain.</p>

<p><strong>Fourth, the strongest objection is “isn't subagent summarization enough?”</strong> In fact our own rule puts subagent summarization ahead of headroom compression. Summarization is irreversible, but it shrinks far more and compresses by meaning. So where does Headroom fit? The answer is “when summarization loses details that may be needed later.” Reversibility fills exactly this gap. You normally work on the compressed version, and the moment you need the original of a specific record, you restore it exactly within the TTL. Summarization and compression are complementary, not competing.</p>

<p>In short, Headroom embodies the principle “context is not free” in a concrete design: reversible compression. Within the scope of our measurement it cut tokens by 34-71% on structurally repetitive JSON, and thanks to determinism and reversibility it neither broke the cache nor lost detail. If you are an engineer interested in how ThakiCloud treats context hygiene as a cost and reliability problem, we are a place that runs this layer in production.</p>

<hr />

<p>المصادر: Headroom (headroom-ai)، PyPI https://pypi.org/project/headroom-ai/ · GitHub https://github.com/chopratejas/headroom (المؤلف Tejas Chopra). الأرقام في هذه المقالة مقاسة مباشرةً على بيانات مستودع ThakiCloud.</p>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="headroom" /><category term="context-compression" /><category term="token-cost" /><category term="llm-serving" /><category term="rag" /><category term="mcp" /><summary type="html"><![CDATA[أكبر تكلفة خفية لوكيل البرمجة بالذكاء الاصطناعي هي السياق. شغّلنا Headroom (headroom-ai) مباشرةً على ثلاثة مخرجات أدوات JSON حقيقية من مستودع ThakiCloud وقِسنا خفض الرموز. 
نوضّح كيف خفض SmartCrusher الرموز بنسبة تصل إلى 71.2% بضغط قابل للعكس وبلا فقدان، من أمر التثبيت حتى الأرقام المقاسة.]]></summary></entry><entry xml:lang="ar"><title type="html">أشكال ومراجعة بمستوى Nature عبر الكود: تقرير عمودي أكاديمي بعد تشغيل nature-skills فعليًا</title><link href="https://thakicloud.github.io/ar/dev/nature-skills-academic-figure-polishing/" rel="alternate" type="text/html" title="أشكال ومراجعة بمستوى Nature عبر الكود: تقرير عمودي أكاديمي بعد تشغيل nature-skills فعليًا" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ar/dev/nature-skills-academic-figure-polishing</id><content type="html" xml:base="https://thakicloud.github.io/ar/dev/nature-skills-academic-figure-polishing/"><![CDATA[<p><img src="/assets/images/nature-skills-hero.png" alt="صورة تجريدية لمنحنيات بيانات متعددة اللوحات وألواح أشكال تطفو بأجواء أكاديمية" />
<em>It captures the spirit of an academic figure skill that treats a figure as a “visual argument” rather than just a “pretty drawing.”</em></p>

<h2 id="نظرة-عامة">Overview</h2>

<p>The two tasks researchers ask of Claude Code most often are “make me a figure that can go in the paper” and “polish this English draft to journal level.” Handed to a general-purpose language model, both produce results that vary from run to run. Figures come out with arbitrary font sizes and colors, and the editing changes sentences without rules. The open-source skills package nature-skills (Yuan1z0825/nature-skills) aims to contain this variance within a verified structure.</p>

<p>As it spread, some posts presented it as having “more than 20K stars on GitHub,” but the actual number I checked was far smaller, only about 265 [estimated]. Since star-count inflation is common, in this article I judged value by actual measurements after running the tool myself, not by star count. This is an execution report in which I cloned nature-skills into the ThakiCloud environment and then rendered service-related data into a publication-ready figure using the nature-figure skill inside it.</p>

<h2 id="ما-هي-هذه-الأداة">What this tool is</h2>

<p>The actual layout I confirmed after cloning the repository was 12 skills (excluding shared modules) under <code class="language-plaintext highlighter-rouge">skills/</code>. They cover the full academic workflow: nature-figure (scientific figures), nature-polishing (academic editing), nature-academic-search (literature search), nature-citation, nature-reviewer, nature-response (responding to reviewers), and others. The license is MIT.</p>

<p>The star of this article, <strong>nature-figure, is at version 2.0.0</strong> and has a router architecture split into a static layer and a dynamic layer. It puts the large body of knowledge on design, API, patterns, and QA into on-demand reference files, and on each task detects the backend (Python/R) to load only the needed fragments. This is exactly the same progressive disclosure pattern that ThakiCloud emphasizes.</p>

<p>The most impressive design is the <strong>“figure contract.”</strong> Before any code is written, it forces you to pin down a one-sentence core conclusion, the chain of evidence, the archetype classification, the backend, and the journal/export contract. The skill states firmly that “a figure is a visual argument, not an isolated pretty drawing.” It also makes backend selection a <strong>blocking gate</strong>: if the user has not specified Python or R, it asks “Python or R?” and then stops. This reduces the degrees of freedom so the model does not pick a default arbitrarily.</p>

<p><img src="/assets/images/nature-skills-diagram.png" alt="nature-figure routing diagram from Figure Contract to the backend gate and QA contract" />
<em>A path that starts by fixing the core conclusion, passes the Python/R backend gate, applies rcParams and PALETTE to export editable SVG/TIFF, and ends with the QA contract.</em></p>

<h2 id="التثبيت-والتكامل-أوامر-حقيقية">Installation and integration (real commands)</h2>

<p>Verification ran in an isolated sandbox outside the repository, which was cleaned up afterward.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1) Clone the external repository</span>
git clone <span class="nt">--depth</span> 1 https://github.com/Yuan1z0825/nature-skills

<span class="c"># 2) Check the Python backend dependency (the shared .venv)</span>
.venv/bin/python <span class="nt">-c</span> <span class="s2">"import matplotlib; print(matplotlib.__version__)"</span>
<span class="c"># matplotlib 3.11.0</span>
</code></pre></div></div>

<p>The Python quick start in nature-figure (<code class="language-plaintext highlighter-rouge">static/fragments/backend/python.md</code>) includes <code class="language-plaintext highlighter-rouge">rcParams</code> values for a publication-ready figure, and <code class="language-plaintext highlighter-rouge">references/api.md</code> defines a journal-friendly PALETTE. The core settings are as follows.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib</span> <span class="k">as</span> <span class="n">mpl</span>

<span class="n">mpl</span><span class="p">.</span><span class="n">rcParams</span><span class="p">.</span><span class="nf">update</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">font.family</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">font.sans-serif</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Arial</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Helvetica</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">DejaVu Sans</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">],</span>
    <span class="sh">"</span><span class="s">svg.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">none</span><span class="sh">"</span><span class="p">,</span>   <span class="c1"># keep text inside the SVG editable
</span>    <span class="sh">"</span><span class="s">pdf.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span>       <span class="c1"># text inside the PDF stays editable TrueType as well
</span>    <span class="sh">"</span><span class="s">font.size</span><span class="sh">"</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span>           <span class="c1"># 7 pt baseline unless it is a large panel for slides
</span>    <span class="sh">"</span><span class="s">axes.linewidth</span><span class="sh">"</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">,</span>
<span class="p">})</span>
<span class="c1"># excerpt from the api.md PALETTE
</span><span class="n">P</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">blue_main</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#0F4D92</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">red_strong</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#B64342</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">neutral_dark</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#4D4D4D</span><span class="sh">"</span><span class="p">}</span>
</code></pre></div></div>

<p>The line <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> is the crux. The usual export converts text to paths, which makes re-editing the characters in Illustrator impossible. This setting keeps text as <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags, so labels can be edited as-is during journal proofing.</p>

<h2 id="نتائج-التجربة-الفعلية">Results of the actual experiment</h2>

<p>I applied the skill's rules (rcParams, PALETTE) as-is and rendered data directly relevant to ThakiCloud into a figure. The subject is a two-panel figure comparing FP16 and INT8 in latency and throughput by batch size in a GPU inference service. The service curve values in the plot are schematic, whereas <strong>the values actually measured are the metadata captured during rendering</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RENDER_MS=195.4
SVG_BYTES=24131
PNG_BYTES=254233          # 600 dpi
SVG_EDITABLE_TEXT_TAGS=36
PANELS=2 (a:latency, b:throughput)
RCPARAMS_FONT_SIZE=7.0
SVG_FONTTYPE=none
</code></pre></div></div>

<p>There are three main results. First, rendering the two-panel figure finished in about 195 ms. Second, the 600 dpi PNG was about 254 KB and the SVG about 24 KB, so both are lightweight. Third, and the most important check, <strong>the generated SVG contained 36 <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags</strong>. This is direct evidence that the “editable text” the skill promised was actually delivered. Had the text been converted to paths, the number of <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags would have been zero.</p>

<p><img src="/assets/images/nature-skills-results.png" alt="Two-panel Nature-style figure comparing latency and throughput between FP16 and INT8" />
<em>Actual output rendered by applying nature-figure's rcParams and PALETTE. Left (a) shows latency by batch size, right (b) shows throughput. The service curve values are schematic data.</em></p>

<p>I captured all of these values myself from stdout right after the run; they are not quoted from an external source. The key point is that the skill demonstrates quality with execution evidence, not with a prose claim that it “drew nicely.”</p>
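<p>A check like the <code class="language-plaintext highlighter-rouge">SVG_EDITABLE_TEXT_TAGS</code> count can be reproduced with the standard library alone. The sketch below is one possible way to do it, not the script used for this article: <code class="language-plaintext highlighter-rouge">count_text_elements</code> is a hypothetical helper, and the tiny SVG is built in memory as a stand-in for a real exported figure.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def count_text_elements(svg_source):
    """Count text elements in an SVG document given as a string."""
    root = ET.fromstring(svg_source)
    return sum(1 for _ in root.iter(f"{{{SVG_NS}}}text"))


# Build a tiny stand-in SVG in memory (not the article's figure):
# two text labels and one path.
ET.register_namespace("", SVG_NS)
svg = ET.Element(f"{{{SVG_NS}}}svg")
for label in ("Batch size", "Latency (ms)"):
    ET.SubElement(svg, f"{{{SVG_NS}}}text").text = label
ET.SubElement(svg, f"{{{SVG_NS}}}path", d="M0 0 L10 10")

print(count_text_elements(ET.tostring(svg, encoding="unicode")))
```

<p>For the stand-in document this prints 2. Pointing the same function at the contents of an exported SVG file gives the editable-label count, and a result of zero would indicate that text was converted to paths.</p>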

<h2 id="التطبيق-والدلالات-لمنصة-thakicloud-k8s-aiml-saas">Application and implications for the ThakiCloud K8s AI/ML SaaS platform</h2>

<p>nature-skills shows two threads at once.</p>

<p>From the perspective of data science practice, the idea of <strong>pinning chart style to verified tokens</strong> is immediately useful. ThakiCloud's reports and dashboards vary in colors, fonts, and axes every time, but fixing rcParams and PALETTE in one place, as nature-figure does, raises the average quality. In particular, the editable-SVG export pattern via <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> can be used as-is for marketing and seminar materials that the design team processes later. The results figure in this article is the evidence.</p>

<p>From the perspective of platform strategy, nature-skills shows <strong>a signal of product-market fit (PMF) in the academic vertical</strong>. It is not a general skill; it condenses the rules into a narrow, deep use case, “submitting to a Nature journal,” and so the consistency of results rises. For ThakiCloud, which runs AI/ML SaaS on K8s, a vertical skill that lays domain rules as a thin layer over a general language model is a core differentiation pattern. The same structure can be copied to internal verticals such as medicine, finance, and patents.</p>

<h2 id="القيود-والحجج-المضادة">Limitations and counterarguments</h2>

<p>First, <strong>star-count inflation</strong>. The “more than 20K stars” figure in some posts differed greatly from reality (about 265) [estimated]. This example reaffirms the need to run things yourself instead of trusting viral signals directly.</p>

<p>Second, <strong>responsibility for the correctness of the figure's data lies with the user.</strong> The skill draws the figure well, but it does not guarantee the accuracy of the numbers put into it. For that very reason I marked the service curves in this article as examples. In real papers or reports, only measured values should go in.</p>

<p>Third, the <strong>mandatory backend gate</strong> can be an obstacle in automated pipelines. Asking “Python or R?” and stopping every time is a safety valve in interactive mode, but in unattended batches it needs a wrapper that fixes the backend in advance.</p>

<p>In sum, nature-skills is a good example of a “vertical skill that condenses domain rules into code.” When we judge value by actual measurement evidence such as 36 editable text tags, not by star count, its design is genuinely worth learning from.</p>

<h2 id="المصادر">Sources</h2>

<ul>
  <li>nature-skills (GitHub, MIT): <a href="https://github.com/Yuan1z0825/nature-skills">github.com/Yuan1z0825/nature-skills</a></li>
  <li>All measured values in this article were rendered locally right after cloning nature-figure v2.0.0. The star count (about 265) is an estimate based on search.</li>
</ul>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="claude-skills" /><category term="academic-writing" /><category term="matplotlib" /><category term="data-visualization" /><category term="nature-figure" /><category term="skill-marketplace" /><summary type="html"><![CDATA[قمنا باستنساخ حزمة skills مفتوحة المصدر لـ Claude باسم nature-skills، التي تجمع بين توليد الأشكال العلمية والمراجعة الأكاديمية وفق معايير مجلة Nature، ثم استخدمنا nature-figure لعرض بيانات خدمة ThakiCloud في شكل من لوحتين بمستوى جاهز للنشر. 
قِسنا فعليًا حتى 36 وسم نص قابل للتحرير في SVG، ولخّصنا الدلالات من منظور الملاءمة العمودية للمنتج في سوق الـ skills.]]></summary></entry><entry xml:lang="ar"><title type="html">ميتا-سكيل تتعامل مع المهارات ‘كأنها برمجيات’: تقرير تحقّق مباشر من yao-meta-skill v1.1.0</title><link href="https://thakicloud.github.io/ar/dev/yao-meta-skill-engineering-governance/" rel="alternate" type="text/html" title="ميتا-سكيل تتعامل مع المهارات ‘كأنها برمجيات’: تقرير تحقّق مباشر من yao-meta-skill v1.1.0" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ar/dev/yao-meta-skill-engineering-governance</id><content type="html" xml:base="https://thakicloud.github.io/ar/dev/yao-meta-skill-engineering-governance/"><![CDATA[<p><img src="/assets/images/yao-meta-skill-hero.png" alt="صورة تجريدية لكتل وحدوية تُشكّل خط تجميع دقيقًا مع بوابات حوكمة متوهّجة" />
<em>A conceptual illustration of a meta-skill that treats a skill not as a one-off prompt but as a “reusable asset” that ships with versioning, verification, and governance.</em></p>

<h2 id="نظرة-عامة">Overview</h2>

<p>In agent environments such as Claude Code, Cursor, and Codex CLI, a skill is no longer just a collection of prompts. It is closer to a capability product that packages repetitive work for reuse across several harnesses. But as skills multiply, three problems grow with them: quality variance, trigger collisions, and context cost. The open-source project yao-meta-skill, which drew attention after the Chinese influencer @vista8 (about 113K followers) recommended it as “stronger than Anthropic’s official Skill-creator,” targets exactly this point.</p>

<p>The name YAO stands for “Yielding AI Outcomes,” and the repository describes itself as “a rigorous system for the engineering, evaluation, governance, and portability of reusable agent skills.” I did not take this claim at face value. I cloned the repository directly into the ThakiCloud working environment and actually ran the local verification gates it provides. This article is an implementation report that dissects the architecture of yao-meta-skill from those measured results and considers what can be borrowed for operating our internal <code class="language-plaintext highlighter-rouge">.claude/skills</code>.</p>

<h2 id="ما-هي-هذه-الأداة">What This Tool Is</h2>

<p>yao-meta-skill is a “skill that makes skills,” that is, a meta-skill. It takes repetitive work as input, such as workflow notes, prompt collections, conversation transcripts, runbooks, and document patterns, and turns it into a verifiable skill package. Its core design rests on three pillars.</p>

<p>First, <strong>Skill IR (Intermediate Representation)</strong>. The intent, triggers, inputs, outputs, boundaries, references, and expected artifacts are first described in a platform-neutral intermediate representation. Target compilers and adapters then convert this IR into five targets: OpenAI, Claude, generic agent skills, Agent-Skills-compatible packages, and VS Code-oriented workflows. Describing a skill once and compiling it for multiple environments directly addresses the burden we carry internally of managing the same skill twice, once for Claude Code and once for Cursor.</p>
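
<p>The describe-once, compile-many idea can be sketched in a few lines. This is only an illustration under stated assumptions: the <code class="language-plaintext highlighter-rouge">SkillIR</code> dataclass, the two renderers, and the output layouts are hypothetical names of my own and are not taken from the yao-meta-skill repository.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from dataclasses import dataclass


@dataclass(frozen=True)
class SkillIR:
    """Hypothetical platform-neutral description of one skill."""
    name: str
    intent: str
    triggers: tuple = ()
    boundaries: tuple = ()


def compile_for_claude(ir):
    """Render the IR as SKILL.md-style front matter (illustrative layout)."""
    description = ir.intent + " Use when: " + "; ".join(ir.triggers)
    return "---\nname: " + ir.name + "\ndescription: " + description + "\n---\n"


def compile_for_cursor(ir):
    """Render the same IR as a plain rule file (illustrative layout)."""
    lines = ["# " + ir.name, ir.intent]
    lines += ["- trigger: " + t for t in ir.triggers]
    lines += ["- do not: " + b for b in ir.boundaries]
    return "\n".join(lines) + "\n"


# Hypothetical registry: one renderer per target environment.
TARGETS = {"claude": compile_for_claude, "cursor": compile_for_cursor}


def compile_all(ir):
    """Compile one IR into every registered target."""
    return {target: render(ir) for target, render in TARGETS.items()}
</code></pre></div></div>

<p>The point of the shape is that intent, triggers, and boundaries are edited in one place, and adding a new environment means adding one renderer rather than rewriting every skill.</p>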

<p>Second, <strong>Output Eval Lab</strong>. This layer verifies the quality of a skill’s output with data: trigger checks, output assertions, execution evidence, time and token evidence, benchmark reproducibility, and blinded review packs. What stands out is that the structure makes code do the verifying instead of letting the model claim that “it worked.”</p>

<p>Third, <strong>Review Studio 2.0</strong>. It gathers intent, triggers, output evaluation, context cost, runtime checks, and release evidence into a single HTML gate page. It is a gate that makes visible what must pass before any skill is released.</p>

<p>The license is MIT, and the manifest declares the maturity tier as “governed,” the lifecycle stage as “library,” and the review cadence as “quarterly.” The intent to manage skills like code, with versions, tiers, and review cadences, shows already at the metadata level.</p>

<p><img src="/assets/images/yao-meta-skill-diagram.png" alt="Diagram of data flowing from Skill IR through target compilers, Output Eval Lab, and Review Studio" />
<em>A pipeline in which repetitive-work inputs pass through Skill IR, are compiled for multiple platforms, and then clear the Output Eval Lab and Review Studio gates to end up as release evidence.</em></p>

<h2 id="التثبيت-والتكامل-أوامر-حقيقية">Install and Integration (Real Commands)</h2>

<p>Verification ran in an isolated sandbox. Per our internal rules, the working tree was placed outside the repository and cleaned up afterward.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1) Clone the external repository</span>
git clone <span class="nt">--depth</span> 1 https://github.com/yaojingang/yao-meta-skill

<span class="c"># 2) Install the minimal dependency into the shared .venv (python-runtime rule)</span>
<span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="s2">"</span><span class="nv">$REPO_ROOT</span><span class="s2">/.venv"</span> uv pip <span class="nb">install</span> <span class="s2">"PyYAML==6.0.3"</span>
</code></pre></div></div>

<p>The repository’s dependencies are surprisingly light. The CI requirements file (<code class="language-plaintext highlighter-rouge">requirements-ci.txt</code>) was a single line: <code class="language-plaintext highlighter-rouge">PyYAML==6.0.3</code>. In other words, the verification tooling is built around the pure Python standard library with no heavy frameworks, which is a good sign for dropping it into a CI pipeline.</p>

<p>The actual composition I measured right after cloning was: 632 tracked files, 77 tests, 29 evals, 10 entries in the skill atlas (skill_atlas), 3 schemas, and 2 templates. This is not a single “skill” but something closer to a small factory that produces, verifies, and governs skills.</p>

<p><img src="/assets/images/yao-meta-skill-results.png" alt="Chart of the yao-meta-skill repository composition and local verification gate results" />
<em>Left: the measured repository composition (log scale). Right: all four local verification gates passing.</em></p>

<h2 id="نتائج-التحقّق-الفعلية">Actual Verification Results</h2>

<p>The <code class="language-plaintext highlighter-rouge">Makefile</code> defines more than 25 verification targets. I actually ran four of them, namely Skill IR, the compiler, output evaluation, and lint, and recorded the results.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make skill-ir-check
<span class="c"># python3 tests/verify_skill_ir.py        -&gt; {"ok": true}</span>
<span class="c"># python3 tests/verify_skill_ir_paths.py  -&gt; {"ok": true}</span>

make compiler-check
<span class="c"># python3 tests/verify_compile_skill.py    -&gt; {"ok": true}</span>

make output-eval-check
<span class="c"># python3 tests/verify_output_eval_lab.py  -&gt; {"ok": true}</span>

python3 scripts/lint_skill.py ./   <span class="c"># against the bundled SKILL.md</span>
<span class="c"># {"ok": true, "failures": [], "warnings": []}</span>
</code></pre></div></div>

<p>All four gates passed with <code class="language-plaintext highlighter-rouge">ok: true</code>, and lint reported zero failures and zero warnings. I recorded these numbers by running the gates myself, not by quoting an external source. Notably, the verification output is deterministic JSON of the form <code class="language-plaintext highlighter-rouge">{"ok": true}</code> rather than free-form prose. That is a machine-readable format on which upstream pipelines can build gates automatically, the same direction as ThakiCloud’s principle that “the format is owned by code.”</p>
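
<p>Because the verifiers emit JSON, an upstream gate can be a few lines of code. The sketch below assumes each verifier prints one JSON object with an <code class="language-plaintext highlighter-rouge">ok</code> key and, for lint, a <code class="language-plaintext highlighter-rouge">failures</code> list, as in the output I recorded. The function name <code class="language-plaintext highlighter-rouge">gate_passes</code> is hypothetical and is not part of the repository.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def gate_passes(outputs):
    """Return True only if every verifier output is a JSON object with ok == true."""
    for raw in outputs:
        try:
            report = json.loads(raw)
        except json.JSONDecodeError:
            return False  # free-form prose is treated as a failed gate
        if not isinstance(report, dict) or report.get("ok") is not True:
            return False
        if report.get("failures"):
            return False  # a lint-style report that lists failures also fails
    return True


print(gate_passes(['{"ok": true}', '{"ok": true, "failures": [], "warnings": []}']))
</code></pre></div></div>

<p>Treating anything that does not parse as a failure is a deliberate choice here: a gate should fail closed when a script starts printing prose instead of structured output.</p>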

<p>One limitation also surfaced in measurement. <code class="language-plaintext highlighter-rouge">lint_skill.py</code> raised a usage error when invoked with no arguments and required the skill directory to be specified explicitly. The context-size script (<code class="language-plaintext highlighter-rouge">context_sizer.py</code>) returned a token estimate of 0 on some paths and appeared sensitive to how arguments were passed. The operational caveat, then: “the make targets work well, but calling individual scripts directly requires matching the interface precisely.”</p>

<h2 id="التطبيق-والدلالات-لمنصّة-thakicloud-k8s-aiml-saas">Application and Implications for the ThakiCloud K8s AI/ML SaaS Platform</h2>

<p>ThakiCloud already operates more than a thousand internal skills and rules. At that scale, the biggest cost is not the skills themselves but the context tax every indexed skill pays in every session, along with trigger collisions. Three points from yao-meta-skill are worth borrowing.</p>

<p>First, <strong>partial adoption of the Skill IR idea</strong>. Instead of managing internal skills twice across Claude Code and Cursor, describing intent, triggers, and boundaries neutrally once and then compiling per environment shrinks the management surface. Full adoption may be overkill, but structuring the description and triggers of new skills as if they were an IR schema is useful on its own.</p>

<p>Second, <strong>borrowing Output Eval Lab-style gates</strong>. Internally we already have release gates and deterministic verification scripts, but trigger evaluation, that is, checking with data whether a trigger fires as intended, is comparatively weak. This pattern can be used directly to reduce distractor noise in the skill router.</p>
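
<p>Checking with data whether a trigger fires as intended can start very small. The sketch below is a toy under my own assumptions: a substring matcher stands in for whatever routing a real harness does, and the names <code class="language-plaintext highlighter-rouge">trigger_fires</code> and <code class="language-plaintext highlighter-rouge">evaluate_triggers</code> are hypothetical rather than taken from yao-meta-skill.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def trigger_fires(trigger_phrases, prompt):
    """Toy matcher: the skill fires if any trigger phrase occurs in the prompt."""
    text = prompt.lower()
    return any(phrase.lower() in text for phrase in trigger_phrases)


def evaluate_triggers(trigger_phrases, labeled_prompts):
    """Count hits, misses, and false fires over (prompt, should_fire) pairs."""
    counts = {"hit": 0, "miss": 0, "false_fire": 0, "correct_skip": 0}
    for prompt, should_fire in labeled_prompts:
        fired = trigger_fires(trigger_phrases, prompt)
        if fired and should_fire:
            counts["hit"] += 1
        elif should_fire:
            counts["miss"] += 1
        elif fired:
            counts["false_fire"] += 1
        else:
            counts["correct_skip"] += 1
    return counts
</code></pre></div></div>

<p>Even a handful of labeled prompts per skill separates the two failure modes that matter at our scale: a trigger that is too narrow shows up as misses, and one that is too broad shows up as false fires that collide with other skills.</p>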

<p>Third, <strong>a single Review Studio-style release gate</strong>. A gate that confirms intent, triggers, context cost, and runtime on one page before a new skill is merged is philosophically the same as the deployment gates (ArgoCD and Kueue) of an AI/ML SaaS platform running on K8s. Just as we gate code deployment, we gate skill deployment.</p>

<h2 id="القيود-والحجج-المضادة">Limitations and Counterarguments</h2>

<p>To avoid an optimistic-only summary, I state the counterarguments explicitly.</p>

<p>First, <strong>the source of the “stronger than the official one” claim is an influencer’s recommendation</strong>. The repository’s structure and local verification are indeed solid, but Anthropic’s official Skill-creator excels at fast, conversation-first creation loops, which is a different purpose. The two tools are complementary, not competing. The “stronger” comparison is accurate only when scoped to building team assets that need governance.</p>

<p>Second, <strong>adoption cost</strong>. Bringing in a 632-file factory as-is is overkill for one person or a small team. The realistic path is selective borrowing of the core ideas (IR, trigger evaluation, the single gate).</p>

<p>Third, <strong>operational interface sensitivity</strong>. As confirmed by measurement earlier, individual scripts were sensitive to arguments and some measurements returned 0. When wiring this into CI, wrap at the level of make targets and pin the interfaces of the individual scripts.</p>

<p>In closing, yao-meta-skill is one of the most concrete open-source embodiments of the “engineer skills like software” trend. Even without adopting it in full, any organization where skills are becoming assets will find its design principles worth studying.</p>

<h2 id="المصادر">Sources</h2>

<ul>
  <li>yao-meta-skill (GitHub, MIT): <a href="https://github.com/yaojingang/yao-meta-skill">github.com/yaojingang/yao-meta-skill</a></li>
  <li>Repository manifest and verification results: all figures in this article were measured locally on a clone of v1.1.0.</li>
</ul>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="claude-skills" /><category term="meta-skill" /><category term="skill-governance" /><category term="skill-ir" /><category term="agent-skills" /><category term="evaluation" /><summary type="html"><![CDATA[قمنا باستنساخ أداة الميتا-سكيل مفتوحة المصدر yao-meta-skill — التي شاع أنها أقوى من Skill-creator الرسمية من Anthropic — مباشرةً في بيئة ThakiCloud وشغّلنا بوابات التحقّق المحلية. 
نفكّك بنية Skill IR وOutput Eval Lab وReview Studio 2.0 بأرقام مقيسة، ونلخّص الدلالات من منظور حوكمة .claude/skills الداخلية.]]></summary></entry><entry xml:lang="en"><title type="html">Cutting Token Cost by 34-71% with Reversible Compression: A Headroom Field Report and ThakiCloud Context Hygiene</title><link href="https://thakicloud.github.io/en/dev/headroom-reversible-context-compression/" rel="alternate" type="text/html" title="Cutting Token Cost by 34-71% with Reversible Compression: A Headroom Field Report and ThakiCloud Context Hygiene" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/en/dev/headroom-reversible-context-compression</id><content type="html" xml:base="https://thakicloud.github.io/en/dev/headroom-reversible-context-compression/"><![CDATA[<p><img src="/assets/images/headroom-reversible-context-compression-hero.png" alt="Abstract image of data condensing" />
<em>Context is not free. Condensing scattered tokens losslessly is what Headroom does.</em></p>

<h2 id="overview">Overview</h2>

<p>Any team running AI coding agents daily knows where the biggest hidden cost comes from. It is context. Tool outputs, RAG results, logs, files, and conversation history pile up every turn, and those tokens become the bill. In multi-agent workflows this cost grows not linearly but multiplicatively, because every time one subagent drops a large search-result JSON into context, the cache-read tokens grow alongside it.</p>

<p>This post is not a simple tool introduction. ThakiCloud already runs Headroom in its production tool chain, and this time we pulled three real JSON tool outputs from our own repo and ran Headroom directly against them. We document the install command, the integration code, and the measured token reductions in a reproducible form. The short version: the more repetitive the JSON structure, the larger the savings, and on our data the token reduction reached up to 71.2%. Every number was measured in a real sandbox, with no estimates mixed in.</p>

<h2 id="what-is-headroom">What Is Headroom</h2>

<p>Headroom (PyPI package <code class="language-plaintext highlighter-rouge">headroom-ai</code>, GitHub <code class="language-plaintext highlighter-rouge">chopratejas/headroom</code>) is a context-compression tool open-sourced by ex-Netflix engineer Tejas Chopra. Its stated goal is clear: compress tool outputs, logs, files, and RAG chunks before they reach the LLM, reducing tokens while keeping the answer identical.</p>

<p>Most existing context-reduction tools are irreversible. Once you cut, you cannot get the original back. Headroom’s key differentiator is that it runs locally, covers multiple content types, and is reversible. The original can be restored within a configured TTL via breadcrumb hashes. This structurally prevents the classic failure of “we compressed and the agent lost the details.” You can run on the compressed version by default and restore the original only when a specific section is needed.</p>
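
<p>To make the reversibility idea concrete, here is a toy sketch of keeping an original under a content hash until a TTL expires. It is not Headroom’s implementation: the <code class="language-plaintext highlighter-rouge">BreadcrumbStore</code> class, its method names, and the 16-character hash prefix are assumptions made purely for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import time


class BreadcrumbStore:
    """Toy store: keeps originals under a content hash until a TTL expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}

    def put(self, original):
        """Store the original text and return its breadcrumb hash."""
        crumb = hashlib.sha256(original.encode("utf-8")).hexdigest()[:16]
        self._items[crumb] = (self.clock() + self.ttl_seconds, original)
        return crumb

    def restore(self, crumb):
        """Return the original, or None if it is unknown or has expired."""
        entry = self._items.get(crumb)
        if entry is None:
            return None
        expires_at, original = entry
        time_left = max(expires_at - self.clock(), 0.0)  # clamp at zero
        if time_left == 0.0:
            del self._items[crumb]
            return None
        return original
</code></pre></div></div>

<p>The compressed context only needs to carry the short breadcrumb. An agent that later needs a specific record asks the store for the original, and after the TTL the answer is simply that it is gone.</p>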

<p>There are three ways to attach it: as a library you call directly, as a proxy, or as an MCP server. It recognizes content type and compresses selectively, keeping only the outliers in JSON or only the failure lines in logs.</p>

<h3 id="internals-smartcrusher-is-the-core">Internals: SmartCrusher Is the Core</h3>

<p>Headroom routes to a different compressor per content type. In this experiment the transforms that actually fired showed up in the router log as <code class="language-plaintext highlighter-rouge">router:protected:user_message</code> and <code class="language-plaintext highlighter-rouge">router:mixed:...</code>, meaning it protects the user message and compresses only the JSON payload of tool messages.</p>

<ul>
  <li><strong>SmartCrusher</strong>: a general-purpose JSON compressor that handles arrays of dicts, nested objects, and mixed types. For repetitive JSON tool output (search results, log rows, record lists) it folds redundant keys and infers schema to reduce deterministically. It accounted for most of the savings in our measurement.</li>
  <li><strong>Code compressor</strong>: structure-aware source-code compression.</li>
  <li><strong>Image compression</strong>: image payloads are also reduced.</li>
</ul>
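
<p>Why repetitive arrays of dicts shrink so much can be seen with a toy version of key folding: write the shared keys once and keep only the values per row. This illustrates the general idea only and is not SmartCrusher’s actual algorithm. <code class="language-plaintext highlighter-rouge">fold_records</code> and <code class="language-plaintext highlighter-rouge">unfold_records</code> are hypothetical helpers, and the sample records are synthetic.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def fold_records(records):
    """Fold a list of same-schema dicts into one key header plus value rows."""
    keys = list(records[0].keys()) if records else []
    if any(list(r.keys()) != keys for r in records):
        raise ValueError("records do not share one schema")
    return {"keys": keys, "rows": [[r[k] for k in keys] for r in records]}


def unfold_records(folded):
    """Invert fold_records exactly."""
    return [dict(zip(folded["keys"], row)) for row in folded["rows"]]


# Synthetic stand-in for a uniform tool-output array.
records = [{"id": i, "source": "timeline", "score": i / 10} for i in range(100)]
raw = json.dumps(records)
packed = json.dumps(fold_records(records))
print(len(raw), len(packed))  # character counts before and after folding
</code></pre></div></div>

<p>The folded form round-trips exactly, and the saving grows with the number of records because each key name is paid for once instead of once per row.</p>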

<p>The diagram below is the data flow we observed. Tool output passes through the router into SmartCrusher, and while the compressed context goes to the LLM call, the original is stored separately for reversible restoration when needed.</p>

<p><img src="/assets/images/headroom-reversible-context-compression-diagram.png" alt="Headroom pipeline diagram" />
<em>Tool output → Content-Type Router → SmartCrusher → compressed context → LLM. The original is kept with a breadcrumb hash and TTL to preserve a reversible restoration path. (Labels are in Korean in the rendered image.)</em></p>

<h2 id="install-and-integration">Install and Integration</h2>

<p>Our Python runtime is unified into a single interpreter (3.12.8) <code class="language-plaintext highlighter-rouge">.venv</code>. Installation is one line.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="s2">"</span><span class="nv">$PWD</span><span class="s2">/.venv"</span> uv pip <span class="nb">install</span> <span class="s2">"headroom-ai[code,relevance]"</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">[code,relevance]</code> extra enables code structure-aware compression and relevance-based filtering. Semantic compression of plain text needs an additional model (about 261MB), but the highest-impact JSON path works with this base install alone.</p>

<p>Integration is simplest by passing a message list directly. The core of the wrapper we actually use (<code class="language-plaintext highlighter-rouge">scripts/headroom_compress.py</code>) is below. Put the tool output as the <code class="language-plaintext highlighter-rouge">content</code> of a <code class="language-plaintext highlighter-rouge">tool</code>-role message and call <code class="language-plaintext highlighter-rouge">compress</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">headroom</span> <span class="kn">import</span> <span class="n">compress</span>

<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Summarize this tool output</span><span class="sh">"</span><span class="p">},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
     <span class="sh">"</span><span class="s">tool_calls</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">,</span>
                     <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">arguments</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">{}</span><span class="sh">"</span><span class="p">}}]},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">tool_call_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">raw_json_string</span><span class="p">},</span>
<span class="p">]</span>

<span class="n">result</span> <span class="o">=</span> <span class="nf">compress</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">claude-sonnet-4-5-20250929</span><span class="sh">"</span><span class="p">)</span>
<span class="n">compressed</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">messages</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">tokens_before</span><span class="p">,</span> <span class="sh">"</span><span class="s">-&gt;</span><span class="sh">"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">tokens_after</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">transforms_applied</span><span class="p">)</span>
</code></pre></div></div>

<p>The object <code class="language-plaintext highlighter-rouge">compress</code> returns carries <code class="language-plaintext highlighter-rouge">tokens_before</code>, <code class="language-plaintext highlighter-rouge">tokens_after</code>, and <code class="language-plaintext highlighter-rouge">transforms_applied</code>, so code can verify after the fact what the compression actually did. The point is that these are values the library measured, not numbers the model self-reports. On top of that we cross-checked once more with a separate tokenizer (tiktoken).</p>

<h2 id="real-experiment-results">Real Experiment Results</h2>

<p>The experiment ran in an isolated git worktree sandbox. The structure never touches the main working tree and keeps only results in an evidence directory. The test data is three of our repo’s real artifacts with clearly repetitive JSON structure.</p>

<ol>
  <li><strong>skill_index.json</strong>: a BM25 index for skill search. Records with identical schema repeat at scale.</li>
  <li><strong>seedance-prompts/raw-prompts.json</strong>: a catalog of 605 prompts. Natural-language text is the dominant share.</li>
  <li><strong>twitter timeline archive</strong>: 1,385 timeline records. An array of objects with identical key structure.</li>
</ol>

<p>Token counts were measured with the <code class="language-plaintext highlighter-rouge">cl100k_base</code> tokenizer. We recorded both bytes and tokens because compression should be judged not by raw byte savings but by how much it helps in the actual billing unit, the token. The results are below.</p>

<table>
  <thead>
    <tr>
      <th>Test data</th>
      <th>Original tokens</th>
      <th>Compressed tokens</th>
      <th>Token reduction</th>
      <th>Byte reduction</th>
      <th>Time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>skill_index (BM25 index)</td>
      <td>1,618,287</td>
      <td>465,445</td>
      <td><strong>71.2%</strong></td>
      <td>64.9%</td>
      <td>2.08s</td>
    </tr>
    <tr>
      <td>twitter-timeline (record array)</td>
      <td>399,926</td>
      <td>192,465</td>
      <td><strong>51.9%</strong></td>
      <td>57.0%</td>
      <td>0.24s</td>
    </tr>
    <tr>
      <td>seedance-prompts (prompt catalog)</td>
      <td>1,085,592</td>
      <td>713,210</td>
      <td><strong>34.3%</strong></td>
      <td>38.5%</td>
      <td>0.57s</td>
    </tr>
  </tbody>
</table>
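
<p>The token-reduction column follows directly from the two token columns. A short check, using only the counts in the table above (the function name <code class="language-plaintext highlighter-rouge">reduction_percent</code> is mine):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Token counts copied from the table above: (original, compressed).
measurements = {
    "skill_index": (1_618_287, 465_445),
    "twitter-timeline": (399_926, 192_465),
    "seedance-prompts": (1_085_592, 713_210),
}


def reduction_percent(before, after):
    """Share of tokens removed, as a percentage rounded to one decimal."""
    return round((1 - after / before) * 100, 1)


for name, (before, after) in measurements.items():
    print(name, reduction_percent(before, after))
</code></pre></div></div>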

<p><img src="/assets/images/headroom-reversible-context-compression-results.png" alt="Measured compression chart" />
<em>Measured reduction rates for three JSON tool outputs from the ThakiCloud repo. Bytes and tokens are shown together.</em></p>

<p>How to read the numbers matters. <strong>The more repetitive the structure, the larger the savings.</strong> skill_index is an index of densely repeating identical-schema records, so SmartCrusher’s key folding is at its most effective there and cut tokens by a full 71.2%. The twitter timeline, also a uniform object array, was reduced by more than half. By contrast seedance-prompts, where natural-language prompt text makes up most of each record, had little room to trim via structural compression and landed at 34.3%. This difference directly demonstrates the design intent that “JSON is where it works best.”</p>

<p>The timing is also worth noting. Headroom processed the 1.6-million-token index in about two seconds (2.08s) and each of the other two datasets in under a second. That is fast enough to run inline right before tool output enters context, with almost no perceptible latency. Because the compression is deterministic, the same input always yields the same output, which is also cache-friendly.</p>

<p>One honest caveat. The numbers above are single-run measurements over three datasets. On other kinds of JSON, especially data with mostly unique values and few repeated keys, the reduction may be lower. Still, within our measured range, a span of 34-71% token reduction is a clearly meaningful result, at least for repetitive-structure tool output.</p>

<h2 id="application-to-the-thakicloud-k8s-aiml-saas-platform">Application to the ThakiCloud K8s AI/ML SaaS Platform</h2>

<p>The point where we adopted Headroom is exactly what the experiment above shows: <strong>repetitive-structure, large JSON tool output.</strong> Our context-hygiene rule (<code class="language-plaintext highlighter-rouge">ecc-token-strategy</code>) spells this out: repetitive-structure JSON array tool output is compressed deterministically with SmartCrusher before entering context, plain text is not a target but JSON is, and the priority is subagent summarization first, then headroom compression.</p>
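
<p>The priority order in that rule can be written down as a small decision function. This is a sketch of the rule as described here, not our production code: <code class="language-plaintext highlighter-rouge">choose_strategy</code>, the <code class="language-plaintext highlighter-rouge">can_summarize</code> flag, and the returned labels are hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json


def is_repetitive_json_array(payload):
    """True when the payload parses as a list of dicts that all share one key set."""
    try:
        data = json.loads(payload)
    except (TypeError, ValueError):
        return False
    if not isinstance(data, list) or not data:
        return False
    if not all(isinstance(item, dict) for item in data):
        return False
    return len({frozenset(item) for item in data}) == 1


def choose_strategy(payload, can_summarize):
    """Pick a context-hygiene step in the priority order described above."""
    if can_summarize:
        return "subagent-summary"  # first choice: summarize in a subagent
    if is_repetitive_json_array(payload):
        return "headroom-compress"  # second choice: deterministic compression
    return "pass-through"  # plain text is not a compression target
</code></pre></div></div>

<p>Keeping the check structural, based on shared keys rather than content, is what lets the decision itself stay deterministic.</p>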

<p>The reason this matters so much in K8s multi-agent orchestration is the structure of the cost. In a workflow where many subagents run, context hygiene means three things at once. First, token cost control. Second, cache hit-rate management; deterministic compression guarantees identical output for identical input, so it does not break the prompt cache. Third, latency management; the smaller the context, the faster the model responds.</p>

<p>Our LLM serving schedules GPU workloads with Kueue on top of K8s, and many inference requests flow concurrently. In this environment, a bloated context costs more than one request; it eats overall throughput. Headroom lets us insert this layer with almost no code change. We compress a search-result or log array in one line right before it enters context, and restore reversibly only when a specific section is needed.</p>

<p>It is practical from a data-scientist’s perspective too. In a RAG pipeline where retrieved chunks come loaded with repetitive metadata (identical keys like source URL, timestamp, score), that metadata is precisely the area SmartCrusher trims best. Because it preserves the body and reduces only the structural overhead, you secure context budget without sacrificing retrieval accuracy.</p>

<h2 id="limitations-and-counterarguments">Limitations and Counterarguments</h2>

<p>We do not recommend this tool uncritically. Here are the honest limitations and counterarguments.</p>

<p><strong>First, local execution is a precondition.</strong> Headroom needs to run a local process, so it cannot be used in fully sandboxed, isolated execution environments. There are deployment shapes this constraint does not fit.</p>

<p><strong>Second, the effect on plain text is limited.</strong> As the seedance-prompts result shows, data with a high share of natural-language text has little room to trim via structural compression. Reducing plain text semantically requires an additional model, and that path gives up some determinism and speed.</p>

<p><strong>Third, it may be overkill for single-provider teams.</strong> If one model provider’s native compaction is enough and you do not need cross-agent memory, the operational burden of adding a separate compression layer may outweigh the gain.</p>

<p><strong>Fourth, the strongest counterargument is “couldn’t you just summarize with a subagent?”</strong> In fact our own rule prioritizes subagent summarization over headroom compression. Summarization is irreversible but reduces far more and compresses by meaning. So where does Headroom fit? The answer is “when summarizing would lose details that might be needed later.” Reversibility fills exactly this gap. You run on the compressed version normally, and the moment you need the original of a specific record, you restore it precisely within the TTL. Summarization and compression are not competitors but complements.</p>

<p>In short, Headroom implements the principle that “context is not free” with the concrete design of reversible compression. Within our measured range it cut tokens by 34-71% on repetitive-structure JSON, and thanks to determinism and reversibility it neither broke the cache nor lost the details. If you are an engineer interested in how ThakiCloud treats context hygiene as a cost and reliability problem, we are the place that runs this layer in production.</p>

<hr />

<p>Sources: Headroom (headroom-ai), PyPI https://pypi.org/project/headroom-ai/ · GitHub https://github.com/chopratejas/headroom (author Tejas Chopra). The figures in this post are measured directly on ThakiCloud repo data.</p>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="headroom" /><category term="context-compression" /><category term="token-cost" /><category term="llm-serving" /><category term="rag" /><category term="mcp" /><summary type="html"><![CDATA[The biggest hidden cost of an AI coding agent is context. We ran Headroom (headroom-ai) directly against three real JSON tool outputs from the ThakiCloud repo and measured the token reduction. 
We walk through how SmartCrusher's lossless, reversible compression cut tokens by up to 71.2%, from install command to measured numbers.]]></summary></entry><entry xml:lang="en"><title type="html">Nature-Grade Figures and Polishing as Code: A Hands-On Report on Running nature-skills</title><link href="https://thakicloud.github.io/en/dev/nature-skills-academic-figure-polishing/" rel="alternate" type="text/html" title="Nature-Grade Figures and Polishing as Code: A Hands-On Report on Running nature-skills" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/en/dev/nature-skills-academic-figure-polishing</id><content type="html" xml:base="https://thakicloud.github.io/en/dev/nature-skills-academic-figure-polishing/"><![CDATA[<p><img src="/assets/images/nature-skills-hero.png" alt="Abstract image of multi-panel data curves and figure plates floating in an academic atmosphere" />
<em>Capturing the spirit of an academic figure skill that treats a figure not as a “pretty plot” but as a “visual argument.”</em></p>

<h2 id="overview">Overview</h2>

<p>The two tasks researchers most often hand to Claude Code are “make a figure for my paper” and “polish this English draft to journal level.” Hand either to a general-purpose LLM and the output wobbles every time. Figures get arbitrary font sizes and colors; polishing rewrites sentences with no consistent rules. The open-source skill package nature-skills (Yuan1z0825/nature-skills) aims to pin that variability down with a verified scaffold.</p>

<p>As it gained attention, some shared posts described it as having “20K+ GitHub stars,” but the actual number I confirmed was far smaller, around 265 [estimated]. Star-count inflation is common, so in this article I evaluated its value not by stars but by the measured results of running the tool directly. This is an implementation report that clones nature-skills into the ThakiCloud environment and uses its nature-figure skill to render real serving data into a submission-grade figure.</p>

<h2 id="what-this-tool-is">What This Tool Is</h2>

<p>The actual composition I confirmed after cloning the repository was 12 skills under <code class="language-plaintext highlighter-rouge">skills/</code> (excluding shared modules). It covers the entire academic workflow: nature-figure (scientific figures), nature-polishing (academic polishing), nature-academic-search (literature search), nature-citation, nature-reviewer, nature-response (reviewer responses), and more. The license is MIT.</p>

<p>The star of this article, <strong>nature-figure, is version 2.0.0</strong>, and it has a router structure split into static and dynamic layers. The large design, API, pattern, and QA knowledge lives in on-demand reference files, and for each task it detects the backend (Python/R) and loads only the fragment it needs. This is exactly the same pattern as the progressive disclosure that ThakiCloud emphasizes.</p>

<p>The most impressive design is the <strong>“figure contract.”</strong> Before writing any code, it forces you to fix a one-sentence core conclusion, the evidence chain, the archetype classification, the backend, and the journal/export contract first. The skill insists that “a figure is a visual argument, not an isolated pretty plot.” It also puts backend selection behind a <strong>blocking gate</strong>. If the user does not specify Python or R, it asks “Python or R?” and stops. It reduces the degrees of freedom so the model cannot pick a default on its own.</p>

<p><img src="/assets/images/nature-skills-diagram.png" alt="nature-figure routing diagram from Figure Contract through the backend gate to the QA contract" />
<em>The flow defines the core conclusion, passes the Python/R backend gate, applies rcParams and PALETTE to export editable SVG/TIFF, and finishes with the QA contract.</em></p>

<h2 id="installation-and-integration-real-commands">Installation and Integration (Real Commands)</h2>

<p>Verification ran in an isolated sandbox outside the repository and was cleaned up afterward.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1) Clone the external repository</span>
git clone <span class="nt">--depth</span> 1 https://github.com/Yuan1z0825/nature-skills

<span class="c"># 2) Confirm the Python backend dependency (shared .venv)</span>
.venv/bin/python <span class="nt">-c</span> <span class="s2">"import matplotlib; print(matplotlib.__version__)"</span>
<span class="c"># matplotlib 3.11.0</span>
</code></pre></div></div>

<p>nature-figure’s Python quick-start (<code class="language-plaintext highlighter-rouge">static/fragments/backend/python.md</code>) specifies the <code class="language-plaintext highlighter-rouge">rcParams</code> for submission-grade figures, and <code class="language-plaintext highlighter-rouge">references/api.md</code> defines a journal-friendly PALETTE. The core settings are as follows.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib</span> <span class="k">as</span> <span class="n">mpl</span>

<span class="n">mpl</span><span class="p">.</span><span class="n">rcParams</span><span class="p">.</span><span class="nf">update</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">font.family</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">font.sans-serif</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Arial</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Helvetica</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">DejaVu Sans</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">],</span>
    <span class="sh">"</span><span class="s">svg.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">none</span><span class="sh">"</span><span class="p">,</span>   <span class="c1"># keep text inside the SVG editable
</span>    <span class="sh">"</span><span class="s">pdf.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span>       <span class="c1"># keep text in PDF as editable TrueType
</span>    <span class="sh">"</span><span class="s">font.size</span><span class="sh">"</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span>           <span class="c1"># 7pt baseline unless it is a large slide panel
</span>    <span class="sh">"</span><span class="s">axes.linewidth</span><span class="sh">"</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">,</span>
<span class="p">})</span>
<span class="c1"># PALETTE excerpt from api.md
</span><span class="n">P</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">blue_main</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#0F4D92</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">red_strong</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#B64342</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">neutral_dark</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#4D4D4D</span><span class="sh">"</span><span class="p">}</span>
</code></pre></div></div>

<p>The single line <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> is the key. Matplotlib's default, <code class="language-plaintext highlighter-rouge">svg.fonttype: "path"</code>, converts text to outlines (paths), making the letters uneditable in Illustrator. Setting it to <code class="language-plaintext highlighter-rouge">"none"</code> keeps text as <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags, so labels can be edited directly during the journal proofing stage.</p>

<h2 id="real-experiment-results">Real Experiment Results</h2>

<p>Applying the skill’s rules (rcParams, PALETTE) verbatim, I rendered data directly relevant to ThakiCloud into a figure. The subject is a two-panel figure comparing latency and throughput of GPU inference serving across batch sizes for FP16 versus INT8. The serving-curve numbers in the plot itself are schematic, while the <strong>measured values are the meta-numbers captured during rendering</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RENDER_MS=195.4
SVG_BYTES=24131
PNG_BYTES=254233          # 600 dpi
SVG_EDITABLE_TEXT_TAGS=36
PANELS=2 (a:latency, b:throughput)
RCPARAMS_FONT_SIZE=7.0
SVG_FONTTYPE=none
</code></pre></div></div>

<p>There are three key results. First, rendering the two-panel figure finished in about 195 milliseconds. Second, the 600dpi PNG was about 254KB and the SVG about 24KB, both lightweight. Third, and the most important verification: the generated SVG contained <strong>36 <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags</strong>. This is direct evidence that the “editable text” the skill promises was actually upheld. Had it been converted to outlines, the <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tag count would be 0.</p>
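<p>The text-tag check above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the script used for this article: the helper names <code class="language-plaintext highlighter-rouge">count_text_elements</code> and <code class="language-plaintext highlighter-rouge">build_sample_svg</code> are hypothetical, and the sample SVG is a tiny stand-in built in memory rather than the real figure.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def count_text_elements(svg_source: str) -> int:
    """Count text elements in an SVG document, with or without the SVG namespace."""
    root = ET.fromstring(svg_source)
    count = 0
    for element in root.iter():
        # ElementTree reports namespaced tags as "{namespace}local-name".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "text":
            count += 1
    return count


def build_sample_svg(labels: list[str]) -> str:
    """Build a tiny stand-in SVG whose labels are kept as text elements."""
    ET.register_namespace("", SVG_NS)
    root = ET.Element(f"{{{SVG_NS}}}svg")
    for index, label in enumerate(labels):
        node = ET.SubElement(root, f"{{{SVG_NS}}}text", x="0", y=str(10 * index))
        node.text = label
    return ET.tostring(root, encoding="unicode")


sample = build_sample_svg(["Batch size", "Latency (ms)", "FP16", "INT8"])
print(count_text_elements(sample))  # prints 4
```

<p>Passing the contents of an exported SVG file to <code class="language-plaintext highlighter-rouge">count_text_elements</code> gives the same kind of evidence as the <code class="language-plaintext highlighter-rouge">SVG_EDITABLE_TEXT_TAGS</code> line in the log.</p>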

<p><img src="/assets/images/nature-skills-results.png" alt="A Nature-style two-panel figure comparing FP16 and INT8 inference latency and throughput" />
<em>The actual output rendered by applying nature-figure’s rcParams and PALETTE. Left (a) shows latency by batch size, right (b) shows throughput. The serving-curve values are example data.</em></p>

<p>All of these numbers were captured from stdout in my own run, not quoted from external sources. The key point is that the skill proves quality with execution evidence rather than claiming in prose that it “drew something pretty.”</p>

<h2 id="application-and-implications-for-the-thakicloud-k8s-aiml-saas-platform">Application and Implications for the ThakiCloud K8s AI/ML SaaS Platform</h2>

<p>nature-skills demonstrates two threads at once.</p>

<p>From a data-science practitioner’s perspective, the idea of <strong>fixing chart style with verified style tokens</strong> is immediately useful. ThakiCloud’s reports and dashboards tend to wobble in color, font, and axes every time, but pinning rcParams and PALETTE in one place like nature-figure raises the average quality. In particular, the pattern of exporting editable SVG with <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> can be used directly for marketing and seminar materials that the design team post-processes. The result figure in this article is the proof.</p>

<p>From a platform-strategy perspective, nature-skills shows a <strong>PMF (Product-Market Fit) signal for the academic vertical</strong>. Rather than a general-purpose skill, it condenses rules into the narrow, deep use case of “Nature journal submission,” which is why the output is so consistent. For ThakiCloud, which operates a K8s-based AI/ML SaaS, a vertical skill that layers thin domain rules on top of a general-purpose LLM is a core differentiation pattern. The same scaffold can be replicated into in-house verticals such as healthcare, finance, and patents.</p>

<h2 id="limitations-and-counterarguments">Limitations and Counterarguments</h2>

<p>First, <strong>star-count inflation</strong>. The “20K+ stars” in some shared posts differed greatly from the actual figure (around 265) [estimated]. This case reconfirms that you should not trust viral signals at face value and instead run the tool yourself.</p>

<p>Second, <strong>responsibility for the truth of the figure data rests with the user.</strong> The skill draws figures well, but it does not guarantee the accuracy of the numbers that go into them. That is exactly why I explicitly marked the serving curves as examples in this article. In a real paper or report, only measured values should go in.</p>

<p>Third, <strong>the enforcement of the backend gate</strong> can become friction in an automation pipeline. The behavior of asking “Python or R?” and stopping each time is a safeguard in interactive use, but unattended batches need a wrapper that fixes the backend in advance.</p>

<p>In conclusion, nature-skills is a good example of “a vertical skill that condenses domain rules into code.” When you judge its value by measured evidence such as 36 editable text tags rather than by stars, its design has plenty worth learning from.</p>

<h2 id="sources">Sources</h2>

<ul>
  <li>nature-skills (GitHub, MIT): <a href="https://github.com/Yuan1z0825/nature-skills">github.com/Yuan1z0825/nature-skills</a></li>
  <li>All measured numbers in this article were rendered locally by cloning nature-figure v2.0.0 directly. The star count (around 265) is an estimate based on a search.</li>
</ul>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="claude-skills" /><category term="academic-writing" /><category term="matplotlib" /><category term="data-visualization" /><category term="nature-figure" /><category term="skill-marketplace" /><summary type="html"><![CDATA[We cloned nature-skills, an open-source Claude skill package that bundles Nature-journal-grade scientific figure generation with academic polishing, and used nature-figure to render ThakiCloud serving data into a submission-grade two-panel figure. 
We measured everything down to 36 editable SVG text tags and lay out the implications from a vertical-PMF perspective on the skill marketplace.]]></summary></entry><entry xml:lang="en"><title type="html">Treating Skills Like Software: A Hands-On Verification of yao-meta-skill v1.1.0</title><link href="https://thakicloud.github.io/en/dev/yao-meta-skill-engineering-governance/" rel="alternate" type="text/html" title="Treating Skills Like Software: A Hands-On Verification of yao-meta-skill v1.1.0" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/en/dev/yao-meta-skill-engineering-governance</id><content type="html" xml:base="https://thakicloud.github.io/en/dev/yao-meta-skill-engineering-governance/"><![CDATA[<p><img src="/assets/images/yao-meta-skill-hero.png" alt="Abstract modular blocks forming a precision assembly line with glowing governance gates" />
<em>A conceptual view of treating skills not as one-off prompts but as reusable assets carrying versioning, verification, and governance.</em></p>

<h2 id="overview">Overview</h2>

<p>In agent environments like Claude Code, Cursor, and Codex CLI, a Skill is no longer just a bundle of prompts. It is closer to a capability product that packages repetitive work for reuse across multiple harnesses. But as skills multiply, three problems grow at once: quality variance, trigger collisions, and context cost. yao-meta-skill, an open-source project that went viral after Chinese influencer @vista8 (~113K followers) recommended it as “more powerful than Anthropic’s official Skill-creator,” takes direct aim at this.</p>

<p>YAO stands for “Yielding AI Outcomes,” and the repository describes itself as “a rigorous engineering, evaluation, governance, and portability system for reusable agent skills.” Rather than take that claim at face value, I cloned it into a ThakiCloud workspace and actually ran the local verification gates the repo ships with. This is an implementation report that dissects yao-meta-skill’s structure from those measured results and considers what we can borrow for our own <code class="language-plaintext highlighter-rouge">.claude/skills</code> operations.</p>

<h2 id="what-this-tool-is">What This Tool Is</h2>

<p>yao-meta-skill is a “skill that makes skills” — a meta-skill. It takes repetitive work such as workflow notes, prompt sets, conversation transcripts, runbooks, and document patterns, and converts them into a verifiable skill package. Its core design rests on three pillars.</p>

<p>First, <strong>Skill IR (Intermediate Representation)</strong>. Intent, triggers, inputs, outputs, boundaries, references, and expected artifacts are first described in a platform-neutral intermediate representation. Target compilers and adapters then convert this IR into five targets: OpenAI, Claude, generic agent skills, Agent-Skills-compatible packages, and VS Code-oriented workflows. Describing a skill once and compiling it to many environments precisely targets the burden of maintaining the same skill twice across Claude Code and Cursor.</p>
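<p>The describe-once, compile-many idea can be illustrated with a small sketch. Everything here is hypothetical: the <code class="language-plaintext highlighter-rouge">SkillIR</code> fields are a subset of those listed above, and <code class="language-plaintext highlighter-rouge">compile_for</code> and its output format are invented for illustration. They are not yao-meta-skill's actual schema or compilers.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillIR:
    """Hypothetical platform-neutral description of one skill."""
    name: str
    intent: str
    triggers: tuple[str, ...]
    boundaries: tuple[str, ...] = ()
    expected_artifacts: tuple[str, ...] = ()


def compile_for(ir: SkillIR, target: str) -> str:
    """Render the same IR as a target-specific description (toy format)."""
    lines = [f"[{target}] {ir.name}: {ir.intent}"]
    lines += [f"use when: {trigger}" for trigger in ir.triggers]
    lines += [f"do not: {boundary}" for boundary in ir.boundaries]
    lines += [f"produces: {artifact}" for artifact in ir.expected_artifacts]
    return "\n".join(lines)


ir = SkillIR(
    name="release-notes",
    intent="Draft release notes from merged PRs",
    triggers=("user asks for release notes",),
    boundaries=("never publish without review",),
)
# One description, several targets: only the header differs per target.
outputs = {target: compile_for(ir, target) for target in ("claude", "openai")}
print(outputs["claude"])
```

<p>Because both outputs derive from one <code class="language-plaintext highlighter-rouge">SkillIR</code> value, a change to a trigger or boundary propagates to every target instead of being edited twice.</p>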

<p>Second, the <strong>Output Eval Lab</strong>. It is a layer that verifies skill output quality with data: trigger checks, output assertions, execution evidence, timing and token evidence, benchmark reproducibility, and blind-review packs. The fact that code actually verifies, instead of the model merely claiming “it worked,” is striking.</p>

<p>Third, <strong>Review Studio 2.0</strong>. It consolidates intent, triggers, output evaluation, context cost, runtime checks, and release evidence into a single HTML gate page — a checkpoint that fixes exactly what a skill must pass before release.</p>

<p>The license is MIT, and the manifest declares a maturity tier of “governed,” a lifecycle stage of “library,” and a review cadence of “quarterly.” The intent to manage skills like code — with versions, tiers, and review cadences — is visible at the metadata level itself.</p>

<p><img src="/assets/images/yao-meta-skill-diagram.png" alt="Diagram of the flow from Skill IR through target compilers, Output Eval Lab, and Review Studio" />
<em>Repetitive-work inputs pass through Skill IR, compile to multiple platforms, then clear the Output Eval Lab and Review Studio gates to finish as release evidence.</em></p>

<h2 id="installation-and-integration-real-commands">Installation and Integration (Real Commands)</h2>

<p>Verification ran in an isolated sandbox. Per our rules, the worktree lived outside the repository and was cleaned up afterward.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1) Clone the external repo</span>
git clone <span class="nt">--depth</span> 1 https://github.com/yaojingang/yao-meta-skill

<span class="c"># 2) Install the minimal dependency into the shared .venv (python-runtime rule)</span>
<span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="s2">"</span><span class="nv">$REPO_ROOT</span><span class="s2">/.venv"</span> uv pip <span class="nb">install</span> <span class="s2">"PyYAML==6.0.3"</span>
</code></pre></div></div>

<p>The dependencies are surprisingly light. The CI requirements (<code class="language-plaintext highlighter-rouge">requirements-ci.txt</code>) were a single line: <code class="language-plaintext highlighter-rouge">PyYAML==6.0.3</code>. This suggests the verification tooling leans on the Python standard library plus PyYAML rather than heavy runtimes, which is a good sign for slotting it into a CI pipeline.</p>

<p>The actual composition I measured right after cloning was: 632 tracked files, 77 tests, 29 evals, 10 skill-atlas entries, 3 schemas, and 2 templates. This is not a single “skill” but closer to a small factory that produces, verifies, and governs skills.</p>

<p><img src="/assets/images/yao-meta-skill-results.png" alt="Chart of the yao-meta-skill repository composition and local verification gate results" />
<em>Left: the repository’s measured composition (log scale). Right: all four local verification gates passing.</em></p>

<h2 id="real-verification-results">Real Verification Results</h2>

<p>The <code class="language-plaintext highlighter-rouge">Makefile</code> defined more than 25 verification targets. I ran three of them (skill IR, compiler, and output eval), ran the lint script directly, and captured the results.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make skill-ir-check
<span class="c"># python3 tests/verify_skill_ir.py        -&gt; {"ok": true}</span>
<span class="c"># python3 tests/verify_skill_ir_paths.py  -&gt; {"ok": true}</span>

make compiler-check
<span class="c"># python3 tests/verify_compile_skill.py    -&gt; {"ok": true}</span>

make output-eval-check
<span class="c"># python3 tests/verify_output_eval_lab.py  -&gt; {"ok": true}</span>

python3 scripts/lint_skill.py ./   <span class="c"># against the bundled SKILL.md</span>
<span class="c"># {"ok": true, "failures": [], "warnings": []}</span>
</code></pre></div></div>

<p>All four gates passed with <code class="language-plaintext highlighter-rouge">ok: true</code>, and lint reported zero failures and zero warnings. These numbers were captured by running it myself, not quoted from elsewhere. What is notable is that the verification output is deterministic JSON in the form <code class="language-plaintext highlighter-rouge">{"ok": true}</code> rather than prose. This is a machine-readable format an upstream pipeline can gate on automatically — the same direction as ThakiCloud’s own principle that “format is owned by code.”</p>
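<p>Gating on this kind of output is straightforward. The following is a minimal sketch of how a pipeline step might decide pass or fail from one line of gate output; the function name <code class="language-plaintext highlighter-rouge">gate_passed</code> is hypothetical, and the only assumption about the format is the <code class="language-plaintext highlighter-rouge">ok</code> and <code class="language-plaintext highlighter-rouge">failures</code> keys shown in the captured output above.</p>

```python
import json


def gate_passed(raw_output: str) -> bool:
    """Return True only for a JSON object with "ok" set to true and no failures."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # prose or truncated output never passes the gate
    if not isinstance(payload, dict):
        return False
    return payload.get("ok") is True and not payload.get("failures")


print(gate_passed('{"ok": true, "failures": [], "warnings": []}'))  # prints True
print(gate_passed("it worked"))  # prints False
```

<p>Treating anything that is not valid JSON as a failure is the design choice that matters here: a model claiming success in prose cannot slip past the gate.</p>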

<p>Two limitations also surfaced through measurement. <code class="language-plaintext highlighter-rouge">lint_skill.py</code> raised a usage error when called without arguments and required an explicit skill directory. The context-sizing script (<code class="language-plaintext highlighter-rouge">context_sizer.py</code>) returned a token estimate of 0 on some paths, which suggests it is sensitive to how arguments are passed. The practical caveat: the make targets work well, but calling individual scripts directly requires matching their interfaces precisely.</p>

<h2 id="application-and-implications-for-the-thakicloud-k8s-aiml-saas-platform">Application and Implications for the ThakiCloud K8s AI/ML SaaS Platform</h2>

<p>ThakiCloud already operates over a thousand in-house skills and rules. At this scale, the biggest cost is not the skills themselves but the context tax every indexed skill pays each session, plus trigger collisions. There are three points worth borrowing from yao-meta-skill.</p>

<p>First, <strong>partial adoption of the Skill IR idea</strong>. Instead of maintaining in-house skills twice across Claude Code and Cursor, describing intent, triggers, and boundaries neutrally once and compiling per environment reduces the management surface. Full adoption may be overkill, but structuring new skills’ descriptions and triggers like an IR schema already helps.</p>

<p>Second, <strong>borrowing the Output Eval Lab style of gating</strong>. We already have editorial gates and deterministic verification scripts, but trigger evaluation — checking with data whether a trigger fires as intended — is relatively weak. This pattern is directly usable for reducing skill-router distractor noise.</p>

<p>Third, <strong>a Review Studio-style single release gate</strong>. A checkpoint that confirms intent, triggers, context cost, and runtime on one page before merging a new skill is philosophically isomorphic to the deployment gates (ArgoCD, Kueue) of an AI/ML SaaS running on K8s. Just as we gate code deployment, we gate skill deployment.</p>

<h2 id="limitations-and-counterarguments">Limitations and Counterarguments</h2>

<p>To avoid a one-sided endorsement, here are the counterarguments.</p>

<p>First, <strong>the “more powerful than the official” claim traces back to an influencer recommendation</strong>. The repository structure and local verification are solid, but Anthropic’s official Skill-creator excels at fast, conversation-first creation loops — a different purpose. The two are complementary rather than competitive. The “more powerful” comparison is accurate only when scoped to building governed team assets.</p>

<p>Second, <strong>adoption cost</strong>. Bringing in a 632-file factory wholesale is overkill for a solo or small team. Selectively borrowing the core ideas (IR, trigger evaluation, single gate) is the realistic path.</p>

<p>Third, <strong>interface sensitivity</strong>. As measured above, individual scripts were sensitive to arguments and some metrics returned 0. When slotting into CI, wrap things at the make-target level and pin the interfaces of individual scripts.</p>

<p>In conclusion, yao-meta-skill is one of the most concrete open-source examples of “engineering skills like software.” Even without adopting all of it, any organization where skills become assets will find its design principles worth studying.</p>

<h2 id="sources">Sources</h2>

<ul>
  <li>yao-meta-skill (GitHub, MIT): <a href="https://github.com/yaojingang/yao-meta-skill">github.com/yaojingang/yao-meta-skill</a></li>
  <li>Repository manifest and verification results: all numbers in this article were measured locally by cloning v1.1.0.</li>
</ul>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="claude-skills" /><category term="meta-skill" /><category term="skill-governance" /><category term="skill-ir" /><category term="agent-skills" /><category term="evaluation" /><summary type="html"><![CDATA[yao-meta-skill, an open-source meta-skill rumored to be more powerful than Anthropic's official Skill-creator, gets cloned into the ThakiCloud environment and put through its local verification gates. We break down Skill IR, the Output Eval Lab, and Review Studio 2.0 with measured numbers, and draw lessons for in-house .claude/skills governance.]]></summary></entry><entry xml:lang="ko"><title type="html">Multi-Agent Running Locally: Orchestrating 10 Parallel Sub-Agents with Gemma 4 26B</title><link href="https://thakicloud.github.io/ko/agentops/gemma4-local-multi-agent-orchestration/" rel="alternate" type="text/html" title="Multi-Agent Running Locally: Orchestrating 10 Parallel Sub-Agents with Gemma 4 26B" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ko/agentops/gemma4-local-multi-agent-orchestration</id><content type="html" xml:base="https://thakicloud.github.io/ko/agentops/gemma4-local-multi-agent-orchestration/"><![CDATA[<p>Multi-agent orchestration usually brings cloud APIs to mind. A demo recently shared in the community points in a different direction. 
It reportedly ran Gemma 4 26B <strong>on a local machine</strong>, coded an SVG art gallery with 10 parallel sub-agents, and reached a throughput of more than 100 tokens/sec.</p>

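<p>At ThakiCloud we work directly with model serving and multi-agent workflows on a K8s-based AI/ML SaaS platform. This post looks at why the demo marks an inflection point in on-premise inference economics and what it implies from an operations perspective.</p>

<h2 id="무엇이-달라졌나-로컬-멀티에이전트가-실용-영역에-들어왔다">What Changed: Local Multi-Agent Has Entered Practical Territory</h2>

<p>The key is that two conditions now hold at the same time.</p>

<ul>
  <li><strong>The model is small and fast enough</strong>: a mid-sized open-weight model such as Gemma 4 26B runs on a local GPU at practical throughput.</li>
  <li><strong>Agents can run in parallel</strong>: many sub-agents fan out in parallel on top of a single model instance, each taking an independent task.</li>
</ul>

<p>A setup in which 10 sub-agents each generate an SVG artwork and the results are assembled into a gallery shows that multi-agent patterns can be validated locally without cloud API costs. (The 100+ tokens/sec figure is self-reported from the author's local environment, so it is best read as [estimated]. It varies widely with hardware, quantization, and batch settings.)</p>

<h2 id="멀티에이전트-오케스트레이션의-운영-관점">The Operations View of Multi-Agent Orchestration</h2>

<p>Launching parallel sub-agents is impressive, but operating them takes discipline. These are the principles we have drawn from running multi-agent workflows.</p>

<ul>
  <li><strong>Cheap workers, expensive gates only</strong>: a small local model is enough for fan-out work such as exploration and generation. Only judgment steps such as synthesis and verification go to a stronger model. Running everything on the same model is optimal for neither quality nor cost.</li>
  <li><strong>Parallelism invites resource contention</strong>: launching 10 sub-agents at once makes them contend for GPU memory and KV cache. The trade-off between sequential and parallel execution has to be decided per workload.</li>
  <li><strong>The verification step creates quality</strong>: after collecting the parallel workers' outputs, adding one more adversarial verification step raises quality without moving to a higher model tier. Quality problems often come from missing verification, not from a weak model.</li>
</ul>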
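<p>The fan-out, concurrency cap, and verification gate described above can be sketched with <code class="language-plaintext highlighter-rouge">asyncio</code>. This is an illustrative sketch under stated assumptions, not the demo's code: <code class="language-plaintext highlighter-rouge">run_worker</code> is a placeholder that sleeps instead of calling a model, and the names <code class="language-plaintext highlighter-rouge">fan_out</code>, <code class="language-plaintext highlighter-rouge">verify</code>, and <code class="language-plaintext highlighter-rouge">max_concurrent</code> are hypothetical.</p>

```python
import asyncio


async def run_worker(task_id: int, limit: asyncio.Semaphore) -> str:
    """Stand-in for one sub-agent; a real worker would call the local model here."""
    async with limit:  # cap how many workers hit the GPU at the same time
        await asyncio.sleep(0.01)  # placeholder for generation latency
        return f"artwork-{task_id}"


async def fan_out(num_workers: int, max_concurrent: int) -> list[str]:
    """Start all workers, but let only max_concurrent of them run at once."""
    limit = asyncio.Semaphore(max_concurrent)
    jobs = [run_worker(task_id, limit) for task_id in range(num_workers)]
    return await asyncio.gather(*jobs)  # results come back in submission order


def verify(results: list[str]) -> list[str]:
    """Toy gate step: keep only non-empty, unique outputs."""
    return sorted(set(result for result in results if result))


gallery = verify(asyncio.run(fan_out(num_workers=10, max_concurrent=4)))
print(len(gallery))  # prints 10
```

<p>Lowering <code class="language-plaintext highlighter-rouge">max_concurrent</code> trades wall-clock time for less contention, which is the sequential-versus-parallel decision in the second bullet. In a real setup the <code class="language-plaintext highlighter-rouge">verify</code> step is where a stronger model would be spent.</p>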

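<h2 id="thakicloud-관점-온프레미스-추론-경제성">The ThakiCloud View: On-Premise Inference Economics</h2>

<p>The real reason the local multi-agent demo matters is <strong>data sovereignty and cost</strong>. There is clear demand for processing sensitive code and documents on in-house GPUs instead of sending them to external APIs. As mid-sized open-weight models reach practical throughput, meeting that demand stops being theoretical and becomes an operable option.</p>

<p>This is exactly the area we work in: standardizing model serving on K8s, queuing GPU workloads with Kueue, and operating multi-agent orchestration reproducibly. When a single-machine local demo is scaled up to organization-wide serving infrastructure, resource scheduling, isolation, and observability become the core challenges. Simply bringing up a model and letting many tenants run multi-agent workloads reliably are different problems.</p>

<h2 id="마치며">Closing Thoughts</h2>

<p>The Gemma 4 26B local multi-agent demo is a signal that “on-premise inference has entered practical territory.” As models get smaller and faster, multi-agent patterns can be validated without cloud costs. For engineers interested in scaling this to an organization, ThakiCloud is a place where these serving and scheduling problems are the daily work.</p>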

<hr />

<p>Sources: Gemma 4 26B local multi-agent orchestration community demo. Gemma model information: https://ai.google.dev/gemma (the throughput figure is self-reported from the author's local benchmark [estimated])</p>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="agentops" /><category term="gemma4" /><category term="multi-agent" /><category term="local-inference" /><category term="on-premise" /><category term="orchestration" /><category term="llm-serving" /><summary type="html"><![CDATA[We analyze a demo that runs Gemma 4 26B locally and codes an SVG art gallery with 10 parallel sub-agents. We lay out on-premise inference economics and multi-agent orchestration from the perspective of ThakiCloud K8s serving.]]></summary></entry><entry xml:lang="ko"><title type="html">Cutting Token Cost by 34-71% with Reversible Compression: A Headroom Field Report and ThakiCloud Context Hygiene</title><link href="https://thakicloud.github.io/ko/dev/headroom-reversible-context-compression/" rel="alternate" type="text/html" title="Cutting Token Cost by 34-71% with Reversible Compression: A Headroom Field Report and ThakiCloud Context Hygiene" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ko/dev/headroom-reversible-context-compression</id><content type="html" xml:base="https://thakicloud.github.io/ko/dev/headroom-reversible-context-compression/"><![CDATA[<p><img src="/assets/images/headroom-reversible-context-compression-hero.png" alt="Abstract image of data condensing" />
<em>Context is not free. Headroom's job is to condense scattered tokens losslessly.</em></p>

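<h2 id="개요">Overview</h2>

<p>Any team that runs AI coding agents every day knows where the biggest hidden cost comes from: context. Tool outputs, RAG results, logs, files, and conversation history pile up every turn, and those tokens go straight onto the bill. In multi-agent workflows this cost grows multiplicatively, not linearly, because every time a sub-agent puts a large search-result JSON into context, the cache-read tokens swell along with it.</p>

<p>This post is not a simple tool introduction. ThakiCloud already runs Headroom in its production toolchain, and this time we took three real JSON tool outputs from our repo and ran Headroom on them directly. We lay out the install command, the integration code, and the measured token savings in a reproducible form. The conclusion first: the more repetitive the JSON structure, the larger the savings, and on our data tokens dropped by up to 71.2%. Every figure was actually measured in a sandbox, with no estimates mixed in.</p>

<h2 id="headroom이란-무엇인가">What Is Headroom?</h2>

<p>Headroom (PyPI package <code class="language-plaintext highlighter-rouge">headroom-ai</code>, GitHub <code class="language-plaintext highlighter-rouge">chopratejas/headroom</code>) is a context compression tool open-sourced by Tejas Chopra, a former Netflix engineer. Its stated goal is clear: compress tool outputs, logs, files, and RAG chunks before they reach the LLM, cutting tokens while keeping the answer the same.</p>

<p>Most existing context-reduction tools are irreversible: once content is cut, the original cannot be recovered. Headroom's key differentiators are that it runs locally, covers multiple content types, and is reversible. The original can be restored via a breadcrumb hash within the configured TTL. This structurally prevents the classic failure of “we compressed it and the agent lost the details.” You can feed in the compressed version first and restore the original only when a specific section is needed.</p>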
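<p>The breadcrumb-plus-TTL idea can be sketched in a few lines. This is a toy model for intuition only and makes no claim about Headroom's internals or API: the class <code class="language-plaintext highlighter-rouge">ReversibleStore</code>, its methods, the 12-character SHA-256 breadcrumb, and the placeholder format are all assumptions made for illustration.</p>

```python
import hashlib
import time
from typing import Optional


class ReversibleStore:
    """Toy store: keep each original under a short hash for a limited time."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._items: dict[str, tuple[float, str]] = {}

    def put(self, original: str) -> str:
        """Save the original and return the breadcrumb that points back to it."""
        breadcrumb = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._items[breadcrumb] = (time.monotonic() + self.ttl_seconds, original)
        return breadcrumb

    def restore(self, breadcrumb: str) -> Optional[str]:
        """Return the original, or None if it is unknown or past its TTL."""
        entry = self._items.get(breadcrumb)
        if entry is None:
            return None
        expires_at, original = entry
        if time.monotonic() >= expires_at:
            del self._items[breadcrumb]  # expired originals are dropped
            return None
        return original


store = ReversibleStore(ttl_seconds=60.0)
tool_output = "row\n" * 1000
crumb = store.put(tool_output)
# The agent sees only this short placeholder unless it asks for the original.
placeholder = f"[{len(tool_output)} chars compressed, ref={crumb}]"
print(store.restore(crumb) == tool_output)  # prints True
```

<p>The point of the sketch is the operating pattern: the compressed placeholder goes into context, and the full original is fetched by breadcrumb only when a specific section turns out to be needed.</p>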

<p>붙이는 방식도 세 가지입니다. 라이브러리로 직접 호출하거나, 프록시로 끼우거나, MCP 서버로 띄울 수 있습니다. 콘텐츠 타입을 인식해서 JSON의 이상치만 남기거나 로그의 실패 라인만 남기는 식으로 선택적으로 압축합니다.</p>

<h3 id="내부-구성-smartcrusher가-핵심">Internals: SmartCrusher Is the Core</h3>

<p>Headroom routes each content type to a different compressor. In this experiment, the transforms that actually ran showed up in the router log as <code class="language-plaintext highlighter-rouge">router:protected:user_message</code> and <code class="language-plaintext highlighter-rouge">router:mixed:...</code>. In other words, user messages are protected and only the JSON payloads of tool messages are selected for compression.</p>

<ul>
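  <li><strong>SmartCrusher</strong>: A general-purpose JSON compressor that handles arrays of dictionaries, nested objects, and mixed types. For JSON tool outputs with repetitive structure (search results, log rows, record lists), it folds duplicated keys and infers the schema to shrink the payload deterministically. This component accounted for most of the savings in our measurements.</li>
  <li><strong>Code compressor</strong>: Compresses source code in a structure-aware way.</li>
  <li><strong>Image compression</strong>: Image payloads are also candidates for reduction.</li>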
</ul>
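<p>The key-folding idea mentioned for SmartCrusher can be illustrated with a small sketch. This is not SmartCrusher's actual algorithm or output format; <code class="language-plaintext highlighter-rouge">fold_records</code> and <code class="language-plaintext highlighter-rouge">unfold_records</code> are hypothetical helpers that only show why writing repeated keys once saves space while staying lossless.</p>

```python
import json


def fold_records(records):
    """Fold a list of same-keyed dicts into one header plus value rows.

    Illustrative only: the key names are written once instead of once per record.
    """
    if not records:
        return {"keys": [], "rows": []}
    keys = list(records[0].keys())
    for record in records:
        if list(record.keys()) != keys:
            raise ValueError("records do not share one schema; folding would be lossy")
    return {"keys": keys, "rows": [[record[k] for k in keys] for record in records]}


def unfold_records(folded):
    """Inverse of fold_records: rebuild the original list of dicts."""
    return [dict(zip(folded["keys"], row)) for row in folded["rows"]]


# Made-up sample records, shaped like retrieval results.
records = [
    {"id": 1, "source": "docs", "score": 0.91},
    {"id": 2, "source": "docs", "score": 0.84},
    {"id": 3, "source": "wiki", "score": 0.77},
]
folded = fold_records(records)
raw_size = len(json.dumps(records, separators=(",", ":")))
folded_size = len(json.dumps(folded, separators=(",", ":")))
```

<p>Even with three records the folded form is smaller, and the gap widens as the number of same-schema records grows, which matches the pattern in the measurements later in this post.</p>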

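<p>The diagram below shows the data flow we observed. Tool output passes through the router into SmartCrusher, and while the compressed context goes on to the LLM call, the original is stored separately so it can be restored when needed.</p>

<p><img src="/assets/images/headroom-reversible-context-compression-diagram.png" alt="Headroom pipeline diagram" />
<em>Tool output → Content-Type Router → SmartCrusher → compressed context → LLM. The original is kept under a breadcrumb hash with a TTL, preserving the reversible restore path.</em></p>

<h2 id="설치-및-통합">Installation and Integration</h2>

<p>In our environment the Python runtime is consolidated into a single-interpreter (3.12.8) <code class="language-plaintext highlighter-rouge">.venv</code>. Installation is one line.</p>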

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">VIRTUAL_ENV</span><span class="o">=</span><span class="s2">"</span><span class="nv">$PWD</span><span class="s2">/.venv"</span> uv pip <span class="nb">install</span> <span class="s2">"headroom-ai[code,relevance]"</span>
</code></pre></div></div>

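<p>The <code class="language-plaintext highlighter-rouge">[code,relevance]</code> extra enables structure-aware code compression and relevance-based filtering. Semantic compression of plain text requires an additional model (about 261MB), but the JSON path, where the effect is largest, works with this base install alone.</p>

<p>The simplest integration is to pass the message list as-is. The core of the wrapper we actually use (<code class="language-plaintext highlighter-rouge">scripts/headroom_compress.py</code>) is shown below. Put the tool output in the <code class="language-plaintext highlighter-rouge">content</code> of a <code class="language-plaintext highlighter-rouge">tool</code>-role message and call <code class="language-plaintext highlighter-rouge">compress</code>; that is all. Here <code class="language-plaintext highlighter-rouge">raw_json_string</code> stands for the tool's raw JSON output as a string.</p>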

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="n">headroom</span> <span class="kn">import</span> <span class="n">compress</span>

<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Summarize this tool output</span><span class="sh">"</span><span class="p">},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="bp">None</span><span class="p">,</span>
     <span class="sh">"</span><span class="s">tool_calls</span><span class="sh">"</span><span class="p">:</span> <span class="p">[{</span><span class="sh">"</span><span class="s">id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">,</span>
                     <span class="sh">"</span><span class="s">function</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">arguments</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">{}</span><span class="sh">"</span><span class="p">}}]},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">tool</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">tool_call_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">c1</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">raw_json_string</span><span class="p">},</span>
<span class="p">]</span>

<span class="n">result</span> <span class="o">=</span> <span class="nf">compress</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">claude-sonnet-4-5-20250929</span><span class="sh">"</span><span class="p">)</span>
<span class="n">compressed</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">messages</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">][</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">tokens_before</span><span class="p">,</span> <span class="sh">"</span><span class="s">-&gt;</span><span class="sh">"</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">tokens_after</span><span class="p">,</span> <span class="n">result</span><span class="p">.</span><span class="n">transforms_applied</span><span class="p">)</span>
</code></pre></div></div>

<p>The object returned by <code class="language-plaintext highlighter-rouge">compress</code> carries <code class="language-plaintext highlighter-rouge">tokens_before</code>, <code class="language-plaintext highlighter-rouge">tokens_after</code>, and <code class="language-plaintext highlighter-rouge">transforms_applied</code>, so code can verify after the fact what the compression actually did. What matters is that these are values measured by the library, not numbers self-reported by a model. On top of that, we cross-checked once more with a separate tokenizer (tiktoken).</p>

<h2 id="실제-실험-결과">Experiment Results</h2>

<p>We ran the experiment in an isolated git worktree sandbox, so the main working tree stays untouched and only the results are left in an evidence directory. For test data we picked three JSON files from our repo's real artifacts that have clearly repetitive structure.</p>

<ol>
  <li><strong>skill_index.json</strong>: A BM25 index for skill search. Records with an identical schema repeat in large numbers.</li>
  <li><strong>seedance-prompts/raw-prompts.json</strong>: A catalog of 605 prompts. Natural-language text makes up a large share.</li>
  <li><strong>twitter timeline archive</strong>: 1,385 timeline records. An array of objects with the same key structure.</li>
</ol>

<p>Token counts were measured with the <code class="language-plaintext highlighter-rouge">cl100k_base</code> tokenizer. We recorded both bytes and tokens because what matters is not raw byte reduction but how much the compression helps in tokens, the actual billing unit. The results are as follows.</p>

<table>
  <thead>
    <tr>
      <th>Test data</th>
      <th>Original tokens</th>
      <th>Compressed tokens</th>
      <th>Token savings</th>
      <th>Byte savings</th>
      <th>Elapsed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>skill_index (BM25 index)</td>
      <td>1,618,287</td>
      <td>465,445</td>
      <td><strong>71.2%</strong></td>
      <td>64.9%</td>
      <td>2.08s</td>
    </tr>
    <tr>
      <td>twitter-timeline (record array)</td>
      <td>399,926</td>
      <td>192,465</td>
      <td><strong>51.9%</strong></td>
      <td>57.0%</td>
      <td>0.24s</td>
    </tr>
    <tr>
      <td>seedance-prompts (prompt catalog)</td>
      <td>1,085,592</td>
      <td>713,210</td>
      <td><strong>34.3%</strong></td>
      <td>38.5%</td>
      <td>0.57s</td>
    </tr>
  </tbody>
</table>
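<p>The token-savings column can be re-derived from the two token columns as (original − compressed) / original. The short check below uses only the numbers in the table; the function name <code class="language-plaintext highlighter-rouge">token_savings_percent</code> is just an illustrative helper.</p>

```python
# Token counts copied from the results table above: (original, compressed).
measurements = {
    "skill_index": (1_618_287, 465_445),
    "twitter-timeline": (399_926, 192_465),
    "seedance-prompts": (1_085_592, 713_210),
}


def token_savings_percent(before: int, after: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round((before - after) / before * 100, 1)


savings = {name: token_savings_percent(b, a) for name, (b, a) in measurements.items()}
```

<p>Running this reproduces the 71.2%, 51.9%, and 34.3% figures reported in the table.</p>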

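<p><img src="/assets/images/headroom-reversible-context-compression-results.png" alt="Chart of measured compression ratios" />
<em>Measured savings on three JSON tool outputs from the ThakiCloud repo. Bytes and tokens are shown together.</em></p>

<p>How you read the numbers matters. <strong>The stronger the repetitive structure, the larger the savings.</strong> skill_index is an index densely packed with same-schema records, so SmartCrusher's key folding is at its most effective and cut tokens by 71.2%. The twitter timeline is also a uniform array of objects and saved more than half. By contrast, in seedance-prompts natural-language prompt text makes up most of each record, leaving relatively little for structural compression to remove, so it stopped at 34.3%. This gap directly reflects the design intent that “the effect is largest on the JSON path.”</p>

<p>Elapsed time is also worth noting. The 1.6-million-token index was processed in about 2 seconds, and the other two took under 1 second each. At that speed you can run it inline right before tool output enters the context with almost no perceptible latency. Because the compression is deterministic, the same input always yields the same output, which also makes it cache-friendly.</p>

<p>One point deserves honesty. The figures above come from a single run each, on three datasets. Other kinds of JSON, especially data whose values are almost all unique and whose keys rarely repeat, may show lower savings. Even so, a range of 34 to 71% in tokens across our measurements is a meaningful result, at least for repetitively structured tool output.</p>

<h2 id="thakicloud-k8s-aiml-saas-플랫폼-적용-및-시사점">Application to the ThakiCloud K8s AI/ML SaaS Platform and Implications</h2>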

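<p>We adopted Headroom at exactly the point this experiment highlights: <strong>large JSON tool outputs with repetitive structure</strong>. Our context hygiene rule (<code class="language-plaintext highlighter-rouge">ecc-token-strategy</code>) spells this out: JSON-array tool outputs with repetitive structure are compressed deterministically with SmartCrusher before entering the context; plain text is out of scope and compression is limited to the JSON path; and in priority order, subagent summarization comes first and headroom compression second.</p>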
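<p>The “JSON path only” part of the rule can be expressed as a small gate in front of the compressor. The sketch below is a hypothetical reading of the rule, not the contents of <code class="language-plaintext highlighter-rouge">ecc-token-strategy</code>: the function name <code class="language-plaintext highlighter-rouge">should_compress</code> and both thresholds are assumptions chosen for illustration.</p>

```python
import json


def should_compress(tool_output: str, min_records: int = 20, min_shared_ratio: float = 0.8) -> bool:
    """Return True only for JSON arrays of objects whose records mostly share one key set."""
    try:
        data = json.loads(tool_output)
    except json.JSONDecodeError:
        return False  # plain text is out of scope for this rule
    if not isinstance(data, list) or len(data) < min_records:
        return False
    if not all(isinstance(item, dict) for item in data):
        return False
    # Measure how dominant the most common key set is across records.
    key_sets = [frozenset(item) for item in data]
    most_common = max(set(key_sets), key=key_sets.count)
    return key_sets.count(most_common) / len(key_sets) >= min_shared_ratio
```

<p>A gate like this keeps short or irregular payloads out of the compression path, where the measured benefit is smallest.</p>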

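<p>This matters especially for multi-agent orchestration on K8s because of how the cost is structured. In workflows with many subagents, context hygiene means three things at once. First, token cost control. Second, cache hit-rate management: deterministic compression guarantees the same output for the same input, so it does not break the prompt cache. Third, response latency management: the smaller the context, the faster the model responds.</p>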
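<p>Why determinism matters for caching can be shown with a toy example. The sketch assumes an exact-match cache keyed on the serialized content, which is a simplification of how provider prompt caches behave; <code class="language-plaintext highlighter-rouge">canonical_json</code> and <code class="language-plaintext highlighter-rouge">cache_key</code> are hypothetical helpers, not part of Headroom.</p>

```python
import hashlib
import json


def canonical_json(payload) -> str:
    """Serialize with sorted keys and fixed separators so equal data gives equal text."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def cache_key(payload) -> str:
    """Hash of the canonical form; stands in for an exact-match prompt-cache key."""
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()


first = {"query": "gpu quota", "hits": [{"id": 1, "score": 0.9}]}
second = {"hits": [{"score": 0.9, "id": 1}], "query": "gpu quota"}  # same data, different key order
```

<p>Both payloads map to the same key because the output depends only on the data. A non-deterministic transform would produce different text for the same input and miss the cache each time.</p>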

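<p>Our LLM serving schedules GPU workloads with Kueue on K8s, with many inference requests flowing concurrently. In this environment a bloated context does not just cost one request; it eats into overall throughput. Headroom lets us insert this layer with almost no code changes. We compress search results or log arrays in one line right before they enter the context, and restore reversibly only when a specific section is needed.</p>

<p>It is practical from a data scientist's perspective too. When retrieved chunks in a RAG pipeline arrive loaded with repetitive metadata (identical keys such as source URL, timestamp, and score), that metadata is precisely what SmartCrusher trims best. Because the body is preserved and only the structural overhead shrinks, you can free up context budget without sacrificing retrieval accuracy.</p>

<h2 id="한계-및-반론">Limitations and Counterarguments</h2>

<p>We do not recommend this tool uncritically. Here are the honest limitations and counterarguments.</p>

<p><strong>First, it assumes local execution.</strong> Headroom needs to be able to run a local process, so it cannot be used in execution environments that are fully sandboxed. There are clearly deployment models where this constraint does not fit.</p>

<p><strong>Second, the effect on plain text is limited.</strong> As the seedance-prompts result shows, data dominated by natural-language text leaves little room for structural compression. Reducing plain text semantically requires installing the additional model, and that path gives up some determinism and speed.</p>

<p><strong>Third, it may be overkill for teams on a single provider.</strong> If one model provider's native compaction is enough and you do not need cross-agent memory, the operational burden of a separate compression layer may outweigh the benefit.</p>

<p><strong>Fourth, the strongest counterargument is “why not just summarize with a subagent?”</strong> In fact, our own rule ranks subagent summarization ahead of headroom compression. Summarization is irreversible, but it saves far more and compresses by meaning. So where does Headroom fit? The answer: “when summarizing loses detail, and that detail might be needed later.” Reversibility fills exactly this gap. You run on the compressed version normally, and the moment the original of a specific record is needed, you restore it exactly within the TTL. Summarization and compression are complementary, not competing.</p>

<p>In short, Headroom is a case of implementing the principle “context is not free” through the concrete design of reversible compression. Within our measured range it cut tokens by 34 to 71% on repetitively structured JSON, and thanks to determinism and reversibility it neither broke the cache nor lost detail. For engineers interested in how ThakiCloud treats context hygiene as a cost and reliability problem, we are a team that operates this kind of layer in production.</p>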

<hr />

<p>출처: Headroom (headroom-ai), PyPI https://pypi.org/project/headroom-ai/ · GitHub https://github.com/chopratejas/headroom (작성자 Tejas Chopra). 본문 수치는 ThakiCloud repo 데이터로 직접 측정한 실측값입니다.</p>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="headroom" /><category term="context-compression" /><category term="token-cost" /><category term="llm-serving" /><category term="rag" /><category term="mcp" /><summary type="html"><![CDATA[AI 코딩 에이전트의 가장 큰 숨은 비용은 컨텍스트입니다. Headroom(headroom-ai)을 ThakiCloud repo의 실제 JSON 도구 출력 3종에 직접 돌려 토큰 절감률을 측정했습니다. SmartCrusher가 무손실 가역 압축으로 토큰을 최대 71.2% 줄이는 과정을 설치 명령부터 측정 수치까지 정리합니다.]]></summary></entry><entry xml:lang="ko"><title type="html">LLM 없이 PDF를 마크다운으로: LiteParse와 RAG 인제스트 비용·데이터 주권</title><link href="https://thakicloud.github.io/ko/dev/liteparse-model-free-pdf-parser-rag/" rel="alternate" type="text/html" title="LLM 없이 PDF를 마크다운으로: LiteParse와 RAG 인제스트 비용·데이터 주권" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ko/dev/liteparse-model-free-pdf-parser-rag</id><content type="html" xml:base="https://thakicloud.github.io/ko/dev/liteparse-model-free-pdf-parser-rag/"><![CDATA[<p>RAG 파이프라인의 첫 단계는 문서 인제스트입니다. 그리고 그 첫 단계에서 가장 흔히 막히는 것이 PDF 파싱입니다. 최근 LLM 기반 파서가 늘었지만, LLM을 매 문서에 돌리면 비용과 지연이 누적되고, 민감한 문서를 외부 모델에 보내는 데이터 주권 문제도 생깁니다. 
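<strong>LiteParse</strong>, announced by LlamaIndex (Jerry Liu), takes a different direction: it is an Apache 2.0 open-source parser that converts PDFs to Markdown <strong>without an LLM</strong>.</p>

<p>At ThakiCloud we handle RAG document ingestion on a K8s-based AI/ML SaaS platform. Below we look at why a model-free parser is attractive from a cost and sovereignty standpoint, and where its claims need hedging.</p>

<h2 id="무엇이-다른가-모델-비의존model-free-파싱">What Is Different: Model-Free Parsing</h2>

<p>LiteParse's key differentiator is that it <strong>does not use an LLM for parsing</strong>. The benefits of this design are clear.</p>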

<ul>
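  <li><strong>Cost</strong>: There is no per-document LLM call cost, so ingesting documents in bulk does not make model spend balloon in proportion to document count.</li>
  <li><strong>Latency</strong>: With no LLM inference round trip, parsing is fast.</li>
  <li><strong>Data sovereignty</strong>: Documents are not sent to an external model, a decisive advantage for organizations that want to process sensitive documents in-house.</li>
  <li><strong>Determinism</strong>: An LLM parser may parse the same document differently on each call, whereas a rule-based parser is reproducible.</li>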
</ul>

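<p>The LiteParse team claims top scores on several benchmarks within the model-free parser category. It should be stated explicitly, however, that these claims are <strong>self-measured and limited to the model-free category</strong>. This is not an absolute comparison against LLM-based parsers; it carries the condition “among parsers that do not use a model.” The honest approach is to hedge the speed and accuracy claims to that category.</p>

<h2 id="rag-인제스트-관점에서의-트레이드오프">Trade-offs from a RAG Ingestion Perspective</h2>

<p>A model-free parser is not a cure-all. The trade-offs need to be clear.</p>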

<ul>
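  <li><strong>Structurally complex documents</strong>: Tables, multi-column layouts, and scanned image PDFs are where rule-based parsers struggle. An LLM vision parser may do better.</li>
  <li><strong>Hybrid strategy</strong>: A realistic approach is to process most ordinary documents quickly and cheaply with a model-free parser and send only the structurally complex minority to an LLM parser. This design decouples cost from quality.</li>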
</ul>
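<p>The hybrid strategy amounts to a routing decision per document. The sketch below is a hypothetical illustration and does not use any LiteParse API: the <code class="language-plaintext highlighter-rouge">DocumentProfile</code> fields, the <code class="language-plaintext highlighter-rouge">choose_parser</code> function, and the table threshold are all assumptions, and the signals would have to come from a cheap pre-scan that is not shown here.</p>

```python
from dataclasses import dataclass


@dataclass
class DocumentProfile:
    """Hypothetical per-document signals gathered by a cheap pre-scan."""
    has_text_layer: bool   # False for scanned, image-only PDFs
    table_count: int
    column_count: int


def choose_parser(profile: DocumentProfile, max_tables: int = 3) -> str:
    """Default to the model-free parser; escalate only structurally hard documents."""
    if not profile.has_text_layer:
        return "llm"
    if profile.column_count > 1 or profile.table_count > max_tables:
        return "llm"
    return "model_free"
```

<p>Keeping the decision in one small function makes the escalation rate easy to log, which is what turns ingestion cost into something measurable.</p>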

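<h2 id="thakicloud-관점-인제스트-비용을-1급-시민으로">The ThakiCloud View: Ingestion Cost as a First-Class Citizen</h2>

<p>When you run a RAG pipeline in production, ingestion cost takes up a surprisingly large share. The more documents there are and the more often they change, the more the choice of whether to use an LLM for parsing drives operating cost. Making a model-free parser such as LiteParse the default path and escalating only complex documents to an LLM parser is the cost-efficient routing.</p>

<p>This is the area we work in: standardizing document ingestion pipelines on K8s, routing parsers by document type, and guaranteeing data sovereignty by processing sensitive documents in-house. We treat ingestion not as simple preprocessing but as a first-class design problem where cost, sovereignty, and quality meet.</p>

<h2 id="마치며">Closing</h2>

<p>LiteParse sends the message that “RAG ingestion does not always need an LLM.” Model-free parsers have clear advantages in cost, latency, and data sovereignty, and a hybrid that backs them up with an LLM parser for complex documents is the realistic choice. For engineers interested in treating ingestion cost as a first-class citizen, this is a place where such problems are the daily work.</p>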

<hr />

<p>출처: LlamaIndex LiteParse (Apache 2.0). GitHub: https://github.com/run-llama/llama_cloud_services (벤치마크 점수는 자체 측정, model-free 범주 한정 주장).</p>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="liteparse" /><category term="llamaindex" /><category term="pdf-parsing" /><category term="rag" /><category term="document-ingest" /><category term="open-source" /><summary type="html"><![CDATA[LlamaIndex가 발표한 LiteParse는 LLM 없이 PDF를 마크다운으로 변환하는 Apache 2.0 오픈소스 파서입니다. 모델 비의존 파서의 비용·데이터 주권 이점과 한계를 ThakiCloud RAG 문서 인제스트 관점에서 정리합니다.]]></summary></entry><entry xml:lang="ko"><title type="html">Nature급 그림과 교열을 코드로: nature-skills를 직접 돌려 본 학술 버티컬 리포트</title><link href="https://thakicloud.github.io/ko/dev/nature-skills-academic-figure-polishing/" rel="alternate" type="text/html" title="Nature급 그림과 교열을 코드로: nature-skills를 직접 돌려 본 학술 버티컬 리포트" /><published>2026-06-21T00:00:00+09:00</published><updated>2026-06-21T00:00:00+09:00</updated><id>https://thakicloud.github.io/ko/dev/nature-skills-academic-figure-polishing</id><content type="html" xml:base="https://thakicloud.github.io/ko/dev/nature-skills-academic-figure-polishing/"><![CDATA[<p><img src="/assets/images/nature-skills-hero.png" alt="다중 패널 데이터 곡선과 그림판이 학술적 분위기로 떠 있는 추상 이미지" />
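<em>An image meant to capture the spirit of an academic figure skill that treats a figure as a ‘visual argument’ rather than a ‘pretty plot.’</em></p>

<h2 id="개요">Overview</h2>

<p>The two tasks researchers most often hand to Claude Code are “make a figure for my paper” and “polish this English draft to journal standard.” Left to a general-purpose LLM, both produce results that vary from run to run: figures come out with arbitrary font sizes and colors, and copyediting rewrites sentences without any rules. The open-source skill package nature-skills (Yuan1z0825/nature-skills) aims to tame this variability with a validated scaffold.</p>

<p>As it gained attention, some posts described it as having “20,000+ GitHub stars,” but the actual figure I checked was far smaller, roughly 265[estimate]. Inflated star counts are common, so this post judges the tool by measured results from running it directly rather than by stars. This is an implementation report: I cloned nature-skills into the ThakiCloud environment and used its nature-figure skill to render a serving-performance figure for ThakiCloud at submission grade.</p>

<h2 id="이-도구는-무엇인가">What Is This Tool</h2>

<p>After cloning the repository, the actual layout I found was 12 skills (excluding shared modules) under <code class="language-plaintext highlighter-rouge">skills/</code>. They cover the whole academic workflow, including nature-figure (scientific figures), nature-polishing (academic copyediting), nature-academic-search (literature search), nature-citation, nature-reviewer, and nature-response (reviewer responses). The license is MIT.</p>

<p>The star of this post, <strong>nature-figure, is at version 2.0.0</strong> and has a router structure split into a static layer and a dynamic layer. The bulk of the design, API, pattern, and QA knowledge lives in on-demand reference files, and for each task it detects the backend (Python/R) and loads only the pieces needed. This is exactly the same pattern as the progressive disclosure that ThakiCloud emphasizes.</p>

<p>The most striking design element is the <strong>“figure contract.”</strong> Before any code is written, it forces you to pin down a one-sentence core conclusion, the evidence chain, the archetype classification, the backend, and the journal/export contract. The skill insists that “a figure is a visual argument, not an isolated pretty plot.” It also makes backend selection a <strong>blocking gate</strong>. If the user has not said whether to use Python or R, it asks “Python or R?” and stops. This reduces the model's freedom so it cannot pick a default on its own.</p>

<p><img src="/assets/images/nature-skills-diagram.png" alt="nature-figure routing diagram running from the Figure Contract through the backend gate to the QA contract" />
<em>The flow defines the core conclusion, passes the Python/R backend gate, applies rcParams and the PALETTE to export editable SVG/TIFF, and finishes with the QA contract.</em></p>

<h2 id="설치-및-통합-실제-명령">Installation and Integration (Actual Commands)</h2>

<p>Verification was done in an isolated sandbox outside the repository, which was cleaned up afterward.</p>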

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1) Clone the external repository</span>
git clone <span class="nt">--depth</span> 1 https://github.com/Yuan1z0825/nature-skills

<span class="c"># 2) Check the Python backend dependency (shared .venv)</span>
.venv/bin/python <span class="nt">-c</span> <span class="s2">"import matplotlib; print(matplotlib.__version__)"</span>
<span class="c"># matplotlib 3.11.0</span>
</code></pre></div></div>

<p>nature-figure's Python quick start (<code class="language-plaintext highlighter-rouge">static/fragments/backend/python.md</code>) specifies the <code class="language-plaintext highlighter-rouge">rcParams</code> for submission-grade figures, and <code class="language-plaintext highlighter-rouge">references/api.md</code> defines a journal-friendly PALETTE. The key settings are as follows.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib</span> <span class="k">as</span> <span class="n">mpl</span>

<span class="n">mpl</span><span class="p">.</span><span class="n">rcParams</span><span class="p">.</span><span class="nf">update</span><span class="p">({</span>
    <span class="sh">"</span><span class="s">font.family</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">font.sans-serif</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="sh">"</span><span class="s">Arial</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">Helvetica</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">DejaVu Sans</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">sans-serif</span><span class="sh">"</span><span class="p">],</span>
    <span class="sh">"</span><span class="s">svg.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">none</span><span class="sh">"</span><span class="p">,</span>   <span class="c1"># keep text inside the SVG editable
</span>    <span class="sh">"</span><span class="s">pdf.fonttype</span><span class="sh">"</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span>       <span class="c1"># text inside the PDF stays editable TrueType as well
</span>    <span class="sh">"</span><span class="s">font.size</span><span class="sh">"</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span>           <span class="c1"># 7pt baseline unless it is a large panel for slides
</span>    <span class="sh">"</span><span class="s">axes.linewidth</span><span class="sh">"</span><span class="p">:</span> <span class="mf">0.8</span><span class="p">,</span>
<span class="p">})</span>
<span class="c1"># Excerpt from the PALETTE in api.md
</span><span class="n">P</span> <span class="o">=</span> <span class="p">{</span><span class="sh">"</span><span class="s">blue_main</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#0F4D92</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">red_strong</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#B64342</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">neutral_dark</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">#4D4D4D</span><span class="sh">"</span><span class="p">}</span>
</code></pre></div></div>

<p>The single line <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> is the key. A typical export converts text to outlines (paths), so the characters can no longer be edited in Illustrator. This setting keeps text as <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags, so labels can be edited directly during journal proofing.</p>

<h2 id="실제-실험-결과">Experiment Results</h2>

<p>Applying the skill's rules (rcParams, PALETTE) as-is, I rendered a figure on a topic directly relevant to ThakiCloud: a two-panel figure comparing latency and throughput against batch size for GPU inference serving, FP16 versus INT8. The serving-curve values in the plot are schematic examples; <strong>the measured values are the metadata captured during rendering</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>RENDER_MS=195.4
SVG_BYTES=24131
PNG_BYTES=254233          # 600 dpi
SVG_EDITABLE_TEXT_TAGS=36
PANELS=2 (a:latency, b:throughput)
RCPARAMS_FONT_SIZE=7.0
SVG_FONTTYPE=none
</code></pre></div></div>

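<p>There are three key results. First, rendering the two-panel figure finished in about 195 milliseconds. Second, the outputs were light: about 254KB for the 600dpi PNG and about 24KB for the SVG. Third, and this is the most important check, <strong>the generated SVG contained 36 <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags</strong>. This is direct evidence that the “editable text” the skill promises was actually delivered. Had the text been converted to outlines, there would be zero <code class="language-plaintext highlighter-rouge">&lt;text&gt;</code> tags.</p>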
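<p>The text-tag check is easy to automate. The sketch below is one possible way to do it with the standard library and is not the script used for the measurement above; <code class="language-plaintext highlighter-rouge">count_text_elements</code> is a hypothetical helper, and the tiny SVG is built in memory as a stand-in so the example needs no plotting library.</p>

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def count_text_elements(svg_source: str) -> int:
    """Count SVG text elements; zero suggests labels were converted to outline paths."""
    root = ET.fromstring(svg_source)
    return sum(1 for _ in root.iter(f"{{{SVG_NS}}}text"))


# Build a tiny stand-in SVG: two text labels and one path.
svg = ET.Element(f"{{{SVG_NS}}}svg")
for label in ("Batch size", "Latency (ms)"):
    ET.SubElement(svg, f"{{{SVG_NS}}}text").text = label
ET.SubElement(svg, f"{{{SVG_NS}}}path", d="M0 0 L10 10")
sample_svg = ET.tostring(svg, encoding="unicode")
```

<p>Pointed at a real exported file, a count above zero would be the signal that labels remain editable; the function could serve as a QA step after every render.</p>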

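<p><img src="/assets/images/nature-skills-results.png" alt="Nature-style two-panel figure comparing inference latency and throughput for FP16 and INT8" />
<em>The actual output rendered with nature-figure's rcParams and PALETTE. The left panel (a) shows latency by batch size and the right panel (b) shows throughput. The serving-curve values are example data.</em></p>

<p>All of these figures are values I captured from stdout by running the code myself, not external citations. The point is that instead of the skill claiming in prose that it “drew it nicely,” quality is demonstrated with execution evidence.</p>

<h2 id="thakicloud-k8s-aiml-saas-플랫폼-적용-및-시사점">Application to the ThakiCloud K8s AI/ML SaaS Platform and Implications</h2>

<p>nature-skills shows two things at once.</p>

<p>From a data science practice standpoint, the idea of <strong>pinning chart style to validated tokens</strong> is immediately useful. ThakiCloud's reports and dashboards tend to drift in color, font, and axes from one chart to the next, and fixing rcParams and a PALETTE in one place, as nature-figure does, raises the average quality. In particular, the pattern of exporting editable SVG with <code class="language-plaintext highlighter-rouge">svg.fonttype: "none"</code> can be used as-is for marketing and seminar material that the design team post-processes. The result figure in this post is the proof.</p>

<p>From a platform strategy standpoint, nature-skills shows a <strong>PMF (Product-Market Fit) signal for an academic vertical</strong>. It is not a general-purpose skill; it condenses rules into the narrow and deep use case of “submitting to a Nature journal,” and that is why its results are consistent. For ThakiCloud, which operates a K8s-based AI/ML SaaS, vertical skills that layer thin domain rules on top of a general-purpose LLM are a core pattern for differentiation. The same skeleton can be replicated for in-house verticals such as healthcare, finance, and patents.</p>

<h2 id="한계-및-반론">Limitations and Counterarguments</h2>

<p>First, <strong>inflated star counts</strong>. The “20,000+ stars” in some posts differed greatly from the actual number (about 265)[estimate]. This case confirms once again that viral signals should not be trusted at face value and that running the tool yourself is a necessary step.</p>

<p>Second, <strong>responsibility for the truth of the figure's data lies with the user.</strong> The skill draws figures well, but it does not guarantee the accuracy of the numbers that go into them. That is why this post explicitly labels the serving curves as examples. In a real paper or report, only measured values should be used.</p>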

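<p>Third, <strong>the mandatory backend gate</strong> can be a source of friction in automated pipelines. Asking “Python or R?” and stopping every time is a safeguard in interactive use, but unattended batch runs need a wrapper that pins the backend in advance.</p>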
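<p>One way such a wrapper might pin the backend is to read it from configuration and fail fast when it is missing, so a batch job never reaches the interactive question. This is a hypothetical sketch: the environment variable name <code class="language-plaintext highlighter-rouge">FIGURE_BACKEND</code> and the function <code class="language-plaintext highlighter-rouge">resolve_backend</code> are assumptions and are not part of nature-skills.</p>

```python
import os

ALLOWED_BACKENDS = ("python", "r")


def resolve_backend(env=None, var_name: str = "FIGURE_BACKEND") -> str:
    """Read the backend from an environment mapping and fail fast if it is missing or invalid."""
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip().lower()
    if value not in ALLOWED_BACKENDS:
        raise ValueError(f"{var_name} must be one of {ALLOWED_BACKENDS}, got {value!r}")
    return value
```

<p>The resolved value would then be passed explicitly in the prompt or job spec, keeping the gate's intent (no silent default) while removing the pause.</p>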

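<p>In conclusion, nature-skills is a good example of a “vertical skill that condenses domain rules into code.” Judged by measured evidence such as 36 editable text tags rather than by star count, its design has plenty to learn from.</p>

<h2 id="출처">Sources</h2>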

<ul>
  <li>nature-skills (GitHub, MIT): <a href="https://github.com/Yuan1z0825/nature-skills">github.com/Yuan1z0825/nature-skills</a></li>
  <li>All measured figures in this post come from cloning nature-figure v2.0.0 and rendering locally. The star count (about 265) is an estimate based on search results.</li>
</ul>]]></content><author><name>{&quot;name&quot;=&gt;nil, &quot;avatar&quot;=&gt;nil, &quot;bio&quot;=&gt;nil, &quot;location&quot;=&gt;&quot;Seoul, Korea&quot;, &quot;email&quot;=&gt;&quot;info@thakicloud.co.kr&quot;, &quot;uri&quot;=&gt;nil, &quot;home&quot;=&gt;nil, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;Website&quot;, &quot;icon&quot;=&gt;&quot;fas fa-fw fa-link&quot;, &quot;url&quot;=&gt;&quot;https://thakicloud.co.kr&quot;}, {&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github&quot;, &quot;url&quot;=&gt;&quot;https://github.com/thakicloud&quot;}]}</name><email>info@thakicloud.co.kr</email></author><category term="dev" /><category term="claude-skills" /><category term="academic-writing" /><category term="matplotlib" /><category term="data-visualization" /><category term="nature-figure" /><category term="skill-marketplace" /><summary type="html"><![CDATA[Nature 저널 기준의 과학 그림 생성과 학술 교열을 묶은 오픈소스 Claude 스킬 패키지 nature-skills를 직접 클론하고, nature-figure로 ThakiCloud 서빙 데이터를 제출 등급 2패널 그림으로 렌더링했습니다. 편집 가능한 SVG 36개 텍스트 태그까지 실측하고, 스킬 마켓플레이스의 버티컬 PMF 관점에서 시사점을 정리합니다.]]></summary></entry></feed>