ArcControl

inference_available

Workflows

Profiles: Custom Profile
Model Profile: Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)
/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf
Default: Qwen 3.6 27B Heretic (Preserve Thinking)
Loaded: qwen36-27b-neocoder-q4-preserve
OpenClaw: arc/qwen36-27b-neocoder-q4-preserve
Synchronization: synchronized
API Reference
Base URL: http://arcangle:10101
Action endpoints accept GET or POST; POST is preferred for automation.

Status:
  GET  http://arcangle:10101/api/status
       Current dashboard state.
  GET  http://arcangle:10101/state.json
       Alias for current dashboard state.
  GET  http://arcangle:10101/api/gpu-history?minutes=60
       GPU telemetry history.
  GET  http://arcangle:10101/api/profiles
       Available desired-state profiles and active match.
  GET  http://arcangle:10101/api/model-profiles
       Available llama model profiles and selected model.
  GET  http://arcangle:10101/api/openclaw-sync
       ArcControl/OpenClaw model synchronization state.
  GET  http://arcangle:10101/api/gpu-workloads
       Registered full-use GPU workloads.
  GET  http://arcangle:10101/api/startup-services
       Card services configured to start with ArcControl.
  POST http://arcangle:10101/api/startup-services
       Set whether a card service starts with ArcControl; pass key and enabled.
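
The dashboard state returned by /api/status (and its alias /state.json) can be reduced to a per-service summary. The sketch below is only an illustration: it assumes the top-level keys shown in the State File section (inference, speech_input, text_to_speech, gpu_workload, sidecars), and the function name summarize_services is hypothetical, not part of ArcControl.

```python
from typing import Any, Dict


def summarize_services(state: Dict[str, Any]) -> Dict[str, str]:
    """Map each service in a dashboard state dict to its reported status."""
    summary: Dict[str, str] = {}
    # Top-level service sections, as they appear in the State File section.
    for key in ("inference", "speech_input", "text_to_speech", "gpu_workload"):
        section = state.get(key)
        if isinstance(section, dict):
            summary[key] = str(section.get("status", "unknown"))
    # Sidecars are nested one level deeper, keyed by service name.
    for name, sidecar in (state.get("sidecars") or {}).items():
        if isinstance(sidecar, dict):
            summary[f"sidecar:{name}"] = str(sidecar.get("status", "unknown"))
    return summary


# Trimmed sample built from values shown in the State File section.
sample_state = {
    "inference": {"status": "running", "mode": "gpu"},
    "speech_input": {"status": "running", "mode": "gpu"},
    "text_to_speech": {"status": "stopped", "mode": "stopped"},
    "gpu_workload": {"status": "stopped", "mode": "stopped"},
    "sidecars": {"llama_proxy": {"status": "running", "mode": "proxy"}},
}

for service, status in summarize_services(sample_state).items():
    print(f"{service}: {status}")
```

In practice the dict would come from parsing the JSON body of GET /api/status; the sample above avoids a network call so the sketch runs anywhere.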

Inference:
  POST http://arcangle:10101/api/model-profile
       Select llama model profile; pass slug and optional restart.
  POST http://arcangle:10101/api/model-profile-default
       Set the default llama model profile used when active state is missing.
  POST http://arcangle:10101/api/openclaw-sync
       Retry OpenClaw sync for the currently loaded ArcControl model.
  POST http://arcangle:10101/api/do/llama-gpu
       Switch llama.cpp to regular GPU mode.
  POST http://arcangle:10101/api/do/llama-gpu-turbo
       Switch llama.cpp to GPU turboquant mode.
  POST http://arcangle:10101/api/do/llama-cpu
       Switch llama.cpp to CPU mode.
  POST http://arcangle:10101/api/do/llama-stop
       Stop llama.cpp inference.
  POST http://arcangle:10101/api/do/llama-bounce
       Restart the current llama.cpp mode.
  POST http://arcangle:10101/api/bounce/llama
       Alias: restart the current llama.cpp mode.
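
The model-profile endpoint takes a slug and an optional restart flag. The reference does not say how those parameters are encoded, so the sketch below assumes a JSON body, a boolean restart value, and that the slug matches the loaded profile name shown above; all three are assumptions to verify against the server. The helper name build_model_profile_request is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://arcangle:10101"


def build_model_profile_request(slug: str, restart: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/model-profile."""
    # ASSUMPTION: parameters travel as a JSON object; the server may instead
    # expect form fields or query parameters.
    payload = {"slug": slug}
    if restart:
        payload["restart"] = True
    return urllib.request.Request(
        f"{BASE_URL}/api/model-profile",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_model_profile_request("qwen36-27b-neocoder-q4-preserve", restart=True)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it: urllib.request.urlopen(req, timeout=10)
```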

Whisper / Dictation:
  POST http://arcangle:10101/api/do/voice-gpu
       Start GPU dictation stack and Kokoro GPU.
  POST http://arcangle:10101/api/do/voice-cpu
       Start CPU tiny dictation on port 8000.
  POST http://arcangle:10101/api/do/voice-stop
       Stop dictation and Kokoro.
  POST http://arcangle:10101/api/do/voice-bounce
       Restart the current dictation mode.
  POST http://arcangle:10101/api/bounce/voice
       Alias: restart the current dictation mode.

Kokoro TTS:
  POST http://arcangle:10101/api/do/kokoro-gpu
       Start Kokoro TTS in GPU mode.
  POST http://arcangle:10101/api/do/kokoro-cpu
       Start Kokoro TTS in CPU mode.
  POST http://arcangle:10101/api/do/kokoro-stop
       Stop Kokoro TTS.
  POST http://arcangle:10101/api/do/kokoro-bounce
       Restart the current standalone Kokoro TTS mode.
  POST http://arcangle:10101/api/bounce/kokoro
       Alias: restart the current standalone Kokoro TTS mode.

Sidecars:
  POST http://arcangle:10101/api/do/llama-proxy-start
       Start llama logging proxy on port 19090.
  POST http://arcangle:10101/api/do/llama-proxy-stop
       Stop llama logging proxy on port 19090.
  POST http://arcangle:10101/api/do/ttscleaner-start
       Start TTS Cleaner proxy on port 8881.
  POST http://arcangle:10101/api/do/ttscleaner-stop
       Stop TTS Cleaner proxy on port 8881.
  POST http://arcangle:10101/api/do/kokoro-reader-start
       Start Kokoro Reader on port 9999.
  POST http://arcangle:10101/api/do/kokoro-reader-stop
       Stop Kokoro Reader on port 9999.
  POST http://arcangle:10101/api/do/llama-proxy-bounce
       Restart the llama logging proxy.
  POST http://arcangle:10101/api/do/ttscleaner-bounce
       Restart TTS Cleaner.
  POST http://arcangle:10101/api/do/kokoro-reader-bounce
       Restart Kokoro Reader.
  POST http://arcangle:10101/api/bounce/llama-proxy
       Alias: restart the llama logging proxy.
  POST http://arcangle:10101/api/bounce/ttscleaner
       Alias: restart TTS Cleaner.
  POST http://arcangle:10101/api/bounce/kokoro-reader
       Alias: restart Kokoro Reader.

GPU Workloads:
  POST http://arcangle:10101/api/do/gpu-workload-launch-comfy
       Start ComfyUI in place without changing other services.
  POST http://arcangle:10101/api/do/gpu-workload-start-comfy
       Make room and start ComfyUI.
  POST http://arcangle:10101/api/do/gpu-workload-stop-comfy
       Stop ComfyUI.
  POST http://arcangle:10101/api/do/gpu-workload-bounce-comfy
       Restart ComfyUI in place.
  POST http://arcangle:10101/api/bounce/gpu-workload/comfy
       Alias: restart ComfyUI in place.
  POST http://arcangle:10101/api/do/comfy-start
       Alias: start ComfyUI in place.
  POST http://arcangle:10101/api/do/comfy-start-managed
       Alias: make room and start ComfyUI.
  POST http://arcangle:10101/api/do/comfy-free
       Ask ComfyUI to unload models and free memory.
  POST http://arcangle:10101/api/do/comfy-stop
       Alias: stop ComfyUI.

Workflows:
  POST http://arcangle:10101/api/do/profile-gpu-inference
       Apply GPU Inference profile.
  POST http://arcangle:10101/api/do/profile-gpu-turbo
       Apply experimental GPU Turbo profile.
  POST http://arcangle:10101/api/do/profile-comfy-work
       Apply Comfy Work profile.
  POST http://arcangle:10101/api/do/profile-free-vram
       Apply Free VRAM profile.
  POST http://arcangle:10101/api/do/retask-gpu
       CPU inference, CPU dictation, start ComfyUI.
  POST http://arcangle:10101/api/do/restore-gpu
       Unload ComfyUI models, restore GPU inference and voice.
  POST http://arcangle:10101/api/do/free-vram
       CPU inference and CPU dictation, unload ComfyUI models without stopping ComfyUI.

Model Cache:
  GET  http://arcangle:10101/api/model-cache
       Current inference/voice model cache retouch settings and runtime state.
  POST http://arcangle:10101/api/model-cache
       Set enabled and interval_seconds for warm_models.sh retouching.
  POST http://arcangle:10101/api/do/model-cache-warm
       Run warm_models.sh immediately.

Logs:
  GET  http://arcangle:10101/logs?service=llama&lines=240
       Inference server log.
  GET  http://arcangle:10101/logs?service=voice&lines=120
       Voice orchestration log.
  GET  http://arcangle:10101/logs?service=kokoro&lines=120
       Kokoro log.
  GET  http://arcangle:10101/logs?service=kokoro_reader&lines=120
       Kokoro Reader log.
  GET  http://arcangle:10101/logs?service=comfy&lines=120
       ComfyUI log.
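
The log endpoints differ only in their service and lines query parameters, so the URLs can be generated. A minimal sketch, assuming the five service names listed above are the only valid ones; the helper name log_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://arcangle:10101"
# Service names taken from the log endpoints listed above.
KNOWN_SERVICES = ("llama", "voice", "kokoro", "kokoro_reader", "comfy")


def log_url(service: str, lines: int = 120) -> str:
    """Return the /logs URL for a service and a number of trailing lines."""
    if service not in KNOWN_SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    if lines <= 0:
        raise ValueError("lines must be positive")
    query = urlencode({"service": service, "lines": lines})
    return f"{BASE_URL}/logs?{query}"


print(log_url("llama", 240))
print(log_url("comfy"))
```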

Examples:
  curl -X POST http://arcangle:10101/api/do/llama-gpu-turbo
  curl -X POST http://arcangle:10101/api/do/llama-cpu
  curl -X POST http://arcangle:10101/api/do/llama-stop
  curl -X POST http://arcangle:10101/api/do/voice-cpu
  curl -X POST http://arcangle:10101/api/do/kokoro-stop
  curl http://arcangle:10101/api/status
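
The curl examples above translate directly to Python's standard library. This is a sketch rather than an official client: the helper names action_url and post_action are hypothetical, and the empty request body mirrors curl -X POST with no data.

```python
import urllib.request

BASE_URL = "http://arcangle:10101"


def action_url(action: str) -> str:
    """Return the /api/do/<action> URL for an action name such as 'llama-cpu'."""
    return f"{BASE_URL}/api/do/{action}"


def post_action(action: str, timeout: float = 10.0) -> int:
    """POST to an action endpoint with an empty body and return the HTTP status."""
    req = urllib.request.Request(action_url(action), data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Print the same targets as the curl examples without contacting the server.
for name in ("llama-gpu-turbo", "llama-cpu", "llama-stop", "voice-cpu", "kokoro-stop"):
    print("POST", action_url(name))
```

Calling post_action("llama-cpu") would perform the request; it is left uncalled here because it changes the running services.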
Cache Inference/Voice Models
model cache disabled
GPU owner: llama.cpp + Whisper
Inference route: gpu / running
Voice route: gpu / running
TTS route: speaches / running

llama gpu health check passed

0 operation(s)

GPU Load / VRAM Pressure - 5 Min

[chart: VRAM pressure, GPU load, Power; no samples]

GPU Load / VRAM Pressure - 1 Hour

[chart: VRAM pressure, GPU load, Power; no samples]

GPU Processes

PID  Name  VRAM MB
1157395 Whisper dictation 1030 MB / 3.16%
uriel    1157395  0.5  4.3 100794656 5751288 ?   SLsl Jun30   4:41 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app
1434466 llama.cpp 27334 MB / 83.83%
uriel    1434466  2.8 16.0 100095040 21106456 ?  Sl   Jun30  22:35 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {"preserve_thinking":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20
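
The percentages in the process rows appear to be each process's VRAM use divided by the card's total memory. The check below assumes the 32607 MB capacity reported as gpu_capacity_mb in the State File section and rounding to two decimals; the function name percent_of is hypothetical.

```python
def percent_of(used_mb: float, capacity_mb: float) -> float:
    """Return used_mb as a percentage of capacity_mb, rounded to two decimals."""
    if capacity_mb <= 0:
        raise ValueError("capacity_mb must be positive")
    return round(100.0 * used_mb / capacity_mb, 2)


CAPACITY_MB = 32607.0  # gpu_capacity_mb from the State File section

print(percent_of(1030.0, CAPACITY_MB))   # Whisper dictation row
print(percent_of(27334.0, CAPACITY_MB))  # llama.cpp row
```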

Inference

Status: running
Mode: gpu
HTTP: ok
PID: 1156162
Health: 19091
Start with ArcControl

Dictation

Status: running
Mode: gpu
Port: 8000
Start with ArcControl

TTS

Status: stopped
Mode: stopped
PID: None
Health: 8880
Start with ArcControl

Llama Proxy

Status: running
Port: 19090
Backend: http://arcangle:19091
PID: 1650305
Start with ArcControl

TTS Cleaner

Status: running
Port: 8881
LLM: http://arcangle:19090
PID: 3999358
Start with ArcControl

Kokoro Reader

Status: running
Port: 9999
TTS API: http://arcangle:8881
PID: 14846
Start with ArcControl

ComfyUI

Status: stopped
Mode: stopped
PID:
URL: 8188
Start with ArcControl

Inference Log

[35177] 767.54.752.059 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 767.54.754.439 I reasoning-budget: activated, budget=0 tokens
[35177] 767.54.754.441 I reasoning-budget: budget=0, forcing immediately
[35177] 767.54.754.441 I reasoning-budget: forced sequence complete, done
[35177] 767.54.754.558 I slot launch_slot_: id  0 | task 45479 | processing task, is_child = 0
[35177] 767.55.698.807 I slot print_timing: id  0 | task 45479 | prompt eval time =      75.22 ms /    27 tokens (    2.79 ms per token,   358.97 tokens per second)
[35177] 767.55.698.810 I slot print_timing: id  0 | task 45479 |        eval time =     869.01 ms /    57 tokens (   15.25 ms per token,    65.59 tokens per second)
[35177] 767.55.698.810 I slot print_timing: id  0 | task 45479 |       total time =     944.22 ms /    84 tokens
[35177] 767.55.698.811 I slot print_timing: id  0 | task 45479 |    graphs reused =      43670
[35177] 767.55.700.088 I slot      release: id  0 | task 45479 | stop processing: n_tokens = 43014, truncated = 0
[35177] 767.55.700.284 I srv  update_slots: all slots are idle
792.58.655.693 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.13.542.376 I srv  params_from_: Chat format: peg-native
[35177] 768.13.587.382 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 768.13.589.675 I reasoning-budget: activated, budget=0 tokens
[35177] 768.13.589.677 I reasoning-budget: budget=0, forcing immediately
[35177] 768.13.589.677 I reasoning-budget: forced sequence complete, done
[35177] 768.13.589.780 I slot launch_slot_: id  0 | task 45539 | processing task, is_child = 0
[35177] 768.14.878.815 I slot print_timing: id  0 | task 45539 | prompt eval time =      84.39 ms /    60 tokens (    1.41 ms per token,   711.02 tokens per second)
[35177] 768.14.878.819 I slot print_timing: id  0 | task 45539 |        eval time =    1204.63 ms /    80 tokens (   15.06 ms per token,    66.41 tokens per second)
[35177] 768.14.878.820 I slot print_timing: id  0 | task 45539 |       total time =    1289.01 ms /   140 tokens
[35177] 768.14.878.820 I slot print_timing: id  0 | task 45539 |    graphs reused =      43748
[35177] 768.14.880.076 I slot      release: id  0 | task 45539 | stop processing: n_tokens = 43153, truncated = 0
[35177] 768.14.880.272 I srv  update_slots: all slots are idle
793.00.328.693 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.15.228.704 I srv  params_from_: Chat format: peg-native
[35177] 768.15.282.367 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 768.15.285.056 I reasoning-budget: activated, budget=0 tokens
[35177] 768.15.285.059 I reasoning-budget: budget=0, forcing immediately
[35177] 768.15.285.059 I reasoning-budget: forced sequence complete, done
[35177] 768.15.285.183 I slot launch_slot_: id  0 | task 45622 | processing task, is_child = 0
[35177] 768.15.366.817 I slot create_check: id  0 | task 45622 | created context checkpoint 10 of 32 (pos_min = 43154, pos_max = 43154, n_tokens = 43155, size = 149.626 MiB)
[35177] 768.16.637.466 I slot print_timing: id  0 | task 45622 | prompt eval time =     140.72 ms /    57 tokens (    2.47 ms per token,   405.05 tokens per second)
[35177] 768.16.637.469 I slot print_timing: id  0 | task 45622 |        eval time =    1211.54 ms /    80 tokens (   15.14 ms per token,    66.03 tokens per second)
[35177] 768.16.637.469 I slot print_timing: id  0 | task 45622 |       total time =    1352.27 ms /   137 tokens
[35177] 768.16.637.470 I slot print_timing: id  0 | task 45622 |    graphs reused =      43825
[35177] 768.16.638.700 I slot      release: id  0 | task 45622 | stop processing: n_tokens = 43289, truncated = 0
[35177] 768.16.638.885 I srv  update_slots: all slots are idle
793.02.158.713 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.17.050.717 I srv  params_from_: Chat format: peg-native
[35177] 768.17.102.733 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
[35177] 768.17.105.080 I reasoning-budget: activated, budget=0 tokens
[35177] 768.17.105.082 I reasoning-budget: budget=0, forcing immediately
[35177] 768.17.105.082 I reasoning-budget: forced sequence complete, done
[35177] 768.17.105.197 I slot launch_slot_: id  0 | task 45705 | processing task, is_child = 0
[35177] 768.17.256.573 I slot create_check: id  0 | task 45705 | created context checkpoint 11 of 32 (pos_min = 43466, pos_max = 43466, n_tokens = 43467, size = 149.626 MiB)
[35177] 768.18.810.441 I slot print_timing: id  0 | task 45705 | n_decoded =    100, tg =  65.30 t/s
[35177] 768.20.103.277 I slot print_timing: id  0 | task 45705 | prompt eval time =     173.74 ms /   182 tokens (    0.95 ms per token,  1047.52 tokens per second)
[35177] 768.20.103.281 I slot print_timing: id  0 | task 45705 |        eval time =    2824.32 ms /   185 tokens (   15.27 ms per token,    65.50 tokens per second)
[35177] 768.20.103.281 I slot print_timing: id  0 | task 45705 |       total time =    2998.06 ms /   367 tokens
[35177] 768.20.103.282 I slot print_timing: id  0 | task 45705 |    graphs reused =      44007
[35177] 768.20.104.558 I slot      release: id  0 | task 45705 | stop processing: n_tokens = 43655, truncated = 0
[35177] 768.20.104.764 I srv  update_slots: all slots are idle
793.06.770.559 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.21.630.462 I srv  params_from_: Chat format: peg-native
[35177] 768.21.680.136 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.845 (> 0.100 thold), f_keep = 0.716
[35177] 768.21.682.419 I reasoning-budget: activated, budget=0 tokens
[35177] 768.21.682.421 I reasoning-budget: budget=0, forcing immediately
[35177] 768.21.682.422 I reasoning-budget: forced sequence complete, done
[35177] 768.21.682.531 I slot launch_slot_: id  0 | task 45893 | processing task, is_child = 0
[35177] 768.21.682.555 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [43466, 43466] against 31267...
[35177] 768.21.682.557 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [43154, 43154] against 31267...
[35177] 768.21.682.557 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [42815, 42815] against 31267...
[35177] 768.21.682.558 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [42456, 42456] against 31267...
[35177] 768.21.682.558 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [42188, 42188] against 31267...
[35177] 768.21.682.559 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [41914, 41914] against 31267...
[35177] 768.21.682.559 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [41602, 41602] against 31267...
[35177] 768.21.682.560 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [41225, 41225] against 31267...
[35177] 768.21.682.560 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [40713, 40713] against 31267...
[35177] 768.21.682.560 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [40258, 40258] against 31267...
[35177] 768.21.682.561 I slot update_slots: id  0 | task 45893 | Checking checkpoint with [39925, 39925] against 31267...
[35177] 768.21.682.561 W slot update_slots: id  0 | task 45893 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
[35177] 768.21.682.566 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 39925, pos_max = 39925, n_tokens = 39926, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.693.936 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 40258, pos_max = 40258, n_tokens = 40259, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.707.553 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 40713, pos_max = 40713, n_tokens = 40714, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.720.686 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 41225, pos_max = 41225, n_tokens = 41226, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.734.447 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 41602, pos_max = 41602, n_tokens = 41603, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.744.172 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 41914, pos_max = 41914, n_tokens = 41915, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.753.458 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 42188, pos_max = 42188, n_tokens = 42189, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.765.676 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 42456, pos_max = 42456, n_tokens = 42457, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.776.975 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 42815, pos_max = 42815, n_tokens = 42816, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.788.948 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 43154, pos_max = 43154, n_tokens = 43155, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.800.354 W slot update_slots: id  0 | task 45893 | erased invalidated context checkpoint (pos_min = 43466, pos_max = 43466, n_tokens = 43467, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.25.170.633 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  12288, progress = 0.33, t =   3.49 s / 3522.85 tokens per second
[35177] 768.25.775.502 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  14336, progress = 0.39, t =   4.09 s / 3502.60 tokens per second
[35177] 768.26.391.021 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  16384, progress = 0.44, t =   4.71 s / 3479.68 tokens per second
[35177] 768.27.013.537 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  18432, progress = 0.50, t =   5.33 s / 3457.52 tokens per second
[35177] 768.27.646.817 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  20480, progress = 0.55, t =   5.96 s / 3433.78 tokens per second
[35177] 768.28.288.980 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  22528, progress = 0.61, t =   6.61 s / 3410.01 tokens per second
[35177] 768.28.942.291 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  24576, progress = 0.66, t =   7.26 s / 3385.24 tokens per second
[35177] 768.29.610.285 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  26624, progress = 0.72, t =   7.93 s / 3358.33 tokens per second
[35177] 768.30.296.166 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  28672, progress = 0.77, t =   8.61 s / 3328.68 tokens per second
[35177] 768.30.997.546 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  30720, progress = 0.83, t =   9.32 s / 3297.91 tokens per second
[35177] 768.31.715.129 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  32768, progress = 0.89, t =  10.03 s / 3266.16 tokens per second
[35177] 768.32.453.339 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  34816, progress = 0.94, t =  10.77 s / 3232.44 tokens per second
[35177] 768.33.149.036 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  36499, progress = 0.99, t =  11.47 s / 3183.10 tokens per second
[35177] 768.33.282.533 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  36975, progress = 1.00, t =  11.60 s / 3187.50 tokens per second
[35177] 768.33.361.321 I slot create_check: id  0 | task 45893 | created context checkpoint 1 of 32 (pos_min = 36974, pos_max = 36974, n_tokens = 36975, size = 149.626 MiB)
[35177] 768.33.381.838 I slot print_timing: id  0 | task 45893 | prompt processing, n_tokens =  37011, progress = 1.00, t =  11.70 s / 3163.52 tokens per second
[35177] 768.33.491.888 I slot print_timing: id  0 | task 45893 | prompt eval time =   11730.70 ms / 37015 tokens (    0.32 ms per token,  3155.39 tokens per second)
[35177] 768.33.491.891 I slot print_timing: id  0 | task 45893 |        eval time =      78.64 ms /     6 tokens (   13.11 ms per token,    76.30 tokens per second)
[35177] 768.33.491.891 I slot print_timing: id  0 | task 45893 |       total time =   11809.34 ms / 37021 tokens
[35177] 768.33.491.892 I slot print_timing: id  0 | task 45893 |    graphs reused =      44011
[35177] 768.33.493.020 I slot      release: id  0 | task 45893 | stop processing: n_tokens = 37020, truncated = 0
[35177] 768.33.493.208 I srv  update_slots: all slots are idle
820.19.685.429 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 795.34.541.915 I srv  params_from_: Chat format: peg-native
[35177] 795.34.590.685 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.999
[35177] 795.34.593.418 I reasoning-budget: activated, budget=0 tokens
[35177] 795.34.593.420 I reasoning-budget: budget=0, forcing immediately
[35177] 795.34.593.420 I reasoning-budget: forced sequence complete, done
[35177] 795.34.593.523 I slot launch_slot_: id  0 | task 45920 | processing task, is_child = 0
[35177] 795.34.593.549 I slot update_slots: id  0 | task 45920 | Checking checkpoint with [36974, 36974] against 36995...
[35177] 795.34.616.337 W slot update_slots: id  0 | task 45920 | restored context checkpoint (pos_min = 36974, pos_max = 36974, n_tokens = 36975, n_past = 36975, size = 149.626 MiB)
[35177] 795.34.751.012 I slot print_timing: id  0 | task 45920 | prompt eval time =      75.95 ms /    40 tokens (    1.90 ms per token,   526.64 tokens per second)
[35177] 795.34.751.016 I slot print_timing: id  0 | task 45920 |        eval time =      81.52 ms /     6 tokens (   13.59 ms per token,    73.60 tokens per second)
[35177] 795.34.751.016 I slot print_timing: id  0 | task 45920 |       total time =     157.47 ms /    46 tokens
[35177] 795.34.751.017 I slot print_timing: id  0 | task 45920 |    graphs reused =      44015
[35177] 795.34.752.409 I slot      release: id  0 | task 45920 | stop processing: n_tokens = 37020, truncated = 0
[35177] 795.34.752.636 I srv  update_slots: all slots are idle
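
The print_timing lines in the log above carry per-request throughput, which can be pulled out with a regular expression. The pattern below is a sketch fitted to the line format seen in this excerpt only; other llama.cpp versions may log differently, and the names TIMING_RE and parse_timing are hypothetical.

```python
import re
from typing import Dict, Optional, Union

# Matches the "prompt eval time" and "eval time" lines; "total time" and
# "prompt processing" progress lines are intentionally not matched.
TIMING_RE = re.compile(
    r"task (?P<task>\d+) \|\s+(?P<phase>prompt eval|eval) time =\s*(?P<ms>[\d.]+) ms"
    r" /\s*(?P<tokens>\d+) tokens"
    r" \(\s*[\d.]+ ms per token,\s*(?P<tps>[\d.]+) tokens per second\)"
)


def parse_timing(line: str) -> Optional[Dict[str, Union[int, float, str]]]:
    """Extract task id, phase, duration, token count and tokens/s from one log line."""
    match = TIMING_RE.search(line)
    if match is None:
        return None
    return {
        "task": int(match["task"]),
        "phase": match["phase"],
        "ms": float(match["ms"]),
        "tokens": int(match["tokens"]),
        "tokens_per_second": float(match["tps"]),
    }


# Two lines copied from the log above.
sample_lines = [
    "[35177] 768.33.491.888 I slot print_timing: id  0 | task 45893 | prompt eval time =   11730.70 ms / 37015 tokens (    0.32 ms per token,  3155.39 tokens per second)",
    "[35177] 768.33.491.891 I slot print_timing: id  0 | task 45893 |        eval time =      78.64 ms /     6 tokens (   13.11 ms per token,    76.30 tokens per second)",
]

for line in sample_lines:
    print(parse_timing(line))
```

Applied to the whole log, this makes the full prompt re-processing for task 45893 stand out: 37015 prompt tokens against a few dozen for the neighbouring tasks.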

Other Logs

2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:31-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:31-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:34-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /favicon/favicon.ico HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -

State File

{
  "transitioning": false,
  "gpu_state": "inference_available",
  "message": "llama gpu health check passed",
  "inference": {
    "status": "running",
    "mode": "gpu",
    "pid": 1156162,
    "launcher_pid": null,
    "pgid": 1156162,
    "port": 19091,
    "url": "http://arcangle:19091",
    "health_url": "http://arcangle:19091/health",
    "port_listening": true,
    "http_ok": true,
    "http_probe": {
      "ok": true,
      "status": 200,
      "body": "{\"status\":\"ok\"}"
    },
    "log": "/servers/run/state/logs/llama.log",
    "model_profile": "qwen36-27b-neocoder-q4-preserve",
    "model_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
    "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf"
  },
  "speech_input": {
    "status": "running",
    "mode": "gpu",
    "port": 8000,
    "url": "http://arcangle:8000",
    "tts": {
      "service": "kokoro",
      "status": "stopped",
      "mode": "stopped",
      "pid": null,
      "launcher_pid": null,
      "pgid": 1776540,
      "port": 8880,
      "url": "http://arcangle:8880",
      "health_url": "http://arcangle:8880/health",
      "port_listening": false,
      "http_ok": false,
      "http_probe": {
        "ok": false,
        "error": "port closed"
      },
      "log": "/servers/run/state/logs/kokoro.log"
    },
    "note": "Port 8000 remains the dictation endpoint; GPU dictation is served by /servers/speaches.",
    "log": "/servers/run/state/logs/voice.log",
    "containers": {
      "vad-shim": false,
      "speaches": true,
      "whisper-stt": false
    }
  },
  "text_to_speech": {
    "service": "kokoro",
    "status": "stopped",
    "mode": "stopped",
    "pid": null,
    "launcher_pid": null,
    "pgid": 1776540,
    "port": 8880,
    "url": "http://arcangle:8880",
    "health_url": "http://arcangle:8880/health",
    "port_listening": false,
    "http_ok": false,
    "http_probe": {
      "ok": false,
      "error": "port closed"
    },
    "log": "/servers/run/state/logs/kokoro.log"
  },
  "gpu_workload": {
    "service": null,
    "status": "stopped",
    "mode": "stopped",
    "pid": null,
    "pgid": null,
    "port": 8188,
    "url": "http://arcangle:8188",
    "log": "/servers/run/state/logs/comfy.log"
  },
  "gpu_workloads": {
    "file": "/servers/run/state/gpu-workloads.json",
    "items": [
      {
        "label": "ComfyUI",
        "slug": "comfy",
        "description": "ComfyUI image/video workflow",
        "full_use": true,
        "status": "stopped",
        "mode": "stopped",
        "pid": null,
        "pgid": null,
        "port": 8188,
        "url": "http://arcangle:8188",
        "log": "/servers/run/state/logs/comfy.log"
      }
    ]
  },
  "sidecars": {
    "llama_proxy": {
      "service": "llama_proxy",
      "label": "Llama Proxy",
      "status": "running",
      "mode": "proxy",
      "pid": 1650305,
      "pgid": 1650305,
      "port": 19090,
      "url": "http://arcangle:19090",
      "health_url": "http://arcangle:19090/health",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "{\"status\":\"ok\"}"
      },
      "backend": "http://arcangle:19091",
      "log": "/servers/run/state/logs/llama-proxy.log"
    },
    "ttscleaner": {
      "service": "ttscleaner",
      "label": "TTS Cleaner",
      "status": "running",
      "mode": "proxy",
      "pid": 3999358,
      "port": 8881,
      "url": "http://arcangle:8881",
      "health_url": "http://arcangle:8881/health",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "{\"message\":\"OK\"}"
      },
      "backend": "http://arcangle:8000",
      "llm_backend": "http://arcangle:19090",
      "log": "/servers/ttscleaner/proxy.log"
    },
    "kokoro_reader": {
      "service": "kokoro_reader",
      "label": "Kokoro Reader",
      "status": "running",
      "mode": "reader",
      "pid": 14846,
      "port": 9999,
      "url": "http://arcangle:9999",
      "health_url": "http://arcangle:9999",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "<!doctype html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    <meta name=\"theme-color\" content=\"#101820\">\n    <title>Kokoro Reader</title>\n    <link rel=\"manifest\" href=\"/s"
      },
      "backend": "http://arcangle:8881",
      "log": "/servers/kokoro-reader/kokoro-reader.log"
    }
  },
  "nvidia": {
    "available": true,
    "gpus": [
      {
        "name": "NVIDIA GeForce RTX 5090",
        "memory_used_mb": "28393",
        "memory_total_mb": "32607",
        "gpu_util_percent": "0",
        "power_draw_w": "58.87",
        "power_limit_w": "575.00",
        "temperature_c": "41",
        "memory_used_percent": 87.08,
        "power_percent": 10.24,
        "util_percent": 0.0
      }
    ],
    "processes": {
      "available": true,
      "items": [
        {
          "pid": 1157395,
          "process_name": "/home/ubuntu/speaches/.venv/bin/python",
          "friendly_name": "Whisper dictation",
          "used_gpu_memory_mb": 1030.0,
          "gpu_capacity_mb": 32607.0,
          "used_gpu_memory_percent": 3.16,
          "ps_aux": "uriel    1157395  0.5  4.3 100794656 5751288 ?   SLsl Jun30   4:41 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app"
        },
        {
          "pid": 1434466,
          "process_name": "/servers/llcpp6/build/bin/llama-server",
          "friendly_name": "llama.cpp",
          "used_gpu_memory_mb": 27334.0,
          "gpu_capacity_mb": 32607.0,
          "used_gpu_memory_percent": 83.83,
          "ps_aux": "uriel    1434466  2.8 16.0 100095040 21106456 ?  Sl   Jun30  22:35 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {\"preserve_thinking\":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20"
        }
      ]
    }
  },
  "routing": {
    "gpu_owner": "llama.cpp + Whisper",
    "inference_route": "gpu / running",
    "voice_route": "gpu / running",
    "tts_route": "speaches / running"
  },
  "progress": {
    "active": false,
    "percent": 100,
    "label": "llama gpu health check passed",
    "operations": []
  },
  "model_cache": {
    "comfy_cpu_limit_percent": 80,
    "enabled": false,
    "gpu_util_limit_percent": 40,
    "guard": {
      "blocked": false,
      "blocked_reasons": [],
      "comfy_cpu_percent": 0.0,
      "gpu_util_percent": 0.0
    },
    "guard_window_seconds": 120,
    "interval_seconds": 120,
    "last_blocked_at": "2026-07-01T04:15:30-04:00",
    "last_exit_code": 0,
    "last_finished_at": "2026-06-12T19:04:02-04:00",
    "last_run_at": "2026-06-12T19:04:02-04:00",
    "last_started_at": "2026-06-12T19:04:01-04:00",
    "lists": {
      "inference": "/servers/run/cache/inference-active.txt",
      "stt": "/servers/run/cache/stt.txt",
      "tts": "/servers/run/cache/tts.txt"
    },
    "message": "model cache disabled",
    "next_run_due_at": "2026-06-12T19:06:02-04:00",
    "running": false,
    "script": "/servers/run/warm_models.sh",
    "settings": {
      "enabled": false,
      "interval_seconds": 120
    },
    "updated_at": "2026-07-01T04:42:33-04:00"
  },
  "startup_services": {
    "file": "/servers/run/state/startup-services.json",
    "services": [
      {
        "key": "inference",
        "label": "Inference",
        "description": "Start selected llama GPU model when ArcControl starts.",
        "action": "llama-gpu",
        "enabled": true
      },
      {
        "key": "speech_input",
        "label": "Dictation",
        "description": "Start GPU dictation/Speaches when ArcControl starts.",
        "action": "voice-gpu",
        "enabled": false
      },
      {
        "key": "text_to_speech",
        "label": "Standalone Kokoro",
        "description": "Start standalone Kokoro GPU service when ArcControl starts.",
        "action": "kokoro-gpu",
        "enabled": false
      },
      {
        "key": "sidecars.llama_proxy",
        "label": "Llama Proxy",
        "description": "Start llama logging proxy when ArcControl starts.",
        "action": "llama-proxy-start",
        "enabled": true
      },
      {
        "key": "sidecars.ttscleaner",
        "label": "TTS Cleaner",
        "description": "Start TTS Cleaner when ArcControl starts.",
        "action": "ttscleaner-start",
        "enabled": true
      },
      {
        "key": "sidecars.kokoro_reader",
        "label": "Kokoro Reader",
        "description": "Start Kokoro Reader when ArcControl starts.",
        "action": "kokoro-reader-start",
        "enabled": true
      },
      {
        "key": "gpu_workload.comfy",
        "label": "ComfyUI",
        "description": "Start ComfyUI when ArcControl starts.",
        "action": "gpu-workload-launch-comfy",
        "enabled": false
      }
    ],
    "updated_at": "2026-06-30T15:01:58-04:00"
  },
  "model_profiles": {
    "active": "qwen36-27b-neocoder-q4-preserve",
    "active_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
    "default": "qwen36-27b-heretic-preserve",
    "default_label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
    "items": [
      {
        "slug": "gemma4-12b-q4-heretic",
        "label": "Gemma 4 12B Q4 Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-12b-q4-heretic",
        "model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
        "size_gb": 6.9,
        "size_label": "6.9 GB",
        "cache_files": [
          "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
          "/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-12b-q4-heretic.json"
      },
      {
        "slug": "gemma4-12b-q6-heretic",
        "label": "Gemma 4 12B Q6 Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-12b-q6-heretic",
        "model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
        "size_gb": 9.1,
        "size_label": "9.1 GB",
        "cache_files": [
          "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
          "/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-12b-q6-heretic.json"
      },
      {
        "slug": "gemma4-26b-a4b-heretic",
        "label": "gemma4-26B-A4B-heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-26b-a4b-heretic",
        "model_path": "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
        "size_gb": 15.6,
        "size_label": "15.6 GB",
        "cache_files": [
          "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
          "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.mmproj-f16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-26b-a4b-heretic.json"
      },
      {
        "slug": "qwen36-27b-heretic-preserve",
        "label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": true,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-heretic-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
        "size_gb": 16.0,
        "size_label": "16.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-heretic-preserve.json"
      },
      {
        "slug": "qwen36-27b-heretic",
        "label": "Qwen 3.6 27B Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-heretic",
        "model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
        "size_gb": 16.0,
        "size_label": "16.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-heretic.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-nl-preserve",
        "label": "Qwen 3.6 27B NeoCode IQ4_NL (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
        "size_gb": 15.0,
        "size_label": "15.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-nl",
        "label": "Qwen 3.6 27B NeoCode IQ4_NL",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
        "size_gb": 15.0,
        "size_label": "15.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-xs-preserve",
        "label": "Qwen 3.6 27B NeoCode IQ4_XS (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
        "size_gb": 14.3,
        "size_label": "14.3 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-xs",
        "label": "Qwen 3.6 27B NeoCode IQ4_XS",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
        "size_gb": 14.3,
        "size_label": "14.3 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q4-preserve",
        "label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
        "description": "",
        "active": true,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q4-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
        "size_gb": 15.7,
        "size_label": "15.7 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q4",
        "label": "Qwen 3.6 27B NeoCode Q4",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q4",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
        "size_gb": 15.7,
        "size_label": "15.7 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q6-preserve",
        "label": "Qwen 3.6 27B NeoCode Q6 (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q6-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
        "size_gb": 20.9,
        "size_label": "20.9 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q6",
        "label": "Qwen 3.6 27B NeoCode Q6",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q6",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
        "size_gb": 20.9,
        "size_label": "20.9 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6.json"
      },
      {
        "slug": "qwen36-unc",
        "label": "Qwen3.6 35B Uncensored",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-unc",
        "model_path": "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
        "size_gb": 19.7,
        "size_label": "19.7 GB",
        "cache_files": [
          "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
          "/models/Q3.6/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-unc.json"
      }
    ],
    "dir": "/servers/run/model-profiles",
    "state_file": "/servers/run/state/active-model-profile.json",
    "default_state_file": "/servers/run/state/default-model-profile.json",
    "cache_file": "/servers/run/cache/inference-active.txt"
  },
  "openclaw_sync": {
    "status": "synchronized",
    "expected_model": "qwen36-27b-neocoder-q4-preserve",
    "loaded_model": "qwen36-27b-neocoder-q4-preserve",
    "loaded_models": [
      "qwen36-27b-neocoder-q4-preserve"
    ],
    "expected_openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
    "openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
    "synchronized": true,
    "error": "",
    "sessions": {
      "stale_local": 0,
      "stale_keys": [],
      "cloud_overrides": 1,
      "checked": 43
    },
    "state_file": "/servers/run/state/openclaw-sync-state.json"
  },
  "state_file": "/servers/run/state/system-state.json",
  "updated_at": "2026-07-01T04:42:36-04:00",
  "profiles": {
    "active": "custom",
    "active_label": "Custom Profile",
    "items": [
      {
        "slug": "gpu-inference",
        "label": "GPU Inference",
        "description": "GPU llama, GPU dictation, GPU Kokoro; unload Comfy models if Comfy is running; model cache off.",
        "desired": {
          "llama": "gpu",
          "voice": "gpu",
          "kokoro": "gpu",
          "comfy": "unload_models",
          "model_cache": false
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "gpu-turbo",
        "label": "GPU Turbo",
        "description": "Experimental turboquant llama profile; GPU dictation and Kokoro; unload Comfy models if Comfy is running; model cache off.",
        "desired": {
          "llama": "gpu-turbo",
          "voice": "gpu",
          "kokoro": "gpu",
          "comfy": "unload_models",
          "model_cache": false
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "comfy-work",
        "label": "Comfy Work",
        "description": "CPU inference, CPU-lite dictation, Comfy running, model files kept warm.",
        "desired": {
          "llama": "cpu",
          "voice": "cpu-lite",
          "kokoro": "stopped",
          "comfy": "running",
          "model_cache": true
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "free-vram",
        "label": "Free VRAM",
        "description": "One-shot VRAM cleanup: CPU inference and dictation, Kokoro stopped, Comfy kept running with models unloaded.",
        "desired": {
          "llama": "cpu",
          "voice": "cpu-lite",
          "kokoro": "stopped",
          "comfy": "unload_models",
          "model_cache": true
        },
        "momentary": true,
        "active": false
      }
    ],
    "watch": {
      "enabled": false,
      "profile": "gpu-inference",
      "updated_at": "2026-06-03T18:03:26-04:00"
    },
    "file": "/servers/run/state/profiles.json"
  }
}
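
A minimal sketch of how a client might condense a state dump like the one above into a few health facts. The helper name `summarize_state` and the trimmed `SAMPLE_STATE` excerpt are hypothetical and not part of ArcControl; the field names and values are copied from the dump. It assumes the live document has the same shape and is available from the `/api/status` endpoint listed in the API reference. Note that the `nvidia` memory fields arrive as strings and need converting before any arithmetic.

```python
import json

# Trimmed excerpt shaped like the state dump above (values copied from it).
# A real client would load the full document instead of this sample.
SAMPLE_STATE = json.loads("""
{
  "sidecars": {
    "llama_proxy": {"status": "running", "http_ok": true},
    "ttscleaner": {"status": "running", "http_ok": true}
  },
  "nvidia": {
    "gpus": [
      {"name": "NVIDIA GeForce RTX 5090",
       "memory_used_mb": "28393",
       "memory_total_mb": "32607"}
    ]
  },
  "model_profiles": {
    "active": "qwen36-27b-neocoder-q4-preserve",
    "items": [
      {"slug": "qwen36-27b-heretic-preserve", "active": false,
       "default": true, "size_gb": 16.0},
      {"slug": "qwen36-27b-neocoder-q4-preserve", "active": true,
       "default": false, "size_gb": 15.7}
    ]
  },
  "openclaw_sync": {"synchronized": true, "error": ""}
}
""")


def summarize_state(state):
    """Hypothetical helper: reduce a state dict to a few health facts."""
    profiles = state.get("model_profiles", {})
    # The active profile is the item flagged "active": true, if any.
    active = next(
        (p for p in profiles.get("items", []) if p.get("active")), None
    )
    # A sidecar counts as unhealthy when its HTTP probe did not succeed.
    unhealthy = sorted(
        name
        for name, svc in state.get("sidecars", {}).items()
        if not svc.get("http_ok")
    )
    # Memory figures are strings in the dump, so convert before dividing.
    gpu_memory = []
    for gpu in state.get("nvidia", {}).get("gpus", []):
        used = float(gpu["memory_used_mb"])
        total = float(gpu["memory_total_mb"])
        gpu_memory.append((gpu["name"], round(100.0 * used / total, 2)))
    sync = state.get("openclaw_sync", {})
    return {
        "active_slug": active["slug"] if active else None,
        "active_size_gb": active["size_gb"] if active else None,
        "openclaw_synchronized": bool(sync.get("synchronized")),
        "unhealthy_sidecars": unhealthy,
        "gpu_memory_percent": gpu_memory,
    }


summary = summarize_state(SAMPLE_STATE)
print(json.dumps(summary, indent=2))
```

Every lookup goes through `.get()` with a default so that a dump missing a section (for example when `nvidia.available` is false) yields empty results rather than raising. The computed GPU memory percentage can be cross-checked against the `memory_used_percent` field the dump already carries.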