ArcControl

GPU state: inference_available

Workflows

Profiles: Custom Profile
Model Profile: Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)
/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf
Default: Qwen 3.6 27B Heretic (Preserve Thinking)
Loaded: qwen36-27b-neocoder-q4-preserve
OpenClaw: arc/qwen36-27b-neocoder-q4-preserve
Synchronization: synchronized
API Reference
Base URL: http://arcangle:10101
Action endpoints accept GET or POST; POST is preferred for automation.

Status:
  GET  http://arcangle:10101/api/status
       Current dashboard state.
  GET  http://arcangle:10101/state.json
       Alias for current dashboard state.
  GET  http://arcangle:10101/api/gpu-history?minutes=60
       GPU telemetry history.
  GET  http://arcangle:10101/api/profiles
       Available desired-state profiles and active match.
  GET  http://arcangle:10101/api/model-profiles
       Available llama model profiles and selected model.
  GET  http://arcangle:10101/api/openclaw-sync
       ArcControl/OpenClaw model synchronization state.
  GET  http://arcangle:10101/api/gpu-workloads
       Registered full-use GPU workloads.
  GET  http://arcangle:10101/api/startup-services
       Card services configured to start with ArcControl.
  POST http://arcangle:10101/api/startup-services
       Set whether a card service starts with ArcControl; pass key and enabled.
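The Status endpoints above are read-only GETs. Below is a minimal Python sketch of a client for them. It uses only the standard library and assumes each endpoint returns JSON; the State File section on this page suggests what /api/status looks like. The helper names status_request and fetch_json are hypothetical. The injectable opener parameter exists only so the helper can be exercised without a reachable arcangle host.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://arcangle:10101"  # ArcControl base URL from this reference


def status_request(path, query=None, base=BASE_URL):
    """Build a GET request for a read-only endpoint such as /api/status."""
    url = base + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, method="GET")


def fetch_json(path, query=None, base=BASE_URL, opener=urllib.request.urlopen):
    """GET a status endpoint and decode the body as JSON (JSON is assumed)."""
    req = status_request(path, query, base)
    with opener(req) as resp:  # urlopen responses work as context managers
        return json.loads(resp.read().decode("utf-8"))
```

Usage assumes the host is reachable. fetch_json("/api/gpu-history", {"minutes": 60}) returns the 60-minute telemetry. fetch_json("/api/status")["gpu_state"] would read the field the dashboard shows as inference_available.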

Inference:
  POST http://arcangle:10101/api/model-profile
       Select llama model profile; pass slug and optional restart.
  POST http://arcangle:10101/api/model-profile-default
       Set the default llama model profile used when active state is missing.
  POST http://arcangle:10101/api/openclaw-sync
       Retry OpenClaw sync for the currently loaded ArcControl model.
  POST http://arcangle:10101/api/do/llama-gpu
       Switch llama.cpp to regular GPU mode.
  POST http://arcangle:10101/api/do/llama-gpu-turbo
       Switch llama.cpp to GPU turboquant mode.
  POST http://arcangle:10101/api/do/llama-cpu
       Switch llama.cpp to CPU mode.
  POST http://arcangle:10101/api/do/llama-stop
       Stop llama.cpp inference.
  POST http://arcangle:10101/api/do/llama-bounce
       Restart the current llama.cpp mode.
  POST http://arcangle:10101/api/bounce/llama
       Alias: restart the current llama.cpp mode.
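POST /api/model-profile takes a slug and an optional restart flag, but this reference does not say how they are encoded. The sketch below assumes a form-encoded POST body with restart spelled "true" or "false". Both choices are assumptions to check against ArcControl's handler, and select_model_profile is a hypothetical name. The slug in the usage comment is the profile currently loaded on this host.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://arcangle:10101"


def select_model_profile(slug, restart=None, base=BASE_URL):
    """Build a POST to /api/model-profile selecting a llama model profile.

    ASSUMPTION: ArcControl reads `slug` and `restart` from a form-encoded
    body and accepts "true"/"false" for `restart`; verify against the server.
    """
    fields = {"slug": slug}
    if restart is not None:  # `restart` is optional per the reference
        fields["restart"] = "true" if restart else "false"
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        base + "/api/model-profile",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Send with:
# urllib.request.urlopen(select_model_profile("qwen36-27b-neocoder-q4-preserve", restart=True))
```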

Whisper / Dictation:
  POST http://arcangle:10101/api/do/voice-gpu
       Start the GPU dictation stack and Kokoro TTS in GPU mode.
  POST http://arcangle:10101/api/do/voice-cpu
       Start CPU tiny dictation on port 8000.
  POST http://arcangle:10101/api/do/voice-stop
       Stop dictation and Kokoro.
  POST http://arcangle:10101/api/do/voice-bounce
       Restart the current dictation mode.
  POST http://arcangle:10101/api/bounce/voice
       Alias: restart the current dictation mode.
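All four dictation actions share the path pattern /api/do/voice-<action>. The Python sketch below builds those URLs and rejects actions this reference does not list, so a typo fails locally rather than as an HTTP error. The helper name voice_action_url is hypothetical. The /api/bounce/voice alias is equivalent to voice-bounce and is not generated here.

```python
BASE_URL = "http://arcangle:10101"

# Dictation actions documented above; voice-gpu also starts Kokoro on the GPU.
VOICE_ACTIONS = ("gpu", "cpu", "stop", "bounce")


def voice_action_url(action, base=BASE_URL):
    """Return the /api/do/voice-<action> URL, rejecting undocumented actions."""
    if action not in VOICE_ACTIONS:
        raise ValueError(
            f"unknown dictation action {action!r}; expected one of {VOICE_ACTIONS}"
        )
    return f"{base}/api/do/voice-{action}"
```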

Kokoro TTS:
  POST http://arcangle:10101/api/do/kokoro-gpu
       Start Kokoro TTS in GPU mode.
  POST http://arcangle:10101/api/do/kokoro-cpu
       Start Kokoro TTS in CPU mode.
  POST http://arcangle:10101/api/do/kokoro-stop
       Stop Kokoro TTS.
  POST http://arcangle:10101/api/do/kokoro-bounce
       Restart the current standalone Kokoro TTS mode.
  POST http://arcangle:10101/api/bounce/kokoro
       Alias: restart the current standalone Kokoro TTS mode.
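In the State File on this page, text_to_speech reports http_ok false with the error "port closed" while Kokoro is stopped. After kokoro-gpu or kokoro-cpu, a script may need to wait for that probe to pass. The sketch below polls state until it does. The helper name wait_for_tts and its timeout and interval defaults are assumptions. get_state can be any callable that returns parsed state JSON, assuming /api/status has the same structure as the State File.

```python
import time


def wait_for_tts(get_state, timeout_s=60.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll ArcControl state until Kokoro TTS passes its HTTP health probe.

    `get_state` is any zero-argument callable returning the parsed state
    JSON (for example, a GET of /api/status). Returns the text_to_speech block.
    """
    deadline = clock() + timeout_s
    while True:
        tts = get_state().get("text_to_speech", {})
        if tts.get("http_ok"):
            return tts
        if clock() >= deadline:
            raise TimeoutError(
                f"Kokoro TTS not healthy; last status {tts.get('status')!r}"
            )
        sleep(interval_s)
```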

Sidecars:
  POST http://arcangle:10101/api/do/llama-proxy-start
       Start llama logging proxy on port 19090.
  POST http://arcangle:10101/api/do/llama-proxy-stop
       Stop llama logging proxy on port 19090.
  POST http://arcangle:10101/api/do/ttscleaner-start
       Start TTS Cleaner proxy on port 8881.
  POST http://arcangle:10101/api/do/ttscleaner-stop
       Stop TTS Cleaner proxy on port 8881.
  POST http://arcangle:10101/api/do/kokoro-reader-start
       Start Kokoro Reader on port 9999.
  POST http://arcangle:10101/api/do/kokoro-reader-stop
       Stop Kokoro Reader on port 9999.
  POST http://arcangle:10101/api/do/llama-proxy-bounce
       Restart the llama logging proxy.
  POST http://arcangle:10101/api/do/ttscleaner-bounce
       Restart TTS Cleaner.
  POST http://arcangle:10101/api/do/kokoro-reader-bounce
       Restart Kokoro Reader.
  POST http://arcangle:10101/api/bounce/llama-proxy
       Alias: restart the llama logging proxy.
  POST http://arcangle:10101/api/bounce/ttscleaner
       Alias: restart TTS Cleaner.
  POST http://arcangle:10101/api/bounce/kokoro-reader
       Alias: restart Kokoro Reader.
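Each sidecar appears under the sidecars key of the state JSON with an http_ok health flag, and each has a /api/bounce/<name> alias. The state keys use underscores (llama_proxy, kokoro_reader), while the bounce paths use hyphens (llama-proxy, kokoro-reader), so a watchdog has to translate between them. The hypothetical helper below lists the bounce URLs for sidecars whose health probe failed. It only builds URLs and does not call them.

```python
BASE_URL = "http://arcangle:10101"


def bounce_urls_for_unhealthy(state, base=BASE_URL):
    """List /api/bounce/<name> URLs for sidecars whose health probe failed.

    State keys use underscores (llama_proxy) while the bounce aliases use
    hyphens (llama-proxy), so each key is translated before building the URL.
    """
    urls = []
    for key, svc in state.get("sidecars", {}).items():
        if not svc.get("http_ok"):
            urls.append(f"{base}/api/bounce/{key.replace('_', '-')}")
    return urls
```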

GPU Workloads:
  POST http://arcangle:10101/api/do/gpu-workload-launch-comfy
       Start ComfyUI in place without changing other services.
  POST http://arcangle:10101/api/do/gpu-workload-start-comfy
       Make room and start ComfyUI.
  POST http://arcangle:10101/api/do/gpu-workload-stop-comfy
       Stop ComfyUI.
  POST http://arcangle:10101/api/do/gpu-workload-bounce-comfy
       Restart ComfyUI in place.
  POST http://arcangle:10101/api/bounce/gpu-workload/comfy
       Alias: restart ComfyUI in place.
  POST http://arcangle:10101/api/do/comfy-start
       Alias: start ComfyUI in place.
  POST http://arcangle:10101/api/do/comfy-start-managed
       Alias: make room and start ComfyUI.
  POST http://arcangle:10101/api/do/comfy-free
       Ask ComfyUI to unload models and free memory.
  POST http://arcangle:10101/api/do/comfy-stop
       Alias: stop ComfyUI.
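The launch and start variants differ only in whether ArcControl makes room first. Launch starts ComfyUI in place, and start makes room before starting it. The sketch below builds the /api/do/gpu-workload-<verb>-<slug> path for any workload registered under gpu_workloads.items in the state JSON. Only the comfy slug is documented here, so applying the same pattern to other slugs is an assumption. The helper name is hypothetical.

```python
BASE_URL = "http://arcangle:10101"

# Verbs documented for the comfy workload above.
WORKLOAD_VERBS = ("launch", "start", "stop", "bounce")


def workload_action_url(state, slug, verb, base=BASE_URL):
    """Build /api/do/gpu-workload-<verb>-<slug> for a registered workload.

    ASSUMPTION: the path pattern documented for `comfy` applies to any slug
    listed under gpu_workloads.items in the state JSON.
    """
    items = state.get("gpu_workloads", {}).get("items", [])
    slugs = {item["slug"] for item in items}
    if slug not in slugs:
        raise KeyError(f"workload {slug!r} is not registered")
    if verb not in WORKLOAD_VERBS:
        raise ValueError(f"unknown verb {verb!r}")
    return f"{base}/api/do/gpu-workload-{verb}-{slug}"
```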

Workflows:
  POST http://arcangle:10101/api/do/profile-gpu-inference
       Apply GPU Inference profile.
  POST http://arcangle:10101/api/do/profile-gpu-turbo
       Apply experimental GPU Turbo profile.
  POST http://arcangle:10101/api/do/profile-comfy-work
       Apply Comfy Work profile.
  POST http://arcangle:10101/api/do/profile-free-vram
       Apply Free VRAM profile.
  POST http://arcangle:10101/api/do/retask-gpu
       Switch inference and dictation to CPU and start ComfyUI.
  POST http://arcangle:10101/api/do/restore-gpu
       Unload ComfyUI models, restore GPU inference and voice.
  POST http://arcangle:10101/api/do/free-vram
       Switch inference and dictation to CPU and unload ComfyUI models without stopping ComfyUI.
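The Workflows actions change several services at once, and the state JSON carries a top-level transitioning flag. The sketch below triggers a workflow and then polls until that flag clears. post and get_state are caller-supplied HTTP helpers, and run_workflow is a hypothetical name. The code assumes ArcControl sets transitioning before the POST returns. If it does not, the first poll can report completion too early.

```python
import time

BASE_URL = "http://arcangle:10101"

# Workflow actions documented in the Workflows section.
WORKFLOWS = {
    "profile-gpu-inference", "profile-gpu-turbo", "profile-comfy-work",
    "profile-free-vram", "retask-gpu", "restore-gpu", "free-vram",
}


def run_workflow(name, post, get_state, timeout_s=300.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """POST /api/do/<name>, then poll state until `transitioning` is false.

    ASSUMPTION: ArcControl sets `transitioning` before the POST returns;
    if it does not, the first poll may report an idle state too early.
    """
    if name not in WORKFLOWS:
        raise ValueError(f"unknown workflow {name!r}")
    post(f"{BASE_URL}/api/do/{name}")
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if not state.get("transitioning"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"{name} still transitioning after {timeout_s}s")
        sleep(interval_s)
```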

Model Cache:
  GET  http://arcangle:10101/api/model-cache
       Current inference/voice model cache retouch settings and runtime state.
  POST http://arcangle:10101/api/model-cache
       Set enabled and interval_seconds for warm_models.sh retouching.
  POST http://arcangle:10101/api/do/model-cache-warm
       Run warm_models.sh immediately.
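POST /api/model-cache takes enabled and interval_seconds, but the reference does not document their encoding. This sketch assumes a form-encoded body with "true" or "false" for enabled and an integer number of seconds. The helper name model_cache_request and the encoding are assumptions.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://arcangle:10101"


def model_cache_request(enabled, interval_seconds, base=BASE_URL):
    """Build a POST to /api/model-cache configuring warm_models.sh retouching.

    ASSUMPTION: fields travel as a form-encoded body with "true"/"false"
    for `enabled`; the reference names the fields but not their encoding.
    """
    interval_seconds = int(interval_seconds)
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be a positive number of seconds")
    body = urllib.parse.urlencode({
        "enabled": "true" if enabled else "false",
        "interval_seconds": interval_seconds,
    }).encode("ascii")
    return urllib.request.Request(base + "/api/model-cache", data=body, method="POST")
```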

Logs:
  GET  http://arcangle:10101/logs?service=llama&lines=240
       Inference server log.
  GET  http://arcangle:10101/logs?service=voice&lines=120
       Voice orchestration log.
  GET  http://arcangle:10101/logs?service=kokoro&lines=120
       Kokoro log.
  GET  http://arcangle:10101/logs?service=kokoro_reader&lines=120
       Kokoro Reader log.
  GET  http://arcangle:10101/logs?service=comfy&lines=120
       ComfyUI log.
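The /logs endpoint takes a service name and a line count. The hypothetical helper below accepts only the five service names listed above (the server may accept others) and URL-encodes the query.

```python
import urllib.parse

BASE_URL = "http://arcangle:10101"

# Service names listed in the Logs section.
LOG_SERVICES = ("llama", "voice", "kokoro", "kokoro_reader", "comfy")


def logs_url(service, lines=120, base=BASE_URL):
    """Build a /logs URL returning the last `lines` lines of one service log."""
    if service not in LOG_SERVICES:
        raise ValueError(f"unknown log service {service!r}")
    if lines < 1:
        raise ValueError("lines must be at least 1")
    return f"{base}/logs?" + urllib.parse.urlencode({"service": service, "lines": lines})
```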

Examples:
  curl -X POST http://arcangle:10101/api/do/llama-gpu-turbo
  curl -X POST http://arcangle:10101/api/do/llama-cpu
  curl -X POST http://arcangle:10101/api/do/llama-stop
  curl -X POST http://arcangle:10101/api/do/voice-cpu
  curl -X POST http://arcangle:10101/api/do/kokoro-stop
  curl http://arcangle:10101/api/status
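Scripts that prefer Python over curl can send the same action calls with urllib. This is a minimal sketch; post_action is a hypothetical name. The response body is returned as text because its format is not documented here.

```python
import urllib.request

BASE_URL = "http://arcangle:10101"


def post_action(name, base=BASE_URL, opener=urllib.request.urlopen, timeout=30):
    """Python equivalent of `curl -X POST http://arcangle:10101/api/do/<name>`.

    Returns the response body as text; its format is not documented here.
    """
    req = urllib.request.Request(f"{base}/api/do/{name}", data=b"", method="POST")
    with opener(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "replace")


# Usage (needs a reachable host): post_action("llama-gpu-turbo")
```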
Cache Inference/Voice Models
model cache disabled
GPU owner: llama.cpp + Whisper
Inference route: gpu / running
Voice route: gpu / running
TTS route: speaches / running

llama gpu health check passed

0 operation(s)

GPU Load / VRAM Pressure - 5 Min: no samples (series: VRAM pressure, GPU load, power)

GPU Load / VRAM Pressure - 1 Hour: no samples (series: VRAM pressure, GPU load, power)

GPU Processes

PID  Name  VRAM (MB / % of total)
1157395 Whisper dictation 1030 MB / 3.16%
uriel    1157395  0.5  4.3 100794656 5751292 ?   SLsl Jun30   4:29 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app
1434466 llama.cpp 27334 MB / 83.83%
uriel    1434466  2.5 15.7 99724352 20720796 ?   Sl   Jun30  18:31 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {"preserve_thinking":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20
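The percentages in the GPU Processes table are each process's VRAM divided by the card's total of 32607 MB (gpu_capacity_mb in the State File). That gives 1030 / 32607 ≈ 3.16% and 27334 / 32607 ≈ 83.83%. Together the two processes hold 28364 MB, slightly below the card-level memory_used_mb of 28393. The sketch below reproduces the table rows from the nvidia.processes.items entries; the helper names are hypothetical.

```python
def vram_percent(used_mb, capacity_mb):
    """Percent of total VRAM in use, rounded to two decimals like the dashboard."""
    return round(used_mb / capacity_mb * 100, 2)


def process_rows(items):
    """Format nvidia process entries from the state JSON as 'PID  Name  MB / %' rows."""
    return [
        f"{p['pid']}  {p['friendly_name']}  {p['used_gpu_memory_mb']:.0f} MB / "
        f"{vram_percent(p['used_gpu_memory_mb'], p['gpu_capacity_mb']):.2f}%"
        for p in items
    ]
```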

Inference

Status: running
Mode: gpu
HTTP: ok
PID: 1156162
Health: 19091
Start with ArcControl

Dictation

Status: running
Mode: gpu
Port: 8000
Start with ArcControl

TTS

Status: stopped
Mode: stopped
PID: None
Health: 8880
Start with ArcControl

Llama Proxy

Status: running
Port: 19090
Backend: http://arcangle:19091
PID: 1650305
Start with ArcControl

TTS Cleaner

Status: running
Port: 8881
LLM: http://arcangle:19090
PID: 3999358
Start with ArcControl

Kokoro Reader

Status: running
Port: 9999
TTS API: http://arcangle:8881
PID: 14846
Start with ArcControl

ComfyUI

Status: stopped
Mode: stopped
PID:
URL: 8188
Start with ArcControl

Inference Log

[35177] 705.35.277.467 I reasoning-budget: forced sequence complete, done
[35177] 705.35.277.508 I slot launch_slot_: id  0 | task 34710 | processing task, is_child = 0
[35177] 705.35.277.535 I slot update_slots: id  0 | task 34710 | Checking checkpoint with [73487, 73487] against 31267...
[35177] 705.35.277.537 I slot update_slots: id  0 | task 34710 | Checking checkpoint with [73165, 73165] against 31267...
[35177] 705.35.277.537 I slot update_slots: id  0 | task 34710 | Checking checkpoint with [72702, 72702] against 31267...
[35177] 705.35.277.537 I slot update_slots: id  0 | task 34710 | Checking checkpoint with [72148, 72148] against 31267...
[35177] 705.35.277.538 W slot update_slots: id  0 | task 34710 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
[35177] 705.35.277.539 W slot update_slots: id  0 | task 34710 | erased invalidated context checkpoint (pos_min = 72148, pos_max = 72148, n_tokens = 72149, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.285.995 W slot update_slots: id  0 | task 34710 | erased invalidated context checkpoint (pos_min = 72702, pos_max = 72702, n_tokens = 72703, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.295.106 W slot update_slots: id  0 | task 34710 | erased invalidated context checkpoint (pos_min = 73165, pos_max = 73165, n_tokens = 73166, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.303.179 W slot update_slots: id  0 | task 34710 | erased invalidated context checkpoint (pos_min = 73487, pos_max = 73487, n_tokens = 73488, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.38.673.115 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  12288, progress = 0.33, t =   3.40 s / 3618.81 tokens per second
[35177] 705.39.275.287 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  14336, progress = 0.39, t =   4.00 s / 3586.00 tokens per second
[35177] 705.39.887.256 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  16384, progress = 0.44, t =   4.61 s / 3554.22 tokens per second
[35177] 705.40.509.824 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  18432, progress = 0.50, t =   5.23 s / 3522.73 tokens per second
[35177] 705.41.142.600 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  20480, progress = 0.55, t =   5.87 s / 3491.85 tokens per second
[35177] 705.41.787.891 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  22528, progress = 0.61, t =   6.51 s / 3460.33 tokens per second
[35177] 705.42.444.770 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  24576, progress = 0.66, t =   7.17 s / 3428.93 tokens per second
[35177] 705.43.111.507 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  26624, progress = 0.72, t =   7.83 s / 3398.53 tokens per second
[35177] 705.43.796.645 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  28672, progress = 0.77, t =   8.52 s / 3365.60 tokens per second
[35177] 705.44.496.580 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  30720, progress = 0.83, t =   9.22 s / 3332.23 tokens per second
[35177] 705.45.213.621 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  32768, progress = 0.89, t =   9.94 s / 3297.87 tokens per second
[35177] 705.45.953.201 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  34816, progress = 0.94, t =  10.68 s / 3261.24 tokens per second
[35177] 705.46.647.019 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  36499, progress = 0.99, t =  11.37 s / 3210.26 tokens per second
[35177] 705.46.780.528 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  36975, progress = 1.00, t =  11.50 s / 3214.38 tokens per second
[35177] 705.46.859.339 I slot create_check: id  0 | task 34710 | created context checkpoint 1 of 32 (pos_min = 36974, pos_max = 36974, n_tokens = 36975, size = 149.626 MiB)
[35177] 705.46.879.812 I slot print_timing: id  0 | task 34710 | prompt processing, n_tokens =  37011, progress = 1.00, t =  11.60 s / 3189.97 tokens per second
[35177] 705.46.992.459 I slot print_timing: id  0 | task 34710 | prompt eval time =   11633.66 ms / 37015 tokens (    0.31 ms per token,  3181.72 tokens per second)
[35177] 705.46.992.461 I slot print_timing: id  0 | task 34710 |        eval time =      81.28 ms /     6 tokens (   13.55 ms per token,    73.82 tokens per second)
[35177] 705.46.992.462 I slot print_timing: id  0 | task 34710 |       total time =   11714.93 ms / 37021 tokens
[35177] 705.46.992.463 I slot print_timing: id  0 | task 34710 |    graphs reused =      33350
[35177] 705.46.993.570 I slot      release: id  0 | task 34710 | stop processing: n_tokens = 37020, truncated = 0
[35177] 705.46.993.626 I srv  update_slots: all slots are idle
733.48.658.478 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 709.03.467.235 I srv  params_from_: Chat format: peg-native
[35177] 709.03.483.111 I slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 330765934925
[35177] 709.03.483.114 I srv  get_availabl: updating prompt cache
[35177] 709.03.483.980 W srv   prompt_save:  - saving prompt with length 37020, total state size = 1379.512 MiB (draft: 0.000 MiB)
[35177] 709.04.151.069 I srv          load:  - looking for better prompt, base f_keep = 0.001, sim = 0.002
[35177] 709.04.151.084 I srv          load:  - found better prompt with f_keep = 0.936, sim = 0.948
[35177] 709.04.238.903 I srv        update:  - cache state: 11 prompts, 17457.405 MiB (limits: 32768.000 MiB, 262144 tokens, 648178 est)
[35177] 709.04.238.907 I srv        update:    - prompt 0x7090180595d0:      97 tokens, checkpoints:  1,   302.475 MiB
[35177] 709.04.238.908 I srv        update:    - prompt 0x709084b75d30:      37 tokens, checkpoints:  1,   300.482 MiB
[35177] 709.04.238.908 I srv        update:    - prompt 0x70908cb79950:    1607 tokens, checkpoints:  2,   502.267 MiB
[35177] 709.04.238.909 I srv        update:    - prompt 0x70908c724f30:   66195 tokens, checkpoints:  7,  3396.153 MiB
[35177] 709.04.238.909 I srv        update:    - prompt 0x709084189000:   67118 tokens, checkpoints:  1,  2529.060 MiB
[35177] 709.04.238.909 I srv        update:    - prompt 0x708b9872f0e0:   70834 tokens, checkpoints: 10,  3999.149 MiB
[35177] 709.04.238.910 I srv        update:    - prompt 0x5fc9b8f35180:    1681 tokens, checkpoints:  2,   504.725 MiB
[35177] 709.04.238.910 I srv        update:    - prompt 0x5fc9b8bd9250:    2250 tokens, checkpoints:  2,   523.629 MiB
[35177] 709.04.238.910 I srv        update:    - prompt 0x7090b45f7290:   73909 tokens, checkpoints:  1,  2754.672 MiB
[35177] 709.04.238.911 I srv        update:    - prompt 0x7090844cce60:   24574 tokens, checkpoints:  1,  1115.655 MiB
[35177] 709.04.238.911 I srv        update:    - prompt 0x70906c1c2a80:   37020 tokens, checkpoints:  1,  1529.138 MiB
[35177] 709.04.238.912 I srv  get_availabl: prompt cache update took 755.80 ms
[35177] 709.04.239.279 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 709.04.239.391 I slot launch_slot_: id  0 | task 34737 | processing task, is_child = 0
[35177] 709.04.239.410 I slot update_slots: id  0 | task 34737 | Checking checkpoint with [14094, 14094] against 14149...
[35177] 709.04.247.307 W slot update_slots: id  0 | task 34737 | restored context checkpoint (pos_min = 14094, pos_max = 14094, n_tokens = 14095, n_past = 14095, size = 149.626 MiB)
[35177] 709.04.513.858 I slot create_check: id  0 | task 34737 | created context checkpoint 2 of 32 (pos_min = 14911, pos_max = 14911, n_tokens = 14912, size = 149.626 MiB)
[35177] 709.05.198.645 I reasoning-budget: deactivated (natural end)
[35177] 709.05.657.479 I slot print_timing: id  0 | task 34737 | prompt eval time =     315.57 ms /   836 tokens (    0.38 ms per token,  2649.18 tokens per second)
[35177] 709.05.657.482 I slot print_timing: id  0 | task 34737 |        eval time =    1102.50 ms /    80 tokens (   13.78 ms per token,    72.56 tokens per second)
[35177] 709.05.657.482 I slot print_timing: id  0 | task 34737 |       total time =    1418.07 ms /   916 tokens
[35177] 709.05.657.483 I slot print_timing: id  0 | task 34737 |    graphs reused =      33428
[35177] 709.05.657.969 I slot      release: id  0 | task 34737 | stop processing: n_tokens = 15010, truncated = 0
[35177] 709.05.658.160 I srv  update_slots: all slots are idle
736.45.144.996 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 711.59.952.652 I srv  params_from_: Chat format: peg-native
[35177] 711.59.971.267 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.995
[35177] 711.59.971.621 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 711.59.971.674 I slot launch_slot_: id  0 | task 34821 | processing task, is_child = 0
[35177] 711.59.971.694 I slot update_slots: id  0 | task 34821 | Checking checkpoint with [14911, 14911] against 14930...
[35177] 711.59.987.901 W slot update_slots: id  0 | task 34821 | restored context checkpoint (pos_min = 14911, pos_max = 14911, n_tokens = 14912, n_past = 14912, size = 149.626 MiB)
[35177] 712.01.453.851 I slot print_timing: id  0 | task 34821 | n_decoded =    100, tg =  72.33 t/s
[35177] 712.01.883.216 I reasoning-budget: deactivated (natural end)
[35177] 712.04.456.162 I slot print_timing: id  0 | task 34821 | n_decoded =    317, tg =  72.29 t/s
[35177] 712.07.457.883 I slot print_timing: id  0 | task 34821 | n_decoded =    534, tg =  72.29 t/s
[35177] 712.07.860.681 I slot print_timing: id  0 | task 34821 | prompt eval time =      99.63 ms /   108 tokens (    0.92 ms per token,  1083.97 tokens per second)
[35177] 712.07.860.686 I slot print_timing: id  0 | task 34821 |        eval time =    7789.36 ms /   563 tokens (   13.84 ms per token,    72.28 tokens per second)
[35177] 712.07.860.687 I slot print_timing: id  0 | task 34821 |       total time =    7888.99 ms /   671 tokens
[35177] 712.07.860.687 I slot print_timing: id  0 | task 34821 |    graphs reused =      33987
[35177] 712.07.861.191 I slot      release: id  0 | task 34821 | stop processing: n_tokens = 15582, truncated = 0
[35177] 712.07.861.246 I srv  update_slots: all slots are idle
738.23.154.152 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 713.37.960.240 I srv  params_from_: Chat format: peg-native
[35177] 713.37.977.500 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.970 (> 0.100 thold), f_keep = 0.964
[35177] 713.37.977.959 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 713.37.978.005 I slot launch_slot_: id  0 | task 35387 | processing task, is_child = 0
[35177] 713.37.978.030 I slot update_slots: id  0 | task 35387 | Checking checkpoint with [14911, 14911] against 15019...
[35177] 713.37.994.715 W slot update_slots: id  0 | task 35387 | restored context checkpoint (pos_min = 14911, pos_max = 14911, n_tokens = 14912, n_past = 14912, size = 149.626 MiB)
[35177] 713.38.185.968 I slot create_check: id  0 | task 35387 | created context checkpoint 3 of 32 (pos_min = 15453, pos_max = 15453, n_tokens = 15454, size = 149.626 MiB)
[35177] 713.38.244.987 I reasoning-budget: deactivated (natural end)
[35177] 713.39.603.598 I slot print_timing: id  0 | task 35387 | n_decoded =    100, tg =  72.80 t/s
[35177] 713.42.611.837 I slot print_timing: id  0 | task 35387 | n_decoded =    314, tg =  71.66 t/s
[35177] 713.45.622.852 I slot print_timing: id  0 | task 35387 | n_decoded =    530, tg =  71.69 t/s
[35177] 713.48.451.780 I slot print_timing: id  0 | task 35387 | prompt eval time =     251.89 ms /   572 tokens (    0.44 ms per token,  2270.82 tokens per second)
[35177] 713.48.451.783 I slot print_timing: id  0 | task 35387 |        eval time =   10221.86 ms /   732 tokens (   13.96 ms per token,    71.61 tokens per second)
[35177] 713.48.451.783 I slot print_timing: id  0 | task 35387 |       total time =   10473.75 ms /  1304 tokens
[35177] 713.48.451.784 I slot print_timing: id  0 | task 35387 |    graphs reused =      34714
[35177] 713.48.452.275 I slot      release: id  0 | task 35387 | stop processing: n_tokens = 16215, truncated = 0
[35177] 713.48.452.337 I srv  update_slots: all slots are idle
743.41.600.969 I srv  proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 718.56.411.313 I srv  params_from_: Chat format: peg-native
[35177] 718.56.432.696 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.949 (> 0.100 thold), f_keep = 0.955
[35177] 718.56.433.066 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 718.56.433.105 I slot launch_slot_: id  0 | task 36123 | processing task, is_child = 0
[35177] 718.56.433.121 I slot update_slots: id  0 | task 36123 | Checking checkpoint with [15453, 15453] against 15483...
[35177] 718.56.448.947 W slot update_slots: id  0 | task 36123 | restored context checkpoint (pos_min = 15453, pos_max = 15453, n_tokens = 15454, n_past = 15454, size = 149.626 MiB)
[35177] 718.56.714.655 I slot create_check: id  0 | task 36123 | created context checkpoint 4 of 32 (pos_min = 16215, pos_max = 16215, n_tokens = 16216, size = 149.626 MiB)
[35177] 718.58.165.864 I slot print_timing: id  0 | task 36123 | n_decoded =    100, tg =  71.97 t/s
[35177] 719.01.169.309 I slot print_timing: id  0 | task 36123 | n_decoded =    315, tg =  71.71 t/s
[35177] 719.01.524.740 I reasoning-budget: deactivated (natural end)
[35177] 719.04.169.592 I slot print_timing: id  0 | task 36123 | n_decoded =    529, tg =  71.55 t/s
[35177] 719.07.174.598 I slot print_timing: id  0 | task 36123 | n_decoded =    743, tg =  71.45 t/s
[35177] 719.10.180.773 I slot print_timing: id  0 | task 36123 | n_decoded =    958, tg =  71.47 t/s
[35177] 719.11.090.898 I slot print_timing: id  0 | task 36123 | prompt eval time =     343.25 ms /   854 tokens (    0.40 ms per token,  2487.97 tokens per second)
[35177] 719.11.090.901 I slot print_timing: id  0 | task 36123 |        eval time =   14314.53 ms /  1023 tokens (   13.99 ms per token,    71.47 tokens per second)
[35177] 719.11.090.902 I slot print_timing: id  0 | task 36123 |       total time =   14657.78 ms /  1877 tokens
[35177] 719.11.090.902 I slot print_timing: id  0 | task 36123 |    graphs reused =      35731
[35177] 719.11.091.426 I slot      release: id  0 | task 36123 | stop processing: n_tokens = 17330, truncated = 0
[35177] 719.11.091.488 I srv  update_slots: all slots are idle
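The print_timing summaries in the inference log report milliseconds and token counts, and the rate is tokens / (ms / 1000). Across the requests shown, generation runs at roughly 71.5 to 73.8 tokens per second. Prompt evaluation ranges from about 1,084 to about 3,182 tokens per second. The sketch below extracts those summary lines; the regex and helper name are hypothetical. Recomputed rates can differ in the last digit from the logged ones because the printed milliseconds are rounded.

```python
import re

# Matches llama-server "prompt eval time" / "eval time" summary lines.
TIMING_RE = re.compile(
    r"\|\s+(?P<kind>prompt eval|eval) time =\s+(?P<ms>[\d.]+) ms /\s+(?P<tokens>\d+) tokens"
)


def parse_timings(lines):
    """Yield (kind, milliseconds, tokens, tokens_per_second) from log lines."""
    for line in lines:
        m = TIMING_RE.search(line)
        if m:
            ms = float(m.group("ms"))
            tokens = int(m.group("tokens"))
            yield m.group("kind"), ms, tokens, tokens / (ms / 1000.0)
```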

Other Logs

2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:08-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:08-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:11-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:11-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /robots.txt HTTP/1.1" 404 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
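The access log excerpt is almost entirely dashboard polling of /api/status and /api/gpu-history (both the 5- and 60-minute windows), plus one 404 for /robots.txt. The hypothetical helper below tallies requests by method, path (query string dropped), and status.

```python
import re
from collections import Counter

# Matches the quoted request line and status code in ArcControl's access log.
ACCESS_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_requests(lines):
    """Count requests per (method, path without query string, status)."""
    counts = Counter()
    for line in lines:
        m = ACCESS_RE.search(line)
        if m:
            path = m.group("path").split("?", 1)[0]
            counts[(m.group("method"), path, int(m.group("status")))] += 1
    return counts
```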

State File

{
  "transitioning": false,
  "gpu_state": "inference_available",
  "message": "llama gpu health check passed",
  "inference": {
    "status": "running",
    "mode": "gpu",
    "pid": 1156162,
    "launcher_pid": null,
    "pgid": 1156162,
    "port": 19091,
    "url": "http://arcangle:19091",
    "health_url": "http://arcangle:19091/health",
    "port_listening": true,
    "http_ok": true,
    "http_probe": {
      "ok": true,
      "status": 200,
      "body": "{\"status\":\"ok\"}"
    },
    "log": "/servers/run/state/logs/llama.log",
    "model_profile": "qwen36-27b-neocoder-q4-preserve",
    "model_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
    "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf"
  },
  "speech_input": {
    "status": "running",
    "mode": "gpu",
    "port": 8000,
    "url": "http://arcangle:8000",
    "tts": {
      "service": "kokoro",
      "status": "stopped",
      "mode": "stopped",
      "pid": null,
      "launcher_pid": null,
      "pgid": 1776540,
      "port": 8880,
      "url": "http://arcangle:8880",
      "health_url": "http://arcangle:8880/health",
      "port_listening": false,
      "http_ok": false,
      "http_probe": {
        "ok": false,
        "error": "port closed"
      },
      "log": "/servers/run/state/logs/kokoro.log"
    },
    "note": "Port 8000 remains the dictation endpoint; GPU dictation is served by /servers/speaches.",
    "log": "/servers/run/state/logs/voice.log",
    "containers": {
      "vad-shim": false,
      "speaches": true,
      "whisper-stt": false
    }
  },
  "text_to_speech": {
    "service": "kokoro",
    "status": "stopped",
    "mode": "stopped",
    "pid": null,
    "launcher_pid": null,
    "pgid": 1776540,
    "port": 8880,
    "url": "http://arcangle:8880",
    "health_url": "http://arcangle:8880/health",
    "port_listening": false,
    "http_ok": false,
    "http_probe": {
      "ok": false,
      "error": "port closed"
    },
    "log": "/servers/run/state/logs/kokoro.log"
  },
  "gpu_workload": {
    "service": null,
    "status": "stopped",
    "mode": "stopped",
    "pid": null,
    "pgid": null,
    "port": 8188,
    "url": "http://arcangle:8188",
    "log": "/servers/run/state/logs/comfy.log"
  },
  "gpu_workloads": {
    "file": "/servers/run/state/gpu-workloads.json",
    "items": [
      {
        "label": "ComfyUI",
        "slug": "comfy",
        "description": "ComfyUI image/video workflow",
        "full_use": true,
        "status": "stopped",
        "mode": "stopped",
        "pid": null,
        "pgid": null,
        "port": 8188,
        "url": "http://arcangle:8188",
        "log": "/servers/run/state/logs/comfy.log"
      }
    ]
  },
  "sidecars": {
    "llama_proxy": {
      "service": "llama_proxy",
      "label": "Llama Proxy",
      "status": "running",
      "mode": "proxy",
      "pid": 1650305,
      "pgid": 1650305,
      "port": 19090,
      "url": "http://arcangle:19090",
      "health_url": "http://arcangle:19090/health",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "{\"status\":\"ok\"}"
      },
      "backend": "http://arcangle:19091",
      "log": "/servers/run/state/logs/llama-proxy.log"
    },
    "ttscleaner": {
      "service": "ttscleaner",
      "label": "TTS Cleaner",
      "status": "running",
      "mode": "proxy",
      "pid": 3999358,
      "port": 8881,
      "url": "http://arcangle:8881",
      "health_url": "http://arcangle:8881/health",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "{\"message\":\"OK\"}"
      },
      "backend": "http://arcangle:8000",
      "llm_backend": "http://arcangle:19090",
      "log": "/servers/ttscleaner/proxy.log"
    },
    "kokoro_reader": {
      "service": "kokoro_reader",
      "label": "Kokoro Reader",
      "status": "running",
      "mode": "reader",
      "pid": 14846,
      "port": 9999,
      "url": "http://arcangle:9999",
      "health_url": "http://arcangle:9999",
      "port_listening": true,
      "http_ok": true,
      "http_probe": {
        "ok": true,
        "status": 200,
        "body": "<!doctype html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n    <meta name=\"theme-color\" content=\"#101820\">\n    <title>Kokoro Reader</title>\n    <link rel=\"manifest\" href=\"/s"
      },
      "backend": "http://arcangle:8881",
      "log": "/servers/kokoro-reader/kokoro-reader.log"
    }
  },
  "nvidia": {
    "available": true,
    "gpus": [
      {
        "name": "NVIDIA GeForce RTX 5090",
        "memory_used_mb": "28393",
        "memory_total_mb": "32607",
        "gpu_util_percent": "0",
        "power_draw_w": "11.91",
        "power_limit_w": "575.00",
        "temperature_c": "40",
        "memory_used_percent": 87.08,
        "power_percent": 2.07,
        "util_percent": 0.0
      }
    ],
    "processes": {
      "available": true,
      "items": [
        {
          "pid": 1157395,
          "process_name": "/home/ubuntu/speaches/.venv/bin/python",
          "friendly_name": "Whisper dictation",
          "used_gpu_memory_mb": 1030.0,
          "gpu_capacity_mb": 32607.0,
          "used_gpu_memory_percent": 3.16,
          "ps_aux": "uriel    1157395  0.5  4.3 100794656 5751292 ?   SLsl Jun30   4:29 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app"
        },
        {
          "pid": 1434466,
          "process_name": "/servers/llcpp6/build/bin/llama-server",
          "friendly_name": "llama.cpp",
          "used_gpu_memory_mb": 27334.0,
          "gpu_capacity_mb": 32607.0,
          "used_gpu_memory_percent": 83.83,
          "ps_aux": "uriel    1434466  2.5 15.7 99724352 20720796 ?   Sl   Jun30  18:31 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {\"preserve_thinking\":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20"
        }
      ]
    }
  },
  "routing": {
    "gpu_owner": "llama.cpp + Whisper",
    "inference_route": "gpu / running",
    "voice_route": "gpu / running",
    "tts_route": "speaches / running"
  },
  "progress": {
    "active": false,
    "percent": 100,
    "label": "llama gpu health check passed",
    "operations": []
  },
  "model_cache": {
    "comfy_cpu_limit_percent": 80,
    "enabled": false,
    "gpu_util_limit_percent": 40,
    "guard": {
      "blocked": false,
      "blocked_reasons": [],
      "comfy_cpu_percent": 0.0,
      "gpu_util_percent": 0.0
    },
    "guard_window_seconds": 120,
    "interval_seconds": 120,
    "last_blocked_at": "2026-07-01T03:26:05-04:00",
    "last_exit_code": 0,
    "last_finished_at": "2026-06-12T19:04:02-04:00",
    "last_run_at": "2026-06-12T19:04:02-04:00",
    "last_started_at": "2026-06-12T19:04:01-04:00",
    "lists": {
      "inference": "/servers/run/cache/inference-active.txt",
      "stt": "/servers/run/cache/stt.txt",
      "tts": "/servers/run/cache/tts.txt"
    },
    "message": "model cache disabled",
    "next_run_due_at": "2026-06-12T19:06:02-04:00",
    "running": false,
    "script": "/servers/run/warm_models.sh",
    "settings": {
      "enabled": false,
      "interval_seconds": 120
    },
    "updated_at": "2026-07-01T03:37:26-04:00"
  },
  "startup_services": {
    "file": "/servers/run/state/startup-services.json",
    "services": [
      {
        "key": "inference",
        "label": "Inference",
        "description": "Start selected llama GPU model when ArcControl starts.",
        "action": "llama-gpu",
        "enabled": true
      },
      {
        "key": "speech_input",
        "label": "Dictation",
        "description": "Start GPU dictation/Speaches when ArcControl starts.",
        "action": "voice-gpu",
        "enabled": false
      },
      {
        "key": "text_to_speech",
        "label": "Standalone Kokoro",
        "description": "Start standalone Kokoro GPU service when ArcControl starts.",
        "action": "kokoro-gpu",
        "enabled": false
      },
      {
        "key": "sidecars.llama_proxy",
        "label": "Llama Proxy",
        "description": "Start llama logging proxy when ArcControl starts.",
        "action": "llama-proxy-start",
        "enabled": true
      },
      {
        "key": "sidecars.ttscleaner",
        "label": "TTS Cleaner",
        "description": "Start TTS Cleaner when ArcControl starts.",
        "action": "ttscleaner-start",
        "enabled": true
      },
      {
        "key": "sidecars.kokoro_reader",
        "label": "Kokoro Reader",
        "description": "Start Kokoro Reader when ArcControl starts.",
        "action": "kokoro-reader-start",
        "enabled": true
      },
      {
        "key": "gpu_workload.comfy",
        "label": "ComfyUI",
        "description": "Start ComfyUI when ArcControl starts.",
        "action": "gpu-workload-launch-comfy",
        "enabled": false
      }
    ],
    "updated_at": "2026-06-30T15:01:58-04:00"
  },
  "model_profiles": {
    "active": "qwen36-27b-neocoder-q4-preserve",
    "active_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
    "default": "qwen36-27b-heretic-preserve",
    "default_label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
    "items": [
      {
        "slug": "gemma4-12b-q4-heretic",
        "label": "Gemma 4 12B Q4 Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-12b-q4-heretic",
        "model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
        "size_gb": 6.9,
        "size_label": "6.9 GB",
        "cache_files": [
          "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
          "/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-12b-q4-heretic.json"
      },
      {
        "slug": "gemma4-12b-q6-heretic",
        "label": "Gemma 4 12B Q6 Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-12b-q6-heretic",
        "model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
        "size_gb": 9.1,
        "size_label": "9.1 GB",
        "cache_files": [
          "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
          "/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-12b-q6-heretic.json"
      },
      {
        "slug": "gemma4-26b-a4b-heretic",
        "label": "gemma4-26B-A4B-heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/gemma4-26b-a4b-heretic",
        "model_path": "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
        "size_gb": 15.6,
        "size_label": "15.6 GB",
        "cache_files": [
          "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
          "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.mmproj-f16.gguf"
        ],
        "path": "/servers/run/model-profiles/gemma4-26b-a4b-heretic.json"
      },
      {
        "slug": "qwen36-27b-heretic-preserve",
        "label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": true,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-heretic-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
        "size_gb": 16.0,
        "size_label": "16.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-heretic-preserve.json"
      },
      {
        "slug": "qwen36-27b-heretic",
        "label": "Qwen 3.6 27B Heretic",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-heretic",
        "model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
        "size_gb": 16.0,
        "size_label": "16.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-heretic.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-nl-preserve",
        "label": "Qwen 3.6 27B NeoCode IQ4_NL (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
        "size_gb": 15.0,
        "size_label": "15.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-nl",
        "label": "Qwen 3.6 27B NeoCode IQ4_NL",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
        "size_gb": 15.0,
        "size_label": "15.0 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-xs-preserve",
        "label": "Qwen 3.6 27B NeoCode IQ4_XS (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
        "size_gb": 14.3,
        "size_label": "14.3 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-iq4-xs",
        "label": "Qwen 3.6 27B NeoCode IQ4_XS",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
        "size_gb": 14.3,
        "size_label": "14.3 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q4-preserve",
        "label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
        "description": "",
        "active": true,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q4-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
        "size_gb": 15.7,
        "size_label": "15.7 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q4",
        "label": "Qwen 3.6 27B NeoCode Q4",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q4",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
        "size_gb": 15.7,
        "size_label": "15.7 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q6-preserve",
        "label": "Qwen 3.6 27B NeoCode Q6 (Preserve Thinking)",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q6-preserve",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
        "size_gb": 20.9,
        "size_label": "20.9 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6-preserve.json"
      },
      {
        "slug": "qwen36-27b-neocoder-q6",
        "label": "Qwen 3.6 27B NeoCode Q6",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-27b-neocoder-q6",
        "model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
        "size_gb": 20.9,
        "size_label": "20.9 GB",
        "cache_files": [
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
          "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6.json"
      },
      {
        "slug": "qwen36-unc",
        "label": "Qwen3.6 35B Uncensored",
        "description": "",
        "active": false,
        "default": false,
        "build_key": "llcpp6",
        "openclaw_model_ref": "arc/qwen36-unc",
        "model_path": "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
        "size_gb": 19.7,
        "size_label": "19.7 GB",
        "cache_files": [
          "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
          "/models/Q3.6/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf"
        ],
        "path": "/servers/run/model-profiles/qwen36-unc.json"
      }
    ],
    "dir": "/servers/run/model-profiles",
    "state_file": "/servers/run/state/active-model-profile.json",
    "default_state_file": "/servers/run/state/default-model-profile.json",
    "cache_file": "/servers/run/cache/inference-active.txt"
  },
  "openclaw_sync": {
    "status": "synchronized",
    "expected_model": "qwen36-27b-neocoder-q4-preserve",
    "loaded_model": "qwen36-27b-neocoder-q4-preserve",
    "loaded_models": [
      "qwen36-27b-neocoder-q4-preserve"
    ],
    "expected_openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
    "openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
    "synchronized": true,
    "error": "",
    "sessions": {
      "stale_local": 0,
      "stale_keys": [],
      "cloud_overrides": 1,
      "checked": 42
    },
    "state_file": "/servers/run/state/openclaw-sync-state.json"
  },
  "state_file": "/servers/run/state/system-state.json",
  "updated_at": "2026-07-01T03:37:29-04:00",
  "profiles": {
    "active": "custom",
    "active_label": "Custom Profile",
    "items": [
      {
        "slug": "gpu-inference",
        "label": "GPU Inference",
        "description": "GPU llama, GPU dictation, GPU Kokoro; unload Comfy models if Comfy is running; model cache off.",
        "desired": {
          "llama": "gpu",
          "voice": "gpu",
          "kokoro": "gpu",
          "comfy": "unload_models",
          "model_cache": false
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "gpu-turbo",
        "label": "GPU Turbo",
        "description": "Experimental turboquant llama profile; GPU dictation and Kokoro; unload Comfy models if Comfy is running; model cache off.",
        "desired": {
          "llama": "gpu-turbo",
          "voice": "gpu",
          "kokoro": "gpu",
          "comfy": "unload_models",
          "model_cache": false
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "comfy-work",
        "label": "Comfy Work",
        "description": "CPU inference, CPU-lite dictation, Comfy running, model files kept warm.",
        "desired": {
          "llama": "cpu",
          "voice": "cpu-lite",
          "kokoro": "stopped",
          "comfy": "running",
          "model_cache": true
        },
        "momentary": false,
        "active": false
      },
      {
        "slug": "free-vram",
        "label": "Free VRAM",
        "description": "One-shot VRAM cleanup: CPU inference and dictation, Kokoro stopped, Comfy kept running with models unloaded.",
        "desired": {
          "llama": "cpu",
          "voice": "cpu-lite",
          "kokoro": "stopped",
          "comfy": "unload_models",
          "model_cache": true
        },
        "momentary": true,
        "active": false
      }
    ],
    "watch": {
      "enabled": false,
      "profile": "gpu-inference",
      "updated_at": "2026-06-03T18:03:26-04:00"
    },
    "file": "/servers/run/state/profiles.json"
  }
}
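A minimal sketch, in Python, of reading this state programmatically. It assumes the JSON object above is the body returned by `GET http://arcangle:10101/api/status`, and it reads only fields that appear in it. `fetch_status` and `summarize` are hypothetical helper names, not part of ArcControl. One detail worth handling: the `nvidia.gpus` readings (`memory_used_mb`, `memory_total_mb`, and so on) are JSON strings, while the derived percentages are numbers, so the sketch converts before doing arithmetic.

```python
import json
import urllib.request
from typing import Any

# Status endpoint from the API reference; assumed to return the JSON shown above.
STATUS_URL = "http://arcangle:10101/api/status"


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> dict[str, Any]:
    """Fetch the dashboard state over HTTP (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(state: dict[str, Any]) -> dict[str, Any]:
    """Condense a few fields of the ArcControl state into a flat summary (hypothetical helper)."""
    sync = state.get("openclaw_sync", {})
    gpus = state.get("nvidia", {}).get("gpus", [])
    # nvidia readings arrive as strings, so convert them before dividing.
    vram = [
        round(100.0 * float(g["memory_used_mb"]) / float(g["memory_total_mb"]), 2)
        for g in gpus
    ]
    # Keys of card services flagged to start with ArcControl.
    enabled = [
        s["key"]
        for s in state.get("startup_services", {}).get("services", [])
        if s.get("enabled")
    ]
    return {
        "active_model": state.get("model_profiles", {}).get("active"),
        # Treat the model as in sync only if the flag is set AND the names agree.
        "in_sync": bool(sync.get("synchronized"))
        and sync.get("expected_model") == sync.get("loaded_model"),
        "vram_used_percent": vram,
        "startup_enabled": enabled,
    }


if __name__ == "__main__":
    # Excerpt of the state shown above, trimmed to the fields summarize() reads.
    sample = {
        "model_profiles": {"active": "qwen36-27b-neocoder-q4-preserve"},
        "openclaw_sync": {
            "synchronized": True,
            "expected_model": "qwen36-27b-neocoder-q4-preserve",
            "loaded_model": "qwen36-27b-neocoder-q4-preserve",
        },
        "nvidia": {"gpus": [{"memory_used_mb": "28393", "memory_total_mb": "32607"}]},
        "startup_services": {"services": [
            {"key": "inference", "enabled": True},
            {"key": "speech_input", "enabled": False},
            {"key": "sidecars.llama_proxy", "enabled": True},
        ]},
    }
    print(summarize(sample))
```

On this excerpt the VRAM figure comes out to 87.08, the same as the `memory_used_percent` value in the state, which suggests that field is derived the same way (used over total). Requiring both `synchronized` and matching model names is a deliberately conservative reading of `openclaw_sync`; use whichever check fits your automation.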