Workflows
Profiles: Custom Profile
Model Profile: Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)
Default: Qwen 3.6 27B Heretic (Preserve Thinking)
Loaded: qwen36-27b-neocoder-q4-preserve
OpenClaw: arc/qwen36-27b-neocoder-q4-preserve
Synchronization: synchronized
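The OpenClaw value above appears to be the loaded alias with an arc/ provider prefix. The Python sketch below shows how a script could make the same synchronization check. The arc prefix convention is an assumption inferred from these two values, and is_synchronized is a hypothetical helper, not part of ArcControl.

```python
# Hypothetical helper: compares the alias ArcControl has loaded with the
# model id OpenClaw is configured to use.
# ASSUMPTION: OpenClaw ids look like "<provider>/<alias>" with provider "arc",
# inferred from the dashboard values, not from documentation.
OPENCLAW_PROVIDER = "arc"


def is_synchronized(loaded_alias: str, openclaw_model: str) -> bool:
    """Return True when openclaw_model names loaded_alias under the arc provider."""
    provider, sep, alias = openclaw_model.partition("/")
    return sep == "/" and provider == OPENCLAW_PROVIDER and alias == loaded_alias


print(is_synchronized(
    "qwen36-27b-neocoder-q4-preserve",
    "arc/qwen36-27b-neocoder-q4-preserve",
))  # True
```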
API Reference
Base URL: http://arcangle:10101
Action endpoints accept GET or POST; POST is preferred for automation.
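A minimal Python sketch of the POST style recommended above, using only the standard library. It assumes action endpoints need no request body (as in the curl examples on this page) and treats the response as opaque text, since its format is not documented here. action_request and run_action are hypothetical names.

```python
from urllib.request import Request, urlopen

BASE_URL = "http://arcangle:10101"


def action_request(action: str) -> Request:
    """Build a body-less POST for /api/do/<action>, like curl -X POST."""
    return Request(f"{BASE_URL}/api/do/{action}", method="POST")


def run_action(action: str, timeout: float = 60.0) -> tuple[int, str]:
    """Send the action and return (HTTP status, response text).

    The 60 s timeout is an arbitrary choice; mode switches that reload a
    model may take longer. urlopen raises HTTPError on non-2xx responses.
    """
    with urlopen(action_request(action), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


# Example (needs network access to arcangle):
# status, body = run_action("llama-cpu")
```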
Status:
GET http://arcangle:10101/api/status
Current dashboard state.
GET http://arcangle:10101/state.json
Alias for current dashboard state.
GET http://arcangle:10101/api/gpu-history?minutes=60
GPU telemetry history.
GET http://arcangle:10101/api/profiles
Available desired-state profiles and active match.
GET http://arcangle:10101/api/model-profiles
Available llama model profiles and selected model.
GET http://arcangle:10101/api/openclaw-sync
ArcControl/OpenClaw model synchronization state.
GET http://arcangle:10101/api/gpu-workloads
Registered full-use GPU workloads.
GET http://arcangle:10101/api/startup-services
Card services and whether each is set to start with ArcControl.
POST http://arcangle:10101/api/startup-services
Set whether a card service starts with ArcControl; pass key (for example inference or sidecars.llama_proxy) and enabled (true or false).
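The status endpoints return the JSON document reproduced under State File on this page. A sketch that fetches /api/status and condenses it, assuming the field names shown in that state file stay stable; fetch_status and summarize are hypothetical helpers.

```python
import json
from urllib.request import urlopen

STATUS_URL = "http://arcangle:10101/api/status"


def fetch_status(timeout: float = 10.0) -> dict:
    """GET the current dashboard state as a dict."""
    with urlopen(STATUS_URL, timeout=timeout) as resp:
        return json.load(resp)


def summarize(state: dict) -> dict:
    """Condense a status payload to a few top-level fields."""
    def route(key: str) -> str:
        svc = state.get(key, {})  # missing sections render as "? / ?"
        return f"{svc.get('mode', '?')} / {svc.get('status', '?')}"

    return {
        "gpu_state": state.get("gpu_state"),
        "transitioning": state.get("transitioning", False),
        "inference": route("inference"),
        "dictation": route("speech_input"),
        "tts": route("text_to_speech"),
    }


sample = {  # trimmed from the State File on this page
    "transitioning": False,
    "gpu_state": "inference_available",
    "inference": {"status": "running", "mode": "gpu"},
    "speech_input": {"status": "running", "mode": "gpu"},
    "text_to_speech": {"status": "stopped", "mode": "stopped"},
}
print(summarize(sample))
```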
Inference:
POST http://arcangle:10101/api/model-profile
Select a llama model profile; pass slug and an optional restart parameter.
POST http://arcangle:10101/api/model-profile-default
Set the default llama model profile used when active state is missing.
POST http://arcangle:10101/api/openclaw-sync
Retry OpenClaw sync for the currently loaded ArcControl model.
POST http://arcangle:10101/api/do/llama-gpu
Switch llama.cpp to regular GPU mode.
POST http://arcangle:10101/api/do/llama-gpu-turbo
Switch llama.cpp to GPU turboquant mode.
POST http://arcangle:10101/api/do/llama-cpu
Switch llama.cpp to CPU mode.
POST http://arcangle:10101/api/do/llama-stop
Stop llama.cpp inference.
POST http://arcangle:10101/api/do/llama-bounce
Restart the current llama.cpp mode.
POST http://arcangle:10101/api/bounce/llama
Alias: restart the current llama.cpp mode.
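/api/model-profile takes arguments. The reference names slug and an optional restart but not how they are encoded, so this sketch assumes a JSON object body; confirm that against the server before relying on it. The slug in the usage line is assumed to be the model_profile value from the state file, and model_profile_request is a hypothetical helper.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://arcangle:10101"


def model_profile_request(slug: str, restart: Optional[bool] = None) -> Request:
    """Build a POST selecting a llama model profile.

    ASSUMPTION: the endpoint accepts a JSON body with "slug" and "restart".
    restart is omitted entirely when the caller does not set it.
    """
    payload: dict = {"slug": slug}
    if restart is not None:
        payload["restart"] = restart
    return Request(
        f"{BASE_URL}/api/model-profile",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = model_profile_request("qwen36-27b-neocoder-q4-preserve", restart=True)
# Send with urllib.request.urlopen(req) once the body format is confirmed.
```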
Whisper / Dictation:
POST http://arcangle:10101/api/do/voice-gpu
Start the GPU dictation stack and Kokoro in GPU mode.
POST http://arcangle:10101/api/do/voice-cpu
Start CPU dictation with a tiny model on port 8000.
POST http://arcangle:10101/api/do/voice-stop
Stop dictation and Kokoro.
POST http://arcangle:10101/api/do/voice-bounce
Restart the current dictation mode.
POST http://arcangle:10101/api/bounce/voice
Alias: restart the current dictation mode.
Kokoro TTS:
POST http://arcangle:10101/api/do/kokoro-gpu
Start Kokoro TTS in GPU mode.
POST http://arcangle:10101/api/do/kokoro-cpu
Start Kokoro TTS in CPU mode.
POST http://arcangle:10101/api/do/kokoro-stop
Stop Kokoro TTS.
POST http://arcangle:10101/api/do/kokoro-bounce
Restart the current standalone Kokoro TTS mode.
POST http://arcangle:10101/api/bounce/kokoro
Alias: restart the current standalone Kokoro TTS mode.
Sidecars:
POST http://arcangle:10101/api/do/llama-proxy-start
Start llama logging proxy on port 19090.
POST http://arcangle:10101/api/do/llama-proxy-stop
Stop llama logging proxy on port 19090.
POST http://arcangle:10101/api/do/ttscleaner-start
Start TTS Cleaner proxy on port 8881.
POST http://arcangle:10101/api/do/ttscleaner-stop
Stop TTS Cleaner proxy on port 8881.
POST http://arcangle:10101/api/do/kokoro-reader-start
Start Kokoro Reader on port 9999.
POST http://arcangle:10101/api/do/kokoro-reader-stop
Stop Kokoro Reader on port 9999.
POST http://arcangle:10101/api/do/llama-proxy-bounce
Restart the llama logging proxy.
POST http://arcangle:10101/api/do/ttscleaner-bounce
Restart TTS Cleaner.
POST http://arcangle:10101/api/do/kokoro-reader-bounce
Restart Kokoro Reader.
POST http://arcangle:10101/api/bounce/llama-proxy
Alias: restart the llama logging proxy.
POST http://arcangle:10101/api/bounce/ttscleaner
Alias: restart TTS Cleaner.
POST http://arcangle:10101/api/bounce/kokoro-reader
Alias: restart Kokoro Reader.
GPU Workloads:
POST http://arcangle:10101/api/do/gpu-workload-launch-comfy
Start ComfyUI in place without changing other services.
POST http://arcangle:10101/api/do/gpu-workload-start-comfy
Make room and start ComfyUI.
POST http://arcangle:10101/api/do/gpu-workload-stop-comfy
Stop ComfyUI.
POST http://arcangle:10101/api/do/gpu-workload-bounce-comfy
Restart ComfyUI in place.
POST http://arcangle:10101/api/bounce/gpu-workload/comfy
Alias: restart ComfyUI in place.
POST http://arcangle:10101/api/do/comfy-start
Alias: start ComfyUI in place.
POST http://arcangle:10101/api/do/comfy-start-managed
Alias: make room and start ComfyUI.
POST http://arcangle:10101/api/do/comfy-free
Ask ComfyUI to unload models and free memory.
POST http://arcangle:10101/api/do/comfy-stop
Alias: stop ComfyUI.
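Each restart action in the sections above has an /api/bounce alias, and the names follow a visible pattern: <service>-bounce maps to /api/bounce/<service>, and gpu-workload-bounce-<slug> maps to /api/bounce/gpu-workload/<slug>. A hypothetical helper that encodes that pattern, assuming it also holds for services not listed here:

```python
def bounce_alias(do_action: str) -> str:
    """Map an /api/do/ restart action name to its /api/bounce/ alias path.

    Hypothetical helper encoding the naming pattern seen in this reference.
    """
    workload_prefix = "gpu-workload-bounce-"
    if do_action.startswith(workload_prefix):
        return f"/api/bounce/gpu-workload/{do_action[len(workload_prefix):]}"
    suffix = "-bounce"
    if do_action.endswith(suffix) and len(do_action) > len(suffix):
        return f"/api/bounce/{do_action[:-len(suffix)]}"
    raise ValueError(f"not a restart action: {do_action!r}")


for action in ("llama-bounce", "kokoro-reader-bounce", "gpu-workload-bounce-comfy"):
    print(action, "->", bounce_alias(action))
```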
Workflows:
POST http://arcangle:10101/api/do/profile-gpu-inference
Apply GPU Inference profile.
POST http://arcangle:10101/api/do/profile-gpu-turbo
Apply experimental GPU Turbo profile.
POST http://arcangle:10101/api/do/profile-comfy-work
Apply Comfy Work profile.
POST http://arcangle:10101/api/do/profile-free-vram
Apply Free VRAM profile.
POST http://arcangle:10101/api/do/retask-gpu
Switch inference and dictation to CPU and start ComfyUI.
POST http://arcangle:10101/api/do/restore-gpu
Unload ComfyUI models and restore GPU inference and voice.
POST http://arcangle:10101/api/do/free-vram
Switch inference and dictation to CPU and unload ComfyUI models without stopping ComfyUI.
Model Cache:
GET http://arcangle:10101/api/model-cache
Current settings and runtime state for inference/voice model cache retouching.
POST http://arcangle:10101/api/model-cache
Set enabled and interval_seconds for warm_models.sh retouching.
POST http://arcangle:10101/api/do/model-cache-warm
Run warm_models.sh immediately.
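In the model_cache block of the State File on this page, next_run_due_at (19:06:02) is exactly interval_seconds (120) after last_run_at (19:04:02). Assuming the scheduler always works that way, the next due time can be derived as follows; next_run_due is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_run_due(last_run_at: str, interval_seconds: int) -> str:
    """Return the ISO-8601 time one retouch interval after last_run_at.

    The UTC offset in the input (e.g. -04:00) is carried through.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    last = datetime.fromisoformat(last_run_at)
    return (last + timedelta(seconds=interval_seconds)).isoformat()


print(next_run_due("2026-06-12T19:04:02-04:00", 120))  # 2026-06-12T19:06:02-04:00
```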
Logs:
GET http://arcangle:10101/logs?service=llama&lines=240
Inference server log.
GET http://arcangle:10101/logs?service=voice&lines=120
Voice orchestration log.
GET http://arcangle:10101/logs?service=kokoro&lines=120
Kokoro log.
GET http://arcangle:10101/logs?service=kokoro_reader&lines=120
Kokoro Reader log.
GET http://arcangle:10101/logs?service=comfy&lines=120
ComfyUI log.
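The log endpoints differ only in the service and lines query parameters. A sketch that builds these URLs, assuming the service names listed above are the accepted set (the server may accept others) and defaulting to the 120 lines most of the examples use; log_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://arcangle:10101"
# Only the services listed above; the server may accept others.
LOG_SERVICES = frozenset({"llama", "voice", "kokoro", "kokoro_reader", "comfy"})


def log_url(service: str, lines: int = 120) -> str:
    """Build a /logs URL for the last `lines` lines of one service's log."""
    if service not in LOG_SERVICES:
        raise ValueError(f"unknown log service: {service!r}")
    if lines <= 0:
        raise ValueError("lines must be positive")
    return f"{BASE_URL}/logs?{urlencode({'service': service, 'lines': lines})}"


print(log_url("llama", 240))  # http://arcangle:10101/logs?service=llama&lines=240
```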
Examples:
curl -X POST http://arcangle:10101/api/do/llama-gpu-turbo
curl -X POST http://arcangle:10101/api/do/llama-cpu
curl -X POST http://arcangle:10101/api/do/llama-stop
curl -X POST http://arcangle:10101/api/do/voice-cpu
curl -X POST http://arcangle:10101/api/do/kokoro-stop
curl http://arcangle:10101/api/status
Cache Inference/Voice Models: model cache disabled
GPU owner: llama.cpp + Whisper
Inference route: gpu / running
Voice route: gpu / running
TTS route: speaches / running
llama gpu health check passed
0 operation(s)
GPU Load / VRAM Pressure - 5 Min: no samples (series: VRAM pressure, GPU load, power)
GPU Load / VRAM Pressure - 1 Hour: no samples (series: VRAM pressure, GPU load, power)
GPU Processes
| PID | Name | VRAM used |
|---|---|---|
| 1157395 | Whisper dictation | 1030 MB (3.16%) |
| 1434466 | llama.cpp | 27334 MB (83.83%) |

ps aux, PID 1157395:
uriel 1157395 0.5 4.3 100794656 5751292 ? SLsl Jun30 4:29 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app

ps aux, PID 1434466:
uriel 1434466 2.5 15.7 99724352 20720796 ? Sl Jun30 18:31 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {"preserve_thinking":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20
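The percentages in the GPU process table are each process's VRAM divided by the card's total memory (gpu_capacity_mb, 32607 MB in the state file). A sketch of that arithmetic, which also reproduces the GPU-wide memory_used_percent of 87.08 from memory_used_mb 28393; the two-decimal rounding is inferred from the displayed values, and vram_percent is a hypothetical helper.

```python
def vram_percent(used_mb: float, capacity_mb: float) -> float:
    """Percentage of total GPU memory in use, rounded to two decimals."""
    if capacity_mb <= 0:
        raise ValueError("capacity_mb must be positive")
    return round(used_mb / capacity_mb * 100, 2)


CAPACITY_MB = 32607  # RTX 5090 total memory reported in the state file
print(vram_percent(1030, CAPACITY_MB))   # 3.16  (Whisper dictation)
print(vram_percent(27334, CAPACITY_MB))  # 83.83 (llama.cpp)
print(vram_percent(28393, CAPACITY_MB))  # 87.08 (whole GPU)
```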
Inference
Status: running
Mode: gpu
HTTP: ok
PID: 1156162
Health: 19091
Start with ArcControl
Dictation
Status: running
Mode: gpu
Port: 8000
Start with ArcControl
TTS
Status: stopped
Mode: stopped
PID: None
Health: 8880
Start with ArcControl
Llama Proxy
Status: running
Port: 19090
Backend: http://arcangle:19091
PID: 1650305
Start with ArcControl
TTS Cleaner
Status: running
Port: 8881
LLM: http://arcangle:19090
PID: 3999358
Start with ArcControl
Kokoro Reader
Status: running
Port: 9999
TTS API: http://arcangle:8881
PID: 14846
Start with ArcControl
ComfyUI
Status: stopped
Mode: stopped
PID: None
URL: 8188
Start with ArcControl
Inference Log
[35177] 705.35.277.467 I reasoning-budget: forced sequence complete, done
[35177] 705.35.277.508 I slot launch_slot_: id 0 | task 34710 | processing task, is_child = 0
[35177] 705.35.277.535 I slot update_slots: id 0 | task 34710 | Checking checkpoint with [73487, 73487] against 31267...
[35177] 705.35.277.537 I slot update_slots: id 0 | task 34710 | Checking checkpoint with [73165, 73165] against 31267...
[35177] 705.35.277.537 I slot update_slots: id 0 | task 34710 | Checking checkpoint with [72702, 72702] against 31267...
[35177] 705.35.277.537 I slot update_slots: id 0 | task 34710 | Checking checkpoint with [72148, 72148] against 31267...
[35177] 705.35.277.538 W slot update_slots: id 0 | task 34710 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
[35177] 705.35.277.539 W slot update_slots: id 0 | task 34710 | erased invalidated context checkpoint (pos_min = 72148, pos_max = 72148, n_tokens = 72149, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.285.995 W slot update_slots: id 0 | task 34710 | erased invalidated context checkpoint (pos_min = 72702, pos_max = 72702, n_tokens = 72703, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.295.106 W slot update_slots: id 0 | task 34710 | erased invalidated context checkpoint (pos_min = 73165, pos_max = 73165, n_tokens = 73166, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.35.303.179 W slot update_slots: id 0 | task 34710 | erased invalidated context checkpoint (pos_min = 73487, pos_max = 73487, n_tokens = 73488, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 705.38.673.115 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 12288, progress = 0.33, t = 3.40 s / 3618.81 tokens per second
[35177] 705.39.275.287 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 14336, progress = 0.39, t = 4.00 s / 3586.00 tokens per second
[35177] 705.39.887.256 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 16384, progress = 0.44, t = 4.61 s / 3554.22 tokens per second
[35177] 705.40.509.824 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 18432, progress = 0.50, t = 5.23 s / 3522.73 tokens per second
[35177] 705.41.142.600 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 20480, progress = 0.55, t = 5.87 s / 3491.85 tokens per second
[35177] 705.41.787.891 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 22528, progress = 0.61, t = 6.51 s / 3460.33 tokens per second
[35177] 705.42.444.770 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 24576, progress = 0.66, t = 7.17 s / 3428.93 tokens per second
[35177] 705.43.111.507 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 26624, progress = 0.72, t = 7.83 s / 3398.53 tokens per second
[35177] 705.43.796.645 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 28672, progress = 0.77, t = 8.52 s / 3365.60 tokens per second
[35177] 705.44.496.580 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 30720, progress = 0.83, t = 9.22 s / 3332.23 tokens per second
[35177] 705.45.213.621 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 32768, progress = 0.89, t = 9.94 s / 3297.87 tokens per second
[35177] 705.45.953.201 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 34816, progress = 0.94, t = 10.68 s / 3261.24 tokens per second
[35177] 705.46.647.019 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 36499, progress = 0.99, t = 11.37 s / 3210.26 tokens per second
[35177] 705.46.780.528 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 36975, progress = 1.00, t = 11.50 s / 3214.38 tokens per second
[35177] 705.46.859.339 I slot create_check: id 0 | task 34710 | created context checkpoint 1 of 32 (pos_min = 36974, pos_max = 36974, n_tokens = 36975, size = 149.626 MiB)
[35177] 705.46.879.812 I slot print_timing: id 0 | task 34710 | prompt processing, n_tokens = 37011, progress = 1.00, t = 11.60 s / 3189.97 tokens per second
[35177] 705.46.992.459 I slot print_timing: id 0 | task 34710 | prompt eval time = 11633.66 ms / 37015 tokens ( 0.31 ms per token, 3181.72 tokens per second)
[35177] 705.46.992.461 I slot print_timing: id 0 | task 34710 | eval time = 81.28 ms / 6 tokens ( 13.55 ms per token, 73.82 tokens per second)
[35177] 705.46.992.462 I slot print_timing: id 0 | task 34710 | total time = 11714.93 ms / 37021 tokens
[35177] 705.46.992.463 I slot print_timing: id 0 | task 34710 | graphs reused = 33350
[35177] 705.46.993.570 I slot release: id 0 | task 34710 | stop processing: n_tokens = 37020, truncated = 0
[35177] 705.46.993.626 I srv update_slots: all slots are idle
733.48.658.478 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 709.03.467.235 I srv params_from_: Chat format: peg-native
[35177] 709.03.483.111 I slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = 330765934925
[35177] 709.03.483.114 I srv get_availabl: updating prompt cache
[35177] 709.03.483.980 W srv prompt_save: - saving prompt with length 37020, total state size = 1379.512 MiB (draft: 0.000 MiB)
[35177] 709.04.151.069 I srv load: - looking for better prompt, base f_keep = 0.001, sim = 0.002
[35177] 709.04.151.084 I srv load: - found better prompt with f_keep = 0.936, sim = 0.948
[35177] 709.04.238.903 I srv update: - cache state: 11 prompts, 17457.405 MiB (limits: 32768.000 MiB, 262144 tokens, 648178 est)
[35177] 709.04.238.907 I srv update: - prompt 0x7090180595d0: 97 tokens, checkpoints: 1, 302.475 MiB
[35177] 709.04.238.908 I srv update: - prompt 0x709084b75d30: 37 tokens, checkpoints: 1, 300.482 MiB
[35177] 709.04.238.908 I srv update: - prompt 0x70908cb79950: 1607 tokens, checkpoints: 2, 502.267 MiB
[35177] 709.04.238.909 I srv update: - prompt 0x70908c724f30: 66195 tokens, checkpoints: 7, 3396.153 MiB
[35177] 709.04.238.909 I srv update: - prompt 0x709084189000: 67118 tokens, checkpoints: 1, 2529.060 MiB
[35177] 709.04.238.909 I srv update: - prompt 0x708b9872f0e0: 70834 tokens, checkpoints: 10, 3999.149 MiB
[35177] 709.04.238.910 I srv update: - prompt 0x5fc9b8f35180: 1681 tokens, checkpoints: 2, 504.725 MiB
[35177] 709.04.238.910 I srv update: - prompt 0x5fc9b8bd9250: 2250 tokens, checkpoints: 2, 523.629 MiB
[35177] 709.04.238.910 I srv update: - prompt 0x7090b45f7290: 73909 tokens, checkpoints: 1, 2754.672 MiB
[35177] 709.04.238.911 I srv update: - prompt 0x7090844cce60: 24574 tokens, checkpoints: 1, 1115.655 MiB
[35177] 709.04.238.911 I srv update: - prompt 0x70906c1c2a80: 37020 tokens, checkpoints: 1, 1529.138 MiB
[35177] 709.04.238.912 I srv get_availabl: prompt cache update took 755.80 ms
[35177] 709.04.239.279 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 709.04.239.391 I slot launch_slot_: id 0 | task 34737 | processing task, is_child = 0
[35177] 709.04.239.410 I slot update_slots: id 0 | task 34737 | Checking checkpoint with [14094, 14094] against 14149...
[35177] 709.04.247.307 W slot update_slots: id 0 | task 34737 | restored context checkpoint (pos_min = 14094, pos_max = 14094, n_tokens = 14095, n_past = 14095, size = 149.626 MiB)
[35177] 709.04.513.858 I slot create_check: id 0 | task 34737 | created context checkpoint 2 of 32 (pos_min = 14911, pos_max = 14911, n_tokens = 14912, size = 149.626 MiB)
[35177] 709.05.198.645 I reasoning-budget: deactivated (natural end)
[35177] 709.05.657.479 I slot print_timing: id 0 | task 34737 | prompt eval time = 315.57 ms / 836 tokens ( 0.38 ms per token, 2649.18 tokens per second)
[35177] 709.05.657.482 I slot print_timing: id 0 | task 34737 | eval time = 1102.50 ms / 80 tokens ( 13.78 ms per token, 72.56 tokens per second)
[35177] 709.05.657.482 I slot print_timing: id 0 | task 34737 | total time = 1418.07 ms / 916 tokens
[35177] 709.05.657.483 I slot print_timing: id 0 | task 34737 | graphs reused = 33428
[35177] 709.05.657.969 I slot release: id 0 | task 34737 | stop processing: n_tokens = 15010, truncated = 0
[35177] 709.05.658.160 I srv update_slots: all slots are idle
736.45.144.996 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 711.59.952.652 I srv params_from_: Chat format: peg-native
[35177] 711.59.971.267 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.994 (> 0.100 thold), f_keep = 0.995
[35177] 711.59.971.621 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 711.59.971.674 I slot launch_slot_: id 0 | task 34821 | processing task, is_child = 0
[35177] 711.59.971.694 I slot update_slots: id 0 | task 34821 | Checking checkpoint with [14911, 14911] against 14930...
[35177] 711.59.987.901 W slot update_slots: id 0 | task 34821 | restored context checkpoint (pos_min = 14911, pos_max = 14911, n_tokens = 14912, n_past = 14912, size = 149.626 MiB)
[35177] 712.01.453.851 I slot print_timing: id 0 | task 34821 | n_decoded = 100, tg = 72.33 t/s
[35177] 712.01.883.216 I reasoning-budget: deactivated (natural end)
[35177] 712.04.456.162 I slot print_timing: id 0 | task 34821 | n_decoded = 317, tg = 72.29 t/s
[35177] 712.07.457.883 I slot print_timing: id 0 | task 34821 | n_decoded = 534, tg = 72.29 t/s
[35177] 712.07.860.681 I slot print_timing: id 0 | task 34821 | prompt eval time = 99.63 ms / 108 tokens ( 0.92 ms per token, 1083.97 tokens per second)
[35177] 712.07.860.686 I slot print_timing: id 0 | task 34821 | eval time = 7789.36 ms / 563 tokens ( 13.84 ms per token, 72.28 tokens per second)
[35177] 712.07.860.687 I slot print_timing: id 0 | task 34821 | total time = 7888.99 ms / 671 tokens
[35177] 712.07.860.687 I slot print_timing: id 0 | task 34821 | graphs reused = 33987
[35177] 712.07.861.191 I slot release: id 0 | task 34821 | stop processing: n_tokens = 15582, truncated = 0
[35177] 712.07.861.246 I srv update_slots: all slots are idle
738.23.154.152 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 713.37.960.240 I srv params_from_: Chat format: peg-native
[35177] 713.37.977.500 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.970 (> 0.100 thold), f_keep = 0.964
[35177] 713.37.977.959 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 713.37.978.005 I slot launch_slot_: id 0 | task 35387 | processing task, is_child = 0
[35177] 713.37.978.030 I slot update_slots: id 0 | task 35387 | Checking checkpoint with [14911, 14911] against 15019...
[35177] 713.37.994.715 W slot update_slots: id 0 | task 35387 | restored context checkpoint (pos_min = 14911, pos_max = 14911, n_tokens = 14912, n_past = 14912, size = 149.626 MiB)
[35177] 713.38.185.968 I slot create_check: id 0 | task 35387 | created context checkpoint 3 of 32 (pos_min = 15453, pos_max = 15453, n_tokens = 15454, size = 149.626 MiB)
[35177] 713.38.244.987 I reasoning-budget: deactivated (natural end)
[35177] 713.39.603.598 I slot print_timing: id 0 | task 35387 | n_decoded = 100, tg = 72.80 t/s
[35177] 713.42.611.837 I slot print_timing: id 0 | task 35387 | n_decoded = 314, tg = 71.66 t/s
[35177] 713.45.622.852 I slot print_timing: id 0 | task 35387 | n_decoded = 530, tg = 71.69 t/s
[35177] 713.48.451.780 I slot print_timing: id 0 | task 35387 | prompt eval time = 251.89 ms / 572 tokens ( 0.44 ms per token, 2270.82 tokens per second)
[35177] 713.48.451.783 I slot print_timing: id 0 | task 35387 | eval time = 10221.86 ms / 732 tokens ( 13.96 ms per token, 71.61 tokens per second)
[35177] 713.48.451.783 I slot print_timing: id 0 | task 35387 | total time = 10473.75 ms / 1304 tokens
[35177] 713.48.451.784 I slot print_timing: id 0 | task 35387 | graphs reused = 34714
[35177] 713.48.452.275 I slot release: id 0 | task 35387 | stop processing: n_tokens = 16215, truncated = 0
[35177] 713.48.452.337 I srv update_slots: all slots are idle
743.41.600.969 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 718.56.411.313 I srv params_from_: Chat format: peg-native
[35177] 718.56.432.696 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.949 (> 0.100 thold), f_keep = 0.955
[35177] 718.56.433.066 I reasoning-budget: activated, budget=2147483647 tokens
[35177] 718.56.433.105 I slot launch_slot_: id 0 | task 36123 | processing task, is_child = 0
[35177] 718.56.433.121 I slot update_slots: id 0 | task 36123 | Checking checkpoint with [15453, 15453] against 15483...
[35177] 718.56.448.947 W slot update_slots: id 0 | task 36123 | restored context checkpoint (pos_min = 15453, pos_max = 15453, n_tokens = 15454, n_past = 15454, size = 149.626 MiB)
[35177] 718.56.714.655 I slot create_check: id 0 | task 36123 | created context checkpoint 4 of 32 (pos_min = 16215, pos_max = 16215, n_tokens = 16216, size = 149.626 MiB)
[35177] 718.58.165.864 I slot print_timing: id 0 | task 36123 | n_decoded = 100, tg = 71.97 t/s
[35177] 719.01.169.309 I slot print_timing: id 0 | task 36123 | n_decoded = 315, tg = 71.71 t/s
[35177] 719.01.524.740 I reasoning-budget: deactivated (natural end)
[35177] 719.04.169.592 I slot print_timing: id 0 | task 36123 | n_decoded = 529, tg = 71.55 t/s
[35177] 719.07.174.598 I slot print_timing: id 0 | task 36123 | n_decoded = 743, tg = 71.45 t/s
[35177] 719.10.180.773 I slot print_timing: id 0 | task 36123 | n_decoded = 958, tg = 71.47 t/s
[35177] 719.11.090.898 I slot print_timing: id 0 | task 36123 | prompt eval time = 343.25 ms / 854 tokens ( 0.40 ms per token, 2487.97 tokens per second)
[35177] 719.11.090.901 I slot print_timing: id 0 | task 36123 | eval time = 14314.53 ms / 1023 tokens ( 13.99 ms per token, 71.47 tokens per second)
[35177] 719.11.090.902 I slot print_timing: id 0 | task 36123 | total time = 14657.78 ms / 1877 tokens
[35177] 719.11.090.902 I slot print_timing: id 0 | task 36123 | graphs reused = 35731
[35177] 719.11.091.426 I slot release: id 0 | task 36123 | stop processing: n_tokens = 17330, truncated = 0
[35177] 719.11.091.488 I srv update_slots: all slots are idle
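The print_timing summary lines report elapsed milliseconds and a token count; the tokens-per-second figure is the token count divided by the elapsed seconds. A sketch that recomputes it from a log line, assuming the "prompt eval time = X ms / N tokens" and "eval time = X ms / N tokens" wording shown above (llama.cpp's log wording may differ between versions); throughput is a hypothetical helper.

```python
import re

# Matches llama.cpp summary lines such as
#   "eval time = 7789.36 ms / 563 tokens ( 13.84 ms per token, 72.28 tokens per second)"
# ASSUMPTION: the wording matches the log excerpt above.
TIMING_RE = re.compile(
    r"(?P<kind>prompt eval|eval) time = (?P<ms>[\d.]+) ms / (?P<tokens>\d+) tokens"
)


def throughput(line: str) -> tuple[str, float]:
    """Return ("prompt eval" or "eval", tokens per second) for a summary line."""
    m = TIMING_RE.search(line)
    if m is None:
        raise ValueError("no prompt eval / eval timing in line")
    seconds = float(m["ms"]) / 1000.0
    return m["kind"], round(int(m["tokens"]) / seconds, 2)


line = ("[35177] 712.07.860.686 I slot print_timing: id 0 | task 34821 | "
        "eval time = 7789.36 ms / 563 tokens ( 13.84 ms per token, 72.28 tokens per second)")
print(throughput(line))  # ('eval', 72.28)
```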
Other Logs
2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:06-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:08-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:08-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:09-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:11-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:11-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:12-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:15-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:23-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:26-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T03:37:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /robots.txt HTTP/1.1" 404 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T03:37:29-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
State File
{
"transitioning": false,
"gpu_state": "inference_available",
"message": "llama gpu health check passed",
"inference": {
"status": "running",
"mode": "gpu",
"pid": 1156162,
"launcher_pid": null,
"pgid": 1156162,
"port": 19091,
"url": "http://arcangle:19091",
"health_url": "http://arcangle:19091/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"status\":\"ok\"}"
},
"log": "/servers/run/state/logs/llama.log",
"model_profile": "qwen36-27b-neocoder-q4-preserve",
"model_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf"
},
"speech_input": {
"status": "running",
"mode": "gpu",
"port": 8000,
"url": "http://arcangle:8000",
"tts": {
"service": "kokoro",
"status": "stopped",
"mode": "stopped",
"pid": null,
"launcher_pid": null,
"pgid": 1776540,
"port": 8880,
"url": "http://arcangle:8880",
"health_url": "http://arcangle:8880/health",
"port_listening": false,
"http_ok": false,
"http_probe": {
"ok": false,
"error": "port closed"
},
"log": "/servers/run/state/logs/kokoro.log"
},
"note": "Port 8000 remains the dictation endpoint; GPU dictation is served by /servers/speaches.",
"log": "/servers/run/state/logs/voice.log",
"containers": {
"vad-shim": false,
"speaches": true,
"whisper-stt": false
}
},
"text_to_speech": {
"service": "kokoro",
"status": "stopped",
"mode": "stopped",
"pid": null,
"launcher_pid": null,
"pgid": 1776540,
"port": 8880,
"url": "http://arcangle:8880",
"health_url": "http://arcangle:8880/health",
"port_listening": false,
"http_ok": false,
"http_probe": {
"ok": false,
"error": "port closed"
},
"log": "/servers/run/state/logs/kokoro.log"
},
"gpu_workload": {
"service": null,
"status": "stopped",
"mode": "stopped",
"pid": null,
"pgid": null,
"port": 8188,
"url": "http://arcangle:8188",
"log": "/servers/run/state/logs/comfy.log"
},
"gpu_workloads": {
"file": "/servers/run/state/gpu-workloads.json",
"items": [
{
"label": "ComfyUI",
"slug": "comfy",
"description": "ComfyUI image/video workflow",
"full_use": true,
"status": "stopped",
"mode": "stopped",
"pid": null,
"pgid": null,
"port": 8188,
"url": "http://arcangle:8188",
"log": "/servers/run/state/logs/comfy.log"
}
]
},
"sidecars": {
"llama_proxy": {
"service": "llama_proxy",
"label": "Llama Proxy",
"status": "running",
"mode": "proxy",
"pid": 1650305,
"pgid": 1650305,
"port": 19090,
"url": "http://arcangle:19090",
"health_url": "http://arcangle:19090/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"status\":\"ok\"}"
},
"backend": "http://arcangle:19091",
"log": "/servers/run/state/logs/llama-proxy.log"
},
"ttscleaner": {
"service": "ttscleaner",
"label": "TTS Cleaner",
"status": "running",
"mode": "proxy",
"pid": 3999358,
"port": 8881,
"url": "http://arcangle:8881",
"health_url": "http://arcangle:8881/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"message\":\"OK\"}"
},
"backend": "http://arcangle:8000",
"llm_backend": "http://arcangle:19090",
"log": "/servers/ttscleaner/proxy.log"
},
"kokoro_reader": {
"service": "kokoro_reader",
"label": "Kokoro Reader",
"status": "running",
"mode": "reader",
"pid": 14846,
"port": 9999,
"url": "http://arcangle:9999",
"health_url": "http://arcangle:9999",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "<!doctype html>\n<html lang=\"en\">\n <head>\n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n <meta name=\"theme-color\" content=\"#101820\">\n <title>Kokoro Reader</title>\n <link rel=\"manifest\" href=\"/s"
},
"backend": "http://arcangle:8881",
"log": "/servers/kokoro-reader/kokoro-reader.log"
}
},
"nvidia": {
"available": true,
"gpus": [
{
"name": "NVIDIA GeForce RTX 5090",
"memory_used_mb": "28393",
"memory_total_mb": "32607",
"gpu_util_percent": "0",
"power_draw_w": "11.91",
"power_limit_w": "575.00",
"temperature_c": "40",
"memory_used_percent": 87.08,
"power_percent": 2.07,
"util_percent": 0.0
}
],
"processes": {
"available": true,
"items": [
{
"pid": 1157395,
"process_name": "/home/ubuntu/speaches/.venv/bin/python",
"friendly_name": "Whisper dictation",
"used_gpu_memory_mb": 1030.0,
"gpu_capacity_mb": 32607.0,
"used_gpu_memory_percent": 3.16,
"ps_aux": "uriel 1157395 0.5 4.3 100794656 5751292 ? SLsl Jun30 4:29 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app"
},
{
"pid": 1434466,
"process_name": "/servers/llcpp6/build/bin/llama-server",
"friendly_name": "llama.cpp",
"used_gpu_memory_mb": 27334.0,
"gpu_capacity_mb": 32607.0,
"used_gpu_memory_percent": 83.83,
"ps_aux": "uriel 1434466 2.5 15.7 99724352 20720796 ? Sl Jun30 18:31 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {\"preserve_thinking\":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20"
}
]
}
},
"routing": {
"gpu_owner": "llama.cpp + Whisper",
"inference_route": "gpu / running",
"voice_route": "gpu / running",
"tts_route": "speaches / running"
},
"progress": {
"active": false,
"percent": 100,
"label": "llama gpu health check passed",
"operations": []
},
"model_cache": {
"comfy_cpu_limit_percent": 80,
"enabled": false,
"gpu_util_limit_percent": 40,
"guard": {
"blocked": false,
"blocked_reasons": [],
"comfy_cpu_percent": 0.0,
"gpu_util_percent": 0.0
},
"guard_window_seconds": 120,
"interval_seconds": 120,
"last_blocked_at": "2026-07-01T03:26:05-04:00",
"last_exit_code": 0,
"last_finished_at": "2026-06-12T19:04:02-04:00",
"last_run_at": "2026-06-12T19:04:02-04:00",
"last_started_at": "2026-06-12T19:04:01-04:00",
"lists": {
"inference": "/servers/run/cache/inference-active.txt",
"stt": "/servers/run/cache/stt.txt",
"tts": "/servers/run/cache/tts.txt"
},
"message": "model cache disabled",
"next_run_due_at": "2026-06-12T19:06:02-04:00",
"running": false,
"script": "/servers/run/warm_models.sh",
"settings": {
"enabled": false,
"interval_seconds": 120
},
"updated_at": "2026-07-01T03:37:26-04:00"
},
"startup_services": {
"file": "/servers/run/state/startup-services.json",
"services": [
{
"key": "inference",
"label": "Inference",
"description": "Start selected llama GPU model when ArcControl starts.",
"action": "llama-gpu",
"enabled": true
},
{
"key": "speech_input",
"label": "Dictation",
"description": "Start GPU dictation/Speaches when ArcControl starts.",
"action": "voice-gpu",
"enabled": false
},
{
"key": "text_to_speech",
"label": "Standalone Kokoro",
"description": "Start standalone Kokoro GPU service when ArcControl starts.",
"action": "kokoro-gpu",
"enabled": false
},
{
"key": "sidecars.llama_proxy",
"label": "Llama Proxy",
"description": "Start llama logging proxy when ArcControl starts.",
"action": "llama-proxy-start",
"enabled": true
},
{
"key": "sidecars.ttscleaner",
"label": "TTS Cleaner",
"description": "Start TTS Cleaner when ArcControl starts.",
"action": "ttscleaner-start",
"enabled": true
},
{
"key": "sidecars.kokoro_reader",
"label": "Kokoro Reader",
"description": "Start Kokoro Reader when ArcControl starts.",
"action": "kokoro-reader-start",
"enabled": true
},
{
"key": "gpu_workload.comfy",
"label": "ComfyUI",
"description": "Start ComfyUI when ArcControl starts.",
"action": "gpu-workload-launch-comfy",
"enabled": false
}
],
"updated_at": "2026-06-30T15:01:58-04:00"
},
"model_profiles": {
"active": "qwen36-27b-neocoder-q4-preserve",
"active_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"default": "qwen36-27b-heretic-preserve",
"default_label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
"items": [
{
"slug": "gemma4-12b-q4-heretic",
"label": "Gemma 4 12B Q4 Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-12b-q4-heretic",
"model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
"size_gb": 6.9,
"size_label": "6.9 GB",
"cache_files": [
"/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
"/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-12b-q4-heretic.json"
},
{
"slug": "gemma4-12b-q6-heretic",
"label": "Gemma 4 12B Q6 Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-12b-q6-heretic",
"model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
"size_gb": 9.1,
"size_label": "9.1 GB",
"cache_files": [
"/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
"/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-12b-q6-heretic.json"
},
{
"slug": "gemma4-26b-a4b-heretic",
"label": "gemma4-26B-A4B-heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-26b-a4b-heretic",
"model_path": "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
"size_gb": 15.6,
"size_label": "15.6 GB",
"cache_files": [
"/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
"/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.mmproj-f16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-26b-a4b-heretic.json"
},
{
"slug": "qwen36-27b-heretic-preserve",
"label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
"description": "",
"active": false,
"default": true,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-heretic-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"size_gb": 16.0,
"size_label": "16.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-heretic-preserve.json"
},
{
"slug": "qwen36-27b-heretic",
"label": "Qwen 3.6 27B Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-heretic",
"model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"size_gb": 16.0,
"size_label": "16.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-heretic.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-nl-preserve",
"label": "Qwen 3.6 27B NeoCode IQ4_NL (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"size_gb": 15.0,
"size_label": "15.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-nl",
"label": "Qwen 3.6 27B NeoCode IQ4_NL",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"size_gb": 15.0,
"size_label": "15.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-xs-preserve",
"label": "Qwen 3.6 27B NeoCode IQ4_XS (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"size_gb": 14.3,
"size_label": "14.3 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-xs",
"label": "Qwen 3.6 27B NeoCode IQ4_XS",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"size_gb": 14.3,
"size_label": "14.3 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs.json"
},
{
"slug": "qwen36-27b-neocoder-q4-preserve",
"label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"description": "",
"active": true,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q4-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"size_gb": 15.7,
"size_label": "15.7 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-q4",
"label": "Qwen 3.6 27B NeoCode Q4",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q4",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"size_gb": 15.7,
"size_label": "15.7 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4.json"
},
{
"slug": "qwen36-27b-neocoder-q6-preserve",
"label": "Qwen 3.6 27B NeoCode Q6 (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q6-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"size_gb": 20.9,
"size_label": "20.9 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-q6",
"label": "Qwen 3.6 27B NeoCode Q6",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q6",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"size_gb": 20.9,
"size_label": "20.9 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6.json"
},
{
"slug": "qwen36-unc",
"label": "Qwen3.6 35B Uncensored",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-unc",
"model_path": "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
"size_gb": 19.7,
"size_label": "19.7 GB",
"cache_files": [
"/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
"/models/Q3.6/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-unc.json"
}
],
"dir": "/servers/run/model-profiles",
"state_file": "/servers/run/state/active-model-profile.json",
"default_state_file": "/servers/run/state/default-model-profile.json",
"cache_file": "/servers/run/cache/inference-active.txt"
},
"openclaw_sync": {
"status": "synchronized",
"expected_model": "qwen36-27b-neocoder-q4-preserve",
"loaded_model": "qwen36-27b-neocoder-q4-preserve",
"loaded_models": [
"qwen36-27b-neocoder-q4-preserve"
],
"expected_openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
"openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
"synchronized": true,
"error": "",
"sessions": {
"stale_local": 0,
"stale_keys": [],
"cloud_overrides": 1,
"checked": 42
},
"state_file": "/servers/run/state/openclaw-sync-state.json"
},
"state_file": "/servers/run/state/system-state.json",
"updated_at": "2026-07-01T03:37:29-04:00",
"profiles": {
"active": "custom",
"active_label": "Custom Profile",
"items": [
{
"slug": "gpu-inference",
"label": "GPU Inference",
"description": "GPU llama, GPU dictation, GPU Kokoro; unload Comfy models if Comfy is running; model cache off.",
"desired": {
"llama": "gpu",
"voice": "gpu",
"kokoro": "gpu",
"comfy": "unload_models",
"model_cache": false
},
"momentary": false,
"active": false
},
{
"slug": "gpu-turbo",
"label": "GPU Turbo",
"description": "Experimental turboquant llama profile; GPU dictation and Kokoro; unload Comfy models if Comfy is running; model cache off.",
"desired": {
"llama": "gpu-turbo",
"voice": "gpu",
"kokoro": "gpu",
"comfy": "unload_models",
"model_cache": false
},
"momentary": false,
"active": false
},
{
"slug": "comfy-work",
"label": "Comfy Work",
"description": "CPU inference, CPU-lite dictation, Comfy running, model files kept warm.",
"desired": {
"llama": "cpu",
"voice": "cpu-lite",
"kokoro": "stopped",
"comfy": "running",
"model_cache": true
},
"momentary": false,
"active": false
},
{
"slug": "free-vram",
"label": "Free VRAM",
"description": "One-shot VRAM cleanup: CPU inference and dictation, Kokoro stopped, Comfy kept running with models unloaded.",
"desired": {
"llama": "cpu",
"voice": "cpu-lite",
"kokoro": "stopped",
"comfy": "unload_models",
"model_cache": true
},
"momentary": true,
"active": false
}
],
"watch": {
"enabled": false,
"profile": "gpu-inference",
"updated_at": "2026-06-03T18:03:26-04:00"
},
"file": "/servers/run/state/profiles.json"
}
}
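The state document above can also be read by a script, for example to check which model is loaded, whether OpenClaw is synchronized, and how much VRAM is left. The sketch below is a minimal example. It assumes that `GET http://arcangle:10101/api/status` returns a document shaped like this dump, with the field names copied from it. The helpers `fetch_state` and `summarize_state` and the trimmed `SAMPLE_STATE` are hypothetical illustrations, not part of ArcControl.

```python
import json
import urllib.request

BASE_URL = "http://arcangle:10101"  # base URL from the API reference


def fetch_state(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /api/status and decode the JSON body.

    Hypothetical helper; assumes the endpoint returns the dashboard state as JSON.
    It makes a network call, so it is not invoked below.
    """
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=timeout) as resp:
        return json.load(resp)


def summarize_state(state: dict) -> dict:
    """Pull a few headline fields out of a dashboard state document."""
    nvidia = state.get("nvidia", {})
    gpus = nvidia.get("gpus", [])
    # GPU memory figures arrive as strings (e.g. "28393"), so convert before summing.
    used = sum(float(g["memory_used_mb"]) for g in gpus)
    total = sum(float(g["memory_total_mb"]) for g in gpus)
    # Per-process VRAM, keyed by the dashboard's friendly name.
    procs = nvidia.get("processes", {}).get("items", [])
    per_process = {p["friendly_name"]: p["used_gpu_memory_mb"] for p in procs}
    profiles = state.get("model_profiles", {})
    sync = state.get("openclaw_sync", {})
    services = state.get("startup_services", {}).get("services", [])
    return {
        "active_model": profiles.get("active"),
        "default_model": profiles.get("default"),
        "synchronized": bool(sync.get("synchronized")),
        "sync_status": sync.get("status"),
        "vram_used_mb": used,
        "vram_total_mb": total,
        "vram_free_mb": total - used,
        "vram_by_process_mb": per_process,
        "enabled_startup_services": [s["key"] for s in services if s.get("enabled")],
    }


# Hypothetical trimmed copy of the state above, used instead of a live request.
SAMPLE_STATE = {
    "nvidia": {
        "gpus": [{"memory_used_mb": "28393", "memory_total_mb": "32607"}],
        "processes": {"items": [
            {"friendly_name": "Whisper dictation", "used_gpu_memory_mb": 1030.0},
            {"friendly_name": "llama.cpp", "used_gpu_memory_mb": 27334.0},
        ]},
    },
    "startup_services": {"services": [
        {"key": "inference", "enabled": True},
        {"key": "speech_input", "enabled": False},
        {"key": "sidecars.llama_proxy", "enabled": True},
    ]},
    "model_profiles": {
        "active": "qwen36-27b-neocoder-q4-preserve",
        "default": "qwen36-27b-heretic-preserve",
    },
    "openclaw_sync": {"status": "synchronized", "synchronized": True},
}

summary = summarize_state(SAMPLE_STATE)
print(summary["active_model"], summary["sync_status"], summary["vram_free_mb"])
```

Against a live host, `summarize_state(fetch_state())` would apply the same summary to the current state. Missing sections fall back to empty values through `dict.get`, so a partial state document yields zeros and `None` rather than raising.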