Workflows
Profiles: Custom Profile
Model Profile: Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)
Default: Qwen 3.6 27B Heretic (Preserve Thinking)
Loaded: qwen36-27b-neocoder-q4-preserve
OpenClaw: arc/qwen36-27b-neocoder-q4-preserve
Synchronization: synchronized
API Reference
Base URL: http://arcangle:10101
Action endpoints accept GET or POST; POST is preferred for automation.
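Every action is an HTTP call under the base URL, so an automation client only has to build the right request. The sketch below uses Python's standard-library urllib to construct, but not send, a POST for /api/do/<action>. The helper name action_request is hypothetical, and the empty body is an assumption because the reference does not say whether actions read one.

```python
from urllib.request import Request

BASE_URL = "http://arcangle:10101"  # base URL from the API reference


def action_request(action: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request for /api/do/<action>.

    Assumption: actions ignore the request body, so an empty one is sent.
    """
    return Request(f"{base_url}/api/do/{action}", data=b"", method="POST")


req = action_request("llama-stop")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```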
Status:
GET http://arcangle:10101/api/status
Current dashboard state.
GET http://arcangle:10101/state.json
Alias for current dashboard state.
GET http://arcangle:10101/api/gpu-history?minutes=60
GPU telemetry history.
GET http://arcangle:10101/api/profiles
Available desired-state profiles and active match.
GET http://arcangle:10101/api/model-profiles
Available llama model profiles and selected model.
GET http://arcangle:10101/api/openclaw-sync
ArcControl/OpenClaw model synchronization state.
GET http://arcangle:10101/api/gpu-workloads
Registered full-use GPU workloads.
GET http://arcangle:10101/api/startup-services
Card services configured to start with ArcControl.
POST http://arcangle:10101/api/startup-services
Set whether a card service starts with ArcControl; pass key and enabled.
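The reference says only to pass key and enabled. Assuming the endpoint accepts them as a JSON body (the encoding is not documented), a request that makes Dictation start with ArcControl might be built as below. The key names come from startup_services in the state file; startup_service_request is a hypothetical helper.

```python
import json
from urllib.request import Request

BASE_URL = "http://arcangle:10101"


def startup_service_request(key: str, enabled: bool, base_url: str = BASE_URL) -> Request:
    """Build a POST to /api/startup-services toggling one card service.

    Assumption: the server reads a JSON body {"key": ..., "enabled": ...}.
    Keys seen in the state file include "inference", "speech_input",
    "text_to_speech", "sidecars.llama_proxy" and "sidecars.ttscleaner".
    """
    body = json.dumps({"key": key, "enabled": enabled}).encode("utf-8")
    return Request(
        f"{base_url}/api/startup-services",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = startup_service_request("speech_input", True)
```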
Inference:
POST http://arcangle:10101/api/model-profile
Select llama model profile; pass slug and optional restart.
POST http://arcangle:10101/api/model-profile-default
Set the default llama model profile used when active state is missing.
POST http://arcangle:10101/api/openclaw-sync
Retry OpenClaw sync for the currently loaded ArcControl model.
POST http://arcangle:10101/api/do/llama-gpu
Switch llama.cpp to regular GPU mode.
POST http://arcangle:10101/api/do/llama-gpu-turbo
Switch llama.cpp to GPU turboquant mode.
POST http://arcangle:10101/api/do/llama-cpu
Switch llama.cpp to CPU mode.
POST http://arcangle:10101/api/do/llama-stop
Stop llama.cpp inference.
POST http://arcangle:10101/api/do/llama-bounce
Restart the current llama.cpp mode.
POST http://arcangle:10101/api/bounce/llama
Alias: restart the current llama.cpp mode.
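Because action endpoints also accept GET, a model switch can be written as a single URL. This sketch assumes slug and restart are sent as query parameters and that restart is spelled true/false; neither detail is documented. The slug is the loaded profile shown at the top of this page, and model_profile_url is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://arcangle:10101"


def model_profile_url(slug: str, restart: Optional[bool] = None,
                      base_url: str = BASE_URL) -> str:
    """Build a GET URL for /api/model-profile.

    Assumptions: slug and restart are accepted as query parameters and
    restart is encoded as "true"/"false".
    """
    params = {"slug": slug}
    if restart is not None:
        params["restart"] = "true" if restart else "false"
    return f"{base_url}/api/model-profile?{urlencode(params)}"


print(model_profile_url("qwen36-27b-neocoder-q4-preserve", restart=True))
```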
Whisper / Dictation:
POST http://arcangle:10101/api/do/voice-gpu
Start GPU dictation stack and Kokoro GPU.
POST http://arcangle:10101/api/do/voice-cpu
Start CPU dictation with the tiny model on port 8000.
POST http://arcangle:10101/api/do/voice-stop
Stop dictation and Kokoro.
POST http://arcangle:10101/api/do/voice-bounce
Restart the current dictation mode.
POST http://arcangle:10101/api/bounce/voice
Alias: restart the current dictation mode.
Kokoro TTS:
POST http://arcangle:10101/api/do/kokoro-gpu
Start Kokoro TTS in GPU mode.
POST http://arcangle:10101/api/do/kokoro-cpu
Start Kokoro TTS in CPU mode.
POST http://arcangle:10101/api/do/kokoro-stop
Stop Kokoro TTS.
POST http://arcangle:10101/api/do/kokoro-bounce
Restart the current standalone Kokoro TTS mode.
POST http://arcangle:10101/api/bounce/kokoro
Alias: restart the current standalone Kokoro TTS mode.
Sidecars:
POST http://arcangle:10101/api/do/llama-proxy-start
Start llama logging proxy on port 19090.
POST http://arcangle:10101/api/do/llama-proxy-stop
Stop llama logging proxy on port 19090.
POST http://arcangle:10101/api/do/ttscleaner-start
Start TTS Cleaner proxy on port 8881.
POST http://arcangle:10101/api/do/ttscleaner-stop
Stop TTS Cleaner proxy on port 8881.
POST http://arcangle:10101/api/do/kokoro-reader-start
Start Kokoro Reader on port 9999.
POST http://arcangle:10101/api/do/kokoro-reader-stop
Stop Kokoro Reader on port 9999.
POST http://arcangle:10101/api/do/llama-proxy-bounce
Restart the llama logging proxy.
POST http://arcangle:10101/api/do/ttscleaner-bounce
Restart TTS Cleaner.
POST http://arcangle:10101/api/do/kokoro-reader-bounce
Restart Kokoro Reader.
POST http://arcangle:10101/api/bounce/llama-proxy
Alias: restart the llama logging proxy.
POST http://arcangle:10101/api/bounce/ttscleaner
Alias: restart TTS Cleaner.
POST http://arcangle:10101/api/bounce/kokoro-reader
Alias: restart Kokoro Reader.
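Every restart action in the Inference, Whisper / Dictation, Kokoro TTS, and Sidecars groups has two spellings: /api/do/<name>-bounce and the alias /api/bounce/<name>. The sketch below encodes that naming pattern for the names documented above. ComfyUI uses a different scheme (/api/do/gpu-workload-bounce-comfy and /api/bounce/gpu-workload/comfy) and is left out; bounce_paths is a hypothetical helper.

```python
from typing import Tuple

# Names whose restart action appears in both spellings in this reference.
BOUNCE_SERVICES = ("llama", "voice", "kokoro", "llama-proxy", "ttscleaner", "kokoro-reader")


def bounce_paths(service: str) -> Tuple[str, str]:
    """Return (canonical, alias) restart paths for a documented service name."""
    if service not in BOUNCE_SERVICES:
        raise ValueError(f"no documented bounce action for {service!r}")
    return f"/api/do/{service}-bounce", f"/api/bounce/{service}"


for name in BOUNCE_SERVICES:
    print(*bounce_paths(name))
```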
GPU Workloads:
POST http://arcangle:10101/api/do/gpu-workload-launch-comfy
Start ComfyUI in place without changing other services.
POST http://arcangle:10101/api/do/gpu-workload-start-comfy
Make room and start ComfyUI.
POST http://arcangle:10101/api/do/gpu-workload-stop-comfy
Stop ComfyUI.
POST http://arcangle:10101/api/do/gpu-workload-bounce-comfy
Restart ComfyUI in place.
POST http://arcangle:10101/api/bounce/gpu-workload/comfy
Alias: restart ComfyUI in place.
POST http://arcangle:10101/api/do/comfy-start
Alias: start ComfyUI in place.
POST http://arcangle:10101/api/do/comfy-start-managed
Alias: make room and start ComfyUI.
POST http://arcangle:10101/api/do/comfy-free
Ask ComfyUI to unload models and free memory.
POST http://arcangle:10101/api/do/comfy-stop
Alias: stop ComfyUI.
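The GPU workload actions follow /api/do/gpu-workload-<verb>-<slug>, where comfy is the slug registered in gpu-workloads.json. This sketch maps the four documented verbs; using it for any other slug assumes future workloads reuse the same pattern, which the reference only shows for ComfyUI. workload_path is a hypothetical helper.

```python
# Documented verbs and what each does, per the GPU Workloads list.
WORKLOAD_VERBS = {
    "launch": "start in place without changing other services",
    "start": "make room, then start",
    "stop": "stop the workload",
    "bounce": "restart in place",
}


def workload_path(verb: str, slug: str) -> str:
    """Build the action path for a registered full-use GPU workload."""
    if verb not in WORKLOAD_VERBS:
        raise ValueError(f"unknown workload verb: {verb!r}")
    return f"/api/do/gpu-workload-{verb}-{slug}"


print(workload_path("start", "comfy"))
```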
Workflows:
POST http://arcangle:10101/api/do/profile-gpu-inference
Apply GPU Inference profile.
POST http://arcangle:10101/api/do/profile-gpu-turbo
Apply experimental GPU Turbo profile.
POST http://arcangle:10101/api/do/profile-comfy-work
Apply Comfy Work profile.
POST http://arcangle:10101/api/do/profile-free-vram
Apply Free VRAM profile.
POST http://arcangle:10101/api/do/retask-gpu
Move inference and dictation to CPU, then start ComfyUI.
POST http://arcangle:10101/api/do/restore-gpu
Unload ComfyUI models, restore GPU inference and voice.
POST http://arcangle:10101/api/do/free-vram
Move inference and dictation to CPU and unload ComfyUI models without stopping ComfyUI.
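Workflows stop and start several services, so a script that triggers one should wait before issuing its next action. A minimal sketch, assuming /api/status returns the same JSON as the state file (including the top-level transitioning flag and progress.label): poll until transitioning is false or a timeout expires. The fetch function is injectable so the loop can be exercised without a live server; both function names are hypothetical.

```python
import json
import time
from typing import Callable, Dict
from urllib.request import urlopen

STATUS_URL = "http://arcangle:10101/api/status"


def fetch_status(url: str = STATUS_URL) -> Dict:
    """Fetch dashboard state; assumes the body matches the state-file JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_until_settled(fetch: Callable[[], Dict] = fetch_status,
                       timeout_s: float = 300.0,
                       poll_s: float = 2.0) -> Dict:
    """Poll until "transitioning" is false; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = fetch()
        if not state.get("transitioning", False):
            return state
        if time.monotonic() >= deadline:
            label = state.get("progress", {}).get("label", "still transitioning")
            raise TimeoutError(label)
        time.sleep(poll_s)
```

Usage: POST /api/do/retask-gpu, then call wait_until_settled() before reading gpu_workload.status.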
Model Cache:
GET http://arcangle:10101/api/model-cache
Current inference/voice model cache retouch settings and runtime state.
POST http://arcangle:10101/api/model-cache
Set enabled and interval_seconds for warm_models.sh retouching.
POST http://arcangle:10101/api/do/model-cache-warm
Run warm_models.sh immediately.
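In the state file on this page, next_run_due_at (19:06:02) equals last_run_at (19:04:02) plus interval_seconds (120), which suggests the retouch schedule is a fixed interval measured from the last run. That relationship is inferred from a single sample, not documented. The arithmetic with Python's datetime, using a hypothetical next_run_due helper:

```python
from datetime import datetime, timedelta


def next_run_due(last_run_at: str, interval_seconds: int) -> str:
    """Return last_run_at + interval_seconds as an ISO-8601 string."""
    last = datetime.fromisoformat(last_run_at)  # keeps the -04:00 offset
    return (last + timedelta(seconds=interval_seconds)).isoformat()


print(next_run_due("2026-06-12T19:04:02-04:00", 120))  # 2026-06-12T19:06:02-04:00
```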
Logs:
GET http://arcangle:10101/logs?service=llama&lines=240
Inference server log.
GET http://arcangle:10101/logs?service=voice&lines=120
Voice orchestration log.
GET http://arcangle:10101/logs?service=kokoro&lines=120
Kokoro log.
GET http://arcangle:10101/logs?service=kokoro_reader&lines=120
Kokoro Reader log.
GET http://arcangle:10101/logs?service=comfy&lines=120
ComfyUI log.
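The logs endpoint takes a service name and a line count as query parameters. The sketch below builds those URLs for the five service names documented here and rejects anything else; that restriction is a choice made for the example, since the server may accept other names. logs_url is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "http://arcangle:10101"
LOG_SERVICES = ("llama", "voice", "kokoro", "kokoro_reader", "comfy")


def logs_url(service: str, lines: int = 120, base_url: str = BASE_URL) -> str:
    """Build a /logs URL for one documented service."""
    if service not in LOG_SERVICES:
        raise ValueError(f"unknown log service: {service!r}")
    if lines <= 0:
        raise ValueError("lines must be positive")
    return f"{base_url}/logs?{urlencode({'service': service, 'lines': lines})}"


print(logs_url("llama", 240))
```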
Examples:
curl -X POST http://arcangle:10101/api/do/llama-gpu-turbo
curl -X POST http://arcangle:10101/api/do/llama-cpu
curl -X POST http://arcangle:10101/api/do/llama-stop
curl -X POST http://arcangle:10101/api/do/voice-cpu
curl -X POST http://arcangle:10101/api/do/kokoro-stop
curl http://arcangle:10101/api/status
Cache Inference/Voice Models
model cache disabled
GPU owner: llama.cpp + Whisper
Inference route: gpu / running
Voice route: gpu / running
TTS route: speaches / running
llama gpu health check passed
0 operation(s)
GPU Load / VRAM Pressure - 5 Min (VRAM pressure, GPU load, Power): no samples
GPU Load / VRAM Pressure - 1 Hour (VRAM pressure, GPU load, Power): no samples
GPU Processes
| PID | Name | VRAM | Command (ps aux) |
|---|---|---|---|
| 1157395 | Whisper dictation | 1030 MB (3.16%) | uriel 1157395 0.5 4.3 100794656 5751288 ? SLsl Jun30 4:41 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app |
| 1434466 | llama.cpp | 27334 MB (83.83%) | uriel 1434466 2.8 16.0 100095040 21106456 ? Sl Jun30 22:35 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {"preserve_thinking":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20 |
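The VRAM percentages in the GPU Processes table appear to be plain ratios of each process's memory to the card's memory_total_mb (32607 MB), rounded to two decimals; the state file's memory_used_percent and power_percent fit the same formula. This is inferred from the reported numbers, not from ArcControl's source. The nvidia fields arrive as strings in the state file, so they are converted first.

```python
def percent(used: float, total: float) -> float:
    """Share of total as a percentage, rounded to two decimals."""
    return round(used / total * 100, 2)


GPU_TOTAL_MB = float("32607")  # memory_total_mb reported for the RTX 5090

print(percent(1030, GPU_TOTAL_MB))                 # Whisper dictation: 3.16
print(percent(27334, GPU_TOTAL_MB))                # llama.cpp: 83.83
print(percent(float("28393"), GPU_TOTAL_MB))       # whole card: 87.08
print(percent(float("58.87"), float("575.00")))    # power draw vs limit: 10.24
```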
Inference
Status: running
Mode: gpu
HTTP: ok
PID: 1156162
Health: 19091
Start with ArcControl
Dictation
Status: running
Mode: gpu
Port: 8000
Start with ArcControl
TTS
Status: stopped
Mode: stopped
PID: None
Health: 8880
Start with ArcControl
Llama Proxy
Status: running
Port: 19090
Backend: http://arcangle:19091
PID: 1650305
Start with ArcControl
TTS Cleaner
Status: running
Port: 8881
LLM: http://arcangle:19090
PID: 3999358
Start with ArcControl
Kokoro Reader
Status: running
Port: 9999
TTS API: http://arcangle:8881
PID: 14846
Start with ArcControl
ComfyUI
Status: stopped
Mode: stopped
PID:
URL: 8188
Start with ArcControl
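The sidecar cards show how requests chain through local ports: Kokoro Reader talks to TTS Cleaner, TTS Cleaner forwards to port 8000 and uses the Llama Proxy as its LLM, and the proxy forwards to the inference server on 19091. The sketch below recovers those links from the backend and llm_backend fields of the state file. STATE is a trimmed copy of that file, and backend_edges is a hypothetical helper.

```python
# Trimmed copy of the relevant state-file entries.
STATE = {
    "inference": {"url": "http://arcangle:19091"},
    "speech_input": {"url": "http://arcangle:8000"},
    "sidecars": {
        "llama_proxy": {"url": "http://arcangle:19090", "backend": "http://arcangle:19091"},
        "ttscleaner": {"url": "http://arcangle:8881", "backend": "http://arcangle:8000",
                       "llm_backend": "http://arcangle:19090"},
        "kokoro_reader": {"url": "http://arcangle:9999", "backend": "http://arcangle:8881"},
    },
}


def backend_edges(state):
    """List (sidecar, link field, target service) for every backend link."""
    # Map known service URLs back to their state-file names.
    names = {state["inference"]["url"]: "inference",
             state["speech_input"]["url"]: "speech_input"}
    for name, card in state["sidecars"].items():
        names[card["url"]] = name
    edges = []
    for name, card in state["sidecars"].items():
        for field in ("backend", "llm_backend"):
            url = card.get(field)
            if url:
                # Fall back to the raw URL if it is not a known service.
                edges.append((name, field, names.get(url, url)))
    return edges


for edge in backend_edges(STATE):
    print(*edge)
```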
Inference Log
[35177] 767.54.752.059 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 767.54.754.439 I reasoning-budget: activated, budget=0 tokens
[35177] 767.54.754.441 I reasoning-budget: budget=0, forcing immediately
[35177] 767.54.754.441 I reasoning-budget: forced sequence complete, done
[35177] 767.54.754.558 I slot launch_slot_: id 0 | task 45479 | processing task, is_child = 0
[35177] 767.55.698.807 I slot print_timing: id 0 | task 45479 | prompt eval time = 75.22 ms / 27 tokens ( 2.79 ms per token, 358.97 tokens per second)
[35177] 767.55.698.810 I slot print_timing: id 0 | task 45479 | eval time = 869.01 ms / 57 tokens ( 15.25 ms per token, 65.59 tokens per second)
[35177] 767.55.698.810 I slot print_timing: id 0 | task 45479 | total time = 944.22 ms / 84 tokens
[35177] 767.55.698.811 I slot print_timing: id 0 | task 45479 | graphs reused = 43670
[35177] 767.55.700.088 I slot release: id 0 | task 45479 | stop processing: n_tokens = 43014, truncated = 0
[35177] 767.55.700.284 I srv update_slots: all slots are idle
792.58.655.693 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.13.542.376 I srv params_from_: Chat format: peg-native
[35177] 768.13.587.382 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 768.13.589.675 I reasoning-budget: activated, budget=0 tokens
[35177] 768.13.589.677 I reasoning-budget: budget=0, forcing immediately
[35177] 768.13.589.677 I reasoning-budget: forced sequence complete, done
[35177] 768.13.589.780 I slot launch_slot_: id 0 | task 45539 | processing task, is_child = 0
[35177] 768.14.878.815 I slot print_timing: id 0 | task 45539 | prompt eval time = 84.39 ms / 60 tokens ( 1.41 ms per token, 711.02 tokens per second)
[35177] 768.14.878.819 I slot print_timing: id 0 | task 45539 | eval time = 1204.63 ms / 80 tokens ( 15.06 ms per token, 66.41 tokens per second)
[35177] 768.14.878.820 I slot print_timing: id 0 | task 45539 | total time = 1289.01 ms / 140 tokens
[35177] 768.14.878.820 I slot print_timing: id 0 | task 45539 | graphs reused = 43748
[35177] 768.14.880.076 I slot release: id 0 | task 45539 | stop processing: n_tokens = 43153, truncated = 0
[35177] 768.14.880.272 I srv update_slots: all slots are idle
793.00.328.693 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.15.228.704 I srv params_from_: Chat format: peg-native
[35177] 768.15.282.367 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 1.000
[35177] 768.15.285.056 I reasoning-budget: activated, budget=0 tokens
[35177] 768.15.285.059 I reasoning-budget: budget=0, forcing immediately
[35177] 768.15.285.059 I reasoning-budget: forced sequence complete, done
[35177] 768.15.285.183 I slot launch_slot_: id 0 | task 45622 | processing task, is_child = 0
[35177] 768.15.366.817 I slot create_check: id 0 | task 45622 | created context checkpoint 10 of 32 (pos_min = 43154, pos_max = 43154, n_tokens = 43155, size = 149.626 MiB)
[35177] 768.16.637.466 I slot print_timing: id 0 | task 45622 | prompt eval time = 140.72 ms / 57 tokens ( 2.47 ms per token, 405.05 tokens per second)
[35177] 768.16.637.469 I slot print_timing: id 0 | task 45622 | eval time = 1211.54 ms / 80 tokens ( 15.14 ms per token, 66.03 tokens per second)
[35177] 768.16.637.469 I slot print_timing: id 0 | task 45622 | total time = 1352.27 ms / 137 tokens
[35177] 768.16.637.470 I slot print_timing: id 0 | task 45622 | graphs reused = 43825
[35177] 768.16.638.700 I slot release: id 0 | task 45622 | stop processing: n_tokens = 43289, truncated = 0
[35177] 768.16.638.885 I srv update_slots: all slots are idle
793.02.158.713 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.17.050.717 I srv params_from_: Chat format: peg-native
[35177] 768.17.102.733 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
[35177] 768.17.105.080 I reasoning-budget: activated, budget=0 tokens
[35177] 768.17.105.082 I reasoning-budget: budget=0, forcing immediately
[35177] 768.17.105.082 I reasoning-budget: forced sequence complete, done
[35177] 768.17.105.197 I slot launch_slot_: id 0 | task 45705 | processing task, is_child = 0
[35177] 768.17.256.573 I slot create_check: id 0 | task 45705 | created context checkpoint 11 of 32 (pos_min = 43466, pos_max = 43466, n_tokens = 43467, size = 149.626 MiB)
[35177] 768.18.810.441 I slot print_timing: id 0 | task 45705 | n_decoded = 100, tg = 65.30 t/s
[35177] 768.20.103.277 I slot print_timing: id 0 | task 45705 | prompt eval time = 173.74 ms / 182 tokens ( 0.95 ms per token, 1047.52 tokens per second)
[35177] 768.20.103.281 I slot print_timing: id 0 | task 45705 | eval time = 2824.32 ms / 185 tokens ( 15.27 ms per token, 65.50 tokens per second)
[35177] 768.20.103.281 I slot print_timing: id 0 | task 45705 | total time = 2998.06 ms / 367 tokens
[35177] 768.20.103.282 I slot print_timing: id 0 | task 45705 | graphs reused = 44007
[35177] 768.20.104.558 I slot release: id 0 | task 45705 | stop processing: n_tokens = 43655, truncated = 0
[35177] 768.20.104.764 I srv update_slots: all slots are idle
793.06.770.559 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 768.21.630.462 I srv params_from_: Chat format: peg-native
[35177] 768.21.680.136 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.845 (> 0.100 thold), f_keep = 0.716
[35177] 768.21.682.419 I reasoning-budget: activated, budget=0 tokens
[35177] 768.21.682.421 I reasoning-budget: budget=0, forcing immediately
[35177] 768.21.682.422 I reasoning-budget: forced sequence complete, done
[35177] 768.21.682.531 I slot launch_slot_: id 0 | task 45893 | processing task, is_child = 0
[35177] 768.21.682.555 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [43466, 43466] against 31267...
[35177] 768.21.682.557 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [43154, 43154] against 31267...
[35177] 768.21.682.557 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [42815, 42815] against 31267...
[35177] 768.21.682.558 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [42456, 42456] against 31267...
[35177] 768.21.682.558 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [42188, 42188] against 31267...
[35177] 768.21.682.559 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [41914, 41914] against 31267...
[35177] 768.21.682.559 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [41602, 41602] against 31267...
[35177] 768.21.682.560 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [41225, 41225] against 31267...
[35177] 768.21.682.560 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [40713, 40713] against 31267...
[35177] 768.21.682.560 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [40258, 40258] against 31267...
[35177] 768.21.682.561 I slot update_slots: id 0 | task 45893 | Checking checkpoint with [39925, 39925] against 31267...
[35177] 768.21.682.561 W slot update_slots: id 0 | task 45893 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
[35177] 768.21.682.566 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 39925, pos_max = 39925, n_tokens = 39926, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.693.936 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 40258, pos_max = 40258, n_tokens = 40259, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.707.553 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 40713, pos_max = 40713, n_tokens = 40714, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.720.686 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 41225, pos_max = 41225, n_tokens = 41226, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.734.447 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 41602, pos_max = 41602, n_tokens = 41603, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.744.172 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 41914, pos_max = 41914, n_tokens = 41915, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.753.458 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 42188, pos_max = 42188, n_tokens = 42189, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.765.676 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 42456, pos_max = 42456, n_tokens = 42457, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.776.975 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 42815, pos_max = 42815, n_tokens = 42816, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.788.948 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 43154, pos_max = 43154, n_tokens = 43155, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.21.800.354 W slot update_slots: id 0 | task 45893 | erased invalidated context checkpoint (pos_min = 43466, pos_max = 43466, n_tokens = 43467, n_swa = 0, pos_next = 0, size = 149.626 MiB)
[35177] 768.25.170.633 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 12288, progress = 0.33, t = 3.49 s / 3522.85 tokens per second
[35177] 768.25.775.502 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 14336, progress = 0.39, t = 4.09 s / 3502.60 tokens per second
[35177] 768.26.391.021 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 16384, progress = 0.44, t = 4.71 s / 3479.68 tokens per second
[35177] 768.27.013.537 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 18432, progress = 0.50, t = 5.33 s / 3457.52 tokens per second
[35177] 768.27.646.817 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 20480, progress = 0.55, t = 5.96 s / 3433.78 tokens per second
[35177] 768.28.288.980 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 22528, progress = 0.61, t = 6.61 s / 3410.01 tokens per second
[35177] 768.28.942.291 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 24576, progress = 0.66, t = 7.26 s / 3385.24 tokens per second
[35177] 768.29.610.285 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 26624, progress = 0.72, t = 7.93 s / 3358.33 tokens per second
[35177] 768.30.296.166 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 28672, progress = 0.77, t = 8.61 s / 3328.68 tokens per second
[35177] 768.30.997.546 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 30720, progress = 0.83, t = 9.32 s / 3297.91 tokens per second
[35177] 768.31.715.129 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 32768, progress = 0.89, t = 10.03 s / 3266.16 tokens per second
[35177] 768.32.453.339 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 34816, progress = 0.94, t = 10.77 s / 3232.44 tokens per second
[35177] 768.33.149.036 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 36499, progress = 0.99, t = 11.47 s / 3183.10 tokens per second
[35177] 768.33.282.533 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 36975, progress = 1.00, t = 11.60 s / 3187.50 tokens per second
[35177] 768.33.361.321 I slot create_check: id 0 | task 45893 | created context checkpoint 1 of 32 (pos_min = 36974, pos_max = 36974, n_tokens = 36975, size = 149.626 MiB)
[35177] 768.33.381.838 I slot print_timing: id 0 | task 45893 | prompt processing, n_tokens = 37011, progress = 1.00, t = 11.70 s / 3163.52 tokens per second
[35177] 768.33.491.888 I slot print_timing: id 0 | task 45893 | prompt eval time = 11730.70 ms / 37015 tokens ( 0.32 ms per token, 3155.39 tokens per second)
[35177] 768.33.491.891 I slot print_timing: id 0 | task 45893 | eval time = 78.64 ms / 6 tokens ( 13.11 ms per token, 76.30 tokens per second)
[35177] 768.33.491.891 I slot print_timing: id 0 | task 45893 | total time = 11809.34 ms / 37021 tokens
[35177] 768.33.491.892 I slot print_timing: id 0 | task 45893 | graphs reused = 44011
[35177] 768.33.493.020 I slot release: id 0 | task 45893 | stop processing: n_tokens = 37020, truncated = 0
[35177] 768.33.493.208 I srv update_slots: all slots are idle
820.19.685.429 I srv proxy_reques: proxying request to model qwen36-27b-neocoder-q4-preserve on port 35177
[35177] 795.34.541.915 I srv params_from_: Chat format: peg-native
[35177] 795.34.590.685 I slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.999 (> 0.100 thold), f_keep = 0.999
[35177] 795.34.593.418 I reasoning-budget: activated, budget=0 tokens
[35177] 795.34.593.420 I reasoning-budget: budget=0, forcing immediately
[35177] 795.34.593.420 I reasoning-budget: forced sequence complete, done
[35177] 795.34.593.523 I slot launch_slot_: id 0 | task 45920 | processing task, is_child = 0
[35177] 795.34.593.549 I slot update_slots: id 0 | task 45920 | Checking checkpoint with [36974, 36974] against 36995...
[35177] 795.34.616.337 W slot update_slots: id 0 | task 45920 | restored context checkpoint (pos_min = 36974, pos_max = 36974, n_tokens = 36975, n_past = 36975, size = 149.626 MiB)
[35177] 795.34.751.012 I slot print_timing: id 0 | task 45920 | prompt eval time = 75.95 ms / 40 tokens ( 1.90 ms per token, 526.64 tokens per second)
[35177] 795.34.751.016 I slot print_timing: id 0 | task 45920 | eval time = 81.52 ms / 6 tokens ( 13.59 ms per token, 73.60 tokens per second)
[35177] 795.34.751.016 I slot print_timing: id 0 | task 45920 | total time = 157.47 ms / 46 tokens
[35177] 795.34.751.017 I slot print_timing: id 0 | task 45920 | graphs reused = 44015
[35177] 795.34.752.409 I slot release: id 0 | task 45920 | stop processing: n_tokens = 37020, truncated = 0
[35177] 795.34.752.636 I srv update_slots: all slots are idle
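Each completed request in the inference log ends with print_timing lines for prompt evaluation and generation. A small parser can pull out the millisecond and token counts and recompute throughput as tokens / (ms / 1000). The regular expression is fitted to the lines in this excerpt, not to a documented llama.cpp format, and recomputed rates can differ from the logged ones in the last digit because the logged milliseconds are already rounded. parse_timing is a hypothetical helper.

```python
import re

# Fitted to lines such as
# "... | task 45479 | eval time = 869.01 ms / 57 tokens ( ... )"
TIMING_RE = re.compile(
    r"task (?P<task>\d+) \| (?P<kind>prompt eval|eval) time = "
    r"(?P<ms>[\d.]+) ms / (?P<tokens>\d+) tokens"
)


def parse_timing(line):
    """Return (task, kind, ms, tokens, tokens_per_second), or None if no match."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    ms = float(m["ms"])
    tokens = int(m["tokens"])
    return int(m["task"]), m["kind"], ms, tokens, round(tokens / (ms / 1000), 2)


line = ("[35177] 767.55.698.810 I slot print_timing: id 0 | task 45479 | "
        "eval time = 869.01 ms / 57 tokens ( 15.25 ms per token, 65.59 tokens per second)")
print(parse_timing(line))  # (45479, 'eval', 869.01, 57, 65.59)
```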
Other Logs
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:16-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:18-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:21-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:23-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:24-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:26-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:27-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:28-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:30-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:31-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:31-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -
2026-07-01T04:42:33-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=5 HTTP/1.1" 200 -
2026-07-01T04:42:34-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /favicon/favicon.ico HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
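The access log in Other Logs is almost entirely the dashboard polling /api/status and /api/gpu-history every few seconds. Counting requests per method and path makes that pattern visible. The regular expression is fitted to the line format in this excerpt and is an assumption beyond it; tally_paths is a hypothetical helper.

```python
import re
from collections import Counter

# Fitted to lines such as:
# 2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -
ACCESS_RE = re.compile(
    r'^(?P<ts>\S+) http (?P<client>\S+) "(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def tally_paths(lines):
    """Count requests per (method, path); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = ACCESS_RE.match(line)
        if m:
            counts[(m["method"], m["path"])] += 1
    return counts


sample = [
    '2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -',
    '2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/status HTTP/1.1" 200 -',
    '2026-07-01T04:42:13-04:00 http 127.0.0.1 "GET /api/gpu-history?minutes=60 HTTP/1.1" 200 -',
    '2026-07-01T04:42:36-04:00 http 127.0.0.1 "GET /favicon/favicon.ico HTTP/1.1" 200 -',
]
print(tally_paths(sample).most_common())
```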
State File
{
"transitioning": false,
"gpu_state": "inference_available",
"message": "llama gpu health check passed",
"inference": {
"status": "running",
"mode": "gpu",
"pid": 1156162,
"launcher_pid": null,
"pgid": 1156162,
"port": 19091,
"url": "http://arcangle:19091",
"health_url": "http://arcangle:19091/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"status\":\"ok\"}"
},
"log": "/servers/run/state/logs/llama.log",
"model_profile": "qwen36-27b-neocoder-q4-preserve",
"model_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf"
},
"speech_input": {
"status": "running",
"mode": "gpu",
"port": 8000,
"url": "http://arcangle:8000",
"tts": {
"service": "kokoro",
"status": "stopped",
"mode": "stopped",
"pid": null,
"launcher_pid": null,
"pgid": 1776540,
"port": 8880,
"url": "http://arcangle:8880",
"health_url": "http://arcangle:8880/health",
"port_listening": false,
"http_ok": false,
"http_probe": {
"ok": false,
"error": "port closed"
},
"log": "/servers/run/state/logs/kokoro.log"
},
"note": "Port 8000 remains the dictation endpoint; GPU dictation is served by /servers/speaches.",
"log": "/servers/run/state/logs/voice.log",
"containers": {
"vad-shim": false,
"speaches": true,
"whisper-stt": false
}
},
"text_to_speech": {
"service": "kokoro",
"status": "stopped",
"mode": "stopped",
"pid": null,
"launcher_pid": null,
"pgid": 1776540,
"port": 8880,
"url": "http://arcangle:8880",
"health_url": "http://arcangle:8880/health",
"port_listening": false,
"http_ok": false,
"http_probe": {
"ok": false,
"error": "port closed"
},
"log": "/servers/run/state/logs/kokoro.log"
},
"gpu_workload": {
"service": null,
"status": "stopped",
"mode": "stopped",
"pid": null,
"pgid": null,
"port": 8188,
"url": "http://arcangle:8188",
"log": "/servers/run/state/logs/comfy.log"
},
"gpu_workloads": {
"file": "/servers/run/state/gpu-workloads.json",
"items": [
{
"label": "ComfyUI",
"slug": "comfy",
"description": "ComfyUI image/video workflow",
"full_use": true,
"status": "stopped",
"mode": "stopped",
"pid": null,
"pgid": null,
"port": 8188,
"url": "http://arcangle:8188",
"log": "/servers/run/state/logs/comfy.log"
}
]
},
"sidecars": {
"llama_proxy": {
"service": "llama_proxy",
"label": "Llama Proxy",
"status": "running",
"mode": "proxy",
"pid": 1650305,
"pgid": 1650305,
"port": 19090,
"url": "http://arcangle:19090",
"health_url": "http://arcangle:19090/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"status\":\"ok\"}"
},
"backend": "http://arcangle:19091",
"log": "/servers/run/state/logs/llama-proxy.log"
},
"ttscleaner": {
"service": "ttscleaner",
"label": "TTS Cleaner",
"status": "running",
"mode": "proxy",
"pid": 3999358,
"port": 8881,
"url": "http://arcangle:8881",
"health_url": "http://arcangle:8881/health",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "{\"message\":\"OK\"}"
},
"backend": "http://arcangle:8000",
"llm_backend": "http://arcangle:19090",
"log": "/servers/ttscleaner/proxy.log"
},
"kokoro_reader": {
"service": "kokoro_reader",
"label": "Kokoro Reader",
"status": "running",
"mode": "reader",
"pid": 14846,
"port": 9999,
"url": "http://arcangle:9999",
"health_url": "http://arcangle:9999",
"port_listening": true,
"http_ok": true,
"http_probe": {
"ok": true,
"status": 200,
"body": "<!doctype html>\n<html lang=\"en\">\n <head>\n <meta charset=\"utf-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n <meta name=\"theme-color\" content=\"#101820\">\n <title>Kokoro Reader</title>\n <link rel=\"manifest\" href=\"/s"
},
"backend": "http://arcangle:8881",
"log": "/servers/kokoro-reader/kokoro-reader.log"
}
},
"nvidia": {
"available": true,
"gpus": [
{
"name": "NVIDIA GeForce RTX 5090",
"memory_used_mb": "28393",
"memory_total_mb": "32607",
"gpu_util_percent": "0",
"power_draw_w": "58.87",
"power_limit_w": "575.00",
"temperature_c": "41",
"memory_used_percent": 87.08,
"power_percent": 10.24,
"util_percent": 0.0
}
],
"processes": {
"available": true,
"items": [
{
"pid": 1157395,
"process_name": "/home/ubuntu/speaches/.venv/bin/python",
"friendly_name": "Whisper dictation",
"used_gpu_memory_mb": 1030.0,
"gpu_capacity_mb": 32607.0,
"used_gpu_memory_percent": 3.16,
"ps_aux": "uriel 1157395 0.5 4.3 100794656 5751288 ? SLsl Jun30 4:41 /home/ubuntu/speaches/.venv/bin/python /home/ubuntu/speaches/.venv/bin/uvicorn --factory speaches.main:create_app"
},
{
"pid": 1434466,
"process_name": "/servers/llcpp6/build/bin/llama-server",
"friendly_name": "llama.cpp",
"used_gpu_memory_mb": 27334.0,
"gpu_capacity_mb": 32607.0,
"used_gpu_memory_percent": 83.83,
"ps_aux": "uriel 1434466 2.8 16.0 100095040 21106456 ? Sl Jun30 22:35 /servers/llcpp6/build/bin/llama-server --cache-idle-slots --chat-template-file /models/Qwen-3.6-27b-heretic-neocode/chat_template.jinja --chat-template-kwargs {\"preserve_thinking\":true} --context-shift --host 127.0.0.1 --jinja --keep 4096 --mmproj-offload --op-offload --port 35177 --slot-save-path /servers/run/state/llama-slot-cache --no-ui --warmup --alias qwen36-27b-neocoder-q4-preserve --ctx-size 262144 --cache-ram 32768 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn auto --kv-unified --log-verbosity 3 --model /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf --mmproj /models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf --n-gpu-layers 999 --parallel 1 --reasoning on --threads 20 --threads-batch 20"
}
]
}
},
"routing": {
"gpu_owner": "llama.cpp + Whisper",
"inference_route": "gpu / running",
"voice_route": "gpu / running",
"tts_route": "speaches / running"
},
"progress": {
"active": false,
"percent": 100,
"label": "llama gpu health check passed",
"operations": []
},
"model_cache": {
"comfy_cpu_limit_percent": 80,
"enabled": false,
"gpu_util_limit_percent": 40,
"guard": {
"blocked": false,
"blocked_reasons": [],
"comfy_cpu_percent": 0.0,
"gpu_util_percent": 0.0
},
"guard_window_seconds": 120,
"interval_seconds": 120,
"last_blocked_at": "2026-07-01T04:15:30-04:00",
"last_exit_code": 0,
"last_finished_at": "2026-06-12T19:04:02-04:00",
"last_run_at": "2026-06-12T19:04:02-04:00",
"last_started_at": "2026-06-12T19:04:01-04:00",
"lists": {
"inference": "/servers/run/cache/inference-active.txt",
"stt": "/servers/run/cache/stt.txt",
"tts": "/servers/run/cache/tts.txt"
},
"message": "model cache disabled",
"next_run_due_at": "2026-06-12T19:06:02-04:00",
"running": false,
"script": "/servers/run/warm_models.sh",
"settings": {
"enabled": false,
"interval_seconds": 120
},
"updated_at": "2026-07-01T04:42:33-04:00"
},
"startup_services": {
"file": "/servers/run/state/startup-services.json",
"services": [
{
"key": "inference",
"label": "Inference",
"description": "Start selected llama GPU model when ArcControl starts.",
"action": "llama-gpu",
"enabled": true
},
{
"key": "speech_input",
"label": "Dictation",
"description": "Start GPU dictation/Speaches when ArcControl starts.",
"action": "voice-gpu",
"enabled": false
},
{
"key": "text_to_speech",
"label": "Standalone Kokoro",
"description": "Start standalone Kokoro GPU service when ArcControl starts.",
"action": "kokoro-gpu",
"enabled": false
},
{
"key": "sidecars.llama_proxy",
"label": "Llama Proxy",
"description": "Start llama logging proxy when ArcControl starts.",
"action": "llama-proxy-start",
"enabled": true
},
{
"key": "sidecars.ttscleaner",
"label": "TTS Cleaner",
"description": "Start TTS Cleaner when ArcControl starts.",
"action": "ttscleaner-start",
"enabled": true
},
{
"key": "sidecars.kokoro_reader",
"label": "Kokoro Reader",
"description": "Start Kokoro Reader when ArcControl starts.",
"action": "kokoro-reader-start",
"enabled": true
},
{
"key": "gpu_workload.comfy",
"label": "ComfyUI",
"description": "Start ComfyUI when ArcControl starts.",
"action": "gpu-workload-launch-comfy",
"enabled": false
}
],
"updated_at": "2026-06-30T15:01:58-04:00"
},
"model_profiles": {
"active": "qwen36-27b-neocoder-q4-preserve",
"active_label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"default": "qwen36-27b-heretic-preserve",
"default_label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
"items": [
{
"slug": "gemma4-12b-q4-heretic",
"label": "Gemma 4 12B Q4 Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-12b-q4-heretic",
"model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
"size_gb": 6.9,
"size_label": "6.9 GB",
"cache_files": [
"/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q4_K_M.gguf",
"/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-12b-q4-heretic.json"
},
{
"slug": "gemma4-12b-q6-heretic",
"label": "Gemma 4 12B Q6 Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-12b-q6-heretic",
"model_path": "/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
"size_gb": 9.1,
"size_label": "9.1 GB",
"cache_files": [
"/models/gemma-4-12b/Gemma-4-12B-it-heretic-Q6_K.gguf",
"/models/gemma-4-12b/mmproj-Gemma-4-12B-it-BF16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-12b-q6-heretic.json"
},
{
"slug": "gemma4-26b-a4b-heretic",
"label": "gemma4-26B-A4B-heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/gemma4-26b-a4b-heretic",
"model_path": "/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
"size_gb": 15.6,
"size_label": "15.6 GB",
"cache_files": [
"/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.i1-Q4_K_M.gguf",
"/models/mradermacher/gemma-4-26B-A4B-it-heretic-ara-v2.mmproj-f16.gguf"
],
"path": "/servers/run/model-profiles/gemma4-26b-a4b-heretic.json"
},
{
"slug": "qwen36-27b-heretic-preserve",
"label": "Qwen 3.6 27B Heretic (Preserve Thinking)",
"description": "",
"active": false,
"default": true,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-heretic-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"size_gb": 16.0,
"size_label": "16.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-heretic-preserve.json"
},
{
"slug": "qwen36-27b-heretic",
"label": "Qwen 3.6 27B Heretic",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-heretic",
"model_path": "/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"size_gb": 16.0,
"size_label": "16.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-heretic.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-nl-preserve",
"label": "Qwen 3.6 27B NeoCode IQ4_NL (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"size_gb": 15.0,
"size_label": "15.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-nl",
"label": "Qwen 3.6 27B NeoCode IQ4_NL",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-nl",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"size_gb": 15.0,
"size_label": "15.0 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_NL.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-nl.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-xs-preserve",
"label": "Qwen 3.6 27B NeoCode IQ4_XS (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"size_gb": 14.3,
"size_label": "14.3 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-iq4-xs",
"label": "Qwen 3.6 27B NeoCode IQ4_XS",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-iq4-xs",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"size_gb": 14.3,
"size_label": "14.3 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-IQ4_XS.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-iq4-xs.json"
},
{
"slug": "qwen36-27b-neocoder-q4-preserve",
"label": "Qwen 3.6 27B NeoCode Q4 (Preserve Thinking)",
"description": "",
"active": true,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q4-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"size_gb": 15.7,
"size_label": "15.7 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-q4",
"label": "Qwen 3.6 27B NeoCode Q4",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q4",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"size_gb": 15.7,
"size_label": "15.7 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q4_K_M.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q4.json"
},
{
"slug": "qwen36-27b-neocoder-q6-preserve",
"label": "Qwen 3.6 27B NeoCode Q6 (Preserve Thinking)",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q6-preserve",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"size_gb": 20.9,
"size_label": "20.9 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6-preserve.json"
},
{
"slug": "qwen36-27b-neocoder-q6",
"label": "Qwen 3.6 27B NeoCode Q6",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-27b-neocoder-q6",
"model_path": "/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"size_gb": 20.9,
"size_label": "20.9 GB",
"cache_files": [
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-NEO-CODE-HERE-2T-OT-Q6_K.gguf",
"/models/Qwen-3.6-27b-heretic-neocode/Qwen3.6-27B-mmproj-BF16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-27b-neocoder-q6.json"
},
{
"slug": "qwen36-unc",
"label": "Qwen3.6 35B Uncensored",
"description": "",
"active": false,
"default": false,
"build_key": "llcpp6",
"openclaw_model_ref": "arc/qwen36-unc",
"model_path": "/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
"size_gb": 19.7,
"size_label": "19.7 GB",
"cache_files": [
"/models/Q3.6/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf",
"/models/Q3.6/mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf"
],
"path": "/servers/run/model-profiles/qwen36-unc.json"
}
],
"dir": "/servers/run/model-profiles",
"state_file": "/servers/run/state/active-model-profile.json",
"default_state_file": "/servers/run/state/default-model-profile.json",
"cache_file": "/servers/run/cache/inference-active.txt"
},
"openclaw_sync": {
"status": "synchronized",
"expected_model": "qwen36-27b-neocoder-q4-preserve",
"loaded_model": "qwen36-27b-neocoder-q4-preserve",
"loaded_models": [
"qwen36-27b-neocoder-q4-preserve"
],
"expected_openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
"openclaw_model": "arc/qwen36-27b-neocoder-q4-preserve",
"synchronized": true,
"error": "",
"sessions": {
"stale_local": 0,
"stale_keys": [],
"cloud_overrides": 1,
"checked": 43
},
"state_file": "/servers/run/state/openclaw-sync-state.json"
},
"state_file": "/servers/run/state/system-state.json",
"updated_at": "2026-07-01T04:42:36-04:00",
"profiles": {
"active": "custom",
"active_label": "Custom Profile",
"items": [
{
"slug": "gpu-inference",
"label": "GPU Inference",
"description": "GPU llama, GPU dictation, GPU Kokoro; unload Comfy models if Comfy is running; model cache off.",
"desired": {
"llama": "gpu",
"voice": "gpu",
"kokoro": "gpu",
"comfy": "unload_models",
"model_cache": false
},
"momentary": false,
"active": false
},
{
"slug": "gpu-turbo",
"label": "GPU Turbo",
"description": "Experimental turboquant llama profile; GPU dictation and Kokoro; unload Comfy models if Comfy is running; model cache off.",
"desired": {
"llama": "gpu-turbo",
"voice": "gpu",
"kokoro": "gpu",
"comfy": "unload_models",
"model_cache": false
},
"momentary": false,
"active": false
},
{
"slug": "comfy-work",
"label": "Comfy Work",
"description": "CPU inference, CPU-lite dictation, Comfy running, model files kept warm.",
"desired": {
"llama": "cpu",
"voice": "cpu-lite",
"kokoro": "stopped",
"comfy": "running",
"model_cache": true
},
"momentary": false,
"active": false
},
{
"slug": "free-vram",
"label": "Free VRAM",
"description": "One-shot VRAM cleanup: CPU inference and dictation, Kokoro stopped, Comfy kept running with models unloaded.",
"desired": {
"llama": "cpu",
"voice": "cpu-lite",
"kokoro": "stopped",
"comfy": "unload_models",
"model_cache": true
},
"momentary": true,
"active": false
}
],
"watch": {
"enabled": false,
"profile": "gpu-inference",
"updated_at": "2026-06-03T18:03:26-04:00"
},
"file": "/servers/run/state/profiles.json"
}
}
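The state dump above can be read programmatically to check sync status, free VRAM, or startup services before acting on the model endpoints. Below is a minimal Python sketch built on a few assumptions:

- GET /api/status returns this same JSON shape. The API reference describes it as the current dashboard state.
- POST /api/model-profile accepts a JSON body with `slug` and `restart`. The reference only says to pass those two parameters, so the encoding is a guess.
- The helper names `fetch_status`, `summarize_status`, and `build_model_select_request` are hypothetical.

```python
import json
import urllib.request

# Base URL from the API reference. Assumption: /api/status returns the
# same JSON shape as the state dump above.
BASE_URL = "http://arcangle:10101"


def fetch_status(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /api/status and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=timeout) as resp:
        return json.load(resp)


def summarize_status(state: dict) -> dict:
    """Pull a few operational facts out of a dashboard state dict."""
    sync = state.get("openclaw_sync", {})
    profiles = state.get("model_profiles", {})
    gpus = state.get("nvidia", {}).get("gpus", [])
    # GPU memory fields arrive as strings, so convert before subtracting.
    free_mb = [
        float(g["memory_total_mb"]) - float(g["memory_used_mb"]) for g in gpus
    ]
    enabled = [
        s["key"]
        for s in state.get("startup_services", {}).get("services", [])
        if s.get("enabled")
    ]
    return {
        "active_model": profiles.get("active"),
        "default_model": profiles.get("default"),
        # Healthy only if the flag is set and expected/loaded slugs agree.
        "openclaw_synchronized": bool(sync.get("synchronized"))
        and sync.get("expected_model") == sync.get("loaded_model"),
        "gpu_free_mb": free_mb,
        "startup_enabled": enabled,
    }


def build_model_select_request(state: dict, slug: str, restart: bool = False,
                               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /api/model-profile request.

    Assumption: the endpoint accepts a JSON body with "slug" and "restart".
    The slug is validated against model_profiles.items before building.
    """
    known = {item["slug"] for item in state.get("model_profiles", {}).get("items", [])}
    if slug not in known:
        raise ValueError(f"unknown model profile: {slug!r}")
    body = json.dumps({"slug": slug, "restart": restart}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/model-profile",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `summarize_status(fetch_status())` against the dump above would report the active model `qwen36-27b-neocoder-q4-preserve` and roughly 4214 MB of free VRAM (32607 − 28393). To send a built request, pass it to `urllib.request.urlopen`. Validating the slug locally first avoids asking ArcControl to load a profile that does not exist.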