Hacker News — vinext + Cloudflare Workers

Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop (systalyze.com)

74 points by ManyaGhobadi 9 hours ago | 19 comments

xrd 55 minutes ago

I feel like this is tangential to this conversation.

Does anyone know of a good tool for "load balancing" usage across local GPUs?

Why: I have two RTX 3090s (24GB each). I've been using nvidia-smi to check usage of my RTX 3090s. Mostly I'm running llama.cpp with unsloth/Qwen3.6-27B-GGUF:Q4_K_M and getting pretty decent results for a self-hosted LLM (orchestrated via opencode). I'm surprised at how well it works for a local model. nvidia-smi is great for determining total VRAM usage, and nvtop gives a little more insight.

But I'm also running some experiments with other non-LLM models (video generation, etc.), and I want to find a way to time-slice across these GPUs, for example when my coding is paused.

This "Utilyze" tool looks like it would give me better insight into the usage of a single GPU. Can it be scripted to better utilize my GPUs across a diverse load?

Any suggestions for existing projects out there? I thought about vibe coding something, but I wonder if there's prior art.
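
One low-tech approach to this question is to poll `nvidia-smi` before launching a job and pin the job to whichever GPU currently looks least busy via `CUDA_VISIBLE_DEVICES`. The sketch below assumes the CSV output of `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`; the helper names and the 50/50 weighting are hypothetical. It is a naive placement heuristic, not true time-slicing, and it inherits the limits of the `utilization.gpu` metric discussed elsewhere in this thread.

```python
import os
import subprocess

QUERY = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=QUERY --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        idx, util, mem_used, mem_total = (f.strip() for f in line.split(","))
        gpus.append({
            "index": int(idx),
            "util": float(util),                              # percent, 0-100
            "mem_frac": float(mem_used) / float(mem_total),   # fraction of VRAM in use
        })
    return gpus


def least_busy(gpus, util_weight=0.5):
    """Hypothetical heuristic: the lowest blend of utilization and VRAM fraction wins."""
    def score(g):
        return util_weight * g["util"] / 100.0 + (1 - util_weight) * g["mem_frac"]
    return min(gpus, key=score)["index"]


def query_gpus():
    """Ask nvidia-smi for the current per-GPU readings."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def launch_on_least_busy(cmd):
    """Run `cmd` (a list of strings) pinned to the currently least-busy GPU."""
    gpu = least_busy(query_gpus())
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.run(cmd, env=env, check=True)
```

For example, `launch_on_least_busy(["python", "generate_video.py"])` (a hypothetical script name) would start the job on whichever GPU scores lowest at launch time. Because llama.cpp keeps model weights resident in VRAM even while idle, the score blends utilization with memory rather than using VRAM alone; a real scheduler would also need to confirm that the chosen GPU has enough free memory for the new job.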

Cynddl 3 hours ago

This sounds super interesting and relevant. I run a small cluster with H100s (often research projects with vLLM) and being able to see not just usage but efficiency would be great.

I don't fully understand the distinction between 100% utilisation and 1-10% real compute. Given you rely on telemetry from users to add new models, are you trying to predict how fast a model should be on vLLM, compared to how it runs in practice? What if users tweak some hyperparameters?

taupi 1 hours ago

Glad you found it interesting!

What you described is the goal of Attainable SOL, but using GPU utilization as the metric rather than throughput. We're answering "for a given model and workload, have you optimized this well enough?", where "optimized" includes hyperparameter tuning. So if someone hasn't tuned batch size, parallelism, or other knobs well for their workload, the gap between their current utilization and the Attainable SOL is what tells them there's still room to improve.

We're motivated by the fact that reaching 100% Compute SOL is impossible -- no model can run at the hardware's theoretical maximum -- but we want to provide a realistic target for optimization. We've also noticed that different model architectures have different realistic ceilings. For example, MoE models run at much lower utilization due to their sparsity. We don't expect you to retrain an MoE model to get higher utilization, and no amount of hyperparameter tuning can bring it close to 100%, so the maximum attainable SOL should be lower for that model.
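
The "gap" idea described here reduces to simple arithmetic once you have a measured utilization and a per-model ceiling. In the sketch below, `attainable` stands in for whatever Attainable SOL value the tool reports for a given model and workload; how that ceiling is derived is not described in this thread, and the numbers in the usage line are made up for illustration.

```python
def sol_gap(current, attainable):
    """Return (headroom in percentage points, fraction of the attainable ceiling reached).

    Both arguments are compute utilization percentages in [0, 100].
    `attainable` is assumed to be a per-model ceiling at or below 100%.
    """
    if not 0 < attainable <= 100:
        raise ValueError("attainable must be in (0, 100]")
    headroom = max(attainable - current, 0.0)
    reached = min(current / attainable, 1.0)
    return headroom, reached


# Hypothetical numbers: an MoE model whose realistic ceiling is 40%,
# currently running at 10% compute utilization.
headroom, reached = sol_gap(10.0, 40.0)
print(f"{headroom:.0f} points of headroom, {reached:.0%} of attainable")
```

Measuring against the per-model ceiling rather than against 100% is what keeps a well-tuned MoE deployment from looking permanently "bad".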

uberduper 5 hours ago

There are a few dimensions you can look at for GPU load. Probably the easiest indirect metric to watch is power usage.

But if you really care about this, you should actually profile your application. Nsight Systems makes this pretty simple to do. Dunno how many people actually care about having a TUI.
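
As a rough illustration of the power-based proxy suggested here, the sketch below turns the CSV output of `nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits` into a per-GPU fraction of the power limit. The 30% threshold and the helper names are hypothetical, and as the reply to this comment points out, high power draw does not by itself mean the SMs are doing useful compute.

```python
def power_fractions(csv_text):
    """Map GPU index -> power.draw / power.limit from nvidia-smi CSV (noheader, nounits)."""
    fractions = {}
    for line in csv_text.strip().splitlines():
        idx, draw, limit = (f.strip() for f in line.split(","))
        try:
            fractions[int(idx)] = float(draw) / float(limit)
        except ValueError:
            # Non-numeric placeholders (e.g. "[N/A]") when a field is unsupported.
            fractions[int(idx)] = None
    return fractions


def looks_idle(frac, threshold=0.3):
    """Hypothetical heuristic: only useful for catching drastic underutilization."""
    return frac is not None and frac < threshold


sample = "0, 310.5, 350.00\n1, 60.2, 350.00\n"
for idx, frac in power_fractions(sample).items():
    print(idx, f"{frac:.0%}", "idle?" if looks_idle(frac) else "")
```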

ManyaGhobadi 4 hours ago

Power is useful as a second-order metric and can help catch drastic underutilization, but it has similar problems to SM Active (DCGM): it tends to overestimate utilization and doesn't distinguish between useful compute and memory traffic. A memory-bound workload can easily draw high power while leaving compute underutilized. Our goal was to separate these bottlenecks so there's more visibility into where to optimize.

On nsys, agreed, it's great, but we wanted something that could run continuously rather than an offline analysis tool. We think there's room for both.
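
One standard way to reason about the compute-versus-memory distinction described above is the roofline model. This is a general technique, not necessarily how Utilyze computes its metrics. A kernel whose arithmetic intensity (FLOPs per byte moved to or from DRAM) falls below the hardware's ridge point is memory-bound: it can keep the memory system busy and power high while the compute units mostly wait. The peak numbers below are placeholders, not specs for any particular GPU.

```python
def roofline(flops, bytes_moved, peak_flops, peak_bw):
    """Classify a kernel with the roofline model.

    flops: floating-point operations performed
    bytes_moved: bytes read from / written to DRAM
    peak_flops: hardware peak FLOP/s
    peak_bw: hardware peak DRAM bandwidth in bytes/s
    Returns (bound, attainable FLOP/s).
    """
    intensity = flops / bytes_moved        # FLOP per byte
    ridge = peak_flops / peak_bw           # intensity where the two roofs meet
    attainable = min(peak_flops, intensity * peak_bw)
    bound = "memory" if intensity < ridge else "compute"
    return bound, attainable


# Placeholder hardware: 100 TFLOP/s peak, 2 TB/s bandwidth -> ridge at 50 FLOP/byte.
PEAK_FLOPS, PEAK_BW = 100e12, 2e12

# Element-wise add of two fp16 vectors with n elements:
# n FLOPs, 2n bytes per read x 2 reads + 2n bytes written = 6n bytes.
n = 1 << 20
print(roofline(n, 6 * n, PEAK_FLOPS, PEAK_BW))
```

The element-wise add lands far below the ridge point, so no amount of extra compute would speed it up; only moving fewer bytes would.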

jhgg 5 hours ago

We just track power utilization.

xtimecrystal 6 hours ago

One small suggestion: add more GPU stats to your tool.

At the moment (v0.1.3) it is more helpful for compute visualization, but the lack of memory usage/process/temperature/fan speed/etc. tracking prevents it from being a full drop-in replacement for `nvidia-smi` for me.

ManyaGhobadi 5 hours ago

We agree! We're planning a "process" or "advanced" view with temperature/power usage and per-process breakdowns. Would a separate full-page view or fitting everything onto one view be more useful for your workflows? We're still working out how to fit everything in, because it's a lot.
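
For the per-process breakdown discussed here, `nvidia-smi` can already list compute processes with `--query-compute-apps=gpu_uuid,pid,process_name,used_memory`. The sketch below groups that CSV output by GPU; the helper names are hypothetical, and it says nothing about how Utilyze's planned view will be built. Some environments may report a non-numeric `used_memory` (e.g. `[N/A]`), which the sketch records as `None`.

```python
import subprocess
from collections import defaultdict

FIELDS = "gpu_uuid,pid,process_name,used_memory"


def per_process_memory(csv_text):
    """Group `nvidia-smi --query-compute-apps=FIELDS --format=csv,noheader,nounits` rows by GPU.

    Returns {gpu_uuid: [(pid, process_name, used_mib_or_None), ...]},
    each list sorted by memory use, largest first.
    """
    by_gpu = defaultdict(list)
    for line in csv_text.strip().splitlines():
        uuid, pid, rest = line.split(",", 2)
        name, used = rest.rsplit(",", 1)   # tolerate commas inside process names
        used = used.strip()
        used_mib = int(used) if used.isdigit() else None
        by_gpu[uuid.strip()].append((int(pid), name.strip(), used_mib))
    for procs in by_gpu.values():
        procs.sort(key=lambda p: p[2] or 0, reverse=True)
    return dict(by_gpu)


def query_processes():
    """Run nvidia-smi and return the per-GPU process breakdown."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-compute-apps={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return per_process_memory(out)
```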

vogje01 1 hours ago

Looks good for now.

Will further test it.

apitman 2 hours ago

I believe recent versions of nvtop show efficiency, right?

taupi 2 hours ago

There's a new "Effective Load" metric that we've looked at -- it's derived from Power, which has the same problems we mentioned here: https://news.ycombinator.com/item?id=47925149

It's useful as a rough heuristic, but it tends to overestimate utilization. We've also noticed that power-derived metrics lag behind true utilization, because the controller that regulates power has a delayed response. This becomes especially important for spiky workloads like real-time inference.

Any tool (like nvtop) that only queries NVIDIA's NVML library does not have access to the detailed metrics that we draw upon, and therefore has to use proxies for efficiency.
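
To illustrate why a lagging, power-derived signal struggles with spiky workloads, the sketch below models it as a first-order exponential moving average of true utilization. That is a simplifying assumption for illustration, not a model of NVIDIA's actual power controller, and `alpha` is an arbitrary smoothing constant. With short bursts, the smoothed signal never reaches the true peak and keeps reporting substantial load during the idle gaps that follow.

```python
def lagged(signal, alpha=0.2, initial=0.0):
    """First-order low-pass filter: each step moves `alpha` of the way toward the input."""
    out, y = [], initial
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


# Spiky, inference-style load: 3 steps at 100% followed by 7 idle steps, repeated.
true_util = ([100] * 3 + [0] * 7) * 3
proxy = lagged(true_util)
print(f"true peak: {max(true_util)}%, proxy peak: {max(proxy):.0f}%")
print(f"first idle step: true {true_util[3]}%, proxy {proxy[3]:.0f}%")
```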

nawi 5 hours ago

Hi, many thanks. Can the open source tool run on NVIDIA Jetson and Orin, or is it just for server GPUs?

ManyaGhobadi 5 hours ago

Currently just server GPUs, but in theory it should be easy to link against the ARM64 CUDA libraries for Jetson/Orin. The only challenge would be checking whether they support all the metrics we're sampling, though anything Ampere or newer should have reasonable support.
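
The "Ampere or newer" rule of thumb corresponds to CUDA compute capability 8.0 and above (for example, Jetson AGX Orin is 8.7, while the older Jetson AGX Xavier is 7.2). A hypothetical pre-check might look like this; passing it does not guarantee that every metric Utilyze samples is exposed on a given device, which is the open question in the reply above.

```python
AMPERE = (8, 0)


def is_ampere_or_newer(compute_cap):
    """Return True if a compute capability string like "8.7" is sm_80 or newer."""
    major, minor = (int(p) for p in compute_cap.split("."))
    return (major, minor) >= AMPERE


for name, cc in [("Jetson AGX Orin", "8.7"), ("Jetson AGX Xavier", "7.2"), ("A100", "8.0")]:
    print(f"{name}: {'ok' if is_ampere_or_newer(cc) else 'pre-Ampere'}")
```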

SilentM68 3 hours ago

Great tool.

Just testing for now.

Are there any removal instructions or an uninstall function for utilyze beyond manually removing the utilyze and utlz binaries from ~/.local/bin and /usr/local/bin and cleaning up PATH in ~/.profile? In particular, how do I reverse the CAP_SYS_ADMIN capability and any other changes it made?

ManyaGhobadi 1 hours ago

If you installed CUPTI via utlz on initial startup, you can also remove it from ~/.cache/utlz. There is also a config in ~/.systalyze if you'd like to fully clean up everything. Besides those things, the steps you mentioned should be enough. We'll add this info to the README.
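
Pulling together the paths named in this exchange, a cautious cleanup helper might look like the sketch below. It only lists what exists unless `dry_run=False` is passed. The path list comes from the comments above and may be incomplete, and the sketch does not touch `~/.profile`. If CAP_SYS_ADMIN was granted with `setcap` on the binary (an assumption; the thread does not say how it is granted), the capability is stored in the file's extended attributes and disappears when the file is deleted.

```python
import shutil
from pathlib import Path


def utilyze_paths(home, prefix):
    """Install, cache, and config paths named in this thread (possibly incomplete)."""
    home, prefix = Path(home), Path(prefix)
    return [
        home / ".local" / "bin" / "utilyze",
        home / ".local" / "bin" / "utlz",
        prefix / "bin" / "utilyze",
        prefix / "bin" / "utlz",
        home / ".cache" / "utlz",   # CUPTI downloaded on first startup
        home / ".systalyze",        # config
    ]


def cleanup(home=None, prefix="/usr/local", dry_run=True):
    """Return the paths that exist; delete them only when dry_run is False."""
    home = Path.home() if home is None else home
    found = [p for p in utilyze_paths(home, prefix) if p.exists() or p.is_symlink()]
    if not dry_run:
        for p in found:
            if p.is_dir() and not p.is_symlink():
                shutil.rmtree(p)
            else:
                p.unlink()
    return found
```

Calling `cleanup()` first shows what would be removed; `cleanup(dry_run=False)` then deletes it (removing anything under /usr/local/bin needs root).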

SilentM68 1 hours ago

Thank you :)

324 2 hours ago

gdp means gross domestic product

latchkey 4 hours ago

You mention rocm-smi in your blog post, but you don't actually support AMD GPUs?

taupi 2 hours ago

AMD support is on the roadmap, but we mentioned it for now to highlight that AMD calculates their utilization metric the same way -- it's not just NVIDIA.

Rendered at 22:27:28 GMT+0000 (Coordinated Universal Time) with Cloudflare Workers.