| Service | Calls | Useful | Wasted | Avg usefulness | Trend |
|---|---|---|---|---|---|
| signal-whale-watch | 2 | 0 | 1 | ████░░░░░░40% | mixed |
| research-deep-dive | 1 | 1 | 0 | ██████████100% | high utility |
| research-token-report | 1 | 1 | 0 | █████████░90% | high utility |
| research-liquidity-health | 1 | 0 | 0 | ████░░░░░░40% | mixed |

How Celina learns

After every synthesis, the LLM retrospectively grades each paid call on a scale from 0 (wasted USDG) to 1 (directly answered the question). The grades are aggregated here and injected into the planning prompt as `servicePerformanceHistory`. Services whose average usefulness is above 70% get a "consistently useful" tag in the planner context; services whose average is below 40% get "often wastes USDG". Services in between get no tag.
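
The aggregation and tagging described above can be sketched roughly as follows. This is a minimal illustration, not Celina's actual code: the `GradedCall` and `ServiceStats` shapes, the `aggregate` and `tagFor` helpers, and the `example-low-value` service are all hypothetical. Only the 70% and 40% thresholds and the two tag strings come from the description.

```typescript
// Hypothetical shape of one retrospectively graded paid call.
interface GradedCall {
  service: string;
  grade: number; // 0 = wasted USDG, 1 = directly answered the question
}

// Hypothetical per-service entry, as it might appear in servicePerformanceHistory.
interface ServiceStats {
  service: string;
  calls: number;
  avgUsefulness: number; // mean grade in [0, 1]
  tag: string | null;    // null when the service falls between the two thresholds
}

// Thresholds and tag strings as stated in the text: above 70% and below 40%.
function tagFor(avg: number): string | null {
  if (avg > 0.7) return "consistently useful";
  if (avg < 0.4) return "often wastes USDG";
  return null;
}

// Group grades by service, then compute the mean and the planner tag.
function aggregate(calls: GradedCall[]): ServiceStats[] {
  const byService: Record<string, number[]> = {};
  for (const call of calls) {
    if (!byService[call.service]) byService[call.service] = [];
    byService[call.service].push(call.grade);
  }
  return Object.keys(byService).map((service) => {
    const grades = byService[service];
    const avg = grades.reduce((sum, g) => sum + g, 0) / grades.length;
    return { service, calls: grades.length, avgUsefulness: avg, tag: tagFor(avg) };
  });
}

// Usage: single-call grades matching the table, plus one made-up low performer.
const history = aggregate([
  { service: "research-deep-dive", grade: 1.0 },
  { service: "research-liquidity-health", grade: 0.4 },
  { service: "example-low-value", grade: 0.1 }, // hypothetical service
  { service: "example-low-value", grade: 0.3 },
]);
console.log(JSON.stringify(history, null, 2));
```

Note that a service sitting exactly at 40%, such as `research-liquidity-health` in the table, gets neither tag under this reading, because both comparisons are strict.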