Caching in LLMs - Quality Score Eviction Policy

Results


This section presents the performance evaluation results of our proposed Quality Score Eviction Policy compared to the baseline eviction policies (LRU, LFU, FIFO, RR) across different experimental configurations.

Hit Rate

The primary metric for evaluating cache performance is the hit rate - the percentage of requests served from the cache. Higher hit rates indicate better cache performance, as they reduce the need for expensive LLM API calls.
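
As a concrete illustration, the hit rate can be computed directly from the counters a cache collects during a run. The function and example numbers below are illustrative placeholders, not identifiers or figures from our implementation.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of requests served from the cache."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# Example: 603 cache hits out of 1000 requests -> 60.3% hit rate.
print(hit_rate(hits=603, misses=397))
```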

Our experiments evaluated cache performance across the following dimensions (a sketch of how such a parameter grid can be enumerated follows the list):

  • 270 unique configurations from our grid search
  • Three repetition scenarios: High, Low, and Mixed
  • Three dataset sizes: 500, 1000, and 3000 questions
  • Various cache sizes relative to dataset size
  • Different Quality Score parameters: learning rates and weight combinations
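
A minimal sketch of how such a grid can be enumerated with itertools.product is shown below. The parameter names and values are illustrative placeholders only; they are not the exact grid of 270 configurations used in our experiments.

```python
from itertools import product

# Illustrative placeholder values; the actual experimental grid differs.
scenarios      = ["high", "low", "mixed"]               # repetition scenarios
dataset_sizes  = [500, 1000, 3000]                      # number of questions
cache_ratios   = [0.05, 0.10, 0.20]                     # cache size relative to dataset size
learning_rates = [0.01, 0.10]                           # Quality Score learning rates
weights        = [(0.5, 0.5), (0.7, 0.3), (0.3, 0.7)]   # Quality Score weight combinations

configs = [
    {
        "scenario": s,
        "dataset_size": n,
        "cache_size": int(n * r),
        "learning_rate": lr,
        "weights": w,
    }
    for s, n, r, lr, w in product(scenarios, dataset_sizes, cache_ratios,
                                  learning_rates, weights)
]
print(len(configs))  # number of unique configurations in this illustrative grid
```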

Figure: Hit Rate Comparison (Mixed scenario, 500 questions). Quality Score outperforms the baselines by 24.1%.

Throughput

For throughput measurements, we used a mock LLM API that responds instantly rather than making actual inference calls, ensuring that the results reflect cache policy overhead rather than LLM latency.
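
The snippet below is a minimal sketch of this measurement setup. The mock_llm function, the cache get/put interface, and the timing loop are illustrative stand-ins for our harness, not its actual code.

```python
import time

def mock_llm(prompt: str) -> str:
    """Stand-in for the LLM API: returns immediately, so measured time
    reflects cache-policy overhead rather than model latency."""
    return f"mock answer for: {prompt}"

def measure_throughput(cache, requests):
    """Return requests per second for a given cache and request stream."""
    start = time.perf_counter()
    for prompt in requests:
        answer = cache.get(prompt)      # hit: served from the cache
        if answer is None:              # miss: call the (mock) LLM and store the result
            answer = mock_llm(prompt)
            cache.put(prompt, answer)
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed if elapsed > 0 else float("inf")
```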

The following chart shows the average throughput (requests per second) for each cache eviction policy across all experimental configurations.

Figure: Cache Policy Throughput Comparison (average requests per second by eviction policy). Quality Score achieves 3.16 req/s, maintaining competitive throughput despite its more involved eviction logic.

CPU Usage

During our experiments, CPU usage remained consistent across all cache eviction policies, averaging approximately 55% throughout the testing period.

This indicates that the computational overhead of the different eviction strategies does not significantly affect system resource utilization. The main contributor to CPU usage was likely the similarity models used for cache-key matching rather than the eviction policies themselves. This suggests that the choice of eviction policy should be driven primarily by cache performance metrics rather than by computational efficiency concerns.
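
For reference, the kind of CPU sampling behind this observation can be reproduced with psutil. This is a sketch assuming periodic interval sampling of system-wide utilization; it is not our exact monitoring code, and the sample count and interval are placeholders.

```python
import psutil

def average_cpu_usage(samples: int = 60, interval: float = 1.0) -> float:
    """Average system-wide CPU utilization (%) over `samples` intervals."""
    readings = [psutil.cpu_percent(interval=interval) for _ in range(samples)]
    return sum(readings) / len(readings)
```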