Caching in LLMs - Quality Score Eviction Policy

Results


This section presents the performance evaluation results of our proposed Quality Score Eviction Policy compared to the baseline eviction policies (LRU, LFU, FIFO, RR) across different experimental configurations.

Hit Rate

The primary metric for evaluating cache performance is the hit rate - the percentage of requests served from the cache. Higher hit rates indicate better cache performance, as they reduce the need for expensive LLM API calls.
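
As a concrete illustration, the hit rate can be computed directly from the counters a cache collects during a run. The function and example numbers below are illustrative placeholders, not identifiers or figures from our implementation.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Percentage of requests served from the cache."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# Example: 603 cache hits out of 1000 requests -> 60.3% hit rate.
print(hit_rate(hits=603, misses=397))
```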

Our experiments evaluated cache performance across the following dimensions (a sketch of how such a parameter grid can be enumerated follows the list):

  • 270 unique configurations from our grid search
  • Three repetition scenarios: High, Low, and Mixed
  • Three dataset sizes: 500, 1000, and 3000 questions
  • Various cache sizes relative to dataset size
  • Different Quality Score parameters: learning rates and weight combinations
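
A minimal sketch of how such a grid can be enumerated with itertools.product is shown below. The parameter names and values are illustrative placeholders only; they are not the exact grid of 270 configurations used in our experiments.

```python
from itertools import product

# Illustrative placeholder values; the actual experimental grid differs.
scenarios      = ["high", "low", "mixed"]               # repetition scenarios
dataset_sizes  = [500, 1000, 3000]                      # number of questions
cache_ratios   = [0.05, 0.10, 0.20]                     # cache size relative to dataset size
learning_rates = [0.01, 0.10]                           # Quality Score learning rates
weights        = [(0.5, 0.5), (0.7, 0.3), (0.3, 0.7)]   # Quality Score weight combinations

configs = [
    {
        "scenario": s,
        "dataset_size": n,
        "cache_size": int(n * r),
        "learning_rate": lr,
        "weights": w,
    }
    for s, n, r, lr, w in product(scenarios, dataset_sizes, cache_ratios,
                                  learning_rates, weights)
]
print(len(configs))  # number of unique configurations in this illustrative grid
```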

Figure: Hit Rate Comparison (Mixed scenario, 500 questions). Quality Score outperforms the baselines by 24.1%.

Throughput

For throughput measurements, we used a mock LLM API that responds instantly rather than making actual inference calls, ensuring that the results reflect cache policy overhead rather than LLM latency.
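
The snippet below is a minimal sketch of this measurement setup. The mock_llm function, the cache get/put interface, and the timing loop are illustrative stand-ins for our harness, not its actual code.

```python
import time

def mock_llm(prompt: str) -> str:
    """Stand-in for the LLM API: returns immediately, so measured time
    reflects cache-policy overhead rather than model latency."""
    return f"mock answer for: {prompt}"

def measure_throughput(cache, requests):
    """Return requests per second for a given cache and request stream."""
    start = time.perf_counter()
    for prompt in requests:
        answer = cache.get(prompt)      # hit: served from the cache
        if answer is None:              # miss: call the (mock) LLM and store the result
            answer = mock_llm(prompt)
            cache.put(prompt, answer)
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed if elapsed > 0 else float("inf")
```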

The following chart shows the average throughput (requests per second) for each cache eviction policy across all experimental configurations.

Figure: Cache Policy Throughput Comparison (average requests per second by eviction policy). Quality Score achieves 3.16 req/s, maintaining competitive throughput despite its more involved eviction logic.

CPU Usage

During our experiments, CPU usage remained consistent across all cache eviction policies, averaging approximately 55% throughout the testing period.

This indicates that the computational overhead of the different eviction strategies does not significantly affect system resource utilization. The main contributor to CPU usage was likely the similarity models used for cache-key matching rather than the eviction policies themselves. This suggests that the choice of eviction policy should be driven primarily by cache performance metrics rather than by computational efficiency concerns.
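
For reference, the kind of CPU sampling behind this observation can be reproduced with psutil. This is a sketch assuming periodic interval sampling of system-wide utilization; it is not our exact monitoring code, and the sample count and interval are placeholders.

```python
import psutil

def average_cpu_usage(samples: int = 60, interval: float = 1.0) -> float:
    """Average system-wide CPU utilization (%) over `samples` intervals."""
    readings = [psutil.cpu_percent(interval=interval) for _ in range(samples)]
    return sum(readings) / len(readings)
```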