Results
This section presents the performance evaluation results of our proposed Quality Score Eviction Policy compared to the baseline memory policies (LRU, LFU, FIFO, RR) across different experimental configurations.
Hit Rate
The primary metric for evaluating cache performance is the hit rate: the percentage of requests served from the cache. A higher hit rate indicates better cache performance, because fewer requests require an expensive LLM API call.
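As a minimal sketch of the definition above, the hit rate can be computed from two counters. The function name `hit_rate` and the request counts are illustrative assumptions, not values from our experiments.

```python
def hit_rate(hits: int, total_requests: int) -> float:
    """Return the cache hit rate as a percentage of all requests."""
    if total_requests == 0:
        return 0.0  # no requests yet, so report 0% rather than dividing by zero
    return 100.0 * hits / total_requests


# Hypothetical counts, for illustration only: 1000 requests, 420 served from cache.
rate = hit_rate(hits=420, total_requests=1000)
print(f"hit rate: {rate:.1f}%")  # prints "hit rate: 42.0%"
```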
Our experiments evaluated cache performance across:
- 270 unique configurations from our grid search
- Three repetition scenarios: High, Low, and Mixed
- Three dataset sizes: 500, 1000, and 3000 questions
- Various cache sizes relative to dataset size
- Different Quality Score parameters: learning rates and weight combinations
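A grid over these dimensions could be enumerated roughly as follows. The scenario names and dataset sizes come from the list above; the cache-size percentages, learning rates, and weight pairs are placeholder assumptions, so this sketch does not reproduce the 270 configurations of the actual grid search.

```python
from itertools import product

# Taken from the list above.
scenarios = ["High", "Low", "Mixed"]
dataset_sizes = [500, 1000, 3000]

# Placeholder values (assumptions, not the settings used in the experiments).
cache_percents = [10, 25]                  # cache size as a percentage of dataset size
learning_rates = [0.01, 0.1]               # hypothetical Quality Score learning rates
weight_combos = [(0.5, 0.5), (0.8, 0.2)]   # hypothetical weight pairs

# One dictionary per configuration: the Cartesian product of all dimensions.
grid = [
    {
        "scenario": scenario,
        "dataset_size": size,
        "cache_size": size * percent // 100,
        "learning_rate": lr,
        "weights": weights,
    }
    for scenario, size, percent, lr, weights in product(
        scenarios, dataset_sizes, cache_percents, learning_rates, weight_combos
    )
]

print(len(grid))  # 3 * 3 * 2 * 2 * 2 = 72 with these placeholder values
```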
Throughput
For throughput measurements, we used a mock LLM API that responds instantly rather than making actual inference calls, ensuring that the results reflect cache policy overhead rather than LLM latency.
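The idea can be sketched as below, using an exact-match LRU cache as the policy under test. The names `mock_llm`, `LRUCache`, and `measure_throughput` are hypothetical and simplified (no similarity matching); they are not the benchmark harness used for the reported results.

```python
import time
from collections import OrderedDict
from typing import Optional


def mock_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API: returns immediately, no inference."""
    return f"answer to: {prompt}"


class LRUCache:
    """Minimal exact-match LRU cache used as the policy under test."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key: str, value: str) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry


def measure_throughput(requests: list, cache: LRUCache) -> tuple:
    """Return (requests per second, hit count) for one pass over the requests."""
    hits = 0
    start = time.perf_counter()
    for prompt in requests:
        if cache.get(prompt) is not None:
            hits += 1
        else:
            # Cache miss: call the mock API and store the response.
            cache.put(prompt, mock_llm(prompt))
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed, hits


# Illustrative workload: 1000 requests cycling over 50 distinct prompts.
requests = [f"question {i % 50}" for i in range(1000)]
rps, hits = measure_throughput(requests, LRUCache(capacity=100))
print(f"{rps:.0f} requests/s, {hits} hits")
```

Because the mock call costs almost nothing, the measured rate is dominated by the cache lookup, insertion, and eviction logic, which is the overhead the comparison is meant to isolate.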
The following chart shows the average throughput (requests per second) for each cache eviction policy across all experimental configurations.
CPU Usage
During our experiments, CPU usage remained consistent across all cache eviction policies, averaging approximately 55% throughout the testing period.
This indicates that the computational overhead of the different eviction strategies does not significantly affect system resource utilization. The main contributor to CPU usage was likely the similarity models used for cache key matching rather than the eviction policies themselves. The choice of eviction policy can therefore be based primarily on cache performance metrics rather than on computational efficiency.