Problem:

Applications using Redisson MapCache began throwing RedisResponseTimeoutException (client timeout = 3000 ms) during eviction activity. The exceptions were raised while executing EVALSHA, with context pointing to org.redisson.eviction.MapCacheEvictionTask; the applications also reported “Unable to evict elements” messages that correlated with redisson-timer-* threads.

Environment details: a single Redis Cluster (client traffic on shard port 7000) served multiple integration environments under high JVM concurrency. Redis SLOWLOG samples consistently showed EVALSHA entries for MapCache-related scripts with execution times from ~10 ms up to 140+ ms. Script payloads contained large serialized binary values (hundreds of KB to ~2 MB, e.g., PDF content), and the Lua scripts also executed SPUBLISH calls.

Process:

Step 1: Intake and scoping from the client report

Client-provided symptoms, configuration snapshots and slowlog extracts were collected and inspected. The collected artifacts confirmed frequent EVALSHA entries tied to MapCache eviction, a 3s client timeout, multiple JVM clients, and large value sizes. This established the operational window and showed the problem affected eviction paths rather than generic network instability.

Step 2: Slowlog and payload analysis

Slowlog samples were examined for command distribution, script SHA identifiers, execution time distribution, and parameter patterns. Discovery: EVALSHA dominated slowlog during incident windows and scripts carried the large serialized payloads as parameters. The presence of SPUBLISH inside the same Lua execution was noted. This indicated server-side command execution latency (not transport-level packet loss) was the primary cause.
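The kind of per-command breakdown described above can be sketched as follows. This is a minimal illustration, not the tooling used in the investigation: the entry shape (`command`, `duration_us`, `payload_bytes`) is a hypothetical, already-normalized form of slowlog data, and the function name is made up for this example.

```python
from collections import Counter
from statistics import median


def summarize_slowlog(entries):
    """Group slowlog entries by command and report count, latency, and payload size.

    Each entry is assumed (hypothetically) to be a dict with:
      'command'       - command name, e.g. 'EVALSHA'
      'duration_us'   - execution time in microseconds
      'payload_bytes' - size of the largest argument, supplied by the caller
    """
    counts = Counter()
    durations_ms = {}
    largest_payload = {}
    for entry in entries:
        name = entry["command"].upper()
        counts[name] += 1
        durations_ms.setdefault(name, []).append(entry["duration_us"] / 1000.0)
        largest_payload[name] = max(largest_payload.get(name, 0), entry["payload_bytes"])
    return {
        name: {
            "count": count,
            "median_ms": median(durations_ms[name]),
            "max_ms": max(durations_ms[name]),
            "largest_payload_bytes": largest_payload[name],
        }
        for name, count in counts.items()
    }


# Illustrative sample values only; not data from the incident.
sample = [
    {"command": "EVALSHA", "duration_us": 10_000, "payload_bytes": 300_000},
    {"command": "EVALSHA", "duration_us": 140_000, "payload_bytes": 2_000_000},
    {"command": "GET", "duration_us": 12_000, "payload_bytes": 40},
]
summary = summarize_slowlog(sample)
```

A summary where one command accounts for most entries and also carries the largest payloads points at server-side execution cost rather than at the network.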

Step 3: Redis configuration and runtime metrics review

Redis configuration files and runtime INFO outputs were reviewed for persistence, cluster settings, and IO threading. Configuration appeared standard (persistence disabled in the supplied snapshot; cluster mode enabled; IO threads present). CPU and instantaneous_ops_per_sec were reviewed where available. Finding: no misconfiguration explained the prolonged Lua blocking. IO threads offload socket reads and writes but do not parallelize command or script execution, so single-threaded script execution remained the limiting factor.

Step 4: Execution model and contention validation

Lua's single-threaded execution behavior was cross-referenced with the observed concurrent invocation pattern. Multiple environments invoking MapCache eviction concurrently, each passing payloads of several hundred KB to multiple MB into Lua, were determined to cause serialized blocking on the Redis event loop: each script must finish before the next command is served. This explained how script runs of tens to hundreds of milliseconds queued up behind one another, delayed other workload, and pushed response times past the 3000 ms client timeout under sustained concurrency.
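The queueing effect can be made concrete with simple arithmetic. The sketch below assumes a deliberately simplified model: scripts run strictly back to back, every script takes the same time, and no other commands are interleaved. The function names are illustrative; the 3000 ms timeout and the 10 ms and 140 ms script durations come from the figures above.

```python
import math


def completion_time_ms(position_in_queue, script_ms):
    """Time until the script at a 1-based queue position finishes,
    assuming identical scripts executed strictly one after another."""
    return position_in_queue * script_ms


def scripts_to_exceed_timeout(timeout_ms, script_ms):
    """Smallest queue depth at which the last script finishes after the timeout."""
    return math.floor(timeout_ms / script_ms) + 1


slow_depth = scripts_to_exceed_timeout(3000, 140)  # 22
fast_depth = scripts_to_exceed_timeout(3000, 10)   # 301
```

Under this model, roughly two dozen queued 140 ms scripts are enough to breach the timeout, which is plausible when several environments share one cluster; at 10 ms per script the backlog would need to be around 300 deep.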

Step 5: Mitigation design and verification planning

Options were evaluated and prioritized by short- and long-term impact. Short-term changes (increasing the client timeout threshold and reducing eviction frequency) would reduce immediate errors; medium- and long-term architecture changes (removing large binaries from Redis, storing only metadata/URLs in Redis, compressing or segmenting values) would eliminate large-payload Lua execution. A monitoring plan was finalized (SLOWLOG, INFO commandstats, LATENCY, blocked_clients, CPU). This step led directly to the implementation choices described in the Solution.
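As one possible way to act on the INFO commandstats part of the monitoring plan, the sketch below parses `cmdstat_*` lines into per-command numbers so that average EVALSHA cost can be tracked over time. The parser and the sample text are illustrative assumptions; the sample values are invented and are not measurements from this incident.

```python
def parse_commandstats(info_text):
    """Parse 'cmdstat_<name>:key=value,...' lines from INFO commandstats output.

    Returns a dict mapping command name to a dict of float-valued fields.
    Lines that do not start with 'cmdstat_' (headers, blanks) are skipped.
    """
    stats = {}
    for raw_line in info_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("cmdstat_"):
            continue
        name, _, fields = line.partition(":")
        parsed = {}
        for pair in fields.split(","):
            key, _, value = pair.partition("=")
            parsed[key] = float(value)
        stats[name[len("cmdstat_"):]] = parsed
    return stats


# Invented sample output for illustration.
sample_info = (
    "# Commandstats\r\n"
    "cmdstat_evalsha:calls=1200,usec=54000000,usec_per_call=45000.00\r\n"
    "cmdstat_get:calls=10,usec=50,usec_per_call=5.00\r\n"
)
stats = parse_commandstats(sample_info)
evalsha_avg_ms = stats["evalsha"]["usec_per_call"] / 1000.0
```

Comparing `usec_per_call` for `evalsha` before and after a change gives a coarse check that complements SLOWLOG sampling.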

Solution:

Large binary documents were removed from Redis and migrated to external object storage; Redis now stores lightweight metadata and object references (IDs/URLs) instead of multi-hundred-KB/MB blobs. MapCache eviction was reconfigured to operate on metadata-only keys, and heavy post-eviction actions (pub/sub notifications) were moved out of Lua into asynchronous application-side handlers where atomicity was not required. A temporary client-side timeout increase and reduced eviction frequency were applied during migration. Additional monitoring was enabled (SLOWLOG sampling, INFO commandstats, LATENCY DOCTOR, blocked_clients, and node CPU).

Architecturally, this reduces the size of parameters passed into Redis Lua scripts and the amount of memory copying and CPU work Redis must do during eviction, which avoids long single-threaded script runs and the resulting command queueing.
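The metadata-and-reference pattern can be sketched as below. Everything here is a stand-in chosen for illustration: a plain dict plays the role of the cache, `InMemoryObjectStore` is a hypothetical substitute for the external object storage, and the field names in the metadata record are assumptions rather than the schema actually deployed.

```python
import hashlib


class InMemoryObjectStore:
    """Hypothetical stand-in for external object storage."""

    def __init__(self):
        self._blobs = {}

    def put(self, data):
        # Content-addressed ID, used here only to keep the example deterministic.
        object_id = hashlib.sha256(data).hexdigest()
        self._blobs[object_id] = data
        return object_id

    def get(self, object_id):
        return self._blobs[object_id]


def cache_document(cache, store, key, data, content_type):
    """Write the blob to the object store and keep only a small record in the cache."""
    object_id = store.put(data)
    cache[key] = {
        "object_id": object_id,
        "size_bytes": len(data),
        "content_type": content_type,
    }
    return cache[key]


def load_document(cache, store, key):
    """Resolve the cached reference to the blob; None if the entry was evicted."""
    meta = cache.get(key)
    if meta is None:
        return None
    return store.get(meta["object_id"])


cache = {}                      # stands in for the metadata-only cache
store = InMemoryObjectStore()
pdf_bytes = b"%PDF-1.7" + b"\0" * 1_000_000
meta = cache_document(cache, store, "doc:1", pdf_bytes, "application/pdf")
```

With this split, an eviction pass only ever touches the small metadata record, so the roughly 1 MB body in the example never becomes a script parameter.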

Conclusion:

Post-change observation showed a marked reduction in EVALSHA-dominated slowlog entries; median Lua execution times for eviction scripts decreased to single-digit milliseconds, and RedisResponseTimeoutException occurrences fell to near zero. System stability improved by removing large-object processing from Redis and by isolating eviction logic from heavy payload handling, which mitigated the single-threaded Lua blocking risk.