With a code review fast approaching, I figured it was time to do some mathematics. The result...
Worst-case scenario:
Market Count = 20
Instrument Count = 1000 (per partition)
Time = 8 hrs = 8 * 3600 s = 28,800 s
Total Memory = 28,800 * 20 * 1000 * 16 B = 9,216,000,000 B = 9.2 GB
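The same worst-case figure can be reproduced with a few lines of Python. This is only a sketch of the arithmetic above: it assumes the 16 in the formula is the size in bytes of one index entry, and that the worst case means one entry per instrument, per market, per second. The function and constant names are made up for illustration.

```python
# Assumption: one index entry costs 16 bytes (the "16" in the formula above).
BYTES_PER_ENTRY = 16


def worst_case_bytes(markets: int, instruments: int, hours: int) -> int:
    """Memory needed if every instrument on every market adds one entry per second."""
    seconds = hours * 3600
    return markets * instruments * seconds * BYTES_PER_ENTRY


total = worst_case_bytes(markets=20, instruments=1000, hours=8)
print(f"{total:,} B = {total / 1e9:.1f} GB")  # 9,216,000,000 B = 9.2 GB
```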
Yep. Even though this is the worst case, the figure was unacceptable, so I had to find a way to show that in reality the memory consumption of my process would be lower.
The key lay in the number of trades that occur on a given day. For the 10 or so markets that we connect to, this would be around 10 million. I took this down to 5 million to be on the safe side, and it changed the result drastically.
Trade Count = 5,000,000
Memory = (16 * 5,000,000) B = 80,000,000 B = 80 MB
For the worst-case memory level to be reached, the trade count would have to be:
Trade Count = 28,800 * 20 * 1000 = 576,000,000 = 576 million
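The trade-count view of the same estimate can be sketched in both directions: memory for a given number of trades, and the number of trades needed to reach a given memory level. As before, this assumes each trade adds exactly one 16-byte index entry, and the helper names are hypothetical.

```python
# Assumption: each trade adds exactly one 16-byte index entry.
BYTES_PER_ENTRY = 16


def memory_for_trades(trade_count: int) -> int:
    """Index memory in bytes for a given number of trades."""
    return trade_count * BYTES_PER_ENTRY


def trades_for_memory(memory_bytes: int) -> int:
    """Number of trades that would fill the given amount of index memory."""
    return memory_bytes // BYTES_PER_ENTRY


print(f"{memory_for_trades(5_000_000):,} B")   # 80,000,000 B
print(f"{trades_for_memory(9_216_000_000):,}")  # 576,000,000
```

Keeping the per-entry size in one constant makes it easy to redo both numbers if the index entry layout changes.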
So I can safely say that my process would operate at between 100 MB and 1 GB as long as the trade count remains below 50 million (50 million * 16 B = 800 MB). Since it's unlikely that we will be receiving this many messages any time soon, the current architecture could be used for a long time to come.
If this fails, we'll have to go for plan B, which is to move our indexes into files and cope with having to do two file reads for each data segment retrieval.