For an Apache Ignite cache deployed in the cluster, there is always overhead: the cache is split into partitions whose state has to be tracked on every cluster node for system needs.
For instance, each cluster node maintains a data structure called a partition map, which resides in the Java heap and consumes some of its space. These partition maps are exchanged across the cluster nodes whenever a topology change event occurs (a new node joins the cluster or an existing one leaves). And if Ignite persistence is enabled, then for every partition there is an open file on disk that Ignite actively writes to and reads from. Thus, the more caches and partitions you have:
- The more Java heap will be occupied by partition maps. Every cache has its own partition map.
- The longer it might take for a new node to join the cluster.
- The longer it might take to initiate rebalancing if a node leaves the cluster.
- The more partition files will be kept open and the worse the performance of the checkpointing might be.
Usually, you will not spot any of these problems in deployments with dozens or even several hundred caches. However, when the number grows into the thousands, the impact can be noticeable.
To avoid this impact, consider using cache groups. Caches within a single cache group share various internal structures, such as the partition maps described above, which speeds up topology event processing and decreases overall memory usage. Note that from the API standpoint, there is no difference whether a cache is part of a group or not.
Cache groups can be created by setting the groupName property of CacheConfiguration. Here is an example of how to assign caches to a specific group:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="cacheConfiguration">
        <list>
            <!-- Partitioned cache for Persons data. -->
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="Person"/>
                <property name="backups" value="1"/>

                <!-- Group the cache belongs to. -->
                <property name="groupName" value="group1"/>
            </bean>

            <!-- Partitioned cache for Organizations data. -->
            <bean class="org.apache.ignite.configuration.CacheConfiguration">
                <property name="name" value="Organization"/>
                <property name="backups" value="1"/>

                <!-- Group the cache belongs to. -->
                <property name="groupName" value="group1"/>
            </bean>
        </list>
    </property>
</bean>
// Defining the cluster configuration.
IgniteConfiguration cfg = new IgniteConfiguration();

// Defining the Person cache configuration.
CacheConfiguration personCfg = new CacheConfiguration("Person");

personCfg.setBackups(1);

// Group the cache belongs to.
personCfg.setGroupName("group1");

// Defining the Organization cache configuration.
CacheConfiguration orgCfg = new CacheConfiguration("Organization");

orgCfg.setBackups(1);

// Group the cache belongs to.
orgCfg.setGroupName("group1");

cfg.setCacheConfiguration(personCfg, orgCfg);

// Starting the node.
Ignition.start(cfg);
In the above example, the Person and Organization caches belong to the same group named group1.
How are key-value pairs distinguished?
Once a cache is assigned to a cache group, its data is stored in the group's shared internal partition structures. Every key you put into the cache is enriched with the unique ID of the cache the key belongs to; the ID is derived from the cache name. This happens transparently and allows Ignite to mix the data of different caches in shared partitions, B+ trees, and partition files.
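To illustrate, here is a minimal sketch of how such an ID can be derived from a cache name. It assumes a hash-based derivation (Ignite internally derives the ID from the cache name's hash code); the cacheId helper below is illustrative, not Ignite's public API.

```java
public class CacheIdSketch {
    /**
     * Derives an illustrative cache ID from a cache name.
     * Assumption: a hash-based derivation similar to Ignite's internal one.
     */
    static int cacheId(String cacheName) {
        int id = cacheName.hashCode();

        // Reserve 0 so that "no ID" can be represented internally.
        return id == 0 ? 1 : id;
    }

    public static void main(String[] args) {
        // Two caches sharing one group still get distinct IDs, so their
        // entries never collide inside the shared partition structures.
        System.out.println(cacheId("Person") != cacheId("Organization"));
    }
}
```

Because every stored key carries its cache ID, a lookup in one cache never returns an entry written through another cache, even though both live in the same partition files.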
The reason for grouping caches is simple: if you decide to group 1000 caches, then you will have 1000x fewer structures that store partitions' data, partition maps, and open Ignite persistence partition files.
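As a back-of-envelope illustration of that 1000x figure, the sketch below compares the number of partition files kept open with and without a single shared group. It assumes the default of 1024 partitions per partitioned cache; adjust the numbers for your own topology.

```java
public class PartitionFileCount {
    public static void main(String[] args) {
        int caches = 1000;
        int partitionsPerCache = 1024; // assumed default partition count

        // Without grouping: every cache keeps its own set of partition files.
        long ungrouped = (long) caches * partitionsPerCache;

        // With all 1000 caches in one group: one shared set of partition files.
        long grouped = partitionsPerCache;

        // The ratio is exactly the number of caches that were merged.
        System.out.println(ungrouped / grouped);
    }
}
```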
Should grouping be used all the time?
Cache groups have many benefits. That said, they might impact the performance of read operations and index lookups. This is caused by the fact that all the data and indexes get mixed in shared data structures (partitions, B+ trees), so querying over them takes more time.
Thus, consider using cache groups if you have a cluster of dozens or hundreds of nodes and caches, and you observe increased Java heap usage by internal structures, a drop in checkpointing performance, and/or slow joining of new nodes to the cluster.