Apache Ignite Documentation

GridGain Developer Hub - Apache Ignitetm

Welcome to the Apache Ignite developer hub run by GridGain. Here you'll find comprehensive guides and documentation to help you start working with Apache Ignite as quickly as possible, as well as support if you get stuck.

 

GridGain also provides Community Edition which is a distribution of Apache Ignite made available by GridGain. It is the fastest and easiest way to get started with Apache Ignite. The Community Edition is generally more stable than the Apache Ignite release available from the Apache Ignite website and may contain extra bug fixes and features that have not made it yet into the release on the Apache website.

 

Let's jump right in!

 

Documentation     Ask a Question     Download

 

Javadoc     Scaladoc     Examples

Performance Tips

Simple cache configuration tips to optimize your cache performance.

Overview

Ignite In-Memory Data Grid performance and throughput vastly depends on the features and the settings you use. In almost any use case the cache performance can be optimized by simply tweaking the cache configuration.

Disable Internal Events Notification

Ignite has rich event system to notify users about various events, including cache modification, eviction, compaction, topology changes, and a lot more. Since thousands of events per second are generated, it creates an additional load on the system. This can lead to significant performance degradation. Therefore, it is highly recommended to enable only those events that your application logic requires. By default, event notifications are disabled.

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ... 
    <!-- Enable only some events and leave other ones disabled. -->
    <property name="includeEventTypes">
        <list>
            <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_STARTED"/>
            <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FINISHED"/>
            <util:constant static-field="org.apache.ignite.events.EventType.EVT_TASK_FAILED"/>
        </list>
    </property>
    ...
</bean>

Tune Cache Start Size

In terms of size and capacity, Ignite's internal cache map acts exactly like a normal Java HashMap: it has some initial capacity (which is pretty small by default), which doubles as data arrives. The process of internal cache map resizing is CPU-intensive and time-consuming, and if you load a huge dataset into cache (which is a normal use case), the map will have to resize a lot of times. To avoid that, you can specify the initial cache map capacity, comparable to the expected size of your dataset. This will save a lot of CPU resources during the load time, because the map won't have to resize. For example, if you expect to load 100 million entries into cache, you can use the following configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            ...
            <!-- Set initial cache capacity to ~ 100M. -->
            <property name="startSize" value="#{100 * 1024 * 1024}"/> 
            ...
        </bean>
    </property>
</bean>

The above configuration will save you from log₂(10⁸) − log₂(1024) ≈ 16 cache map resizes (1024 is an initial map capacity by default). Remember, that each subsequent resize will be on average 2 times longer than the previous one.

Turn Off Backups

If you use PARTITIONED cache, and the data loss is not critical for you (for example, when you have a backing cache store), consider disabling backups for PARTITIONED cache. When backups are enabled, the cache engine has to maintain a remote copy of each entry, which requires network exchange and is time-consuming. To disable backups, use the following configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            ...
            <!-- Set cache mode. -->
            <property name="cacheMode" value="PARTITIONED"/>
            <!-- Set number of backups to 0-->
            <property name="backups" value="0"/>
            ...
        </bean>
    </property>
</bean>

Possible Data Loss

If you don't have backups enabled for PARTITIONED cache, you will loose all entries cached on a failed node. It may be acceptable for caching temporary data or data that can be otherwise recreated. Make sure that such data loss is not critical for application before disabling backups.

Tune Off-Heap Memory

If you plan to allocate large amounts of memory to your JVM for data caching (usually more than 10GB of memory), then your application will most likely suffer from prolonged lock-the-world GC pauses which can significantly hurt latencies. To avoid GC pauses use off-heap memory to cache data - essentially your data is still cached in memory, but JVM does not know about it and GC is not affected. To enable off-heap storage with unlimited size, use the following configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            ...
            <!-- Enable off-heap storage with unlimited size. -->
            <property name="offHeapMaxMemory" value="0"/> 
            ...
        </bean>
    </property>
</bean>

Disable Swap Storage

Swap storage is disabled by default. However, in your configuration it might be enabled. If it is, keep in mind that using swap storage can significantly hurt performance. To disable swap storage explicitly, use the following configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
            ...
            <!-- Disable swap. -->
            <property name="swapEnabled" value="false"/> 
            ...
        </bean>
    </property>
</bean>

Tune Eviction Policy

Evictions are disabled by default. If you do need to use evictions to make sure that data in cache does not overgrow beyond allowed memory limits, consider choosing the proper eviction policy. An example of setting the LRU eviction policy with maximum size of 100000 entries is shown below:

<bean class="org.apache.ignite.cache.CacheConfiguration">
    ...
    <property name="evictionPolicy">
        <!-- LRU eviction policy. -->
        <bean class="org.apache.ignite.cache.eviction.lru.LruEvictionPolicy">
            <!-- Set the maximum cache size to 1 million (default is 100,000). -->
            <property name="maxSize" value="1000000"/>
        </bean>
    </property>
    ...
</bean>

Regardless of which eviction policy you use, cache performance will depend on the maximum amount of entries in cache allowed by eviction policy - if cache size overgrows this limit, the evictions start to occur.

Tune Cache Data Rebalancing

When a new node joins topology, existing nodes relinquish primary or back up ownership of some keys to the new node so that keys remain equally balanced across the grid at all times. This may require additional resources and hit cache performance. To tackle this possible problem, consider tweaking the following parameters:

  • Configure rebalance batch size, appropriate for your network. Default is 512KB which means that by default rebalance messages will be about 512KB. However, you may need to set this value to be higher or lower based on your network performance.
  • Configure rebalance throttling to unload the CPU. If your data sets are large and there are a lot of messages to send, the CPU or network can get over-consumed, which consecutively may slow down the application performance. In this case you should enable data rebalance throttling which helps tune the amount of time to wait between rebalance messages to make sure that rebalancing process does not have any negative performance impact. Note that application will continue to work properly while rebalancing is still in progress.
  • Configure rebalance thread pool size. As opposite to previous point, sometimes you may need to make rebalancing faster by engaging more CPU cores. This can be done by increasing the number of threads in rebalance thread pool (by default, there are only 2 threads in pool).

Below is an example of setting all of the above parameters in cache configuration:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ...
    <property name="cacheConfiguration">
        <bean class="org.apache.ignite.configuration.CacheConfiguration">             
            <!-- Set rebalance batch size to 1 MB. -->
            <property name="rebalanceBatchSize" value="#{1024 * 1024}"/>
 
            <!-- Explicitly disable rebalance throttling. -->
            <property name="rebalanceThrottle" value="0"/>
 
            <!-- Set 4 threads for rebalancing. -->
            <property name="rebalanceThreadPoolSize" value="4"/>
            ... 
        </bean
    </property>
</bean>

Configure Thread Pools

By default, Ignite has it's main thread pool size set to the 2 times the available CPU count. In most cases keeping 2 threads per core will result in faster application performance, since there will be less context switching and CPU caches will work better. However, if you are expecting that your jobs will block for I/O or any other reason, it may make sense to increase the thread pool size. Below is an examples of how you configure thread pools:

<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    ... 
    <!-- Configure internal thread pool. -->
    <property name="publicThreadPoolSize" value="64"/>
    
    <!-- Configure system thread pool. -->
    <property name="systemThreadPoolSize" value="32"/>
    ...
</bean>

Use Collocated Computations

Ignite enables you to execute MapReduce computations in memory. However, most computations usually work on some data which is cached on remote grid nodes. Loading that data from remote nodes is very expensive in most cases and it is a lot more cheaper to send the computation to the node where the data is. The easiest way to do it is to use IgniteCompute.affinityRun() method or @CacheAffinityMapped annotation. There are other ways, including Affinity.mapKeysToNodes() methods. The topic of collocated computations is covered in much detail in Affinity Collocation, which contains proper code examples.

Use Data Streamer

If you need to upload lots of data into cache, use IgniteDataStreamer to do it. Data streamer will properly batch the updates prior to sending them to remote nodes and will properly control number of parallel operations taking place on each node to avoid thrashing. Generally it provides performance of 10x than doing a bunch of single-threaded updates. See Data Loading section for more detailed description and examples.

Batch Up Your Messages

If you can send 10 bigger jobs instead of 100 smaller jobs, you should always choose to send bigger jobs. This will reduce the amount of jobs going across the network and may significantly improve performance. The same regards cache entries - always try to use API methods, that take collections of keys or values, instead of passing them one-by-one.

Tune Garbage Collection

If you are seeing spikes in your throughput due to Garbage Collection (GC), then you should tune JVM settings. The following JVM settings have proven to provide fairly smooth throughput without large spikes:

-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:+UseTLAB 
-XX:NewSize=128m 
-XX:MaxNewSize=128m 
-XX:MaxTenuringThreshold=0 
-XX:SurvivorRatio=1024 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=60

Do Not Copy Value On Read

JCache standard requires cache providers to support store-by-value semantics, which means that when you read a value from the cache, you don't get the reference to the object that is actually stored, but rather a copy of this object. Ignite behaves this way by default, but it's possible override this behavior via CacheConfiguration.copyOnRead configuration property:

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- 
        Force cache to return the instance that is stored in cache
        instead of creating a copy. 
    -->
    <property name="copyOnRead" value="false"/>
</bean>

Ensure logging is correctly set up if using jul-to-slf4j bridge

Ignite uses java.util.logging.Logger (JUL). If you are using the jul-to-slf4j bridge, you may want to pay particular attention to the JUL log-level for ignite; if you for some reason have org.apache at DEBUG you may have your final logger at INFO. This means ignite will spend a 10x overhead generating log messages that will be subsequently thrown away once they cross the bridge. JUL has a default level of "INFO" out of the box. Setting a quick breakpoint in org.apache.ignite.logger.java.JavaLogger#isDebugEnabled should reveal if your JUL subsystem is producing debug level logs.

Performance Tips

Simple cache configuration tips to optimize your cache performance.