Apache Ignite Documentation


Data Loading

Load large amounts of data into RAM.

Ignite provides several techniques for the initial loading of data from a 3rd party database or another source.

Overview

Using the standard cache put(...) or putAll(...) operations is generally inefficient for loading large amounts of data. Ignite offers the IgniteDataStreamer API, integrations with major streaming technologies, and the IgniteCache.loadCache() method, all of which help you load large volumes of data into the cluster more efficiently.

IgniteDataStreamer

Data streamers, defined by the IgniteDataStreamer API, are built to inject large amounts of continuous data into Ignite caches. They are designed to be scalable and fault-tolerant, and achieve high performance by batching entries together before sending them to the corresponding cluster nodes.

Data streamers should be used to load large amounts of data into caches at any time, including pre-loading on startup.
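
For illustration, here is a minimal sketch of streaming data into a cache; the cache name "myCache" and the key/value types are placeholders:

try (Ignite ignite = Ignition.start()) {
  // Make sure the target cache exists.
  ignite.getOrCreateCache("myCache");

  // The streamer buffers entries into batches and routes each batch to
  // the node that owns the corresponding partition.
  try (IgniteDataStreamer<Integer, String> stmr = ignite.dataStreamer("myCache")) {
    for (int i = 0; i < 1_000_000; i++)
      stmr.addData(i, Integer.toString(i));
  } // Closing the streamer flushes any remaining buffered entries.
}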

See Data Streamers documentation for more information.

IgniteCache.loadCache()

If the data is persisted in a 3rd party database and applications use SQL, Scan, and other advanced queries, Ignite requires the data to be preloaded from disk into RAM.

Ignite native persistence

Ignite native persistence does not require warming up RAM upon restarts. Thus, the loading techniques based on IgniteCache.loadCache() are not relevant for this type of persistent storage.

To preload data from a 3rd party store such as a relational database, use the IgniteCache.loadCache() method that allows loading cache data without passing the keys that need to be loaded.

The IgniteCache.loadCache() method will delegate to the CacheStore.loadCache() method on every cluster member. To invoke loading only on the local cluster node, use the IgniteCache.localLoadCache() method.

In the case of partitioned caches and 3rd party persistence, such as a relational database, keys that are not mapped to the node, either as primary or backup, are automatically discarded.

This does not apply to Ignite native persistence, where every node stores only the data for which it is either a primary or backup node.

Here is an example of a CacheStore.loadCache() implementation for a 3rd party persistence. For a complete example of how a CacheStore can be implemented, refer to the 3rd Party Store documentation.

public class CacheJdbcPersonStore extends CacheStoreAdapter<Long, Person> {
	...
  // This method is called whenever "IgniteCache.loadCache()" or
  // "IgniteCache.localLoadCache()" methods are called.
  @Override public void loadCache(IgniteBiInClosure<Long, Person> clo, Object... args) {
    if (args == null || args.length == 0 || args[0] == null)
      throw new CacheLoaderException("Expected entry count parameter is not provided.");

    final int entryCnt = (Integer)args[0];

    // connection() is a helper that obtains a JDBC connection (omitted here).
    try (Connection conn = connection()) {
      try (PreparedStatement st = conn.prepareStatement("select * from PERSONS")) {
        try (ResultSet rs = st.executeQuery()) {
          int cnt = 0;

          while (cnt < entryCnt && rs.next()) {
            Person person = new Person(rs.getLong(1), rs.getString(2), rs.getString(3));

            clo.apply(person.getId(), person);

            cnt++;
          }
        }
      }
    }
    catch (SQLException e) {
      throw new CacheLoaderException("Failed to load values from cache store.", e);
    }
  }
  ...
}
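
With such a store configured for the cache, loading can be triggered across the cluster. A minimal sketch of the call, assuming the cache is named "personCache":

IgniteCache<Long, Person> cache = ignite.cache("personCache");

// A null predicate loads all entries; the second argument is passed
// through to CacheStore.loadCache() as args[0] (the expected entry count).
cache.loadCache(null, 100_000);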

Partition-aware data loading

In the scenario described above, the same query will be executed on all the nodes. Each node will iterate over the whole result set, skipping the keys that do not belong to the node, which is not very efficient. This situation can be improved if the partition ID is stored alongside each record in the database. You can use the org.apache.ignite.cache.affinity.Affinity interface to get the partition ID for any key being stored in a cache.

Below is an example code snippet that determines the partition ID for each Person object stored in the cache.

IgniteCache<Integer, Person> cache = ignite.cache(cacheName);
Affinity<Integer> aff = ignite.affinity(cacheName);

for (int personId = 0; personId < PERSONS_CNT; personId++) {
    // Get partition ID for the key under which person is stored in cache.
    int partId = aff.partition(personId);
  
    Person person = new Person(personId);
    person.setPartitionId(partId);
    // Fill other fields.
  
    cache.put(personId, person);
}

When Person objects become partition-ID aware, each node can query only those partitions that belong to the node. In order to do that, you can inject an instance of Ignite into your cache store and use it to determine partitions that belong to the local node.

Below is an example code snippet that demonstrates how to use Affinity to load only local partitions. Note that the example code is single-threaded; however, it can be parallelized very effectively by partition ID.

public class CacheJdbcPersonStore extends CacheStoreAdapter<Long, Person> {
  // Will be automatically injected.
  @IgniteInstanceResource
  private Ignite ignite;
  
	...
  // This method is called whenever "IgniteCache.loadCache()" or
  // "IgniteCache.localLoadCache()" methods are called.
  @Override public void loadCache(IgniteBiInClosure<Long, Person> clo, Object... args) {
    // 'cacheName' is assumed to hold the name of the cache this store serves.
    Affinity<Long> aff = ignite.affinity(cacheName);
    ClusterNode locNode = ignite.cluster().localNode();
    
    try (Connection conn = connection()) {
      for (int part : aff.primaryPartitions(locNode))
        loadPartition(conn, part, clo);

      for (int part : aff.backupPartitions(locNode))
        loadPartition(conn, part, clo);
    }
    catch (SQLException e) {
      throw new CacheLoaderException("Failed to open a connection to the underlying store.", e);
    }
  }
  
  private void loadPartition(Connection conn, int part, IgniteBiInClosure<Long, Person> clo) {
    try (PreparedStatement st = conn.prepareStatement("select * from PERSONS where partId=?")) {
      st.setInt(1, part);
      
      try (ResultSet rs = st.executeQuery()) {
        while (rs.next()) {
          Person person = new Person(rs.getLong(1), rs.getString(2), rs.getString(3));
          
          clo.apply(person.getId(), person);
        }
      }
    }
    catch (SQLException e) {
      throw new CacheLoaderException("Failed to load values from cache store.", e);
    }
  }
  
  ...
}

Note that key-to-partition mapping depends on the number of partitions configured in the affinity function (see org.apache.ignite.cache.affinity.AffinityFunction). If the affinity function configuration changes, partition ID records in the database must be updated accordingly.
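
For reference, the partition count is configured on the affinity function in the cache configuration. A sketch, assuming the default RendezvousAffinityFunction and an illustrative cache name:

CacheConfiguration<Long, Person> cacheCfg = new CacheConfiguration<>("personCache");

// 1024 partitions. Changing this number changes the key-to-partition
// mapping, so partition IDs stored in the database would need to be
// recomputed.
cacheCfg.setAffinity(new RendezvousAffinityFunction(false, 1024));

IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);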

Performance Improvement

To maintain consistency and durability, Ignite native persistence uses write-ahead logging (WAL), which is enabled by default. However, the WAL can slow down the cluster during data pre-loading. It is therefore recommended to disable the WAL before pre-loading data and to re-enable it after pre-loading is complete. See the WAL documentation for the Java API, and the ALTER TABLE documentation for SQL.
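
Via the Java API, the pattern can look like the following sketch (the cache name "personCache" is illustrative):

// Disable write-ahead logging for the cache before bulk loading.
ignite.cluster().disableWal("personCache");

// ... pre-load the data, e.g. with cache.loadCache(...) ...

// Re-enable the WAL once loading is complete. Ignite checkpoints the
// data loaded while the WAL was disabled, making it durable.
ignite.cluster().enableWal("personCache");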