Apache Ignite Documentation

GridGain Developer Hub - Apache Ignitetm

Welcome to the Apache Ignite developer hub run by GridGain. Here you'll find comprehensive guides and documentation to help you start working with Apache Ignite as quickly as possible, as well as support if you get stuck.

 

GridGain also provides Community Edition which is a distribution of Apache Ignite made available by GridGain. It is the fastest and easiest way to get started with Apache Ignite. The Community Edition is generally more stable than the Apache Ignite release available from the Apache Ignite website and may contain extra bug fixes and features that have not made it yet into the release on the Apache website.

 

Let's jump right in!

 

Documentation     Ask a Question     Download

 

Javadoc     Scaladoc     Examples

Performance and Debugging

Performance Considerations and Debugging Techniques

Using EXPLAIN Statement

Ignite supports "EXPLAIN ..." syntax for reading execution plans and query performance investigation purposes. Note that a plan cursor will contain multiple rows: the last one will contain a query for reducing node, others are for map nodes.

SqlFieldsQuery sql = new SqlFieldsQuery(
  "explain select name from Person where age = ?").setArgs(26); 

System.out.println(cache.query(sql).getAll());

The execution plan itself is generated by H2 as described here:
http://www.h2database.com/html/performance.html#explain_plan

Using H2 Debug Console

When developing with Ignite, sometimes, it is useful to check if your tables and indexes look correctly. It may also be helpful to run some local queries against the embedded node in H2 database. For that purpose, Ignite has an ability to start H2 Console. To do that, you can start a local node with IGNITE_H2_DEBUG_CONSOLE system property or an environment variable set to true. The console will be opened in your browser. You may need to click the Refresh button on the Console because it can be opened before the database objects are initialized.

SQL Performance and Usability Considerations

There are a few common pitfalls that should be considered when running SQL queries.

  1. If the query contains an OR operator, then indexes may not be used as expected. For example, for the query select name from Person where sex='M' and (age = 20 or age = 30), index on field sex will be used instead of index on field age although the latter is a more selective index. As a workaround for this issue, you can rewrite the query with UNION ALL (notice that UNION without ALL will return DISTINCT rows, which will change the query semantics and introduce additional performance penalty). For Example:
    select name from Person where sex='M' and age = 20 UNION ALL select name from Person where sex='M' and age = 30.

  2. If the query contains an IN operator, there can be two issues: First, it is impossible to provide a variable list of parameters. That means that you have to specify the exact list in the query, for example, where id in (?, ?, ?). You cannot write - where id in ? and pass an array or collection. Second, this query will not use indexes. As a workaround to both the problems, you can rewrite the query in the following way: select p.name from Person p join table(id bigint = ?) i on p.id = i.id.
    Here you can provide an object array (Object[]) of any length as a parameter and the query will use the index on the field id. Note that primitive arrays (int[], long[], etc..) can not be used with this syntax, you can only pass an array of boxed primitives.

Example:

new SqlFieldsQuery(
  "select * from Person p join table(id bigint = ?) i on p.id = i.id").setArgs(new Object[]{ new Integer[] {2, 3, 4} }))

Which is converted in to the following SQL:

select * from "cache-name".Person p join table(id bigint = (2,3,4)) i on p.id = i.id

Query Parallelism

By default, an SQL query is executed in a single thread on each participating Ignite node. This approach is optimal for queries returning small result sets involving index search. For example:

select * from Person where p.id = ?

Certain queries might benefit from being executed in multiple threads. This relates to queries with table scans and aggregations, which is often the case for OLAP workloads. For example:

select SUM(salary) from Person

You can control query parallelism through the CacheConfiguration.queryParallelism property which defines the number of threads that will be used to execute a query on a single node.
If a query contains JOINs, then all the participating caches must have the same degree of parallelism.

Use with care

Currently, this property affects all queries executed on the given cache. While providing speedup to heavy OLAP queries, this option may slowdown other simple queries. This behavior will be improved in further versions.

Index Hints

Index hints are useful in scenarios when it's known that one index is more suitable for certain queries than another. They are also needed to instruct the query optimizer to choose a more efficient execution plan. To do this optimization in Apache Ignite, you can use
USE INDEX(indexA,...,indexN) statement that tells Ignite to apply only one of the named indexes provided for query execution.

Below is an example that leverages from this capability:

SELECT * FROM Person USE INDEX(index_age)
  WHERE salary > 150000 AND age < 35;

Querying Replicated Caches

If a SQL query is executed over the data stored across replicated caches only, then you may want to set the SqlQuery.setReplicatedOnly(...) parameter to true. This is a special hint to the SQL engine that might produce a more effective execution plan for the query.

Query Execution Flow Optimizations

SQL engine automatically optimizes queries that use primary or affinity keys in the condition section of a SELECT statement. For instance, for the query like the one below

SELECT * FROM Person p WHERE p.id = ?

Ignite will calculate the partition p.id belongs to and will execute the query only on the node that is a primary for that partition.

Advanced DML Optimizations

Usually, UPDATE and DELETE statements require performing a SELECT query in order to prepare a set of cache entries to be processed later. In some situations, this can be avoided by direct translation of DML statements into specific cache operations, leading to a significant performance gain.

To summarize the content of the distributed DML section, following are the reasons why UPDATE and DELETE automatically execute a SELECT query:

  1. A complex filtering is used in the WHERE clause of UPDATE or DELETE statement. This happens when a sophisticated and advanced filtering of entries is used and the DML engine needs to do extra work in order to prepare the list of the entries that will be updated by the DML statement.
  2. An UPDATE statement contains an expression. Even if the WHERE clause is simple and points to a cache entry to be modified directly with the usage of _key or _value, the execution of an expression might result in new fields' values. This is why DML engine has to execute the SELECT in order to evaluate the expression's execution result.
  3. An UPDATE statement modifies specific fields belonging to a cache entry. The DML engine needs to retrieve a current cache entry first, modify it, and put it back into the cache.

Executing DML faster

To execute a DML operation in the fastest way, the following requirements must be met:

  1. A DML operation must not trigger the SELECT query execution.
  2. The operation has to adjust a single cache entry.

The following rules have to be followed in order to satisfy the requirements above:

  1. Filter out cache entries with the usage of _key and _val keywords only.
  2. These arguments have to be used explicitly in a DML statement. Cache entries' fields or expressions must not be accessed and executed.
  3. If an UPDATE statement is executed, then it has to update the whole cache entry (_val) rather than specific fields.

Let's look into the following example.

cache.query(new SqlFieldsQuery("UPDATE Person SET _val = ?3" +
    " WHERE _key = ?1 and _val = ?2").setArgs(7, 1, 2));

The UPDATE statement does the following:

  • Explicitly tells which cache entry needs to be updated by specifying _key the entry belongs to and entry's expected value (_val).
  • Updates the whole cache entry's value by using _val keyword.

As a result, the DML engine will execute the cache operation below as-is:

cache.replace(7, 1, 2);