Apache Ignite Documentation

GridGain Developer Hub - Apache Ignite™

Welcome to the Apache Ignite developer hub run by GridGain. Here you'll find comprehensive guides and documentation to help you start working with Apache Ignite as quickly as possible, as well as support if you get stuck.

GridGain also provides the Community Edition, a distribution of Apache Ignite made available by GridGain. It is the fastest and easiest way to get started with Apache Ignite. The Community Edition is generally more stable than the Apache Ignite release available from the Apache Ignite website and may contain extra bug fixes and features that have not yet made it into the release on the Apache website.

Let's jump right in!

Machine Learning

Overview

Ignite Machine Learning was made generally available as part of the Apache Ignite 2.

The rationale for adding machine and deep learning to Apache Ignite is simple. Many users employ Ignite as the central high-performance storage and processing system for various data sets. If they wanted to perform machine learning (ML) or deep learning (DL) on these data sets (e.g., for model training or inference), they first had to ETL them into another system such as Apache Mahout or Apache Spark, which resulted in two significant drawbacks:

  • It introduced a costly ETL step that was extremely time-consuming and meant that ML/DL ran on a copy of the data that was often out of date, and
  • Processing ended up much slower than what Ignite's co-located distributed processing could achieve.

Moreover, data volumes keep growing, and many ML and DL researchers and data scientists can no longer fit their data, or have their algorithms digest it, within a single server.

Ignite Machine Learning allows users to run ML/DL training and inference directly on the data stored in an Ignite cluster, and provides ML and DL algorithms that are specifically optimized for Ignite's co-located distributed processing. The result is high-performance ML/DL on live, up-to-date data.
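
To make the co-located processing idea more concrete, here is a minimal plain-Java sketch of training a one-feature linear regression without moving raw rows. It uses no Ignite APIs: the "partitions" are ordinary in-memory arrays standing in for data held on cluster nodes, and the class and method names are hypothetical. Each partition computes a handful of additive partial sums locally (the map step), and only those sums are combined to solve the least-squares equations (the reduce step). Ignite ML's actual trainers and dataset APIs differ; this only illustrates the pattern.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not Ignite ML's API.
class ColocatedRegressionSketch {

    /** Additive partial sums for one partition: n, sum(x), sum(y), sum(x*y), sum(x*x). */
    static final class Stats {
        final long n;
        final double sumX, sumY, sumXY, sumXX;

        Stats(long n, double sumX, double sumY, double sumXY, double sumXX) {
            this.n = n;
            this.sumX = sumX;
            this.sumY = sumY;
            this.sumXY = sumXY;
            this.sumXX = sumXX;
        }

        /** Reduce step: partial sums from different partitions simply add up. */
        Stats plus(Stats o) {
            return new Stats(n + o.n, sumX + o.sumX, sumY + o.sumY,
                    sumXY + o.sumXY, sumXX + o.sumXX);
        }
    }

    /** Map step: would run next to the partition's data and return only five numbers. */
    static Stats localStats(double[][] rows) {
        long n = 0;
        double sx = 0, sy = 0, sxy = 0, sxx = 0;
        for (double[] r : rows) {  // each row is {x, y}
            n++;
            sx += r[0];
            sy += r[1];
            sxy += r[0] * r[1];
            sxx += r[0] * r[0];
        }
        return new Stats(n, sx, sy, sxy, sxx);
    }

    /** Combines all partitions and solves ordinary least squares; returns {intercept, slope}. */
    static double[] fit(List<double[][]> partitions) {
        Stats t = new Stats(0, 0, 0, 0, 0);
        for (double[][] p : partitions) {
            t = t.plus(localStats(p));
        }
        double denom = t.n * t.sumXX - t.sumX * t.sumX;  // zero if all x values are equal
        if (denom == 0) {
            throw new IllegalArgumentException("need at least two distinct x values");
        }
        double slope = (t.n * t.sumXY - t.sumX * t.sumY) / denom;
        double intercept = (t.sumY - slope * t.sumX) / t.n;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Two hypothetical partitions holding points on the line y = 1 + 2x.
        List<double[][]> partitions = List.of(
                new double[][] {{0, 1}, {1, 3}},
                new double[][] {{2, 5}, {3, 7}});
        double[] model = fit(partitions);
        // prints "intercept=1.0 slope=2.0"
        System.out.printf(Locale.ROOT, "intercept=%.1f slope=%.1f%n", model[0], model[1]);
    }
}
```

Because the partial sums are additive, the fitted model is the same however the rows are split across partitions, which is what allows each part of the computation to run where its data lives instead of after an ETL step.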

Overall, the Ignite Machine Learning component is intended to bring the following benefits:

  • Distributed ML and DL when data does not fit within a single server unit.
  • Zero-ETL: train models and run algorithms in place.
  • Massive Scalability: Horizontal + Vertical, RAM + Disk.

Presently, ML Grid provides a core distributed linear algebra implementation based on Ignite's co-located distributed processing, as well as essential machine learning algorithms such as linear regression, decision trees, k-means clustering, and more. Future releases will introduce custom DSLs for Python, R, and Scala, a growing collection of optimized ML algorithms, and support for Ignite-optimized neural networks.
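
As an illustration of one of the algorithms listed above, the following is a minimal, single-process sketch of k-means clustering (Lloyd's algorithm) on 2-D points. It is not Ignite's implementation and uses no Ignite classes; the class name, the 2-D restriction, and the fixed iteration count are assumptions made for brevity.

```java
import java.util.Arrays;

// Hypothetical illustration of Lloyd's k-means, not Ignite ML's implementation.
class KMeansSketch {

    /** Index of the centroid closest to point p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dx = p[0] - centroids[c][0];
            double dy = p[1] - centroids[c][1];
            double d = dx * dx + dy * dy;
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Runs a fixed number of assignment/update iterations and returns the centroids. */
    static double[][] fit(double[][] points, double[][] initial, int iterations) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();  // do not mutate the caller's array
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[centroids.length][2];
            int[] counts = new int[centroids.length];
            // Assignment step: accumulate each point into its nearest cluster.
            for (double[] p : points) {
                int c = nearest(p, centroids);
                sums[c][0] += p[0];
                sums[c][1] += p[1];
                counts[c]++;
            }
            // Update step: move each centroid to the mean of its assigned points.
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) {  // an empty cluster keeps its previous centroid
                    centroids[c][0] = sums[c][0] / counts[c];
                    centroids[c][1] = sums[c][1] / counts[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] initial = {{0, 0}, {10, 10}};
        // prints [[0.0, 0.5], [10.0, 10.5]]
        System.out.println(Arrays.deepToString(fit(points, initial, 5)));
    }
}
```

The per-cluster sums and counts gathered in the assignment step are additive, so in a distributed setting each node could, in principle, compute them for its own data and only those aggregates would need to be merged to update the centroids.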

The following pages cover some of the capabilities and algorithms supported by the ML component in more detail.

Getting Started

The fastest way to get started with Machine Learning is to build and run the existing examples, study their output, and keep coding. The ML and DL examples are located in the examples folder of every Apache Ignite distribution.

Follow the steps below to try out the examples:

  1. Download Apache Ignite version 2.4 or later.
  2. Open the examples project in an IDE such as IntelliJ IDEA or Eclipse.
  3. Go to the src/main/java/org/apache/ignite/examples/ml folder in the IDE and run an ML or DL example.

The examples do not require any special configuration. All ML and DL examples are expected to start, run, and stop successfully without any user intervention and to print meaningful output to the console. Additionally, the Tracer API example is expected to launch a web browser and generate HTML output.

Get it With Maven

Add the Maven dependency below to your project to include the ML functionality provided by Ignite:

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-ml</artifactId>
    <version>${ignite.version}</version>
</dependency>

Replace ${ignite.version} with an actual Ignite version.

Build From Sources

The latest Apache Ignite Machine Learning jar is always uploaded to the Maven repository. If you need to deploy the jar in a custom environment, you can either download it from Maven or build it from scratch. To build the Machine Learning component from sources:

  1. Download the latest Apache Ignite source release.
  2. Clean the local Maven repository (this ensures that older Maven builds don't affect the result).
  3. Build and install Apache Ignite from the project's root directory:
mvn clean install -DskipTests -Dmaven.javadoc.skip=true
  4. Locate the Machine Learning jar in your local Maven repository under the path {user_dir}/.m2/repository/org/apache/ignite/ignite-ml/{ignite-version}/ignite-ml-{ignite-version}.jar.

  5. If you want to build the ML or DL examples from sources, execute the following commands:

cd examples
mvn clean package -DskipTests
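
If you want to check the jar location programmatically, the small sketch below builds the local-repository path given above from the user's home directory and an Ignite version passed on the command line, then reports whether the file exists. The class name is hypothetical, and it assumes Maven's default local repository location under the home directory; adjust it if your Maven settings.xml overrides localRepository.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; assumes the default Maven local repository location.
class IgniteMlJarLocator {

    /** Builds {userHome}/.m2/repository/org/apache/ignite/ignite-ml/{v}/ignite-ml-{v}.jar. */
    static Path jarPath(String userHome, String igniteVersion) {
        return Paths.get(userHome, ".m2", "repository", "org", "apache", "ignite",
                "ignite-ml", igniteVersion, "ignite-ml-" + igniteVersion + ".jar");
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java IgniteMlJarLocator <ignite-version>");
            System.exit(1);
        }
        // Resolve against the current user's home directory.
        Path jar = jarPath(System.getProperty("user.home"), args[0]);
        System.out.println(jar + (Files.exists(jar) ? " (found)" : " (not found)"));
    }
}
```

Run it with the version you built (the one declared in the root pom.xml) as the only argument.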

If needed, refer to DEVNOTES.txt in the project's root folder and README files in the ignite-ml component for more details.
