Apache Ignite Documentation

GridGain Developer Hub - Apache Ignitetm

Welcome to the Apache Ignite developer hub run by GridGain. Here you'll find comprehensive guides and documentation to help you start working with Apache Ignite as quickly as possible, as well as support if you get stuck.

 

GridGain also provides Community Edition which is a distribution of Apache Ignite made available by GridGain. It is the fastest and easiest way to get started with Apache Ignite. The Community Edition is generally more stable than the Apache Ignite release available from the Apache Ignite website and may contain extra bug fixes and features that have not made it yet into the release on the Apache website.

 

Let's jump right in!

 

Documentation     Ask a Question     Download

 

Javadoc     Scaladoc     Examples

Linear Regression

Overview

Apache Ignite supports the ordinary least squares Linear Regression algorithm - one of the most basic and powerful machine learning algorithms. This documentation describes how the algorithm works, and is implemented in Apache Ignite.

The basic idea behind the Linear Regression algorithm is an assumption that a dependent variable y and an explanatory variable x are in the following relationship:

Be aware that further documentation uses a dot product of vectors x and b, and explicitly avoids using a constant term. It is mathematically correct in the case where vector x is supplemented by one value equal to 1.

The above assumption allows us to make a prediction based on a feature vector x if a vector b is known. This fact is reflected in Apache Ignite in the LinearRegressionModel class responsible for making predictions.

Model

A Model in the case of linear regression is represented by the class LinearRegressionModel. It enables a prediction to be made for a given vector of features, in the following way:

LinearRegressionModel model = ...;
double prediction = model.predict(observation);

Model is fully independent object and after the training it can be saved, serialized and restored.

Trainers

Linear Regression is a supervised learning algorithm. This means that to find parameters (vector b), we need to train on a training dataset and minimize the loss function:

Apache Ignite provides two linear regression trainers: trainer based on the LSQR algorithm and another trainer based on the Stochastic Gradient Descent method.

LSQR Trainer

The LSQR algorithm finds the least-squares solution to a large, sparse, linear system of equations. The Apache Ignite implementation is a distributed version of this algorithm.

// Create linear regression trainer.
LinearRegressionLSQRTrainer trainer = new LinearRegressionLSQRTrainer();

// Train model.
LinearRegressionModel mdl = trainer.fit(
    ignite,
    upstreamCache,
    (k, pnt) -> pnt.coordinates,
    (k, pnt) -> pnt.label
);

// Make a prediction. 
double prediction = mdl.apply(coordinates);
// Create linear regression trainer.
LinearRegressionLSQRTrainer trainer = new LinearRegressionLSQRTrainer();

// Train model.
LinearRegressionModel mdl = trainer.fit(
    upstreamMap,
    10,          // Number of partitions.
    (k, pnt) -> pnt.coordinates,
    (k, pnt) -> pnt.label
);

// Make a prediction. 
double prediction = mdl.apply(coordinates);

SGD Trainer

Another Linear Regression Trainer uses the stochastic gradient descent method to find a minimum of the loss function. The configuration of this trainer is similar to multilayer perceptron trainer configuration and we can specify the type of updater (SGD, RProp of Nesterov), max number of iterations, batch size, number of local iterations and seed.

// Create linear regression trainer.
LinearRegressionSGDTrainer<?> trainer = new LinearRegressionSGDTrainer<>(
    new UpdatesStrategy<>(
        new RPropUpdateCalculator(),
        RPropParameterUpdate::sumLocal,
        RPropParameterUpdate::avg
    ), 
    100000,  // Max iterations.
    10,      // Batch size.
    100,     // Local iterations.
    123L     // Random seed.
);

// Train model.
LinearRegressionModel mdl = trainer.fit(
    ignite,
    upstreamCache,
    (k, pnt) -> pnt.coordinates,
    (k, pnt) -> pnt.label
);

// Make a prediction. 
double prediction = mdl.apply(coordinates);
// Create linear regression trainer.
LinearRegressionSGDTrainer<?> trainer = new LinearRegressionSGDTrainer<>(
    new UpdatesStrategy<>(
        new RPropUpdateCalculator(),
        RPropParameterUpdate::sumLocal,
        RPropParameterUpdate::avg
    ), 
    100000,  // Max iterations.
    10,      // Batch size.
    100,     // Local iterations.
    123L     // Random seed.
);

// Train model.
LinearRegressionModel mdl = trainer.fit(
    upstreamMap,
    10,          // Number of partitions.
    (k, pnt) -> pnt.coordinates,
    (k, pnt) -> pnt.label
);

// Make a prediction. 
double prediction = mdl.apply(coordinates);

Examples

To see how the Linear Regression can be used in practice, try these examples that are available on GitHub and delivered with every Apache Ignite distribution.

Linear Regression


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.