Apache Ignite Documentation

Command Line Tool

To let users control how a TensorFlow cluster is built on top of an Apache Ignite cluster, Ignite provides a simple command line tool, ignite-tf, with the following commands.

Start Command

The start command starts a new TensorFlow cluster on top of an Apache Ignite cluster for the specified cache and then starts the training job defined by JOB_DIR, JOB_CMD and JOB_ARGS. Once everything is running, Apache Ignite supervises all processes and automatically restarts them if any of them fails. The start command prints the output of the training job.

Usage: ignite-tf start [-hV] [-c=<cfg>] CACHE_NAME JOB_DIR JOB_CMD [JOB_ARGS...]
Starts a new TensorFlow cluster and attaches to user script process.
      CACHE_NAME           Upstream cache name.
      JOB_DIR              Job folder (or zip archive).
      JOB_CMD              Job command.
      [JOB_ARGS...]        Job arguments.
  -c, --config=<cfg>       Apache Ignite client configuration.
  -h, --help               Show this help message and exit.
  -V, --version            Print version information and exit.
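
To illustrate how the positional arguments fit together, the following Python sketch assembles a start invocation as an argument list. The cache name, job folder, script name and configuration path are hypothetical placeholders; only the argument order and the --config option are taken from the usage text. The command is built and printed rather than executed, so the sketch runs without an Ignite installation.

```python
import shlex


def build_start_command(cache_name, job_dir, job_cmd, job_args=(), config=None):
    """Assemble an `ignite-tf start` argument list following the usage text."""
    argv = ["ignite-tf", "start"]
    if config is not None:
        # -c / --config points at the Apache Ignite client configuration.
        argv.append(f"--config={config}")
    # Positional arguments, in the order given by the usage line.
    argv += [cache_name, job_dir, job_cmd, *job_args]
    return argv


# Hypothetical values, for illustration only.
argv = build_start_command(
    cache_name="TRAINING_DATA",
    job_dir="./my_job",
    job_cmd="python3",
    job_args=["train.py"],
    config="ignite-client.xml",
)

# Print a shell-safe rendering of the command.
print(shlex.join(argv))
```

Passing the resulting list to subprocess.run would launch the tool, assuming ignite-tf is available on the PATH.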

Internally, the start command performs the following procedure:

  • Determine the placement of partitions for the specified cache.
  • Based on the partition placement, start workers on the appropriate nodes.
  • Start the training code on a random node in the cluster, with a TF_CONFIG environment variable that describes the worker placement.
  • Route the output of the training to the output of the start command.
  • In case of failure, stop everything and start again from the first step.
  • If the training completes successfully, stop everything.
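
Training code started this way can inspect the worker placement it was given. The sketch below assumes that TF_CONFIG follows the usual TensorFlow layout, a JSON document with "cluster" and "task" keys. The exact content that Ignite generates is not shown in this document, so the example value, the addresses and the task type are hypothetical.

```python
import json
import os


def read_tf_config(environ=os.environ):
    """Parse TF_CONFIG and return (worker addresses, task description)."""
    raw = environ.get("TF_CONFIG")
    if raw is None:
        # The variable is absent, for example during a plain local run.
        return [], {}
    config = json.loads(raw)
    workers = config.get("cluster", {}).get("worker", [])
    task = config.get("task", {})
    return workers, task


# Hypothetical TF_CONFIG value, for illustration only.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {"worker": ["10.0.0.1:2222", "10.0.0.2:2222"]},
        "task": {"type": "chief", "index": 0},
    })
}

workers, task = read_tf_config(example_env)
print(workers)
print(task)
```

Calling read_tf_config() with no argument reads the real process environment instead of the example dictionary.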

Stop Command

The stop command stops the specified TensorFlow cluster and corresponding training.

Usage: ignite-tf stop [-hV] [-c=<cfg>] CLUSTER_ID
Stops a running TensorFlow cluster.
      CLUSTER_ID           Cluster identifier.
  -c, --config=<cfg>       Apache Ignite client configuration.
  -h, --help               Show this help message and exit.
  -V, --version            Print version information and exit.

Attach Command

The attach command attaches to the training running on the specified TensorFlow cluster and routes the output of that training to the output of the attach command.

Usage: ignite-tf attach [-hV] [-c=<cfg>] CLUSTER_ID
Attaches to running TensorFlow cluster (user script process).
      CLUSTER_ID           Cluster identifier.
  -c, --config=<cfg>       Apache Ignite client configuration.
  -h, --help               Show this help message and exit.
  -V, --version            Print version information and exit.
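
Because attach forwards the training output to its own output, a script can consume that output line by line. The following Python sketch shows one way to do this with the standard subprocess module. The helper name stream_output is hypothetical, and a stand-in command replaces ignite-tf so that the sketch runs without an Ignite installation.

```python
import subprocess
import sys


def stream_output(argv):
    """Run a command and yield its standard output line by line."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


# Real use would look like this (requires ignite-tf on the PATH,
# and cluster_id is a placeholder for an actual identifier):
#   for line in stream_output(["ignite-tf", "attach", cluster_id]):
#       print(line)

# Stand-in command that prints two lines, used here instead of ignite-tf.
stand_in = [sys.executable, "-c", "print('step 1'); print('step 2')"]
lines = list(stream_output(stand_in))
print(lines)
```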

Ps Command

The ps command prints identifiers of all running TensorFlow clusters.

Usage: ignite-tf ps [-hV] [-c=<cfg>]
Prints identifiers of all running TensorFlow clusters.
  -c, --config=<cfg>       Apache Ignite client configuration.
  -h, --help               Show this help message and exit.
  -V, --version            Print version information and exit.
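
The ps and stop commands can be combined, for example to shut down every running cluster. The Python sketch below assumes that ps prints one identifier per line, which this document does not confirm. The sample identifiers and helper names are hypothetical. It only builds the stop commands and does not execute them.

```python
def parse_cluster_ids(ps_output):
    """Extract identifiers, assuming one identifier per non-empty line."""
    return [line.strip() for line in ps_output.splitlines() if line.strip()]


def build_stop_commands(cluster_ids, config=None):
    """Build one `ignite-tf stop` argument list per cluster identifier."""
    commands = []
    for cluster_id in cluster_ids:
        argv = ["ignite-tf", "stop"]
        if config is not None:
            # -c / --config points at the Apache Ignite client configuration.
            argv.append(f"--config={config}")
        argv.append(cluster_id)
        commands.append(argv)
    return commands


# Hypothetical ps output, for illustration only.
sample_output = "cluster-a\n\ncluster-b\n"
ids = parse_cluster_ids(sample_output)
commands = build_stop_commands(ids)
for argv in commands:
    print(" ".join(argv))
```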

Cluster Manager

Apache Ignite has a complex infrastructure that maintains a TensorFlow cluster. A quick overview of this is shown in the following diagram:

[Diagram: overview of the infrastructure that maintains a TensorFlow cluster]

