TensorFlow is an open source software library for high-performance numerical computation, used mostly for deep learning and other computationally intensive machine learning tasks. Its flexible architecture makes it easy to deploy computation across a variety of platforms (CPUs, GPUs, TPUs). Originally developed by researchers and engineers from the Google Brain team within Google’s AI organization, it comes with strong support for machine learning and deep learning, and its flexible numerical computation core is used across many other scientific domains.
Apache Ignite is a memory-centric distributed database, caching, and processing platform for transactional, analytical, and streaming workloads delivering in-memory speeds at petabyte scale.
Together, TensorFlow and Apache Ignite provide the full toolset needed to work with operational and historical data, perform data analysis, and build complex mathematical models based on neural networks.
Ignite Dataset represents an integration between Apache Ignite and TensorFlow that allows Apache Ignite to be used as a data source for neural network training, inference and all other computations supported by TensorFlow. Using Ignite Dataset has many advantages, including:
- TensorFlow obtains fast access to a distributed database that can contain training data and data for inference.
- Objects fed by Ignite Dataset can have any structure, so all preprocessing can be done in the TensorFlow pipeline.
- SSL, Windows and distributed training are also supported.
Currently, Ignite Dataset is a part of TensorFlow, so you can use it out of the box without installing any third-party packages. The integration is based on tf.data on the TensorFlow side and the Binary Client Protocol on the Apache Ignite side.
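As a rough illustration of the description above, the following sketch opens an Ignite cache as a tf.data dataset. It assumes a TensorFlow 1.x build that includes the `tensorflow.contrib.ignite` module and an Ignite node reachable on `localhost` at the default thin-client port 10800. The helper names `ignite_dataset_args` and `open_ignite_dataset` and the cache name `IMAGES` are hypothetical and used only for illustration.

```python
def ignite_dataset_args(cache_name, host="localhost", port=10800):
    """Collect connection arguments for IgniteDataset (hypothetical helper)."""
    if not cache_name:
        raise ValueError("cache_name must be a non-empty string")
    return {"cache_name": cache_name, "host": host, "port": port}


def open_ignite_dataset(cache_name, **overrides):
    """Create a tf.data-compatible dataset backed by an Ignite cache.

    Assumption: TensorFlow 1.x with tensorflow.contrib.ignite is installed
    and an Ignite node is running. The import is deferred so that this
    module can be loaded on machines without TensorFlow.
    """
    from tensorflow.contrib.ignite import IgniteDataset

    return IgniteDataset(**ignite_dataset_args(cache_name, **overrides))


# Example usage (requires TensorFlow 1.x and a running Ignite node):
#   dataset = open_ignite_dataset("IMAGES").batch(32)
```

Because the result is an ordinary tf.data dataset, the usual transformations such as `map`, `batch`, and `shuffle` should apply to it, which is what allows preprocessing to stay inside the TensorFlow pipeline.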
In addition to the database functionality, Apache Ignite provides a distributed file system called IGFS. IGFS delivers functionality similar to Hadoop HDFS, but operates only in memory.
The IGFS integration with TensorFlow is based on a custom filesystem plugin on the TensorFlow side and the IGFS Native API on the Apache Ignite side. It has many uses, for example:
- Checkpoints of state can be saved to IGFS for reliability and fault-tolerance.
- Training processes communicate with TensorBoard by writing event files to a directory, which TensorBoard watches. IGFS allows this communication to work even when TensorBoard runs in a different process or machine.
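The two uses above can be sketched as a small path helper that places checkpoints and TensorBoard event files for one training run on IGFS. This is a minimal sketch under stated assumptions: it assumes the IGFS plugin is loaded and resolves paths with an `igfs://` scheme. The helper names `igfs_path` and `run_dirs` and the directory layout are hypothetical.

```python
import posixpath

# Assumption: the IGFS filesystem plugin registers this URI scheme.
IGFS_SCHEME = "igfs://"


def igfs_path(*parts):
    """Join path parts into an IGFS URI such as igfs:///models/run1."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return IGFS_SCHEME + posixpath.join("/", *cleaned)


def run_dirs(run_name, root="tf"):
    """Return checkpoint and TensorBoard event directories for one run."""
    return {
        "checkpoints": igfs_path(root, run_name, "checkpoints"),
        "events": igfs_path(root, run_name, "events"),
    }


# Example usage (requires the IGFS plugin and a running Ignite node):
#   dirs = run_dirs("run1")
#   Save checkpoints under dirs["checkpoints"] and write summaries to
#   dirs["events"], then start TensorBoard with --logdir set to that path.
```

Keeping both directories under a single run prefix lets a TensorBoard instance on another machine watch the event directory while the training process writes checkpoints alongside it.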
IGFS plugin state
At present, the IGFS plugin is not a part of TensorFlow. To track its current state, please follow this pull request