Statistics on CaffeOnSpark

Number of watchers on Github 1217
Number of open issues 76
Average time to close an issue 14 days
Main language C++
Average time to merge a PR 1 day
Open pull requests 4+
Closed pull requests 13+
Last commit almost 2 years ago
Repo Created about 3 years ago
Repo Last Updated 10 months ago
Size 17.1 MB
Organization / Author yahoo
Contributors 3

CaffeOnSpark

What's CaffeOnSpark?

CaffeOnSpark brings deep learning to Hadoop and Spark clusters. By combining salient features from deep learning framework Caffe and big-data frameworks Apache Spark and Apache Hadoop, CaffeOnSpark enables distributed deep learning on a cluster of GPU and CPU servers.

As a distributed extension of Caffe, CaffeOnSpark supports neural network model training, testing, and feature extraction. Caffe users can now perform distributed learning using their existing LMDB data files and slightly adjusted network configurations (as illustrated).

CaffeOnSpark is a Spark package for deep learning. It is complementary to non-deep learning libraries MLlib and Spark SQL. CaffeOnSpark's Scala API provides Spark applications with an easy mechanism to invoke deep learning (see sample) over distributed datasets.

CaffeOnSpark was developed by Yahoo for large-scale distributed deep learning on our Hadoop clusters in Yahoo's private cloud. Yahoo has used it for image search, content classification, and several other use cases.

Why CaffeOnSpark?

CaffeOnSpark provides some important benefits (see our blog) over alternative deep learning solutions.

  • It enables model training, testing, and feature extraction directly on Hadoop datasets stored in HDFS on Hadoop clusters.
  • It turns your Hadoop or Spark cluster(s) into a powerful platform for deep learning, without the need to set up a new dedicated cluster for deep learning separately.
  • Server-to-server direct communication (Ethernet or InfiniBand) achieves faster learning and eliminates a scalability bottleneck.
  • Caffe users' existing datasets (e.g. LMDB) and configurations can be applied to distributed learning without any conversion.
  • High-level API empowers Spark applications to easily conduct deep learning.
  • Incremental learning is supported to leverage previously trained models or snapshots.
  • Additional data formats and network interfaces can be added easily.
  • It can be easily deployed on a public cloud (e.g. AWS EC2) or a private cloud.

Using CaffeOnSpark

Please check the CaffeOnSpark wiki site for detailed documentation, such as build instructions, the API reference, and getting started guides for a standalone cluster and an AWS EC2 cluster. Two points to keep in mind:

  • Batch sizes specified in prototxt files are per device.
  • Memory layers should not be shared among GPUs, and thus share_in_parallel: false is required for layer configuration.
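
The two notes above can be checked with a few lines of Python. This is only a minimal sketch and not part of CaffeOnSpark: the helper names effective_batch_size and memory_layers_missing_flag are hypothetical, the sample layer is illustrative, and the sketch assumes that every device processes its own batch in each synchronous iteration and that share_in_parallel is set inside the memory_data_param block of each memory data layer.

```python
import re


def effective_batch_size(per_device_batch, devices_per_executor, num_executors):
    """Samples consumed per iteration across the cluster.

    Assumption: each device processes its own batch of `per_device_batch`
    samples in every synchronous training iteration.
    """
    return per_device_batch * devices_per_executor * num_executors


def memory_layers_missing_flag(prototxt_text):
    """Return the indices of memory_data_param blocks that do not set
    share_in_parallel: false (hypothetical lint helper, not a CaffeOnSpark API)."""
    blocks = re.findall(r"memory_data_param\s*\{([^{}]*)\}", prototxt_text)
    return [
        index
        for index, body in enumerate(blocks)
        if not re.search(r"share_in_parallel\s*:\s*false\b", body)
    ]


# Illustrative layer definition; only batch_size and share_in_parallel matter here.
SAMPLE_PROTOTXT = """
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 64
    share_in_parallel: false
  }
}
"""

if __name__ == "__main__":
    # batch_size: 64 on 4 executors with 2 GPUs each
    print(effective_batch_size(64, 2, 4))
    print(memory_layers_missing_flag(SAMPLE_PROTOTXT))
```

Under these assumptions, a batch_size of 64 on 4 executors with 2 GPUs each means 512 samples per iteration, so a prototxt tuned for a single GPU may need its batch size or learning rate revisited before it is run on a cluster.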

Building for Spark 2.X

CaffeOnSpark supports both Spark 1.x and 2.x. For Spark 2.0, our default settings are:

  • spark-2.0.0
  • hadoop-2.7.1
  • scala-2.11.7

You may want to adjust them in caffe-grid/pom.xml.

Mailing List

Please join the CaffeOnSpark user group for discussions and questions.

License

The use and distribution terms for this software are covered by the Apache 2.0 license. See the LICENSE file for terms.

CaffeOnSpark open issues
  • about 2 years How to test in the yarn cluster
  • about 2 years Should I install caffeonspark to all the slave node ?
  • about 2 years something wrong with standalone cluster
  • about 2 years something wrong when buiding caffe-grid
  • about 2 years Error while making-build under folder CaffeOnSpark
  • about 2 years Yarn mode:libprotobuf link error
  • about 2 years SocketCaffeNet UT should be enhanced
  • over 2 years Error while "make build" --[caffe-grid Failure]
  • over 2 years Fail to pass the sanity check
  • over 2 years caffe-grid build failure
  • over 2 years Add maven install code to build wiki page
  • over 2 years Storage platform functions
  • over 2 years What happens if one executor fails in muti executors running mode.
  • over 2 years run CaffeOnSpark standalone cluster,ERROR scheduler.TaskSchedulerImpl: Lost executor 2 on 10.136.159.133: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
  • over 2 years Num of Executors gets changed internally
  • over 2 years NullPointerException while training from dataframe
  • over 2 years run CaffeOnSpark standalone cluster,when run "make build " something error
  • over 2 years CaffeOnSpark run "make build",error!
  • over 2 years Error while building
  • over 2 years Training time on a cluster is higher than training on one machine
  • over 2 years Where is the Iteration and Loss output?
  • over 2 years How to switch on GPU mode on multiple nodes
  • over 2 years CaffeOnSpark as maven package
  • over 2 years You cannot call toBytes() more than once without calling reset()
  • over 2 years Docker image needed
  • over 2 years Error while building - run (proto) on project caffe distri
  • over 2 years How to monitor test error
  • over 2 years Does CaffeOnSpark support multiple LMDB Files For Training/Testing
  • over 2 years java.lang.IllegalStateException: RpcEnv already stopped.
  • over 2 years NullPointerException when Training Imagenet
CaffeOnSpark open pull requests
  • remove unused data
  • Add Spark 2.x support with binary compatibility to 1.X via maven prof…
  • Wrap with a for loop while sending&receiving header
  • Renewing Hadoop Download Links.