incubator-systemml

Mirror of Apache SystemML

Statistics on incubator-systemml

Number of watchers on Github 625
Number of open issues 21
Main language Java
Average time to merge a PR about 15 hours
Open pull requests 102+
Closed pull requests 430+
Last commit 8 months ago
Repo Created about 3 years ago
Repo Last Updated 8 months ago
Size 201 MB
Organization / Author apache
Contributors 27

SystemML

Documentation: SystemML Documentation
Mailing List: Dev Mailing List
Build Status: Build Status
Issue Tracker: JIRA
Download: Download SystemML

SystemML is now an Apache Top Level Project! Please see the Apache SystemML website for more information.

SystemML is a flexible, scalable machine learning system. SystemML's distinguishing characteristics are:

  1. Algorithm customizability via R-like and Python-like languages.
  2. Multiple execution modes, including Spark MLContext API, Spark Batch, Hadoop Batch, Standalone, and JMLC.
  3. Automatic optimization based on data and cluster characteristics to ensure both efficiency and scalability.

The latest version of SystemML supports: Java 8+, Scala 2.11+, Python 2.7/3.5+, Hadoop 2.6+, and Spark 2.1+.
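
As a rough illustration of checking these minimums, the following shell sketch compares a version string against a required minimum. The helper name version_ge is hypothetical and not part of SystemML. It assumes bash and dot-separated numeric versions, and it drops suffixes such as the _151 in Java's 1.8.0_151.

```shell
#!/usr/bin/env bash
# version_ge is a hypothetical helper for illustration; it is not shipped with SystemML.
# Returns 0 (true) when version $1 >= version $2, comparing dot-separated numeric fields.
version_ge() {
  local -a have want
  local i h w
  # Strip everything from the first character that is not a digit or a dot,
  # so "1.8.0_151" is compared as "1.8.0".
  IFS=. read -r -a have <<< "${1%%[!0-9.]*}"
  IFS=. read -r -a want <<< "${2%%[!0-9.]*}"
  for ((i = 0; i < ${#have[@]} || i < ${#want[@]}; i++)); do
    h=${have[i]:-0}   # missing fields count as 0
    w=${want[i]:-0}
    # 10# forces base 10 so a field like "08" is not read as octal.
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0   # all fields equal
}
```

For example, `version_ge 1.8.0_151 1.8` succeeds, while `version_ge 2.0.2 2.1` fails. How you obtain the installed version string (for instance by parsing the output of `java -version`) is left to your environment.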

Algorithm Customizability

ML algorithms in SystemML are specified in a high-level, declarative machine learning (DML) language. Algorithms can be expressed in either an R-like syntax or a Python-like syntax. DML includes linear algebra primitives, statistical functions, and additional constructs.

This high-level language significantly increases the productivity of data scientists as it provides (1) full flexibility in expressing custom analytics and (2) data independence from the underlying input formats and physical data representations.
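
To give a feel for the R-like syntax, here is a minimal sketch that writes a small DML script to a temporary file from the shell. The script contents, the file name, and the variable names are illustrative assumptions and are not taken from the SystemML sources. The script has not been run as part of this document, and how it is launched depends on the execution mode you choose.

```shell
#!/usr/bin/env bash
# Write a small R-like DML script to a temporary file.
# File name and script contents are illustrative assumptions.
dml_script="${TMPDIR:-/tmp}/least_squares_sketch.dml"

# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > "$dml_script" <<'EOF'
# Fit least-squares weights via the normal equations.
X = rand(rows=100, cols=5, min=0, max=1, seed=42)
w_true = matrix(1, rows=5, cols=1)
y = X %*% w_true
w = solve(t(X) %*% X, t(X) %*% y)
print("Sum of squared residuals: " + sum((X %*% w - y)^2))
EOF

echo "Wrote $dml_script"
```

Note that the script only describes the linear algebra. It says nothing about file formats, block sizes, or whether the computation runs on one machine or a cluster, which is the data independence described above.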

Multiple Execution Modes

SystemML computations can be executed in several modes. SystemML can be operated in Standalone mode on a single machine, allowing data scientists to develop algorithms locally without needing a distributed cluster. To scale up, algorithms can also be distributed across a cluster using Spark or Hadoop. This flexibility lets an organization use its existing resources and expertise. In addition, SystemML features a Spark MLContext API that allows for programmatic interaction via Scala, Python, and Java. SystemML also features an embedded API (JMLC) for scoring models.

Automatic Optimization

Algorithms specified in DML are dynamically compiled and optimized based on data and cluster characteristics using rule-based and cost-based optimization techniques. The optimizer automatically generates hybrid runtime execution plans ranging from in-memory, single-node execution, to distributed computations on Spark or Hadoop. This ensures both efficiency and scalability. Automatic optimization reduces or eliminates the need to hand-tune distributed runtime execution plans and system configurations.

ML Algorithms

SystemML features a suite of production-level examples that can be grouped into six broad categories: Descriptive Statistics, Classification, Clustering, Regression, Matrix Factorization, and Survival Analysis. Detailed descriptions of these algorithms can be found in the SystemML Algorithms Reference. These algorithms are provided as production-level examples that can be modified or used as inspiration for a new custom algorithm.

Download & Setup

Before you get started on SystemML, make sure that your environment is set up and ready to go.

  1. If you're on OS X, we recommend installing Homebrew if you haven't already. For Linux users, the Linuxbrew project is the equivalent.

OS X:

  /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Linux:

  ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install)"

  2. Install Java (Java 8 is required).

    brew tap caskroom/cask
    brew install Caskroom/cask/java
    
  3. Install Spark 2.1.

    brew tap homebrew/versions
    brew install apache-spark21
    
  4. Download SystemML.

Go to the SystemML Downloads page, download systemml-1.0.0-bin.zip (it should be the second entry listed), and unzip it to a location of your choice.

The next step is optional, but it will make your life a lot easier.

  5. [OPTIONAL] Set SYSTEMML_HOME in your bash profile. Add the following to ~/.bash_profile, replacing path/to/ with the location where you unzipped the SystemML download.

    export SYSTEMML_HOME=path/to/systemml-1.0.0

    Open a new terminal tab so that the changes take effect.

  6. [OPTIONAL] Install Python 2 or Python 3 (to follow along with our Jupyter notebook examples).

Python 2:

  brew install python
  pip install jupyter matplotlib numpy

Python 3:

  brew install python3
  pip3 install jupyter matplotlib numpy

Congrats! You can now use SystemML!
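
If you set SYSTEMML_HOME, a quick sanity check along the following lines can confirm that it points at a real directory. This is only a sketch: check_systemml_home is a hypothetical helper name, not something shipped with SystemML, and it verifies nothing about the contents of the directory.

```shell
#!/usr/bin/env bash
# check_systemml_home is a hypothetical helper for illustration only.
# It checks the directory given as $1, or $SYSTEMML_HOME when no argument is passed.
check_systemml_home() {
  local home=${1:-${SYSTEMML_HOME:-}}
  if [ -z "$home" ]; then
    echo "SYSTEMML_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$home" ]; then
    echo "SYSTEMML_HOME does not point to a directory: $home" >&2
    return 1
  fi
  echo "SystemML home: $home"
}
```

Running `check_systemml_home` in a new terminal tab prints the resolved path on success and returns a non-zero status with a message on standard error otherwise.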

Next Steps!

To get started, please consult the SystemML Documentation. We recommend using the Spark MLContext API to run SystemML from Scala or Python using spark-shell, pyspark, or spark-submit.

incubator-systemml open pull requests
  • [SYSTEMML-524] Doc turning off parfor parallelization in DML Lang Ref
  • [SYSTEMML-534] Add optional console output of stats to Univar-Stats.dml
  • [Scala Pipeline API] Add scala logisticRegression api for spark pipeline
  • [SYSTEMML-153] Read file with extension 'csv' not require mtd file
  • [SYSTEMML-503] Update license for CSS and JS
  • [SYSTEMML-495] Simplify SystemML Configuration Loading
  • [Systemml-475] [WIP] Add Implicit Conversion Between Scalar Values and 1x1 Matrices
  • [SYSTEMML-545] Document Scala build support in Eclipse
  • Fix path to Zeppelin notebook example
  • [SYSTEMML-536] KNN Algorithm
  • [SYSTEMML-486] Java opts to standalone scripts
  • Inconsistency in descriptions
  • [SYSTEMML-573] Frame CSV Spark read/write test cases
  • [WIP] [SYSTEMML-594] Tutorial to run SystemML on Bluemix/Datascientistworkbench using Zeppelin/Jupyter
  • [SYSTEMML-540] Initial implementation of conv2d/maxpooling builtin functions (new)
  • [SYSTEMML-560] Frame converter between Matrix and Binary block
  • [SYSTEMML-668] Python MLOutput.getDF() Can't Access JVM SQLContext
  • [SYSTEMML-508] Extend "executeScript" In MLContext To Accept PyDML.
  • [SYSTEMML-666] Fix matrix representation example in decision tree docs
  • [SYSTEMML-294] Print matrix capability
  • [SYSTEMML-637] Hybrid Flink Linear regression example
  • [SYSTEMML-618] Deep Learning DML Library
  • [SYSTEMML-692][WIP] Added initial version of DML generator for Caffe proto
  • [SYSTEMML-659] Fix LICENSE and NOTICE for main jar artifact
  • [SYSTEMML-646][SYSTEMML-581] Make testing of MLPipeline wrappers more robust
  • [SYSTEMML-296] Add elif (else if) to PyDML
  • Initial release process doc
  • [WIP] Initial implementation of hop_rewrite explain
  • [SYSTEMML-445] [WIP] Initial version of GPU backend
  • Update Beginner's Guide for toString, 0-based PyDML, and elif
  • Add test suite for knn algorithm.
  • Include guava in standalone distributions
  • [SYSTEMML-562] Frame Left Indexing (Not for 0.10 release)
  • [SYSTEMML-657] Deprecate `ppred(...)` Built-In Function
  • [SYSTEMML-766] Adding a rewrite for Axpy (matrix-scalar product and adds the result to a matrix)
  • [kNN]Modified the code base on the comments in before pr.
  • Clean up and reorganize the existing documentation.
  • [SYSTEMML-776][WIP] Update SystemML to Support Spark 2.0.0
  • [SYSTEMML-831][WIP] Implement t-SNE algorithm
  • [SYSTEMML-451] Python embedded DSL
  • [SYSTEMML-701][WIP]sparse matrix gpu
  • [SYSTEMML-145] Remove crc files from local file system
  • [SYSTEMML-891] Update MLContext Matrix and Frame 'as' methods to 'to'
  • rangeReIndex spits out a better error message if a scalar is indexed.
  • [SYSTEMML-897] Add old MLContext Spark Shell examples to docs
  • [SYSTEMML-1022] Update default spark version build property
  • [SYSTEMML-565][WIP] Document frame support in dml language reference
  • [SYSTEMML-842] Fix javadocs in api package
  • [SYSTEMML-446] [WIP] Exploit cublas libraries for transpose and certain cases of binary operations + add support for invoking custom kernels
  • [SYSTEMML-1062] Add build time to manifest of main jar
  • Implementation of SVD
  • [SYSTEMML-880] Initial version of pushdown loop for Python DSL
  • [SYSTEMML-1099] Mavenize the creation of python package
  • [SYSTEMML-1085] Fix inmemory artifact
  • [SYSTEMML-1084] Change .tar.gz artifact extensions to .tgz
  • [SYSTEMML-1118][WIP] Updated to use JCuda 0.8.0
  • [SYSTEMML-1116] Make SystemML Python DSL NumPy-friendly
  • [SYSTEMML-1112] Add Scala Algorithms API For Spark ML Pipeline
  • [SYSTEMML-1173] Readability of StringIdentifier and DataExpression toString
  • [SYSTEMML-776] Upgrade Spark version
  • [SYSTEMML-1163][WIP] Recursive Block Cholesky Algorithm
  • [SYSTEMML-1161][WIP] recursive qr factorization
  • [SYSTEMML-769] Support for native BLAS and simplify deployment for GPU backend
  • [SYSTEMML-1573] Incorporate ALLOW_OPERATOR_FUSION in ConvolutionOp for developer testing
  • [SYSTEMML-1572] Enable native BLAS on remote executors
  • [SYSTEMML-540] Additional tests to compare the accuracy of different convolution related operators with CuDNN
  • [SYSTEMML-1511] Tab completion for scripts using MLContext
  • [SYSTEMML-1554] IPA Scalar Transient Read Replacement
  • [SYSTEMML-1549] Cox.dml - return S & T in usable format
  • [SYSTEMML-1563] Adding a distributed synchronous SGD MNIST LeNet example.
  • [WIP][SYSTEMML-298] Print matrices without as.scalar or toString
  • [SYSTEMML-1353] Initial Scala DSL implementation
  • [SYSTEMML-1160] [WIP] Enable prefetching of batches via for loop iterator
  • [SYSTEMML-1625][WIP] Gpu unit tests
  • [SYSTEMML-1532][WIP] Python launch script for spark-submit
  • [SYSTEMML-1606] Update notebook samples with latest code
  • [SYSTEMML-1596] Set runtime platform via MLContext
  • [SYSTEMML-1583] [WIP] Read caffemodel using Caffe2DML
  • [SYSTEMML-2004] Covariance Kernels
  • [SYSTEMML-2068] Codegen support for logical and bitwise logical operations
  • [SYSTEMML-1437] Factorization Machines
  • [SYSTEMML-1685] Improve data generation scripts
  • [SYSTEMML-445] [WIP] Added two-step strategy to deal with potential fragmentation on GPU
  • [WIP] [MINOR] Support Native BLAS on Windows, PowerPC and Mac
  • Added tiny LSTM example to illustrate usage
  • [WIP][SYSTEMML-1748] Functionalize Kmeans
  • [SYSTEMML-1831] Improve the efficiency of matrix subsetting
  • [SYSTEMML-1821] Improve the training process in mnist_lenet_distrib_sgd.dml
  • [SYSTEMML-1648] making svm scripts work with mlcontext
  • [SYSTEMML-2068] codegen BitwAnd support
  • [SYSTEMML-1444][PART-2] UDFs w/ single output in expressions
  • [WIP][SYSTEMML-2110] Codegen support for relu operation
  • Gaussian Process Classification Script.
  • [SYSTEMML-1991] Implementation of Sobol quasi random sequence generator
  • [SYSTEMML-1491] Add different ReLU variants ELU
  • [SYSTEMML-1872] Added average pooling and upsampling layers
  • [SYSTEMML-445] Cleanup GPU memory management
  • [SYSTEMML-1994] GP regression script, with predictive mean & variance
  • [DOC] Factorization Machines core module
  • [WIP][SYSTEMML-976] Add explain and stats option to Python DSL
  • [SYSTEMML-2121] PCA test for codegenalg suite
  • [SYSTEMML-2066] JMLC test for candidate exploration with unknowns