aegisthus

A Bulk Data Pipeline out of Cassandra


Statistics on aegisthus

Number of watchers on Github 257
Number of open issues 3
Average time to close an issue 7 months
Main language Java
Average time to merge a PR 28 days
Open pull requests 2+
Closed pull requests 2+
Last commit about 2 years ago
Repo Created almost 6 years ago
Repo Last Updated over 1 year ago
Size 621 KB
Organization / Author netflix
Latest Release v0.2.4
Contributors 9

Aegisthus

STATUS

Aegisthus has been transitioned to maintenance mode. It is still used for ETL at Netflix for Cassandra 2.x clusters, but it will not be evolving further.

OVERVIEW

A Bulk Data Pipeline out of Cassandra. Aegisthus implements a reader for the SSTable format and provides a map/reduce program to create a compacted snapshot of the data contained in a column family.

BUILDING

Aegisthus is built via Gradle (http://www.gradle.org). To build from the command line: ./gradlew build

RUNNING

Please see the wiki, or check out the scripts directory to use our sstable2json wrapper for individual SSTables.

TESTING

To run the included tests from the command line: ./gradlew build

ENHANCEMENTS

  • Reading
    • Commit log readers
      • Code to do this previously existed in Aegisthus but was removed in commit 35a05e3f.
    • Split compressed input files
      • Currently compressed input files are only handled by a single mapper. See the discussion in issue #9. The relevant section of code is in getSSTableSplitsForFile in AegisthusInputFormat.
    • Add CQL support
      • This way the user doesn't have to add the key and column types as job parameters. Perhaps we will do this by requiring the table schema like SSTableExporter does.
  • Writing
    • Add an option to snappy compress output.
    • Add an output format for easier downstream processing.
      • See discussion on issue #36.
    • Add a pivot format
      • Create an output format that contains a column per row. This can be used to support very large rows without having to have all of the columns in memory at one time.
  • Packaging
    • Publish Aegisthus to Maven Central
    • Publish Shaded/Shadowed/FatJar version of Aegisthus as well
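
The proposed pivot format (one output record per column instead of one per row) could look roughly like the sketch below. This is only an illustration of the idea, not code from Aegisthus: the PivotSketch and Column classes, the tab-separated line layout, and the method names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical sketch of a "pivot" output: one record per column, not one per row. */
final class PivotSketch {

    /** Illustrative column type; not an Aegisthus or Cassandra class. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Formats a single (row key, column) pair as one tab-separated output line. */
    static String pivotLine(String rowKey, Column column) {
        return rowKey + "\t" + column.name + "\t" + column.value + "\t" + column.timestamp;
    }

    /**
     * Wraps a column iterator so that output lines are produced one at a time.
     * Only the current column is held in memory, never the whole row.
     */
    static Iterator<String> pivot(final String rowKey, final Iterator<Column> columns) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return columns.hasNext();
            }

            @Override
            public String next() {
                return pivotLine(rowKey, columns.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Column> columns = Arrays.asList(
                new Column("email", "a@example.com", 10L),
                new Column("name", "Ann", 11L)).iterator();
        Iterator<String> lines = pivot("user42", columns);
        while (lines.hasNext()) {
            System.out.println(lines.next());
        }
    }
}
```

Because the wrapper pulls columns lazily from the underlying iterator, a very wide row can be streamed to the output without materializing all of its columns first, which is the benefit the enhancement describes.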

LICENSE

Copyright 2013 Netflix, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

aegisthus open issues
  • almost 3 years Timestamp clustering keys lose precision
  • about 3 years Cassandra 3.0 support
  • over 3 years Functional example?
aegisthus open pull requests
  • Ensure checksum is fully read until EOF.
  • Issue #56: Fixes an OOM error processing large compressed column families
aegisthus latest release notes
v0.2.4 Add support for outputting SSTables

This is a fairly major rewrite. The idea is that the mapper will output Columns rather than rows. This way we can sort them correctly into the reducer so that we can handle RangeTombstones.
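
As a rough illustration of why column-level records help, the sketch below sorts individual cells by row key, column name, and newest timestamp first, so that a reducer sees every version of a column next to each other and can keep the latest one. All names here (ColumnSortSketch, Cell, SHUFFLE_ORDER, compact) are hypothetical and do not come from the Aegisthus code base, and range tombstone handling is deliberately left out to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: sort column-level records, then keep the newest version of each. */
final class ColumnSortSketch {

    /** Illustrative cell type; not an Aegisthus or Cassandra class. */
    static final class Cell {
        final String rowKey;
        final String column;
        final long timestamp;
        final String value;

        Cell(String rowKey, String column, long timestamp, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Orders by row key, then column name, then newest timestamp first. */
    static final Comparator<Cell> SHUFFLE_ORDER = new Comparator<Cell>() {
        @Override
        public int compare(Cell a, Cell b) {
            int byRow = a.rowKey.compareTo(b.rowKey);
            if (byRow != 0) {
                return byRow;
            }
            int byColumn = a.column.compareTo(b.column);
            if (byColumn != 0) {
                return byColumn;
            }
            return Long.compare(b.timestamp, a.timestamp);
        }
    };

    /** Stands in for the reduce step: the first cell seen for each (row, column) wins. */
    static List<Cell> compact(List<Cell> mapperOutput) {
        List<Cell> sorted = new ArrayList<Cell>(mapperOutput);
        Collections.sort(sorted, SHUFFLE_ORDER);
        List<Cell> result = new ArrayList<Cell>();
        Cell previous = null;
        for (Cell cell : sorted) {
            boolean sameColumn = previous != null
                    && previous.rowKey.equals(cell.rowKey)
                    && previous.column.equals(cell.column);
            if (!sameColumn) {
                result.add(cell);
            }
            previous = cell;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> compacted = compact(Arrays.asList(
                new Cell("r1", "a", 1L, "old"),
                new Cell("r2", "a", 5L, "x"),
                new Cell("r1", "a", 3L, "new"),
                new Cell("r1", "b", 2L, "y")));
        for (Cell cell : compacted) {
            System.out.println(cell.rowKey + " " + cell.column + " " + cell.value);
        }
    }
}
```

In a real MapReduce job the ordering would be supplied by the shuffle's sort comparator rather than an in-memory sort; the in-memory version is used here only so the example runs on its own.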

This also gives us several other benefits. One is that we can just send the Cassandra Atoms across to the reducer, which keeps us from having to process JSON while in flight. That avoids the encoding/decoding problems we previously had with characters that didn't serialize into JSON without escaping.
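
To show the kind of escaping that in-flight JSON requires, here is a minimal sketch of a JSON string escaper. It is not the code Aegisthus used; the JsonEscapeSketch class and its escape method are assumptions for illustration, and it only covers quotes, backslashes, and control characters below 0x20.

```java
/** Hypothetical sketch of escaping a value before embedding it in a JSON string. */
final class JsonEscapeSketch {

    /** Escapes quotes, backslashes, and control characters below 0x20. */
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 8);
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            switch (ch) {
                case '"':
                    out.append("\\\"");
                    break;
                case '\\':
                    out.append("\\\\");
                    break;
                case '\n':
                    out.append("\\n");
                    break;
                case '\r':
                    out.append("\\r");
                    break;
                case '\t':
                    out.append("\\t");
                    break;
                default:
                    if (ch < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) ch));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "say \"hi\"\n";
        // Without escape(), the embedded quotes and newline would produce invalid JSON.
        System.out.println("{\"value\": \"" + escape(value) + "\"}");
    }
}
```

Every hop that parses and re-emits such strings has to apply and undo this escaping consistently, which is the class of problem that passing binary Atoms to the reducer sidesteps.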

Because of this, the final output is now an SSTable. This makes incremental processing easier because we only support one file format rather than both SSTables and JSON.

To get back to JSON, we will add a new serialization format when we deprecate the old way of processing files.

v0.1.3