Evolution of Hadoop support in Cassandra

Obsolete Post

Below is a compilation of the Hadoop-related changes made to the Cassandra code base. The source for this compilation is Apache Cassandra's CHANGES.txt. I have tried my best to avoid omissions and mistakes. If you notice something amiss, please leave a comment and I will fix it.

1.0.1 #

0.8.7 #

0.8.5 #

  • Fail jobs when Cassandra node has failed but TaskTracker has not (CASSANDRA-2388)

0.8.3 #

0.8.2 #

0.8.1 #

  • Fix race that could result in Hadoop writer failing to throw an exception encountered after close (CASSANDRA-2755)

0.8.0 #

0.7.5 #

  • Allow job configuration to set the CL used in Hadoop jobs (CASSANDRA-2331)

0.7.3 #

  • Fix Hadoop ColumnFamilyOutputFormat dropping of mutations when batch fills up (CASSANDRA-2255)

0.7.1 #

0.7.0-rc2 #

  • Support multiple Mutations per key in hadoop ColumnFamilyOutputFormat (CASSANDRA-1774)

0.7-beta2 #

  • Remove cassandra.yaml dependency from Hadoop and Pig (CASSANDRA-1322)
  • Support for Hadoop Streaming [non-jvm map/reduce via stdin/out] (CASSANDRA-1368)
  • Rewrite Hadoop ColumnFamilyRecordWriter to pool connections, retry to multiple Cassandra nodes, and smooth impact on the Cassandra cluster by using smaller batch sizes (CASSANDRA-1434)
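
To make the idea behind CASSANDRA-1434 more concrete, here is a minimal sketch of writing in small batches and falling back to other nodes on failure. It is not the actual ColumnFamilyRecordWriter code: the names `write_batches`, `chunked`, and `fake_send`, the node names, and the batch size are all hypothetical, and connection pooling is left out entirely.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_batches(
    mutations: Sequence,
    nodes: Sequence[str],
    send: Callable[[str, Sequence], None],
    batch_size: int = 32,
) -> int:
    """Send mutations in small batches, trying each node in turn.

    Returns the number of batches delivered. Raises IOError if every
    node rejects a batch. `send` is a hypothetical transport callable.
    """
    delivered_batches = 0
    for batch in chunked(mutations, batch_size):
        last_error = None
        for node in nodes:
            try:
                send(node, batch)
                delivered_batches += 1
                break  # this batch is done, move on to the next one
            except IOError as exc:
                last_error = exc  # remember the failure and try the next node
        else:
            # The inner loop finished without a successful send.
            raise IOError("all nodes failed for a batch") from last_error
    return delivered_batches


# Hypothetical usage: node-a is down, so every batch lands on node-b.
delivered: List[tuple] = []


def fake_send(node: str, batch: Sequence) -> None:
    if node == "node-a":
        raise IOError("node-a is down")
    delivered.append((node, list(batch)))


count = write_batches(list(range(5)), ["node-a", "node-b"], fake_send, batch_size=2)
print(count, delivered)
```

Smaller batches mean each request puts less load on the receiving node, at the cost of more round trips; the right size depends on the cluster and is not something this sketch tries to answer.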

0.7-beta1 #

0.6.4 #

  • Hadoop jobs no longer require the Cassandra storage-conf.xml (CASSANDRA-1047)

0.6.2 #

  • Fix SlicePredicate serialization inside Hadoop jobs (CASSANDRA-1049)
  • Close Thrift sockets in Hadoop ColumnFamilyRecordReader (CASSANDRA-1081)

0.6.1 #

  • Use hostnames in CFInputFormat to allow Hadoop’s naive string-based locality comparisons to work (CASSANDRA-955)
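
To see why CASSANDRA-955 matters, consider the sketch below. It is an illustration only, not Hadoop's scheduler code: it assumes that locality is decided by plain string equality between the host a task tracker reports and the locations a split advertises. The function `is_local` and the host values are hypothetical.

```python
from typing import Sequence


def is_local(tracker_host: str, split_locations: Sequence[str]) -> bool:
    """Return True only if the tracker's host string exactly matches a location."""
    return tracker_host in split_locations


# Assumption: the task tracker identifies itself by hostname.
tracker = "cass-node1.example.com"

# A split that advertises an IP address never matches, even if it is the same machine.
locations_by_ip = ["10.0.0.11"]
# A split that advertises the hostname matches.
locations_by_name = ["cass-node1.example.com"]

print(is_local(tracker, locations_by_ip))    # False
print(is_local(tracker, locations_by_name))  # True
```

Under that assumption, a split that reports IP addresses is never treated as local, so tasks may be scheduled away from the node that holds the data.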

0.6.0-beta3 #

0.6.0-beta1/beta2 #