Big data refers to data sets so voluminous and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Several concepts are associated with big data. Originally there were three: volume, variety, and velocity. Other concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value.

Poor man's blockchain

A five-minute, Bitcoin-like implementation to help you understand how a blockchain works.
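
To give a flavor of the idea, here is a minimal sketch of a hash-linked chain with a toy proof of work. It is not the implementation from the post. The function names (`block_hash`, `mine_block`, `is_valid`), the block layout, and the difficulty of two leading zero hex digits are all assumptions made for illustration.

```python
import hashlib
import json


def block_hash(index, prev_hash, data, nonce):
    """SHA-256 over the block's fields, serialized deterministically."""
    payload = json.dumps([index, prev_hash, data, nonce]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, prev_hash, data, difficulty=2):
    """Toy proof of work: try nonces until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(index, prev_hash, data, nonce)
        if digest.startswith(prefix):
            return {
                "index": index,
                "prev_hash": prev_hash,
                "data": data,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1


def is_valid(chain, difficulty=2):
    """Check every block's hash, its proof of work, and its link to the previous block."""
    prefix = "0" * difficulty
    for i, block in enumerate(chain):
        expected = block_hash(
            block["index"], block["prev_hash"], block["data"], block["nonce"]
        )
        if block["hash"] != expected or not expected.startswith(prefix):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Build a three-block chain; the genesis block points at an all-zero hash.
chain = [mine_block(0, "0" * 64, "genesis")]
chain.append(mine_block(1, chain[-1]["hash"], "alice pays bob 5"))
chain.append(mine_block(2, chain[-1]["hash"], "bob pays carol 2"))
```

Changing the data of any block changes its hash, so `is_valid(chain)` returns `False` unless that block and every later one are mined again. That is the property that makes tampering expensive.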

Timeseries forecasting with H2O

By expanding a time series horizontally, you can use H2O to forecast it.
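
One common reading of "expanding horizontally" is to turn each observation's previous values into columns, so that forecasting becomes an ordinary supervised regression problem. The sketch below assumes that reading. The helper `expand_horizontally` and the sample numbers are hypothetical, and the code is plain Python so that no H2O calls are guessed at.

```python
def expand_horizontally(series, n_lags):
    """Return rows of [lag_n, ..., lag_1, target] built from a 1-D series."""
    values = list(series)
    rows = []
    for t in range(n_lags, len(values)):
        # The n_lags values before position t become features; values[t] is the target.
        rows.append(values[t - n_lags:t] + [values[t]])
    return rows


# Hypothetical sample series, for illustration only.
series = [10, 12, 13, 15, 18, 21]
rows = expand_horizontally(series, n_lags=3)

for row in rows:
    print(row)
```

Each row can then be given to any tabular learner, with the last column as the response and the lag columns as predictors.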

Graph analytics with Spark GraphFrames

Large-scale graph analytics with Apache Spark and GraphFrames.

Statistics on Apache Hive

Basics of stats using Apache Hive.

Apache Spark Streaming

Spark Streaming on Hortonworks.

Getting started with Apache Zeppelin

Zeppelin as the Jupyter of big data.

Spam with Sparkling Water

Using Spark and H2O to detect spam.

Ensemble Learner

Ensemble learner using H2O.