Apache Spark (Dataframe and Dataset)

To reach a wider audience in “Big Data”, Spark introduced the DataFrame API. The RDD API is also elegant and can reduce thousands of lines of code to dozens; however, it is now considered a low-level API because it requires the developer to know a good deal about the internal workings of Apache Spark in order … Continue reading Apache Spark (Dataframe and Dataset)

Apache Spark Basic 2

Reduction operations combine a list of elements to produce a single combined result. Operations like fold, reduce and aggregate iterate over the elements to produce a single combined value, while groupByKey combines the values that share a key. Let's go through the different types of reduction operations in Apache Spark. Fold vs Fold Left vs Aggregate: Fold Left is not recommended because it is not parallelized, whereas the fold function is … Continue reading Apache Spark Basic 2
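The difference between the three reductions named in this excerpt can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: the helper names `fold_left`, `fold` and `aggregate` and the list-of-lists "partitions" are hypothetical stand-ins, not Spark's API. The point is that a left fold must walk the elements in order, while fold and aggregate can reduce each partition independently and then merge the partial results.

```python
from functools import reduce


def fold_left(items, zero, op):
    """Sequential left fold. op(acc, item) may return a different type than
    item, so there is no general way to merge two partial accumulators."""
    acc = zero
    for item in items:
        acc = op(acc, item)
    return acc


def fold(partitions, zero, op):
    """Fold with op of shape (A, A) -> A. Each partition is folded on its own,
    then the partial results are merged with the same op."""
    partials = [reduce(op, part, zero) for part in partitions]
    return reduce(op, partials, zero)


def aggregate(partitions, zero, seq_op, comb_op):
    """Like fold, but the accumulator type may differ from the element type:
    seq_op folds elements into an accumulator, comb_op merges accumulators."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)


# Hypothetical sample data, split into two "partitions".
words = ["spark", "rdd", "fold", "aggregate"]
word_partitions = [words[:2], words[2:]]

# Total character count: strings in, an int out.
total_len_sequential = fold_left(words, 0, lambda acc, w: acc + len(w))
total_len_partitioned = aggregate(
    word_partitions, 0, lambda acc, w: acc + len(w), lambda a, b: a + b
)

# Plain sum: ints in, an int out, so fold is enough.
total = fold([[1, 2], [3, 4]], 0, lambda a, b: a + b)
```

In this sketch the per-partition work in `fold` and `aggregate` runs in an ordinary list comprehension; in a distributed engine those partial reductions are what can run in parallel. The zero value is applied once per partition and once more when merging, so it should be neutral for the operation (0 for addition, for example).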

Apache Spark Basics 1

Quote from the Apache Spark documentation: “Apache Spark™ is a unified analytics engine for large-scale data processing.” Spark's popularity is increasing rapidly in the big data world, and it is one of the key tools for distributed big data processing. Before diving into Spark, one has to know a little about Hadoop, then later the pure purpose of the … Continue reading Apache Spark Basics 1