Spark - join transformation

less than 1 minute read

join—Equivalent to an inner join in RDBMS, this returns a new pair RDD with the elements (K, (V, W)) containing all possible pairs of values from the first and second RDDs that have the same keys. For the keys that exist in only one of the two RDDs, the resulting RDD will have no elements. 


leftOuterJoin—Instead of (K, (V, W)), this returns elements of type (K, (V, Option(W))). The resulting RDD will also contain the elements (key, (v, None)) for those keys that don’t exist in the second RDD. Keys that exist only in the second RDD will have no matching elements in the new RDD. 


rightOuterJoin—This returns elements of type (K, (Option(V), W)); the resulting RDD will also contain the elements (key, (None, w)) for those keys that don’t exist in the first RDD. Keys that exist only in the first RDD will have no matching elements in the new RDD. 


fullOuterJoin—This returns elements of type (K, (Option(V), Option(W)); the resulting RDD will contain both (key, (v, None)) and (key, (None, w)) elements for those keys that exist in only one of the two RDDs.