apache-sparkText files and operations in Scala

Introduction

Reading Text files and performing operations on them.

Example usage

Read text file from path:

val sc: org.apache.spark.SparkContext = ???
sc.textFile(path="/path/to/input/file") 

Read files using wildcards:

sc.textFile(path="/path/to/*/*") 

Read files specifying minimum number of partitions:

sc.textFile(path="/path/to/input/file", minPartitions=3)

Join two files read with textFile()

Joins in Spark:

  • Read textFile 1

    val txt1=sc.textFile(path="/path/to/input/file1") 
    

    Eg:

      A B
      1 2
      3 4
    
  • Read textFile 2

    val txt2=sc.textFile(path="/path/to/input/file2") 
    

    Eg:

      A C
      1 5
      3 6
    
  • Join and print the result.

    txt1.join(txt2).foreach(println)
    

    Eg:

      A B C
      1 2 5
      3 4 6
    

The join above is based on the first column.