Apache Beam Key Concepts

Apache Beam has the following key concepts.

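  1. Pipeline
     • Encapsulates the whole data processing job: its inputs, transforms, and outputs
  2. PCollection
     • An immutable, distributed dataset (bounded or unbounded) that the pipeline reads, transforms, and writes
  3. PTransform
     • A processing step that takes one or more PCollections as input and produces zero or more PCollections as output. Core transforms include:
       • ParDo
       • GroupByKey
       • Combine
       • Flatten
       • Partition
  4. Side Input
     • Additional read-only data supplied to a transform alongside its main input
  5. Runner
     • Executes the pipeline on a chosen back end, such as Apache Flink, Apache Spark, or Google Cloud Dataflow
  6. Windowing
     • Divides a PCollection into windows by element timestamp; answers where in event time results are computed
  7. Triggers
     • Determine when the results for a window are emitted; answers when in processing time results are materialized
  8. Schema
     • Defines the structure (field names and types) of the elements in a PCollection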