PySpark - Custom aggregation count in groupBy

less than 1 minute read

A PySpark example of a custom conditional count aggregation inside groupBy: sum a 1/0 flag produced by F.when so that only rows matching the condition are counted.

  from pyspark.sql import functions as F

  # Count only rows where the condition holds: 1 when true, 0 otherwise.
  cnt_cond = lambda cond: F.sum(F.when(cond, 1).otherwise(0))
  df.groupBy(df.date).agg(F.avg(df.price).alias('avg'),
                          cnt_cond(df.include == 'true').alias('count_cnd')) \
    .show()

GitHub code
