PySpark - create a new column by concatenating other columns
PySpark can create a new column by concatenating other columns:
import pyspark.sql.functions as fn

# Join col1 and col2 with an underscore separator
df = df_raw \
    .withColumn('new_col', fn.concat_ws('_', df_raw.col1, df_raw.col2))
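For example, with a small DataFrame (the Spark session and sample values below are assumed for illustration, not part of the original snippet), the new column joins the two values with an underscore:

import pyspark.sql.functions as fn
from pyspark.sql import SparkSession

# Assumed example session and data, for illustration only
spark = SparkSession.builder.getOrCreate()
df_raw = spark.createDataFrame([("a", "b"), ("c", "d")], ["col1", "col2"])

df = df_raw.withColumn('new_col', fn.concat_ws('_', df_raw.col1, df_raw.col2))
df.show()
# +----+----+-------+
# |col1|col2|new_col|
# +----+----+-------+
# |   a|   b|    a_b|
# |   c|   d|    c_d|
# +----+----+-------+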
PySpark can also create a JSON column from selected columns:
import pyspark.sql.functions as fn

# Pack the selected columns into a struct and serialize it as a JSON string
json_columns = ["col1", "col2"]
df = df_raw \
    .withColumn('json', fn.to_json(fn.struct([df_raw[x] for x in json_columns])))
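With the same assumed sample data as above, each row of the json column would hold a string such as {"col1":"a","col2":"b"}:

# Continuing the assumed example data from above
df.select('json').show(truncate=False)
# +-----------------------+
# |json                   |
# +-----------------------+
# |{"col1":"a","col2":"b"}|
# |{"col1":"c","col2":"d"}|
# +-----------------------+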
PySpark can also create a JSON column from all other columns:
import pyspark.sql.functions as fn

# Pack every column of df_raw into a struct and serialize it as a JSON string
df = df_raw \
    .withColumn('json', fn.to_json(fn.struct([fn.col(c) for c in df_raw.columns])))
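As far as I know, an equivalent and slightly shorter form unpacks the column names directly into struct, which avoids the list comprehension:

import pyspark.sql.functions as fn

# struct also accepts column names, so all columns can be passed unpacked
df = df_raw \
    .withColumn('json', fn.to_json(fn.struct(*df_raw.columns)))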