Applying a function to multiple columns in Pandas

I am currently in the process of wrangling some Automatic Weather Station (AWS) data into a usable format. However, the space-delimited columnar format is not exactly friendly for a Pandas user.

The problem I have been facing is creating a proper datetime column in the data that is a function of four "columns" of varying datatypes.

A quick glance through the Pandas documentation shows that this is incredibly easy to do:

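import pandas as pd

# Example data; the column names match the one-liner shown further down
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

def myfunc(x):
    """A function that will work on a dataframe row-wise

    Usage:
        df['result'] = df.apply(myfunc, axis=1)

    Args:
        x (pandas.Series) : A row-wise pandas Series

    Returns:
        scalar : The value for this row of the result
    """
    # Index by column label; integer keys on a labelled Series are ambiguous
    return x['a'] + x['b']

df['result'] = df.apply(myfunc, axis=1)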

The x passed into the function is one row of the dataframe, supplied as a Series. With axis=1, apply calls the function once per row and places each returned value in the new column at that row.

This is obviously a trivial example that can be done natively in Pandas with df['result'] = df['a'] + df['b'], but it sets the groundwork for more complex operations involving multiple columns.
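As a rough sketch of the datetime case that motivated this post, the same pattern can combine several columns of different types into one timestamp. The column names (year, month, day, hhmm), the sample values and the helper make_timestamp are assumptions made for illustration, not the real AWS layout.

```python
import pandas as pd

# Hypothetical AWS-style columns; the names and values are assumptions.
df = pd.DataFrame({
    "year": [2015, 2015],
    "month": [1, 12],
    "day": [5, 31],
    "hhmm": ["0930", "2345"],  # time of day stored as a zero-padded string
})


def make_timestamp(row):
    """Combine the date parts and the HHMM string of one row into a Timestamp."""
    hhmm = str(row["hhmm"]).zfill(4)
    return pd.Timestamp(
        year=int(row["year"]),
        month=int(row["month"]),
        day=int(row["day"]),
        hour=int(hhmm[:2]),
        minute=int(hhmm[2:]),
    )


# One call per row; each returned Timestamp becomes that row's element.
df["timestamp"] = df.apply(make_timestamp, axis=1)
```

Casting each part with int() inside the function keeps it working whether the row arrives with integer, float or object values.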

Some things to note:

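  • Check your datatypes - all of my integers suddenly became floats. A likely cause is that with axis=1 each row is passed as a single Series with one dtype, so integers sharing a row with floats are upcast to float.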
  • Ensure that the function will always return something, even if it is just a missing value, so that the new series is complete and you control the output for your next processing step.
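Both notes can be seen in a small sketch. The data, the column names (count, temp) and the function scaled are made up for illustration; the aim is only to show the row dtype that the function receives and an explicit missing-value return.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one integer column and one float column with a gap.
df = pd.DataFrame({"count": [1, 2, 3], "temp": [10.5, np.nan, 12.0]})


def scaled(row):
    """Return count * temp, or NaN when temp is missing."""
    if pd.isna(row["temp"]):
        return np.nan  # always return something, even for bad rows
    # The row is a single float64 Series, so count arrives as a float;
    # cast it back to int if the function needs an integer.
    return int(row["count"]) * row["temp"]


# Record the dtype of the Series handed to the function for each row.
row_dtypes = df.apply(lambda row: str(row.dtype), axis=1)
df["scaled"] = df.apply(scaled, axis=1)
```

Here the count column itself stays an integer column in df; only the per-row view passed to the function is upcast.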

Hope this post helps out someone else in a similar situation.