Using Pandas in Excel ¶

Pandas Types Options
Passing as Python objects instead of Excel arrays
Using the Pandas type converters outside of a UDF

Pandas DataFrames and Series can be used as function arguments and return types for Excel worksheet functions using the decorator xl_func.

When used as an argument, the range specified in Excel will be converted into a Pandas DataFrame or Series as specified by the function signature.

When returning a DataFrame or Series, a range of data will be returned to Excel. PyXLL can automatically resize the range of the array formula to match the returned data by setting auto_resize=True in xl_func.

The following code shows a function that returns a random dataframe, including the index:

from pyxll import xl_func
import pandas as pd
import numpy as np

@xl_func("int rows, int columns: dataframe<index=True>", auto_resize=True)
def random_dataframe(rows, columns):
    data = np.random.rand(rows, columns)
    column_names = [chr(ord('A') + x) for x in range(columns)]
    return pd.DataFrame(data, columns=column_names)

A function can also take a DataFrame or Series as one its arguments. When passing a DataFrame or Series to a function the whole data area must be selected in Excel and used as the argument to the function.

The following function takes a DataFrame including the column headers row, but not including the index column and returns the sum of a single column.:

from pyxll import xl_func

@xl_func("dataframe<index=False, columns=True>, str: float")
def sum_column(df, col_name):
    col = df[col_name]
    return col.sum()

Pandas Types Options ¶

The following type parameters are available for the dataframe and series argument and return types:

Tip

For specifying multiple or complex type parameters it can be easier to use the xl_arg and xl_return decorators.

See @xl_arg and @xl_return Decorators for more details about how to use xl_arg and xl_return to specify type parameters.

dataframe, when used as an argument type

dataframe<index=0, columns=1, dtype=None, dtypes=None, index_dtype=None, multi_sparse=True>
index
Number of columns to use as the DataFrame’s index. Specifying more than one will result in a DataFrame where the index is a MultiIndex.

columns
Number of rows to use as the DataFrame’s columns. Specifying more than one will result in a DataFrame where the columns is a MultiIndex. If used in conjunction with index then any column headers on the index columns will be used to name the index.

dtype
Datatype for the values in the dataframe. May not be set with dtypes.

dtypes
Dictionary of column name -> datatype for the values in the dataframe. May not be set with dtype.

The dictionary can be specified using standard Python dictionary syntax as part of a function signature string. However, often it is more convenient to use the xl_arg or xl_return decorators. These allow you to set type multiple complex parameters more easily, for example:
@xl_func @xl_arg("df", "dataframe", dtypes={"A": "date"}) def your_function(df): ....
Not all column dtypes need to be specified. Any that are not specified will default to var.
index_dtype
Datatype for the values in the dataframe’s index.

multi_sparse [1]
Return sparse results for MultiIndexes. Can be set to True or False, or 'index' or 'columns' if it should only apply to one or the other.
dataframe, when used as a return type

dataframe<index=None, columns=True>

index
If True include the index when returning to Excel, if False don’t. If None, only include if the index is named.

columns
If True include the column headers, if False don’t.
series, when used as an argument type

series<index=1, transpose=None, dtype=None, index_dtype=None, multi_sparse=True>

index
Number of columns (or rows, depending on the orientation of the Series) to use as the Series index.

transpose
Set to True if the Series is arranged horizontally or False if vertically. By default the orientation will be guessed from the structure of the data.

dtype
Datatype for the values in the Series.

index_dtype
Datatype for the values in the Series’ index.

multi_sparse [1]
Return sparse results for MultiIndexes.
series, when used as a return type

series<index=True, transpose=False>

index
If True include the index when returning to Excel, if False don’t.

transpose
Set to True if the Series should be arranged horizontally, or False if vertically.

Passing as Python objects instead of Excel arrays ¶

When passing large DataFrames between Python functions, it is not always necessary to return the full DataFrame to Excel and it can be expensive reconstructing the DataFrame from the Excel range each time. In those cases you can use the object return type to return a handle to the Python object. Functions taking the dataframe and series types can accept object handles.

The following returns a random DataFrame as a Python object, so will appear in Excel as a single cell with a handle to that object:

from pyxll import xl_func
import pandas as pd
import numpy as np

@xl_func("int rows, int columns: object")
def random_dataframe(rows, columns):
    data = np.random.rand(rows, columns)
    column_names = [chr(ord('A') + x) for x in range(columns)]
    return pd.DataFrame(data, columns=column_names)

The result of a function like this can be passed to another function that expects a DataFrame:

@xl_func("dataframe, int: dataframe<index=True>", auto_resize=True)
def dataframe_head(df, num_rows):
    return df.head(num_rows)

This allows for large datasets to be used in Excel efficiently, especially where the data set would be cumbersome to deal with in Excel when unpacked.

Using the Pandas type converters outside of a UDF ¶

Sometimes it’s useful to be able to convert a range of data into a DataFrame, or a DataFrame into a range of data for Excel, in a context other than function decorated with xl_func. Or, you might have a function that takes the var type, which could be a DataFrame depending on other arguments.

In these cases the function get_type_converter can be used. For example:

from pyxll import get_type_converter

to_dataframe = get_type_converter("var", "dataframe<index=True>")
df = to_dataframe(data)

Or the other way:

to_array = get_type_converter("dataframe", "var")
data = to_array(df)

XLCell be used to get and set values from Excel from a menu or macro function. This can also use the pandas type converters by specifying the type to the XLCell.options method.

For example:

from pyxll import XLCell, xl_macro

@xl_macro
def write_dataframe():
    df = ...  # construct DataFrame
    cell = XLCell.from_range(...)  # get an XLCell instance from a Range object

    # Set the DataFrame value in Excel.
    # This automatically converts the DataFrame to an array type for writing to Excel.
    cell.options(type="dataframe", auto_resize=True).value = df

All the same parameters for the dataframe and series types can be used to control how the conversion is performed.

See Writing Python Values to Excel for more details of how to write Excel macros in Python that write values (including DataFrames to the Excel worksheet).

Footnotes

[1]	(1, 2) The `multi_sparse` parameter is new in PyXLL 5.3.0.

« Other UI Toolkits

Customizing the Ribbon »