Parsing Formulas
openpyxl supports limited parsing of formulas embedded in cells. The openpyxl.formula package contains a Tokenizer class to break formulas into their constituent tokens. Usage is as follows:
>>> from openpyxl.formula import Tokenizer
>>> tok = Tokenizer("""=IF($A$1,"then True",MAX(DEFAULT_VAL,'Sheet 2'!B1))""")
>>> print("\n".join("%12s%11s%9s" % (t.value, t.type, t.subtype) for t in tok.items))
IF( FUNC OPEN
$A$1 OPERAND RANGE
, SEP ARG
"then True" OPERAND TEXT
, SEP ARG
MAX( FUNC OPEN
DEFAULT_VAL OPERAND RANGE
, SEP ARG
'Sheet 2'!B1 OPERAND RANGE
) FUNC CLOSE
) FUNC CLOSE
As shown above, tokens have three attributes of interest:
.value
: The substring of the formula that produced this token.type
: The type of token this represents. Can be one ofToken.LITERAL
: If the cell does not contain a formula, its value is represented by a singleLITERAL
token.Token.OPERAND
: A generic term for any value in the Excel formula. (See.subtype
below for more details).Token.FUNC
: Function calls are broken up into tokens for the opener (e.g.,SUM(
), followed by the arguments, followed by the closer (i.e.,)
). The function name and opening parenthesis together form oneFUNC
token, and the matching parenthesis forms anotherFUNC
token.Token.ARRAY
: Array literals (enclosed between curly braces) get twoARRAY
tokens each, one for the opening{
and one for the closing}
.Token.PAREN
: When used for grouping subexpressions (and not to denote function calls), parentheses are tokenized asPAREN
tokens (one per character).Token.SEP
: These tokens are created from either commas (,
) or semicolons (;
). Commas createSEP
tokens when they are used to separate function arguments (e.g.,SUM(a,b)
) or when they are used to separate array elements (e.g.,{a,b}
). (They have another use as an infix operator for joining ranges). Semicolons are always used to separate rows in an array literal, so always createSEP
tokens.Token.OP_PRE
: Designates a prefix unary operator. Its value is always+
or-
Token.OP_IN
: Designates an infix binary operator. Possible values are>=
,<=
,<>
,=
,>
,<
,*
,/
,+
,-
,^
, or&
.Token.OP_POST
: Designates a postfix unary operator. Its value is always%
.Token.WSPACE
: Created for any whitespace encountered. Its value is always a single space, regardless of how much whitespace is found.
.subtype
: Some of the token types above use the subtype to provide additional information about the token. Possible subtypes are:Token.TEXT
,Token.NUMBER
,Token.LOGICAL
,Token.ERROR
,Token.RANGE
: these subtypes describe the various forms ofOPERAND
found in formulae.LOGICAL
is eitherTRUE
orFALSE
,RANGE
is either a named range or a direct reference to another range.TEXT
,NUMBER
, andERROR
all refer to literal values in the formulaToken.OPEN
andToken.CLOSE
: these two subtypes are used byPAREN
,FUNC
, andARRAY
, to describe whether the token is opening a new subexpression or closing it.Token.ARG
andToken.ROW
: are used by theSEP
tokens, to distinguish between the comma and semicolon. Commas produce tokens of subtypeARG
whereas semicolons produce tokens of subtypeROW
Translating formulae from one location to another
It is possible to translate (in the mathematical sense) formulae from one
location to another using the openpyxl.formulas.translate.Translator
class. For example, there a range of cells B2:E7
with a sum of each
row in column F
:
>>> from openpyxl.formula.translate import Translator
>>> ws['F2'] = "=SUM(B2:E2)"
>>> # move the formula one colum to the right
>>> ws['G2'] = Translator("=SUM(B2:E2)", origin="F2").translate_formula("G2")
>>> ws['G2'].value
'=SUM(C2:F2)'
Note
This is limited to the same general restrictions of formulae: A1 cell-references only and no support for defined names.