Writing Performant Spreadsheets

Computer scientists refer to the amount of time a program takes to execute as its performance.

Spreadsheets are computer programs. Consequently, they are subject to the same computational limitations as other computer languages. Programmers in different computer languages, like Java, Python, and JavaScript, need to understand how fast their programs will run against large amounts of data. The same considerations hold for spreadsheet programmers.

A spreadsheet's performance depends on the amount of time each worksheet takes to complete. In turn, each worksheet's performance depends on the amount of time each cell takes to complete.

We have designed this section to help Pebblers create spreadsheets that can run against arbitrarily large input and still complete in the fastest possible time.

To understand whether or not your spreadsheet will execute quickly, Pebblers must first understand the different categories of cell-level operations in spreadsheet computing. Each of these different categories of operations has different performance profiles. Some categories perform better than others.

Pebblers should reference the following list of the different categories of cell level operations in order of fastest to slowest.

πŸ“˜

Relative Speed varies depending on the speed of your CPU and the amount of data being processed

The provided relative times are for worksheets that contain a thousand rows of data or less. An individual Pebblers' experience may vary, but the relative performance of each category of cell operation is still helpful to understand.

Cell Level OperationExampleRelative SpeedBig O Notation
Arithmetic Calculation in a single cell43 + 17 * 3Less than a millisecondConstant time O(1)
String Operation in a single cellMID, TEXT, LEFT, RIGHT, LENLess than a millisecondLinear time O(n) where n is the length of the string.
Cell ReferenceA1, BA$54, $M$1024millisecondsConstant time O(1)
Single approximate match lookup on a sorted rangeMATCH(D6,B6:B14,1)
MATCH(D6,B6:B14,-1)
tens of millisecondsLogarithmic time O(logn) where n is the number of cells in the searched range.
Single exact match lookup on a rangeMATCH(D6,B6:B14,0)tens of millisecondsLinear time O(n) where n is the number of cells in the searched range.
Joinjoin, outer-jointens of millisecondsLinear time O(n), where n is the number of cells in the worksheets to be joined.
Approximate match applied to every row in a worksheetDrag down using the match-rows of MATCH(D6,B6:B14,1)hundreds of millisecondsLinear time O(m * logn), where n is the number of cells in the search range and m is the number of dragged down cells the search is executed in.
Exact match applied to every row in a worksheetDrag down using the match-rows directive of MATCH(D6,B6:B14,0)hundreds of millisecondsQuadratic time O(m * n) where n is the number of cells in the searched range and m is the number of dragged down cells the search is executed in.
Cartesian productcartesian-producthundreds of millisecondsQuadratic time O(m * n) where m is the number of rows in the first sheet and n is the number of rows in the second sheet.
Lookups performed on multiple columns on every row in the worksheet.MATCH(D6,B6:B14,0)tens of secondsQuadratic time O(m n c) where n is the number of cells in the search range and m is the number of dragged down cells and c is the number of columns the search is executed in.

πŸ“˜

Big O Notation is how computer scientist classify an algorithm's speed.

Pebblers do not have to understand Big (O) notation to write fast-running worksheets; however, if you are interested in how computer scientists think about performance, Wikipedia does a fantastic job of explaining how this works.

Pebblers should use this table when designing worksheets. For instance, it is better to use the join directives to match rows of data in separate worksheets than to perform lookups. The improved performance is why join directives are preferred to lookups when designing Pebble Stream enhanced spreadsheets.