Advanced Usage

FastXLSX keeps its API minimal, but the advanced features below need extra care to get the best performance and to preserve data integrity.

Type Handling

Core Concepts

FastXLSX uses the DType enum to enforce type consistency. Explicit type declaration provides:

  • Speed boost in I/O operations

  • Memory efficiency through optimized data structures

  • Type safety for critical data pipelines

Implementation Patterns

  • Reading: Specify types via RangeInfo objects or directly in the cell_value method

  • Writing: Declare types directly in write_xxx() methods

Hint

Always specify dtype when possible - auto-detection adds ~0.3 μs/cell.
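
To put the hint in perspective, the sketch below multiplies the ~0.3 μs/cell figure quoted above by a sheet's cell count. The function name and the sheet sizes are illustrative assumptions, and the result is a back-of-the-envelope estimate, not a measurement.

```python
# Rough estimate of auto-detection overhead, based on the ~0.3 μs/cell
# figure quoted in the hint above. All names here are illustrative only.
AUTODETECT_US_PER_CELL = 0.3


def autodetect_overhead_ms(rows: int, cols: int) -> float:
    """Return the estimated extra time in milliseconds for rows x cols cells."""
    return rows * cols * AUTODETECT_US_PER_CELL / 1000.0


if __name__ == "__main__":
    # Hypothetical sheet sizes, from small to very large.
    for rows, cols in [(100, 100), (10_000, 50), (1_000_000, 20)]:
        ms = autodetect_overhead_ms(rows, cols)
        print(f"{rows:>9} x {cols:<3} -> ~{ms:.1f} ms")
```

For small ranges the cost is a few milliseconds at most; it only becomes noticeable once a read covers millions of cells, which is where an explicit dtype pays off.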

Numerical Optimization

Return Type Behavior

FastXLSX employs specialized memory layouts for numerical data:

Data Type                Return Structure
Bool/Int/Float           numpy.ndarray
Str/Date/DateTime/Any    Python list
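
The mapping in the table can be written down as a small lookup, which is handy when downstream code must branch on the container it will receive. This is a sketch only: DTypeSketch is a hypothetical stand-in enum defined here so the example runs without FastXLSX installed, and it is not the library's DType.

```python
from enum import Enum


class DTypeSketch(Enum):
    """Hypothetical stand-in for fastxlsx.DType (assumption, not the real enum)."""
    Bool = "Bool"
    Int = "Int"
    Float = "Float"
    Str = "Str"
    Date = "Date"
    DateTime = "DateTime"
    Any = "Any"


# Per the table above: numerical types come back as NumPy arrays.
NUMERICAL = {DTypeSketch.Bool, DTypeSketch.Int, DTypeSketch.Float}


def return_container(dtype: DTypeSketch) -> str:
    """Name the container the table above associates with a dtype."""
    return "numpy.ndarray" if dtype in NUMERICAL else "list"


for dt in DTypeSketch:
    print(f"{dt.name:<8} -> {return_container(dt)}")
```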

Code Examples

# Reading with/without explicit typing
from fastxlsx import DShape, DType, RangeInfo

# Auto-detected (slower, returns list)
data_list = ws.read_value(RangeInfo((5,2), DShape.Row(3)))

# Type-enforced (faster, returns ndarray)
data_array = ws.read_value(
    RangeInfo((5,2), DShape.Row(3), dtype=DType.Float)
)

# Writing numerical data requirements
import numpy as np

# Valid: NumPy array with matching dtype
ws.write_row((1,0), np.array([4,5,6]), dtype=DType.Int)

# Invalid: Python list with numerical dtype
ws.write_row((2,0), [7,8,9], dtype=DType.Int)  # Raises TypeError

Type Enforcement Modes

Strict Mode (Default)

  • Behavior: Raises ValueError on type mismatch

  • Use Case: Data validation scenarios

# Strict type checking example
try:
    ws.read_value(
        RangeInfo((0,0), DShape.Row(3), dtype=DType.Str)
    )
except ValueError as e:
    print(f"Data integrity violation: {e}")

Lenient Mode (strict=False)

  • Behavior: Replaces invalid values with type-specific defaults

  • Warning: May cause silent data corruption

DType            Default Value
Bool             False
Int              0
Float            nan
Date/DateTime    1970-01-01
Str              Empty string ("")
Any              None

# Lenient mode example
ws.read_value(
    RangeInfo((0,0), DShape.Row(3), dtype=DType.Date, strict=False)
)
# Returns: [date(1970,1,1), date(2025,2,3), date(1970,1,1)]
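
To make the corruption risk concrete, the following pure-Python sketch imitates the two modes for a Date column. It does not call FastXLSX: coerce_dates and the replacement counter are hypothetical helpers written for illustration, and only the 1970-01-01 default is taken from the table above.

```python
from datetime import date

# Default for Date/DateTime, copied from the table above.
EPOCH = date(1970, 1, 1)


def coerce_dates(values, strict=True):
    """Imitate strict/lenient handling of a Date column (illustration only).

    Returns the coerced list and the number of values that were replaced.
    """
    out = []
    replaced = 0
    for v in values:
        if isinstance(v, date):
            out.append(v)
        elif strict:
            # Strict mode: stop at the first mismatch.
            raise ValueError(f"expected a date, got {v!r}")
        else:
            # Lenient mode: substitute the type-specific default.
            out.append(EPOCH)
            replaced += 1
    return out, replaced


raw = ["n/a", date(2025, 2, 3), None]
cleaned, replaced = coerce_dates(raw, strict=False)
print(cleaned, f"({replaced} values replaced)")
```

After lenient coercion, a replaced cell cannot be told apart from a cell that really contained 1970-01-01. Counting replacements, as this sketch does, is one way to surface the problem in your own validation code; whether FastXLSX reports such a count is not stated here.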

Danger

Use lenient mode ONLY IF YOU KNOW WHAT YOU ARE DOING.

Parallel Processing

FastXLSX implements Rust-native parallelism through the rayon library, offering true multi-threaded I/O operations without Python’s GIL limitations.

Core Advantages

  • Cross-Platform Efficiency: Avoids the limitations of the process spawn start method that Python multiprocessing relies on under Windows

  • Scalability: Near-linear throughput scaling with CPU cores (roughly 6x on the 6-core benchmark machine)

  • Memory Safety: Zero-copy data pipelines with Rust’s ownership model

Implementation Patterns

Batch Writing

Optimal for generating multiple files with identical schemas (same worksheet structure/columns):

Writing 10 files with 6 sheets each
import numpy as np
from fastxlsx import DType, write_many, WriteOnlyWorksheet

workbooks = {}
for fid in range(10):  # 10 output files
    sheets = []
    for sid in range(6):  # 6 sheets per file
        ws = WriteOnlyWorksheet(f"Sheet{sid}")
        # Header with workbook/sheet ID
        ws.write_cell("A1", 10*fid + sid, dtype=DType.Int)
        # 100x100 float matrix
        ws.write_matrix((1,1), np.random.rand(100,100), dtype=DType.Float)
        sheets.append(ws)
    workbooks[f"batch_{fid:02d}.xlsx"] = sheets

write_many(workbooks)  # Parallelized write

Batch Reading

Ideal for aggregating data from multiple files with consistent layouts:

Reading 10 files with matrix extraction
from fastxlsx import read_many, RangeInfo, DShape, DType

results = read_many({
    f"batch_{fid:02d}.xlsx": {
        f"Sheet{sid}": [
            RangeInfo((0,0), DShape.Scalar(), dtype=DType.Int),  # Read header
            RangeInfo((1,1), DShape.Matrix(100,100), dtype=DType.Float)  # Extract 100x100 matrix
        ]
        for sid in range(6)
    }
    for fid in range(10)
})
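
The request passed to read_many above is a nested mapping of file name, then sheet name, then a list of ranges. The sketch below builds the same shape with a small helper so it can be inspected before any file is touched. It is an illustration under assumptions: build_request is a hypothetical helper, and plain tuples stand in for RangeInfo objects so the code runs without FastXLSX.

```python
def build_request(n_files, n_sheets, ranges):
    """Build a {file: {sheet: [ranges]}} mapping like the one shown above."""
    return {
        f"batch_{fid:02d}.xlsx": {
            f"Sheet{sid}": list(ranges)  # each sheet gets its own list
            for sid in range(n_sheets)
        }
        for fid in range(n_files)
    }


# Placeholder range descriptions; real code would use RangeInfo objects.
placeholder_ranges = [
    ("scalar", (0, 0)),              # header cell
    ("matrix", (1, 1), (100, 100)),  # 100x100 block starting at (1, 1)
]

request = build_request(10, 6, placeholder_ranges)
n_sheets = sum(len(sheets) for sheets in request.values())
print(f"{len(request)} files, {n_sheets} sheets")
```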

Performance Characteristics

Operation           10 Files (ms)    100 Files (ms)
Sequential Write    584              5860
Parallel Write      96.1 (6.07x)     924 (6.34x)
Sequential Read     367              3730
Parallel Read       65.9 (5.57x)     620 (6.01x)

Note

Benchmark environment: AMD Ryzen 5 5600X (6-core), 64GB DDR4, SATA SSD, Windows 10
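
The speedup factors in the table can be recomputed from the listed timings, as in the sketch below. It also divides each factor by the six cores mentioned in the note to give a rough per-core efficiency. The variable and function names are illustrative, and because the published timings are rounded, the recomputed factors can differ from the table in the last digit.

```python
# Timings in milliseconds as (sequential, parallel), copied from the table above.
TIMINGS_MS = {
    "write": {10: (584, 96.1), 100: (5860, 924)},
    "read": {10: (367, 65.9), 100: (3730, 620)},
}
CORES = 6  # core count stated in the benchmark note


def speedup(sequential_ms, parallel_ms):
    """Return how many times faster the parallel run is."""
    return sequential_ms / parallel_ms


for op, by_files in TIMINGS_MS.items():
    for n_files, (seq, par) in by_files.items():
        s = speedup(seq, par)
        print(f"{op:<5} {n_files:>3} files: {s:.2f}x, {s / CORES:.0%} of ideal on {CORES} cores")
```

An efficiency slightly above 100% is not necessarily a measurement error; it may reflect hardware threads beyond the physical core count, though the benchmark note does not say how many threads were used.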