mssql-python Now Supports Apache Arrow: Zero-Copy Data Fetching for Polars, Pandas, DuckDB
April 2025 – In a major performance upgrade for data engineers and scientists, the mssql-python driver now supports fetching SQL Server query results directly as Apache Arrow structures. The change eliminates the traditional overhead of creating millions of Python objects and garbage-collection cycles, enabling near-zero-copy data exchange between SQL Server and Arrow-native libraries like Polars, Pandas, DuckDB, and Hugging Face datasets.

“This is a game-changer for anyone moving large datasets from SQL Server into Python analytics frameworks,” said Sumit Sarabhai, a reviewer of the feature. “By leveraging the Arrow C Data Interface, we skip the per-row Python object creation entirely. The entire fetch runs in C++ and writes directly into Arrow buffers – users see immediate speed gains and dramatically lower memory usage.”
The feature was contributed by community developer Felix Graßl (@ffelixg) and has been merged into the main mssql-python project. It is available starting in version [insert version if known].
Background: Why Apache Arrow Matters for Database Drivers
Apache Arrow is an open-source columnar in-memory format that defines a stable shared-memory layout called the Arrow C Data Interface. This cross-language ABI (Application Binary Interface) allows any two programs – even ones written in different languages – to exchange data via a pointer with zero serialization, zero copying, and zero re-parsing.
Previously, fetching one million rows from SQL Server meant creating one million Python objects in memory, each with its own allocation and eventual garbage collection. The DataFrame library then had to convert those objects into its internal columnar format, causing further overhead. With Arrow, the database driver allocates typed buffers for each column and writes values directly into them – no Python objects, no GC pressure.
“Arrow’s zero-copy design means that a C++ driver and a Python DataFrame library can operate on the exact same memory without either one knowing about the other,” explained Graßl. “This isn’t just about speed – it’s about enabling truly seamless interoperability across the data stack.”
Key Terms
- API (Application Programming Interface): A source-code contract that defines how to call a function or library.
- ABI (Application Binary Interface): A binary-level contract that specifies how compiled code is laid out in memory. Two programs built in different languages can share an ABI and exchange data directly – no serialization needed.
- Arrow C Data Interface: Apache Arrow’s ABI specification – the standard that makes zero-copy data exchange between languages possible.
What This Means for Users
For anyone using mssql-python with Polars, Pandas (via ArrowDtype), DuckDB, or other Arrow-native tools, this update delivers four concrete benefits:
- Speed: The columnar fetch path avoids per-row Python object creation, which should make fetching noticeably faster for many SQL Server types – especially temporal types like DATETIME and DATETIMEOFFSET, where Python-side per-value conversions are eliminated entirely.
- Lower memory usage: A column of one million integers becomes a single contiguous C array, not a million individual Python objects. This reduces memory footprint and GC pressure significantly.
- Seamless interoperability: Polars, Pandas, DuckDB, and Hugging Face datasets can consume Arrow data directly. A Polars pipeline reading from mssql-python never needs to materialize intermediate Python objects at any stage.
- Future-proofing: As more tools adopt Arrow as a universal interchange format, mssql-python users will naturally integrate with the broader data ecosystem without custom shims.
“The performance gains are most dramatic for large result sets with many rows and complex types,” Sarabhai noted. “We expect this to become the default fetch method for high-throughput data pipelines connecting SQL Server to Python analytics.”

To enable Arrow support, users simply need to update their mssql-python installation and use the appropriate cursor or connection parameters. Detailed documentation is available in the official mssql-python repository.
Impact on the Data Engineering Landscape
This update positions mssql-python as a first-class citizen in the Arrow ecosystem, alongside drivers for PostgreSQL, Snowflake, and others that already support Arrow-based fetches. It lowers the friction for organizations that rely on SQL Server as their primary database but want to leverage modern Python-native analytics tools.
“We’re seeing a clear trend: database drivers that adopt Arrow are becoming the go-to choice for data scientists and engineers,” said Graßl. “mssql-python’s Arrow support closes a critical gap and makes SQL Server a viable backend for Arrow-native workflows.”
The community is encouraged to test the feature and report any issues via GitHub. Future development may include support for additional Arrow data types and optional zero-copy optimizations.