Beyond Pandas: How to Stream TBs of Data Between MySQL and SQL Server
The most common mistake I see in Python data engineering is the addiction to fetchall().
It works fine for 10,000 rows. But when you need to migrate 50 million records from MySQL to SQL Server, loading everything into a list—or even a Pandas DataFrame—is a guaranteed way to crash your server with an Out of Memory (OOM) error.
The solution isn't a bigger server. It's Python Generators.
The "RAM Explosion" Anti-Pattern
Here is the code that kills servers:
# DON'T DO THIS
cursor.execute("SELECT * FROM massive_table")
data = cursor.fetchall() # <--- CRASH!
# Python tries to create a list containing 50M tuples in RAM.
The Solution: The `yield` Keyword
Instead of holding the data, we create a function that streams it. The `yield` keyword turns a standard function into a generator, which pauses execution and "yields" a chunk of data only when the consumer asks for it.
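To see the mechanics in isolation before we touch a database, here is a minimal sketch of a batching generator (the name `batched` and the inputs are illustrative):

```python
def batched(iterable, batch_size):
    """Yield lists of up to batch_size items, producing each batch lazily."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch  # pause here until the caller asks for more
            batch = []
    if batch:
        yield batch  # final partial batch

# Only one batch exists in memory at a time, even for a huge input:
stream = batched(range(10_000_000), 1000)
first = next(stream)  # nothing beyond the first batch is materialized
print(len(first))     # 1000
```

Calling `batched(...)` runs no loop at all; work only happens when the consumer pulls the next batch. That lazy hand-off is exactly what we exploit below.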
The "Server-Side Cursor" Pattern
To make this work, you must tell the database not to send everything at once. In pymysql this is done with an unbuffered server-side cursor (SSCursor); psycopg2 offers the same concept via named cursors.
import pymysql

def stream_from_mysql(query, batch_size=1000):
    conn = pymysql.connect(...)
    try:
        # SSCursor keeps the result set on the server, not in Python's RAM
        with conn.cursor(pymysql.cursors.SSCursor) as cursor:
            cursor.execute(query)
            while True:
                # Fetch only batch_size rows at a time
                chunk = cursor.fetchmany(batch_size)
                if not chunk:
                    break
                # Hand this chunk to the caller, then PAUSE here
                yield chunk
    finally:
        # Runs even if the caller abandons the generator early
        conn.close()
Connecting the Pipes
Now we can loop over this generator. The beauty is that whether the table has 1,000 rows or 10 billion rows, your Python script will never use more than a few MBs of RAM (the size of one batch).
# The Generator Pipeline
data_stream = stream_from_mysql("SELECT * FROM orders", batch_size=5000)

for batch in data_stream:
    # Process 5,000 rows
    transformed_batch = transform_currency(batch)
    # Write to SQL Server
    sql_server_cursor.executemany(insert_query, transformed_batch)
    sql_server_conn.commit()
    print("Migrated batch...")
    # The previous batch is now garbage collected from RAM automatically
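`transform_currency` above stands in for whatever per-batch work your pipeline needs. A minimal sketch, assuming rows are tuples with the amount in column 3 and a static exchange rate (both assumptions for illustration):

```python
EUR_PER_USD = 0.92  # assumed static rate, purely for illustration

def transform_currency(batch, amount_index=3):
    """Convert the USD amount column of each row tuple to EUR."""
    transformed = []
    for row in batch:
        row = list(row)  # tuples are immutable, so copy to a list
        row[amount_index] = round(row[amount_index] * EUR_PER_USD, 2)
        transformed.append(tuple(row))
    return transformed

batch = [(1, "A-100", "2024-01-05", 100.00)]
print(transform_currency(batch))  # amount column is now 92.0
```

Note the function takes and returns a whole batch, so it slots straight into the loop without changing its memory profile. On the writer side, if you connect to SQL Server via pyodbc, setting `cursor.fast_executemany = True` before calling `executemany` usually speeds up bulk inserts considerably.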
Why This Matters
This pattern decouples Data Volume from System Resources.
- Cost: You can migrate terabytes of data from a tiny $5/month VPS.
- Stability: No random crashes when data grows unexpectedly.
- Speed: Network I/O overlaps with processing (especially if you use threading for the writer).
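That overlap can be sketched with a bounded queue: a reader thread fetches batches while the main thread writes. Here `write_batch` just records batches so the sketch is self-contained; in practice it would be your `executemany` + `commit` call:

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def reader(source_batches, q):
    """Producer: fetch batches and hand them to the writer via the queue."""
    for batch in source_batches:
        q.put(batch)  # blocks if the writer falls behind (bounded queue)
    q.put(SENTINEL)   # signal end of stream

written = []

def write_batch(batch):
    written.append(batch)  # stand-in for executemany + commit

# A bounded queue caps RAM at maxsize batches, no matter the table size
q = queue.Queue(maxsize=4)
source = ([i] * 3 for i in range(5))  # fake source: five 3-row batches
t = threading.Thread(target=reader, args=(source, q))
t.start()

while (batch := q.get()) is not SENTINEL:
    write_batch(batch)  # the write overlaps with the reader's next fetch
t.join()

print(len(written))  # 5
```

The `maxsize` on the queue is the key detail: it gives you the overlap without reintroducing the unbounded buffering this whole article is arguing against.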