Work-In-Progress

This page is under construction. It doesn't have formatting, probably has misspellings and might very well be incomplete. I'm posting it in its current state in the spirit of the digital garden and learning in public.




Atomically (not automatically) Moving Files To A Network Drive With Python





TL;DR

This is a modified version of Alex Chan’s Python method for moving files atomically across file systems and network drives.

import errno
import os
import shutil
import uuid

from pathlib import Path

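def safe_move_file(source_path, destination_path):
    try:
        # Fast path: an atomic rename when both paths are on the same file system.
        os.rename(source_path, destination_path)
    except OSError as err:
        if err.errno == errno.EXDEV:
            # The paths are on different file systems, so a direct rename can't work.
            if Path(destination_path).exists():
                raise Exception(f"File already exists at: {destination_path}")
            else:
                # Copy to a uniquely named tmp file next to the destination,
                # rename it into place, then remove the original.
                tmp_id = uuid.uuid4()
                tmp_destination_path = "%s.%s.tmp" % (destination_path, tmp_id)
                shutil.copy2(source_path, tmp_destination_path)
                os.rename(tmp_destination_path, destination_path)
                os.unlink(source_path)
        else:
            raise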

It uses the virtually instant os.rename(source_path, destination_path) where possible, with a fallback to deal with files crossing file systems.

The two changes I made were:

1] I converted it from shutil.copyfile(source_path, destination_path) to shutil.copy2(source_path, destination_path) since the latter preserves more metadata (e.g. timestamps).

2] I added a check to see if the destination file exists on the external file system before making the move. This stops the script before the tmp file is written when something is already at the destination, so an existing file isn’t replaced. The script will error out faster as a result too.
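To see the difference behind the first change, here is a minimal sketch comparing the two copy functions. The helper name, file names and timestamp are made up for illustration, and it assumes a file system that stores modification times with at least one-second resolution.

```python
import os
import shutil
import tempfile
from pathlib import Path

def copy_both_ways(source_path, work_dir):
    """Copy source_path with copyfile and with copy2; return both mtimes."""
    plain_copy = Path(work_dir) / "plain_copy.txt"        # hypothetical names
    metadata_copy = Path(work_dir) / "metadata_copy.txt"
    shutil.copyfile(source_path, plain_copy)    # copies the contents only
    shutil.copy2(source_path, metadata_copy)    # also tries to copy metadata such as mtime
    return plain_copy.stat().st_mtime, metadata_copy.stat().st_mtime

with tempfile.TemporaryDirectory() as work_dir:
    source = Path(work_dir) / "source.txt"
    source.write_text("hello")
    old_time = 946684800  # 2000-01-01 00:00:00 UTC, an arbitrary old timestamp
    os.utime(source, (old_time, old_time))
    plain_mtime, metadata_mtime = copy_both_ways(source, work_dir)
    source_mtime = source.stat().st_mtime
```

After this runs, metadata_mtime should match source_mtime, while plain_mtime reflects the moment the copy was made.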


Details

I’m renaming a bunch of files and moving them to my NAS. If I wasn’t renaming them, I’d use rsync and be done with it. Instead, I’m using a Python script to handle the migration.

I use atomic actions when moving files. They either end up at the destination, or they don’t. There’s no in-between state where a file is only partially written.

Atomic moves are easy with Python’s built-in os.rename(source_path, destination_path) function. On POSIX systems it’s atomic out of the box. Unfortunately, it doesn’t work if you’re moving across file systems or network drives; the call fails with an OSError whose errno is EXDEV. Python’s shutil.move() works across file systems, but it’s not atomic. In fact, there’s no single built-in command that offers atomic moves across file systems. You have to make your own method.
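One way to tell ahead of time whether a plain rename has a chance of working is to compare the device IDs of the two locations. This is a rough sketch, not part of the original method: same_file_system is a hypothetical helper, it assumes the destination's parent directory already exists, and the result is best treated as a hint since some mounts can still refuse the rename. The snippet in the TL;DR takes the opposite approach: try the rename and react to the errno.EXDEV error.

```python
import os
import tempfile
from pathlib import Path

def same_file_system(source_path, destination_path):
    """Return True when the source and the destination's folder share a device ID."""
    source_device = os.stat(source_path).st_dev
    # The destination file may not exist yet, so check its parent directory instead.
    destination_device = os.stat(Path(destination_path).parent).st_dev
    return source_device == destination_device

with tempfile.TemporaryDirectory() as work_dir:
    source = Path(work_dir) / "example.txt"       # hypothetical file names
    source.write_text("hello")
    destination = Path(work_dir) / "renamed.txt"
    local_move = same_file_system(source, destination)
    if local_move:
        # Both paths live in the same folder here, so a plain rename is enough.
        os.rename(source, destination)
    moved = destination.exists()
```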

The way to pull off an atomic move when .rename() isn’t an option is a three-step process:

1] Copy the original file to a temporary location on the same file system as the final destination (this is a non-atomic operation).

2] Use the atomic os.rename() function to move the file to its final destination. (Just to highlight it, the reason the .rename() method works atomically now when it didn’t before is that the temporary file is on the same file system as the destination path, whereas the original file was not.)

3] Delete the original file.

The way I’ve always done this:

import os
import shutil

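def simple_move_file(source_path, destination_path):
    # 1] Copy to a tmp file on the destination's file system (not atomic).
    tmp_destination_path = f'{destination_path}.tmp'
    shutil.copy2(source_path, tmp_destination_path)
    # 2] Atomically rename the tmp file into place.
    os.rename(tmp_destination_path, destination_path)
    # 3] Delete the original file.
    os.unlink(source_path)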

It had been a while since I looked to see if there was a better approach, so I spent some time searching. That’s when I discovered Alex’s page. They did something wonderfully clever that never occurred to me: use os.rename() when you can and then fall back to the tmp copy approach when you can’t. My code works fine in all circumstances, but theirs will run loads faster when operating on the local drive.

The other thing they added was the UUID naming for tmp files to avoid collisions. That’s not an issue for my processes, but I dig the approach.
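As a small illustration of why the UUID helps, here is a sketch that builds two tmp names for the same destination. make_tmp_path is a hypothetical helper and the NAS path is made up; the naming pattern mirrors the one used in the TL;DR snippet.

```python
import uuid

def make_tmp_path(destination_path):
    """Build a tmp file name next to the destination with a random UUID in it."""
    return f"{destination_path}.{uuid.uuid4()}.tmp"

# Two moves aimed at the same destination get different tmp names,
# so neither one writes over the other's partial copy.
first = make_tmp_path("/mnt/nas/photo.jpg")    # hypothetical path
second = make_tmp_path("/mnt/nas/photo.jpg")
```

With a fixed ".tmp" suffix, both calls would have produced the same name.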

So, this is the new file mover snippet in my grimoire.