Pwalk utility

From OSNEXUS Online Documentation Site

The pwalk utility was originally written by John Dey as a parallelized version of the 'du -a' Unix utility, suitable for scanning filesystems with hundreds of millions of files. It was later reworked at OSNEXUS to support backups, sliding-window backups, and additional output formats. Typing 'pwalk' by itself in a QuantaStor SSH session or console window prints the usage page shown below. The utility has three modes: 'walk', which does a parallelized crawl of a directory; 'copy', which backs up a SOURCEDIR to the directory given by --targetdir; and 'purge', which removes files in the PURGEDIR that are not found in the --comparedir. You would not normally need to run pwalk directly; the documentation is provided here to support special use cases such as custom backup or replication cron jobs.

pwalk version 3.1 Oct 22nd 2013 - John F Dey john@fuzzdog.com, OSNEXUS, eng@osnexus.com

Usage :
pwalk --help --version
          Common Args :
             --dryrun : use this to test commands
                        without making any changes to the system
       --maxthreads=N : indicates the number of threads (default=32)
           --nototals : disables printing of totals after the scan
               --dots : prints a dot and total every 1000 files scanned.
              --quiet : no chatter, speeds up the scan.
             --nosnap : Ignore directories with name .snapshot
              --debug : Verbose debug spam
        Output Format : CSV
               Fields : DateStamp,"inode","filename","fileExtension","UID",
                        "GID","st_size","st_blocks","st_mode","atime",
                        "mtime","ctime","File Count","Directory Size"

Walk Usage :
pwalk SOURCEDIR
         Command Args :
            SOURCEDIR : Fully qualified path to the directory to walk

Copy/Backup Usage :
pwalk --targetdir=TARGETDIR SOURCEDIR
pwalk --retain=30 --targetdir=TARGETDIR SOURCEDIR
         Command Args :
          --targetdir : copy files to specified TARGETDIR
              --atime : copy if access time change (default=no atime)
  --backuplog=LOGFILE : log all files that were copied.
  --status=STATUSFILE : write periodic status updates to specified file
             --retain : copy if file ctime or mtime within retention period
                        specified in days. eg: --retain=60
            --nomtime : ignore mtime (default=use mtime)
            SOURCEDIR : Fully qualified path to the directory to walk

Delete/Purge Usage :
pwalk --purge [--force] --comparedir=COMPAREDIR PURGEDIR
pwalk --purge [--force] --retain=N PURGEDIR
         Command Args :
         --comparedir : compare against this dir but dont touch any files
                        in it. comparedir is usually the SOURCEDIR from
                        a prior copy/sync stage.
              --purge : !!WARNING!! this deletes files older than the
                        retain period -OR- if retain is not specified
                        --comparedir is required. The comparedir is
                        compared against the specified dir and any files
                        not found in the comparedir are purged.
              --force : !NOTE! default is a *dry-run* for purge, you must
                        specify --force option to actually purge files
              --atime : keep if access time within retain period
             --retain : keep if file ctime or mtime within retention period
                        specified in days. eg: --retain=60
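
As an illustration of the custom backup cron job use case, the copy and purge modes above can be combined in a small wrapper script. This is only a sketch built from the flags listed in the usage page: the function names (pwalk_backup, pwalk_prune), the PWALK override variable, and any paths or retention values passed in are hypothetical and not part of pwalk or QuantaStor.

```shell
#!/bin/sh
# Sketch of a backup wrapper around pwalk's copy and purge modes.
# The function names and the PWALK variable are illustrative assumptions.

# Allow the binary to be overridden, e.g. PWALK=echo to preview commands.
PWALK="${PWALK:-pwalk}"

# Copy files whose ctime or mtime falls within the last $3 days
# from source dir $1 into target dir $2, logging copied files to $4.
pwalk_backup() {
    "$PWALK" --retain="$3" --backuplog="$4" --targetdir="$2" "$1"
}

# Purge files from backup dir $2 that are not found in source dir $1.
# Per the usage page, purge is a dry run unless --force is given, so
# deletion only happens when the third argument is the word "force".
pwalk_prune() {
    if [ "${3:-}" = "force" ]; then
        "$PWALK" --purge --force --comparedir="$1" "$2"
    else
        "$PWALK" --purge --comparedir="$1" "$2"
    fi
}
```

A cron script could source these functions and call, for example, pwalk_backup /export/projects /backup/projects 30 /var/log/pwalk-backup.log followed by pwalk_prune /export/projects /backup/projects (both paths and the log file are placeholders). Leaving off the "force" argument keeps the purge step a dry run, which is a reasonable way to review what would be deleted before enabling it.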

The OSNEXUS-modified version of the C source code for pwalk is available as pwalk.c. The original version is also available.