`aws s3 sync` is the AWS CLI command that synchronizes files between your local machine and an S3 bucket. Its `--exclude` option lets you leave specific files or directories out of the sync.
You can exclude files or directories using the `--exclude` option, which can be specified multiple times. This option takes a pattern as an argument, and you can use wildcards to match multiple files or directories.
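For example, to exclude all files with a `.tmp` extension, use the pattern `*.tmp`. Because `*` also matches `/`, this catches `.tmp` files in subdirectories too. Patterns are matched against file paths relative to the source, so to exclude everything inside a top-level directory named `ignore`, use `ignore/*`. The pattern `ignore/` on its own matches no files.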
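The Python sketch below gives a rough idea of how these patterns behave. It uses `fnmatch.fnmatchcase` to approximate the CLI's matching of patterns against paths relative to the source directory. Treating the two as equivalent is an assumption of this sketch, and the file list is hypothetical. The output shows why `*.tmp` reaches into subdirectories, and why `ignore/` on its own matches nothing while `ignore/*` matches every file under that directory.

```python
from fnmatch import fnmatchcase

# Hypothetical file paths, relative to the sync source directory.
FILES = [
    "notes.tmp",
    "src/build/cache.tmp",
    "ignore/a.txt",
    "ignore/sub/b.txt",
    "keep/ignore.txt",
]


def matches(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths a single --exclude/--include pattern would match.

    fnmatchcase treats '*' as "any characters", including '/', which is
    why '*.tmp' also catches files in subdirectories.
    """
    return [p for p in paths if fnmatchcase(p, pattern)]


if __name__ == "__main__":
    print(matches("*.tmp", FILES))     # both .tmp files, at any depth
    print(matches("ignore/", FILES))   # nothing: no file path ends in '/'
    print(matches("ignore/*", FILES))  # every file under ignore/
```

On the command line, quote patterns (for example `--exclude "ignore/*"`) so that your shell passes them to the CLI instead of expanding them.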
The `--exclude` option can be used in conjunction with the `--include` option to exclude specific files or directories while including others.
What is AWS S3 Sync
`aws s3 sync` is an AWS CLI command that synchronizes files and directories between a local machine and an Amazon S3 bucket.
It copies files from your local machine to S3, or from S3 to your local machine.
It can also synchronize two S3 locations, such as two buckets.
Each run is one-way: the command updates the destination to match the source. To push changes in both directions, run it once each way.
File Inclusions & Exclusions
File inclusions and exclusions are a crucial part of the AWS S3 sync process. You can use the --exclude and --include flags to determine which files will be considered as part of the sync operation.
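These flags support UNIX-style wildcards. `*` matches everything, `?` matches any single character, `[sequence]` matches any character in the sequence, and `[!sequence]` matches any character not in the sequence. You can repeat each flag as many times as you need in a single sync command.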
The order of the flags matters, as later flags override previous ones. This means that if you include a pattern and then exclude it, the exclusion will take priority.
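The override rule can be modeled in a few lines of Python. This is a minimal sketch that assumes the behavior described above: every file starts out included, and each matching filter overwrites the previous decision, so the last match wins. `is_synced` is a hypothetical helper for illustration and is not part of the AWS CLI.

```python
from fnmatch import fnmatchcase

# Each filter is ("include" | "exclude", pattern), in command-line order.
Filter = tuple[str, str]


def is_synced(path: str, filters: list[Filter]) -> bool:
    """Decide whether a relative path is synced under ordered filters.

    Every file starts out included; each filter whose pattern matches
    overwrites the decision, so the last matching filter wins.
    """
    included = True
    for action, pattern in filters:
        if fnmatchcase(path, pattern):
            included = action == "include"
    return included


if __name__ == "__main__":
    # Include first, exclude second: the later exclude wins.
    print(is_synced("report.csv", [("include", "*.csv"), ("exclude", "*")]))
    # Exclude first, include second: the later include wins.
    print(is_synced("report.csv", [("exclude", "*"), ("include", "*.csv")]))
```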
By default, all files are included, so `--include` only affects files that an earlier `--exclude` has removed. In practice, you start by specifying which files to exclude.
If you want to exclude all files at first and then add specific files back to the sync list, you can use the --exclude flag followed by the --include flag.
For example, to exclude all files and then add all PNG and JPG files back to the sync list, pass the filters in this order:
- Exclude all files: `--exclude "*"`
- Include PNG and JPG files: `--include "*.png" --include "*.jpg"`
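As a full command, the example might look like the one this Python snippet prints. The source directory `.` and the destination `s3://my-bucket/images` are hypothetical placeholders. The snippet only builds and prints the command; it does not run it.

```python
import shlex

# Hypothetical source directory and bucket; replace with your own.
SOURCE = "."
DESTINATION = "s3://my-bucket/images"

# Order matters: exclude everything first, then add image types back.
args = [
    "aws", "s3", "sync", SOURCE, DESTINATION,
    "--exclude", "*",
    "--include", "*.png",
    "--include", "*.jpg",
]

# shlex.join quotes the wildcards so a shell passes them to the CLI
# literally instead of expanding them against local file names.
command = shlex.join(args)
print(command)

# To run it from Python instead (no shell, so no quoting needed):
# subprocess.run(args, check=True)
```

`*.jpg` does not match files ending in `.jpeg`. Add another `--include` pattern if your files use that extension.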
The order cannot be reversed. If `--include "*.png"` came before `--exclude "*"`, the later exclusion would take priority and nothing would be synced.
Advanced Options
Beyond the basics, the AWS Command Line Interface (CLI) offers several advanced options for synchronizing files to and from Amazon S3. You can use them to fine-tune the sync process to your specific needs.
Some common advanced sync options for AWS CLI include deleting files from the destination that don't exist in the source, excluding files or patterns from being synced, and including files or patterns for syncing.
s3cmd and rclone offer comparable sync options, but some flag names differ. s3cmd uses `--delete-removed` to delete destination files that are missing from the source, while `rclone sync` deletes them by default. Both tools spell the simulation flag `--dry-run`.
Here are some commonly used advanced options for `aws s3 sync`:
- --delete: Deletes files from the destination that don't exist in the source.
- --exclude: Excludes files or patterns from being synced.
- --include: Includes files or patterns for syncing.
- --dryrun: Simulates the sync operation without making any changes.
- --quiet: Suppresses the display of the operations performed.
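You can also combine these options in a script. The helper below, `build_sync_args`, is a hypothetical sketch that assembles an `aws s3 sync` argument list. Note that the AWS CLI spells the simulation flag `--dryrun`. Defaulting `dryrun` to `True` is this sketch's own design choice, not CLI behavior, so the script only makes changes when you explicitly turn it off.

```python
def build_sync_args(
    source: str,
    destination: str,
    *,
    delete: bool = False,
    dryrun: bool = True,
    quiet: bool = False,
    filters: tuple[tuple[str, str], ...] = (),
) -> list[str]:
    """Assemble an `aws s3 sync` argument list.

    filters holds ("exclude" | "include", pattern) pairs in command order.
    dryrun defaults to True so nothing changes until you opt out.
    """
    args = ["aws", "s3", "sync", source, destination]
    for action, pattern in filters:
        if action not in ("exclude", "include"):
            raise ValueError(f"unknown filter action: {action!r}")
        args += [f"--{action}", pattern]
    if delete:
        args.append("--delete")
    if dryrun:
        args.append("--dryrun")
    if quiet:
        args.append("--quiet")
    return args


if __name__ == "__main__":
    args = build_sync_args(
        "./site", "s3://my-bucket",
        delete=True,
        filters=(("exclude", "*.tmp"),),
    )
    print(args)
    # Pass the list to subprocess.run(args, check=True) to execute it.
```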
Using AWS S3 Sync
The basic form of the command is `aws s3 sync <source> <destination>`. Each side is either a local path or an S3 URI such as `s3://bucket/prefix`.
Uploads and downloads use the same command. Put the local path first to upload to S3, or put the S3 URI first to download to your local machine.
Because the syntax is the same in every direction, the command is approachable even without extensive AWS experience.
Using a Dry Run
A dry run lets you test your S3 sync options without transferring any files. Add the `--dryrun` flag, and the command reports the changes a sync would make without making them.
The output looks like regular sync output, with each line prefixed by `(dryrun)`. You can check your filters and flags before anything is transferred.
Reviewing this output makes it easier to catch mistakes or unexpected results. You can then adjust your options before running the real sync.
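If you script dry runs, you can summarize what a sync would do before running it for real. This sketch assumes dry-run lines of the form `(dryrun) upload: ./a.txt to s3://bucket/a.txt` and `(dryrun) delete: s3://bucket/old.txt`. The exact wording may vary between CLI versions, and `run_dryrun` and `summarize` are hypothetical helpers.

```python
import subprocess


def run_dryrun(source: str, destination: str) -> str:
    """Run `aws s3 sync --dryrun` and return its standard output."""
    result = subprocess.run(
        ["aws", "s3", "sync", source, destination, "--dryrun"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize(output: str) -> dict[str, int]:
    """Count planned operations (upload, download, delete, ...) in dry-run output.

    Assumes each relevant line starts with "(dryrun) <operation>:".
    """
    counts: dict[str, int] = {}
    prefix = "(dryrun) "
    for line in output.splitlines():
        if not line.startswith(prefix):
            continue
        operation = line[len(prefix):].split(":", 1)[0]
        counts[operation] = counts.get(operation, 0) + 1
    return counts


if __name__ == "__main__":
    sample = (
        "(dryrun) upload: ./index.html to s3://my-bucket/index.html\n"
        "(dryrun) upload: ./app.js to s3://my-bucket/app.js\n"
        "(dryrun) delete: s3://my-bucket/old.html\n"
    )
    print(summarize(sample))  # {'upload': 2, 'delete': 1}
```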
Lambda
Using `aws s3 sync` from AWS Lambda can be tricky. The Node.js and Python Lambda runtimes include the AWS SDK, but the AWS CLI, and with it the `s3 sync` command, is not available by default.
Without it, you need to write custom code to get the same result, which can be time-consuming and error-prone. The SDKs have no equivalent of `sync`, so you must reimplement features such as comparing files, include and exclude filters, and deleting extra files yourself.
The Python runtime, for example, ships with Boto3 (the AWS SDK for Python). Boto3 offers low-level S3 operations such as listing, uploading, and deleting objects, but not the CLI's higher-level sync behavior.
Reinventing the wheel this way adds complexity and room for bugs. One alternative is to package the AWS CLI in a Lambda layer and call `aws s3 sync` from your function.
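To show what reimplementing involves, the sketch below reproduces only the CLI's documented default for deciding whether a local file needs uploading: the file is missing at the destination, its size differs, or the local copy is newer. The `FileInfo` type and the in-memory listings are hypothetical. In a Lambda function you would fill them from calls such as Boto3's `list_objects_v2`, which are not shown. The sketch supports exclude patterns only and ignores `--include` and `--delete`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # last-modified time, seconds since the epoch


def plan_uploads(
    source: dict[str, FileInfo],
    destination: dict[str, FileInfo],
    excludes: tuple[str, ...] = (),
) -> list[str]:
    """Return the relative keys a sync would copy to the destination."""
    to_copy = []
    for key, src in sorted(source.items()):
        if any(fnmatchcase(key, pattern) for pattern in excludes):
            continue  # filtered out, like --exclude
        dst = destination.get(key)
        # Copy when missing remotely, when sizes differ, or when local is newer.
        if dst is None or dst.size != src.size or src.mtime > dst.mtime:
            to_copy.append(key)
    return to_copy


if __name__ == "__main__":
    local = {
        "a.txt": FileInfo(10, 200.0),  # unchanged
        "b.txt": FileInfo(12, 200.0),  # size differs
        "c.txt": FileInfo(10, 300.0),  # newer locally
        "d.txt": FileInfo(10, 100.0),  # missing remotely
        "e.tmp": FileInfo(10, 100.0),  # excluded
    }
    remote = {
        "a.txt": FileInfo(10, 200.0),
        "b.txt": FileInfo(10, 200.0),
        "c.txt": FileInfo(10, 250.0),
    }
    print(plan_uploads(local, remote, excludes=("*.tmp",)))  # ['b.txt', 'c.txt', 'd.txt']
```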
CLI and Installation
To use the AWS CLI with S3, you need to install it on your machine. Installation is straightforward and takes a few minutes.
You can install it with pip, the Python package manager, by running `pip install awscli` in your terminal. Note that the `awscli` package on PyPI is version 1 of the CLI. AWS recommends version 2, which is distributed through its official installers.
Once it is installed, verify the installation by running `aws --version`, which displays the installed version of the AWS CLI.
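Because `pip install awscli` gives you version 1, it can be worth checking which major version is on your PATH. This sketch assumes that `aws --version` prints a first token like `aws-cli/2.15.30` (the version numbers below are only illustrative). `cli_major_version` and `installed_major_version` are hypothetical helpers.

```python
import subprocess


def cli_major_version(version_output: str) -> int:
    """Extract the major version from `aws --version` output.

    Assumes the first token looks like "aws-cli/2.15.30".
    """
    tokens = version_output.split()
    if not tokens:
        raise ValueError("empty version output")
    name, _, version = tokens[0].partition("/")
    if name != "aws-cli" or not version:
        raise ValueError(f"unexpected version output: {version_output!r}")
    return int(version.split(".")[0])


def installed_major_version() -> int:
    """Run `aws --version` and return the installed major version."""
    result = subprocess.run(
        ["aws", "--version"], capture_output=True, text=True, check=True
    )
    # Some older releases may write the version to stderr, so fall back to it.
    return cli_major_version(result.stdout or result.stderr)


if __name__ == "__main__":
    print(cli_major_version("aws-cli/2.15.30 Python/3.11.8 Linux/6.5.0"))  # 2
    print(cli_major_version("aws-cli/1.32.0 Python/3.11.6 Linux/6.5.0"))   # 1
```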
You can also download the official installers from the AWS website. The AWS CLI is available for Windows, macOS, and Linux, and it gives you a command-line interface for interacting with AWS services.
Before you can use the AWS CLI, configure it with your AWS credentials by running `aws configure` in your terminal or command prompt. The command prompts for your access key ID and secret access key, as well as a default region and output format.
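`aws configure` stores the keys in an INI-style credentials file, by default `~/.aws/credentials`, under a `[default]` profile. The region and output format go to `~/.aws/config`. The sketch below reads credentials-file contents with Python's `configparser` to check that a profile has both keys. The helper names are hypothetical, and the sample uses AWS's documented placeholder keys rather than real credentials.

```python
import configparser
from pathlib import Path

# Default file written by `aws configure`; the AWS_SHARED_CREDENTIALS_FILE
# environment variable can point the CLI somewhere else.
CREDENTIALS_PATH = Path.home() / ".aws" / "credentials"

# Example contents using AWS's documented placeholder keys.
SAMPLE = """\
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
"""


def _parse(text: str) -> configparser.ConfigParser:
    # Interpolation is disabled so '%' in values is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def profile_names(text: str) -> list[str]:
    """List the profiles defined in a credentials file's contents."""
    return _parse(text).sections()


def has_keys(text: str, profile: str = "default") -> bool:
    """Check that a profile defines both an access key ID and a secret key."""
    parser = _parse(text)
    return parser.has_option(profile, "aws_access_key_id") and parser.has_option(
        profile, "aws_secret_access_key"
    )


if __name__ == "__main__":
    print(profile_names(SAMPLE))  # ['default']
    print(has_keys(SAMPLE))       # True
    # For a real check: has_keys(CREDENTIALS_PATH.read_text())
```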
Once you have installed and configured the AWS CLI, you can start using its various commands to manage your AWS resources. The `aws s3 sync` command, for example, synchronizes directories to and from S3 by recursively copying files and subdirectories.
If you are choosing between commands, note that `aws s3 cp --recursive` copies every file each time it runs, while `aws s3 sync` copies only files that are new or have changed.
This makes `aws s3 sync` particularly useful for keeping a large directory synchronized with its S3 counterpart. S3 has no real directories, so the command only copies files, and empty local directories are not recreated at the destination.
Installing CLI
Installing the AWS CLI is a straightforward process. You can install the latest version by following the documentation for your operating system.
If you're running macOS, you can also install the CLI with Homebrew using a single command: `brew install awscli`.
The AWS CLI is a unified tool for managing your AWS resources programmatically, allowing you to control any service directly from your terminal and automate steps through scripts.
However, the AWS CLI is not a replacement for an Infrastructure-as-Code tool. Use it for ad hoc tasks and scripts, not for managing your infrastructure.
CLI Key Details
The AWS CLI provides sync functionality similar to s3cmd's.
One of the key benefits of the AWS CLI is that it can set ACLs, encryption, and other options directly from the command line, for example with the `--acl` and `--sse` flags.
The same CLI also covers other AWS services, so you can combine S3 sync with commands for other services in the same script.
Credentials for the AWS CLI are managed via AWS config files.
Sync works the same in both directions, which makes it easy to transfer files between your local machine and S3.
Delete and Storage
This section covers two options that affect what ends up at the destination. The `--delete` flag removes files, and the `--storage-class` option sets the S3 storage class of the files you copy.
The Delete Flag
The `--delete` flag lets you mirror the state of a directory by deleting files that don't exist at your source.
Add `--delete` to the sync command, and files that exist at the destination but not at the source are deleted.
Be careful with this option, because it can easily wipe files from your buckets. Unless versioning is enabled on the bucket, those files can't be recovered. Always preview the command's actions with `--dryrun` first to avoid accidental deletions.
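The deletion rule can be sketched in Python as a set difference between destination and source. According to the AWS CLI documentation, files excluded by filters are also excluded from deletion. This sketch models that with exclude patterns only (no `--include`). `plan_deletes` is a hypothetical helper, and the file names are made up.

```python
from fnmatch import fnmatchcase


def plan_deletes(
    source_keys: set[str],
    destination_keys: set[str],
    excludes: tuple[str, ...] = (),
) -> list[str]:
    """Keys that `--delete` would remove from the destination.

    A key is deleted when it exists only at the destination, unless an
    exclude pattern matches it: excluded files are left untouched.
    """
    return sorted(
        key
        for key in destination_keys - source_keys
        if not any(fnmatchcase(key, pattern) for pattern in excludes)
    )


if __name__ == "__main__":
    local = {"index.html", "app.js"}
    bucket = {"index.html", "old.html", "logs/2024.log"}
    print(plan_deletes(local, bucket))                        # ['logs/2024.log', 'old.html']
    print(plan_deletes(local, bucket, excludes=("logs/*",)))  # ['old.html']
```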
Storage Classes
Storage classes are a crucial aspect of managing your data, and AWS offers several options to suit different needs. You can choose from STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE, and GLACIER_IR.
By default, the S3 sync command uses the STANDARD storage class. If your data is accessed infrequently, or can tolerate lower resilience or slower retrieval, you can save money by choosing a cheaper class.
You can directly select the target storage class for the files to copy by using the --storage-class option. This is particularly useful when dealing with large storage requirements.
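Here's a breakdown of the available storage classes:
- STANDARD: The default, for frequently accessed data.
- REDUCED_REDUNDANCY: A legacy class with lower redundancy that AWS no longer recommends.
- STANDARD_IA: For infrequently accessed data, stored across multiple Availability Zones.
- ONEZONE_IA: For infrequently accessed data, stored in a single Availability Zone.
- INTELLIGENT_TIERING: Moves objects between access tiers automatically, for unknown or changing access patterns.
- GLACIER: S3 Glacier Flexible Retrieval, for archives retrieved in minutes to hours.
- DEEP_ARCHIVE: The lowest-cost class, for long-term archives where retrieval can take hours.
- GLACIER_IR: S3 Glacier Instant Retrieval, for rarely accessed archives that still need millisecond access.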
AWS provides extensive documentation about all storage classes, so be sure to explore your options to save money for large storage requirements.
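To catch typos before a long upload, you can validate the storage class in a script. This hypothetical helper checks the value against the classes listed earlier and appends `--storage-class` to an argument list. Upper-casing the input is this sketch's own choice, and newer CLI versions may accept classes that are not in this list.

```python
# Values accepted by --storage-class, as listed in this section.
STORAGE_CLASSES = frozenset({
    "STANDARD", "REDUCED_REDUNDANCY", "STANDARD_IA", "ONEZONE_IA",
    "INTELLIGENT_TIERING", "GLACIER", "DEEP_ARCHIVE", "GLACIER_IR",
})


def with_storage_class(args: list[str], storage_class: str) -> list[str]:
    """Return a copy of an `aws s3 sync` argument list with --storage-class set."""
    value = storage_class.upper()
    if value not in STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class!r}")
    return [*args, "--storage-class", value]


if __name__ == "__main__":
    # Hypothetical source and bucket.
    base = ["aws", "s3", "sync", "./archive", "s3://my-bucket/archive"]
    print(with_storage_class(base, "standard_ia"))
```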