AWS S3

What is S3?

S3 provides developers and IT teams with secure, durable, highly scalable object storage. Amazon S3 is easy to use, with a simple web services interface to store and retrieve any amount of data from anywhere on the web.

Basics of S3
  1. S3 is object-based, i.e. it allows you to upload files.
  2. Files can be from 0 bytes to 5 TB
  3. There is unlimited storage
  4. Files are stored in Buckets
  5. S3 is a universal namespace, so bucket names must be unique globally
  6. Format of the URL - https://s3-ap-south-1.amazonaws.com/samplebucket
  7. Not suitable for installing an Operating System on, as this is just object-based (not block-based) storage
  8. When you upload a file to S3, you will receive an HTTP 200 code if the upload was successful (as sketched below)
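As a rough sketch of that last point, assuming the Python boto3 SDK, AWS credentials already configured, and the samplebucket bucket from the URL example above, an upload and its HTTP 200 check could look like this:

import boto3

# Create an S3 client (credentials are picked up from the environment/CLI config)
s3 = boto3.client('s3', region_name='ap-south-1')

# Upload (PUT) a small object into the bucket
response = s3.put_object(
    Bucket='samplebucket',   # hypothetical bucket name from the URL example above
    Key='hello.txt',         # the object key (its name)
    Body=b'Hello, S3!'       # the value: a sequence of bytes
)

# A successful upload returns an HTTP 200 status code
status = response['ResponseMetadata']['HTTPStatusCode']
print('Upload succeeded' if status == 200 else f'Unexpected status: {status}')
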
S3 - Objects

S3 is object-based: think of objects simply as files. Objects consist of the following (a small sketch follows this list):
  • Key (This is simply the name of the object)
  • Value (This is simply the data and is made up of a sequence of bytes)
  • Version ID (Important for versioning)
  • Metadata (Data about data you are storing)
  • Subresources: Access Control Lists, Torrent
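To make these parts concrete, here is a small sketch, assuming boto3 and hypothetical bucket and key names, that writes an object with user metadata and then reads back its metadata and version ID:

import boto3

s3 = boto3.client('s3')

# Key + Value + Metadata go in on the PUT
s3.put_object(
    Bucket='samplebucket',               # hypothetical bucket
    Key='reports/2024/summary.csv',      # Key: the object's name
    Body=b'id,total\n1,42\n',            # Value: the bytes being stored
    Metadata={'department': 'finance'}   # Metadata: data about the data
)

# Version ID and metadata come back on a HEAD request
head = s3.head_object(Bucket='samplebucket', Key='reports/2024/summary.csv')
print(head.get('VersionId', 'null'))     # Version ID (only meaningful if versioning is on)
print(head['Metadata'])                  # {'department': 'finance'}

Note that the VersionId field is only populated once versioning has been enabled on the bucket.
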
Data consistency model for S3

How does data consistency work for S3?
  1. Read after Write consistency for PUTS of new Objects
  2. Eventual consistency for overwrite PUTS and DELETES (can take some time to propagate)
S3 Guarantees

S3 has the following guarantees from Amazon:
  1. Built for 99.99% availability for the S3 platform
  2. Amazon guarantees 99.9% availability (SLA)
  3. Amazon guarantees 99.999999999% (11 nines) durability for S3 information.
S3 features

S3 has the following features:
  1. Tiered storage available
  2. Lifecycle Management
  3. Versioning
  4. Encryption
  5. can turn on MFA Delete
  6. Secure your data and access to buckets using Access Control Lists and Bucket Policies
S3 Storage Classes

1. S3 Standard

99.99% availability, 99.999999999% (11 nines) durability. Data is stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of 2 facilities concurrently.

2. S3 – IA

(Infrequently Accessed): For data that is accessed less frequently, but requires rapid access when needed. Lower fee than S3 Standard, but you are charged a retrieval fee.

3. S3 One Zone – IA

For when you want a lower-cost option for infrequently accessed data, but do not require the multiple-Availability-Zone data resilience.

4. S3 - Intelligent Tiering

Designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.

5. S3 Glacier

S3 Glacier is a secure, durable, and low-cost storage class for data archiving. You can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. Retrieval times are configurable from minutes to hours.

6. S3 Glacier Deep Archive

S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class, for use cases where a retrieval time of 12 hours is acceptable.
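The storage class is chosen per object at upload time. A minimal sketch, assuming boto3 and the same hypothetical bucket, placing data directly into two of the tiers above:

import boto3

s3 = boto3.client('s3')

# Upload straight into S3 Standard-IA (other valid values include 'STANDARD',
# 'ONEZONE_IA', 'INTELLIGENT_TIERING', 'GLACIER', 'DEEP_ARCHIVE')
s3.put_object(
    Bucket='samplebucket',           # hypothetical bucket
    Key='archive/old-report.csv',
    Body=b'rarely read data',
    StorageClass='STANDARD_IA'
)

# Glacier Deep Archive for the cheapest long-term archival tier
s3.put_object(
    Bucket='samplebucket',
    Key='archive/2015-backup.tar',
    Body=b'cold data',
    StorageClass='DEEP_ARCHIVE'
)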


S3 Comparison

S3 - Charges

You are charged for S3 in the following ways:
  1. Storage
  2. Request
  3. Storage Management Pricing
  4. Data Transfer Pricing
  5. Transfer Acceleration - This enables fast, easy and secure transfers of files over a long distance between your end-users and an S3 bucket. Transfer acceleration takes advantage of Amazon CloudFront's globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path
  6. Cross-Region Replication pricing - if enabled, objects stored in one region are automatically replicated to a bucket in another region
S3 - Security and Encryption

By default, all newly created buckets are PRIVATE. You can set up access control to your buckets using:
  • Bucket Policies
  • Access Control Lists
Bucket Policies are applied to buckets as a whole, while Access Control Lists are applied to the individual objects within them.

S3 buckets can be configured to create access logs that record all requests made to the bucket. These logs can be sent to another bucket, even a bucket in another account.
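As an illustration of a bucket policy, here is a sketch, assuming boto3 and a hypothetical bucket, of a policy that denies any request not made over HTTPS; it is only an example, not a recommendation for every bucket:

import json
import boto3

s3 = boto3.client('s3')

# A simple bucket policy that denies any request not made over HTTPS
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::samplebucket",      # the bucket itself
            "arn:aws:s3:::samplebucket/*"     # every object in it
        ],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}}
    }]
}

# Attach the policy to the bucket (policies apply to the bucket as a whole)
s3.put_bucket_policy(Bucket='samplebucket', Policy=json.dumps(policy))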

There are two different types of Encryptions:
  1. Encryption in transit
  2. Encryption at rest
Encryption in transit is achieved by 
  • SSL/TLS - this is what is used whenever we access S3 over HTTPS
Encryption At Rest is achieved in two ways:
  1. Server-side encryption
  2. Client-side encryption
Server-side encryption further has three different ways to achieve it:
  • S3 Managed Keys - SSE - S3 (Amazon manages the keys for us)
  • AWS Key Management Service, Managed Keys - SSE - KMS (Managed by AWS and Client)
  • Server-Side Encryption with customer-provided keys - SSE - C (Managed by clients and provided to AWS)
Client-side encryption is where the data is encrypted entirely on the client side before it is uploaded to S3.
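A sketch of the two most common server-side options, assuming boto3, a hypothetical bucket, and (for SSE-KMS) a hypothetical KMS key alias:

import boto3

s3 = boto3.client('s3')

# SSE-S3: Amazon manages the encryption keys
s3.put_object(
    Bucket='samplebucket',
    Key='sse-s3/object.txt',
    Body=b'encrypted at rest with S3-managed keys',
    ServerSideEncryption='AES256'
)

# SSE-KMS: keys are managed through AWS KMS (the key alias below is hypothetical)
s3.put_object(
    Bucket='samplebucket',
    Key='sse-kms/object.txt',
    Body=b'encrypted at rest with a KMS key',
    ServerSideEncryption='aws:kms',
    SSEKMSKeyId='alias/my-app-key'
)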

S3 Versioning

Using Versioning with S3 (a sketch follows this list):
  • Stores all versions of an object (including all writes and even if you delete an object)
  • Great backup tool
  • Once enabled, Versioning cannot be disabled, only suspended
  • Integrates with Lifecycle rules
  • Versioning's MFA Delete capability, which uses multi-factor authentication, can be used to provide an additional layer of security.
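A minimal sketch, assuming boto3 and a hypothetical bucket, of enabling versioning and then listing the stored versions of a key:

import boto3

s3 = boto3.client('s3')

# Enable versioning (once enabled it can later only be 'Suspended', not removed)
s3.put_bucket_versioning(
    Bucket='samplebucket',
    VersioningConfiguration={'Status': 'Enabled'}
)

# Every PUT to the same key now creates a new version
s3.put_object(Bucket='samplebucket', Key='notes.txt', Body=b'v1')
s3.put_object(Bucket='samplebucket', Key='notes.txt', Body=b'v2')

# List all stored versions of the object
versions = s3.list_object_versions(Bucket='samplebucket', Prefix='notes.txt')
for v in versions.get('Versions', []):
    print(v['VersionId'], v['IsLatest'])
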
S3 Lifecycle Management

This can be configured for every single bucket. We can transition objects from one storage class to another and also set expirations for them (see the sketch after this list).
  • Automates moving your objects between the different storage tiers.
  • It can be used in conjunction with versioning.
  • It can be applied to current versions and previous versions.
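A sketch of a lifecycle rule, assuming boto3 and a hypothetical bucket and prefix, that transitions objects to Standard-IA after 30 days, to Glacier after 90 days, and expires them after a year:

import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='samplebucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'archive-then-expire',
            'Filter': {'Prefix': 'logs/'},    # hypothetical prefix
            'Status': 'Enabled',
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                {'Days': 90, 'StorageClass': 'GLACIER'}
            ],
            'Expiration': {'Days': 365}       # delete current versions after a year
        }]
    }
)
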
S3 Cross-region replication
  • Versioning must be enabled on both the source and destination buckets
  • Regions must be unique; AWS does not allow you to select a source and destination bucket in the same region for cross-region replication
  • Files already in an existing bucket are not replicated automatically
  • All subsequently updated files will be replicated automatically
  • Delete markers are not replicated
  • Deleting individual versions or delete markers will not be replicated.
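A sketch of enabling replication with boto3; the bucket names and the IAM role that S3 assumes to copy objects are hypothetical, and versioning must already be enabled on both buckets:

import boto3

s3 = boto3.client('s3')

s3.put_bucket_replication(
    Bucket='samplebucket',                                     # source bucket (versioning enabled)
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-crr-role',  # hypothetical IAM role S3 assumes
        'Rules': [{
            'ID': 'replicate-everything',
            'Priority': 1,
            'Status': 'Enabled',
            'Filter': {'Prefix': ''},                          # empty prefix = all objects
            'DeleteMarkerReplication': {'Status': 'Disabled'}, # delete markers are not replicated
            'Destination': {'Bucket': 'arn:aws:s3:::samplebucket-replica'}  # bucket in another region
        }]
    }
)
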
S3 Transfer Acceleration

S3 Transfer Acceleration utilizes the CloudFront edge network to accelerate your uploads to S3. Instead of uploading directly to your S3 bucket, you upload to a distinct URL for an edge location, which then transfers the file on to S3 over an optimized network path. The distinct URL looks like this:

example: shakthi.s3-accelerate.amazonaws.com
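A sketch, assuming boto3 and a hypothetical bucket and local file, of enabling acceleration on a bucket and then uploading through the accelerate endpoint:

import boto3
from botocore.config import Config

s3 = boto3.client('s3')

# Turn on Transfer Acceleration for the bucket
s3.put_bucket_accelerate_configuration(
    Bucket='samplebucket',
    AccelerateConfiguration={'Status': 'Enabled'}
)

# A client configured to use the <bucket>.s3-accelerate.amazonaws.com endpoint
s3_accel = boto3.client('s3', config=Config(s3={'use_accelerate_endpoint': True}))
s3_accel.upload_file('big-video.mp4', 'samplebucket', 'uploads/big-video.mp4')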



