
How To Specify Server Side Encryption For S3 Put In Pyspark?

Thanks to Stack Overflow, I managed to copy hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar from the Maven repo into $SPARK_HOME/jars/ to get s3a:// working for reading from S3 buckets. How do I specify server-side encryption when putting objects to S3 from PySpark?

Solution 1:

This is the way I understood it after going through the following Hadoop JIRAs: HADOOP-10675, HADOOP-10400, HADOOP-10568.

Since fs/s3 is part of Hadoop, the following needs to be added to spark-defaults.conf if all S3 bucket puts in your estate are protected by SSE:

spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256

After adding this, I was able to write successfully to an S3 bucket protected by SSE (server-side encryption).
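If you would rather pass the same setting on the command line than edit spark-defaults.conf, it can be supplied as a `--conf` flag to spark-submit. Below is a small, hypothetical helper (not part of any Spark API) that builds those flags; the property name is the real S3A key from above, while the function itself is illustrative:

```python
def sse_conf_flags(algorithm="AES256"):
    """Build spark-submit --conf arguments that enable S3A server-side
    encryption. The spark.hadoop. prefix forwards the key into the
    Hadoop configuration used by the S3A filesystem."""
    confs = {
        "spark.hadoop.fs.s3a.server-side-encryption-algorithm": algorithm,
    }
    return [arg for key, value in confs.items()
            for arg in ("--conf", f"{key}={value}")]

# usage: spark-submit <these flags> your_job.py
flags = sse_conf_flags()
# flags == ["--conf",
#           "spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256"]
```

The same key can also be passed to `SparkSession.builder.config(...)` in application code.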

Solution 2:

Hopefully you have already set up the configuration with access keys, secret keys, enableServerSideEncryption, and the algorithm to be used for encryption:

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", "xxx")
hadoopConf.set("fs.s3.awsSecretAccessKey", "xxx")
hadoopConf.set("fs.s3.enableServerSideEncryption", "true")
hadoopConf.set("fs.s3.serverSideEncryptionAlgorithm", "AES256")
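Since the question asks about PySpark, here is a sketch of the same settings from the Python side. The keys are the ones from the Scala snippet above; `apply_s3_conf` is a hypothetical helper, and reaching the Hadoop configuration through `sc._jsc` relies on a private attribute that is nonetheless the common workaround in PySpark:

```python
# The same Hadoop configuration keys as the Scala snippet above.
S3_CONF = {
    "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3.awsAccessKeyId": "xxx",
    "fs.s3.awsSecretAccessKey": "xxx",
    "fs.s3.enableServerSideEncryption": "true",
    "fs.s3.serverSideEncryptionAlgorithm": "AES256",
}

def apply_s3_conf(hadoop_conf, conf=S3_CONF):
    """Copy each key/value pair onto a Hadoop Configuration object
    (anything exposing a .set(key, value) method)."""
    for key, value in conf.items():
        hadoop_conf.set(key, value)

# usage, given an active SparkContext `sc`:
#   apply_s3_conf(sc._jsc.hadoopConfiguration())
```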

On EMR, the following EMRFS option enforces server-side encryption:

--emrfs Encryption=ServerSide,Args=[fs.s3.serverSideEncryptionAlgorithm=AES256]

Command :

./bin/spark-submit --verbose --jars lib/app.jar \
  --master spark://master-amazonaws.com:7077 \
  --class com.elsevier.spark.SparkSync \
  --conf "spark.executor.extraJavaOptions=-Ds3service.server-side-encryption=AES256"

http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html

Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3)

Server-side encryption is about protecting data at rest. Server-side encryption with Amazon S3-managed encryption keys (SSE-S3) employs strong multi-factor encryption. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.

Amazon S3 supports bucket policies that you can use if you require server-side encryption for all objects that are stored in your bucket. For example, the following bucket policy denies upload object (s3:PutObject) permission to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption.

{
  "Version": "2012-10-17",
  "Id": "PutObjPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnEncryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
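The effect of this policy's two Deny statements can be sketched in plain Python: a PUT is rejected when the x-amz-server-side-encryption header is absent, or when it is present with any value other than AES256. The function below is only an illustration of the policy's logic, not an AWS API:

```python
def policy_denies(headers):
    """Return True if the bucket policy above would deny a PutObject
    request carrying the given headers."""
    sse = headers.get("x-amz-server-side-encryption")
    if sse is None:
        return True   # DenyUnEncryptedObjectUploads: header missing
    if sse != "AES256":
        return True   # DenyIncorrectEncryptionHeader: wrong algorithm
    return False      # header present with AES256 -> allowed through

# e.g. policy_denies({}) -> True
#      policy_denies({"x-amz-server-side-encryption": "AES256"}) -> False
```

This is why setting the encryption algorithm in the Spark/Hadoop configuration matters: it makes the S3 connector send that header on every put, satisfying the policy.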
