How To Specify Server Side Encryption For S3 Put In Pyspark?
Solution 1:
This is my understanding after going through the following Hadoop JIRAs: HADOOP-10675, HADOOP-10400, and HADOOP-10568.
Since the S3 filesystem support is part of Hadoop, the following needs to be added to spark-defaults.conf if all S3 bucket puts in your estate are protected by SSE:
spark.hadoop.fs.s3a.server-side-encryption-algorithm AES256
After adding this, I was able to write successfully to an S3 bucket protected by SSE (server-side encryption).
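The same property can also be passed at submit time instead of being placed in spark-defaults.conf. A minimal sketch (the script name is a placeholder):

```shell
spark-submit \
  --conf "spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256" \
  your_job.py
```

Any `spark.hadoop.*` property set this way is forwarded into the Hadoop configuration that the s3a connector reads.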
Solution 2:
This assumes you have already set up the configuration with the access key, secret key, enableServerSideEncryption, and the algorithm to be used for encryption:
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", "xxx")
hadoopConf.set("fs.s3.awsSecretAccessKey", "xxx")
hadoopConf.set("fs.s3.enableServerSideEncryption", "true")
hadoopConf.set("fs.s3.serverSideEncryptionAlgorithm", "AES256")
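Since the question asks about PySpark, the Scala snippet above can be translated to Python. This is a hedged sketch: the helper name is my own, and it assumes an existing SparkContext `sc`, whose underlying Hadoop configuration is reachable via `sc._jsc.hadoopConfiguration()`:

```python
def apply_sse_conf(hadoop_conf, access_key, secret_key):
    """Set the S3 SSE-related Hadoop properties, mirroring Solution 2."""
    settings = {
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3.awsAccessKeyId": access_key,
        "fs.s3.awsSecretAccessKey": secret_key,
        "fs.s3.enableServerSideEncryption": "true",
        "fs.s3.serverSideEncryptionAlgorithm": "AES256",
    }
    for key, value in settings.items():
        hadoop_conf.set(key, value)
    return settings

# In a PySpark shell or job this would be called as:
#   apply_sse_conf(sc._jsc.hadoopConfiguration(), "xxx", "xxx")
```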
On EMR, server-side encryption can also be enforced through EMRFS at cluster creation time:
--emrfs Encryption=ServerSide,Args=[fs.s3.serverSideEncryptionAlgorithm=AES256]
Command:
./bin/spark-submit --verbose --jars lib/app.jar \
--master spark://master-amazonaws.com:7077 \
--class com.elsevier.spark.SparkSync \
--conf "spark.executor.extraJavaOptions=-Ds3service.server-side-encryption=AES256"
http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption Keys (SSE-S3)
Server-side encryption is about protecting data at rest. Server-side encryption with Amazon S3-managed encryption keys (SSE-S3) employs strong multi-factor encryption. Amazon S3 encrypts each object with a unique key. As an additional safeguard, it encrypts the key itself with a master key that it regularly rotates. Amazon S3 server-side encryption uses one of the strongest block ciphers available, 256-bit Advanced Encryption Standard (AES-256), to encrypt your data.
Amazon S3 supports bucket policies that you can use if you require server-side encryption for all objects that are stored in your bucket. For example, the following bucket policy denies upload object (s3:PutObject) permission to everyone if the request does not include the x-amz-server-side-encryption header requesting server-side encryption.
{
  "Version": "2012-10-17",
  "Id": "PutObjPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnEncryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YourBucket/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
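To see what the two Deny statements do, here is a toy sketch (my own illustration, not real IAM policy evaluation) of when a put would be rejected under this policy:

```python
def put_denied(headers):
    """Return True if a PutObject request with these headers would be denied."""
    sse = headers.get("x-amz-server-side-encryption")
    if sse is None:
        # Header absent: the "Null" condition matches -> DenyUnEncryptedObjectUploads
        return True
    if sse != "AES256":
        # Header present but wrong value -> DenyIncorrectEncryptionHeader
        return True
    return False
```

In other words, only a put that explicitly requests AES256 server-side encryption passes both statements.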