Connecting to Google Cloud Storage
Privacy Dynamics can connect to your data lake hosted in Google Cloud Storage (GCS). This guide helps you authenticate and authorize Privacy Dynamics to access your data in GCS.
Requirements
To complete this guide, you will need the following:
- Two GCS buckets (one to read data from, the other to write data to).
- A Google Cloud user with administrator privileges to create service accounts and grant them storage roles at the Project level.
- A Privacy Dynamics account.
Instructions
Privacy Dynamics connects to GCS through the Cloud Storage XML API, which is compatible with Amazon S3.
Before you connect to GCS in Privacy Dynamics, create a new IAM service account for Privacy Dynamics to use.
Configure IAM
- Create a new service account in your relevant Google Cloud Project for Privacy Dynamics.
- Grant the service account the Storage Admin role for the Project by selecting "Edit principal" on the IAM page for your Project. This Project-level access is required so that Privacy Dynamics can list the buckets in your Google Cloud Project.
- Create an HMAC key for the new service account. Privacy Dynamics uses this HMAC key to authenticate with GCS. Save the Access Key and Secret Access Key in a password vault: Google Cloud shows the secret only once, when the key is created, and you will need both values now and whenever you edit the Privacy Dynamics connection.
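The three IAM steps can also be done from the command line. The Python sketch below only assembles the gcloud commands (it does not run them) so you can review them first. The project ID, the service account name, and the helper iam_setup_commands are hypothetical placeholders; substitute your own values.

```python
import shlex
from typing import List


def iam_setup_commands(project_id: str, sa_name: str) -> List[List[str]]:
    """Build (but do not execute) gcloud commands for the IAM steps above.

    project_id and sa_name are hypothetical placeholders.
    """
    # Service account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com.
    sa_email = f"{sa_name}@{project_id}.iam.gserviceaccount.com"
    return [
        # 1. Create the service account Privacy Dynamics will use.
        ["gcloud", "iam", "service-accounts", "create", sa_name,
         f"--project={project_id}", "--display-name=Privacy Dynamics"],
        # 2. Grant Storage Admin at the Project level so buckets can be listed.
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{sa_email}", "--role=roles/storage.admin"],
        # 3. Create an HMAC key for the service account.
        ["gcloud", "storage", "hmac", "create", sa_email,
         f"--project={project_id}"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for review before running them.
    for cmd in iam_setup_commands("my-project", "privacy-dynamics"):
        print(shlex.join(cmd))
```

The HMAC command's output includes the secret, so treat that terminal output like any other credential.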
Add the GCS Connection in Privacy Dynamics
Sign in to your Privacy Dynamics account.
Go to the Connections page.
Select Add Connection.
Choose S3 and select Next. Because Privacy Dynamics uses the S3-compatible XML API, GCS connections use the same setup modal as Amazon S3.
Enter the connection details:
- Name - A name to identify the connection.
- Is S3 compatible? - Check this box.
- AWS Key ID - The Access Key of the HMAC key you created for the service account above.
- AWS Secret Access Key - The Secret Access Key of that HMAC key.
- Region Name - The Google Cloud region where your bucket is located (e.g., us-west1).
- Endpoint URL - Enter the value https://storage.googleapis.com
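The key ID, secret, region, and endpoint fields are what an S3-compatible client needs to sign its requests. As a rough illustration only, not Privacy Dynamics' actual client, the following Python sketch signs an empty-body GET request to the XML API endpoint with AWS Signature Version 4, which Cloud Storage accepts with HMAC keys. The helper name sign_request, the example key values, and the choice of request (a GET on "/") are assumptions; the request is only signed, never sent.

```python
import datetime
import hashlib
import hmac
from typing import Dict, Optional


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(access_key: str, secret_key: str, region: str,
                 host: str = "storage.googleapis.com", path: str = "/",
                 now: Optional[datetime.datetime] = None) -> Dict[str, str]:
    """Return headers for a SigV4-signed GET with no query string and an empty body.

    Hypothetical helper for illustration; it builds headers but sends nothing.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()

    # Canonical headers: lowercase names, sorted, each line newline-terminated.
    headers = {"host": host, "x-amz-content-sha256": payload_hash,
               "x-amz-date": amz_date}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash])

    # The credential scope binds the signature to a date, region, and service.
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # Derive the signing key from the secret, then sign the string.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")
    return headers


if __name__ == "__main__":
    # Placeholder credentials; never hard-code real HMAC secrets.
    for name, value in sign_request("EXAMPLEACCESSKEY", "example-secret",
                                    "us-west1").items():
        print(f"{name}: {value}")
```

Passing a fixed now makes the output reproducible, which is handy when comparing against another SigV4 implementation. In practice, a maintained S3 client configured with the same endpoint URL does this signing for you.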
Select TEST CONNECTION to verify the credentials.
Select ADD CONNECTION. If there are no errors, the connection is saved.
Create a Project Using Google Cloud Storage
- Select the Anonymize button on the top navigation bar.
- On the "Choose Data" screen, select the new GCS connection as the Origin Connection, then select the Origin Bucket and Destination Bucket.
- Optionally, enter a Bucket Prefix to filter the list of objects below. If the prefix is a folder, include the trailing slash (for example, exports/).
- To use the same IAM Service Account to write the treated data to GCS, select Destination Connection: Same as Origin. Select the bucket and optionally a prefix (or folder, with a trailing slash) to prepend to the object name.
- Select the files from the list of datasets, and then select Next to configure your dataset.
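In GCS, prefixes match object names as plain strings (a "folder" is just a name segment ending in /), which is likely why the trailing slash matters in the steps above. The Python sketch below illustrates this with hypothetical object names. The helpers filter_by_prefix and destination_name are illustrative, and the assumption that the destination prefix is simply prepended to the object name follows the description above rather than Privacy Dynamics' own code.

```python
from typing import Iterable, List


def filter_by_prefix(object_names: Iterable[str], prefix: str) -> List[str]:
    """Keep object names that start with prefix (literal string match)."""
    return [name for name in object_names if name.startswith(prefix)]


def destination_name(object_name: str, dest_prefix: str) -> str:
    """Prepend a destination prefix to an object name (hypothetical helper)."""
    return dest_prefix + object_name


if __name__ == "__main__":
    names = ["exports/customers.csv", "exports_old/customers.csv", "exports.csv"]
    # Without the slash, "exports" also matches sibling objects and folders.
    print(filter_by_prefix(names, "exports"))
    # With the slash, only objects inside the exports/ folder match.
    print(filter_by_prefix(names, "exports/"))
    # The slash also separates the destination folder from the object name.
    print(destination_name("customers.csv", "treated/"))
```

Leaving off the slash on the destination side would produce an object named treatedcustomers.csv at the bucket root instead of customers.csv inside treated/.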
Other Configuration
If you have network access controls in place that limit connections to Google Cloud Storage, you will need to add Privacy Dynamics' IP addresses to your Allowlist. You can find those IP addresses in this public JSON file.
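If you maintain the allowlist yourself, checking that it covers every published address can catch gaps before a connection fails. This Python sketch uses the standard ipaddress module. It assumes you have already extracted the published entries as individual IP address strings (the JSON file's structure is not described here), the helper uncovered_addresses is hypothetical, and every address shown is from a reserved documentation range, not one of Privacy Dynamics' real IPs.

```python
import ipaddress
from typing import Iterable, List


def uncovered_addresses(published: Iterable[str], allowlist: Iterable[str]) -> List[str]:
    """Return published addresses that no allowlist entry (IP or CIDR) covers."""
    # strict=False accepts entries like "203.0.113.5/24" that have host bits set.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for addr in published:
        ip = ipaddress.ip_address(addr)
        # An IPv4 address is never "in" an IPv6 network (and vice versa),
        # so mixed-version lists simply report no match rather than raising.
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


if __name__ == "__main__":
    published = ["203.0.113.10", "198.51.100.7", "2001:db8::1"]
    allowlist = ["203.0.113.0/24", "2001:db8::/32"]
    print(uncovered_addresses(published, allowlist))
```

Any address the function returns still needs to be added to your allowlist.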