2023 Updates
December 22
Support for PostgreSQL hstore type
The PostgreSQL hstore data type is a specialized feature designed for storing sets of key/value pairs within a single column. We added support for hstore columns, which means all key/value pairs will be anonymized during treatment.
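As a rough illustration of what treating an hstore column involves, the sketch below masks every value in an hstore-like mapping. The function names, the salted-hash masking, and the choice to keep keys unchanged are assumptions made for this example; they are not taken from the application.

```python
import hashlib


def mask_value(value, salt="demo-salt"):
    """Hypothetical masking step: replace a string with a short salted hash."""
    if value is None:  # hstore values may be NULL
        return None
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymize_hstore(pairs, mask=mask_value):
    """Mask every value of an hstore-like dict; keys are kept (an assumption)."""
    return {key: mask(value) for key, value in pairs.items()}
```

Calling `anonymize_hstore({"city": "Berlin", "note": None})` returns a dict with the same keys, a 12-character hash in place of "Berlin", and the NULL value left as `None`.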
December 5
Improved categorical PII classifier
PII data is often categorical in nature (e.g., religion, nationality), so we added a cardinality-based heuristic to dramatically improve PII classification during treatment.
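One way such a heuristic could work is to compare the number of distinct values in a column with the number of rows. The sketch below is only an illustration: the function name and both thresholds are assumptions, and the application's actual rule is not documented here.

```python
def looks_categorical(values, max_distinct_ratio=0.05, max_distinct=50):
    """Return True when a column has few distinct values relative to its size.

    Both thresholds are illustrative assumptions, not the product's settings.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False
    distinct = len(set(non_null))
    return distinct <= max_distinct and distinct / len(non_null) <= max_distinct_ratio
```

A nationality column with four distinct values across 200 rows would be flagged as categorical, while a column of 200 unique identifiers would not.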
November 7
Register webhooks from the UI
Webhooks are now a first-class citizen within the application. Previously, webhooks had to be registered by calling the REST API; a new page in the settings section now makes it easy to register webhook URLs from the UI.
October 12
Easily navigate to job run details from the dataset card
It's now easier to navigate to the job runs page from the project page to view logs and runtimes across groups of jobs. A link was added to dataset cards that points to the corresponding job run that created the dataset. This makes it possible to compare job run times for other datasets in the same or different projects.
Support writing PostgreSQL ARRAY types
PostgreSQL can store a number of specialized data types, among them columns of variable-length multidimensional arrays. This release adds support for anonymizing string and numeric ARRAY columns.
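Because ARRAY values can be nested, treating them means visiting every element while keeping the array's shape. The sketch below shows that idea with nested Python lists; `anonymize_array` and the `treat` callback are hypothetical names for this example, not the application's API.

```python
def anonymize_array(value, treat):
    """Apply `treat` to every scalar in a possibly nested list, preserving shape."""
    if isinstance(value, list):
        return [anonymize_array(item, treat) for item in value]
    if value is None:  # NULL elements are passed through untouched
        return None
    return treat(value)
```

For example, `anonymize_array([[1, 2], [3, None]], lambda x: x * 10)` yields `[[10, 20], [30, None]]`, so dimensions and NULL positions are unchanged.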
Improved MySQL writing performance
Writing large, anonymized tables back to MySQL is now significantly faster, which reduces overall job run times for large MySQL tables.
September 27
View detailed logs for job runs
From an individual job run, you can view all of the individual tasks that were executed. The task log is viewable from the slide-out on the right side of the screen on the job runs page.
September 8
Read From SFTP and Write to S3 or GCS
A common customer use case is to anonymize data at multiple points in a data pipeline. This might mean removing DIDs early in the pipeline and treating QIDs much later in the process. With this release, it's now possible to use an SFTP source to anonymize and write treated data to blob storage (AWS S3 and GCP GCS).
Add support for identity columns for Postgres & MySQL
With this release, identity columns (auto-increment) in PostgreSQL and MySQL are preserved in the anonymized output. For MySQL, AUTO_INCREMENT is set in an ALTER TABLE statement during the write_indices_and_constraints step.
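The exact statement issued by the application is not shown in these notes. As a hedged sketch, a MySQL statement that restores AUTO_INCREMENT on a key column could be built as below; the helper names and the default `BIGINT` column type are assumptions for illustration.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def auto_increment_statement(table, column, column_type="BIGINT"):
    """Build an ALTER TABLE statement that marks `column` as AUTO_INCREMENT.

    MySQL requires the column to be part of a key, so a statement like this
    would run after the key has been created on the destination table.
    """
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"MODIFY COLUMN {quote_ident(column)} {column_type} NOT NULL AUTO_INCREMENT"
    )
```

Running the statement after keys are in place is consistent with the note that it happens in the step that writes indices and constraints.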
July 31
Add Run Project & Re-Run Dataset to Project Page
Improved connection tests for SFTP
The "Test connection" button on the new data connection screen was improved to more thoroughly and quickly verify the connection credentials for the SFTP server.
July 21
Disable Anonymize radio button for Primary Keys
In the dataset configuration settings, the option to anonymize is disabled for primary and foreign key columns. Disabling the anonymization option ensures that column values are not adjusted by mistake and table references are preserved during data treatment.
July 12
Add tabs to filter by treat & passthrough
When anonymizing databases with lots of tables, it's easy to lose track of the treatment plans for each table. This release adds filtered pages to the
June 14
Referential integrity for PostgreSQL, MySQL, and Snowflake
During the anonymization process, the application will duplicate primary and foreign key constraints from source tables to destination tables, maintaining data integrity and ensuring the consistency and accuracy of the anonymized data. This is a new feature, and PostgreSQL, MySQL, and Snowflake are supported.
Support for Google Cloud Storage (GCS)
Data connections to S3-compatible APIs, including GCS, are now supported. S3-compatible APIs can be added as data connections and used as source and destination connections to anonymize data. Initial functionality must be enabled via the REST API; UI support will be available in the next release.
June 2
SSH support for data connections
Support for SSH tunneling is added in this release, allowing SaaS customers to securely connect to their data stores. SSH connection details, like host and private key, can be added to each data connection's properties.
Copy source indices and constraints to destination tables
Particularly in development and testing scenarios, it can be critical that anonymized database copies exactly match production database indices and constraints. This release is a first step in copying over all table metadata. The new functionality copies metadata one table at a time, so information at the table level is brought over. Relationship information, specifically foreign key definitions, is not supported at this time.
- Copy indices and constraints from origin tables and apply to destination tables during the treatment process.
- Column NOT NULL constraints cannot be carried over at this time.
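The table-at-a-time copy described above can be sketched with SQLite, used here only as a self-contained stand-in because it ships with Python; the application itself targets other databases, and `copy_indices` is a hypothetical helper. The sketch reads the explicitly created indices of an origin table and recreates them on a destination table.

```python
import sqlite3


def copy_indices(conn, source, dest):
    """Recreate the explicitly created indices of `source` on `dest`.

    Returns the names of the indices that were created. Indices implied by
    UNIQUE or PRIMARY KEY constraints are skipped in this sketch.
    """
    created = []
    for row in conn.execute(f'PRAGMA index_list("{source}")').fetchall():
        name, unique, origin = row[1], row[2], row[3]
        if origin != "c":  # "c" marks indices made with CREATE INDEX
            continue
        cols = [r[2] for r in conn.execute(f'PRAGMA index_info("{name}")')]
        col_list = ", ".join(f'"{c}"' for c in cols)
        new_name = f"{dest}_{name}"  # prefix avoids clashing with the original
        unique_kw = "UNIQUE " if unique else ""
        conn.execute(f'CREATE {unique_kw}INDEX "{new_name}" ON "{dest}" ({col_list})')
        created.append(new_name)
    return created
```

Like the release described here, this sketch works one table at a time and does not attempt to carry over foreign key relationships.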
May 10
Assessment-only projects
Sometimes you just need to scan some databases or buckets to determine if they contain sensitive data without actually treating the data. This is a fast way to inventory risk across an entire company or organization. We have added a new project option to only assess risk, present in the project wizard. Enabling the assessment-only mode means that destination locations don't need to be provided and assessments can be created quickly.
February 21
Preserve values of primary and foreign key columns
The project wizard will automatically detect columns that contain primary or foreign keys and preserve their values by default, ensuring that data remains the same in source and destination tables and table relationships are preserved. Locking column values is only a suggestion by the application and can be overridden within the UI.
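The default-with-override behavior can be pictured as a per-column plan. In the sketch below, the function name and the `"treat"` and `"passthrough"` labels are assumptions chosen for illustration; key columns default to passthrough, and explicit user choices win.

```python
def default_plan(columns, key_columns, overrides=None):
    """Return {column: action}, preserving key columns unless overridden."""
    keys = set(key_columns)
    plan = {col: ("passthrough" if col in keys else "treat") for col in columns}
    plan.update(overrides or {})  # user choices take precedence over defaults
    return plan
```

For instance, `default_plan(["id", "email"], ["id"])` keeps `id` untouched and treats `email`, while passing `overrides={"id": "treat"}` reflects a user overriding the suggestion.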