The first stage of building with Skyflow is defining your solution. During this stage, you identify your usage patterns and define your vault schema based on how Skyflow fits into your existing systems and architecture.
The first step to implementing your Skyflow solution is understanding your usage patterns and architecture.
Consider the following resources based on your needs and usage patterns:
Protect personally identifiable information (PII) such as names, addresses, and social security numbers.
Protect payment card information (PCI) such as credit card numbers and CVV codes.
Protect sensitive data in large language models (LLMs).
Protect sensitive data in analytics and machine learning.
Protect sensitive data in different geographic locations.
Once you’ve defined your use cases and the flow of your data, define your scope and integration timeline.
Next, access your Sandbox environment and start exploring Skyflow Studio.
After receiving a welcome email from Skyflow, confirm the setup details for your Sandbox environment. Once completed, you can access your Sandbox environment, where you can develop and test a production-ready integration with Skyflow.
Log in to Skyflow Studio, the central management tool for your Skyflow account, to configure and manage all aspects of your Skyflow deployment, including users and permissions.
Your vault schema specifies the tables and columns you can use to store data, and defines how you can integrate data from your existing systems.
Defining your schema requires identifying your present and future data requirements, creating a vault, and configuring how to store your data:
For each type of sensitive data you are storing, consider any compliance requirements like data retention durations and deletion requests.
Map the flow of data through your system:
Review your data flow and determine how Skyflow integrates into your architecture:
Create a vault in Studio or with the Management API. If you haven’t defined the tables and columns you need yet, you can start with a template and modify the schema later.
You can’t modify some column settings after you add data to your vault.
Define the columns for your vault schema based on your data flow and existing architecture. Skyflow provides two ways to add fields when creating columns in a vault:
Skyflow columns support multiple configuration options that define how data is stored, protected, and accessed. These settings determine a column’s behavior and properties such as validation, tokenization, redaction, and encryption.
Review the following column settings in Studio or using the tags in the Management API (see vault settings):
Basic properties for the column such as name, description, tags, column group, data type, array, regex validations, uniqueness, and required status.
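Before you configure columns, it can help to write down the general properties you expect each column to need and test sample values against them. The following is a minimal planning sketch, not Skyflow's schema format: the `ColumnPlan` class, its field names, and the `validate` helper are all hypothetical and exist only to illustrate how required status, uniqueness, and regex validation interact.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnPlan:
    """Hypothetical planning record; field names are illustrative only."""
    name: str
    data_type: str = "string"
    required: bool = False
    unique: bool = False
    regex: Optional[str] = None  # optional validation pattern


def validate(column: ColumnPlan, value: Optional[str], existing: set) -> list:
    """Return a list of problems found for one candidate value."""
    problems = []
    if value is None or value == "":
        # An empty value is only a problem when the column is required.
        if column.required:
            problems.append(f"{column.name}: value is required")
        return problems
    if column.regex and not re.fullmatch(column.regex, value):
        problems.append(f"{column.name}: value does not match {column.regex}")
    if column.unique and value in existing:
        problems.append(f"{column.name}: value must be unique")
    return problems


# Example plan for a social security number column.
ssn = ColumnPlan(name="ssn", required=True, unique=True, regex=r"\d{3}-\d{2}-\d{4}")
print(validate(ssn, "123-45-6789", existing=set()))
print(validate(ssn, "12345", existing=set()))
```

Running sample records from your existing systems through a check like this can surface format mismatches before you commit to column settings.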
Tokenization substitutes sensitive values with non-sensitive tokens. Determine whether a column needs tokenization and, if so, which tokenization format is appropriate.
Persistent tokens are for long-lived values and remain in your vault until you delete them. These tokens can be random (a different token is generated for every instance of a value) or consistent (the same token is generated for every instance of a value). Additionally, you can enable format-preserving tokenization to replace sensitive values in your database without changing the schema or any validation rules, because the token appears in the same format as the plaintext data.
Note: You can define the format of generated tokens with a regex. If you don’t define a regex, Skyflow attempts to infer the format from the provided value.
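To make the difference between random and consistent tokens concrete, here is a small conceptual sketch. It is not how Skyflow generates tokens: the `SECRET_KEY` value and both function names are assumptions, and the keyed hash (HMAC) is used only as a simple stand-in for any deterministic mapping.

```python
import hashlib
import hmac
import secrets

# Assumption: a demo key for illustration only. Never hard-code real keys.
SECRET_KEY = b"demo-key-for-illustration-only"


def random_token(value: str) -> str:
    """Return a fresh token on every call, even for the same value."""
    return secrets.token_hex(16)


def consistent_token(value: str) -> str:
    """Return the same token every time the same value is supplied."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


email = "jane@example.com"
print(random_token(email) == random_token(email))          # False
print(consistent_token(email) == consistent_token(email))  # True
```

Consistent tokens let you join or deduplicate on the token itself, while random tokens reveal nothing about whether two records share a value. Which trade-off fits depends on how downstream systems use the tokenized column.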
Transient tokens are for temporary storage of sensitive values, which are automatically deleted after a predefined Time to Live (TTL). Updating a transient field with a new value resets the TTL, but if the token expires, detokenization requests fail.
Security tip: Consider using transient tokenization if your use case involves short-term storage of highly sensitive values, such as credit card information (refer to CVV best practices).
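The TTL behavior described above can be sketched with a small in-memory store. This is a conceptual model only, not Skyflow's implementation: the `TransientStore` class, its method names, and the injectable `clock` parameter are hypothetical.

```python
import secrets
import time


class TransientStore:
    """Hypothetical in-memory model of transient tokens with a TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._records = {}  # token -> (value, expires_at)

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)
        self._records[token] = (value, self.clock() + self.ttl)
        return token

    def update(self, token: str, value: str) -> None:
        """Store a new value under the token and reset its TTL."""
        self._lookup(token)  # fails if the token has already expired
        self._records[token] = (value, self.clock() + self.ttl)

    def detokenize(self, token: str) -> str:
        return self._lookup(token)[0]

    def _lookup(self, token: str):
        record = self._records.get(token)
        if record is None or self.clock() >= record[1]:
            # Expired records are dropped, so later requests also fail.
            self._records.pop(token, None)
            raise KeyError("token expired or unknown")
        return record
```

In this model, a token created with a 60-second TTL and updated at second 50 stays valid until second 110. After that, `detokenize` raises an error, which mirrors the failed detokenization requests you should plan for in your integration.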
You can also configure a column to not use tokenization.
Redaction settings control how sensitive data is obscured to prevent unauthorized access. Full redaction returns values as “REDACTED”, partial redaction masks values according to a regex-based mask policy and format, and no redaction returns values in plain text.
Make sure all columns that contain sensitive data have some level of redaction applied.
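The three redaction levels can be illustrated with a short sketch. The `redact` function, the level names (`"full"`, `"partial"`, `"plain"`), and the default mask pattern are assumptions made for this example and do not reflect Skyflow's configuration syntax.

```python
import re


def redact(value: str, level: str,
           pattern: str = r".(?=.{4})", mask: str = "X") -> str:
    """Return a value as it might appear at a given redaction level.

    The default pattern matches every character that still has at least
    four characters after it, so only the last four stay visible.
    """
    if level == "full":
        return "REDACTED"
    if level == "partial":
        return re.sub(pattern, mask, value)
    if level == "plain":
        return value
    raise ValueError(f"unknown redaction level: {level}")


card = "4111111111111111"
print(redact(card, "full"))     # REDACTED
print(redact(card, "partial"))  # XXXXXXXXXXXX1111
print(redact(card, "plain"))    # 4111111111111111
```

Sketching the expected output per level for each sensitive column is a quick way to confirm with stakeholders how much of a value each consumer actually needs to see.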
Determine how your data should be encrypted when stored in your Skyflow vault. Disk-level encryption keeps data encrypted at rest and is always enabled. Column-level encryption enables querying functionality. More specifically, encrypted operations enable exact-match queries, and encrypted integer operations (aggregation and comparison) enable queries with mathematical operations (SUM, AVERAGE, <, >, sort by).
Best practices to keep sensitive data out of your logs during development.
Create a vault schema in Studio or using the Management API.
When you create your schema, plan for future expansion and use cases. Once you create a vault, you can always add new columns and edit select column settings. However, once data is in a column, certain settings are fixed and can’t be modified later, such as tokenization options, column-level encryption, indexing, and some general settings (column group, data type, and array). If you’ve already inserted some test data, you can delete records in Studio or via the Delete Record API.
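When reviewing a proposed schema change, a simple check against the settings that become fixed once a column holds data can catch problems early. The sketch below is a planning aid under stated assumptions: the setting keys and the `blocked_changes` helper are hypothetical, and the locked set lists only the examples named above, so it may not be exhaustive.

```python
# Assumption: illustrative keys for the settings named above as fixed
# once a column contains data. The real list may be longer.
LOCKED_AFTER_DATA = {
    "tokenization",
    "column_encryption",
    "indexing",
    "column_group",
    "data_type",
    "array",
}


def blocked_changes(current: dict, proposed: dict, has_data: bool) -> list:
    """Return the proposed setting changes that would not be allowed."""
    if not has_data:
        return []  # an empty column can still be reconfigured
    changed = {key for key in proposed if proposed[key] != current.get(key)}
    return sorted(changed & LOCKED_AFTER_DATA)


current = {"data_type": "string", "description": "old"}
proposed = {"data_type": "int", "description": "new"}
print(blocked_changes(current, proposed, has_data=True))   # ['data_type']
print(blocked_changes(current, proposed, has_data=False))  # []
```

If the check reports blocked settings for a column that only contains test data, deleting those records first restores your ability to change them.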
Tabular summary of modification conditions for column settings.
Some key considerations for schema flexibility:
If needed, Skyflow can assist in reviewing your schema and provide architecture recommendations to ensure smooth integration.
Once you design your solution and the schema for your vault, you’re ready to start building your solution.