Prepare tables for data quality metrics
Activate a table, set values for its inheritable settings, and manage any partitions or timestamps
Before you can collect metrics on a data asset, it must be Active. You can only set a data asset Active if its parent asset is already Active:
1. To make a schema Active, select its datasource in the Explorer tree, then choose Manage Metrics on the Actions menu in the top right panel. In the dialog that opens, toggle the Active slider for the schema (slide it right for Active). It takes a few minutes for the schema to activate; during this time, the schema's data is unavailable.
2. To make a table Active, select its schema in the Explorer tree, then choose Manage Metrics on the Actions menu in the top right panel. In the dialog that opens, toggle the Active slider for the table (slide it right for Active). The Manage Table Settings dialog immediately opens so you can provide values for the table's inheritable settings.
3. To make a column Active, select Manage Metrics on the Actions menu for its parent table, then slide the Active toggle to the right for the column you want to activate. You can then enable various column autometrics.
When you make a table Active, its Manage Table Settings dialog opens so you can configure the table's inheritable settings. These values are inherited by metrics that measure the table's data quality (but you can override them when you create or edit a metric).
Choose the timestamp column for your table; if no suitable column is ready to use, you can create a virtual timestamp (see the steps below). The aggregation period for a metric is based on the value in the timestamp column, translated into the time zone specified by the Aggregation Time Zone. Only data with timestamps earlier than [now] - Evaluation Delay is considered (see Evaluation Delay below).
Specify an Aggregation Interval to set how often metric values are aggregated. For daily, metric values are calculated over the 24-hour period starting at 12 AM; for hourly, over each period starting at the top of the hour; for weekly, over the seven-day period starting at 12 AM on Monday.
Select the Aggregation Time Zone. This specifies the time zone that is used for metric aggregation. For example, if Aggregation Interval is daily, the data will be aggregated for the 24-hour period starting at 12AM in the Aggregation Time Zone.
Set Evaluation Delay to the time period required for your data to become stable. Evaluation Delay defines a blackout period during which data is considered not yet stable or ready for data quality checks. For example, if a metric has an Evaluation Delay of one day, data with a timestamp in the past 24 hours is ignored.
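Taken together, the Aggregation Interval, Aggregation Time Zone, and Evaluation Delay determine which window of data a metric evaluates. The sketch below illustrates that interaction for a daily interval (the function is illustrative only, not part of the product):

```python
from datetime import datetime, timedelta, timezone

def daily_aggregation_window(now, evaluation_delay=timedelta(days=1), tz=timezone.utc):
    """Illustrative sketch: find the most recent daily window whose data
    lies entirely outside the evaluation-delay blackout period."""
    # Data newer than [now] - Evaluation Delay is ignored.
    stable_cutoff = now - evaluation_delay
    # Daily windows start at 12 AM in the Aggregation Time Zone.
    day_start = stable_cutoff.astimezone(tz).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return day_start - timedelta(days=1), day_start

# With a one-day delay at 09:30 UTC on May 3, the newest fully stable
# daily window runs from May 1 00:00 to May 2 00:00 UTC.
start, end = daily_aggregation_window(datetime(2024, 5, 3, 9, 30, tzinfo=timezone.utc))
```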
Some timestamp columns don't include time zone information, so you might need to specify the Time Zone where the data was written. If the time zone is part of the timestamp, this setting says Derived and can't be changed.
Under Partitions, if your table has time-based partitions, you can specify the partition column and its format so that metric queries can use the partition to improve performance. If your table doesn't have any partitions, the Partitions section doesn't appear in the Manage Table Settings dialog.
Click Confirm and Save to save your settings. Now that you've activated the table, you can create metrics for it.
If you don't have a column that can be used as a timestamp, you can create one by defining a virtual timestamp. You create a virtual timestamp for a table in the Manage Table Settings dialog:
1. Select the Timestamp field, then select + Create a virtual column.
2. In the flyout that appears, select a column with values you can turn into a timestamp using functions.
3. Enter a function for converting the column's values into UTC-format timestamp values. For example, if you store timestamps as the epoch with millisecond precision, your input might be TO_TIMESTAMP({column_value} / 1000).
4. Enter a function for converting UTC-format timestamp values back into valid column values, then select Save. For example, your input might be EXTRACT(epoch FROM {timestamp_value}) * 1000.
The following table lists the functions you would use to transform a column value to and from a timestamp value, if the column values were stored as the epoch with millisecond precision. You can use these as a starting point, adjusting the transformation as needed to accommodate your column values.
Athena
  Column value to timestamp: FROM_UNIXTIME({column_value} / 1000)
  Timestamp to column value: TO_UNIXTIME({timestamp_value}) * 1000

BigQuery
  Column value to timestamp: TIMESTAMP_MILLIS({column_value})
  Timestamp to column value: UNIX_MILLIS({timestamp_value})

Databricks
  Column value to timestamp: CAST(({column_value} / 1000) AS TIMESTAMP)
  Timestamp to column value: UNIX_TIMESTAMP({timestamp_value})::BIGINT * 1000

Microsoft SQL
  Column value to timestamp: DATEADD(s, {column_value} / 1000, '19700101')
  Timestamp to column value: CAST(DATEDIFF(s, '19700101', {timestamp_value}) AS BIGINT) * 1000

Oracle
  Column value to timestamp: (timestamp '1970-01-01 00:00:00') + numtodsinterval({column_value} / 1000, 'second')
  Timestamp to column value: (CAST(SYS_EXTRACT_UTC({timestamp_value}) AS DATE) - DATE '1970-01-01') * 86400 * 1000

Postgres
  Column value to timestamp: TO_TIMESTAMP({column_value} / 1000)
  Timestamp to column value: EXTRACT(epoch FROM {timestamp_value}) * 1000

Redshift
  Column value to timestamp: TIMESTAMP 'epoch' + {column_value} / 1000 * interval '1 second'
  Timestamp to column value: EXTRACT(epoch FROM {timestamp_value})::bigint * 1000

Snowflake
  Column value to timestamp: TO_TIMESTAMP({column_value} / 1000)
  Timestamp to column value: EXTRACT(epoch FROM {timestamp_value}) * 1000
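As a sanity check outside the warehouse, the Postgres pair above can be mirrored in Python to confirm that the two functions round-trip an epoch-millisecond value (illustrative only; these helpers are not part of the product):

```python
from datetime import datetime, timezone

def ms_to_timestamp(column_value):
    # Mirrors TO_TIMESTAMP({column_value} / 1000): epoch milliseconds -> UTC timestamp.
    return datetime.fromtimestamp(column_value / 1000, tz=timezone.utc)

def timestamp_to_ms(timestamp_value):
    # Mirrors EXTRACT(epoch FROM {timestamp_value}) * 1000: UTC timestamp -> epoch milliseconds.
    return round(timestamp_value.timestamp() * 1000)

ms = 1714736700123  # an epoch value with millisecond precision
round_tripped = timestamp_to_ms(ms_to_timestamp(ms))
```

Note the round() guard: floating-point seconds can be a fraction of a millisecond off, so truncating with int() could silently drop a millisecond; the same precision concern applies when adapting the SQL above.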