Qualytics CLI

The Qualytics CLI Workflow

We’re excited to announce that Qualytics now has a command line interface, offering a new workflow for your data quality monitoring. We wanted to give users an approach to data quality that doesn’t involve navigating a user interface. Everyone has their own preferred way of working, and Qualytics wants to support as many of those approaches as possible.

Initializing the Qualytics CLI

Before using the CLI with a Qualytics instance, a user must first do a quick configuration. All they need is their company’s Qualytics URL and one of their personal tokens. With those in hand, the following command gets the CLI ready to go.

    qualytics init \
              --url "https://your-qualytics.qualytics.io/" \
              --token "YOUR_TOKEN_HERE"
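If you would rather not paste a token into your shell history, one option is to read both values from environment variables. The snippet below is a minimal sketch of that idea: the function name and the QUALYTICS_URL and QUALYTICS_TOKEN variable names are assumptions made for this example, not something the CLI defines. The only CLI surface it relies on is the init command shown above.

```shell
# Sketch: initialize the Qualytics CLI from environment variables.
# QUALYTICS_URL and QUALYTICS_TOKEN are hypothetical names chosen for this
# example; the CLI itself only receives the --url and --token flags.
qualytics_init_from_env() {
    if [ -z "${QUALYTICS_URL:-}" ] || [ -z "${QUALYTICS_TOKEN:-}" ]; then
        echo "Set QUALYTICS_URL and QUALYTICS_TOKEN first" >&2
        return 1
    fi
    qualytics init --url "$QUALYTICS_URL" --token "$QUALYTICS_TOKEN"
}
```

After exporting the two variables, calling qualytics_init_from_env runs the same configuration step as the command above.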

Triggering an Operation via the CLI

The Qualytics CLI lets a user trigger any of our operations on a connected datastore, providing an easy way to check your company’s data quality at a moment’s notice. Operations no longer have to be triggered through the UI; they can be started at any time from your command line, with all the same parameters offered in our user interface.

The first operation that can be triggered from the CLI is our catalog operation. During the catalog operation, the Qualytics platform analyzes the source datastore’s metadata, preparing it for subsequent Profile and Scan operations.

    qualytics run catalog \
                  --datastore "DATASTORE_ID_LIST" \
                  --include "INCLUDE_LIST" \
                  --prune \
                  --recreate \
                  --background

The second operation that can be triggered from the CLI is our profile operation. This operation generates valuable metadata insights for all the data assets within the source datastore. It also automatically infers customized data quality checks based on the profiled data.

    qualytics run profile \
                  --datastore "DATASTORE_ID_LIST" \
                  --container_names "CONTAINER_NAMES_LIST" \
                  --container_tags "CONTAINER_TAGS_LIST" \
                  --infer_constraints \
                  --max_records_analyzed_per_partition "MAX_RECORDS_ANALYZED_PER_PARTITION" \
                  --max_count_testing_sample "MAX_COUNT_TESTING_SAMPLE" \
                  --percent_testing_threshold "PERCENT_TESTING_THRESHOLD" \
                  --high_correlation_threshold "HIGH_CORRELATION_THRESHOLD" \
                  --greater_then_date "GREATER_THAN_TIME" \
                  --greater_than_batch "GREATER_THAN_BATCH" \
                  --histogram_max_distinct_values "HISTOGRAM_MAX_DISTINCT_VALUES" \
                  --background

The last operation that can be triggered from the CLI is our scan operation. When a scan is triggered on a source datastore, the Qualytics engine asserts the automatically inferred checks (as well as any additional checks you create) against both historical and new data within the source datastore.

    qualytics run scan \
                  --datastore "DATASTORE_ID_LIST" \
                  --container_names "CONTAINER_NAMES_LIST" \
                  --container_tags "CONTAINER_TAGS_LIST" \
                  --incremental \
                  --remediation \
                  --max_records_analyzed_per_partition "MAX_RECORDS_ANALYZED_PER_PARTITION" \
                  --enrichment_source_records_limit "ENRICHMENT_SOURCE_RECORDS_LIMIT" \
                  --greater_then_date "GREATER_THAN_TIME" \
                  --greater_than_batch "GREATER_THAN_BATCH" \
                  --background

Lastly, if you want to start an operation and keep using your command line without waiting for it to finish, add the --background parameter. This is especially useful when your datastore holds a large quantity of data and operations take a long time to finish, because your terminal will not be tied up by the Qualytics CLI.
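The opposite case is handy too. Since an operation started without the background parameter keeps the command line busy until it finishes, the three operations can be chained in order. The helper below is a rough sketch of that idea, not an official workflow: run_quality_pipeline is a hypothetical name, it assumes the flags other than --datastore are optional, and it assumes the CLI returns a non-zero exit code when an operation fails, which this post does not confirm.

```shell
# Sketch: run catalog, profile, and scan in order for one datastore.
# run_quality_pipeline is a hypothetical helper name. It leaves out
# --background on purpose so each step finishes before the next begins,
# and it ASSUMES the CLI exits non-zero when an operation fails.
run_quality_pipeline() {
    datastore_id="${1:-}"
    if [ -z "$datastore_id" ]; then
        echo "usage: run_quality_pipeline DATASTORE_ID" >&2
        return 1
    fi
    for operation in catalog profile scan; do
        echo "Starting $operation on datastore $datastore_id"
        qualytics run "$operation" --datastore "$datastore_id" || {
            echo "$operation failed; stopping" >&2
            return 1
        }
    done
}
```

Calling run_quality_pipeline with a datastore ID prints one "Starting" line per step and stops at the first step that reports a failure.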

Check Operation Status

The Qualytics CLI also lets you check the status of any operation that was triggered. If you started an operation in the background and want to see how it is doing, run the command below. The CLI reports the operation’s current status, including when the operation failed or was aborted.

    qualytics operation check_status \
                        --ids "OPERATION_IDS"
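For a long-running background operation, you may want to repeat that check every so often. Here is a small sketch under a few assumptions: watch_operations, its arguments, and the 30-second default are all made up for this example. The format of the status output is not covered in this post, so the loop simply reprints whatever the CLI reports rather than trying to parse it.

```shell
# Sketch: re-check operation status a fixed number of times.
# watch_operations, its arguments, and the defaults are hypothetical.
# The status output format is not described here, so the loop only
# reprints what the CLI reports instead of parsing it.
watch_operations() {
    operation_ids="${1:-}"
    attempts="${2:-5}"
    interval_seconds="${3:-30}"
    if [ -z "$operation_ids" ]; then
        echo "usage: watch_operations OPERATION_IDS [ATTEMPTS] [INTERVAL]" >&2
        return 1
    fi
    attempt=1
    while [ "$attempt" -le "$attempts" ]; do
        echo "Status check $attempt of $attempts"
        qualytics operation check_status --ids "$operation_ids" || return 1
        # Sleep between checks, but not after the final one.
        if [ "$attempt" -lt "$attempts" ]; then
            sleep "$interval_seconds"
        fi
        attempt=$((attempt + 1))
    done
}
```

Because the loop does not know when an operation is done, it always runs the full number of attempts; stop it with Ctrl+C once the status you are waiting for appears.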

Export Checks

The export checks command allows you to export all the checks from a datastore. With this, you can migrate the quality checks to different datastores. Example:

    qualytics checks export --datastore 1 --containers 1,2 

This exports the checks from the containers with IDs ‘1’ and ‘2’ in the datastore with ID ‘1’, and saves them to $HOME/.qualytics/data_checks.json.

Import Checks

The import checks command allows you to import data quality checks from a file into a datastore. Example:

    qualytics checks import --datastore 2,3 

This imports the checks from the default file, $HOME/.qualytics/data_checks.json, into the datastores with IDs ‘2’ and ‘3’.
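Put together, export and import give you a simple way to migrate checks from one datastore to others. The function below is a sketch of that two-step flow using only the commands and the default file location shown above. The name migrate_checks and its argument order are assumptions for this example.

```shell
# Sketch: copy checks from one datastore to others via export + import.
# migrate_checks is a hypothetical helper; the file path is the default
# location mentioned in the text above.
migrate_checks() {
    source_datastore="${1:-}"
    source_containers="${2:-}"
    target_datastores="${3:-}"
    checks_file="$HOME/.qualytics/data_checks.json"
    if [ -z "$source_datastore" ] || [ -z "$source_containers" ] || [ -z "$target_datastores" ]; then
        echo "usage: migrate_checks SOURCE_ID CONTAINER_IDS TARGET_IDS" >&2
        return 1
    fi
    qualytics checks export --datastore "$source_datastore" --containers "$source_containers" || return 1
    # Stop if there is no non-empty export file to import from.
    if [ ! -s "$checks_file" ]; then
        echo "No exported checks found at $checks_file" >&2
        return 1
    fi
    qualytics checks import --datastore "$target_datastores"
}
```

For the example above, that would be migrate_checks 1 1,2 2,3. Note that the file check cannot tell a fresh export from one left over by an earlier run, so treat it as a basic safeguard only.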

Broaden Your Data Quality Pipeline

With the Qualytics CLI, we’re giving companies another route to their data quality governance goals. You no longer have to use the user interface; the Qualytics CLI gets you the same end results. Qualytics is also committed to improving and expanding the CLI’s capabilities to give users the smoothest possible experience.

To learn more about the full feature list of the Qualytics CLI, you can visit the qualytics-cli package on PyPI or our User Guide.

BY Team Qualytics / ON Mar 26, 2024
