Using Generative API for Better Data Quality in Databases and Files

AI-Powered Individual Name Matching as a Native Application on Snowflake


Easily Access Interzoid's Individual Name Matching API within SQL Statements on the Snowflake Data Cloud.



Access Interzoid as a User-Defined Function on Snowflake within SQL. Identify data inconsistency and similarity with a single SQL query.


Identify and match inconsistent and redundant data within or across Snowflake SQL tables

Examples of Inconsistent Data



Visit the Snowflake Marketplace app listing: learn more & launch
Click Here

Once the native application is added to your Snowflake account with a one-click install, the following SQL query is all you need to identify and cluster similar and inconsistently-spelled organization names in any Snowflake data table:


    SELECT indiv,interzoid_indiv_match_app.indiv_match_core.org_simkey(org) AS indiv_key
    FROM individuals
    ORDER BY indiv_key;
                


Samples of similarity keys generated via SQL Select statements:

Matching individual names with AI examples
Matching individual names and organization names with AI examples

Overview

Interzoid's Individual Name Matching Snowflake Native Application addresses the issues of inconsistency and redundancy in individual names within data tables. These issues can otherwise hinder accurate data analysis, customer communication, data-driven processes, AI model effectiveness, and other data-centric activities.

By leveraging specialized algorithms, machine learning, extensive knowledge bases, and fine-tuned AI models, the application generates a canonical key (a textual string) using Interzoid's Organization Matching API. This key helps identify and match "similar" individual name data values, whether within a single table or across multiple tables, using simple SQL statements on the Snowflake data platform. The functionality is made available as a "User-Defined Function" (UDF) and can be accessed as part of a SQL statement. Examples of similarity keys are provided below.


Features

- Generation of similarity keys helps to enable significant data quality improvement by identifying inconsistent data, discovering duplicate records, addressing matching challenges across joins, and more.

- Snowflake's Native Application platform makes it easy and fast to get started identifying data consistency and quality issues in any Snowflake table.

- Simply install the app from Snowflake Marketplace, obtain an API key from Interzoid, provide it as a "Secret" using Snowflake SQL, and you're off and running.


Snowflake Marketplace Installation

1. Log in to your Snowflake account.

2. Navigate to the Snowflake Marketplace.

3. Search for "Interzoid Org Match".

4. Once located, click "Get" and follow the Snowflake installation instructions to install the native application to your Snowflake account for use.


Initial Setup

1. Obtain your Interzoid API License Key by registering at www.interzoid.com.

2. Enter the API License Key in the Snowflake Secrets Manager using the Security icon in the upper right on the installed Application's main page. On the 'Privileges' tab, click 'Add' next to the API License Key object, enter your API key, and click 'Configure'.

3. In the same security section, click on the 'Connections' tab and click 'Review' on the api.interzoid.com connection, and then 'Connect'. This will complete the setup of the application and the external API access. Copy the (database.schema.secret) value of the 'Authentication with Interzoid API Key' field in this Connections tab of the Security panel, as your GRANT statement values for secret authorization may be different in the configuration script below.

4. Run the following configuration script in a Snowflake SQL Worksheet. This script sets the correct privileges, and then creates a test database and schema as an example to try out use of the API within the created User Defined Function (UDF). This script can be customized to provide authorization and access to your own databases, schemas, and tables so you can immediately begin performing data quality analysis and increasing match rates on your own data.


        USE APPLICATION interzoid_indiv_match_app;

        GRANT USAGE ON DATABASE INTERZOID_INDIV_MATCH_APP_APP_DATA TO APPLICATION interzoid_indiv_match_app;

        GRANT USAGE ON SCHEMA INTERZOID_INDIV_MATCH_APP_APP_DATA.CONFIGURATION TO APPLICATION interzoid_indiv_match_app;

        -- Use your 'Authentication with Interzoid API Key' (database.schema.secret) as described above in the initial setup
        GRANT USAGE ON SECRET INTERZOID_INDIV_MATCH_APP_APP_DATA.CONFIGURATION.INTERZOID_INDIV_MATCH_APP_INTERZOID_AUTH_KEY TO APPLICATION interzoid_indiv_match_app;        

Running the Interzoid Individual Match App

- The following initializes the native application, which then enables the execution of the sample SQL statements using the Interzoid similarity key functionality. This demonstrates how the function can be called a single value at a time.


        CALL interzoid_indiv_match_app.indiv_match_core.init_app(PARSE_JSON('{
                "secret_name": "interzoid_auth_key",
                "external_access_integration_name": "interzoid_external_access_integration",
            }'));

        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Jim Kelly');
        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('James Kelley');
        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Mr. Jim H. Kellie');

        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Sally Anderson');
        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Ms. Sallie P Andersen');

        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Juanita figuroea');
        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('JUNITA FIGUEROA');

        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Chopra, Prateek');
        SELECT interzoid_indiv_match_app.indiv_match_core.indiv_simkey('Pratik Shoprah');
                

Similarity Key Examples

These are examples of the algorithmically and AI-generated similarity key strings used to identify inconsistent individual name data. These keys can be used to identify, match, and cluster similar individual names, either within one table or across multiple tables. Note that similar individual names generate the same similarity key, which can then be used to match and cluster similar data.


        Jim Kelly -> wSj_mICOU22GjuhAyBrtk4bV6R-sK-83drNPEJ1z6tw
        James Kelley -> wSj_mICOU22GjuhAyBrtk4bV6R-sK-83drNPEJ1z6tw
        Mr. Jim H. Kellie -> wSj_mICOU22GjuhAyBrtk4bV6R-sK-83drNPEJ1z6tw

        Sally Anderson -> fOjMwYe9hOgKTa5LCDruK-NgkyDzasOV9Fqf-luTWXw
        Ms. Sallie P Andersen -> fOjMwYe9hOgKTa5LCDruK-NgkyDzasOV9Fqf-luTWXw

        Juanita figuroea -> uCrmoMc8VijLYjuzh4tqL_KcVyXZ_TGfNd61CIsl0mo
        JUNITA FIGUEROA -> uCrmoMc8VijLYjuzh4tqL_KcVyXZ_TGfNd61CIsl0mo

        Chopra, Prateek -> Dp00zJ1_WWddan3nLBSL6aqgwWBtbqaPFF29yR3rzxA
        Pratik Shoprah -> Dp00zJ1_WWddan3nLBSL6aqgwWBtbqaPFF29yR3rzxA
        

Entire Snowflake SQL Table Example

Another way to try is to create a simple table inserting these individual names into a table to demonstrate usage of the similarity key generation function within a SQL statement on the Snowflake platform:


        create database test_db;
        create schema test_db.test_schema;

        CREATE TABLE test_db.test_schema.individuals (
            indiv TEXT);

         INSERT INTO test_db.test_schema.individuals (indiv) VALUES
            ('Jim Kelly'),
            ('Ms. Sallie P Andersen'),
            ('Pratik Shoprah'),
            ('JUNITA FIGUEROA'),
            ('Mr. Jim H. Kellie'),
            ('James Kelley'),
            ('Juanita figuroea'),
            ('Chopra, Prateek'),
            ('Sally Anderson');
                

After the table is created, the following query generates a similarity key for each organization name using the Interzoid API behind-the-scenes. Sorting by the generated similarity key will then cluster similar organization names next to each other.


        -- Use your 'Authentication with Interzoid API Key' (database.schema.secret) as described above in the initial setup
        GRANT USAGE ON SECRET INTERZOID_INDIV_MATCH_APP_APP_DATA.CONFIGURATION.INTERZOID_INDIV_MATCH_APP_INTERZOID_AUTH_KEY TO APPLICATION interzoid_indiv_match_app;

        GRANT USAGE ON DATABASE test_db TO APPLICATION interzoid_indiv_match_app;
        GRANT USAGE ON SCHEMA test_db.test_schema TO APPLICATION interzoid_indiv_match_app;

        USE APPLICATION interzoid_indiv_match_app;

        SELECT indiv,interzoid_indiv_match_app.indiv_match_core.indiv_simkey(indiv) AS indiv_key
        FROM individuals
        ORDER BY indiv_key;
                

Snowflake query results: Sorted Snowflake Match Results Sorting by similarity key clusters similar organization names together.

From here, the flexibility and possibilities are endless with your own data, including duplicate identification, using multiple-column matching, custom business logic, joins that overcome data inconsistency, and more.


Additional questions? Email us at support@interzoid.com


All content (c) 2019-2024 Interzoid Incorporated. Questions or assistance? Contact support@interzoid.com

201 Spear Street, Suite 1100, San Francisco, CA 94105-6164

Interested in Data Cleansing and Enhancement Services?
Let us put our Generative AI-enhanced data tools and processes to work for you.

Start Here
Terms of Service
Privacy Policy

Use the Interzoid Cloud Connect Data Platform and Start to Supercharge your Cloud Data now.
Connect to your data and start running data match reports in minutes: connect.interzoid.com
API Integration Code Examples and SDKs: github.com/interzoid
Documentation and Overview: Docs site
Interzoid Product and Technology Newsletter: Subscribe
Partnership Interest? Inquire