Learn about Feature Store, which is an emerging concept in AI and advanced analytic, and why Data Virtualization is one of the best options for Feature Store as a Service (FSaaS).
Feature Store is a hot topic in today’s AI and Advanced Analytics and a lot of vendors are actively looking into it and are working on solutions and products to fulfill the requirements. Before explaining data virtualization as a good fit for FSaaS, let’s explain a feature first.
Even though “feature” is a more common word in AI and advanced analytics, but a feature is essentially a form of data that is built based on raw or existing feature(s). A new form of data used to be generated by integration, ETL, RPA tools, etc, however, the common understanding is that features are those forms of data that are generated by feature engineering processes and are meant to be used by AI services. When we look at some real feature implementations, we can see that not all features are too complex or specific to AI, and most of the time, the same features can be used for integration, analytic, or reporting also.
A feature, regardless of its feature engineering complexity, can be an enterprise data asset and may have other consumers than only AI. Therefore, if it turns out to be an enterprise asset, it still needs a centralized repository, data catalog, and governance and must be sharable to authorized people/systems.
Now let’s look at some examples of feature generation:
More about the feature
The reason for generating a feature is important, as that explains the position of a feature in the feature spectrum (next item). A feature is generated for the below purposes:
Feature engineering tool
What we learn from all the above contexts are
- Data Virtualization generates virtual data objects (VDOs) based on any data source from anywhere and it decouples itself from storage technologies, whether source or target!
- Data Virtualization is essentially a No-ETL concept but can be used/accessed by all T-SQL, ETLs, programming languages/scripts/tools, visualization tools, and AI platforms, either through virtualization platform, API, or ODBC, which makes it decoupled from tooling.
- Data Virtualization acts as a real data hub and all VDOs are accessible by authorized systems/people.
- All VDOs are governed and secured through a centralized process.
- As VDOs are logical, data catalog and metadata get automatically generated by creating/updating a VDO.
Disclaimer: Bear in mind that, I tried to have a fair comparison based on my own experiences, my definition of feature store and common market tools & platforms which I have seen and been working with, but if I missed something or new works are going around, I would be happy to get your feedback through my LinkedIn, please.