Modern data warehousing solutions are often highly distributed and cloud-based. Users, usually authenticated via security mechanisms available in solutions like Active Directory, should have secure access to the data warehouse or a Data Lake, hosted anywhere and on any platform (Windows, Linux). Technologies like Apache Spark or Microsoft Azure HDInsight do not provide an easy way to configure the use of Windows Integrated Authentication (WIA). Some other data sources do not support WIA at all.
Lyftron greatly simplifies the process of using an Apache Spark cluster as a secure data warehouse. It comes with a bundled stand-alone Apache Spark 2 instance that may be easily migrated to an external cluster. We will show how to use Lyftron as a secure authentication proxy for an Apache Spark data warehouse. We assume that we have an existing Northwind database on a standalone or cloud-based Spark cluster (for example Azure HDInsight) and the Lyftron Spark driver is already installed there. Installation of Lyftron driver is very straight-forward operation – you need to run a shell script that will automatically execute on head nodes and install Lyftron Spark application. Please note that Lyftron fully supports loading data into Apache Spark and we will write about it in a separate blog post. Let’s create a virtual database and import the metadata from the Apache Spark cluster. In Lyftron, go to Databases and click Add Database:
Fill in the details and select “Add new connection” to at the same time create a virtual database and an underlying connection:
On the next screen, select the (default) public role and finish by clicking Create:
The new user should appear on the list. Now we need to assign the user to our virtual database. Go back to Databases and drill through Databases -> Spark-Northwind -> Access Rights -> Select User. Pick the user from the list:
Let’s give the user CRUD permissions together with View definition and click Save:
Let’s run a sample query: