Modern data warehousing solutions are often highly distributed and cloud-based. Users, usually authenticated via security mechanisms available in solutions like Active Directory, should have secure access to the data warehouse or a Data Lake, hosted anywhere and on any platform (Windows, Linux). Technologies like Apache Spark or Microsoft Azure HDInsight do not provide an easy way to configure the use of Windows Integrated Authentication (WIA). Some other data sources do not support WIA at all.
Lyftron greatly simplifies the process of using an Apache Spark cluster as a secure data warehouse. It comes with a bundled stand-alone Apache Spark 2 instance that may be easily migrated to an external cluster. We will show how to use Lyftron as a secure authentication proxy for an Apache Spark data warehouse. We assume that we have an existing Northwind database on a standalone or cloud-based Spark cluster (for example Azure HDInsight) and the Lyftron Spark driver is already installed there. Installation of Lyftron driver is very straight-forward operation – you need to run a shell script that will automatically execute on head nodes and install Lyftron Spark application. Please note that Lyftron fully supports loading data into Apache Spark and we will write about it in a separate blog post. Let’s create a virtual database and import the metadata from the Apache Spark cluster. In Lyftron, go to Databases and click Add Database:
Fill in the details and select “Add new connection” to at the same time create a virtual database and an underlying connection:
On the next screen, fill in all necessary details and click Test Connection to ensure everything works as expected:
Now Lyftron will download all metadata from the source database – select all tables and finish the wizard by clicking Create:
The newly created virtual database appears on the list:
Now, let’s grant an existing Windows user access to the virtual database and configure role membership. In Lyftron, go to Administer -> User Management and click Add User. In the Login field enter the domainlogin-name of the selected user, check Integrated authentication, and select a Regular User type:
On the next screen, select the (default) public role and finish by clicking Create:
The new user should appear on the list. Now we need to assign the user to our virtual database. Go back to Databases and drill through Databases -> Spark-Northwind -> Access Rights -> Select User. Pick the user from the list:
Let’s give the user CRUD permissions together with View definition and click Save:
Now, assuming the user is logged into Windows, let’s use SQL Server Management Studio to connect to Lyftron via TDS using Integrated Authentication:
After successful connection, expand Databases and you should see your database; drilling down will allow you to view the structure:
Let’s run a sample query:
Now, let’s try to drop a table:
As expected, we get an Access Denied message. And voila! We have just secured an internal or external Apache Spark cluster using Windows Integrated Authentication provided by Lyftron.