When spinning up an Azure HDInsight cluster, there are a few options during setup, including adding marketplace apps from 3rd-party vendors.
Here is the blurb from the install screen:
External Apps
- Datameer: Datameer offers analysts an interactive way to discover, analyze, and visualize the results on Big Data. Pull in additional data sources easily to discover new relationships and get the answers you need quickly.
- StreamSets Data Collector for HDInsight provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that mesh stream and batch data, and include a variety of in-stream transformations, all without having to write custom code.
- Cask CDAP 3.5 for HDInsight provides the first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%. This application only supports Standard HBase 3.4 clusters.
Here is a bit more info on each of these.
Datameer is a data ingest, transformation, and visualization tool. It connects to a wide range of sources, including cloud-based, on-prem, relational, NoSQL, and even COBOL.
There's a built-in plugin to export to a Tableau Server.
StreamSets Data Collector is for building data pipelines and providing operational dashboards.
Cask CDAP
Each application has specific requirements for the version of HDInsight / HDP that will be used, and each needs additional configuration within the application itself.
Custom Apps
In addition to the apps in the Azure Marketplace, you can install custom applications. One way is by using script actions, which run shell scripts on the cluster nodes.
Here are some resources for installing custom applications with Azure HDInsight:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-custom-applications
HDInsight provides several scripts to install the following components on HDInsight clusters:
- Install Spark
- Install R
- Install Solr
- Install Giraph
- Pre-load Hive libraries
Bootstrapping Config Files
If you download the configuration files from an existing cluster, either through Ambari or from the command line, you can change them and use the modified files to provision new clusters.
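As a rough sketch of the download-and-modify idea, the Python below builds the Ambari REST URLs for reading a cluster's configuration and merges local overrides into a downloaded set of properties. It makes no network calls. The cluster name, the config tag, and the property names in the usage note are placeholders, and the URL layout assumes the cluster exposes the standard Ambari API at `https://CLUSTERNAME.azurehdinsight.net/api/v1`; check this against your own cluster before relying on it.

```python
from urllib.parse import urlencode


def ambari_base_url(cluster_name):
    # Assumption: Ambari is reachable through the cluster's public HTTPS gateway.
    return f"https://{cluster_name}.azurehdinsight.net/api/v1/clusters/{cluster_name}"


def desired_configs_url(cluster_name):
    # Lists the config types (e.g. hive-site) and the tag currently in effect for each.
    return ambari_base_url(cluster_name) + "?fields=Clusters/desired_configs"


def config_url(cluster_name, config_type, tag):
    # Fetches the properties of one config type at one tag.
    query = urlencode({"type": config_type, "tag": tag})
    return f"{ambari_base_url(cluster_name)}/configurations?{query}"


def apply_overrides(downloaded, overrides):
    # Copy the downloaded properties, then layer the changed values on top.
    merged = {cfg_type: dict(props) for cfg_type, props in downloaded.items()}
    for cfg_type, props in overrides.items():
        merged.setdefault(cfg_type, {}).update(props)
    return merged
```

For example, `apply_overrides({"hive-site": {"some.key": "old"}}, {"hive-site": {"some.key": "new"}})` yields a dictionary keyed by config type that could then be fed into whatever provisioning method you use for the new cluster.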
Connectivity
Connecting to the cluster using SSH
Working with HDInsight, Hue, and many of the other browser-based tools sometimes requires setting up SSH tunneling if you don't have direct access to the HDInsight network.
You can avoid this by setting up some edge nodes in the cluster.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-use-edge-node
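For the cases where SSH tunneling is still needed, here is a minimal sketch that assembles an OpenSSH command for a dynamic (SOCKS) tunnel to the cluster's SSH endpoint. The user name `sshuser`, the local port `9876`, and the cluster name are placeholders, and the host pattern `CLUSTERNAME-ssh.azurehdinsight.net` is assumed to match your cluster. The function only builds the command; it does not run it.

```python
import shlex


def ssh_tunnel_command(cluster_name, ssh_user="sshuser", local_port=9876):
    # Assumption: the cluster's SSH endpoint follows the CLUSTERNAME-ssh pattern.
    host = f"{cluster_name}-ssh.azurehdinsight.net"
    return [
        "ssh",
        "-C",                    # compress traffic
        "-N",                    # no remote command, tunnel only
        "-D", str(local_port),   # dynamic port forwarding (local SOCKS proxy)
        f"{ssh_user}@{host}",
    ]


def as_shell_string(argv):
    # Quote each argument so the command can be pasted into a shell safely.
    return " ".join(shlex.quote(arg) for arg in argv)
```

Once the tunnel is up, pointing the browser's SOCKS proxy at `localhost` and the chosen local port should route requests for the browser-based tools through the cluster.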
Extending the network is also useful. When provisioning an HDInsight cluster, you can assign it to a VNet.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-extend-hadoop-virtual-network