Friday, December 2, 2016

Azure HDInsight Apps & Domain-Joined Premium HDInsight Clusters

Until recently, HDInsight had a security model separate from Azure AD. Microsoft has introduced Domain-Joined HDInsight clusters, which bring authentication with Azure AD and authorization with Apache Ranger to HDInsight. Currently this feature is part of Premium HDInsight and supports Hive, though you should be able to install and custom-configure other applications.

When spinning up an Azure HDInsight cluster, there are a few options during setup, including adding marketplace apps from 3rd-party vendors.

The blurb from the install screen.

External Apps

  • Datameer offers analysts an interactive way to discover, analyze, and visualize the results on Big Data. Pull in additional data sources easily to discover new relationships and get the answers you need quickly.
  • Streamsets Data Collector for HDInsight provides a full-featured integrated development environment (IDE) that lets you design, test, deploy, and manage any-to-any ingest pipelines that mesh stream and batch data, and include a variety of in-stream transformations, all without having to write custom code.
  • Cask CDAP 3.5 for HDInsight provides the first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%. This application only supports Standard HBase 3.4 clusters.
More info on each of these.

Datameer is a data ingest, transformation, and visualization tool. It connects to many sources, including cloud-based, on-prem, relational, NoSQL, and even COBOL.
There's a built-in plugin to export to a Tableau Server.

Streamsets Data Collector is a tool for building data pipelines and providing operational dashboards.

Cask CDAP

Each of the applications has specific requirements for the version of HDInsight / HDP that will be used, and needs additional configuration within the application itself.

Custom Apps

In addition to the apps in the Azure Marketplace, you can install custom applications. One way is by using script actions.

Some resources for installing custom applications with Azure HDInsight.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-custom-applications

HDInsight provides several scripts to install additional components on HDInsight clusters.

Running the script action for Hue and some other components restarts the HDInsight cluster services, making the cluster unavailable for jobs while they restart.

Bootstrapping Config Files

If you download configs through Ambari or from the command line, you can provision new clusters with changed configuration files.

Connectivity

Connecting to the cluster using ssh

When working with HDInsight, Hue and many of the other browser-based tools sometimes require setting up SSH tunneling if you don't have direct access to the HDInsight network.
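As a rough sketch of what that tunnel looks like, the Python snippet below only builds the ssh command line for a SOCKS proxy (dynamic port forwarding with -D). The cluster name "mycluster", the user "sshuser", and the local port 9876 are placeholder assumptions, not values from a real cluster. The CLUSTERNAME-ssh.azurehdinsight.net host pattern is the usual HDInsight SSH endpoint.

```python
import shlex


def build_tunnel_command(cluster_name, ssh_user, local_port=9876):
    """Return the argv list for an SSH SOCKS tunnel to an HDInsight cluster.

    cluster_name, ssh_user and local_port are caller-supplied; the defaults
    used in the example below are hypothetical placeholders.
    """
    host = f"{cluster_name}-ssh.azurehdinsight.net"
    # -C: compress traffic, -N: no remote command, -D: dynamic (SOCKS) forward
    return ["ssh", "-C", "-N", "-D", str(local_port), f"{ssh_user}@{host}"]


command = build_tunnel_command("mycluster", "sshuser")
print(shlex.join(command))
```

With the tunnel running, you point the browser's SOCKS proxy at localhost on the chosen port to reach the browser-based tools through the tunnel.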

You can avoid this by setting up some edge nodes in the cluster.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-use-edge-node

Extending the network is also useful.  When provisioning an HDInsight cluster, you can assign it to a VNet.
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-extend-hadoop-virtual-network

RStudio & RevoR

Rikard Sandström has some helpful links - installing a Premium HDInsight Spark cluster with RStudio + Tunnel and Creating HDInsight Clusters with Templates.

 








Tuesday, May 31, 2016

ChatOps and Azure Resources

Working with Azure this week, I set up a few Azure Logic Apps to send messages from Twitter to Slack using Zapier. A very interesting use case for using up your Slack message limits. :) Similar functionality exists (or existed) with Yahoo Pipes (RIP), IFTTT and other web 2.0 integration tools. This one is great since it can live inside a protected Azure network, is relatively simple to set up with a visual designer, and lets you upsize to BizTalk functionality for BPM activities.

Slack seems to be the method of choice for many firms coming back to the '90s realm of IRC, with persisted chat windows, ChatOps continuous integration and automated deployments. Companies like Shopify use it for their deployment activities and for their customers' feature stories.

The ability to do ChatOps activities, wire up deployment commands and notification messages in Slack is really cool.
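For a notification message, one simple route (not the Logic Apps designer used above) is a Slack incoming webhook, which accepts a JSON body with a "text" field. The sketch below only builds the request and does not send it. The webhook URL and the message text are made-up placeholders.

```python
import json
import urllib.request

# Hypothetical placeholder; a real incoming-webhook URL is issued by Slack.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def build_slack_notification(text, webhook_url=WEBHOOK_URL):
    """Build (but do not send) a POST request carrying a Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_notification("Deploy of web-app to staging finished")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```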

Not to be late to the party, Microsoft introduced /skype integration with Slack in the new year.
https://www.skype.com/en/features/slack/

The motto of Microsoft in 2016 seems to be: if you can't beat 'em... hang out with them.

Coming back to Azure with a _few_ resources: there is an enormous number of Azure shortcuts here.
https://blogs.technet.microsoft.com/tangent_thoughts/2016/02/04/bookmark-this-aka-msazureshortcuts/

The Azure ML Algorithm Cheat Sheet
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/

The Azure Web Sites (Web Apps?) Cheat Sheet
http://microsoftazurewebsitescheatsheet.info/

A Complete List of Microsoft Azure Tools
http://scottge.net/2015/08/03/a-complete-list-of-microsoft-azure-tools/

With Microsoft Canada adding its new data centres and ExpressRoute functionality, it's an exciting time to be a cloud developer north of the 49th parallel.


Wednesday, June 13, 2012

Meet the New Windows Azure - ScottGu's Blog

Scott Guthrie announces full VM support in Windows Azure. Host your VMs (Linux, Windows, DOS) in the cloud.

Now if they had a VM to VHD converter they might get even more traction.

Windows Azure now allows you to deploy up to 10 web-sites into a free, shared/multi-tenant hosting environment (where a site you deploy will be one of multiple sites running on a shared set of server resources). This provides an easy way to get started on projects at no cost.

Microsoft has gone and given away free hosting.

Lots more features here.

Meet the New Windows Azure - ScottGu's Blog

Wednesday, May 2, 2012

Crowdsourcing and Cloud Labor

Technology isn’t the only thing contained in the cloud.  There are people floating around in there too. 

One of the challenges with natural language processing, neural networks and text analysis is the ability to categorize an opinion in context. Using a lexicon of various terms, and the syntax of the terms in a statement, you can do a general categorization. However, who’s to say that “you’re so dumb” and “you’re SO dumb” mean the same thing? The one with capitals could be a sarcastic comment, depending on the context.
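To make the problem concrete, here is a toy lexicon scorer in Python. The four-word lexicon and the helper names are invented for illustration. It shows that a plain lexicon lookup gives both statements the same score, and that the capitals have to be picked up as a separate signal.

```python
import re

# Tiny made-up lexicon: word -> polarity. A real lexicon would be far larger.
LEXICON = {"dumb": -1, "bad": -1, "great": 1, "smart": 1}


def lexicon_score(text):
    """Sum the polarity of every known word, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def has_emphasis(text):
    """Flag words written entirely in capitals (two letters or more)."""
    words = re.findall(r"[A-Za-z]+", text)
    return any(len(word) > 1 and word.isupper() for word in words)


for statement in ("you're so dumb", "you're SO dumb"):
    print(statement, lexicon_score(statement), has_emphasis(statement))
```

Both statements score the same on the lexicon alone; only the emphasis flag tells them apart, and even then it cannot say whether the capitals mean sarcasm or just volume.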

We can do some general analysis of text terms; however, manually tagging them may prove to be more accurate. How do you set up a manual tagging job using real labor?

CrowdFlower’s system leverages large groups of workers to solve massive but technically simple tasks. One of these services is sentiment analysis, the study of public opinion on a given subject. Other sentiment analysis programs use computational linguistics to perform this task, but computers cannot always discern nuances of tone, such as sarcasm or irony. CrowdFlower’s version is unique in that actual humans perform the sentiment analysis.
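Once several workers have tagged the same statement, their judgments still need to be combined. A simple majority vote is one common way to do that. The sketch below is a generic illustration with invented labels, not CrowdFlower's actual aggregation method.

```python
from collections import Counter


def aggregate_labels(judgments):
    """Return the most common label and the share of workers who chose it.

    On a tie, the label that appears first in the list wins.
    """
    counts = Counter(judgments)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgments)


# Three hypothetical workers tag the same statement.
label, agreement = aggregate_labels(["sarcastic", "negative", "sarcastic"])
print(label, round(agreement, 2))
```

The agreement share is a cheap way to spot statements that need more judgments before the label can be trusted.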

What Motivates Crowdsource Workers on CrowdFlower & Kaggle – Highlights of SXSW 2012′s “Pay or Play” Crowdsourcing Talk | The CrowdFlower Blog

Sounds interesting!

Monday, April 9, 2012

SSIS Junkie : AdventureWorks2012 now available for all on SQL Azure

Why not host your training platforms in the cloud? It only makes sense, especially for databases, as not every developer needs the latest AdventureWorks database on their laptop, and it’s a big waste to add it to a local server.

AdventureWorks2012 now available for all on SQL Azure

SSIS Junkie : AdventureWorks2012 now available for all on SQL Azure

Awaiting the launch of Tailspin Toys to the cloud….

Monday, April 2, 2012

The R programming language for programmers coming from other programming languages

Learning R? Here is an introduction and a comparison with other common programming languages.

R is more than a programming language. It is an interactive environment for doing statistics. I find it more helpful to think of R as having a programming language than being a programming language. The R language is the scripting language for the R environment, just as VBA is the scripting language for Microsoft Excel. Some of the more unusual features of the R language begin to make sense when viewed from this perspective.

The R programming language for programmers coming from other programming languages