Blog

In this post, I’ll go over how you can use Oracle WebLogic as the Java-based web server that hosts a Logi Analytics application. The process is similar to hosting a Logi Analytics application on another Java server, such as Apache Tomcat or JBoss, but it has a couple of specific quirks that are worth going over in detail.

Installing Java and WebLogic

The first step in hosting your Logi Analytics application is to install the Java Development Kit (JDK) and the web server, WebLogic. To download the latest version of the JDK, go to http://www.oracle.com/technetwork/java/javase/downloads/index.html and select the correct version of the JDK installer for your operating system. Once the installer is downloaded, run it, and make a note of where the Java system files are located on your hard drive, as you’ll need to reference them later.

Once you have the JDK, you can download the latest version of WebLogic from http://www.oracle.com/technetwork/middleware/weblogic/overview/index.html. Again, go to the downloads page and select the correct version for your operating system. Then, in a terminal, go to the folder that contains your WebLogic installer jar and run the command: “java -jar <weblogic_jar_file_name>”. If the java command has not been added to your terminal path, use the full file path to your java executable instead, such as: “C:\Program Files\Java\jdk1.8.0_131\bin\java -jar …” (wrap the path in double quotes if it contains spaces, as this one does). This command opens the WebLogic installer. Follow the prompts to install WebLogic Server.

Creating a WebLogic Domain

With WebLogic installed, it’s time to set up your server domain. Open a terminal, navigate to your Oracle home location (e.g., “C:\Oracle\Middleware\Oracle_Home”), and then continue to “<ORACLE_HOME>\oracle_common\common\bin”. Run “config.cmd”. This opens the WebLogic configuration wizard, which lets you set up a new domain where your server will be hosted. Follow the prompts in the wizard to create a new domain.

Deploying Your Logi Application

The configuration wizard should create a folder at “<ORACLE_HOME>\user_projects\domains\<your domain name>”. Inside that domain folder, copy your Logi application folder into the “autodeploy” folder. Once the files are copied, add the extension “.war” to the end of the copied folder’s name. Lastly, you need to download an additional XML file provided by Logi and save it into the WEB-INF folder of your Logi application. Which file you need depends on your WebLogic version. If you’re using Oracle WebLogic 12c, download this file at http://devnet.logianalytics.com/downloads/info/java/weblogic12c/weblogic.xml. If you are using BEA WebLogic 10, only versions 10 through 10.3 are supported. For versions 10 through 10.2, download the weblogic.xml file here: http://devnet.logianalytics.com/downloads/info/java/weblogic10/weblogic10.xml. If you are using BEA WebLogic 10.3, download the XML file here: http://devnet.logianalytics.com/downloads/info/java/weblogic10/weblogic10.3.xml.
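
If you’d rather script the copy-and-rename step than do it by hand, here is a minimal Python sketch of the idea. The function name, the example folder names, and the choice to save the descriptor under the name “weblogic.xml” are my assumptions for illustration (weblogic.xml is WebLogic’s standard descriptor name); adjust the paths to match your own domain.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_to_autodeploy(app_dir: Path, domain_dir: Path, weblogic_xml: Path) -> Path:
    """Copy a Logi application folder into <domain>/autodeploy as '<name>.war'
    and put the downloaded descriptor into its WEB-INF folder."""
    # The deployed folder keeps its name but gains the ".war" extension.
    target = domain_dir / "autodeploy" / (app_dir.name + ".war")
    shutil.copytree(app_dir, target)  # creates missing parent folders as needed

    web_inf = target / "WEB-INF"
    web_inf.mkdir(exist_ok=True)
    # Assumption: the descriptor is stored under WebLogic's standard file name.
    shutil.copy(weblogic_xml, web_inf / "weblogic.xml")
    return target


if __name__ == "__main__":
    # Self-contained demo using throwaway folders instead of a real domain.
    root = Path(tempfile.mkdtemp())
    app = root / "MyLogiApp"  # hypothetical application folder
    (app / "WEB-INF").mkdir(parents=True)
    descriptor = root / "downloaded.xml"
    descriptor.write_text("<weblogic-web-app/>")
    domain = root / "my_domain"
    domain.mkdir()
    print(deploy_to_autodeploy(app, domain, descriptor))
```

Against a real installation, you would pass your actual application folder, the domain folder created by the wizard, and the XML file you downloaded for your WebLogic version.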

Run Your Server

Finally, in your terminal window, navigate to your domain’s main folder (“<ORACLE_HOME>\user_projects\domains\<your domain name>”) and run “startWebLogic.cmd”. This will start your hosting service and automatically deploy your Logi application. When the startup log says the server is running, confirm that your application is deployed by visiting its URL, or by going to the WebLogic console, which will be located at http://host:<WebLogic_port>/console.
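
If you want to check from a script rather than a browser, a rough Python sketch like the one below can poll the console URL until the server answers. The helper names, the retry settings, and the port 7001 in the example are assumptions for illustration; use whatever port you chose in the configuration wizard.

```python
import time
import urllib.error
import urllib.request


def console_url(host: str, port: int) -> str:
    """Build the WebLogic console address in the form used in this post."""
    return f"http://{host}:{port}/console"


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Return True once the URL answers with a non-5xx response."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # The server answered, just not with 200 (e.g. a login redirect).
            if err.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # Server not reachable yet; try again after a pause.
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 7001 is an assumed port for this example only.
    print(console_url("localhost", 7001))
```

Calling wait_until_up(console_url("localhost", 7001)) after running the start script would then block until the console responds or the attempts run out.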

AWS Database Migration Pitfall #4 of 4: Making Perfect the Enemy of Good

Welcome to the final installment of our blog series on AWS database migration pitfalls. In addition to reading the preceding three blogs, you can download our eBook for further details on all four.

Good Enough is Good Enough

“Good enough is good enough” might not be a statement that inspires admiration, but it’s true nonetheless. It’s important to balance the drive for excellence with the need for momentum.

In blog #1, we discussed the pitfall of failing to set clear goals, and blog #2 covered failing to evaluate all AWS database options. It may seem like pitfall #4, making perfect the enemy of good, contradicts that advice, but it’s important to bear in mind that perfect is always elusive.

Prioritizing your goals and educating yourself on choices are important preliminary steps, but there comes a point where a project becomes thwarted by “analysis paralysis.” The benefits of further planning won’t outweigh the delay in launch.

Making Perfect the Enemy of Good vs. “Lift and Shift”

Making perfect the enemy of good is the opposite of “lift and shift,” in which organizations migrate without making any changes to, or redesigning, the database and data processing engine.

But you must evaluate the tradeoff of each addition. Your biggest gains may come from two or three substantial changes. Piling on additional services may not only delay your launch, but also unnecessarily inflate your costs.

Incremental Improvements are The Norm in AWS Database Migrations

The TCO benefits of AWS increase over time. According to IDC, “There is a definite correlation between the length of time customers have been using Amazon cloud services infrastructure and their returns. At 36 months, the organizations are realizing $3.50 in benefits for every $1.00 invested in AWS; at 60 months, they are realizing $8.40 for every $1.00 invested. This relationship between length of time using Amazon cloud infrastructure services and the customers’ accelerating returns is due to customers leveraging the more optimized environment to generate more applications along a learning curve.”

Iterative progress is the norm. There’s simply no need to wait for perfection prior to migrating.

Learning and Growing with AWS

The rate at which AWS innovates continues to accelerate. During his 2016 re:Invent keynote, CEO Andy Jassy stated that AWS added around 1,000 significant new services and features in 2016, up from 722 the previous year. Jassy stated, “Think about that. As a builder on the AWS platform, every day you wake up, on average, with three new capabilities you can choose to use or not.”

The nature of AWS necessitates growth and change. Even with the best initial migration, you’ll need to evaluate new product offerings over time.

Perspective

An AWS database migration isn’t a rocket launch. Even with zero rearchitecting, you’ll still realize numerous benefits, such as translating CapEx to OpEx. While failure to consider rearchitecting can result in leaving money on the table, it’s not going to result in a catastrophic failure for your business.

A skilled AWS partner like dbSeer can help you launch successfully in a reasonable time frame. Contact us to discuss your unique priorities.

Welcome to number three in our series of four blogs on mistakes commonly made in AWS database migrations. You can read the first two blogs, “Failing to Set Clear Goals,” and, “Failing to Evaluate All AWS Database Options,” and download our eBook for further details on all four.

Optimize your AWS Database Migration to Reduce Costs

One well-known benefit of cloud computing is the ability to translate capital expenses to operating expenses. Moving to AWS in a simple “lift and shift” manner will achieve this objective. However, optimizing your AWS architecture can further reduce costs by as much as 80%.

Determine Total Cost of Ownership

Weighing the financial considerations of owning and operating a data center facility versus employing Amazon Web Services requires detailed and careful analysis.

AWS has a calculator you can use to estimate your Total Cost of Ownership in moving to the cloud. The TCO website states, “Our TCO calculators allow you to estimate the cost savings when using AWS and provide a detailed set of reports that can be used in executive presentations.” The TCO Calculator is an excellent tool to start off with.

Leaving money on the table is a common AWS database migration pitfall, and the AWS TCO Calculator is a good place to start investigating sources of savings. It can provide an instant summary report that shows you the three-year TCO comparison by cost category.

Include Both Direct and Indirect Costs of AWS Database Migration

Network World published four steps to calculating the true cost of migrating to the cloud. As part of step 1, “Audit your current IT infrastructure costs,” author Mike Chan writes, “You should take a holistic approach and consider the total cost of using and maintaining your on-premise IT investment over time, and not just what you pay for infrastructure. This calculation will include direct and indirect costs.”

The same considerations apply when you calculate the costs of your own approach to migration.

Pay Only for What You Use, Use Only What You Need

AWS customers pay only for computing resources consumed. But to fully leverage this benefit, you must be sure you’re not consuming more than necessary.

For example, enterprises often size servers to support peak use to ensure high availability. But rather than remaining sized for max processing, you can code your apps to support the elasticity offered by AWS. Such adjustments could include rearchitecting a processing engine to process on a smaller machine, or shutting it down to create a bigger machine, or processing across ten machines if the architecture supports partitioning.

Taking full advantage of this requires the expertise of a knowledgeable AWS partner.

AWS Database Options

While AWS easily hosts a variety of licensed databases, just shifting your databases to an Amazon EC2 instance will require you to continue paying those licensing fees. But by selecting AWS database options, such as Redshift and Aurora, you can eliminate many of those existing license fees. This often requires you to also move your existing logic, structure, and code, which can be both time-consuming and expensive to convert to another platform.

However, the savings can be very high, enabling a quick return on investment.

Ensure You’re Not Leaving Money on The Table

Rearchitecting requires both time and a financial investment. Many organizations lack the expertise in-house to conduct the re-architecture properly. It’s therefore difficult to determine the cost of making changes, and the potential savings that could result.

By analyzing your existing bills and structure, a skilled AWS partner like dbSeer can estimate potential cost savings by taking these types of steps. Contact us and we’ll help figure out how you can save every penny possible.

AWS Database Migration Pitfalls: You must evaluate all the options to see where each piece fits.

We’re addressing four mistakes commonly made in AWS database migrations. You can also read the first blog, “AWS Database Migration Pitfall #1 of 4: Failing to Set Clear Goals,” and download our eBook for further details on all four.

You Don’t Know What You Don’t Know

It’s essential to conduct thorough research prior to making any major purchase. Getting educated about your AWS database options is a necessary prerequisite to making the selections that will best accomplish your migration goals.

Moving your existing infrastructure in a “lift and shift” style may well be ideal. But you can’t know that it is the best path without first considering the alternatives.

Insight from AWS Chief Evangelist

On the AWS blog, AWS Chief Evangelist Jeff Barr wrote, “I have learned that the developers of the most successful applications and services use a rigorous performance testing and optimization process to choose the right instance type(s) for their application.” To help others do this, he proceeded to review some important EC2 concepts and look at each of the instance types that make up the EC2 instance family.

Barr continues, “The availability of multiple instance types, combined with features like EBS-optimization, and cluster networking allow applications to be optimized for increased performance, improved application resilience, and lower costs.”

And it is that availability of numerous options that results in opportunities for optimization. But evaluating the options first is necessary to know which configuration is optimal.

AWS Database Migration Choices: Open Source, Relational, Decoupled?

Just as the EC2 instance family is comprised of numerous instance types, there are multiple database options available to you when migrating to AWS. Choosing between licensed and open source is one of the many AWS database choices you’ll have to make. Be sure to evaluate columnar data store versus relational, and consider NoSQL options.

Additionally, AWS offers a wide variety of storage solutions with a corresponding variety of price points. In rearchitecting systems for the cloud, you may want to consider:

  • Keeping your raw data in cheap data storage, such as S3
    • And using services such as Elasticsearch, Spark, or Athena to meet app requirements
  • Decoupling batch processes from storage and databases
  • Decoupling data stores for various apps from a single on-premise data store
  • Streamlining time-consuming, mundane infrastructure maintenance tasks, such as backup and high availability

Developing an AWS Database Migration Plan to Improve Infrastructure

Failing to plan is planning to fail. Our first pitfall, “Failing to Set Clear Goals,” addressed the risks of not setting priorities. Once objectives have been prioritized, you must consider how to rearchitect your legacy workloads in order to best meet your goals.

In addition to maximizing cost savings, rearchitecting can enable you to improve lifecycle maintenance and incorporate new services like real-time data analytics and machine learning.

Be sure to download our eBook for more information and contact us. We can help you figure out how to not just leverage the cloud, but leverage it properly.

You need to balance your organization’s unique AWS database migration objectives

To be fair, there are likely more than four mistakes that can be made in the process of an AWS database migration. But we’ve grouped the most common errors into four key areas. This blog is the first in a series. You can also download our eBook for further details on all four.

Consider AWS Database Migration Objectives

With constant pressure to improve, organizations sometimes move to the cloud in a frenzied rush. In mistakenly thinking the cloud itself achieves all objectives, some abandon proper upfront planning.

Organizations frequently move to the cloud for benefits such as elasticity and cost savings. While everyone moving to the cloud will be able to translate capital expenses to operating expenses, benefits beyond that can vary.

And this variation requires you to set priorities. Every organization is unique, so you must carefully examine your unique objectives.

Common goals include:

  • Reduce costs
  • Improve scalability
  • Reduce maintenance effort and expense
  • Improve availability
  • Increase agility
  • Speed time to market
  • Retire technical debt
  • Adopt new technology

Set Priorities

You may read the above bulleted list and think, “Yes, exactly! I want to achieve that.”

Unfortunately, it may be challenging to accomplish every goal simultaneously, without delaying your migration. You must prioritize your organization’s particular goals so you can make plans which will appropriately balance objectives.

Take a Realistic Approach to AWS

AWS is becoming ubiquitous, but it is still no panacea. Goals may conflict with one another. Without first establishing priorities, you can’t determine the tactics that will meet your goals.

“Do it all now,” isn’t an effective strategy for success.

AWS Database Migration Issues to Consider

You will likely need to reevaluate your architecture to fully take advantage of all the possible benefits. Issues to take into consideration include:

  • How much downtime your business could sustain
  • Your current licensing situation
  • Third party support contracts
  • Current use of existing databases. For example, consider the maximum possible number of applications which could use a database
  • Application complexity and where the code is running
  • Skills required, both internally and from an AWS Consulting Partner

Various changes are often required, such as shifting to open source services in order to eliminate unnecessary licensing expenses.

Tips for Setting AWS Migration Goals

Focus on why you are migrating. Maintaining focus on your specific objectives will impact the way you implement.

Make sure all stakeholders are aligned on prioritization. Take the time to work as a team and get as much consensus as possible from multiple people. Yes, it’s difficult to manage a project by committee, and it can lead to delays. But you should, again, strive to balance these often competing objectives.

Download our eBook for more information and contact us. We can help you determine the ideal pathway for your organization and get started.

As an increasing number of companies are moving their infrastructure to Microsoft’s Azure, it seems natural to rely on its Active Directory for user authentication. Logi application users can also reap the benefits of this enterprise level security infrastructure without having to duplicate anything. Additionally, even smaller companies who use Office365 without any other infrastructure on the cloud, excluding email of course, can take advantage of this authentication.
Integrating Logi applications with Microsoft’s Active Directory produces two main benefits: attaining world class security for your Logi applications, and simplifying matters by having a single source of authentication. The following post describes how this integration is done.
1. Register Your Application with Microsoft
First, register your application with Azure Active Directory v2.0. This will allow us to request an access token from Microsoft for the user. To do this, navigate to “https://apps.dev.microsoft.com/#/appList” and click the “Add an app” button. After entering your application name, click the “Add Platform” button on the following page and select “Web”. Under Redirect URLs, enter the URL of your website logon page (sample format: https:////.jsp). Microsoft does not support redirects to http sites, so your page must use either https or localhost. Make note of the redirect URL and application ID for the next step.
2. Create Custom Log-on Page For Logi Application:
Microsoft allows users to give permissions to an application using their OAuth2 sign-in page. This process returns an access token, which has a name, email address, and several other pieces of information embedded within which we use to identify the user. These next steps show you how to create a log-in page that redirects users to the Microsoft sign-in, retrieves the access token, and passes whichever value you want to use to identify the employee to Logi.
1) Download the rdLogonCustom.jsp file, or copy and paste its contents into a new file. Place it in the base folder of your application.
2) Configure the following settings within the rdLogonCustom.jsp file to match your Logi application:
Change the ‘action’ element in the HTML body to the address of your main Logi application page:

Change the “redirectUri” and “appId” in the buildAuthUrl() function to match the information from your application registration with Azure AD v2.0:
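
To make the role of those two values concrete, here is a hedged Python sketch of how a sign-in URL for the Azure AD v2.0 authorize endpoint can be assembled. It is not the code from rdLogonCustom.jsp; the function name, the choice of the “id_token” response type, the scopes, and the example values are my assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Common (multi-tenant) authorize endpoint for Azure AD v2.0.
AUTHORIZE_ENDPOINT = "https://login.microsoftonline.com/common/oauth2/v2.0/authorize"


def build_auth_url(app_id: str, redirect_uri: str, nonce: str) -> str:
    """Assemble a sign-in URL from the application ID and redirect URL
    noted during registration."""
    params = {
        "client_id": app_id,           # application ID from the registration
        "response_type": "id_token",   # assumption: token returned directly
        "redirect_uri": redirect_uri,  # must match the registered redirect URL
        "scope": "openid email profile",
        "response_mode": "fragment",
        "nonce": nonce,                # random value echoed back in the token
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    url = build_auth_url(
        "00000000-0000-0000-0000-000000000000",
        "https://localhost/MyLogiApp/rdLogonCustom.jsp",
        "example-nonce",
    )
    print(parse_qs(urlsplit(url).query)["redirect_uri"][0])
```

The key point is that the redirect URL sent here has to match the one registered in step 1 exactly, which is why both values need to be updated together.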

The sample log-on page redirects the user to Microsoft’s sign-in page, lets the user sign in, and then redirects back to the log-on page. Back at the log-on page, it parses the token for the email address and passes that value to the authentication element as a request parameter, using the hidden input.
If you want to use a different value from the access token to identify the user, adjust the key in the “document.getElementById(‘email’).value = payload.” in the bottom of the custom logon file to match your desired value.

3. Configure Logi App:
In your _Settings.lgx file, add a security element with the following settings:

*If your log-on page and failure page have different names, adjust accordingly.
Under the security element, add an authentication element with a data layer that uses the value found in @Request.email~ to identify the user. Optionally, you can add rights and roles elements to the security element as well.
In conclusion, using this integration for your Logi applications can not only make your process more efficient by eliminating duplicate authentication, but can also provide an added level of security because of Microsoft’s robust infrastructure.

Recently, options for connecting to distributed computing clusters with SQL-on-Hadoop functionality have sprung up all over the world of big data.

Amazon’s Elastic Map Reduce (EMR) is one such framework, enabling you to install a variety of different Hadoop based processing tools on an Amazon hosted cluster, and query them with software such as Hive, Spark-SQL, Presto, or other applications.

This post will show you how to use Spark-SQL to query data stored in Amazon Simple Storage Service (S3), and then to connect your cluster to Logi Studio so you can create powerful visualization and reporting documents.

 

Amazon EMR

Amazon Elastic Map Reduce is an AWS service that allows you to create a distributed computing analytics cluster without the overhead of setting up the machines and the cluster yourself. Using Amazon’s Elastic Compute Cloud (EC2) instances, EMR creates and configures the requisite number of machines you desire, with the software you need, and (almost) everything ready to go on startup.

Spark-SQL

Spark-SQL is an extension of Apache Spark, an open source data processing engine that eschews the Map Reduce framework of something like Hadoop or Hive for a directed acyclic graph (or DAG) execution engine. Spark-SQL allows you to harness the big data processing capabilities of Spark while using a dialect of SQL based on Hive-SQL. Spark-SQL is further connected to Hive within the EMR architecture since it is configured by default to use the Hive metastore when running queries. Spark on EMR also uses Thriftserver for creating JDBC connections, which is a Spark specific port of HiveServer2. As we will see later in the tutorial, this allows us to use Hive as an intermediary to simplify our connection to Spark-SQL.

Logi Analytics

Logi Analytics is an integrated development environment used to produce business intelligence tools, such as data visualizations and reports. Within Logi Studio, one can create powerful and descriptive webpages for displaying data while writing relatively little traditional code, as the Studio organizes (and often develops) the HTML, JavaScript, and CSS elements for the developer. Logi Analytics can connect to a variety of different data sources, including traditional relational databases, web APIs, and (as this tutorial will demonstrate) a distributed data cluster, via a JDBC connection.

Creating a Cluster

First things first: let’s get our cluster up and running.

  1. For this tutorial, you’ll need an IAM (Identity and Access Management) account with full access to the EMR, EC2, and S3 tools on AWS. Make sure that you have the necessary roles associated with your account before proceeding.
  2. Log in to the Amazon EMR console in your web browser.
  3. Click ‘Create Cluster’ and select ‘Go to Advanced Options’.
  4. On the ‘Software and Steps’ page, under ‘Software Configuration’ select the latest EMR image, Hadoop, Spark, and Hive (versions used at time of writing shown in image below).
  5. On the ‘Hardware Configuration’ page, select the size of the EC2 instances your cluster should be built with. Amazon defaults to m3.xlarge, but feel free to adjust the instance type to suit your needs.
  6. On the ‘General Options’ page, make sure you name your cluster, and select a folder in S3 where you’d like the cluster’s logs to go (if you choose to enable logging).
  7. Finally, on the ‘Security Options’ page, choose an EC2 SSH key that you have access to, and hit ‘Create Cluster’. Your cluster should begin to start up, and be ready in 5-15 minutes. When it’s ready, it will display ‘Waiting’ on the cluster list in your console.
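
The same console choices can also be expressed as a request to the EMR API. The Python sketch below only builds the request dictionary; with the boto3 library installed and AWS credentials configured, it could be passed to boto3.client("emr").run_job_flow(**request). The function name, the instance count of 3, and the use of the default EMR role names are assumptions for illustration, and the release label is left for you to supply.

```python
from typing import Any, Dict, Optional


def build_emr_request(
    name: str,
    release_label: str,
    key_name: str,
    instance_type: str = "m3.xlarge",
    instance_count: int = 3,  # assumed cluster size for this sketch
    log_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow call that mirror
    the console steps: Hadoop, Spark and Hive on a long-running cluster."""
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in ("Hadoop", "Spark", "Hive")],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,
            # Keep the cluster in the 'Waiting' state instead of terminating.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default role names; these must already exist in your account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if log_uri:
        request["LogUri"] = log_uri  # S3 folder for cluster logs (optional)
    return request


if __name__ == "__main__":
    # Placeholder values; substitute your own release label, key and bucket.
    print(build_emr_request("logi-spark", "<release-label>", "my-ssh-key",
                            log_uri="s3://bucket-name/logs/"))
```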

Connecting via SSH

In order to work on your cluster, you’re going to need to open port 22 in the master EC2’s security group so you can connect to your cluster via SSH.

  1. Select your new cluster from the cluster list and click on it to bring up the cluster detail page.
  2. Under the ‘Security and Access’ section, click on the link for the Master’s security group.
  3. Once on the EC2 Security Group page, select the master group from the list, and add a rule to the Inbound section allowing SSH access to your IP.
  4. You should also allow access to port 10001, as that will be the port through which we’ll connect to Spark-SQL later on. You can also do this via SSH tunneling in your SSH client if you’d prefer.
  5. Open an SSH client (such as PuTTY on Windows or Terminal on Mac) and connect to your cluster via the Master Public DNS listed on your cluster page, and using the SSH key you chose during the configuration. The default username on your EMR cluster is ‘hadoop’ and there is no password.
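
For reference, the two inbound rules described above (SSH on port 22 and the Spark-SQL port 10001) can be written as data in the shape the EC2 API expects. This Python sketch only builds the rule list; with boto3 installed it could be passed as IpPermissions to boto3.client("ec2").authorize_security_group_ingress along with your master security group’s ID. The function name and the example IP address are assumptions.

```python
from typing import Any, Dict, Iterable, List


def ingress_rules(cidr_ip: str, ports: Iterable[int] = (22, 10001)) -> List[Dict[str, Any]]:
    """Build one inbound TCP rule per port, restricted to a single CIDR range."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr_ip}],
        }
        for port in ports
    ]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address; use your own IP with /32.
    for rule in ingress_rules("203.0.113.10/32"):
        print(rule)
```

Restricting the range to your own address (a /32) keeps both ports closed to everyone else.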

Configuring Permissions

Some of the actions we’re going to take in this tutorial will require the changing of certain file permissions. Make sure that you complete these commands in your SSH terminal before you try to start your Spark service!

  1. Navigate via the cd command to /usr/lib/ and type the command ‘sudo chown hadoop spark -R’. This will give your account full access to the spark application folder.
  2. Navigate via the cd command to /var/log/ and type the command ‘sudo chown hadoop spark -R’. This will give your account access to modify the log folder, which is necessary for when you begin the Spark service.

Importing the Data into Hive from CSV

For this tutorial, we’ll be assuming that your data is stored in an Amazon S3 Bucket as comma delimited CSV files. However, you can use any Hive accepted format – ORC, Parquet, etc – and/or use files stored locally on the HDFS. Refer to the Hive documentation to change the SQL below to suit those scenarios; the rest of the tutorial should still apply.

  1. In your SSH session, use the command ‘hive’ to bring up the hive shell.
  2. Once the shell is opened, use the following SQL to create a table in the Hive metastore, replacing “values” with your column definitions (no actual data will be imported yet):

“CREATE EXTERNAL TABLE tablename(values) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 's3://bucket-name/path/to/csv/folder';”

  3. Now that the table exists, use the following command to populate the table with data:

“LOAD DATA INPATH 's3://bucket-name/path/to/csv/folder' INTO TABLE tablename;”

  4. If you provide a path to a folder, Hive will import the data from all the CSV files that folder contains. However, you can also point the path at a specific .csv file instead, if you so desire.
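
If you create tables for several datasets, it can help to generate the statement rather than type it each time. Here is a small Python sketch that fills in the column definitions and S3 location; the function name, the table name, and the example columns are hypothetical, and it does no escaping, so only use it with names you trust.

```python
from typing import Sequence, Tuple


def create_external_csv_table_sql(
    table: str,
    columns: Sequence[Tuple[str, str]],
    s3_location: str,
) -> str:
    """Build a Hive CREATE EXTERNAL TABLE statement for comma-delimited text files."""
    column_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({column_sql}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE "
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    print(create_external_csv_table_sql(
        "orders",
        [("order_id", "INT"), ("customer", "STRING"), ("total", "DOUBLE")],
        "s3://bucket-name/path/to/csv/folder",
    ))
```

You can paste the printed statement into the Hive shell, or save it to a .sql file if you plan to automate the setup later.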

Starting Spark-SQL Service

Luckily for us, Amazon EMR automatically configures Spark-SQL to use the metadata stored in Hive when running its queries. No additional work needs to be done to give Spark access; it’s just a matter of getting the service running. Once it is, we’ll test it via the command line tool Beeline, which comes with your Spark installation.

  1. In SSH, type the command: “/usr/lib/spark/sbin/start-thriftserver.sh”
    1. Make sure you’ve configured the permissions as per the section above or this won’t work!
  2. Wait about 30 seconds for the Spark application to be fully up and running, and then type: “/usr/lib/spark/bin/beeline”.
  3. In the beeline command line, type the command: “!connect jdbc:hive2://localhost:10001 -n hadoop”. This should connect you to the Spark-SQL service. If it doesn’t, wait a few seconds and try again. It can take up to a minute for the Thriftserver to be ready to connect.
  4. Run a test query on your table, such as: “select count(*) from tablename;”
  5. Spark-SQL is currently reading the data from S3 before querying it, which slows the process significantly. Depending on the size of your dataset, you may be able to use the command “cache table tablename” to place your data into Spark’s local memory. This process may take a while, but it will significantly improve your query speeds.
  6. After caching, run the same test query again, and see the time difference.
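
Since the Thriftserver can take up to a minute to accept connections, a script that starts it may want to wait for the port before running anything else. The Python sketch below is one way to do that; the function name and the default timings are my own choices, and it only checks that the TCP port is open, not that Spark-SQL is answering queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll a TCP port until it accepts a connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 10001) run on the master node would return True once the Thriftserver is listening, at which point the beeline connection in step 3 should succeed.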

Downloading the JDBC Drivers

Spark Thriftserver allows you to connect to and query the cluster remotely via a JDBC connection. This will be how we connect our Logi application to Spark.

  1. Using an SFTP program (such as Cyberduck), connect to your cluster, and download the following JARs from /usr/lib/spark/jars:
     SPARK HIVE JDBC
  2. To test the connection, download SQL Workbench/J.
  3. Make a new driver configuration, with all of the above jars as part of the library.
  4. Format the connection string as: “jdbc:hive2://host.name:10001;AuthMech=2;UID=hadoop” and hit “Test”.
  5. If your Thriftserver is running, you should see a successful connection.

Connecting with a Logi Application

Because we’re using a JDBC connection, you’ll have to use a Java-based Logi application, as well as Apache Tomcat to run it locally. If you don’t already have it, download Tomcat, or another Java web application hosting platform.

  1. Create a Java-based Logi application in Logi Studio, with the application folder located in your Apache Tomcat’s webapps folder.
  2. Add all of the JARs from the previous section to the WEB-INF/lib folder of your Logi application folder.
  3. Within Logi Studio, create a Connection.JDBC object in your Settings definition. The connection string should be formatted as: “JdbcDriverClassName=org.apache.hive.jdbc.HiveDriver;JdbcURL=jdbc:hive2://host.name:10001/default;AuthMech=2;UID=hadoop”
  4. In a report definition, add a Data Table and SQL Data Layer, connected to your JDBC connection, and querying the table you created in your cluster. Make sure you add columns to your Data Table!
  5. In the command shell, navigate to your Tomcat folder, and start Tomcat via the command: “./bin/catalina.sh start” (on Windows, use “bin\catalina.bat start”).
  6. Try to connect to your Logi webpage (usually at “localhost:8080/your-Logi-App-name-here”) and see your data.
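
Because the test connection string and the Logi connection string share the same pieces, a small helper can keep them consistent. This Python sketch simply formats the strings in the shape used in this tutorial; the function names are mine, and the defaults (port 10001, user “hadoop”) come from the earlier steps.

```python
from typing import Optional


def hive_jdbc_url(host: str, port: int = 10001, database: Optional[str] = None,
                  user: str = "hadoop") -> str:
    """Format the hive2 JDBC URL used to reach the Spark Thriftserver."""
    path = f"/{database}" if database else ""
    return f"jdbc:hive2://{host}:{port}{path};AuthMech=2;UID={user}"


def logi_connection_string(jdbc_url: str) -> str:
    """Wrap a JDBC URL in the connection string format for Connection.JDBC."""
    return f"JdbcDriverClassName=org.apache.hive.jdbc.HiveDriver;JdbcURL={jdbc_url}"


if __name__ == "__main__":
    # host.name is a placeholder for your cluster's Master Public DNS.
    print(hive_jdbc_url("host.name"))
    print(logi_connection_string(hive_jdbc_url("host.name", database="default")))
```

Swap host.name for your cluster’s Master Public DNS (or localhost if you are using an SSH tunnel).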

     

Final Thoughts

Because EMR clusters are expensive, and the data doesn’t persist, it eventually becomes important to automate this setup process. Once you’ve done it once, it’s relatively easy to use the EMR Console’s “Steps” feature to perform this automation. To create the table, you can use a Hive Step and a .sql file, saved on S3, containing the commands in this tutorial. To change the permissions and start the server, you can use a .sh script saved on S3 via the script-command jar. More information on that process is listed here. Make sure that the .sh script doesn’t include a bash tag, and that its end-of-line characters are formatted for Unix rather than as a Windows/Mac text document.
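
If you edit the .sh script on Windows, the line endings are the part that tends to go wrong. Assuming the requirement is Unix-style (LF) line endings, here is a minimal Python sketch that normalizes them before you upload the script to S3; the function names and the file name in the comment are hypothetical.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_script(path: Path) -> None:
    """Rewrite a script file in place with LF line endings."""
    path.write_bytes(to_unix_line_endings(path.read_bytes()))


if __name__ == "__main__":
    # Demonstrate on an in-memory example rather than a real file;
    # normalize_script(Path("start_spark.sh")) would fix a (hypothetical) script.
    print(to_unix_line_endings(b"sudo chown hadoop spark -R\r\n"))
```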

Another possible solution to the lack of data persistence is to use a remote metastore. By default, Hive creates a local MySQL metastore to save the table metadata. However, you can override this behavior and use your own MySQL or PostgreSQL database as a remote metastore that will persist after the cluster is shut down. This is also advantageous for the Logi application, as the local metastore (Derby) does not allow for concurrent running of queries. Information on how to configure your cluster to use a remote metastore will be coming in a second tutorial, to be published soon!

Easy Access
Automation and accessibility. If I were to choose the two most important characteristics in digital transformation for businesses today, these two would be it. Automation and accessibility. While automation may come as a no-brainer, accessibility might leave you wondering what I mean. This feature is what holds the future of business intelligence.

Automation needs little explanation. As data collection methods become more sophisticated, so does the analysis of that data. Automation in collecting, analyzing, and even acting upon the information is the obvious advancement. It removes dependencies for data-based decisions.

The benefits of technology-assisted intelligence have enhanced our lives for quite a while now. Early on, it was a luxury reserved for the computer-science literate. Anyone who wanted to enhance their business decisions with technology-fueled analytics needed to hire a tech-savvy analyst, or two, or three. The analyst would devise a system to gather and analyze the data. Then she would make the data presentable and digestible for the business decision maker.

As time went on, the frustration of having a middle-man to interpret technology-assisted intelligence fueled innovation. Simplification of user experiences to accommodate non-technical decision-makers came out of this frustration. In other words, user experiences are trying to make the intelligence digestible, and therefore accessible, to every user.

For business intelligence solutions, accessibility of information to a broader audience of users proves indispensable. Accessibility brings relevance to business users. Relevance brings more usability. While automation is already underway as a natural progression of robust digital transformation, let’s keep an eye towards maintaining accessibility so we remain relevant.

Thoughts on BI and Analytics as Storytelling
Generally, people are not that good at remembering or retaining an unending list of facts, especially with no context for why they are important. However, they can remember facts that are narrated as a story.

In your organization, you should strive for an overall BI or analytics “story.”

Like any story, your story should have a beginning, a middle, and an end. The beginning of the story should tell what you want to convey and why the story matters. The middle should convey the how of the story. And, the end should convey a data narrative.

Data Storytelling = Visualization + Context + Narrative
Data storytelling is not data visualization. Every person has a different perspective and may visualize data differently. For effective storytelling, visualization should be just one component. You also need a textual narrative that explains what the visualization is trying to convey. Ask yourself, “What story am I trying to tell about my organization?” Another important component is context: why the story is important or meaningful, and what themes you want to uncover. Without context, the visual and the narrative would not convey the full picture. Further, there isn’t a single correct format for all storytelling. Depending on the narrative, your story could be represented as an annotated dashboard, a flowchart, a slide show, an infographic, or a storyboard.

Also note that not all stories end “happily ever after.” Your story should convey the results as they are, and let your users actively explore and question based on their experience and needs. Only then will the data help in proper decision-making.

 

 


At the beginning of the month, I had the pleasure of attending the Gartner Data & Analytics Summit in Grapevine, TX. I was particularly interested in hearing about the latest trends as well as about the Gartner maturity model for analytics (since our own dbSeer framework for analytics is based on the model).

It was most interesting to hear where organizations today fall in the different phases of this maturity model: 74% of organizations currently have Descriptive Analytics, 34% have Diagnostic Analytics, 11% have Predictive, and only 1% has Prescriptive. However, Gartner notes that the market is changing rapidly, predicting a growing trend toward Predictive and Prescriptive Reporting in the next 3-5 years.

Here’s a more detailed recap of how Gartner defines these four different phases:

Descriptive
What is the Descriptive approach? This approach provides trends and information on historical data and answers the question, “What happened?”
What does it provide? It can be used to monitor the past and provides consistent and credible reporting. The data is governed and the reports are based on a predefined set of KPIs.
What kinds of analytic capabilities are used? Reports, Dashboards, and OLAP are the most common analytic capabilities used to monitor the data.
What process and governance does this architecture require? Collect the user requirements, prepare the backend data models, and create the front-end visualizations and reports.
What roles in the organization consume or create these types of reports/dashboards? Data Architect, BI Developer, Information Consumer

Diagnostic
What is the Diagnostic approach? This approach helps explore historical data to answer the question “Why did it happen?”
What does it provide? It can help us to be more agile and look more at insights that the data provides. It directs you to investigate KPIs or review how KPI values were derived.
What kinds of analytic capabilities are used? Self-service reporting, data discovery, self-service data preparation are some of the approaches to explore the data.
What process and governance does this architecture require? Give analysts access to data models, provide a process to certify data, and train analysts on data models and provide support.
What roles in the organization consume or create these types of analytics? Analyst, Data Quality Manager, Analytics Support Expert, Data Engineer

Predictive
What is the Predictive approach? This approach helps in forecasting trends based on historical data and patterns and helps to answer the question, “What will happen?”
What does it provide? More advanced analytics where machine learning has been applied to predict outcomes based on historical trends and discovered insights.
What kinds of analytic capabilities are used? Big data discovery, predictive modeling, graph analysis, behavioral analytics (used to investigate and predict).
What process and governance does this architecture require? Collecting and processing complex and vast data sets, iterating & optimizing analytic models, confirming predictions with business users.
What roles in the organization consume or create these types of reports/dashboards? Data Scientist, Data Steward, Analytics Enterprise Architect

Prescriptive
What is the Prescriptive approach? This approach is based on predictive data and models to provide and/or automate a set of actions and to help answer the question, “What should I do?”
What does it provide? Decision support & automation to optimize, actions for prescription, automation and industrialization.
What kinds of capabilities are used? Prescriptive engines, simulation & optimization engines (used to prescribe actions and these could be industrialized).
What process and governance does this architecture require? Optimizing business processes and automating analytical models.
What roles in the organization consume or create these types of reports/dashboards? Analytics Project Manager, Analytics System Integrator.
