Welcome to data science 101, your guide to important introductory topics in predictive analytics. If you are an analytics specialist looking to learn more about data science, or even just someone interested in learning more about the topic, then this is the series for you!

We want to show you that even if you do not have a strong background in statistics, or do not consider yourself a data scientist, you can still work with and analyze data effectively. Throughout this four-part series, we will go through the following topics:

  1. Sampling and Distribution (click HERE to read)
  2. Correlation and AB Hypothesis Testing (click HERE to read)
  3. Single and Multiple Linear Regression (click HERE to read)

Throughout this series we will use a real data set to help explain and demonstrate the discussed concepts. Should you choose to follow along with us, we have all of our work available to you in the following links: 

  • For the Excel enthusiasts, click HERE to download the Excel workbook file
  • For Python gurus, click HERE to download a zip file that contains the Jupyter Notebook and necessary csv files. When in Jupyter Notebook, only open the file “Blog Post Data Set and Tests (Code).ipynb” to access the notebook with the code. The other files within the folder are just csv files of the data to be read in the notebook. 

All of the calculations were done in both Excel and Jupyter Notebook, but all of our explanations are done using Excel. The Jupyter Notebook file will show you how we coded the same tests as well as the graphics and figures using Python. 

Before jumping into any tests or analysis, we first go through how to approach your raw data set. We have a lot to get through, so let’s get started!   

The Basics: How to Look at Your Data

Before we even begin to analyze our data, it is important to first assess our raw data. An in-depth understanding of your data allows you to manipulate it more effectively and draw better conclusions. When collecting or selecting data, it is important to keep two overarching questions in mind:

  1. Who is my target audience?
  2. What question am I trying to answer?

When thinking of your target audience, you want to make sure that your data is applicable and appropriate for that population. For example, if you were exploring housing prices for an intended US audience, it would be more beneficial to find data for houses in the United States rather than in Europe or Canada. When thinking of your question, you must ensure that your data is actually viable for answering it. Think about the type of data you are looking for, whether it be qualitative or quantitative. For example, if you were looking at tumor sizes and dimensions, quantitative data would be more useful, but if you were looking at tumor attributes, such as shape and texture, qualitative data would be the better choice. In addition, it is important to gather all the information you can about the data set. For example:

  1. Where was the data collected from?
  2. Who collected the data?
  3. When was the data collected?
  4. How large is the data set?

Some of these questions may be more relevant than others, especially depending on whether your data is primary, meaning you collected it yourself firsthand, or secondary, meaning someone else collected it and you intend to use it secondhand. The goal of data analytics is to answer a question, and that begins with information outside of just the numbers.

Throughout this series, we will apply these thought processes and methods to a real, secondary data set pertaining to housing prices. This data set contains a sample of 21,613 houses and describes their price, number of bedrooms, number of bathrooms, number of floors, grade, condition, size of the living space, size of the lot, square footage of the house above ground level, size of the basement, the year it was renovated, and the year it was built. The data was collected from King County in Washington in 2015.

We are looking to answer the question: what characteristics of a home affect its price? This is a broad question that will require multiple methodologies to answer. In the next section, we will briefly discuss key statistical terms and methodologies so you can better understand the content that follows.


Key Statistical Terms and Definitions

Before we dive in, let’s talk definitions. Just like any other field of study, statistics has its own language, or jargon. If you aren’t a native statistics speaker, this language may seem a little confusing, but hang with us! Throughout the series we will help you decode the tricky statistics language that we use, starting with this brief section where we define some critical terms and topics that you will hear frequently. If you ever get confused in later sections, just pop back up here and read through our explanations. Here is a list of some of the important terms we will be using throughout this series:

  • Primary Data → Data that has been collected by you, firsthand
    • Ex: Surveys, interviews, experiments, focus groups, etc.
  • Secondary Data → Data that has been collected or produced by someone else and is used secondhand by the researcher
    • Ex: Information libraries, public government data, population census, etc. 
  • Statistical Significance → If a result is statistically significant, the relationship we observed between two variables is unlikely to be due to chance alone, which makes it worth investigating further.
    • P-value → The p-value, or probability value, is possibly one of the most important statistical terms to be discussed. It puts a number on statistical significance: it is the probability of seeing a result at least as extreme as the one we observed if the null hypothesis (defined below) were true. There are different p-value cutoffs that can be used, but it is standard to say that a p-value less than or equal to 0.05 indicates statistical significance (we will be using this cutoff throughout the remainder of the series).
  • Practical Significance → Tells us how applicable or relevant our findings actually are; that is, whether the size of the effect is large enough to matter in the real world
  • Discrete Variable → A variable that can only take on distinct, countable values
    • Ex: Number of rooms, number of floors, number of pets
  • Continuous Variable → A variable that can take on any value within a range, so it has infinitely many possible values
    • Ex: Height, square footage of a house, weight
  • Hypothesis Testing → Just like a science experiment, in hypothesis testing we are simply trying to test a hypothesis, or “prediction,” that we may have. In hypothesis testing, we form two types of hypotheses:
    • Null Hypothesis → Put simply, this is our hypothesis, or statement, that there is nothing going on between our two variables.
      • Ex: There is no difference between the average price of pencils at Target and the average price of pencils at Walmart
    • Alternative Hypothesis → This is the claim that we are trying to find evidence for with the hypothesis test; once again, put simply, this means that there is, in fact, something going on between our two variables worth noting
      • Ex: There is a difference between the average price of pencils at Target and the average price of pencils at Walmart
    • Note that all hypothesis testing is done under the assumption that the null hypothesis is true
  • Dependent Variable → Also known as the “y” variable; the outcome we are trying to explain or predict
  • Independent Variable → Also known as the “x” variable; the variable we think has some effect on the dependent variable
  • Linear Regression → This is a commonly used methodology to see if one variable can be used to predict another. There are two types of modeling methods:
    • Single Linear Regression → Seeing if one independent variable, or predictor, is good at predicting a dependent variable
      • Ex: Is an SAT score a good predictor of GPA?
    • Multiple Linear Regression → Seeing if multiple independent variables, or predictors, are good at predicting a dependent variable
      • Ex: Are AP, SAT, and ACT scores good predictors of GPA?
  • Correlation → The strength and direction of the linear relationship between two variables, typically denoted as r; r is between -1 and 1, with -1 being perfectly negatively correlated and 1 being perfectly positively correlated. The closer |r| is to 1, the stronger the association, and the closer r is to 0, the weaker the association. An r value of 0 means that there is no linear correlation between the two variables.
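
To make the correlation definition a little more concrete, here is a minimal Python sketch that computes r by hand for two short lists. The numbers are made up for illustration and are not taken from the housing data set, and `pearson_r` is simply a name we chose for this example.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator of r: sum of the products of each pair's deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator of r: square roots of the summed squared deviations
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable never changes")
    return co_deviation / (spread_x * spread_y)


# Made-up example values: size in square feet and price in thousands of dollars
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 290, 410, 500, 610]
r = pearson_r(sizes, prices)
print(round(r, 3))
```

Because the made-up prices rise almost in a straight line with size, r comes out very close to 1, which is what a strong positive correlation looks like.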


If you’re ever confused about certain terminology used throughout, you can jump back up here for a quick refresher!

Can’t get enough? Well neither can we! In our next installment of this series, we will dive into the importance of the distribution curve and sample size, two concepts that are essential for setting the stage for most of our subsequent statistical testing. Click here to read.



PostgreSQL 11 became available on Amazon Relational Database Service (RDS) in March. Have you tried it? We have, and we are here to report on all of the awesome enhancements. As a preview, there are major improvements to the table partitioning system, added support for stored procedures capable of transaction management, improved query parallelism, added parallelized data definition capabilities, and just-in-time (JIT) compilation for accelerating the execution of expressions in queries. We’ll now go more in depth about each of these improvements, and by the end of this, trust me, you’ll want to go give it a try!

Improvements to partitioning functionality

  • Tables can now be partitioned by hashing a key column
  • Support for PRIMARY KEY, FOREIGN KEY, and indexes on partitioned tables
  • Partitioned tables can have a “default” partition to store data that does not match any of the other defined partitions
  • On UPDATE, rows are moved to the appropriate partition if the partition key column data changes
  • Faster partition elimination during query processing and execution speeds up SELECT queries
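
As a rough illustration of hash partitioning, the Python sketch below only builds the DDL statements as strings; it does not connect to a database. The table name `orders`, its columns, and the helper `hash_partition_ddl` are hypothetical names chosen for this example, and the names are assumed to be trusted (no identifier quoting is done).

```python
def hash_partition_ddl(table, columns_sql, key_column, partition_count):
    """Return DDL statements for a hash-partitioned table and its partitions."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    statements = [
        "CREATE TABLE {} ({}) PARTITION BY HASH ({});".format(
            table, columns_sql, key_column
        )
    ]
    for remainder in range(partition_count):
        # Each partition holds the rows whose hashed key, divided by the
        # modulus, leaves this remainder
        statements.append(
            "CREATE TABLE {t}_p{r} PARTITION OF {t} "
            "FOR VALUES WITH (MODULUS {m}, REMAINDER {r});".format(
                t=table, r=remainder, m=partition_count
            )
        )
    return statements


# Hypothetical table: orders split into 4 partitions by customer_id
for statement in hash_partition_ddl(
    "orders", "order_id bigint, customer_id bigint", "customer_id", 4
):
    print(statement)
```

Running the printed statements in order would create one parent table and four partitions, with rows spread across the partitions by the hash of `customer_id`.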

Lightweight and Fast ALTER TABLE for NOT NULL Column with DEFAULT Values

  • With this new version, ALTER TABLE no longer rewrites the table when adding a column with a non-null default value. This significantly helps when adding a column with a default value to tables with millions of records.

Stored Procedures with Transaction Control

  • Finally, Postgres 11 supports creating stored procedures. Prior versions of Postgres supported functions; however, functions cannot run transactions. With stored procedures, you can now COMMIT and ROLLBACK transactions within the procedure.

Improvements to Parallelism

  • CREATE INDEX can now use parallel processing while building a B-tree index
  • Parallelization is now possible in CREATE TABLE…AS, CREATE MATERIALIZED VIEW, and certain queries using UNION
  • Hash joins can now be performed in parallel
  • Improvements to partition scans to more efficiently use parallel workers
  • Sequential scans now perform better with many parallel workers


Improvements to the Optimizer

  • Selection of the most common values (MCVs) has been improved. Previously, MCVs were chosen based on their frequency compared to all column values. In Postgres 11, MCVs are chosen based on their frequency compared to non-MCV values
  • Selectivity estimates for >= and <= have been improved. This improves the performance of queries using BETWEEN
  • Improved optimizer row count estimates for EXISTS and NOT EXISTS queries

Optimal Just-in-Time (JIT) Compilation

  • Just-in-Time (JIT) compilation is the process of turning some form of interpreted program evaluation into a native program, and doing so at run time. JIT is most beneficial for CPU-bound queries. JIT currently aims to optimize two essential parts of query execution: expression evaluation and tuple deforming.

Expression evaluation is used to evaluate WHERE clauses, target lists, aggregates, and projections. It can be accelerated by generating code specific to each case.

Tuple deforming is the process of transforming an on-disk tuple into its in-memory representation. It can be accelerated by creating a function specific to the table layout and the number of columns to be extracted.

I know you won’t believe it, but these aren’t even all of the benefits of the new PostgreSQL 11. There are so many improvements for Window functions, indexes, and monitoring that would be greatly beneficial. If that doesn’t get you excited, I don’t know what will! The best way to use PostgreSQL 11 is with Amazon RDS. Reach out to our team if you’d like to get started with AWS or want to unlock the full potential of your current environment!


Let’s admit it – managing licenses is difficult. This complex process often involves manual or ad-hoc reporting that can quickly become outdated or result in inaccuracies. Within AWS, licenses are used across a variety of tools, which only makes the situation worse. We’ve heard this complaint many times from our customer base, and decided it was time to introduce a solution: AWS License Manager!


This service is available to all AWS customers and provides an easy way to manage licenses in AWS and on-premises servers from software vendors like Microsoft, SAP, Oracle, and IBM. Here are four reasons why you should take advantage of this service:



  1. It’s Simple

AWS License Manager gives you a single, centralized view that allows tracking of all the licenses across AWS and on-premises. You can track how many licenses are being used, how many are available, and how many have breached limits – all on the built-in dashboard. AWS License Manager integrates with AWS services to simplify management of licenses across multiple AWS accounts, IT catalogs, and on-premises from one AWS account.

  2. You Have Control

As an Administrator, you can create your own custom licensing rules that fit the terms of your licensing agreements, giving you control over license usage. These rules can be made centrally or you can specify different rules for various groups in your organization. These rules will be enforced when an EC2 instance is launched. With AWS License Manager, you have visibility over how software licenses are used and can prevent misuse before it happens.

  3. Lower Costs

Say goodbye to wondering if the right number of licenses are being used or worrying if additional licenses are required! AWS License Manager does all of this for you, saving you the time and costs of tracking and managing licenses. You can also enforce controls on software usage to reduce the chance of overages.

  4. Reduced Risk of Violations

The consolidated view of your licenses reduces the risk of non-compliance. Additionally, the rules administrators set can limit violations such as using more licenses than an agreement stipulates or reassigning licenses to different servers on a short-term basis. It’s possible to limit a licensing breach by stopping the instance from launching or by automatically notifying the administrators about the infringement.
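
The two enforcement options described above (blocking the launch or notifying administrators) can be pictured with a small Python sketch. This is not the AWS License Manager API; `check_launch` and its parameters are hypothetical names that only model the decision described in this post.

```python
def check_launch(license_limit, licenses_in_use, licenses_requested, hard_limit):
    """Decide what happens when a launch would consume more licenses.

    Returns "allow" when the launch stays within the limit, "block" when a
    hard limit would be breached, and "allow_and_notify" when only a soft
    limit would be breached.
    """
    if licenses_in_use + licenses_requested <= license_limit:
        return "allow"
    # The launch would exceed the limit: a hard limit stops it, while a
    # soft limit lets it proceed and alerts the administrators
    return "block" if hard_limit else "allow_and_notify"


# Hypothetical rule: 10 licenses in the agreement, 9 already in use
print(check_launch(10, 9, 1, hard_limit=True))
print(check_launch(10, 9, 2, hard_limit=True))
print(check_launch(10, 9, 2, hard_limit=False))
```

The first call stays within the limit and is allowed, the second is blocked by the hard limit, and the third goes ahead but flags the breach for the administrators.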

Well, there you have it – four reasons why you should use AWS License Manager and 0 reasons why you shouldn’t (because they don’t exist)! Do yourself a favor and start using this service to keep you compliant and to save you time, effort, and money.

If you have issues with set-up or have questions about the service, feel free to contact us!


It’s been over 11 years since AWS began supporting Microsoft Windows workloads. In that time, AWS has innovated constantly to maintain its title as the #1 cloud provider for these workloads. You can run the full Windows Stack on AWS, including Active Directory, SQL Server, and System Center.

Many third parties have completed studies that show why AWS is superior when it comes to performance, cost, and reliability. In 2018, the next-largest cloud provider had almost 7x more downtime hours than AWS. Additionally, SQL Server on AWS boasts a 2-3x better performance record. When costs are calculated correctly, running SQL Server on the platform of AWS’s competitor would cost almost twice as much. This includes the cost of storage, compute, and networking.

Reliability is the quality that puts AWS high above the rest. AWS has 64 availability zones within 21 different regions. AWS customers can deploy their applications across multiple zones in the same region for fault tolerance and low latency. Instead of having a single-region instance that scales up, AWS’s services are divided into smaller cells that scale out within a region. This design reduces the effects when a cell-level failure occurs. Notably, AWS has never experienced a network event that spans multiple regions.

When migrating your SQL Server workloads to AWS, there are a few things you should consider. It’s important to optimize your total cost of ownership, which includes optimizing your workloads to benefit from the scalability and flexibility of the cloud. On-premises servers are not optimized; in fact, 84% of workloads are over-provisioned. Many Windows and SQL Server 2008 workloads are running on older and slower server hardware. To optimize your cloud migration, you need to size your workloads for performance and capability, not by physical servers. To reduce cost, you can also decrease the number of licenses that you use by server and core counts.

Another strategy is to decouple your storage and compute processes. When these are combined, they must be scaled together. On the cloud, compute and storage can be separated. Decoupling makes elasticity easier to achieve and manage. Many people question this because SQL Server instances often contain logic to ingest or process data before it is stored in a schema. Many times, ETL logic is written within the SQL Server processing engine and the servers are sized to handle a large volume of ETL processes. These ETL processes often run a couple of times a day, and the capacity is only needed while they are running. By moving the ETL logic outside of the SQL engine, you can utilize the elasticity of the cloud and expand your compute power whenever needed. This will reduce your SQL Server footprint in the long run. Of course, this doesn’t apply to every use case in SQL Server, but if you have ETL logic, enrichment logic, or load logic inside your SQL Server, decoupling might be the correct choice. This was the case with one of our customers, Telarix. See here to read their white paper.

As part of your migration, you should consider running your SQL Server on a Linux Instance within the AWS platform. The majority of current Windows functionality is supported on the Linux platform. Additionally, there is a minimum of a 20% cost benefit of running SQL Server on a Linux instance! This decision will give you the best performance and save you the most money.

You can also use Amazon RDS to upgrade your database instance to a newer version of SQL Server. This is performed in place and is initiated with just a couple of clicks. Before you upgrade, you can create a snapshot backup, use it to create a test DB instance, then upgrade that instance to the desired new version. You can also opt in to automatic upgrades that take place within your preferred maintenance window.

If you’re considering migrating your Windows Workloads to the cloud, AWS is the optimal choice because of the price, performance, and reliability. This is the perfect time to migrate and modernize your outdated servers. Contact us to learn more or get started on your project.


AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

AWS customers gain numerous benefits from moving existing applications to, or building new applications on, Elastic Beanstalk. These include capacity provisioning, load balancing, auto-scaling, and application health monitoring. At the same time, customers retain full control over the AWS resources powering their application and are able to access the underlying resources at any time.

AWS Elastic Beanstalk is free to use, and AWS customers only pay for the underlying AWS resources used to store and run the application.

AWS customers have two options for getting started with Elastic Beanstalk:

  • Re-host: This is the fastest option. No code changes are required, and it needs less testing and migration time. Re-hosting does not use all the features of the cloud, like multi-AZ for high availability. The DB stays on-premises and the .NET app is moved to EC2 using Elastic Beanstalk.
  • Re-platform: In this method, the DB server is migrated to the cloud using a manual backup and RDS for SQL Server. A manual backup of the SQL Server database must be created first and then restored into the newly created RDS instance. Next, the RDS connection string is provided to the application, which is then deployed using Elastic Beanstalk. This approach requires more testing and comparison between the old and new environments.

In this post, we show how to move an existing .NET application to AWS using Elastic Beanstalk.

This tutorial will use the following:

Create a .NET application:

You can use your existing .NET application or create a new one. In this tutorial we are using a sample .NET application that you can download here.

  1. Download the application and verify that it runs.
  2. Create the Environment – We will create an Elastic Beanstalk environment to deploy the application to AWS. Login to your AWS console and use the Create New Application wizard in the Elastic Beanstalk console to create the application environment. For Platform, choose .NET.

To launch an environment (console):

  1. Open the Elastic Beanstalk console using this preconfigured link:
  2. For Platform, choose the platform that matches the language used by your application.
  3. For Application code, choose Sample application.
  4. Choose Review and launch.
  5. Review the available options. When you’re satisfied with them, choose Create app.


When the environment is up and running, add an Amazon RDS database instance that the application can use to store data. For DB engine, choose sqlserver-ex.



Add a DB instance to your environment:

  1. Open the Elastic Beanstalk console.
  2. Navigate to the management page for your environment.
  3. Choose Configuration.
  4. On the Database configuration card, choose Modify.
  5. Choose a DB engine, and enter a user name and password.
  6. Choose Apply.


Modify the connection string in the application to use the newly created RDS instance and verify that it’s working as expected. In most cases you want to migrate the existing database to RDS. For more information on this, see here, here, or here.
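
To show how the pieces of the connection string fit together, here is a minimal Python sketch. The tutorial’s application is .NET, so treat this purely as an illustration: it assumes the environment exposes the `RDS_HOSTNAME`, `RDS_PORT`, `RDS_DB_NAME`, `RDS_USERNAME`, and `RDS_PASSWORD` properties that Elastic Beanstalk sets for an attached DB instance, and `build_connection_string` and the sample values are hypothetical.

```python
import os


def build_connection_string(env=None):
    """Assemble a SQL Server style connection string from RDS_* settings."""
    env = os.environ if env is None else env
    return "Server={host},{port};Database={db};User Id={user};Password={password};".format(
        host=env["RDS_HOSTNAME"],
        port=env["RDS_PORT"],
        db=env["RDS_DB_NAME"],
        user=env["RDS_USERNAME"],
        password=env["RDS_PASSWORD"],
    )


# Hypothetical values standing in for the properties of a real environment
sample_settings = {
    "RDS_HOSTNAME": "mydb.example.us-east-1.rds.amazonaws.com",
    "RDS_PORT": "1433",
    "RDS_DB_NAME": "ebdb",
    "RDS_USERNAME": "admin",
    "RDS_PASSWORD": "change-me",
}
print(build_connection_string(sample_settings))
```

Reading the values from the environment rather than hard-coding them means the same build can be pointed at a different database without a code change.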

Deploy the application to AWS using AWS Elastic Beanstalk

  1. In Visual Studio, open sln.
    1. Note: If you haven’t done so already, you can get the sample here.
  2. On the View menu, choose Solution Explorer.
  3. Expand Solution ‘BeanstalkDotNetSample’ (2 projects).
  4. Open the context (right-click) menu for MVC5App, and then choose Publish to AWS.
  5. Add your AWS account credentials by selecting Account profile to use.
  6. Select redeploy to an existing environment. You should see the created AWS Beanstalk profile.
  7. On the Application Options page, accept all of the defaults, and then choose Next.
  8. On the Review page, choose Deploy.
  9. Monitor the deployment status in the Output box.
  10. When the application has successfully been deployed, the Output box displays “completed successfully.”
  11. Return to the AWS Elastic Beanstalk console and choose the name of the application, which appears next to the environment name. 


If you follow these simple steps, you can easily migrate your .NET applications to AWS using AWS Elastic Beanstalk.

When you are finished working with AWS Elastic Beanstalk, you can terminate your .NET environment.

How to Terminate your AWS Elastic Beanstalk environment:

  1. Open the Elastic Beanstalk console.
  2. Navigate to the management page for your environment.
  3. Choose Actions and then choose Terminate Environment.

Elastic Beanstalk cleans up all of the AWS resources associated with your environment. This includes EC2 instances, DB instance, load balancer, security groups, CloudWatch alarms, and more.


In this tutorial, we created a new .NET application and an RDS SQL Server instance and deployed the application to AWS using AWS Elastic Beanstalk. Following these steps, you can deploy your existing .NET application and migrate the DB to AWS RDS to get the benefits of high availability and scalability. Alternatively, you can use your existing on-premises database servers and gain the benefits of AWS scalability and highly available EC2 instances with AWS Elastic Load Balancer, security, live monitoring, and more.

Using AWS Elastic Beanstalk you can easily deploy your applications and monitor them afterwards by viewing the logs. You can then scale up or down based on your application needs.

Did you try this? Was it helpful? Let us know in the comments!





Security is often the number one concern of our clients, especially when moving their data and applications to the cloud. The public cloud operates on a shared responsibility model. This means that the customer’s cloud provider (for example, AWS) is responsible for security of the cloud, and the customer is responsible for security within the cloud. This distinction can get confusing for new customers, leaving them wondering what they are really responsible for when it comes to security. To help, we walk through six simple ways to secure your RDS architecture below.

  1. Build your database instance in Amazon Virtual Private Cloud

Amazon Virtual Private Cloud (VPC) gives you the greatest possible network access control. With Amazon VPC, you have control over your virtual networking environment. For example, you can create subnets, select your own IP address range, and configure routing and access controls. Amazon RDS functionality is the same whether your DB instance is running in an Amazon VPC or not, and there is no additional cost.


  2. Encrypt your RDS Resources

You can use RDS Encryption to secure your RDS instances and snapshots at rest. RDS encryption uses the industry standard AES-256 encryption algorithm to encrypt your data on the server that hosts your RDS instance. Data that is encrypted at rest includes the underlying storage for DB instances, its automated backups, Read Replicas, and snapshots.


  3. Encrypt Data in Transit Using Secure Sockets Layer

You can use Secure Sockets Layer (SSL) connections with DB instances running the MySQL, MariaDB, PostgreSQL, Oracle, or Microsoft SQL Server database engines. Each database engine has a different process for implementing SSL, but you can see step-by-step instructions for each engine here.
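
As one example of what this looks like on the client side, the Python sketch below assembles a PostgreSQL connection string that requires a verified SSL connection. It only builds the string; the host, database, user, and certificate path are hypothetical, `build_ssl_dsn` is a name we made up, and the sketch assumes none of the values contain spaces. The `sslmode` and `sslrootcert` keywords are standard PostgreSQL client connection parameters.

```python
def build_ssl_dsn(host, dbname, user, ca_bundle_path, port=5432):
    """Build a keyword/value PostgreSQL connection string that enforces SSL."""
    params = [
        ("host", host),
        ("port", port),
        ("dbname", dbname),
        ("user", user),
        # verify-full encrypts the connection and checks the server certificate
        ("sslmode", "verify-full"),
        # Path to the certificate bundle used to verify the server
        ("sslrootcert", ca_bundle_path),
    ]
    return " ".join("{}={}".format(key, value) for key, value in params)


# Hypothetical endpoint and certificate location
print(build_ssl_dsn(
    "mydb.example.us-east-1.rds.amazonaws.com",
    "appdb",
    "app_user",
    "/etc/ssl/certs/rds-ca-bundle.pem",
))
```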


  4. Use AWS Identity and Access Management

AWS Identity and Access Management (IAM) policies are used to assign permissions. These determine who is allowed to manage RDS resources. You can set different permissions for who can create, describe, modify, and delete DB instances, as well as tag resources or modify security groups.


  5. Assign Proper Security Groups

You should use security groups to manage which Amazon EC2 instances or IP addresses can connect to your databases on a DB instance. When a DB instance is first created, its firewall prevents any database access except through rules made by an associated security group.
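
The “deny unless a rule allows it” behavior can be pictured with a short Python sketch. This is a simplified model rather than how AWS evaluates security groups internally; `is_connection_allowed`, the CIDR ranges, and the port are all hypothetical example values.

```python
import ipaddress


def is_connection_allowed(source_ip, port, ingress_rules):
    """Return True if any (cidr, port) rule permits this source and port."""
    address = ipaddress.ip_address(source_ip)
    for cidr, allowed_port in ingress_rules:
        if port == allowed_port and address in ipaddress.ip_network(cidr):
            return True
    # No rule matched, so access is denied by default
    return False


# Hypothetical rules: an application subnet and one office IP on port 1433
rules = [("10.0.1.0/24", 1433), ("203.0.113.25/32", 1433)]
print(is_connection_allowed("10.0.1.17", 1433, rules))
print(is_connection_allowed("10.0.2.17", 1433, rules))
```

The first address falls inside the allowed subnet and is accepted; the second matches no rule and is rejected.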


  6. Implement Network Encryption

Network encryption and transparent data encryption with Oracle database instances can be used to improve security of your RDS Architecture. With native network encryption, you can encrypt data as it moves to and from a DB instance. Oracle transparent data encryption automatically encrypts data before it is written to storage and automatically decrypts data when the data is read from storage.


We hope this blog gave you some fresh ideas on how to secure your RDS architecture! Let us know if you have any questions or issues.


A common scenario that companies face is controlling the usage of the relational databases (RDS) that they have for development or testing environments. Stopping the RDS when it’s not being used can significantly lower the company’s AWS costs.

One of the ways to stop an RDS instance is by using Amazon CloudWatch Events in conjunction with an AWS Lambda function written in Python 2.7.

In this example, we have RDS development databases that we need to stop every day at 10 PM EST.

Below are the four steps required to do this:

Step 1 – Create IAM Role/Policy: AWS Lambda will need to assume a role that has access to AWS services like RDS and CloudWatch.

IAM Policy: First we need to create a new policy that we will attach to the IAM role later:

  • From AWS console, choose IAM service and click on Policies
  • Click on the “Create Policy” button
  • Click on the JSON tab and use the JSON below to allow access to the required actions in the RDS and CloudWatch Logs services

  • Click on the “Review Policy” button.
  • Provide a name for the policy “policy_stop_rds” and some description
  • Click “Create Policy”
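
The exact policy JSON is not reproduced in this post. As a rough guide, a policy for this task would need to let the function look up and stop RDS instances and write its own logs. The Python sketch below builds and prints such a policy document; the list of actions and the use of `"Resource": "*"` are our assumptions, and you may want to narrow the resources for your own account.

```python
import json


def build_stop_rds_policy():
    """Return an IAM policy document (as a dict) for stopping RDS instances."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the function check an instance's status and stop it
                "Effect": "Allow",
                "Action": ["rds:DescribeDBInstances", "rds:StopDBInstance"],
                "Resource": "*",
            },
            {
                # Lets the function write its output to CloudWatch Logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Print JSON that can be pasted into the policy editor's JSON tab
print(json.dumps(build_stop_rds_policy(), indent=2))
```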


IAM Role:

  • From AWS console, choose IAM service and click on Roles
  • Click on “Create Role” button
  • Choose “Lambda” as the service that will use the role and click “Next: Permissions”
  • Search for the policy we created on previous steps “policy_stop_rds,” check it, and then click “Next: Tags”
  • Add any tag key value pairs needed and then click “Next: Review”
  • Choose a Role name “lambda-stop-rds” and then click “Create role”

Step 2 – Create AWS Lambda Function:

  • From AWS console, choose Lambda service. Make sure that you’re in the same region where your RDS resides
  • Click on “Create Function” and choose “Author from scratch”.
    • Name: rdsInstanceStop
    • Runtime: choose python2.7
    • Role: choose existing role “lambda-stop-rds”

  • The lambda function will be created, resulting in this page:

  • In another tab, open the IAM role we created in previous steps “lambda-stop-rds”
  • Add the Lambda function ARN to the inline policies of the IAM role we created in the first step (you can get the ARN from the top right corner of the Lambda function page)
    • Click on “add inline policy” and add the following JSON, replace the resource with the ARN from the step above, and save the policy with the name “policy_inline_stop_rds”

  • Designer: From Lambda function page, make sure that resources section has “AWS lambda” added.

  • Function Code: Add the following function code to the Lambda function:

  • Environment variables: To use this Lambda function in other environments, we use environment variables to define the DB instance name

  • Tags: Add key-value pairs of tags as needed
  • Execution Role: use the existing execution role “lambda-stop-rds”
  • Keep all other settings with default values
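
The function code itself is not reproduced in this post, so here is a minimal sketch of what such a handler could look like. It is written in Python 3 syntax, and the environment variable name `DB_INSTANCE_NAME` and the helper `stop_rds_instance` are hypothetical names chosen for this example. Unlike the behavior described in Step 3, this sketch checks the instance status first and returns a message instead of raising an error when there is nothing to stop.

```python
import os


def stop_rds_instance(rds_client, instance_id):
    """Stop the instance if it is running and return a short status message."""
    response = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    status = response["DBInstances"][0]["DBInstanceStatus"]
    if status != "available":
        # Only an instance in the "available" state can be stopped
        return "Instance {} is {}; nothing to stop".format(instance_id, status)
    rds_client.stop_db_instance(DBInstanceIdentifier=instance_id)
    return "Stop requested for instance {}".format(instance_id)


def lambda_handler(event, context):
    # boto3 is included in the AWS Lambda Python runtimes
    import boto3

    # Hypothetical variable name; set it under "Environment variables"
    instance_id = os.environ["DB_INSTANCE_NAME"]
    message = stop_rds_instance(boto3.client("rds"), instance_id)
    print(message)  # printed output ends up in CloudWatch Logs
    return message
```

Keeping the stop logic in its own function, with the RDS client passed in, makes it possible to try the logic locally with a stand-in client before deploying.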


Step 3 – Test the Lambda function:

  • On the top right corner of the screen, select Test and Configure Test events
  • Select Create New Test Event and select the Hello World event template
  • When you click Save, the execution should succeed
    • If your DB is not started, there is nothing to stop and hence you will get an error message similar to “Instance <> is not in available state.”


Step 4 – Schedule the run of the Lambda function:

  • From AWS console, choose CloudWatch and click on Rules
  • Click on “Create rule”
  • Choose “Schedule” and use the cron expression “0 3 * * ? *” to run the event every day at 10 PM EST. CloudWatch Events evaluates cron expressions in UTC, and 10 PM EST (UTC-5) is 3:00 AM UTC.

  • Add the Lambda function we created previously as a target by clicking “Add target” and then click on “Configure details”

  • Fill in the rule name (e.g., “stop rds”) and the description
  • Click Create rule
  • After this, wait until 10 PM EST to see if the Lambda function gets triggered. You can check this by going to CloudWatch Logs and looking under the log group “/aws/lambda/rdsInstanceStop”
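
CloudWatch Events schedules run on UTC, so a local time has to be converted before it goes into the cron expression. The small Python sketch below does that conversion for whole-hour UTC offsets; `daily_cron_utc` is a hypothetical helper name, and it does not handle daylight saving changes automatically.

```python
def daily_cron_utc(local_hour, local_minute, utc_offset_hours):
    """Return the six cron fields for a daily run at the given local time.

    utc_offset_hours is the local zone's whole-hour offset from UTC,
    for example -5 for Eastern Standard Time.
    """
    utc_hour = (local_hour - utc_offset_hours) % 24
    return "{} {} * * ? *".format(local_minute, utc_hour)


# 10 PM Eastern Standard Time (UTC-5)
print(daily_cron_utc(22, 0, -5))  # prints 0 3 * * ? *
```

During daylight saving time the Eastern offset is -4, so the same local time would become `0 2 * * ? *`.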


We hope this was helpful for you! Let us know if you have any issues or recommendations.



As part of the digital transformation, companies are moving their infrastructure and applications to the cloud at a faster rate than ever.

There are many approaches to migrating to the cloud – each with its own benefits and drawbacks. It’s important to understand each option in order to make the best decision possible for your organization.

The three primary choices are Rehost, Replatform, and Refactor, which we will walk through now.





Rehosting, often also called lift and shift, is the simplest of the migration options. Applications are simply moved from on-premises to the cloud without any code modification. This is considered a beginner’s approach to migration. Its benefits include speed and low resource requirements. There is also minimal application disruption, and it is cheaper than maintaining an on-premises environment. However, because this migration is so simple, companies don’t typically benefit from cloud-native features like elasticity, which can be achieved with the other migration techniques.

Overall, if a company is looking for a quick and easy migration that doesn’t disrupt the existing application workflow, Rehosting is the best choice. This is a fast solution for organizations that need to reduce their on-premises physical infrastructure costs as soon as possible. Thankfully, companies can always re-architect and optimize their application once they are already in the cloud.



Replatforming involves moving a company’s assets to the cloud with a little up-versioning. A portion of the application is changed or optimized before moving to the cloud. Even a small amount of cloud optimization (without changing the core application structure) can lead to significant benefits. This approach takes advantage of containers and VMs, only changing application code if needed to use base platform services.

Replatforming provides a suitable middle ground between rehosting and refactoring. It allows companies to take advantage of cloud functionality and cost optimization without using the resources required for refactoring. This approach also allows developers to use the resources they are used to working with, including development frameworks and legacy programming languages. However, this approach is slower than rehosting and doesn’t provide as many benefits as refactoring.

Organizations should choose this approach if they are looking to leverage more cloud benefits and if minor changes won’t change how their applications function. Also, if a company’s on-premises infrastructure is complex and is preventing scalability and performance, slight modifications that allow it to harness these features in the cloud would be very worthwhile.



The most complex option is refactoring, which includes a more advanced process of rearchitecting and recoding some portion of an existing application. Unlike Replatforming, this option makes major changes in the application configuration and the application code in order to best utilize cloud-native frameworks and functionality. Due to this, refactoring typically offers the lowest monthly cloud costs. Customers who refactor are maximizing operational cost efficiency in the cloud. Unfortunately, this approach is also very time consuming and resource-intensive.

Companies should choose to refactor when there is a strong business need to add features and performance to the application that are only available in the cloud, including scalability and elasticity. Refactoring puts a company in the best position to boost agility and improve business continuity.


There is no migration approach that is always the best option for every case. Rather, companies should take into consideration their short- and long-term business goals and choose what is right for their current situation. If you need help deciding, contact us to discuss options – we’re always happy to talk!



If you’re here, you’re probably experiencing a common issue: trying to access a certain port on an EC2 instance located in a private subnet of a Virtual Private Cloud (VPC). A couple of months ago, we got a call from one of our customers who was experiencing the same issue. They wanted to open up their API servers on the VPC to one of their customers, but they didn’t know how. In particular, they were looking for a solution that wouldn’t compromise the security of their environment. We realized this issue is not unique to our customer, so we thought a blog post explaining how we solved it would be helpful!

To provide some context, once you have an API server within your VPC, it is closed to the outside world. No one can access or reach that server because of the strong firewall around it. There are a few ways around this, including Virtual Private Network (VPN) connections to your VPC, which allow you to open up private access. Unfortunately, this is not a viable solution if you need to open up your API server to the world, which was the case with our customer. The goal was to provide direct access from the internet outside the VPC for any user without a VPN connection.

In order to solve this issue for our customer, one of the architecture changes we recommended was adding an internet-facing AWS TCP Network Load Balancer on the public subnet of the VPC. In addition to this load balancer, we also needed to create an instance-based target group.

Keep reading to learn how you can do this – we even included an architecture diagram to make things easier! Please note that our example includes fake IP addresses.

Problem: Accessing an API endpoint in an EC2 Instance in a Private Subnet from the Internet.

Suggested AWS Architecture Diagram:



Features of our diagram:

  • Multi AZ: we used a private and a public subnet in each of two different availability zones within the same VPC.
  • Multi EC2 (API Servers): we deployed an API server in the private subnet of each availability zone.
  • Multi NAT Gateways: a NAT gateway allows the EC2 instances in the private subnets to connect to the internet. To achieve high availability, we deployed one NAT gateway in the public subnet of each availability zone.
  • TCP Load balancer health checks: a TCP load balancer always routes user requests to the healthy API servers. If one AZ goes down, the other AZ can still handle user requests.

Although we did not make this change, you can also implement Multi-Region to handle a region failure scenario and enable higher availability.

VPC Configurations:






EC2 Configuration:

Name | AZ | Subnet | Private IP | Security Group
API Server1 | us-east-1a | private-subnet-a | 172.16.1.** | Allow inbound traffic to TCP port 5000 from or any specific source IP address on internet.
API Server2 | us-east-1b | private-subnet-b | 172.16.2.** | Allow inbound traffic to TCP port 5000 from or any specific source IP address on internet.



  • Create a TCP network load balancer:
    • Internet facing
    • Add listener on TCP port 5000
    • Choose public subnets in the same availability zones (AZs) as your private subnets
  • Create an instance based target group:
    • Use TCP protocol on port 5000
    • For the health check, use either TCP on port 5000 or an HTTP health check path
    • Add the two API servers to the target instances to achieve high availability and balance the request load between different servers
  • Once the target instances (API servers) become healthy, you will be able to access the API endpoints from the public internet directly using the new TCP load balancer DNS name or elastic IP address on port 5000
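
For readers who prefer to script these steps rather than click through the console, here is a minimal sketch using the boto3 elbv2 client. It is an illustration under assumptions, not the setup we ran for the customer: the function name create_tcp_nlb, the "-tg" suffix, and every ID passed in are hypothetical, and the client is passed in as a parameter so the function can be exercised without AWS access.

```python
def create_tcp_nlb(elbv2, name, public_subnet_ids, vpc_id, instance_ids, port=5000):
    """Create an internet-facing NLB that forwards TCP traffic to instance targets."""
    # Internet-facing network load balancer placed in the public subnets.
    lb = elbv2.create_load_balancer(
        Name=name,
        Subnets=public_subnet_ids,
        Scheme="internet-facing",
        Type="network",
    )
    lb_info = lb["LoadBalancers"][0]

    # Instance-based target group with a TCP health check on the traffic port.
    tg = elbv2.create_target_group(
        Name=f"{name}-tg",
        Protocol="TCP",
        Port=port,
        VpcId=vpc_id,
        TargetType="instance",
        HealthCheckProtocol="TCP",
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # Register the API servers, then forward the TCP listener to the group.
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[{"Id": instance_id} for instance_id in instance_ids],
    )
    elbv2.create_listener(
        LoadBalancerArn=lb_info["LoadBalancerArn"],
        Protocol="TCP",
        Port=port,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
    return lb_info["DNSName"]
```

In real use, the first argument would be boto3.client("elbv2"), and the returned DNS name is the address clients would call on port 5000 once the targets report healthy.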




IDC recently produced a report analyzing the Windows Server Market (access a summary here). The report discovered that more and more organizations are transitioning their Windows workloads to the public cloud. Windows Server Deployments in the cloud more than doubled from 8.8% in 2014 to 17.6% in 2017. Migrating to the cloud allows organizations to grow and work past the limitations of on-premises data centers, providing improved scalability, flexibility, and agility.

The report found that of all Windows Cloud Deployments, 57.7% were hosted on Amazon Web Services (AWS). AWS’s share of the market was nearly 2x that of the nearest cloud provider in 2017. Second in line was Microsoft Azure, which hosted 30.9% of Windows instances deployed on the public cloud IaaS market. Over time, the Windows public cloud IaaS market will continue to expand because of the growing usage of public cloud IaaS among enterprises and the movement of Windows workloads into public cloud IaaS.

When it comes to Windows, AWS helps you build, deploy, scale, and manage Microsoft applications quickly and easily. AWS also has top-notch security and a cost-effective pay-as-you-go model.  On top of this, AWS provides customers with a fully-managed database service to run Microsoft SQL Server. These services give customers the ability to build web, mobile, and custom business applications.

This raises the question: why is AWS the most sought-after cloud hosting service, considering there are other options in the market? We believe it is because of the plethora of services AWS offers. As a software company in particular, if you’re looking to migrate and rearchitect your legacy system, AWS provides an excellent set of services that companies can utilize to make their application cloud native. The richness of these services often allows companies to decouple their legacy architecture without needing to overhaul the entire legacy application within their Windows platform.

Let’s walk through a few examples. For starters, take an application that would traditionally host storage-intensive data such as video, pictures, and documents. This application can keep its Windows platform as is and move the files into Amazon S3 with minimal architecture change. AWS allows its customers to decouple their storage and move it into S3 in a cost-effective, secure, and efficient manner. Next, applications with heavy back-end processing can benefit from services such as AWS Lambda, Amazon EMR, and Spark. AWS allows customers to decouple the compute needs into relevant services for their application. Lastly, imagine an application that hosts significant historical data with requirements to search or query that data. Traditionally, some customers would need to keep all the historical and archived data in a database, but with AWS they can maintain the same architecture, move the archived data to S3, and use services like Amazon Athena and Elasticsearch to query and search this data without the need for a database. This helps to reduce their database footprint and cut down on costs.

Examples like these demonstrate why we believe AWS is a superior cloud-hosting service for Windows workloads. If a company is looking at their architecture holistically, AWS provides a comprehensive solution for compute, networking, storage, security, analytics, and deployment. This has been proven to us time and time again through customer migrations.

dbSeer has a strong track record of helping customers successfully migrate to the AWS Cloud. Contact us anytime to learn more!