Blog

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

AWS customers gain numerous benefits from moving existing applications or building new ones on Elastic Beanstalk. These include capacity provisioning, load balancing, auto-scaling, and application health monitoring. At the same time, customers retain full control over the AWS resources powering their application and are able to access the underlying resources at any time.

AWS Elastic Beanstalk is free to use; AWS customers pay only for the underlying AWS resources used to store and run the application.

AWS customers have two options for moving an existing application to Elastic Beanstalk:

  • Re-host: This is the fastest option. No code changes are required, and it needs less testing and migration time. Re-hosting does not use all the features of the cloud, such as multi-AZ for high availability. The DB stays on-premises and the .NET app is moved to EC2 using Elastic Beanstalk.
  • Re-platform: In this method, the DB server is migrated to the cloud using a manual backup and RDS for SQL Server. A manual backup of the SQL Server database must be created first and then restored into the newly created RDS instance. Next, the RDS connection string is provided to the application, which is then deployed using Elastic Beanstalk. This approach requires more testing and comparison between the old and new environments.

In this blog, we are going to walk through how to move an existing .NET application to AWS using Elastic Beanstalk.

This tutorial will use the following:

Create a .NET application:

You can use your existing .NET application or create a new one. In this tutorial we are using a sample .NET application that you can download here.

  1. Download the application and verify that it runs locally.
  2. Create the Environment – We will create an Elastic Beanstalk environment to deploy the application to AWS. Log in to your AWS console and use the Create New Application wizard in the Elastic Beanstalk console to create the application environment. For Platform, choose .NET.

To launch an environment (console):

  1. Open the Elastic Beanstalk console using this preconfigured link: aws.amazon.com/elasticbeanstalk/home#/newApplication?applicationName=tutorials&environmentType=LoadBalanced
  2. For Platform, choose the platform that matches the language used by your application.
  3. For Application code, choose Sample application.
  4. Choose Review and launch.
  5. Review the available options. When you’re satisfied with them, choose Create app.

 

When the environment is up and running, add an Amazon RDS database instance that the application can use to store data. For DB engine, choose sqlserver-ex.

 

 

Add a DB instance to your environment:

  1. Open the Elastic Beanstalk console.
  2. Navigate to the management page for your environment.
  3. Choose Configuration.
  4. On the Database configuration card, choose Modify.
  5. Choose a DB engine, and enter a user name and password.
  6. Choose Apply.

 

Modify the connection string in the application to use the newly created RDS instance and verify that it is working as expected. In most cases you will want to migrate the existing database to RDS. For more information on this, see here, here, or here.
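For illustration, a SQL Server connection string pointing at RDS usually ends up looking something like the sketch below in Web.config. The connection string name, endpoint, database name, and credentials here are placeholders; use the values from your own RDS instance.

    <connectionStrings>
      <!-- Hypothetical values; replace with your RDS endpoint, database, and credentials -->
      <add name="DefaultConnection"
           connectionString="Data Source=mydbinstance.abc123xyz.us-east-1.rds.amazonaws.com,1433;Initial Catalog=MyAppDb;User ID=dbadmin;Password=YOUR_PASSWORD;"
           providerName="System.Data.SqlClient" />
    </connectionStrings>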

Deploy the application to AWS using AWS Elastic Beanstalk

  1. In Visual Studio, open BeanstalkDotNetSample.sln.
    1. Note: If you haven’t done so already, you can get the sample here.
  2. On the View menu, choose Solution Explorer.
  3. Expand Solution ‘BeanstalkDotNetSample’ (2 projects).
  4. Open the context (right-click) menu for MVC5App, and then choose Publish to AWS.
  5. Add your AWS account credentials by selecting Account profile to use.
  6. Select Redeploy to an existing environment. You should see the Elastic Beanstalk environment created earlier.
  7. On the Application Options page, accept all of the defaults, and then choose Next.
  8. On the Review page, choose Deploy.
  9. Monitor the deployment status in the Output box.
  10. When the application has been deployed successfully, the Output box displays "completed successfully".
  11. Return to the AWS Elastic Beanstalk console and choose the name of the application, which appears next to the environment name. 

 

If you follow these simple steps, you can easily migrate your .NET applications to AWS using AWS Elastic Beanstalk.

When you are finished working with AWS Elastic Beanstalk, you can terminate your .NET environment.

How to Terminate your AWS Elastic Beanstalk environment:

  1. Open the Elastic Beanstalk console.
  2. Navigate to the management page for your environment.
  3. Choose Actions and then choose Terminate Environment.

Elastic Beanstalk cleans up all of the AWS resources associated with your environment. This includes EC2 instances, DB instance, load balancer, security groups, CloudWatch alarms, and more.

 

In this tutorial, we created a new .NET application and an RDS SQL Server instance and deployed them to AWS using AWS Elastic Beanstalk. Following these steps, you can deploy your existing .NET application and migrate the DB to AWS RDS to get the benefits of high availability and scalability. Alternatively, you can keep your existing on-premises database servers and still gain the benefits of AWS scalability and highly available EC2 instances with AWS Elastic Load Balancing, security, live monitoring, and more.

Using AWS Elastic Beanstalk you can easily deploy your applications and monitor them afterwards by viewing the logs. You can then scale up or down based on your application needs.

Did you try this? Was it helpful? Let us know in the comments!

 

 

 


Security is often the number one concern of our clients, especially when moving their data and applications to the cloud. The public cloud operates on a shared responsibility model. This means that the customer's cloud provider (for example, AWS) is responsible for security of the cloud, and the customer is responsible for security within the cloud. This distinction can get confusing for new customers, leaving them wondering what they are really responsible for when it comes to security. To help, we have walked through six simple ways to secure your RDS architecture below.

  1.  Build your database instance in Amazon Virtual Private Cloud

Amazon Virtual Private Cloud (VPC) gives you the greatest possible network access control. With Amazon VPC, you have control over your virtual networking environment. For example, you can create subnets, select your own IP address range, and configure routing and access controls. Amazon RDS functionality is the same whether or not your DB instance is running in an Amazon VPC, and there is no additional cost.

 

  2. Encrypt your RDS Resources

You can use RDS encryption to secure your RDS instances and snapshots at rest. RDS encryption uses the industry-standard AES-256 encryption algorithm to encrypt your data on the server that hosts your RDS instance. Data that is encrypted at rest includes the underlying storage for a DB instance, its automated backups, read replicas, and snapshots.
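Note that encryption at rest must be enabled when the instance is created. As a rough sketch with the AWS CLI (the identifier, instance class, engine, and credentials below are placeholders), this looks like:

    aws rds create-db-instance \
        --db-instance-identifier mydbinstance \
        --db-instance-class db.t3.small \
        --engine mysql \
        --allocated-storage 20 \
        --master-username dbadmin \
        --master-user-password YOUR_PASSWORD \
        --storage-encrypted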

 

  3. Encrypt Data in Transit using Secure Sockets Layer

You can use Secure Sockets Layer (SSL) connections with DB instances running the MySQL, MariaDB, PostgreSQL, Oracle, or Microsoft SQL Server database engines. Each database engine has a different process for implementing SSL, but you can see step-by-step instructions for each engine here.

 

  4. Use AWS Identity and Access Management

AWS Identity and Access Management (IAM) policies are used to assign permissions. These determine who is allowed to manage RDS resources. You can set different permissions for who can create, describe, modify, and delete DB instances, as well as tag resources or modify security groups.
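As a rough illustration, a policy that only allows describing and rebooting a specific DB instance might look like the sketch below; the actions and the resource ARN are examples only, so tailor them to your own account and requirements.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "rds:DescribeDBInstances",
            "rds:RebootDBInstance"
          ],
          "Resource": "arn:aws:rds:us-east-1:123456789012:db:mydbinstance"
        }
      ]
    }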

 

  5. Assign Proper Security Groups

You should use security groups to manage which Amazon EC2 instances or IP addresses can connect to the databases on a DB instance. When a DB instance is first created, its firewall prevents any database access except through rules defined by an associated security group.
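For example, with the AWS CLI you could allow only your application servers' security group to reach a SQL Server instance on port 1433; the group IDs below are placeholders.

    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp \
        --port 1433 \
        --source-group sg-0fedcba9876543210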

 

  6. Implement Network Encryption

Network encryption and transparent data encryption with Oracle database instances can be used to improve the security of your RDS architecture. With native network encryption, you can encrypt data as it moves to and from a DB instance. Oracle transparent data encryption automatically encrypts data before it is written to storage and automatically decrypts data when it is read from storage.

 

We hope this blog gave you some fresh ideas on how to secure your RDS architecture! Let us know if you have any questions or issues.


A common scenario that companies face is controlling the usage of the Amazon RDS database instances they run for development or testing environments. Stopping these RDS instances when they are not being used can significantly lower the company's AWS costs.

One way to stop an RDS instance on a schedule is to use Amazon CloudWatch Events in conjunction with an AWS Lambda function written in Python 2.7.

In this example, we have RDS development databases that we need to stop every day at 10 PM EST.

Below are the four steps required to do this:

Step 1 – Create IAM Role/Policy: AWS Lambda will need to assume a role that has access to AWS services like RDS and CloudWatch.

IAM Policy: First we need to create a new policy that we will attach to the IAM role later:

  • From the AWS console, choose the IAM service and click on Policies
  • Click on the "Create Policy" button
  • Click on the JSON tab and use the JSON below to allow access to the required actions in the RDS and CloudWatch Logs services
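A policy along these lines should work; this is a sketch, and the exact actions you grant (the RDS describe/stop actions plus the CloudWatch Logs actions Lambda needs to write its logs) and the resources you scope them to are up to you:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "rds:DescribeDBInstances",
            "rds:StopDBInstance"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "*"
        }
      ]
    }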

  • Click on the “Review Policy” button.
  • Provide a name for the policy “policy_stop_rds” and some description
  • Click “Create Policy”

 

IAM Role:

  • From AWS console, choose IAM service and click on Roles
  • Click on “Create Role” button
  • Choose “Lambda” as the service that will use the role and click “Next: Permissions”
  • Search for the policy we created in the previous step ("policy_stop_rds"), check it, and then click "Next: Tags"
  • Add any tag key value pairs needed and then click “Next: Review”
  • Choose a Role name “lambda-stop-rds” and then click “Create role”

Step 2 – Create AWS Lambda Function:

  • From the AWS console, choose the Lambda service. Make sure that you're in the same region where your RDS instances reside
  • Click on “Create Function” and choose “Author from scratch”.
    • Name: rdsInstanceStop
    • Runtime: choose python2.7
    • Role: choose existing role “lambda-stop-rds”

  • The Lambda function will now be created.

  • In another tab, open the IAM role we created in the previous step ("lambda-stop-rds")
  • Add the Lambda function ARN to the inline policies of the IAM role we created in the first step (you can copy the ARN from the top right corner of the Lambda function page)
    • Click on "Add inline policy", add the JSON below (replacing the Resource with the ARN from the step above), and save the policy with the name "policy_inline_stop_rds"
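The original policy text isn't reproduced here; one plausible sketch, assuming the intent is simply to allow invoking the function, is shown below (the Resource ARN is a placeholder for your function's ARN):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "lambda:InvokeFunction",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:rdsInstanceStop"
        }
      ]
    }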

  • Designer: From the Lambda function page, make sure that the resources section has "AWS Lambda" added.

  • Function Code: Add the following function code to the Lambda function:
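The code listing isn't included in this post, so below is a minimal sketch of what such a function can look like using boto3. It assumes a hypothetical environment variable named DB_INSTANCE_NAME (see the Environment variables bullet below) that holds the RDS instance identifier:

    import os
    import boto3

    rds = boto3.client('rds')

    def lambda_handler(event, context):
        # DB_INSTANCE_NAME is a hypothetical variable name; set it on the
        # Lambda function to the identifier of the RDS instance to stop
        db_instance = os.environ['DB_INSTANCE_NAME']
        print('Stopping RDS instance: {}'.format(db_instance))
        # stop_db_instance raises an error if the instance is not "available"
        response = rds.stop_db_instance(DBInstanceIdentifier=db_instance)
        return response['DBInstance']['DBInstanceStatus']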

  • Environment variables: To make this Lambda function reusable in other environments, we use an environment variable to define the DB instance name (the sketch above assumes it is called DB_INSTANCE_NAME)

  • Tags: Add a key value pairs of tags as needed
  • Execution Role: use the existing execution role “lambda-stop-rds”
  • Keep all other settings with default values

 

Step 3 – Test the Lambda function:

  • In the top right corner of the screen, select Test and then Configure test events
  • Select Create new test event and select the Hello World event template
  • When you click Save, the execution should succeed
    • If your DB instance is not running, there is nothing to stop, so you will get an error message similar to "Instance <> is not in available state."

 

Step 4 – Schedule the run of the Lambda function:

  • From the AWS console, choose CloudWatch and click on Rules
  • Click on “Create rule”
  • Choose "Schedule" and use the cron expression "0 3 * * ? *" to run the event every day at 10 PM EST (CloudWatch cron expressions are evaluated in UTC, and 10 PM EST is 3 AM UTC).

  • Add the Lambda function we created previously as a target by clicking "Add target", and then click on "Configure details"

  • Fill in the rule name, e.g. "stop rds", and the description
  • Click Create rule
  • After this, wait until 10 PM EST to see if the Lambda function gets triggered. You can check this by going to CloudWatch Logs under the log group "/aws/lambda/rdsInstanceStop"

 

We hope this was helpful for you! Let us know if you have any issues or recommendations.

 


As part of the digital transformation, companies are moving their infrastructure and applications to the cloud at a faster rate than ever.

There are many approaches to migrating to the cloud – each with its own benefits and drawbacks. It's important to be knowledgeable about each option in order to make the best decision possible for your organization.

The three primary choices are Rehost, Replatform, and Refactor, which we will walk through now.

 

 

 

Rehost

Rehosting, often also called lift and shift, is the simplest of the migration options. Applications are simply moved from on-premises to the cloud without any code modification. This is considered a beginner's approach to migration. Some of the benefits include that it's a very fast option and requires very few resources. There is also minimal application disruption, and it is cheaper than maintaining an on-premises environment. Because this migration is so simple, companies don't typically benefit from cloud-native features like elasticity, which can be achieved with the other migration techniques.

Overall, if a company is looking for a quick and easy migration that doesn’t disrupt the existing application workflow, Rehosting is the best choice. This is a fast solution for organizations that need to reduce their on-premises physical infrastructure costs as soon as possible. Thankfully, companies can always re-architect and optimize their application once they are already in the cloud.

 

Replatform

Replatforming involves moving a company’s assets to the cloud with a little up-versioning. A portion of the application is changed or optimized before moving to the cloud. Even a small amount of cloud optimization (without changing the core application structure) can lead to significant benefits. This approach takes advantage of containers and VMs, only changing application code if needed to use base platform services.

Replatform provides a suitable middle ground between rehosting and refactoring. It allows companies to take advantage of cloud functionality and cost optimization without using the resources required for refactoring. This approach also allows developers to use the resources they are used to working with, including development frameworks and legacy programming languages. This approach is slower than rehosting and doesn’t provide as many benefits as refactoring.

Organizations should choose this approach if they are looking to leverage more cloud benefits and if minor changes won't alter their application's functioning. Also, if a company's on-premises infrastructure is complex and is preventing scalability and performance, some slight modifications that allow them to harness these features in the cloud would be very worthwhile.

 

Refactor

The most complex option is refactoring, which includes a more advanced process of rearchitecting and recoding some portion of an existing application. Unlike Replatforming, this option makes major changes in the application configuration and the application code in order to best utilize cloud-native frameworks and functionality. Due to this, refactoring typically offers the lowest monthly cloud costs. Customers who refactor are maximizing operational cost efficiency in the cloud. Unfortunately, this approach is also very time consuming and resource-intensive.

Companies should choose to refactor when there is a strong business need to add features and performance to the application that is only available in the cloud, including scalability and elasticity. Refactoring puts a company in the best position to boost agility and improve business continuity.

 

There is no migration approach that is always the best option for every case. Rather, companies should take into consideration their short- and long-term business goals and choose what is right for their current situation. If you need help deciding, contact us to discuss options – we’re always happy to talk!

 


If you’re here, you’re probably experiencing a common issue: trying to access a certain port on an EC2 Instance located in a private subnet of the Virtual Private Cloud (VPC). A couple of months ago, we got a call from one of our customers that was experiencing the same issue. They wanted to open up their API servers on the VPC to one of their customers, but they didn’t know how. In particular, they were looking for a solution that wouldn’t compromise the security of their environment. We realized this issue is not unique to our customer, so we thought a blog post explaining how we solved it would be helpful!

To provide some context, once you have an API server within your VPC, it is closed to the outside world. No one can access or reach that server because of the strong firewall around it. There are a few ways around this, including Virtual Private Network (VPN) connections to your VPC, which allows you to open up private access. Unfortunately, this is not a viable solution if you need to open up your API server to the world, which was the case with our customer. The goal was to provide direct access from the internet outside the VPC for any user without VPN connection.

In order to solve this issue for our customer, one of the architecture changes we recommended was adding an internet-facing AWS TCP Network Load Balancer on the public subnet of the VPC. In addition to this load balancer, we also needed to create an instance-based target group.

Keep reading to learn how you can do this – we even included an architecture diagram to make things easier! Please note that our example includes fake IP addresses.

Problem: Accessing an API endpoint in an EC2 Instance in a Private Subnet from the Internet.

Suggested AWS Architecture Diagram:

 

 

Features of our diagram:

  • Multi AZ: we used a private and public subnet in the same VPC in two different availability zones.
  • Multi EC2 (API Servers): we deployed an API server in each private subnet in each availability zone.
  • Multi NAT Gateways: a NAT gateway will allow the EC2 instances in the private subnets to connect to the internet and achieve high availability. We deployed one NAT gateway in the public subnets in each availability zone.
  • TCP Load balancer health checks: a TCP load balancer will always redirect any user’s requests to the healthy API servers. In case one AZ goes down, there will be another AZ that can handle any user’s requests.

Although we did not make this change, you can also implement Multi-Region to handle a region failure scenario and enable higher availability.

VPC Configurations:

Subnet | AZ | CIDR | IGW Route Out | NAT GW Route Out
public-subnet-a | us-east-1a | 172.16.0.0/24 | Yes | No
public-subnet-b | us-east-1b | 172.16.3.0/24 | Yes | No
private-subnet-a | us-east-1a | 172.16.1.0/24 | No | Yes
private-subnet-b | us-east-1b | 172.16.2.0/24 | No | Yes

 

EC2 Configuration:

Name | AZ | Subnet | Private IP | Security Group
API Server1 | us-east-1a | private-subnet-a | 172.16.1.** | Allow inbound traffic to TCP port 5000 from 0.0.0.0/0 or any specific source IP address on the internet
API Server2 | us-east-1b | private-subnet-b | 172.16.2.** | Allow inbound traffic to TCP port 5000 from 0.0.0.0/0 or any specific source IP address on the internet

 

Solution:

  • Create a TCP network load balancer:
    • Internet facing
    • Add listener on TCP port 5000
    • Choose public subnets in the same availability zones (AZs) as your private subnets
  • Create an instance based target group:
    • Use TCP protocol on port 5000
    • For health check, either use TCP on port 5000 or HTTP health check path
    • Add the two API servers to the target instances to achieve high availability and balance the request load between different servers
  • Once the target instances (API servers) become healthy, you will be able to access the API endpoints from the public internet directly using the new TCP load balancer's DNS name or Elastic IP address on port 5000
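If you prefer the AWS CLI over the console, the same setup can be sketched roughly as follows; the subnet IDs, VPC ID, instance IDs, and ARNs are placeholders for your own values.

    # Internet-facing network load balancer in the two public subnets
    aws elbv2 create-load-balancer --name api-nlb --type network \
        --scheme internet-facing \
        --subnets subnet-0aaa1111 subnet-0bbb2222

    # Instance-based target group on TCP port 5000
    aws elbv2 create-target-group --name api-targets \
        --protocol TCP --port 5000 --target-type instance \
        --vpc-id vpc-0ccc3333

    # Register the two API servers as targets
    aws elbv2 register-targets --target-group-arn <target-group-arn> \
        --targets Id=i-0ddd4444 Id=i-0eee5555

    # Listener forwarding TCP 5000 to the target group
    aws elbv2 create-listener --load-balancer-arn <load-balancer-arn> \
        --protocol TCP --port 5000 \
        --default-actions Type=forward,TargetGroupArn=<target-group-arn>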

 

References:

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/create-network-load-balancer.html


IDC recently produced a report analyzing the Windows Server Market (access a summary here). The report discovered that more and more organizations are transitioning their Windows workloads to the public cloud. Windows Server Deployments in the cloud more than doubled from 8.8% in 2014 to 17.6% in 2017. Migrating to the cloud allows organizations to grow and work past the limitations of on-premises data centers, providing improved scalability, flexibility, and agility.

The report found that of all Windows Cloud Deployments, 57.7% were hosted on Amazon Web Services (AWS). AWS’s share of the market was nearly 2x that of the nearest cloud provider in 2017. Second in line was Microsoft Azure, which hosted 30.9% of Windows instances deployed on the public cloud IaaS market. Over time, the Windows public cloud IaaS market will continue to expand because of the growing usage of public cloud IaaS among enterprises and the movement of Windows workloads into public cloud IaaS.

When it comes to Windows, AWS helps you build, deploy, scale, and manage Microsoft applications quickly and easily. AWS also has top-notch security and a cost-effective pay-as-you-go model.  On top of this, AWS provides customers with a fully-managed database service to run Microsoft SQL Server. These services give customers the ability to build web, mobile, and custom business applications.

This begs the question: why is AWS the most sought-after cloud hosting service, considering there are other options in the market? We believe it is because of the plethora of services AWS offers. As a software company in particular, if you're looking to migrate and rearchitect your legacy system, AWS provides an excellent set of services that companies can utilize to make their application cloud native. The richness of these services often allows companies to decouple their legacy architecture without needing to overhaul the entire legacy application within their Windows platform.

Let’s walk through a few examples. For starters, take an application that would traditionally host storage-intensive data such as video, pictures, and documents. This application can keep its Windows platform as is and move the files into Amazon S3 with minimal architecture change. AWS allows its customers to decouple their storage and move it into S3 in a cost-effective, secure, and efficient manner. Next, applications with heavy back-end processing can benefit from services such as lambda, EMR, and Spark. AWS allows customers to decouple the compute needs into relevant services for their application. Lastly, imagine an application that hosts significant historical data with requirements to search or query that data. Traditionally, some customers would need to keep all the historical and archived data in a database, but with AWS they can maintain the same architecture and move the archived data to S3 and use services like Amazon Athena and Elasticsearch to query and search this data without a need for a database. This helps to reduce their footprint on a database and cut down on costs.

Examples like these demonstrate why we believe AWS is a superior cloud-hosting service for Windows workloads. If a company is looking at their architecture holistically, AWS provides a comprehensive solution for compute, networking, storage, security, analytics, and deployment. This has been proven to us time and time again through customer migrations.

dbSeer has a strong track record of helping customers successfully migrate to the AWS Cloud. Contact us anytime to learn more!


Earlier this year, we wrote a blog about how to use AWS Auto Scaling with Logi Analytics Applications. In that blog, we promised to release a step-by-step guide outlining the technical details of how a Logi Application can be configured to harness the scalability and elasticity features of AWS. If you were wondering when that would be released, the wait is over and you have come to the right place! Without further ado…

Enabling a multi-web-server Logi application on AWS Windows instances requires the right configuration for some of the shared Logi files (cache files, secure key, bookmarks, etc.). To support these shared files, we need a shared network drive that can be accessed by the different Logi web servers. Currently, EFS (Elastic File System) is not natively supported on Windows on AWS. Below, we describe how EFS can be mounted and set up for Windows servers so that you can utilize the scalability features of Logi.

Setting Up the File Server
Overview:
In order for our distributed Logi application to function properly, it needs access to a shared file location. This can be easily implemented with Amazon's Elastic File System (EFS). However, if you're using a Windows server to run your Logi application, extra steps are necessary, as Windows does not currently support EFS drives. To get around this constraint, it is necessary to create Linux-based EC2 instances to serve as in-between file servers. The EFS volumes will be mounted on these instances, and our Windows servers will then access the files via the Samba (SMB) protocol.

Steps:

  1. Create EC2
    • Follow the steps as outlined in this AWS Get Started guide and choose:
      • Image: “Ubuntu Server 16.04 LTS (HVM), SSD Volume Type”
      • Create an Instance with desired type e.g.: “t2.micro”
  2. Create AWS EFS volume:
    • Follow the steps listed here and use same VPC and availability zone as used above
  3. Set up AWS EFS inside the EC2:
    • Connect to the EC2 instance we created in Step 1 using SSH
    • Mount the EFS to the EC2 using the following commands:
      sudo apt-get install -y nfs-common
      sudo mkdir /mnt/efs
      sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 EFS_IP_ADDRESS_HERE:/ /mnt/efs
  4. Re-export the NFS share to be used by Windows:
    • Give our Windows users access to the files. We will do this using Samba (SMB).
    • Run the following commands to install the Samba services in your Ubuntu EC2 instance:
      sudo apt-get install -y samba samba-common python-glade2 system-config-samba
      sudo cp -pf /etc/samba/smb.conf /etc/samba/smb.conf.bak
      sudo sh -c 'cat /dev/null > /etc/samba/smb.conf'
      sudo nano /etc/samba/smb.conf
    • And then, paste the text below inside the smb.conf file:
      [global]
      workgroup = WORKGROUP
      server string = AWS-EFS-Windows
      netbios name = ubuntu
      dns proxy = no
      socket options = TCP_NODELAY

      [efs]
      path = /mnt/efs
      read only = no
      browseable = yes
      guest ok = yes
      writeable = yes
    • Create a Samba user/password. Use the same credentials as your EC2 user
      sudo smbpasswd -a ubuntu
    • Give Ubuntu user access to the mounted folder:
      sudo chown ubuntu:ubuntu /mnt/efs/
    • And finally, restart the samba service:
      sudo /etc/init.d/smbd restart

 

Setting up the Application Server

Overview:
Logi applications require setup in the form of settings, files, licenses, and more. In order to accommodate the elastic auto-scaling, we’ll set up one server – from creation to connecting to our shared drive to installing and configuring Logi – and then make an Amazon Machine Image (AMI) for use later.

Steps:

  1. Create EC2:
    • Follow the steps as outlined in this AWS Get Started guide and choose:
      • Image: “Microsoft Windows Server 2016 Base”
      • Instance type: “t2.micro” or whatever type your application requires
  2. Deploy code:
    • Clone your project repository and deploy the code in IIS
  3. Set Up User Access:
    • Allow your application in IIS to access the shared folder (EFS) that we created inside the File server
    • From the Control Panel, choose User Accounts → Manage another account → Add a user account
    • Use same username and password we created for the samba user in Ubuntu file server
    • In IIS, add the new Windows user we created above to the application pool identity: IIS → Application Pools → right-click on your project's application pool → Identity → Custom account → fill in the new username and password we created earlier.
  4. Test EFS (shared folder) connection:
    • To test the connection between the Windows application server and the Ubuntu file server, go to:
      • This PC → Computer tab → Map network drive → in the Folder textbox, type "\\FILE_SERVER_IP_ADDRESS\efs" → if a credentials window appears, use the new username and password we created earlier.
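Equivalently, you can test the mapping from a command prompt; the drive letter is arbitrary, the IP address is a placeholder, and the credentials are the Samba user created earlier:

    net use Z: \\FILE_SERVER_IP_ADDRESS\efs YOUR_SAMBA_PASSWORD /user:ubuntu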

 

Configuring the Logi Application

Sticky and Non-Sticky Sessions
In a standard environment with one server, a session is established with the first HTTP request and all subsequent requests, for the life of the session, will be handled by that same server. However, in a load-balanced or clustered environment, there are two possibilities for handling requests: “sticky” sessions (sometimes called session affinity) and “non-sticky” sessions.

In a load-balanced Logi deployment, we use sticky sessions to handle HTTP requests, centralize the location of any shared resources, and manage session state. You must create a centralized, shared location for cached data (the rdDataCache folder), saved Bookmark files, the _metaData folder, and saved Dashboard files, because they must be accessible to all servers in the cluster.

Managing Session State
IIS is configured by default to manage session information using the “InProc” option. For both standalone and load-balanced, sticky environments, this option allows a single server to manage the session information for the life of the session.

Centralization of Application Resources
In a load-balanced environment, each web server must have Logi Server installed and properly licensed, and must have its own copy of the Logi application with its folder structure, system files, etc. This includes everything in the _SupportFiles folder such as images, style sheets, XML data files, etc., any custom themes, and any HTML or script files. We will achieve this by creating one instance with all the proper configurations, and then using an AMI.

Some application files should be centralized, which also allows for easier configuration management. These files include:

Definitions: Copies of report, process, widget, template, and any other necessary definitions (except _Settings) can be installed on each web server as part of the application, or centralized definitions may be used for easier maintenance (if desired).

The location of definitions is configured in the _Settings definition, using the Path element's Alternative Definition Folder attribute, as shown above. This should be set to the UNC path of a shared network location accessible by all web servers, and the attribute value should include the _Definitions folder. Physically, within that folder, you should create the folders _Reports, _Processes, _Widgets, and _Templates as necessary. Do not include the _Settings definition in any alternate location; it must remain in the application folder on the web server as usual.

“Saved” Files: Many super-elements, such as the Dashboard and Analysis Grid, allow the user to save the current configuration to a file for later reuse. The locations of these files are specified in attributes of the elements.

As shown in the example above, the Save File attribute value should be the UNC path to a shared network location (with file name, if applicable) accessible by all web servers.

Bookmarks: If used in an application, the location of these files should also be centralized:

As shown above, in the _Settings definition, configure the General element’s Bookmark Folder Location attribute, with a UNC path to a shared network folder accessible by all web servers.

 

Using SecureKey security:
If you’re using Logi SecureKey security in a load-balanced environment, you need to configure security to share requests.

In the _Settings definition, set the Security element's SecureKey Shared Folder attribute to a network path, as shown above. Files in the SecureKey folder are automatically deleted over time, so do not use this folder to store other files. You must create the folder rdSecureKey under the myProject shared folder yourself, since it is not auto-created by Logi.

Note: "Authentication Client Addresses" must be replaced later with the subnet IP address ranges of the load balancer VPC after completing the load balancer setup below.
You can specify ranges of IP addresses with wildcards. To use wildcards, specify an IP address, a space character, then the wildcard mask. For example, to allow all addresses in the range 172.16.*.*, specify:
172.16.0.0 0.0.255.255

Centralizing the Data Cache
The data cache repository is, by default, the rdDataCache folder in a Logi application’s root folder. In a standalone environment, where all the requests are processed by the same server, this default cache configuration is sufficient.

In a load-balanced environment, centralizing the data cache repository is required.

This is accomplished in Studio by editing a Logi application’s _Settings definition, as shown above. The General element’s Data Cache Location attribute value should be set to the UNC path of a shared network location accessible by all web servers. This change should be made in the _Settings definition for each instance of the Logi application (i.e. on each web server).
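We can't reproduce the screenshot here, but conceptually the General element in _Settings ends up with an entry along these lines; the attribute spelling below is illustrative (confirm the exact name in Logi Studio or the Logi documentation), and the server and folder names are the placeholders used elsewhere in this post:

    <General DataCacheLocation="\\mySharedFileServer\myProject\rdDataCache" ... />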

Note: The "mySharedFileServer" IP/DNS address should be replaced later with the file servers' load balancer DNS name after completing the load balancer setup below.

 

Creating and Configuring Your Load-Balancer

Overview:
You’ll need to set up load balancers for both the Linux file server and the Windows application/web server. This process is relatively simple and is outlined below, and in the Getting Started guide here.

Steps:

  1. Windows application/web servers load balancer:
    • Use classic load balancers.
    • Use the same VPC that our Ubuntu file servers use.
    • Listener configuration: Keep defaults.
    • Health check configuration: Keep the defaults and make sure that the ping path exists, e.g. "/myProject/rdlogon.aspx"
    • Add Instances: Add all Windows web/application servers to the load balancer, and check the status. All servers should show "InService" within 20-30 seconds.
    • To enable stickiness, select the ELB > port configuration > edit stickiness > choose "enable load balancer generated cookie stickiness", and set the expiration period as well.
  2. Linux file servers load balancer:
    • Use classic load balancers.
    • Use the same VPC that the EFS volume uses.
    • Listener configuration: 
    • Health check configuration: Keep the defaults and make sure that the ping path exists, e.g. "/index.html"
      • NOTE: A simple web application must be deployed to the Linux file servers in order to support the health check. It should run inside a web container like Tomcat; then modify the health checker's ping path to point to the deployed application path.
    • Add Instances: Add all Ubuntu file servers to the load balancer, and check the status. All servers should show "InService" within 20-30 seconds.

 

Using Auto-Scaling

Overview:
In order to achieve auto-scaling, you need to set up a Launch Template and an Auto-Scaling Group. You can follow the steps in the link here, or the ones outlined below.

Steps:

  1. Create Launch Configuration:
    • Search and select the AMI that you created above. 
    • Use the same security group you used for your (Windows) application server EC2 instance.
  2. Create an Auto Scaling Group
    • Make sure to select the launch configuration that we created above.
    • Make sure to set the group size, i.e. how many EC2 instances you want to have in the Auto Scaling group at all times.
    • Make sure to use same VPC we used for the Windows application server EC2s.
    • Set the Auto scaling policies:
      • Set min/max size of the group:
        • Min: minimum number of instances that will be launched at all times.
        • Max: maximum number of instances that will be launched once a metric condition is met.
    • Click on “Scale the Auto Scaling group using step or simple scaling policies” 
    • Set the required values for:
      • Increase group size
        • Make sure that you create a new alarm that will notify your auto scaling group when the CPU utilization exceeds certain limits.
        • Make sure that you specify the action "add" and the number of instances that we want to add when the above alarm is triggered.
      • Decrease group size
        • Make sure that you create a new alarm that will notify your auto scaling group when the CPU utilization is below certain limits.
        • Make sure that you specify the action ("remove") and the number of instances that we want to remove when the above alarm is triggered.
    • You can set the warm up time for the EC2, if necessary. This will depend on whether you have any initialization tasks that run after launching the EC2 instance, and if you want to wait for them to finish before starting to use the newly created instance.
    • You can also add a notification service to know when any instance is launched, terminated, failed to launch or failed to terminate by the auto scaling process. 
    • Add tags to the auto scaling group. You can optionally choose to apply these tags to the instances in the group when they launch. 
    • Review your settings and then click on Create Auto Scaling Group.

 

We hope this detailed how-to guide was helpful in helping you set up your Logi Application on AWS. Please reach out if you have any questions or have any other how-to guide requests. We’re always happy to hear from you!

 

References:

https://en.wikipedia.org/wiki/Load_balancing_(computing)

https://devnet.logianalytics.com/rdPage.aspx?rdReport=Article&dnDocID=2222

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/EC2_GetStarted.html

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancer-getting-started.html

https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html

https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-load-balancer.html

https://docs.aws.amazon.com/toolkit-for-visual-studio/latest/user-guide/tkv-create-ami-from-instance.html

 

 


We recently set up a Spark SQL (Spark) cluster and decided to run some tests to compare the performance of Spark and Amazon Redshift. For our benchmarking, we ran four different queries: one filtration based, one aggregation based, one select-join, and one select-join with multiple subqueries. We ran these queries on both Spark and Redshift on datasets ranging from six million rows to 450 million rows.

These tests led to some interesting discoveries. First, when performing filtration and aggregation based queries on datasets larger than 50 million rows, our Spark cluster began to out-perform our Redshift database. As the size of the data increased, so did the performance difference. However, when performing the table joins – and in particular the multiple table scans involved in our subqueries – Redshift outperformed Spark (even on a table with hundreds of millions of rows!).

 

 

The Details

Our largest Spark cluster utilized an m1.large master node and six m3.xlarge worker nodes. Our aggregation test ran between 2x and 3x faster on datasets larger than 50 million rows. Our filtration test showed a similar level of gains, starting at a similar data size.

However, our table scan testing showed the opposite effect at all levels of data size. Redshift generally ran slightly faster than our Spark cluster, the difference in performance increasing as the volume increased, until it capped out at 3x faster on a 450 million row dataset. This result was further corroborated by what we know about Spark. Even with its strengths over Map Reduce computing models, data shuffles are slow and CPU expensive. 

We also discovered that if you attempt to cache a table into RAM and there’s overflow, Spark will write the extra blocks to disk, but the read time when coming off of disk is slower than reading the data directly from S3. On top of this, when the cache was completely full, we started to experience memory crashes at runtime because Spark didn’t have enough execution memory to complete the query. When the executors crashed, things slowed to a standstill. Spark scrambled to create another executor, move the necessary data, recreate the result set, etc. Even though uncached queries were slower than the cached ones, this ceases to be true when Spark runs out of RAM.

Takeaway

In terms of power, Spark is unparalleled at performing filtration and aggregation of large datasets. But when it comes to scanning and re-ordering data, Redshift still has Spark beat. This is just our two cents – let us know what you guys think!


Spark SQL (Spark) provides powerful data querying capabilities without the need for a traditional relational database. While the AWS Elastic MapReduce (EMR) service has made it easier to create and manage your own cluster, it can still be difficult to set up your data and configure your cluster properly so that you get the most out of your EC2 instance hours.

This blog will discuss some of the lessons that we learned when setting up our Spark cluster. Hopefully we can spare you some of the time we spent setting ours up! While we believe these best practices should apply to the majority of users, bear in mind that each use case is different and you may need to make small adaptations to fit your project.

 

 

First Things First: Data Storage

Unsurprisingly, one of the most important questions when querying your data is how that data is stored. We learned that Parquet files perform significantly better than the CSV file format. Parquet is a columnar data storage format commonly used within the Hadoop ecosystem. Spark reads Parquet files much faster than text files due to its capability to perform low-level filtration and create more efficient execution plans. When running on 56 million rows of our data stored as Parquet instead of CSV, speed was dramatically increased while storage space was decreased (see the table below).

Storing Data as Parquet instead of CSV

Partitioning the Data

Onto the next step: partitioning the data within your data store. When reading from S3, Spark takes advantage of partitioning systems and requests only the data that is required for the query. Assuming that you can partition your data along a frequently used filter (such as year, month, date, etc.), you can significantly reduce the amount of time Spark must spend requesting your dataset. There is a cost associated with this, though – the data itself must be partitioned according to the same scheme within your S3 bucket. For example, if you’re partitioning by year, all of your data for 2017 must be located at s3://your-bucket/path/to/data/2017, and all of your data for 2016 must be located at s3://your-bucket/path/to/data/2016, and so on.

You can also form multiple hierarchical layers of partitioning this way (s3://…/2016/01/01 for January 1st, 2016). The downside is that you can’t partition your data in more than one way within one table/bucket and you have to physically partition the data along those lines within the actual filesystem.
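As a concrete sketch, reading and querying a partitioned Parquet dataset in Spark SQL might look like the snippet below. The bucket path and column names are hypothetical, and note that Spark's automatic partition discovery expects Hive-style directory names such as year=2017/month=01; with plain date paths like /2017/01/01 you would register the partitions explicitly instead.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("partitioned-read").getOrCreate()

    # Spark discovers the year/month/day partition columns from the directory layout
    df = spark.read.parquet("s3://your-bucket/path/to/data/")
    df.createOrReplaceTempView("events")

    # Only the partitions for 2017 are requested from S3 for this query
    spark.sql("SELECT COUNT(*) FROM events WHERE year = 2017").show()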

Despite these small drawbacks, partitioning your data is worth the time and effort. Although our aggregation query test ran in relatively the same time on both partitioned and unpartitioned data, our filtration query ran 7x faster, and our table scan test ran 6x faster.

Caching the Data

Next, you should cache your data in Spark when you can fit it into the RAM of your cluster. When you run the command ‘CACHE TABLE tablename’ in Spark, it will attempt to load the entire table into the storage memory of the cluster. Why into the storage memory? Because Spark doesn’t have access to 100% of the memory on your machines (something you likely would not want anyways), and because Spark doesn’t use all of the accessible memory for storing cached data. Spark on Amazon EMR is only granted as much RAM as a YARN application, and then Spark reserves about one third of the RAM that it’s given for execution memory, (i.e. temporary storage used while performing reads, shuffles, etc.). As a rule of thumb, the actual cache storage capacity is up to 1/3 of the total memory in your RAM.

Furthermore, when Spark caches data, it doesn’t use the serialization options available to a regular Spark job. As a result, the size of the data actually increases by about two to three times when you load it into memory. Any overflow that doesn’t fit into Spark storage does get saved to disk, but as we will discuss later, this leads to worse performance than if you read the data directly from S3.

 

Configuring Spark – The Easy Way and the Hard (the Right) Way 

The Easy Way

Spark has a bevy of configuration options to control how it uses the resources on the machines in your cluster. It is important to spend time with the documentation to understand how Spark executors and resource allocation works if you want to get the best possible performance out of your cluster. However, when just starting out, Amazon EMR’s version of Spark allows you to use the configuration parameter “maximizeResourceAllocation” to do some of the work for you. Essentially, this configures your cluster to make one executor per worker node, with each executor having full access to all the resources (RAM, cores, etc.) on that node. To do this, simply add “maximizeResourceAllocation true” to your “/usr/lib/spark/conf/spark-defaults.conf” file.

While this may be an ‘acceptable’ way to configure your cluster, it’s rarely the best way. When it comes to cluster configuration, there are no hard and fast rules. How you want to set up your workers will come down to the hardware of your machines, the size and shape of your data, and what kind of queries you expect to run. That being said, there are some guidelines that you can follow.

 The Right Way: Guidelines

First, each executor should have between two and five cores. While it's possible to jam as many cores into each executor as your machine can handle, developers found that you start to lose efficiency with a core count higher than five or six. Also, while it is possible to make an individual executor for each core, this causes you to lose the benefits of running multiple tasks inside a single JVM. For these reasons, it's best to keep the number of cores per executor between two and five, making sure that your cluster as a whole uses all available cores. For example, on a cluster with six worker machines, each machine having eight cores, you would want to create twelve executors, each with four cores. The number of cores is set in spark-defaults.conf, while the number of executors is set at runtime by the "--num-executors N" command line parameter.
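Continuing that example of six eight-core workers (twelve executors with four cores apiece), the relevant settings might be sketched as follows; the memory value is a placeholder that depends on your instance type and the memory guidance in the next section.

    # /usr/lib/spark/conf/spark-defaults.conf
    spark.executor.cores    4
    spark.executor.memory   9g

    # at submission time
    spark-submit --num-executors 12 your_job.py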

Memory is the next thing you must consider when configuring Spark. When setting the executors’ memory, it’s important to first look at the YARN master’s maximum allocation, which will be displayed on your cluster’s web UI (more on that later). This number is determined by the instance size of your workers and limits the amount of memory each worker can allocate for YARN applications (like your Spark program).

Your executor memory allocation needs to split that amount of memory between your executors, leaving a portion of memory for the YARN manager (usually about 1GB), and remaining within the limit for the YARN container (about 93% of the total).

The following equation can be used for shorthand:
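The original isn't reproduced in this post, but based on the description above it is roughly:

    executor memory ≈ 0.93 × (YARN maximum allocation per node − 1 GB) / executors per node

where the 0.93 factor keeps each executor within the YARN container limit and the 1 GB is set aside for the YARN manager.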

One Last Thing – the Spark Web UI

Earlier in this post we mentioned checking the YARN memory allocation through the Spark Web UI. The Spark Web UI is a browser-based portal for seeing statistics on your Spark application. This includes everything from the status of the executors, to the job/stage information, to storage details for the cluster. It’s an incredibly useful page for debugging and checking the effects of your different configurations. However, the instructions on Amazon’s page for how to connect are not very clear.

In order to connect, follow Amazon's instructions – located here – but in addition to opening the local port 8157 to a dynamic port, open a second local port to port 80 (to allow HTTP connections), as well as a third local port to port 8088. This port – the one connected to your cluster's port 8088 – will give you access to the YARN resource manager, and through that page, the Spark Web UI.
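As a sketch of what that SSH tunnel can look like (the key pair, local port numbers, and master node DNS name below are placeholders), assuming -D for the dynamic port and -L for the two fixed forwards:

    ssh -i ~/mykeypair.pem -N -D 8157 \
        -L 8080:ec2-xx-xx-xx-xx.compute-1.amazonaws.com:80 \
        -L 8088:ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8088 \
        hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com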

Note: the Spark Web UI will only be accessible when your Spark application is running. You can always access the YARN manager, but it will show an empty list if there is no Spark application to display.

 

Now it’s Your Turn!

Unlocking the power of Spark SQL can seem like a daunting task, but performing optimizations doesn't have to be a major headache. Focusing on data storage and resource allocation is a great starting point and will be applicable for any project. We hope that this post will be helpful to you in your Spark SQL development, and don't forget to check out the Spark Web UI!


Hey there! This summer, dbSeer has been keeping pretty busy. We completed a database migration project with one of our customers, Subject7, and then turned it into a case study to share with our great supporters like you.

In the project, our certified AWS architects (and all-around awesome people) designed a new network architecture from the ground up and moved 50 database instances to Amazon RDS. They did all this while still reducing Subject7's costs by 45%. If that's not amazing, tell me what is…I'm waiting.

We know you want to learn more, so you can see the full case study here.

 

If you’re short on time, check out below to see the project at a glance:

Who was the client?

Subject7, who created a no-coding, cloud-based automated testing solution for web and mobile applications.

What was the opportunity?

Subject 7 sought to enhance their back-end architecture with the most optimal resource allocation to prepare for future expansion.

What was dbSeer’s solution?

dbSeer designed a new network architecture from the ground-up, which included moving to Amazon RDS. Once on AWS, dbSeer found the most optimal resource allocation.

What were the results?

dbSeer migrated nearly 50 database instances to RDS with minimal downtime. Subject7 is now able to scale the back-end server to any size without impact to users. AWS costs decreased by nearly 45% and Subject7 achieved a positive ROI in only 2 months.

 

If you’re interested in learning more, or have specific questions, or just want to say hi, we always love connecting with our readers. Don’t hesitate to reach out, which you can do here.
