The Customer
Abt Associates (Abt) is a global leader in research, evaluation and program implementation, driving innovation and measurable impact for over 50 years. Working in 60 countries world-wide, Abt is a mission-driven company with a staff of over 2,600. Both domestic and international teams take an evidence-based and multi-disciplinary approach to solving tough challenges in health, social and environmental policy, and international development.
The Opportunity
Abt wanted a single view of all its operational metrics so that it could make optimal, data-driven decisions. Within large organizations like Abt, individual departments produce valuable data that often remains in a data silo: a repository of fixed data that stays isolated under the control of one department or division. Abt is not unique in this practice; data silos are very common. Abt saw the opportunity to democratize its data, making it globally available to all of its employees. This solution would allow executives, analysts, project managers, and researchers to generate and consume more insight.
Abt initially created consolidated views of its disparate data sources at smaller functional levels. However, to provide a “single version of truth” to its global user community, Abt decided that a more comprehensive solution was necessary. It therefore made the executive decision to create a cloud-based, consolidated data platform that would include data from many departments, ranging from HR to Sales.
Abt decided on a multi-phase approach for this project. The main goal of phase one, which is described in this case study, was to build the foundation for an environment that allows Abt to democratize all of their data. This included consolidating some of their data silos into one platform. Respecting Abt’s security policies, this platform is only accessible to authorized employees in the organization. Democratizing the data allows for more employees to gain insight, which drives efficiency and self-service capabilities.
The Solutions
Abt’s requirements for this project:
- Abt wanted to keep historical data for long-term storage. They wanted the ability to execute ad hoc queries against historical/archived data, allowing them to extend the solution to advanced analytics like machine learning.
- Security was a high priority. Abt only wanted the data platform to be accessed by authorized Abt staff. This required additional security authorizations as well as confirming that data access followed Abt’s internal policies.
- Abt wanted the platform to support multiple data formats (e.g., Excel and CSV) from a variety of sources, including HR, Procurement, and Sales, in order to enable increased automation and analytical capabilities. They envisioned a solution that could read all formats, allowing business analysts and non-technical employees to access the data with their own queries.
Requirement 1: Retaining all historical data
dbSeer chose Amazon S3 (S3) for its high redundancy and because data stored there can be used for ad hoc queries and machine learning. S3 is object storage built to store and retrieve any amount of data from anywhere. It is designed for 99.999999999% durability, making it well suited to storing Abt’s historical data. S3 also allows Abt to run sophisticated Big Data analytics on its data without moving it to a separate analytics system.
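One common way to keep archived files queryable in object storage is to write them under date-partitioned keys. The sketch below is only an illustration of that idea: the `raw/` prefix, the `source=`/`year=`/`month=`/`day=` layout, and the function name `archive_key` are assumptions and are not taken from Abt’s actual platform.

```python
from datetime import date


def archive_key(source: str, snapshot: date, filename: str) -> str:
    """Build a date-partitioned object key for one archived file.

    The layout is a hypothetical convention; any consistent scheme
    that encodes source and date in the key would serve the same purpose.
    """
    return (
        f"raw/source={source}/"
        f"year={snapshot.year:04d}/month={snapshot.month:02d}/day={snapshot.day:02d}/"
        f"{filename}"
    )


# Example: a (made-up) HR extract archived on 5 March 2018.
example_key = archive_key("hr", date(2018, 3, 5), "headcount.csv")
```

Encoding the source and date in the key lets a query engine skip whole prefixes when an analyst asks only for one department or one time range.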
Amazon Redshift was the optimal choice for a data warehouse because of its high-speed read capabilities. Redshift delivers ten times faster performance than other data warehouses by using machine learning, massively parallel query execution, and columnar storage on high-performance disks.
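To see why columnar storage helps read-heavy analytics, consider the toy comparison below. It is a conceptual sketch in plain Python, not a description of Redshift internals; the records and field names are invented for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"project": "A", "region": "US", "hours": 120},
    {"project": "B", "region": "EU", "hours": 80},
    {"project": "C", "region": "US", "hours": 100},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: each field is stored contiguously.
columns = to_columns(rows)

# An aggregate over one field only needs to read that one column,
# instead of scanning every field of every record.
total_hours = sum(columns["hours"])
```

In a real warehouse the same principle means a query that sums one metric reads far less data from disk than it would in a row-oriented store.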
Using Redshift and S3 positions Abt to create a data lake in the future, which can generate insight that Abt would not gain from querying independent data silos. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. Data lakes can ingest data in real time from multiple sources. Abt is also positioned to use machine learning, which can build models to forecast likely outcomes and suggest a range of prescribed actions to achieve an optimal result. Abt can now also conduct advanced analytics over sources such as logs, financial records, sales records, and HR records.
Requirement 2: Security
Securing this platform was essential as a matter of Abt policy. dbSeer worked with Abt IT to leverage Amazon’s Virtual Private Cloud (VPC), Virtual Private Network (VPN), and Web Application Firewall (WAF). Amazon VPC allowed Abt to provision a logically isolated section of the AWS Cloud and launch its AWS resources in a virtual network over which it had complete control. Abt was able to select its own IP address range, create subnets, and configure route tables and network gateways.
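Choosing an address range and dividing it into subnets can be sketched with Python’s standard `ipaddress` module. The CIDR block `10.0.0.0/16`, the `/24` subnet size, and the two tier names (one for internet-facing servers, one for back-end systems) are assumptions chosen for illustration, not Abt’s actual network plan.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, new_prefix: int = 24) -> dict:
    """Carve the first two equally sized subnets out of a VPC address range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # lazily yields consecutive blocks
    public = next(blocks)   # hypothetical tier for internet-facing servers
    private = next(blocks)  # hypothetical tier for back-end systems
    return {"vpc": vpc, "public": public, "private": private}


# Hypothetical address range; a real deployment would pick a block
# that does not collide with on-premises networks reached over the VPN.
plan = plan_subnets("10.0.0.0/16")
```

Working the plan out in code first makes it easy to confirm that the subnets sit inside the VPC range and do not overlap before anything is provisioned.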
dbSeer also helped Abt to create a public-facing subnet for its web servers with access to the internet and a private-facing subnet for back-end systems like databases or application servers. Because Abt’s data was stored in S3, dbSeer restricted access so that it is only accessible from instances on Abt’s VPC. WAF gave an added level of protection from common web exploits for Abt’s web applications.
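One way to express a “VPC only” restriction on S3 data is a bucket policy that denies any request not arriving from a given VPC. The sketch below builds such a policy document as plain JSON. The bucket name and VPC ID are placeholders, the function name is hypothetical, and the case study does not say which mechanism dbSeer actually used. Note that the `aws:SourceVpc` condition key is only populated for requests that reach S3 through a VPC endpoint.

```python
import json


def vpc_only_bucket_policy(bucket: str, vpc_id: str) -> dict:
    """Return a bucket policy that denies S3 access from outside one VPC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessFromOutsideVpc",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself
                    f"arn:aws:s3:::{bucket}/*",    # every object in it
                ],
                # Deny whenever the request did not originate in this VPC.
                "Condition": {"StringNotEquals": {"aws:SourceVpc": vpc_id}},
            }
        ],
    }


# Placeholder identifiers, for illustration only.
policy = vpc_only_bucket_policy("example-data-platform", "vpc-0123456789abcdef0")
policy_json = json.dumps(policy, indent=2)
```

A blanket deny like this also blocks console and administrative access from outside the VPC, so a production policy would usually be narrowed or paired with explicit exceptions.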
Abt chose to utilize Okta and Check Point Firewall for additional security. Okta provides secure identity management and single sign-on to Abt’s applications. Check Point architecture delivers consolidated Gen V cyber security across Abt’s networks and their cloud environment.
Requirement 3: An accessible and extendable analytic platform
To generate reports and transfer data, Abt’s platform needed compute power, which Windows EC2 instances were able to deliver. Amazon EC2 for Windows Server provides highly scalable, easily managed, flexible compute infrastructure that supports all of Abt’s Microsoft applications.
Abt chose Talend for data transformation. Talend is an open source data transformation tool that allows for efficient data transfer. It supports a variety of data sources, including mainframes, relational and non-relational databases, files of various formats, web services, and packaged applications like ERP and CRM. Because of this breadth, Talend was the ideal choice to combine all of Abt’s data sources.
The data platform allows employees in many different roles, including data scientists and business analysts, to access data with their choice of analytic tools and frameworks. An added benefit is that data does not need to be moved to a separate analytics system, because all analytics can be run directly on the platform. For analytics, Abt decided to use Qlik, a self-service analytic tool that lets business users generate insight from their data.
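The kind of consolidation a transformation tool performs, mapping differently shaped source files onto one shared schema, can be illustrated in a few lines of plain Python. This is a conceptual sketch only: it does not use Talend, and the source names, column headers, and target fields are all invented.

```python
import csv
import io
from typing import Dict, List

# Hypothetical per-source mappings from original column names
# to the shared schema used on the consolidated platform.
COLUMN_MAPS = {
    "hr": {"Emp ID": "employee_id", "Dept": "department"},
    "sales": {"employee": "employee_id", "division": "department"},
}


def normalize(source: str, csv_text: str) -> List[Dict[str, str]]:
    """Read one source's CSV text and rename its columns to the shared schema."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {target: raw[column].strip() for column, target in mapping.items()}
        row["source"] = source  # keep track of where each record came from
        rows.append(row)
    return rows


# Two made-up extracts with different headers end up in one uniform list.
combined = (
    normalize("hr", "Emp ID,Dept\n 17 ,Finance\n")
    + normalize("sales", "employee,division\n42,Health\n")
)
```

Once every source is expressed in the same fields, a single query or dashboard can span departments without knowing how each original file was laid out.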
Architecture Diagram
The Results
Over 10 data sources are already integrated in the data platform. All employees with billability targets receive regular reports from the global metrics platform. 110 active self-service users are on the platform developing and consuming additional reporting and insight from this consolidated data. Additionally, 100% of planned data sources were seamlessly deployed in AWS.
Phase one of the project was completed in only nine months using dbSeer staff on site at Abt and remotely. All of the data consolidation, processing, and storing is done on the AWS cloud. This new data platform gives Abt employees access to more data on a near real-time basis. For example, sales and pipeline data used to only be updated monthly, but is now refreshed twice a day and is accessible instantaneously through the analytic platform. Having access to this data is incredibly useful when making business decisions.
dbSeer architected this data platform on the cloud because it provides performance, scalability, reliability, availability, a variety of analytic engines, and impressive economies of scale. AWS in particular provides the most secure and scalable resources that enable customers to build and analyze their data in the cloud.
As a result of this project, Abt Associates’ employees now have universal and near real-time access to their data. This platform has provided a foundation on which Abt can continue to build. Over time, additional data sources can be added to facilitate greater business insights and drive better decisions. In a mission-driven organization like Abt, this means greater ability to address the tough issues of the day in health, social, and environmental policy as well as international development.