From “Lake” to Insight: How Federal Agencies Can Get More from Their Big Data Platforms

These questions can guide the types of data you capture, how it’s secured, your cloud infrastructure, and more.

We’ve found this step to be so critical, in fact, that we’ve integrated it into our ODP.

For example, the platform creates indices for data storage based on mission/business questions and requirements.
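
To make this concrete, here is a minimal sketch in Python of how a mission question might translate into an index specification. The field names and layout are illustrative assumptions, not the ODP's actual configuration:

    # Hypothetical example: derive a storage index from a mission question.
    # Mission question: "Which facilities reported incidents in the last 30 days?"
    incident_index = {
        "name": "facility_incidents",
        # Fields chosen so the question can be answered directly
        "fields": {
            "facility_id": "keyword",
            "incident_type": "keyword",
            "reported_at": "date",
            "severity": "integer",
        },
        # Index on the fields the question filters and sorts by
        "primary_sort": ["reported_at", "facility_id"],
    }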

Define and secure the data

The vast amounts of structured and unstructured data in today’s federal data lakes come from a variety of sources—often in a state that’s far from ready for advanced analytics.

“If enterprise data—with its large volumes, varied formats and types—is to be managed strategically, its metadata must be suitably defined and used,” according to global research and advisory firm Gartner [2].

For federal agencies that must comply with strict security and privacy regulations, metadata must also be effectively secured.

Here’s where metadata and attribute authorization schemes come in.

Metadata tags and unique identifiers allow organizations to quickly and easily query, process, analyze, aggregate, and present data of any variety.
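
As a simple illustration (the helper functions and field names below are assumptions for this sketch, not a specific product API), tagging at ingest and querying by tag might look like this:

    import uuid
    from datetime import datetime, timezone

    def ingest(record: dict, source: str, tags: set) -> dict:
        """Wrap an incoming record with a unique ID and metadata tags."""
        return {
            "id": str(uuid.uuid4()),                               # unique identifier
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "source": source,
            "tags": tags,                                          # e.g. {"pii", "telemetry"}
            "payload": record,
        }

    def query_by_tags(records: list, required: set) -> list:
        """Return records carrying all of the required tags."""
        return [r for r in records if required <= r["tags"]]

    docs = [ingest({"msg": "sensor reading"}, "field_unit_7", {"telemetry"})]
    print(query_by_tags(docs, {"telemetry"}))  # finds the record by its tag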

From a security standpoint, attribute authorization schemes protect data at the source, record, or field level.
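
Here is a minimal sketch of field-level attribute authorization, assuming a hypothetical policy table; in practice the platform, not application code, would enforce such a scheme:

    # Each field names the attribute a caller must hold to see it.
    FIELD_POLICY = {
        "name":    "public",
        "address": "pii_reader",
        "ssn":     "pii_privileged",
    }

    def redact(record: dict, user_attrs: set) -> dict:
        """Return only the fields the user's attributes authorize."""
        allowed = user_attrs | {"public"}
        return {f: v for f, v in record.items()
                if FIELD_POLICY.get(f, "restricted") in allowed}

    record = {"name": "Jane Doe", "address": "1 Main St", "ssn": "000-00-0000"}
    print(redact(record, {"pii_reader"}))  # name and address, but no ssn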

Make information easier to find

Once tagged, data must be governed in a way that enables analysts to easily find and understand it.

In an Erwin survey of 118 CIOs, CTOs, data center managers, IT staff, and consultants, most respondents said that data governance is critical to compliance, customer satisfaction, and better decision-making.

Yet nearly half (46 percent) lacked a formal strategy for data governance.

Cataloging—the continual management of information such as data set names, formats, tagging, releasability, retention, and more—is key.
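
A catalog entry can be modeled very simply; this sketch just mirrors the fields named above and is not any particular catalog tool's schema:

    from dataclasses import dataclass, field

    @dataclass
    class CatalogEntry:
        """One managed data set; fields mirror the cataloging concerns above."""
        name: str
        fmt: str                          # e.g. "parquet", "csv"
        tags: list = field(default_factory=list)
        releasability: str = "internal"   # who may receive the data
        retention_days: int = 365         # how long to keep it

    catalog = {
        "incident_reports": CatalogEntry(
            name="incident_reports", fmt="parquet",
            tags=["incidents", "pii"],
            releasability="agency-only", retention_days=1825),
    }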

So is governance of data access.

Our ODP includes four data zones: raw (data that has not been touched), trusted (data that has been quality checked, tagged, and enriched), analytic (data that has been indexed, stored, and tuned to run advanced machine learning algorithms), and sandbox (data that has been segmented or quarantined to enable testing, prototyping, and exploration).
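
The zone structure can be expressed as configuration plus a promotion rule. The paths and the one-step promotion rule below are assumptions for illustration, not the ODP's actual layout:

    # Illustrative zone layout (paths are hypothetical).
    ZONES = {
        "raw":      {"path": "/lake/raw",      "mutable": False},
        "trusted":  {"path": "/lake/trusted",  "mutable": False},
        "analytic": {"path": "/lake/analytic", "mutable": False},
        "sandbox":  {"path": "/lake/sandbox",  "mutable": True},
    }

    def promote(dataset: str, src: str, dst: str) -> str:
        """Return the destination path for a raw -> trusted -> analytic step."""
        order = ["raw", "trusted", "analytic"]
        if src not in order or dst not in order or \
                order.index(dst) != order.index(src) + 1:
            raise ValueError(f"cannot promote {dataset} from {src} to {dst}")
        return f"{ZONES[dst]['path']}/{dataset}"

    print(promote("incident_reports", "raw", "trusted"))  # /lake/trusted/incident_reports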

Moving information to smaller data zones enables organizations to manage data efficiently and deliver more immediate results.

“Democratize” data analytics

Such data zones can also help organizations spread the analytics workload out to team members beyond an organization’s data scientists, whose time ideally should be kept free for exploring new ideas and models.

Another tactic toward this goal: making certain data analytics functions self-service, such as cleansing data and running notebooks and analysis models.
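
Parameterized notebook execution is one common way to make such analysis self-service. This sketch uses the open source papermill library as one option, not the ODP's specific mechanism; the notebook paths and parameters are hypothetical:

    import papermill as pm

    # A non-specialist picks a data set and date range; the template
    # notebook encapsulates the analysis itself.
    pm.execute_notebook(
        "templates/trend_analysis.ipynb",        # hypothetical template
        "runs/trend_analysis_2024_q1.ipynb",     # executed copy with outputs
        parameters={"dataset": "incident_reports", "start": "2024-01-01"},
    )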

Booz Allen’s ODP uses an automated installation system that allows developers to provision a new data platform from a single command line.

Furthermore, because the ODP uses open source products, it’s easy to customize it for specific objectives.

Use Agile to accelerate progress

For many big data objectives, such as mission readiness, challenges are multifaceted and time is of the essence.

By enabling teams to manage uncertainty and improving teamwork, transparency, and project commitment, Agile methodology can help organizations take on these more difficult technical goals [3].

So can an Agile culture—one that encourages new ideas and pivoting quickly based on lessons learned.

In conclusion

Overcoming data lake obstacles involves a combination of processes, culture, and appropriate technology choices.

By employing modern data management platforms and best practices from the outset, agencies can start flipping the 80-20 rule of data science—and start putting the valuable information they collect to work for their missions.

References

[1] https://www.gao.gov/assets/700/691959.pdf
[2] https://www.gartner.com/doc/3878879/ways-use-metadata-management-deliver
[3] https://digital.gov/2018/04/03/thinking-about-going-agile-5-benefits-your-office-will-reap-with-agile-methods/

About the Authors

Chris Brown is a Chief Technologist at Booz Allen Hamilton providing customer expertise in big data analytics solutions, with over 20 years of information technology experience.

His experience spans large-scale, mission-critical applications in both the commercial and United States federal government markets.

Chris received a BS in Business Administration, with a minor in Computer Science, from William & Mary.

David Cunningham is a Principal in Booz Allen’s Strategic Innovation Group (SIG), focused on delivering cloud and data platform capabilities to our Civil, Defense, and Intelligence Community clients.

He has been at the forefront of technology evolutions such as Service-Oriented Architecture, Enterprise Integration, Cloud Computing, and Big Data.

David has over 18 years of professional experience in IT development and received a BS in Computing Sciences from Villanova University.
