Crowdsourcing Becomes Part of Data Handling for Alation | @BigDataExpo #BigData #MachineLearning

Alation centralizes data knowledge by employing machine learning and crowdsourcing

The next BriefingsDirect Voice of the Customer big-data case study discussion focuses on the Tower of Babel problem for disparate data, and explores how Alation manages multiple data types by employing machine learning and crowdsourcing.

We'll explore how Alation makes data more actionable via such innovative means as combining human experts and technology systems.

To learn more about how enterprises and small companies alike can access more data for better analytics, please join Stephanie McReynolds, Vice-President of Marketing at Alation in Redwood City, California. The discussion is moderated by me, Dana Gardner, Principal Analyst at Interarbor Solutions.

Here are some excerpts:

Gardner: I've heard of crowdsourcing for many things, and machine learning is more and more prominent in big-data activities, but I haven't necessarily seen them together. How did that come about? How, and why, do you employ both machine learning and experts in crowdsourcing?

McReynolds: Traditionally, we've looked at data as a technology problem. At least over the last 5-10 years, we’ve been pretty focused on new systems like Hadoop for storing and processing larger volumes of data at a lower cost than databases could traditionally support. But what we’ve overlooked in the focus on technology is the real challenge of how to help organizations use the data that they have to make decisions. If you look at what happens when organizations go to apply data, there's often a gap between the data we have available and what decision-makers are actually using to make their decisions.

There was a study that came out within the last couple of years that showed that about 56 percent of managers have data available to them, but they're not using it. So there's a human gap there. Data is available, but managers aren't successfully applying it to business decisions, and that’s where real return on investment (ROI) always comes from. Storing the data is just an insurance policy for future use.

The concept of crowdsourcing data, or tapping into experts around the data, gives us an opportunity to bring humans into the equation of establishing trust in data. Machine-learning techniques can be used to find patterns and clean the data. But to really trust data as a foundation for decision-making, human experts are needed to add business context and show how data can be applied to solving real business problems.

Gardner: Usually, when you're employing people like that, it can be expensive and doesn't scale very well. How do you manage the fit-for-purpose approach to crowdsourcing where you're doing a service for them in terms of getting the information that they need and you want to evaluate that sort of thing? How do you balance that?

Using human experts

McReynolds: The term "crowdsourcing" can be interpreted in many ways. The approach that we’ve taken at Alation is that machine learning actually provides a foundation for tapping into human experts.

We go out and look at all of the log data in an organization, in particular, what queries are being used to access data in databases or Hadoop file structures. That creates a foundation of knowledge so that the machine can learn to identify what data would be most useful to catalog or to enrich with input from human experts in the organization. That's essentially a way to prioritize how to tap into the humans that you have available to help create context around that data.

That’s a great way to partner with machines, to use humans for what they're good for, which is establishing a lot of context and business perspective, and use machines for what they're good for, which is cataloging the raw bits and bytes and showing folks where to add value.
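The log-mining step described here can be illustrated with a small Python sketch. It is not Alation's implementation: it assumes a hypothetical query log of `(user, sql_text)` pairs, pulls table names out with a deliberately naive regular expression instead of a real SQL parser, and ranks tables by how many distinct users query them, then by query volume, as one plausible proxy for what is worth cataloging first.

```python
import re
from collections import Counter, defaultdict

# Naive table extraction: grabs identifiers that follow FROM or JOIN.
# Assumption for illustration only; a real system would parse the SQL properly.
TABLE_PATTERN = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def tables_in(sql: str) -> set[str]:
    """Return the lowercased table names referenced after FROM/JOIN."""
    return {name.lower() for name in TABLE_PATTERN.findall(sql)}


def documentation_priority(log: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Rank tables by distinct users, then by number of queries touching them.

    `log` is a hypothetical list of (user, sql_text) pairs.
    Returns (table, distinct_users, query_count) rows, most-used first.
    """
    query_count: Counter[str] = Counter()
    users: dict[str, set[str]] = defaultdict(set)
    for user, sql in log:
        for table in tables_in(sql):  # a table counts once per query
            query_count[table] += 1
            users[table].add(user)
    ranked = [(t, len(users[t]), query_count[t]) for t in query_count]
    # Breadth of use first, then raw volume, then name for a stable order.
    ranked.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranked
```

Ranking on distinct users before raw query counts keeps one heavy user's scheduled reports from pushing an otherwise obscure table to the top of the documentation list.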

Gardner: What are some of the business trends that are driving your customers to seek you out to accomplish this? What's happening in their environments that requires this unique approach of the best of machine and crowdsourcing and experts?

McReynolds: There are two broader industry trends that have converged and created a space for a company like Alation. The first is just the immense volume and variety of data that we have in our organizations. If we weren't adding more data storage systems to our enterprises, there wouldn't be good groundwork laid for Alation. But perhaps more interesting is the second trend, which is around self-service business intelligence (BI).

So as we're increasing the number of systems that we're using to store and access data, we're also putting more weight on typical business users to find value in that data and trying to make that as self-service a process as possible. That’s created this perfect storm for a system like Alation, which helps catalog all the data in the organization and make it more accessible for humans to interpret in accurate ways.

Gardner: And we often hear in the big data space the need to scale up to massive amounts, but it appears that Alation is able to scale down. You can apply these benefits to quite small companies. How does that work when you're able to help a very small organization with some typical use cases in that size organization?

McReynolds: Even smaller organizations, or younger organizations, are beginning to drive their business based on data. Take an organization like Square, which is a great brand name in the financial services industry, but it’s not a huge organization in and of itself, or Inflection or Invoice2go, which are also Alation customers.

We have many customers that have data analyst teams that maybe start with five people or 20 people. We also have customers like eBay that have closer to a thousand analysts on staff. What Alation provides to both of those very different sizes of organizations is a centralized place, where all of the information around their data is stored and made accessible.

Even if you're only collaborating with three to five analysts, you need that ability to share your queries, to communicate on which queries addressed which business problems, which tables from your HPE Vertica database were appropriate for that, and maybe what Hive tables on your Hadoop implementation you could easily join to those Vertica tables. That type of conversation is just as relevant in a 5-person analytics team as it is in a 1000-person analytics team.

Gardner: Stephanie, if I understand it correctly, you have a fairly horizontal capability that could apply to almost any company and almost any industry. Is that fair, or is there more specialization or customization that you apply to make it more valuable, given the type of company or type of industry?

Generalized technology

McReynolds: The technology itself is a generalized technology. Our founders come from backgrounds at Google and Apple, companies that have developed very generalized computing platforms to address big problems. So the way the technology is structured is general.

The organizations that are going to get the most value out of an Alation implementation are those that are data-driven organizations that have made a strategic investment to use analytics to make business decisions and incorporate that in the strategic vision for the company.

So even if we're working with very small organizations, they are organizations that make data and the analysis of data a priority. Today, it’s not every organization out there. Not every mom-and-pop shop is going to have an Alation instance in their IT organization.

Gardner: Fair enough. Given that data-driven organizations have a real benefit to gain by doing this well, they also, as I understand it, want to get as much data involved as possible, regardless of its repository, its type, the silo, the platform, and so forth. What have you had to do to satisfy that need for disparity and variety across these data types? What was the challenge in getting to all the types of data that you can then apply your value to?

McReynolds: At Alation, we see the variety of data as a huge asset, rather than a challenge. If you're going to segment the customers in your organization, every event and every interaction with those customers becomes relevant to understanding who that individual is and how you might be able to personalize offerings, marketing campaigns, or product development to those individuals.

That does put some burden on our organization, as a technology organization, to be able to connect to lots of different types of databases, file structures, and places where data sits in an organization.

So we focus on being able to crawl those source systems, whether they're places where data is stored or whether they're BI applications that use that data to execute queries. A third important data source for us that may be a bit hidden in some organizations is all the human information that’s created, the metadata that’s often stored in Wiki pages, business glossaries, or other documents that describe the data that’s being stored in various locations.

We actually crawl all of those sources and provide an easy way for individuals to use that information on data within their daily interactions. Typically, our customers are analysts who are writing SQL queries. All of that context about how to use the data is surfaced to them automatically by Alation within their query-writing interface so that they can save anywhere from 20 percent to 50 percent of the time it takes them to write a new query during their day-to-day jobs.

Gardner: How is your solution architected? Do you take advantage of cloud when appropriate? Are you mostly on-premises, using your own data centers, some combination, and where might that head to in the future?

Agnostic system

McReynolds: We're a young company. We were founded about three years ago, and we designed the system to be agnostic as to where you want to run Alation. We have customers who are running Alation in concert with Redshift in the public cloud. We have customers that are financial services organizations with a lot of personally identifiable information (PII) and privacy and security concerns, and they are typically running an on-premises Alation instance.

We architected the system to be able to operate in different environments and to catalog data that is both in the cloud and on-premises at the same time.

The way that we do that from an architectural perspective is that we don’t replicate or store data within Alation systems. We use metadata to point to the location of that data. For any analyst who's going to run a query from our recommendations, that query is pushed down to the source systems to run on-premises or in the cloud, wherever that data is stored.
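The metadata-pointer design described here, where the catalog records only where data lives and queries are pushed down to the source, can be sketched as follows. The class and field names are hypothetical, and in-memory SQLite connections stand in for real source systems such as Vertica, Hive, or Redshift; this illustrates the general pattern, not Alation's architecture.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata about a table: where it lives, never the rows themselves."""
    table: str
    source: str  # hypothetical key naming the source system
    description: str = ""


class MetadataCatalog:
    """Sketch of a catalog that holds pointers and runs queries at the source."""

    def __init__(self, sources: dict[str, sqlite3.Connection]):
        # Each connection stands in for a source system, on-premises or cloud.
        self.sources = sources
        self.entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Record metadata only; no data is copied into the catalog."""
        self.entries[entry.table] = entry

    def run(self, table: str, sql: str) -> list[tuple]:
        """Push the query down to whichever source the catalog points at."""
        entry = self.entries[table]
        return self.sources[entry.source].execute(sql).fetchall()
```

Because the catalog stores only `CatalogEntry` records, results always come from the live source system, and sensitive rows never leave the environment where they are stored.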

Gardner: And how did HPE Vertica come to play in that architecture? Did it play a role in the ability to be agnostic as you describe it?

McReynolds: We use HPE Vertica in one portion of our product that allows us to provide essentially BI on the BI that’s happening. Vertica is a fundamental component of our reporting capability, called Alation Forensics, which IT teams use to find out how queries are actually being run on data source systems, which backend database tables are being hit most often, and what that says about the organization and those physical systems.

It gives the IT department insight. Day-to-day, Alation is typically more of a business person’s tool for interacting with data.

Gardner: We've heard from HPE that they expect a lot more of that IT-department-specific, ops-efficiency role and use case to grow. Do you have any sense of what some of the benefits have been for IT organizations that get that sort of analysis? What's the ROI?

McReynolds: The benefits of an approach like Alation include getting insight into the behaviors of individuals in the organization. What we’ve seen at some of our larger customers is that they may have dedicated themselves to a data-governance program where they want to document every database and every table in their system, hundreds of millions of data elements.

Using the Alation system, they were able to identify within days the rank-order priority list of what they actually need to document, versus what they thought they had to document. The cost savings come from taking a very data-driven, realistic look at which projects are going to produce value for a majority of the business audience, and which projects we could perhaps hold off on, so that we spend our resources more wisely.

One team that we were working with found that about 80 percent of their tables hadn't been used by more than one person in the last two years. In that case, if only one or two people are using those systems, you don't really need to document those systems. That individual or those two individuals probably know what's there. Spend your time documenting the 10 percent of the system that everybody's using and that everyone is going to receive value from.
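The kind of analysis behind the 80 percent figure can be sketched as a count of distinct users per table over a time window. This is a hypothetical illustration, not taken from Alation's product: the `(user, table, day)` event shape is assumed, "the last two years" is approximated as 730 days, and tables with no recorded access at all are treated as low-use.

```python
from collections import defaultdict
from datetime import date, timedelta


def low_use_tables(all_tables, accesses, as_of: date,
                   window_days: int = 730, max_users: int = 1) -> set:
    """Return tables touched by at most `max_users` distinct users in the window.

    `accesses` is an iterable of hypothetical (user, table, day) tuples,
    for example parsed from query logs.
    """
    start = as_of - timedelta(days=window_days)
    users_by_table = defaultdict(set)
    for user, table, day in accesses:
        if start <= day <= as_of:  # ignore activity outside the window
            users_by_table[table].add(user)
    # Tables never seen in the window have zero users, so they qualify too.
    return {t for t in all_tables if len(users_by_table.get(t, ())) <= max_users}


def low_use_share(all_tables, accesses, as_of: date, **kwargs) -> float:
    """Fraction of all tables that are low-use; 0.0 for an empty catalog."""
    tables = list(all_tables)
    if not tables:
        return 0.0
    return len(low_use_tables(tables, accesses, as_of, **kwargs)) / len(tables)
```

The complement of this set is the short list worth documenting first: the tables that many people actually query.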

Where to go next

Gardner: Before we close out, any sense of where Alation could go next? Is there another use case or application for this combination of crowdsourcing and machine learning, tapping into all the disparate data that you can and information including the human and tribal knowledge? Where might you go next in terms of where this is applicable and useful?

McReynolds: If you look at what Alation is doing, it's very similar to what Google did for the Internet in terms of being able to catalog all of the webpages that were available to individuals and surface them in meaningful ways. That's a huge vision for Alation, and to be honest, we're just in the early part of that journey. We'll continue to move in that direction of being able to catalog data for an enterprise and make all of the information that is stored in that organization easily searchable, findable, and usable.
