Data Integration – How Text Analytics Can Help

February 8, 2011

According to a Bloor Research report of some time ago, 80% of data integration projects fail.

I’m not sure if that’s been updated recently, but I’d be very surprised to learn the percentage has changed much one way or the other.

The question we should ask ourselves is not “why.”  Instead, as odd as it sounds, we should ask “what.”

Not “Why” Integrate

The reasons “why” to integrate data are well known and well understood, and there’s usually a solid ROI.  We integrate data to create data warehouses.  We integrate to create operational data stores (ODSs) for near-line operational BI.  We integrate to migrate systems (from one vendor to another, from one database to another, from one application to another).

Not “How” to Integrate

An alphabet soup of technologies exists to answer the “how” question.  Depending upon the “why,” an organization might opt for any of the usual suspects: ETL, EII, EAI, or CDC.  All are excellent options with well-defined usage scenarios to meet whatever the business and IT requirements are.

It’s the “What” to Integrate

Figuring out what data to integrate is the single biggest challenge, and as more systems are implemented and data grows, it’s becoming a harder problem to solve, not an easier one.

In the dark old days, data architects would sit around tables with printouts of data models and try to map them all together.  Today, vendors like Informatica, SAP and IBM have business-user-focused tools for helping to identify the relationships between data elements.

Unfortunately, most of the technologies in place today rely on looking at column definitions (name, datatype, size, referential integrity definitions) and try to create a logical mapping across systems from that metadata alone.

Many applications in the marketplace enforce PK/FK relationships in application-level code rather than in the database schema, so those relationships never show up in the column definitions.  For such systems, the modelling tools described above simply aren’t good enough.  If they were, our 80% would be much smaller.
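To make the limitation concrete, here is a minimal sketch of metadata-only matching.  It is not how any particular vendor tool works; the Column class, the matching rule, and the system, table and column names are all hypothetical, chosen only to show that a rule based on column definitions finds similarly named columns and has nothing to go on when a column is named opaquely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical description of a column, using only its definition."""
    system: str
    table: str
    name: str
    datatype: str
    size: int


def metadata_match(a: Column, b: Column) -> bool:
    """Naive rule: same normalized name, same datatype, same size."""
    def norm(name: str) -> str:
        return name.lower().replace("_", "")

    return (
        norm(a.name) == norm(b.name)
        and a.datatype == b.datatype
        and a.size == b.size
    )


# Illustrative columns (all names are made up for this example).
crm_customer = Column("CRM", "customer", "CUSTOMER_ID", "INTEGER", 10)
erp_customer = Column("ERP", "orders", "customerid", "INTEGER", 10)
# Assume this column also holds customer identifiers, but the link is
# enforced in application code, so the definition gives no hint of it.
erp_opaque = Column("ERP", "zt_0042", "FLD_17", "VARCHAR", 32)

print(metadata_match(crm_customer, erp_customer))  # True
print(metadata_match(crm_customer, erp_opaque))    # False
```

The second comparison is the interesting one: the rule cannot say anything useful about FLD_17 without looking at the data stored in it.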

Text Analytics Can Help

It’s not a panacea.  It’s not a magic wand.  But – text analytics can help data integration projects succeed.

Using text analytics, organizations can index the data across many different systems and infer relationships between columns and data sets.  Text analytics can give a company a view into columns that store identical or very similar data (such as company names, vendor names, product names).  It can even go so far as to recognize that the data in DB1.table9.column99 is of type “credit card” and provide a report of all the other databases, tables and columns that hold the same kind of data.
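As a rough illustration of the idea, the sketch below profiles column values rather than column definitions.  It is only one simple way to approximate this and is not taken from any specific product: it assumes normalized value overlap (Jaccard similarity) as the measure of “similar data” and the Luhn checksum as a stand-in for recognizing the “credit card” type.  The function names, the 0.9 threshold and the sample values are all assumptions; the card numbers are widely published test numbers.

```python
import re


def normalize(value: str) -> str:
    """Lowercase and collapse punctuation so near-identical values compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def jaccard(col_a, col_b) -> float:
    """Share of distinct normalized values that two columns have in common."""
    a = {normalize(v) for v in col_a}
    b = {normalize(v) for v in col_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def luhn_valid(value: str) -> bool:
    """True if the value is 13 to 19 digits (spaces/hyphens allowed) and passes Luhn."""
    if not re.fullmatch(r"[0-9 \-]+", value):
        return False
    digits = [int(c) for c in value if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_credit_card(column, threshold: float = 0.9) -> bool:
    """Flag a column when most non-empty values pass the Luhn check."""
    values = [v for v in column if v]
    if not values:
        return False
    hits = sum(luhn_valid(v) for v in values)
    return hits / len(values) >= threshold


# Made-up sample data standing in for columns from two systems.
crm_names = ["Acme Corp.", "Globex, Inc.", "Initech"]
erp_vendors = ["ACME CORP", "globex inc", "Umbrella"]
card_column = ["4111 1111 1111 1111", "5500-0000-0000-0004"]

print(jaccard(crm_names, erp_vendors))        # 0.5
print(looks_like_credit_card(card_column))    # True
print(looks_like_credit_card(crm_names))      # False
```

Running checks like these over every pair of columns gives a candidate list of related columns that a data architect can then confirm or reject, whatever the columns happen to be called.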

Text analytics is another way to get a view into your structured data assets that can help support successful data integration projects.  With an 80% failure rate, anything that can help turn a challenge into success is of critical importance.