Data Integration

Data integration is an automated process for merging, improving, enriching and cleansing data from various data sources and applications.

Unified data for new insights

Data integration is the process of collecting and structuring data with the aim of obtaining new insights from composite data.

In its most basic form, using Excel to collect data is a first step. More modern organisations that want to update this data automatically and with a higher frequency, opt for specialised ETL / ELT software and databases.

Full Orbit has extensive experience in developing ETL / ELT software and associated methodologies to structure data, such as Inmon, Kimball and Data Vault. 

 

Orbit SVG_Samengesteld beeld van de data
spacers

ETL/ELT

Conceptually, data integration takes place in three steps:

Extract

Extracting the data from the various source systems using adapters.

Transform

Converting data from one format or structure into another while being enriched, improved, merged and/or cleansed at the same time.

Load

Loading the transformed data into a new structure in a database or data warehouse, ready for use by reporting and analysis tools.

Depending on where the transformations take place, we speak of ETL or ELT. The first has a separate transformation engine and the other uses the power of the database to perform these transformations.

Orbit SVG_How to avoid spaghetti
spacers

Data integration software

One can manually create scripts for data integration, but this quickly results in a ‘spaghetti’ of data connections. With ever increasing amounts of data sources, this becomes impossible to maintain.

By using specific data integration software, data can be ingested from the various sources via connectors. Mappings are then used to model metadata (data about data). These models are subsequently executed by the transformation engine, which loads the integrated data into the target system.

The advantage is that once a combination of data sources from one or more systems has been made using an ELT process, adding a data source from another system becomes a lot easier.

Orbit SVG_Data integration software
spacers

The advantages of data integration

Elimination of Data Silos

Data silos refer to heterogeneous data sources that store data in specific, often disjointed locations and legacy systems. Data integration can help the cross functionality of data, bringing proprietary, legacy data into new systems that can easily be accessed by any team member within the organisation.

Accessibility

“Create once, deliver to many”. By creating a central data source, data users within the company will all be able to access the same information which can reduce the number of questions asked, increase the speed of data access, and limit the possibility of having erroneous replicated data. Authors can continue to use systems of their choice while end users can access what they need from a central location.

Semantic Integration

A common problem with using various systems to collect data or having many people collect data is ontology issues. This means having multiple types of data that describe the same thing but are organised differently. An example of this may be the way that dates are stored (“DD/MM/YYYY”, “MM/DD/YYYY”, “Month Day, Year”, etc.). By removing variations and creating a structured data warehouse, it will be easier to find data and analyse patterns more efficiently.

 

MB-Analyst’s-View-Silos-Getty-880x493