Q)ETL concepts?
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database.
The first step in the ETL process is mapping the data between source systems and the target database (data warehouse or data mart). The second step is cleansing the source data in a staging area. The third step is transforming the cleansed source data and loading it into the target system.
Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.
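The three steps above can be sketched in a few lines of code. This is a minimal illustration, not any particular ETL tool's API; the field names, mapping, and data are all made up.

```python
# Minimal ETL sketch: map source fields to target columns, cleanse in a
# staging list, then transform and load. All names are illustrative.

# Step 1: mapping between source system fields and target warehouse columns
FIELD_MAP = {"cust_nm": "customer_name", "ord_dt": "order_date", "amt": "amount"}

source_rows = [
    {"cust_nm": "  Alice ", "ord_dt": "2010-01-07", "amt": "120.50"},
    {"cust_nm": "Bob", "ord_dt": "2010-01-08", "amt": None},
]

# Step 2: cleansing in a staging area (trim text, reject rows missing amounts)
staging = []
for row in source_rows:
    if row["amt"] is None:
        continue  # undecipherable record never reaches the warehouse
    staging.append({k: v.strip() if isinstance(v, str) else v
                    for k, v in row.items()})

# Step 3: transform (rename via the mapping, cast types) and load
target = [{FIELD_MAP[k]: (float(v) if k == "amt" else v)
           for k, v in row.items()} for row in staging]

print(target)  # cleansed, transformed rows ready for the target table
```

In a real project the mapping and cleansing rules would themselves be stored as metadata, which is what the glossary below describes.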
Glossary of ETL (Reference:www.Oracle.com)
Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping
The definition of the relationship and data flow between source and target objects.
Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
Staging Area
A place where data is processed before entering the warehouse.
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source to a data warehouse.
Target System
A database, application, file, or other storage facility into which the transformed source data is loaded in a data warehouse.
Q)Metadata tools
Metadata is data about data. Metadata tools are used for gathering, storing, updating, and retrieving the business and technical metadata of an organization.
Data and Metadata
Data - The actual data that is stored in the database.
Business Metadata - Information about the data in business terms; used by the functional team (business analysts, senior management, business managers, etc.).
Technical Metadata - Information about the data in technical terms; used by the technical team (developers, DBAs, etc.).
Q)Popular Metadata Tools
Tool Name - Company Name
Rochade - ASG
Metatrieve - Metatrieval
Datamapper - Exeros
Metacenter - Data Advantage Group
Metadata Integration Framework - Info Librarian
SuperGlue - Informatica
Metastage - Ascential
Q)What is the business Metadata?
MetaData:
Metadata is data about data. Metadata comes into the picture when we need to know how and where data is stored. Metadata tools are helpful in capturing business metadata, and the following section explains business metadata.
Business MetaData:
In IT, business metadata is additional text or description attached to a data element that adds business value to it. Business metadata is about creating definitions and business rules. For example, when tables and columns are created, the following business metadata is useful for generating reports for the functional and technical teams. The advantage of business metadata is that everybody, whether technical or non-technical, can understand what is going on within the organization.
Table’s Metadata: While creating a table, metadata for the table's definition, source system name, source entity names, business rules to transform the source table, and the usage of the table in reports should be added, in order to make them available for metadata reports.
Column’s Metadata: Similarly for columns, the source column name (mapping), business rules to transform the source column, and the usage of the column in reports should be added for metadata reports.
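A business-metadata record like the ones described above can be pictured as a simple structure. The field names and values below are hypothetical, not from any particular metadata tool.

```python
# Hypothetical business-metadata records for one warehouse table and one
# of its columns. Every name and value here is illustrative.
table_metadata = {
    "table_name": "DIM_CUSTOMER",
    "definition": "One row per customer known to the business.",
    "source_system": "CRM",
    "source_entities": ["CUSTOMERS", "ACCOUNTS"],
    "business_rule": "Merge CRM customers with billing accounts on tax id.",
    "used_in_reports": ["Customer Churn", "Revenue by Segment"],
}

column_metadata = {
    "table_name": "DIM_CUSTOMER",
    "column_name": "CUSTOMER_NAME",
    "source_column": "CRM.CUSTOMERS.CUST_NM",  # mapping back to the source
    "business_rule": "Trim whitespace and convert to title case.",
    "used_in_reports": ["Customer Churn"],
}

# A simple metadata report: which reports depend on this table?
print(sorted(table_metadata["used_in_reports"]))
```

Both technical and non-technical readers can answer "where does this column come from and which reports use it" from such a record alone.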
Q)What is Technical Metadata?
Technical metadata describes information about technology, such as the ownership of the database, the physical characteristics of a database (in Oracle: tablespaces, extents, segments, blocks, partitions, etc.), performance tuning (processors, indexing), table names, column names, data types, relationships between tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is used by the technical team. In technical metadata, derivation rules are important when formulae or calculations are applied to a column.
Q)Role of Metadata in ETL?
When you deal with a data warehouse, the various phases like Business Process Modeling, Data Modeling, ETL, Reporting, etc. are inter-related, and each contains its own metadata. For example, in ETL it would be very difficult to extract, transform, and load source data into a data warehouse if no metadata were available about the source, such as where and how to get the source data.
An organization may be using data modeling tools, such as Erwin, Embarcadero, Oracle Designer, or Sybase PowerDesigner, for developing data models. Functional and technical teams spend much time and effort creating the data model's data structures (tables, columns, data types, procedures, functions, triggers, etc.). By using metadata capturing tools, these data structures can be imported into a metadata repository, where they are stored as metadata.
For example, when you work with Informatica's Metadata Exchange, it captures the metadata present in these tools and loads it into the repository. There is no need for an Informatica developer to create these data structures again, since the metadata (data definitions) has already been captured and stored. Similarly, most ETL tools have the capability to capture metadata from RDBMSs, files, ERPs, applications, etc.
In ETL, the metadata repository is where all the metadata about sources, targets, transformations, mappings, workflows, sessions, etc. is stored. From this repository, metadata can be manipulated, queried, and retrieved with the help of wizards provided by metadata capturing tools. During the ETL process, when we map source and target systems, we are actually mapping their metadata.
In any organization, metadata stored in a repository can be a handy resource for learning about the organization's information systems. Each department in an organization may have different business definitions, data types, or attribute names for the same attribute, or a single business definition for many attributes. These anomalies can be overcome by properly maintaining metadata for these attributes in a centralized repository.
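Detecting such anomalies is a simple query once the metadata is centralized. The sketch below uses an in-memory repository with invented department names and types.

```python
# Sketch: finding attributes that departments define differently, using a
# tiny in-memory metadata repository. All entries are made up.
from collections import defaultdict

repository = [
    {"dept": "Sales",   "attribute": "customer_id", "data_type": "VARCHAR(10)"},
    {"dept": "Finance", "attribute": "customer_id", "data_type": "NUMBER(10)"},
    {"dept": "Sales",   "attribute": "region",      "data_type": "VARCHAR(20)"},
]

# Collect the distinct data types recorded for each attribute
types_by_attr = defaultdict(set)
for entry in repository:
    types_by_attr[entry["attribute"]].add(entry["data_type"])

# An attribute with more than one recorded type is an anomaly to resolve
conflicts = {a for a, t in types_by_attr.items() if len(t) > 1}
print(conflicts)
```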
Thus metadata plays a vital role in explaining how, why, and where data can be found, retrieved, stored, and used efficiently in an information management system.
Q)Metadata Reports ?
Metadata stored in a repository can be produced in the form of reports for easy understanding, and these reports are very useful in explaining the various objects or data structures and the relationships between them. Products such as Schema Logic Enterprise Suite, Rochade, Metatrieve, Datamapper, Metacenter, and Metadata Integration Framework store and handle metadata efficiently and effectively.
Following are a few examples of metadata reports.
• A metadata report on business process activities and data flow.
• A metadata report on source and target columns, to find all the source and target field names, data types, definitions, etc.
• A metadata report on workflows, to find all the workflows, workflow properties, scheduling, and execution details.
• A metadata report on mappings, to find all the sources and targets in a mapping and the source and target fields participating in it.
• A metadata report on business intelligence, to find all the scorecards and balanced scorecards.
Q)What is Business Intelligence ?
Business Intelligence is a technology based on customer- and profit-oriented models that reduces operating costs and provides increased profitability by improving productivity, sales, and service, and that supports rapid decision-making. Business Intelligence models are based on multidimensional analysis and the key performance indicators (KPIs) of an enterprise.
Business Intelligence applications based on these models are created by Business Intelligence software, which provides aggregated details about suppliers, customers, internal activities, and business-to-business transactions to managers or whoever needs them to make better corporate decisions.
Many business questions or situations need to be analyzed in order to achieve the targets of an enterprise, with the help of managers or executives at each level. Below are some sample questions.
Business Intelligence: Finance:
What is the net income, expenses, gross profit, and net profit for this quarter, year?
Business Intelligence: Accounts:
What is the sales amount this month, and what is the outstanding payment?
Business Intelligence: Purchase:
Who is the vendor to be contacted to purchase products?
Business Intelligence: Production:
How many products are manufactured in each production unit today, weekly, monthly?
Business Intelligence: Sales:
How many products have been sold in each area today, weekly, monthly?
Business Intelligence: Quality:
How many products have been defective today, weekly, monthly, quarterly, yearly?
Business Intelligence: Service:
Are the customers satisfied with the quality?
These business intelligence questions are related to why, what, how, and when; business intelligence reports (OLAP reports) help answer them by means of reporting, scorecards, and balanced scorecards that support managerial decisions.
Q)Business Intelligence Tools ?
Business Intelligence Tools help to gather, store, access and analyze corporate data to aid in decision-making. Generally these systems will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, inventory and distribution analysis.
With Business Intelligence tools, various data, such as customer-related, product-related, sales-related, time-related, location-related, and employee-related data, are gathered and analysed; based on this analysis, important strategies or rules are formed and goals are set. These decisions are very efficient and effective in promoting an organisation's growth.
Since the collected data can be sliced across almost all dimensions, such as time, location, product, and promotion, valuable statistics like sales profit in one region for the current year can be calculated and compared with the previous year's statistics.
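A year-over-year comparison of that kind reduces to filtering fact rows by dimension values and summing a measure. The regions, years, and amounts below are invented for illustration.

```python
# Slicing fact rows by region and year to compare current-year sales with
# the previous year. All data is made up.
sales = [
    {"region": "East", "year": 2009, "amount": 1200},
    {"region": "East", "year": 2010, "amount": 1500},
    {"region": "West", "year": 2009, "amount": 900},
    {"region": "West", "year": 2010, "amount": 800},
]

def total(region, year):
    """Sum the sales amount for one slice of the region/year dimensions."""
    return sum(r["amount"] for r in sales
               if r["region"] == region and r["year"] == year)

growth = total("East", 2010) - total("East", 2009)
print(growth)  # how much more the East region sold in 2010 than in 2009
```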
Popular Business Intelligence Tools
Tool Name - Company Name
Business Objects - Business Objects
Cognos - Cognos
Hyperion - Hyperion
Microstrategy - Microstrategy
Microsoft Reporting Services - Microsoft
Crystal - Business Objects
Q)OLAP Analysis ?
Imagine an organization that manufactures and sells goods in several US states and employs hundreds of people in its manufacturing, sales, and marketing divisions. In order to manufacture and sell its products profitably, the executives need to analyse the data on the products (OLAP analysis) and think about the various possibilities and causes of a particular event, such as a loss in sales, lower productivity, or an increase in sales over a particular period of the year.
During the OLAP analysis, the top executives may seek answers for the following:
1. Number of products manufactured.
2. Number of products manufactured in a location.
3. Number of products manufactured on time basis within a location.
4. Number of products manufactured in the current year when compared to the previous year.
5. Sales Dollar value for a particular product.
6. Sales Dollar value for a product in a location.
7. Sales Dollar value for a product in a year within a location.
8. Sales Dollar value for a product in a year within a location sold or serviced by an employee.
OLAP tools help executives find the answers not only to the measures above but also to very complex queries, by allowing them to slice and dice, drill down from higher-level to lower-level summarized data, rank, sort, etc.
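Questions 1 through 8 above are the same measure (a count or a dollar total) rolled up at progressively finer grains. The sketch below shows that drill-down idea with a plain grouping function; the products, locations, and figures are invented.

```python
# Drill-down sketch: summarize sales dollar value at progressively finer
# grains (product -> product+location -> product+location+year).
from collections import defaultdict

facts = [
    {"product": "Widget", "location": "NJ", "year": 2009, "dollars": 100},
    {"product": "Widget", "location": "NJ", "year": 2010, "dollars": 150},
    {"product": "Widget", "location": "NY", "year": 2010, "dollars": 200},
]

def rollup(*dims):
    """Aggregate dollars grouped by the given dimension names."""
    out = defaultdict(int)
    for f in facts:
        out[tuple(f[d] for d in dims)] += f["dollars"]
    return dict(out)

print(rollup("product"))                      # top level (question 5)
print(rollup("product", "location"))          # drill down one level (question 6)
print(rollup("product", "location", "year"))  # finest grain (question 7)
```

Multidimensional databases precompute and store such rollups, which is where their query-performance advantage comes from.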
Q)OLAP Multidimensional database ?
OLAP Database - Multidimensional
This is a type of database that is optimized for data warehouse, data mart and online analytical processing (OLAP) applications. The main advantage of this database is query performance.
Relational databases make it easy to work with individual records, whereas multidimensional databases are designed for analyzing large groups of records. A relational database is typically accessed using a Structured Query Language (SQL) query. A multidimensional database allows a user to ask questions like "How many mortgages have been sold in New Jersey?" and "How many credit cards have been purchased in a particular county?".
Popular Multidimensional Databases
Database Name - Company Name
Crystal Holos - Business Objects
Hyperion Essbase - Hyperion
Oracle Express - Oracle Corporation
Oracle OLAP Option - Oracle Corporation
Microsoft Analysis Services - Microsoft
PowerPlay Enterprise - Cognos
Q)What is Data Cleansing?
"Data cleansing ensures that undecipherable data does not enter the data warehouse. Undecipherable data will affect reports generated from the data warehouse via OLAP, Data Mining and KPI's."
A very simple example of where data cleansing would be utilized is how dates are stored in separate applications. Example: 11th March 2007 can be stored as '03/11/07' or '11/03/07' among other formats. A data warehousing project would require the different date formats to be transformed to a uniform standard before being entered in the data warehouse.
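The date example above can be handled by normalizing every source's format to a single standard, such as ISO 8601, during cleansing. Which source uses which format has to be known in advance; the source names below are hypothetical.

```python
# Normalizing inconsistent date formats to ISO 8601 during cleansing.
# The source names and their formats are assumptions for illustration.
from datetime import datetime

SOURCE_FORMATS = {
    "us_app": "%m/%d/%y",  # month first, e.g. '03/11/07' = 11th March 2007
    "eu_app": "%d/%m/%y",  # day first,   e.g. '11/03/07' = 11th March 2007
}

def cleanse_date(raw, source):
    """Parse a raw date string using its source's known format; return ISO 8601."""
    return datetime.strptime(raw, SOURCE_FORMATS[source]).date().isoformat()

print(cleanse_date("03/11/07", "us_app"))  # -> 2007-03-11
print(cleanse_date("11/03/07", "eu_app"))  # -> 2007-03-11
```

Both source representations of 11th March 2007 land on the same warehouse value, which is exactly what the cleansing step is for.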
Q)Starting a Data Warehousing Project?
Data warehousing projects have been voted among the most complex and resource-intensive software projects by CIOs of Fortune 1000 companies; only Enterprise Resource Planning (ERP) implementations top the list.
Data warehousing projects come in all shapes and sizes, which bear a direct relationship to the cost and time involved. The approach to starting a data warehousing project will vary; the steps listed below summarize some of the points to consider.
1) Get Professional Advice
A data warehousing project goes very differently if the lead has been there and done that. It will save a huge bundle to get professional advice upfront. Endless meeting time can be saved, and the risk of an abandoned data warehousing project can be reduced.
2) Plan the Data
Know what metrics you want to measure in the Data Warehouse and ensure that there is the appropriate data to provide for the analysis. If you wish to obtain periodic Key Performance Indicator (KPI) data for shipping logistics, make sure that the appropriate data is piped into the data warehouse.
3) Who will use the Data Warehouse
The power Data Warehouse consumers are business and financial managers. Data Warehouses are meant to deliver clear indications of how the business is performing. Plot out the expected users of the Data Warehouse in the enterprise. Ensure that they will have the appropriate reports in a format that is quickly understandable. Ensure that planning exercises are conducted in advance to accumulate scenarios on how the Data Warehouse will be used. Always remember that data has to be presented attractively and in a format business managers will feel comfortable with. Text files with lines of numbers will not suffice!
4) Integration with External Applications
Most Data Warehousing projects sink or swim by their ability to extract data from external applications. Enterprises have a slew of applications, either developed in-house or obtained from a vendor. Conceptually, your Data Warehouse will act as the heart of the diverse applications running in the enterprise. All important data will flow into or out of the Data Warehouse.
5) Technology, What Technology?
At the bits and bytes level, a Data Warehouse is a souped-up database. It will be built on one of the major Relational Database Management System (RDBMS) vendors' products, from the likes of Oracle, IBM, or Microsoft. Open-source databases, like MySQL, can also support Data Warehousing with the right support in place.
Q)Design Methodologies of Kimball and Inmon...
Data warehousing is more an art form than cookie-cutter science. The business variables and technical risks are unique to each installation, and the business users have different goals and expectations. Data warehousing is more often successful than not if there is a reservoir of data warehousing expertise in-house.
This article will focus on the data warehousing design methodologies most commonly proposed. These designs are in an evolving flux as business needs and technical costs change.
Quite often the design chosen will be a combination of the methodologies below plus additional requirements: the "third way" of data warehouse design. I am a proponent of the third way. It takes into account the business specifics and needs of the installing company and the technical resources available, and it uses the best design patterns of both methodologies plus additional requirements unique to the business.
The two major design methodologies of data warehousing are from Ralph Kimball and Bill Inmon. The design methodologies developed by Kimball and Inmon have lines drawn in the sand.
Both Kimball and Inmon view data warehousing as separate from OLTP and Legacy applications.
Kimball views the data warehouse as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is the set of data marts tied together by conformed dimensions. Hence a unified view of the enterprise can be obtained from dimensional modeling at the local departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject-area basis. Hence the development of the data warehouse can start with data from the online store, and other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.
In this approach, a data mart is created from a subject area of the data warehouse.
Q)Modelling for the Data Mining?
Data mining is conducted against data accumulated in OLTP repositories, data warehouses, data marts, and archived data. The steps for data mining follow this pattern:
• data extraction
• data cleansing
• modeling data
• applying data mining algorithm
• pattern discovery
• data visualization
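The steps above can be sketched end-to-end on a toy dataset. The "algorithm" here is simple co-occurrence counting standing in for a real mining algorithm, and the transactions are invented.

```python
# Toy walk-through of the mining steps: extract, cleanse, model, apply a
# simple algorithm (pair co-occurrence counting), and surface the pattern.
from collections import Counter
from itertools import combinations

# extraction: raw transaction records (made up, deliberately messy)
raw = [" milk,bread ", "milk,bread,eggs", "bread,eggs", "", "milk,bread"]

# cleansing: drop empty records, trim whitespace, split into item sets
baskets = [sorted(set(r.strip().split(","))) for r in raw if r.strip()]

# modeling + algorithm: count how often each pair of items occurs together
pairs = Counter()
for b in baskets:
    pairs.update(combinations(b, 2))

# pattern discovery: the most frequent pair and its support count
pattern, support = pairs.most_common(1)[0]
print(pattern, support)
```

A real project would swap the counting step for an algorithm such as association-rule mining or clustering, but the surrounding extract/cleanse/report shape stays the same.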
Data extraction and data cleansing can be eased with good data lifecycle management policies. Very often a data warehousing project will ensure that data extraction and meta-data standards are pre-defined in an organization.
Data models for operational and archived data are different from data mining models. Data stored referentially in operational systems is structured for transactional speed.
Q)Evolution of Data in Enterprises
Data becomes active as soon as it is of interest to an organization. Data life cycle begins with a business need for acquiring data. Active data are referenced on a regular basis during day-to-day business operations. Over time, this data loses its importance and is accessed less often, gradually losing its business value, and ending with its archival or disposal.
Q)Active Data
Active data is of business use to an organization. The ease of access for business users to active data is an absolute necessity in order to run an efficient business.
The simple, but critical principle, that all data moves through life-cycle stages is key to improving data management. By understanding how data is used and how long it must be retained, companies can develop a strategy to map usage patterns to the optimal storage media, thereby minimizing the total cost of storing data over its life cycle.
The same principles apply when data is stored in a relational database, although the challenge of managing and storing relational data is compounded by complexities inherent in data relationships. Relational databases are a major consumer of storage and are also among the most difficult to manage because they are accessed on a regular basis. Without the ability to manage relational data effectively, relative to its use and storage requirements, runaway database growth will result in increased operational costs, poor performance, and limited availability for the applications that rely on these databases. The ideal solution is to manage data stored in relational databases as part of an overall enterprise data management solution.
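Mapping usage patterns to storage media, as described above, amounts to a lifecycle policy keyed on how recently data was accessed. The tier names and thresholds below are invented for illustration.

```python
# Sketch of a data-lifecycle policy: pick a storage tier from the time
# since last access. Tier names and day thresholds are assumptions.
def storage_tier(days_since_last_access):
    if days_since_last_access <= 30:
        return "online (fast disk)"      # active data, daily business use
    if days_since_last_access <= 365:
        return "nearline (cheap disk)"   # cooling data, occasional access
    return "archive (tape/SAN)"          # inactive data, retained cheaply

print(storage_tier(5))
print(storage_tier(400))
```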
Q)Inactive Data
Data are put out to pasture once they are no longer active, i.e., they are no longer needed for critical business tasks or analysis.
Prior to the mid-nineties, most enterprises archived data on microfilm and tape backups.
There are now technologies for data archival such as Storage Area Networks (SAN), Network Attached Storage (NAS) and Hierarchical Storage Management. These storage systems can maintain referential integrity and business context.
Q)What is Data Mining?
Data mining is the analysis of large data sets to discover patterns of interest. Data mining has come a long way from its early academic beginnings in the late seventies. Many of the early data mining software packages were based on a single algorithm.
Until the mid-nineties, data mining required considerable specialized knowledge and was mainly restricted to statisticians. Customer Relationship Management (CRM) software played a great part in popularizing data mining among corporate users. Data mining in CRMs is often hidden from the end users; the algorithms are packaged behind business functionality such as churn analysis, the process of predicting which customers are most likely to defect to a competitor.
Data mining algorithms are now freely available. Database vendors have started to incorporate data mining modules. Developers can now access data mining via open standards such as OLE-DB for data mining on SQL Server 2000. Data mining functionality can now be added directly to the application source code.
Q)The Future of Data Mining
The complexity of data mining must be hidden from end users before it takes true center stage in an organization. Business use cases can be designed, with tight constraints, around data mining algorithms.
Thursday, 7 January 2010
Erwin Data Modelling Data Warehousing Business Intelligence Oracle Database Dimensional Modeling Questions and Answers Part 5