Thursday, 14 January 2010
My blog on steps required to convert logical data model to physical data model
Hi All,
Please find the URL of my blog which describes the steps to convert the Logical Data Model into Physical Data Model.
http://ldmtopdm.blogspot.com/
Regards,
Sandeep
Tuesday, 12 January 2010
Role and responsibilities of Data Architect/Integration Architect
Profile as Data Architect/Integration Architect
•Understanding the customer's business requirements, business processes and system functionality in order to develop the data design and maintain the source of data.
•Developing and maintaining a formal description of the data and data structures. This can include data definitions, data models (conceptual, logical and physical), data dictionaries, data flow diagrams, etc. - in short, metadata. Data architecture includes topics such as metadata management, business semantics, data modeling and metadata workflow management.
•The architecture covers databases, data integration and the means to get to the data.
•Resolving end-to-end data issues, design issues and end-to-end test issues with the help of data dictionaries.
•Providing the means of system integration by understanding the source and target data and building the data mappings.
•High-level data architecture documentation, which illustrates the source and destination of data to and/or from approved data sources.
•Development of a data strategy and design specifications.
•Design strategy for managing historical data.
•Selection of capabilities and systems to meet business information needs.
•Evaluating new techniques for enhanced functionality.
Friday, 8 January 2010
Table Partitioning in Oracle
What is Table Partitioning in Oracle?
Answer 1: Nowadays enterprises run databases hundreds of gigabytes in size. These databases are known as Very Large Databases (VLDBs). From Oracle version 8.0 onwards, Oracle has provided the table partitioning feature, i.e. you can partition a table according to some criterion. For example, suppose you have a SALES table with the structure shown in the CREATE TABLE statement below (columns year, product and amt).
Suppose this table contains millions of records, but all the records belong to only four years: 1991, 1992, 1993 and 1994. And most of the time you are concerned with only one year, i.e. you issue queries like the following:
select sum(amt) from sales where year=1991;
select product, sum(amt) from sales where year=1992
group by product;
Whenever you issue queries like these, Oracle will scan the whole table. If you partition this table by year, performance improves, since Oracle will scan only a single partition instead of the whole table.
CREATING PARTITIONED TABLES
To create a partitioned table, give the following statement:
create table sales (
  year number(4),
  product varchar2(10),
  amt number(10,2))
partition by range (year)
 (partition p1 values less than (1992) tablespace u1,
  partition p2 values less than (1993) tablespace u2,
  partition p3 values less than (1994) tablespace u3,
  partition p4 values less than (1995) tablespace u4,
  partition p5 values less than (MAXVALUE) tablespace u5);
In the above example the sales table is created with 5 partitions. Partition p1 will contain the rows of year 1991 and will be stored in tablespace u1. Partition p2 will contain the rows of year 1992 and will be stored in tablespace u2; similarly p3 and p4.
In the above example, if you don't specify partition p5 with values less than MAXVALUE, you will not be able to insert any row with a year above 1994.
Although not required, you can place partitions in different tablespaces. If you do, you can isolate problems due to failures: only a particular partition will be unavailable, and the rest of the partitions will still be available.
In the above example the table is partitioned by range.
In Oracle you can partition a table by
• Range Partitioning
• Hash Partitioning
• List Partitioning
• Composite Partitioning
Range Partitioning
This type of partitioning is useful when dealing with data that has logical ranges into which it can be distributed; for example, the value of year. Performance is best when the data distributes evenly across the range.
Hash Partitioning
Use hash partitioning if your data does not easily lend itself to range partitioning, but you would like to partition for performance and manageability reasons. Hash partitioning provides a method of evenly distributing data across a specified number of partitions. Rows are mapped into partitions based on a hash value of the partitioning key.
The following example creates a hash-partitioned table. The partitioning column is partno; four partitions are created and assigned system-generated names, and they are placed in four named tablespaces (tab1, tab2, ...).
CREATE TABLE products
(partno NUMBER,
description VARCHAR2 (60))
PARTITION BY HASH (partno)
PARTITIONS 4
STORE IN (tab1, tab2, tab3, tab4);
List Partitioning
Use list partitioning when you require explicit control over how rows map to partitions. You can specify a list of discrete
values for the partitioning column in the description for each partition. This is different from range partitioning, where a
range of values is associated with a partition, and from hash partitioning, where the user has no control of the row to
partition mapping.
List partitioning allows unordered and unrelated sets of data to be grouped and organized together very naturally.
The following example creates a table with list partitioning:
create table customers (
  custcode number(5),
  name varchar2(20),
  addr varchar2(40),
  city varchar2(20),
  bal number(10,2))
partition by list (city)
 (partition north_india values ('DELHI','CHANDIGARH'),
  partition east_india values ('KOLKATA','PATNA'),
  partition south_india values ('HYDERABAD','BANGALORE','CHENNAI'),
  partition west_india values ('BOMBAY','GOA'));
When a row is inserted into the above table, Oracle matches the value of the city column against the partition value lists, and the row is stored in whichever partition's list contains that value.
COMPOSITE PARTITIONING
Composite partitioning partitions data using the range method, and within each partition, subpartitions it using
the hash method. Composite partitions are ideal for both historical data and striping, and provide improved
manageability of range partitioning and data placement, as well as the parallelism advantages of hash partitioning.
When creating composite partitions, you specify the following:
• Partitioning method: range
• Partitioning column(s)
• Partition descriptions identifying partition bounds
• Subpartitioning method: hash
• Subpartitioning column(s)
• Number of subpartitions for each partition or descriptions of subpartitions
The following statement creates a composite-partitioned table. In this example, three range partitions are created, each
containing eight subpartitions. Because the subpartitions are not named, system generated names are assigned, but the
STORE IN clause distributes them across the 4 specified tablespaces (tab1, ...,tab4).
CREATE TABLE PRODUCTS (partno NUMBER,
description VARCHAR(32),
costprice NUMBER)
PARTITION BY RANGE (partno)
SUBPARTITION BY HASH(description)
SUBPARTITIONS 8 STORE IN (tab1, tab2, tab3, tab4)
(PARTITION p1 VALUES LESS THAN (100),
PARTITION p2 VALUES LESS THAN (200),
PARTITION p3 VALUES LESS THAN (MAXVALUE));
ALTERING PARTITION TABLES
To add a partition
You can add a new partition at the "high" end (the point after the last existing partition). To add a partition at the beginning or in the middle of a table, use the SPLIT PARTITION clause.
For example to add a partition to sales table give the following command.
alter table sales add partition p6 values less than (1996);
To add a partition to a hash-partitioned table, give the following command:
Alter table products add partition;
Oracle then adds a new partition whose name is system generated, created in the default tablespace.
To add a partition with a user-defined name in a tablespace of your choice, give the following command:
Alter table products add partition p5 tablespace u5;
To add a partition to a list-partitioned table, give the following command:
alter table customers add partition central_india
values ('BHOPAL','NAGPUR');
Any value in the set of literal values that describe the partition(s) being added must not exist in any of the other partitions
of the table.
Coalescing Partitions
Coalescing partitions is a way of reducing the number of partitions in a hash-partitioned table, or the number of subpartitions in a composite-partitioned table. When a hash partition is coalesced, its contents are redistributed into one or more remaining partitions determined by the hash function. The specific partition that is coalesced is selected by Oracle, and is dropped after its contents have been redistributed.
To coalesce a hash partition give the following statement.
Alter table products coalesce partition;
This reduces by one the number of partitions in the table products.
DROPPING PARTITIONS
To drop a partition from a range-, list- or composite-partitioned table, give the following command:
Alter table sales drop partition p5;
Once you have dropped the partition, if you have created a global index on the table, then you have to rebuild the global index by giving the following statement:
Alter index sales_ind rebuild;
To avoid rebuilding indexes after dropping a partition, you can also first delete all the records and then drop the partition, like this:
Delete from sales where year=1994;
Alter table sales drop partition p4;
This method is most appropriate for small tables, or for large tables when the partition being dropped contains a small percentage of the total data in the table.
Another method of dropping a partition is to give the following statement:
ALTER TABLE sales DROP PARTITION p5 UPDATE GLOBAL INDEXES;
This causes the global index to be updated at the time the partition is dropped.
Exchanging a Range, Hash, or List Partition
To exchange a partition of a range, hash, or list-partitioned table with a nonpartitioned table, or the reverse, use the ALTER TABLE ... EXCHANGE PARTITION statement. An example of converting a partition into a nonpartitioned table follows. In this example, table stocks can be range, hash, or list partitioned.
ALTER TABLE stocks
EXCHANGE PARTITION p3 WITH stock_table_3;
Merging Partitions
Use the ALTER TABLE ... MERGE PARTITIONS statement to merge the contents of two partitions into one partition. The two original partitions are dropped, as are any corresponding local indexes.
You cannot use this statement for a hash-partitioned table or for hash subpartitions of a composite-partitioned table.
You can only merge two adjacent partitions; you cannot merge non-adjacent partitions.
For example, to merge the partitions p2 and p3 into one partition p23, give the following statement:
Alter table sales merge partitions p2, p3
into partition p23;
Modifying Partitions: Adding Values
Use the MODIFY PARTITION ... ADD VALUES clause of the ALTER TABLE statement to extend the value list of an existing partition. Literal values being added must not have been included in any other partition's value list. The partition value list for any corresponding local index partition is correspondingly extended, and any global index, or global or local index partitions, remain usable.
The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing partition list.
ALTER TABLE customers
MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');
Modifying Partitions: Dropping Values
Use the MODIFY PARTITION ... DROP VALUES clause of the ALTER TABLE statement to remove literal values from the value list of an existing partition. The statement is always executed with validation, meaning that it checks to see if any rows exist in the partition that correspond to the set of values being dropped. If any such rows are found then Oracle returns an error message and the operation fails. When necessary, use a DELETE statement to delete corresponding rows from the table before attempting to drop values.
You cannot drop all literal values from the value list describing the partition. You must use the ALTER TABLE ... DROP PARTITION statement instead.
The partition value list for any corresponding local index partition reflects the new value list, and any global index, or global or local index partitions, remain usable.
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from an existing partition value list.
ALTER TABLE customers
MODIFY PARTITION south_india
DROP VALUES ('KOCHI','MANGALORE');
SPLITTING PARTITIONS
You can split a single partition into two partitions. For example, to split the partition p5 of the sales table at the year 1996, give the following command:
Alter table sales split partition p5 at (1996)
into (partition p6, partition p7);
TRUNCATING PARTITIONS
Truncating a partition deletes all rows from the partition.
To truncate a partition, give the following statement:
Alter table sales truncate partition p5;
LISTING INFORMATION ABOUT PARTITIONED TABLES
To see how many partitioned tables there are in your schema, give the following statement:
Select * from user_part_tables;
To see partition-level partitioning information:
Select * from user_tab_partitions;
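As a more targeted example, the following query (a sketch; it uses standard columns of the user_tab_partitions dictionary view) lists each partition of the sales table with its upper bound and tablespace:
select partition_name, high_value, tablespace_name
from user_tab_partitions
where table_name = 'SALES';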
Thursday, 7 January 2010
Data Modelling Data Warehousing Database BI Questions and Answers Part 3
Q)What is Normalization, First Normal Form, Second Normal Form, Third Normal Form?
1. Normalization is a process for assigning attributes to entities. It reduces data redundancies, helps eliminate data anomalies, and produces controlled redundancies to link tables.
2. Normalization is the analysis of functional dependencies between attributes/data items of user views. It reduces a complex user view to a set of small and stable subgroups of fields/relations.
1NF: Repeating groups must be eliminated, dependencies can be identified, all key attributes are defined, and there are no repeating groups in the table.
2NF: The table is already in 1NF and includes no partial dependencies (no attribute is dependent on a portion of the primary key). It is still possible to exhibit transitive dependency: attributes may be functionally dependent on non-key attributes.
3NF: The table is already in 2NF and contains no transitive dependencies.
1. Eliminate Repeating Groups. - 1NF
Make a separate table for each set of related attributes, and give each table a primary key.
2. Eliminate Redundant Data - 2NF
If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3. Eliminate Columns Not Dependent On Key -3 NF
If attributes do not contribute to a description of the key, remove them to a separate table.
4. Boyce-Codd Normal Form - BCNF
If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.
5. Isolate Independent Multiple Relationships - 4NF
This applies primarily to key-only associative tables, and appears as a ternary relationship, but has incorrectly merged two distinct, independent relationships.
6. Isolate Semantically Related Multiple Relationships - 5NF
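As a minimal SQL illustration of steps 1-3 (the orders_flat, customers_nf and orders_nf tables are invented for this example):
-- Unnormalized: customer attributes repeat on every order row
create table orders_flat (
  order_no number(10),
  custcode number(5),
  cust_name varchar2(20),
  cust_city varchar2(20),
  amt number(10,2));
-- Normalized: cust_name and cust_city depend only on custcode,
-- so they move to their own table keyed by custcode
create table customers_nf (
  custcode number(5) primary key,
  cust_name varchar2(20),
  cust_city varchar2(20));
create table orders_nf (
  order_no number(10) primary key,
  custcode number(5) references customers_nf(custcode),
  amt number(10,2));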
Q)What are conformed dimensions?
Answer1:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. Ex: a Date dimension is connected to all facts like Sales facts, Inventory facts, etc.
Answer2:
Conformed dimensions are dimensions which are common to multiple cubes (cubes are the schemas containing fact and dimension tables).
Consider Cube-1 containing F1, D1, D2, D3 and Cube-2 containing F2, D1, D2, D4 as the facts and dimensions; here D1 and D2 are the conformed dimensions.
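A minimal DDL sketch of a conformed Date dimension shared by two fact tables (all names here are hypothetical):
create table date_dim (
  date_key number(8) primary key,
  cal_date date);
-- Both fact tables reference the same date_dim,
-- so "date" means exactly the same thing in both
create table sales_fact (
  date_key number(8) references date_dim(date_key),
  amt number(10,2));
create table inventory_fact (
  date_key number(8) references date_dim(date_key),
  qty_on_hand number(10));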
Q)What are the Different methods of loading Dimension tables?
Conventional Load:
Before loading the data, all the Table constraints will be checked against the data.
Direct load (faster loading):
All the constraints will be disabled and data will be loaded directly. Later the data will be checked against the table constraints, and the bad data won't be indexed.
Q)What are Conformed Facts?
Two facts are conformed if they have the same name, units, and definition.
If two facts do not represent the same thing to the business, then they must be given different names.
Conformed fact is a fact which is used in more than one fact table.
In addition to conformed dimensions, you need conformed facts. Conforming a fact really amounts to standardizing the definitions of terms across individual marts. Often, different divisions or departments use the same term in different ways. Does “revenue” refer to “gross revenue” or “adjusted revenue”? Does “units shipped” refer to cases of items or individual items?
Q)How are the Dimension tables designed?
Most dimension tables are designed using normalization principles up to 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension
Q)What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Q)What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQL Server does not support bitmap indexes; only Oracle supports bitmaps.
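For example, a bitmap index on a low-cardinality foreign key of a fact table (reusing the hypothetical sales_fact table sketched earlier) would be created in Oracle as:
create bitmap index sales_fact_date_bix on sales_fact (date_key);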
Q)What is a Snowflake Schema?
In a snowflake schema, each dimension has a primary dimension table, to which one or more additional dimension tables can join. The primary dimension table is the only table that can join to the fact table.
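A minimal sketch of a snowflaked product dimension (hypothetical names): category_dim hangs off product_dim, and only product_dim would join to the fact table.
create table category_dim (
  category_key number(6) primary key,
  category_name varchar2(30));
create table product_dim (
  product_key number(6) primary key,
  product_name varchar2(30),
  category_key number(6) references category_dim(category_key));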
Q)What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Q)What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive; we go for semi-additive facts when we maintain aggregated facts.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
We use a factless fact table while maintaining the promotion values of a transaction (ex: product samples), because such a table doesn't contain any measures.
Q)What are the differences between star and snowflake schemas?
Star schema - all dimensions are linked directly with the fact table.
Snowflake schema - dimensions may be interlinked or may have one-to-many relationships with other tables.
Q)What is a general purpose scheduling tool?
The basic purpose of a scheduling tool in a DW application is to streamline the flow of data from source to target at a specific time or based on some condition.
Q)Which columns go to the fact table and which columns go to the dimension table?
The primary key columns of the tables (entities) go to the dimension tables as foreign keys.
The primary key columns of the dimension tables go to the fact tables as foreign keys.
Q)What are modeling tools available in the Market?
There are a number of data modeling tools to transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate SQL code for creating the database.
Tool Name Company Name
Erwin Computer Associates
Embarcadero Embarcadero Technologies
Rational Rose IBM Corporation
Power Designer Sybase Corporation
Oracle Designer Oracle Corporation
Xcase RESolution LTD.
Q)How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
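In Oracle this loop is often expressed as a single row-generating query rather than a procedural program; a sketch that loads the hypothetical date_dim table from the earlier example, with an arbitrarily chosen range:
-- generate one row per day for roughly ten years (3653 days)
insert into date_dim (date_key, cal_date)
select to_number(to_char(d, 'YYYYMMDD')), d
from (select date '2000-01-01' + level - 1 as d
      from dual
      connect by level <= 3653);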
Q)What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 [Chen76] as a way to unify the network and relational database views.
Simply stated, the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects.
Since Chen wrote his paper the model has been extended, and today it is commonly used for database design. For the database designer, the utility of the ER model is:
it maps well to the relational model - the constructs used in the ER model can easily be transformed into relational tables; and it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.
Q)What are the various ETL tools in the Market?
ETL tools are meant to extract, transform and load data into a Data Warehouse for decision making. Before the evolution of ETL tools, the ETL process was done manually, using SQL code created by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding and more work hours. On top of that, maintaining the code posed a great challenge to the programmers.
These difficulties are eliminated by ETL tools, since they are very powerful and offer many advantages in all stages of the ETL process - extraction, data cleansing, data profiling, transformation, debugging and loading into the data warehouse - when compared to the old method.
There are a number of ETL tools available in the market to run the ETL process according to business/technical requirements. Following are some of those.
Various ETL tools used in market are:
1. Informatica
2. Data Stage
3. MS-SQL DTS (Integration Services 2005)
4. Abinitio
5. SQL Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction
Popular ETL Tools
Tool Name Company Name
Informatica Informatica Corporation
DT/Studio Embarcadero Technologies
DataStage IBM
Ab Initio Ab Initio Software Corporation
Data Junction Pervasive Software
Oracle Warehouse Builder Oracle Corporation
Microsoft SQL Server Integration Services Microsoft
TransformOnDemand Solonde
Transformation Manager ETL Solutions
Q)What is VLDB?
Answer 1:
VLDB stands for Very Large DataBase.
It is an environment or storage space managed by a relational database management system (RDBMS) consisting of vast quantities of information.
Answer 2:
VLDB doesn't refer to the size of the database or the vast amount of information stored. It refers to the window of opportunity to take a backup of the database.
Window of opportunity refers to the interval of time available; if the DBA is unable to take a backup within the specified time, then the database is considered a VLDB.
Answer 3
The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.
Q)What are the steps to build the data warehouse?
Gathering business requirements
Identifying sources
Identifying facts
Defining dimensions
Defining attributes
Redefining dimensions & attributes
Organising attribute hierarchies & defining relationships
Assigning unique identifiers
Additional conventions: cardinality/adding ratios
Q)What is the difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, while the dimensional model has only a physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
Q)Why is the fact table in normal form?
Basically the fact table consists of the index keys of the dimension/lookup tables and the measures.
So whenever we have only such keys and measures in a table, that itself implies that the table is in normal form.
Q)What are the advantages of data mining over traditional approaches?
Data mining is used for estimating the future. For example, if we take a company/business organization, by using the concept of data mining we can predict the future of the business in terms of revenue (or) employees (or) customers (or) orders etc.
Traditional approaches use simple algorithms for estimating the future, but they do not give accurate results when compared to data mining.
Q)What are the data validation strategies for data mart validation after the loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requirements.
Strategies are the different methods followed to meet the validation requirements.
Q)What is the datatype of the surrogate key?
The datatype of the surrogate key is either integer or numeric (number).
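In Oracle the surrogate key is commonly populated from a sequence; a minimal sketch (customer_dim and cust_sk_seq are hypothetical names):
create table customer_dim (
  cust_sk number(10) primary key,  -- surrogate key, a plain number
  custcode number(5),              -- natural (business) key
  name varchar2(20));
create sequence cust_sk_seq;
insert into customer_dim (cust_sk, custcode, name)
values (cust_sk_seq.nextval, 101, 'SMITH');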
Q)What is degenerate dimension table?
Degenerate dimensions: if a table contains values which are neither dimensions nor measures, they are called degenerate dimensions. Ex: invoice id, empno.
Q)What is a linked cube?
A linked cube is one in which a sub-set of the data can be analysed in great detail. The linking ensures that the data in the cubes remains consistent.
Q)What is the main difference between Inmon and Kimball philosophies of data warehousing?
Both differ in their concept of building the data warehouse.
According to Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from dimension modeling on a local departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.
i.e.,
Kimball--First DataMarts--Combined way ---Datawarehouse
Inmon---First Datawarehouse--Later----Datamarts
Q)What is a Data Warehousing Hierarchy?
Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex rewrites. For example, the database can aggregate an existing sales revenue on a quarterly base to a yearly aggregation when the dimensional dependencies between quarter and year are known.
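A sketch of how such a month-quarter-year hierarchy is often carried in a dimension table (all names hypothetical): each row stores its own level value plus its parent levels, so a query tool can roll up from month to quarter to year.
create table month_dim (
  month_key number(6) primary key,  -- e.g. 201001
  month_name varchar2(10),
  quarter_name varchar2(7),         -- parent level, e.g. 2010-Q1
  year_no number(4));               -- top level
create table monthly_sales_fact (
  month_key number(6) references month_dim(month_key),
  amt number(10,2));
-- rolling up monthly sales through the hierarchy
select d.year_no, d.quarter_name, sum(f.amt)
from monthly_sales_fact f
join month_dim d on f.month_key = d.month_key
group by d.year_no, d.quarter_name;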
Q)What is the main difference between a schema in an RDBMS and schemas in a Data Warehouse?
RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Complex extraction problems cannot be easily solved
* Poorly modelled
DWH Schema
* Used for OLAP systems
* New generation schema
* De Normalized
* Easy to understand and navigate
* Complex extraction problems can be easily solved
* Very good model
Q)What is hybrid slowly changing dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2.
It may happen that in a table some columns are important and we need to track changes for them, i.e. capture the historical data for them, whereas for some columns, even if the data changes, we don't care.
For such tables we implement hybrid SCDs, wherein some columns are Type 1 and some are Type 2.
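A sketch of such a table (hypothetical columns): city is tracked as Type 2, phone as Type 1.
create table customer_scd (
  cust_sk number(10) primary key,  -- surrogate key; a new one per Type 2 change
  custcode number(5),              -- natural key
  city varchar2(20),               -- Type 2 column: history is kept
  phone varchar2(15),              -- Type 1 column: simply overwritten in place
  current_flag char(1),
  eff_date date);
-- a phone change updates the current row in place; a city change expires the
-- current row (current_flag = 'N') and inserts a new version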
Q)What are the different architectures of data warehouse?
There are two main approaches:
1. Top-down (Bill Inmon)
2. Bottom-up (Ralph Kimball)
Q)What is incremental loading?
Incremental loading means loading only the ongoing changes from the OLTP system.
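A common way to implement this in Oracle is a MERGE from a staging table holding only the changed rows; a sketch with hypothetical table names:
merge into dw_customers t
using stg_customers_delta s
on (t.custcode = s.custcode)
when matched then
  update set t.name = s.name, t.city = s.city
when not matched then
  insert (custcode, name, city)
  values (s.custcode, s.name, s.city);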
Q)What is a Degenerated Dimension ?
Degenerate dimension: keeping the control information on the fact table. Ex: consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.
Q)What are the possible data marts in retail sales?
Product information, sales information.
Q)What is the definition of normalized and denormalized view and what are the differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of allowing redundancies.
Q)Can a dimension table contain numeric values?
Yes, but the datatype of those columns will typically be char (the values can be numeric or char).
Q)What is the difference between view and materialized view?
View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view - stores the result of the SQL in table form in the database. The SQL statement executes only once, and after that, every time you run the query the stored result set is used. Pros include quick query results.
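For example, a materialized view over the sales table from the partitioning post above (the refresh options shown are one common choice, not the only one):
create materialized view sales_by_year_mv
build immediate
refresh complete on demand
as select year, sum(amt) as total_amt
   from sales
   group by year;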
Q)What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.
Q)What is the difference between Data Warehousing and Business Intelligence?
Answer 1
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart including meta data management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools.Think of the data warehouse as the back office and business intelligence as the entire business including the back office. The business needs the back office on which to function, but the back office without a business to support, makes no sense.
Answer 2
Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and comes to some business decision depending on the result of the analysis.
Q)What is the difference between OLAP and a data warehouse?
A data warehouse is the place where the data is stored for analysis,
whereas OLAP is the process of analyzing the data, managing aggregations,
and partitioning information into cubes for in-depth visualization.
Q)What is the difference between ODS and OLTP?
ODS: it is a collection of tables created in the data warehouse that maintains only current data,
whereas OLTP maintains the data only for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.
Q)Are OLAP databases called decision support systems? (true/false)
True
Q)Explain in detail type 1, type 2 (SCD) and type 3.
Type 1:
Most recent value only.
Type 2 (full history):
i) Version number
ii) Flag
iii) Date
Type 3:
Current and one previous value.
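A sketch of the flag-plus-date variant of Type 2, reusing the hypothetical customer_scd table and cust_sk_seq sequence from earlier examples: expire the current row, then insert the new version.
update customer_scd
set current_flag = 'N'
where custcode = 101
and current_flag = 'Y';
insert into customer_scd (cust_sk, custcode, city, phone, current_flag, eff_date)
values (cust_sk_seq.nextval, 101, 'PUNE', '555-0101', 'Y', sysdate);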
Q)What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts.
A dollar value is an additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
For a non-additive fact, e.g. measured height(s) for 'citizens by geographical location', when we roll up 'city' data to 'state' level we should not add the heights of the citizens; rather we may want to use them to derive a 'count'.
Q)Types of facts ?
•Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
•Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
•Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
1.Normalization is process for assigning attributes to entities–Reduces data redundancies–Helps eliminate data anomalies–Produces controlled redundancies to link tables
2.Normalization is the analysis of functional dependency between attributes / data items of user views? I t reduces a complex user view to a set of small and stable subgroups of fields / relations
1NF:Repeating groups must be eliminated , Dependencies can be identified, All key attributes defined, No repeating groups in table
2NF: The Table is already in1NF,Includes no partial dependencies–No attribute dependent on a portion of primary key, Still possible to exhibit transitive dependency, Attributes may be functionally dependent on non-key attributes
3NF: The Table is already in 2NF, Contains no transitive dependencies
1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
1. Eliminate Repeating Groups. - 1NF
Make a separate table for each set of related attributes, and give each table a primary key.
2. Eliminate Redundant Data - 2NF
If an attribute depends on only part of a multi-valued key, remove it to a separate table.
3. Eliminate Columns Not Dependent On Key -3 NF
If attributes do not contribute to a description of the key, remove them to a separate table.
4 BCNF
If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
BCNF covers very specific situations where 3NF misses inter-dependencies between non-key (but candidate key) attributes. Typically, any relation that is in 3NF is also in BCNF. However, a 3NF relation won't be in BCNF if (a) there are multiple candidate keys, (b) the keys are composed of multiple attributes, and (c) there are common attributes between the keys.
5 Isolate Independent Multiple Relationships - 4 NF
This applies primarily to key-only associative tables, and appears as a ternary relationship, but has incorrectly merged 2 distinct, independent relationships.
6 . Isolate Semantically Related Multiple Relationships – 5 NF
Q)What are conformed dimensions?
Answer1:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
Answer2:
Conformed dimentions are dimensions which are common to the cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4 are the Facts and Dimensions here D1,D2 are the Conformed Dimensions
Q)What are the Different methods of loading Dimension tables?
Conventional Load:
Before loading the data, all the Table constraints will be checked against the data.
Direct load:(Faster Loading)
All the Constraints will be disabled. Data will be loaded directly.Later the data will be checked against the table constraints and the bad data won't be indexed.
Q)What are Conformed Facts?
Two facts are conformed if they have the same name, units, and definition
If two facts are do not represent the same thing to the business, then they must be given different names
Conformed fact is a fact which is used in more than one fact table.
In addition to conformed dimensions, you need conformed facts. Conforming a fact really amounts to standardizing the definitions of terms across individual marts. Often, different divisions or departments use the same term in different ways. Does “revenue” refer to “gross revenue” or “adjusted revenue”? Does “units shipped” refer to cases of items or individual items?
Q)How are the Dimension tables designed?
Most dimension tables are designed using Normalization principles upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension
Q)What are non-additive facts?
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Q)What type of Indexing mechanism do we need to use for a typical datawarehouse?
On the fact table it is best to use bitmap indexes. Dimension tables can use bitmap and/or the other types of clustered/non-clustered, unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle supports bitmaps.
Q)What Snow Flake Schema?
Snowflake Schema, each dimension has a primary dimension table, to which one or more additional dimensions can join. The primary dimension table is the only table that can join to the fact table.
Q)What is real time data-warehousing?
Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Q)What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
Q)While maintain the promotion values of the transaction (ex: product samples) because this table doesn’t contain any measures.
Differences between star and snowflake schemas?
Star schema - all dimensions will be linked directly with a fat table.
Snow schema - dimensions maybe interlinked or may have one-to-many relationship with other tables.
What is a general purpose scheduling tool?
The basic purpose of the scheduling tool in a DW Application is to stream line the flow of data from Source To Target at specific time or based on some condition.
Q)Which columns go to the fact table and which columns go the dimension table?
The Primary Key columns of the Tables(Entities) go to the Dimension Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as Foreign Keys.
Q)What are modeling tools available in the Market?
here are a number of data modeling tools
There are a number of data modeling tools to transform business requirements into logical data model, and logical data model to physical data model. From physical data model, these tools can be instructed to generate sql code for creating database.
Tool Name Company Name
Erwin Computer Associates
Embarcadero Embarcadero Technologies
Rational Rose IBM Corporation
Power Designer Sybase Corporation
Oracle Designer Oracle Corporation
Xcase RESolution LTD.
Q)How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
Q)What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views.
Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects.
Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:
it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.
Q)What are the various ETL tools in the market?
ETL tools are meant to extract, transform and load data into the data warehouse for decision making. Before the evolution of ETL tools, the ETL process was done manually, using SQL code written by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding and more work hours. On top of that, maintaining the code posed a great challenge to the programmers.
These difficulties are eliminated by ETL tools, since they are very powerful and, compared to the old method, offer many advantages in all stages of the ETL process: extraction, data cleansing, data profiling, transformation, debugging and loading into the data warehouse.
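For context, the hand-coded approach described above typically looked something like this minimal sketch (all names are hypothetical): extract from a staging copy of the source, transform inline, and load the warehouse table:

INSERT INTO dw_customer (customer_key, customer_name, country_code)
SELECT customer_seq.NEXTVAL,          -- hypothetical surrogate-key sequence
       UPPER(TRIM(src.cust_name)),    -- transform: cleanse the name
       NVL(src.country, 'UNKNOWN')    -- transform: default missing values
FROM stg_customer src;                -- extract from a staging table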
There are a number of ETL tools available in the market to carry out the ETL process according to business/technical requirements. Some of the ETL tools used in the market are:
1. Informatica
2. DataStage
3. MS SQL Server DTS (Integration Services in 2005)
4. Ab Initio
5. SQL*Loader
6. Sunopsis
7. Oracle Warehouse Builder
8. Data Junction
Popular ETL Tools
Tool Name - Company Name
Informatica - Informatica Corporation
DT/Studio - Embarcadero Technologies
DataStage - IBM
Ab Initio - Ab Initio Software Corporation
Data Junction - Pervasive Software
Oracle Warehouse Builder - Oracle Corporation
Microsoft SQL Server Integration Services - Microsoft
TransformOnDemand - Solonde
Transformation Manager - ETL Solutions
Q)What is VLDB?
Answer 1:
VLDB stands for Very Large DataBase.
It is an environment, or storage space, managed by a relational database management system (RDBMS) and consisting of vast quantities of information.
Answer 2:
VLDB doesn't refer to the size of the database or the vast amount of information stored. It refers to the window of opportunity for taking a backup of the database.
The window of opportunity is an interval of time; if the DBA is unable to take a backup within the specified time, then the database is considered a VLDB.
Answer 3:
The perception of what constitutes a VLDB continues to grow. A one terabyte database would normally be considered to be a VLDB.
Q)What are the steps to build the data warehouse?
Gathering business requirements
Identifying Sources
Identifying Facts
Defining Dimensions
Defining Attributes
Redefining Dimensions & Attributes
Organising Attribute Hierarchy & Defining Relationships
Assigning Unique Identifiers
Additional conventions: Cardinality/Adding ratios
Q)What is the difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, while the dimensional model has only a physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
Q)Why is the fact table in normal form?
Basically, the fact table consists of the index keys of the dimension/lookup tables and the measures.
So whenever we have the keys in a table, that itself implies that the table is in normal form.
Q)What are the advantages of data mining over traditional approaches?
Data mining is used for estimating the future. For example, if we take a company/business organization, by using the concept of data mining we can predict the future of the business in terms of revenue (or) employees (or) customers (or) orders, etc.
Traditional approaches use simple algorithms for estimating the future, but they do not give accurate results when compared to data mining.
Q)What are the data validation strategies for data mart validation after the loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requirements.
Strategies are the different methods followed to meet those validation requirements.
Q)What is the datatype of the surrogate key?
The datatype of the surrogate key is either integer or numeric (number).
Q)What is a degenerate dimension table?
Degenerate dimensions: if a table contains values which are neither dimensions nor measures, these are called degenerate dimensions. Ex: invoice id, emp no.
Q)What is a linked cube?
A linked cube is one in which a sub-set of the data can be analysed in great detail. The linking ensures that the data in the cubes remains consistent.
Q)What is the main difference between Inmon and Kimball philosophies of data warehousing?
Both differ in their concept of building the data warehouse.
According to Kimball:
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization, and the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from the dimensional modeling on a local departmental level.
Inmon believes in creating a data warehouse on a subject-by-subject-area basis. Hence the development of the data warehouse can start with data from the online store; other subject areas can be added to the data warehouse as the need arises. Point-of-sale (POS) data can be added later if management decides it is necessary.
i.e.,
Kimball: first the data marts, then combined into the data warehouse.
Inmon: first the data warehouse, later the data marts.
Q)What is a Data Warehousing Hierarchy?
Hierarchies
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.
When designing hierarchies, you must consider the relationships in business structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.
Levels
A level represents a position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the month, quarter, and year levels. Levels range from general to specific, with the root level as the highest or most general level. The levels in a dimension are organized into one or more hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from most general (the root) to most specific information. They define the parent-child relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex query rewrites. For example, the database can aggregate existing sales revenue at a quarterly level into a yearly aggregation when the dimensional dependencies between quarter and year are known.
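In Oracle, for instance, such a hierarchy can be declared with CREATE DIMENSION so the optimizer knows the dependencies and can rewrite queries; a minimal sketch (assuming a hypothetical dim_date table whose month and quarter codes are year-qualified, e.g. '2010-01' and '2010-Q1'):

CREATE DIMENSION date_dim
  LEVEL day     IS dim_date.date_key
  LEVEL month   IS dim_date.month_code
  LEVEL quarter IS dim_date.quarter_code
  LEVEL year    IS dim_date.year_num
  HIERARCHY calendar_rollup (
    day CHILD OF month CHILD OF quarter CHILD OF year
  );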
Q)What is the main difference between a schema in an RDBMS and schemas in a Data Warehouse?
RDBMS Schema
* Used for OLTP systems
* Traditional and old schema
* Normalized
* Difficult to understand and navigate
* Complex extraction and analysis problems cannot be solved easily
* Poorly modelled for analysis
DWH Schema
* Used for OLAP systems
* New generation schema
* Denormalized
* Easy to understand and navigate
* Complex extraction and analysis problems can be solved easily
* Very good model for analysis
Q)What is hybrid slowly changing dimension?
Hybrid SCDs are a combination of both SCD Type 1 and SCD Type 2.
It may happen that in a table some columns are important and we need to track changes to them, i.e. capture their historical data, whereas for other columns, even if the data changes, we don't care.
For such tables we implement hybrid SCDs, wherein some columns are Type 1 and some are Type 2.
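A minimal sketch of a hybrid SCD load (hypothetical customer dimension; :cust_id, :new_phone and :new_address stand for incoming source values): the phone column is treated as Type 1 and the address column as Type 2:

-- Type 1: overwrite the non-tracked column on the current row
UPDATE dim_customer
SET phone = :new_phone
WHERE customer_id = :cust_id
AND current_flag = 'Y';

-- Type 2: if the tracked column changed, expire the current row...
UPDATE dim_customer
SET current_flag = 'N', end_date = SYSDATE
WHERE customer_id = :cust_id
AND current_flag = 'Y'
AND address <> :new_address;

-- ...and insert a new current version (only when the old row was just expired)
INSERT INTO dim_customer (customer_key, customer_id, phone, address,
                          start_date, end_date, current_flag)
SELECT customer_key_seq.NEXTVAL, :cust_id, :new_phone, :new_address,
       SYSDATE, NULL, 'Y'
FROM dual
WHERE NOT EXISTS (SELECT 1 FROM dim_customer
                  WHERE customer_id = :cust_id AND current_flag = 'Y');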
Q)What are the different architecture of datawarehouse?
There are two main approaches:
1. Top down - (Bill Inmon)
2. Bottom up - (Ralph Kimball)
Q)What is incremental loading?
Incremental loading means loading only the ongoing changes in the OLTP source, rather than the full data set, on each load.
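A minimal sketch of an incremental extract (hypothetical tables), using a control table that remembers the last extract time:

-- Pull only the rows changed since the previous load
INSERT INTO stg_orders
SELECT o.*
FROM orders o
WHERE o.last_updated > (SELECT last_extract_time
                        FROM etl_control
                        WHERE table_name = 'ORDERS');

-- Advance the watermark (in practice, captured before the extract started)
UPDATE etl_control
SET last_extract_time = SYSDATE
WHERE table_name = 'ORDERS';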
Q)What is a Degenerated Dimension?
Degenerate dimension: keeping control information on the fact table. Ex: consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension table is removed, and the order information is stored directly in the fact table in order to eliminate unnecessary joins while retrieving the order information.
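A minimal sketch of that design (hypothetical names): the order identifiers sit directly on the fact table with no dimension table behind them:

CREATE TABLE fact_order_line (
  date_key     NUMBER REFERENCES dim_date (date_key),
  product_key  NUMBER REFERENCES dim_product (product_key),
  order_number VARCHAR2(20),   -- degenerate dimension: no dimension table behind it
  line_number  NUMBER,         -- degenerate dimension
  quantity     NUMBER,
  amount       NUMBER(12,2)
);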
Q)What are the possible data marts in retail sales?
Product information, sales information.
Q)What is the definition of normalized and denormalized view and what are the differences between them?
Normalization is the process of removing redundancies.
Denormalization is the process of intentionally allowing redundancies.
Q)Can a dimension table contain numeric values?
Yes, but those columns are descriptive attributes rather than measures, so their datatype is usually char even when the values themselves happen to be numeric.
Q)What is the difference between view and materialized view?
View - stores the SQL statement in the database and lets you use it as a table. Every time you access the view, the SQL statement executes.
Materialized view - stores the result of the SQL statement in table form in the database. The SQL statement executes only once, and after that, every time you run the query, the stored result set is used. Pros include quick query results.
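A minimal Oracle sketch of the two (the fact and dimension tables are hypothetical):

-- Ordinary view: the defining query runs every time the view is referenced
CREATE VIEW v_monthly_sales AS
SELECT d.year_num, d.month_num, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY d.year_num, d.month_num;

-- Materialized view: the result set is stored, and refreshed only on demand
CREATE MATERIALIZED VIEW mv_monthly_sales
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT d.year_num, d.month_num, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY d.year_num, d.month_num;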
Q)What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day, even minute-to-minute, decision making in a way that increases the quality of those customer touches, which encourages customer loyalty and thus secures an organization's bottom line. The marketplace is coming of age as we progress from first-generation "passive" decision-support systems to current- and next-generation "active" data warehouse implementations.
Q)What is the difference between Data Warehousing and Business Intelligence?
Answer 1
Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart, including meta data management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc.
Business intelligence, on the other hand, is a set of software tools that enable an organization to analyze measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools. Think of the data warehouse as the back office and business intelligence as the entire business, including the back office. The business needs the back office in order to function, but the back office without a business to support makes no sense.
Answer 2
Simply speaking, BI is the capability of analyzing the data of a data warehouse to the advantage of the business. A BI tool analyzes the data of a data warehouse and arrives at business decisions depending on the result of the analysis.
Q)What is the difference between OLAP and a data warehouse?
The data warehouse is the place where the data is stored for analysis,
whereas OLAP is the process of analyzing the data, managing aggregations,
and partitioning information into cubes for in-depth visualization.
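The kind of aggregation management OLAP performs can be approximated in plain SQL with a CUBE grouping; a minimal sketch (hypothetical tables):

-- Computes totals for every combination: per year, per category, both, and the grand total
SELECT d.year_num, p.category_name, SUM(f.sales_amount) AS total_sales
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY CUBE (d.year_num, p.category_name);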
Q)What is the difference between ODS and OLTP?
ODS: it is a collection of tables created in the data warehouse that maintains only current data,
whereas OLTP maintains the data only for transactions; OLTP systems are designed for recording the daily operations and transactions of a business.
Q)Are OLAP databases called decision support systems? True/false?
True.
Q)Explain in detail SCD type 1, type 2 and type 3?
Type 1: most recent value only.
Type 2 (full history), tracked by one of:
i) Version number
ii) Flag
iii) Date
Type 3: current value and one previous value.
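A minimal sketch of a dimension table carrying the tracking columns listed above (hypothetical names; in practice a table would use only the options it needs):

CREATE TABLE dim_customer_scd (
  customer_key NUMBER PRIMARY KEY,   -- surrogate key; Type 2 creates a new key per version
  customer_id  NUMBER,               -- natural key from the source system
  address      VARCHAR2(200),        -- Type 1 would simply overwrite this column
  version_no   NUMBER,               -- Type 2 option (i): version number
  current_flag CHAR(1),              -- Type 2 option (ii): flag
  start_date   DATE,                 -- Type 2 option (iii): effective dates
  end_date     DATE,
  prev_address VARCHAR2(200)         -- Type 3: one previous value kept alongside the current
);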
Q)What are non-additive facts in detail?
A fact may be a measure, a metric or a dollar value. Measures and metrics are non-additive facts.
A dollar value is an additive fact: if we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
For a non-additive fact, e.g. the measure height for 'citizens by geographical location', when we roll up 'city' data to 'state' level we should not add the heights of the citizens; rather, we may want to use the fact to derive a 'count'.
Q)Types of facts?
•Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
•Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
•Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.