Q)What is the difference between Star Schema and Snowflake Schema?
Star Schema:
1. Data is de-normalized
2. Cardinality is lower
3. Needs more data storage compared to snowflake schema
4. It is a star-like structure: one central fact table surrounded by dimension tables, with each dimension table connected directly to the fact table
5. More detailed data compared to snowflake schema
6. Better query performance compared to snowflake schema
Snowflake Schema:
1. Data is normalized
2. Cardinality is higher
3. Needs less data storage compared to star schema
4. Here each dimension table may have child tables
5. Less detailed data compared to star schema
6. Lower query performance compared to star schema
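To make the contrast concrete, here is a minimal sketch in Python/SQLite showing the same Product dimension modeled both ways (all table and column names are invented for illustration, not a standard schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Star schema: one denormalized dimension table per dimension,
# joined directly to the central fact table.
conn.executescript("""
CREATE TABLE product_dim (
    product_key   INTEGER PRIMARY KEY,  -- surrogate key
    product_name  TEXT,
    category_name TEXT,                 -- category kept in-line (denormalized)
    brand_name    TEXT
);
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    date_key     INTEGER,
    sales_dollar REAL
);
""")

# Snowflake schema: the same dimension normalized into parent/child
# tables, which saves storage but adds one more join per query.
conn.executescript("""
CREATE TABLE category_dim (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE product_dim_sf (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
""")
```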
Q)What is a Data Warehouse?
Answer 1:
A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management decisions.
Subject Oriented: The data warehouse is oriented to the major subject areas of the corporation that have been defined in the data model. Examples of subject areas are: customer, product, activity, policy, claim, account. The major subject areas end up being physically implemented as a series of related tables in the data warehouse.
Integrated: The second salient characteristic of the data warehouse is that it is integrated. This is the most important aspect of a data warehouse. The different design decisions that application designers have made over the years show up in a thousand different ways. Generally, there is no cross-application consistency in encoding, naming conventions, physical attributes, measurements of attributes, key structure and physical characteristics of the data. Each application has most likely been designed independently. As data is entered into the data warehouse, these application-level inconsistencies are undone.
Time Variant: The third salient characteristic of the data warehouse is that it is time-variant. A 5 to 10 year time horizon of data is normal for the data warehouse. Data Warehouse data is a sophisticated series of snapshots, each taken at one moment in time, and the key structure always contains some time element.
Non Volatile: The last important characteristic of the data warehouse is that it is non-volatile. Unlike operational data, warehouse data is loaded en masse and is then accessed. Updates of the data do not occur in the data warehouse environment.
Answer 2 :
A Data warehouse is a denormalized database which stores historical data in summary-level format. It is specifically meant for heavy-duty querying and analysis.
Answer 3 :
According to Inmon, the well-known author of several data warehouse books, "A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision making process".
Answer 4
Data Warehousing is open to an almost limitless range of definitions. Simply put, Data Warehouses store an aggregation of a company's data.
Data Warehouses are an important asset for organizations to maintain efficiency, profitability and competitive advantages. Organizations collect data through many sources - Online, Call Center, Sales Leads, Inventory Management. The data collected have varying degrees of value and business relevance. As data is collected, it is passed through a 'conveyor belt' called Data Life Cycle Management.
An organization's data life cycle management's policy will dictate the data warehousing design and methodology.
Data Repositories
The Data Warehouse repository is the database that stores active data of business value for an organization. The Data Warehouse modeling design is optimized for data analysis.
There are variants of Data Warehouses - Data Marts and the ODS. Data Marts are not physically any different from Data Warehouses. Data Marts can be thought of as smaller Data Warehouses built on a departmental rather than on a company-wide level.
A Data Warehouse collects data and is the repository for historical data. Hence it is not always efficient for providing up-to-date analysis. This is where the ODS, or Operational Data Store, comes in. An ODS is used to hold recent data before migration to the Data Warehouse.
An ODS is used to hold data that has a deeper history than the OLTP systems. Keeping large amounts of data in OLTP systems can tie down computer resources and slow down processing - imagine waiting at the ATM for 10 minutes between the prompts for inputs.
Front-End Analysis
The last and most critical portion of the Data Warehouse overview is the set of front-end applications that business users will use to interact with the data stored in the repositories.
Data Mining is the discovery of useful patterns in data. Data Mining is used for predictive analysis and classification - e.g. what is the likelihood that a customer will migrate to a competitor?
OLAP, Online Analytical Processing, is used to analyze historical data and slice the business information required. OLAPs are often used by marketing managers. Slices of data that are useful to marketing managers can be - How many customers between the ages 24-45, that live in New York state, buy over $2000 worth of groceries a month?
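As an illustration, the marketing question above could be answered with a single query against a hypothetical summary table (the table and column names below are assumptions for the example, not a real schema):

```python
# Hypothetical slice of the question above; customer_monthly_spend and its
# columns are invented for illustration.
SLICE_QUERY = """
SELECT COUNT(DISTINCT customer_id)
FROM   customer_monthly_spend
WHERE  age BETWEEN 24 AND 45
  AND  state = 'NY'
  AND  category = 'Groceries'
  AND  monthly_spend > 2000;
"""
```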
Reporting tools are used to provide reports on the data. Data are displayed to show relevancy to the business and keep track of key performance indicators (KPI).
Data Visualization tools are used to display data from the data repository. Often data visualization is combined with Data Mining and OLAP tools. Data visualization allows the user to manipulate data to show relevancy and patterns.
Q)What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only textual attributes.
It is a table which contains categorized data. Eg: Time dimension, Products dimension etc.
Q)Why should you put your data warehouse on a different system than your OLTP system?
Answer1:
An OLTP system is basically "data oriented" (ER model) and not "subject oriented" (dimensional model). That is why we design a separate system that will be a subject-oriented OLAP system.
Moreover, firing a complex query on an OLTP system causes heavy overhead on the OLTP server, which directly affects the day-to-day business.
Answer2:
The loading of a warehouse will likely consume a lot of machine resources. Additionally, users may create queries or reports that are very resource intensive because of the potentially large amount of data available. Such loads and resource needs will conflict with the needs of the OLTP systems for resources and will negatively impact those production systems.
Q)What are Aggregate tables?
An aggregate table contains the summary of existing warehouse data, grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to a certain required level and use it. These tables reduce the load on the database server, improve query performance and return results very fast.
Answer 2
An aggregate table contains the [measure] values, aggregated/grouped/summed up to some level of the hierarchy.
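A minimal sketch of how such an aggregate table might be built, assuming the hypothetical sales_fact/product_dim/time_dim tables of a star schema (all names are invented for illustration):

```python
# Precompute monthly sales per category so reports hit this small summary
# table instead of the multi-million-row fact table.
AGGREGATE_DDL = """
CREATE TABLE sales_month_agg AS
SELECT d.category_name,
       t.year_month,
       SUM(f.sales_dollar) AS total_sales
FROM   sales_fact f
JOIN   product_dim d ON d.product_key = f.product_key
JOIN   time_dim    t ON t.date_key    = f.date_key
GROUP BY d.category_name, t.year_month;
"""
```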
Q)What is Dimensional Modelling? Why is it important ?
Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - fact tables and dimension tables. The fact table contains the facts/measurements of the business and the dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.
Q) Why is Data Modeling Important?
Answer 1:
Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.
The goal of the data model is to make sure that all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language, it can be reviewed and verified as correct by the end-users.
The data model is also detailed enough to serve the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
Data Modelling overview:
A Data model is a conceptual representation of the data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
A data model visually represents the nature of the data, the business rules governing the data, and how it will be organized in the database. A data model comprises two parts: the logical design and the physical design.
The data model helps the functional and technical teams in designing the database. The functional team normally refers to one or more Business Analysts, Business Managers, Subject Matter Experts, End Users etc., and the technical team refers to one or more programmers, DBAs etc. Data modelers are responsible for designing the data model; they communicate with the functional team to get the business requirements and with the technical team to implement the database.
Need for developing a Data Model:
A new application for OLTP(Online Transaction Processing), ODS(Operational Data Store), data warehouse and data marts.
Rewriting data models of existing systems when reports need to change.
Incorrect data modeling in the existing systems.
A database that has no data model.
Advantages and Importance of Data Model :
The goal of a data model is to make sure that all data objects provided by the functional team are completely and accurately represented.
The data model is detailed enough to be used by the technical team for building the physical database. The information contained in the data model will be used to define the business significance, relational tables, primary and foreign keys, stored procedures, and triggers. The data model can also be used to communicate the business within and across organizations.
Q)What is data mining?
Answer 1
Data mining is a process of extracting hidden trends within a data warehouse. For example, an insurance data warehouse can be used to mine data for the highest-risk people to insure in a certain geographical area.
Answer 2
Data Mining is the automated discovery of patterns in data. Data Mining can be used for everything from predictive analysis in marketing to the comparison of gene sequences in biotechnology. Often Data Mining is used together with OLAP for data analysis.
Q)What is ETL?
Answer 1:
ETL stands for extraction, transformation and loading.
ETL provides developers with an interface for designing source-to-target mappings, transformations and job control parameters.
• Extraction
Take data from an external source and move it to the warehouse pre-processor database.
• Transformation
The transform-data task allows point-to-point generation, modification and transformation of data.
• Loading
The load-data task adds records to a database table in the warehouse.
Answer 2:
ETL is a process of extracting data (OLTP data) from source systems, transforming it into dimensional models and storing it in a Data warehouse. ETL also involves data cleansing and data manipulation, depending on the nature of the business requirements. It is often a long process, which takes around 70-80% of the time in a typical Data warehouse project.
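A minimal, tool-agnostic sketch of the three ETL steps in Python (the file layout, field names and cleansing rules are assumptions for illustration, not any vendor's tool):

```python
import csv
import sqlite3

def extract(path):
    """Extract: read raw records from a flat-file source."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(row):
    """Transform: cleanse and conform a record to the warehouse standard."""
    return {
        "customer_name": row["name"].strip().title(),  # standardize casing
        "sale_date":     row["date"][:10],             # conform to YYYY-MM-DD
        "amount":        round(float(row["amount"]), 2),
    }

def load(conn, rows):
    """Load: append the conformed records into a warehouse table."""
    conn.executemany(
        "INSERT INTO sales_stage (customer_name, sale_date, amount) "
        "VALUES (:customer_name, :sale_date, :amount)",
        rows,
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_stage (customer_name TEXT, sale_date TEXT, amount REAL)")
# load(conn, [transform(r) for r in extract("sales.csv")])  # run with a real file
```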
Answer 3:
Extract, Transform and Load (ETL) refers to a category of tools that can assist in ensuring that data is cleansed, i.e. conforms to a standard, before being entered into the data warehouse. Vendor-supplied ETL tools are considerably easier to use for managing data cleansing on an ongoing basis. ETL sits in front of the data warehouse, listening for incoming data. If it comes across data that it has been programmed to transform, it will make the change before loading the data into the data warehouse.
ETL tools can also be utilized to extract data from remote databases, either through automatically scheduled events or via manual intervention. There are alternatives to purchasing ETL tools, and the choice will depend on the complexity and budget of your project. Database Administrators (DBAs) can write scripts to perform ETL functionality, which can usually suffice for smaller projects. Microsoft's SQL Server comes with a free ETL tool called Data Transformation Services (DTS). DTS is pretty good for a free tool but it does have limitations, especially in the ongoing administration of data cleansing.
Examples of ETL vendors are DataMirror, Oracle, IBM, Cognos and SAS. As with all product selections, list what you think you would require from an ETL tool before approaching a vendor. It may be worthwhile to obtain the services of consultants who can assist with the requirements analysis for product selection.
Q)How important is Data Cleansing and ETL to the success of Data Warehousing Projects?
ETL is often out-of-sight and out-of-mind as long as the data warehouse is producing results that match stakeholders' expectations. As a result, ETL has been dubbed the silent killer of data warehousing projects: most data warehousing projects experience delays and budget overruns due to unforeseen circumstances relating to data cleansing.
Q) How to Plan for Data Cleansing?
It is important to start mapping out the data that will be entered into the data warehouse as early as possible. This may change as the project matures, but the documentation trail will prove extremely valuable, as you will need to obtain commitments from data owners that they will not change data formats without prior notice.
Create a list of data that will require Extracting, Transforming and Loading. Create a separate list for data that has a higher likelihood of changing formats. Decide on whether you need to purchase ETL tools and set aside an overall budget. Obtain advice from experts in the field and evaluate if the product fits into the overall technical hierarchy of your organization.
Q)What is a Data mart?
Answer 1
A Data mart is a subset of a Data warehouse and is generally specific to a certain department or process, e.g. a finance data mart or a production data mart.
Answer 2
Data Marts are designed to help managers make strategic decisions about their business.
Data Marts are subsets of corporate-wide data that are of value to a specific group of users.
There are two types of Data Marts:
1. Independent data marts - sourced from data captured from OLTP systems, external providers, or from data generated locally within a particular department or geographic area.
2. Dependent data marts - sourced directly from enterprise data warehouses.
Answer 3
A data mart is a focused subset of a data warehouse that deals with a single area of data (such as a single department) and is organized for quick analysis.
Q)What are the various approaches of building a Data warehouse?
This is a generic question. From a business perspective, it is very important to first get clarity on the end-user requirements and do a system study before commencing any Data warehousing project. From a technical perspective, it is important to first understand the dimensions and measures, determine the quality and structure of the source data from the OLTP systems, and then decide which dimensional model to apply, i.e. whether to use a star or snowflake schema or a combination of both. From a conceptual perspective, we can either go the Ralph Kimball way (build data marts and then consolidate at the end to form an enterprise Data warehouse) or the Bill Inmon way (build a large Data warehouse and derive data marts from it). In order to decide on the method, a strong understanding of the business requirements and data structure is needed, as well as consensus with the customer.
Q)What is OLAP?
Answer 1
OLAP is a multidimensional data structure from where we can view summary level information about a business, e.g. total sales in a year or total sales in a certain geography.
Answer 2
OLAP & its Hybrids
OLAP, an acronym for Online Analytical Processing, is an approach that helps organizations take advantage of their data. Popular OLAP tools are Cognos, Business Objects, MicroStrategy etc. OLAP cubes provide insight into the data and help the topmost executives of an organization make decisions in an efficient manner.
Technically, an OLAP cube allows one to analyze data across multiple dimensions by providing a multidimensional view of aggregated, grouped data. With OLAP reports, the major categories related to a product - fiscal periods, sales regions, products, employees, promotions - can be analyzed very efficiently, effectively and responsively. OLAP applications include sales and customer analysis, budgeting, marketing analysis, production analysis, profitability analysis, forecasting etc.
ROLAP:
ROLAP stands for Relational Online Analytical Processing, which provides multidimensional analysis of data stored in a relational database (RDBMS).
MOLAP:
MOLAP (Multidimensional OLAP) provides analysis of data stored in a multidimensional data cube.
HOLAP:
HOLAP (Hybrid OLAP), a combination of both ROLAP and MOLAP, can provide multidimensional analysis of data stored simultaneously in a multidimensional database and in a relational database (RDBMS).
DOLAP:
DOLAP (Desktop OLAP or Database OLAP) provides multidimensional analysis locally on the client machine, on data collected from relational or multidimensional database servers.
Answer 3
Slice, Dice and Drill!
OLAP (or Online Analytical Processing) has been growing in popularity due to the increase in data volumes and the recognition of the business value of analytics. Until the mid-nineties, performing OLAP analysis was an extremely costly process mainly restricted to larger organizations.
The major OLAP vendors are Hyperion, Cognos, Business Objects and MicroStrategy. Costs per seat were in the range of $1,500 to $5,000 per annum. Setting up the environment to perform OLAP analysis would also require substantial investments in time and monetary resources.
This has changed as the major database vendors have started to incorporate OLAP modules within their database offerings - Microsoft SQL Server 2000 with Analysis Services, Oracle with Express and Darwin, and IBM with DB2.
OLAP allows business users to slice and dice data at will. Normally, data in an organization is distributed across multiple data sources that are incompatible with each other. A retail example: point-of-sale data and sales made via the call center or the Web are stored in different locations and formats. It would be a time-consuming process for an executive to obtain OLAP reports such as - What are the most popular products purchased by customers between the ages 15 to 30?
Part of the OLAP implementation process involves extracting data from the various data repositories and making them compatible. Making data compatible involves ensuring that the meaning of the data in one repository matches all other repositories. An example of incompatible data: Customer ages can be stored as birth date for purchases made over the web and stored as age categories (i.e. between 15 and 30) for in store sales.
It is not always necessary to create a data warehouse for OLAP analysis. Data stored by operational systems, such as point-of-sale, is held in a type of database called an OLTP. OLTP (Online Transaction Processing) databases are not structurally different from any other databases. The main, and only, difference is the way in which the data is stored.
Examples of OLTPs can include ERP, CRM, SCM, Point-of-Sale applications, Call Center.
OLTPs are designed for optimal transaction speed. When a consumer makes a purchase online, they expect the transaction to occur instantaneously. With a database design (the data model) optimized for transactions, the record 'Consumer Name, Address, Telephone, Order Number, Order Name, Price, Payment Method' is created quickly in the database, and the results can be recalled by managers equally quickly if needed.
Q)What is the Difference between OLTP and OLAP?
Answer 1
OLTP refers to transactional data, or data captured at the point of transaction, whereas OLAP is a multidimensional representation of data which provides summary-level information.
Answer 2
Main Differences between OLTP and OLAP are:-
1. User and System Orientation
OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.
OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).
2. Data Contents
OLTP: manages current data, very detail-oriented.
OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
3. Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design.
OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
4. View
OLTP: focuses on the current data within an enterprise or department.
OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.
Q)What is Metadata?
Answer 1
In simple words we can say that it is "data about data". It stores all information about data, such as where it is stored, how it is stored, how it is formatted, the address of the stored location, the size of the data, the ownership of the data, the created and modified times of the data, etc.
Answer 2
Metadata is the data about data. A Business Analyst or data modeler usually captures the following information about data -
the source (where and how the data originated),
the nature of the data (char, varchar, nullable, existence, valid values etc.),
the behavior of the data (how it is modified/derived and its life cycle), and
the ownership of the data
- in a data dictionary, a.k.a. metadata. Metadata is also present at the Data mart level (subsets, facts and dimensions, ODS etc.). For a DW user, metadata provides vital information for analysis/DSS.
Q)What is Staging Area?
The staging area is also called the Operational Data Store (ODS). It is a data holding place where the data extracted from all the data sources is stored. From the staging area, data is loaded into the data warehouse. Data cleansing takes place in this stage.
Q)What is Factless Fact Table?
A factless fact table is nothing but a fact table which does not have any measures.
Q)What is slowly changing dimension?
Answer 1
It is nothing but a dimension which stores historical data as well as data which changes with time. For example, in an organization there are two departments (Dep1 and Dep2) and employee1 is in Dep1. After completing the 1st year, employee1 is promoted to Dep2. Here the department-name attribute of the employee dimension changes with respect to time.
Answer 2:
SCD stands for Slowly Changing Dimensions.
SCD1: only the updated values are maintained.
Ex: when a customer's address is modified, we update the existing record with the new address.
SCD2: historical information and current information are maintained by using
A) Effective Dates
B) Versions
C) Flags
or a combination of these.
SCD3: by adding new columns to the target table, we maintain both historical and current information.
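A minimal sketch of an SCD Type 2 update in Python, combining effective dates and a current-record flag (the dimension layout and names are assumptions for illustration):

```python
from datetime import date

# Dimension rows: surrogate key, natural key, tracked attribute,
# effective-date range and a current-record flag.
customer_dim = [
    {"sk": 1, "cust_id": "C100", "address": "Old St 1",
     "eff_from": date(2001, 1, 1), "eff_to": None, "current": True},
]

def scd2_update(dim, cust_id, new_address, change_date):
    """Expire the current row and insert a new version with its own surrogate key."""
    for row in dim:
        if row["cust_id"] == cust_id and row["current"]:
            row["eff_to"] = change_date      # close out the old version
            row["current"] = False
    dim.append({
        "sk": max(r["sk"] for r in dim) + 1, # new surrogate key for the new version
        "cust_id": cust_id, "address": new_address,
        "eff_from": change_date, "eff_to": None, "current": True,
    })

scd2_update(customer_dim, "C100", "New Ave 2", date(2002, 6, 2))
# customer_dim now holds two versions of C100; facts loaded before the change
# reference sk=1, facts loaded afterwards reference sk=2.
```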
Q)What is a Surrogate key?
Answer 1
A surrogate key is a unique identifier. It is used to identify each row of a table uniquely. It also helps to track slowly changing dimensions. It replaces all natural keys. Surrogate keys are system-generated; they are not derived from any data source.
Answer 2
A surrogate key is a substitute for the natural primary key.
It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) for the dimension tables' primary keys. It can be generated with an Informatica sequence generator, an Oracle sequence, or SQL Server identity values.
It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the primary keys (according to the business users), but not only can these change, indexing on a numeric value is probably better, so you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system, and as far as the client is concerned you may display only the AIRPORT_NAME.
Another benefit you can get from surrogate keys (SIDs) is tracking the SCD - Slowly Changing Dimension.
Let me give you a simple, classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be in your Employee Dimension). This employee has a turnover allocated to him on Business Unit 'BU1.' But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to Business Unit 'BU2.' All the new turnover has to belong to the new Business Unit 'BU2,' but the old turnover should belong to Business Unit 'BU1.'
If you used the natural business key 'E1' for your employee within your data warehouse, everything would be allocated to Business Unit 'BU2,' even what actually belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1' in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June) with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1' + 'BU2.'
You could consider Slowly Changing Dimension as an enlargement of your natural key: natural key of the Employee was Employee Code 'E1' but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural-key enlargement process is that you might not have all parts of your new key within your fact table, so you might not be able to join on the new enlarged key -> so you need another id.
Q)What is a Data source?
It is a place from which the data is extracted for the data warehouse. Data is Extracted, Transformed and Loaded into the data warehouse from the data source. The data source may be an operational database, flat files, CSV (Comma Separated Values) files, VSAM (Virtual Storage Access Method) files etc.
Q)What is Virtual Data Warehousing?
It is the process of accessing the operational database directly by the end users for decision making purposes. But in traditional data warehousing systems, we load the data from the source to the target first and then we analyse it. In virtual data warehousing concept, data warehouse is a virtual entity.
Q)What is the difference between star schema and snowflake schema?
Answer 1
1) The main difference between star schema and snowflake schema is that the star schema is highly denormalized and the snowflake schema is normalized.
2) So the data access latency is less in star schema in comparison to snowflake schema.
3) As the star schema is denormalized, the size of the data warehouse will be larger than that of snowflake schema. The schemas are selected as per the client's requirements. Performance-wise, star schema is good, but if memory utilization is a major concern, then snowflake schema is better than star schema.
Answer 2
Star schema contains the dimension tables mapped around one or more fact tables.
It is a denormalized model.
There is no need to use complicated joins.
Queries return results quickly.
Snowflake schema
It is the normalized form of the star schema.
It contains deeper joins, because the tables are split into many pieces. We can easily make modifications directly in the tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing the query.
Q)What is Data Profiling?
Data Profiling is a process to find out the information present in the source. The process or tool used for data profiling is called the Data Discovery phase.
Q)What is a Junk Dimension?
A junk dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. Junk Dimension is just a dimension that stores unwanted attributes.
Q)What is a Fact table?
Answer 1
It is a table which contains factual information about a business process. A fact table contains measures, foreign keys of all related dimension tables, and other attributes.
Answer 2
A Fact Table contains the measurements, metrics or facts of a business process. If your business process is "Sales", then a measurement of this business process such as "monthly sales number" is captured in the Fact table. The Fact table also contains the foreign keys for the dimension tables.
Answer 3
The centralized table in a star schema is called as FACT table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
For example, "Sales Dollar" is a fact (measure), and it can be added across several dimensions. Fact tables store different types of measures: additive, non-additive and semi-additive measures.
Measure Types
•Additive - Measures that can be added across all dimensions.
•Non Additive - Measures that cannot be added across all dimensions.
•Semi Additive - Measures that can be added across some dimensions but not across others.
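A small sketch of why a semi-additive measure such as an account balance adds up across accounts but not across time (the data below is made up for illustration):

```python
# Account balances per month; balance is a semi-additive measure.
balances = [
    {"account": "A", "month": "2008-01", "balance": 100},
    {"account": "B", "month": "2008-01", "balance": 200},
    {"account": "A", "month": "2008-02", "balance": 100},
    {"account": "B", "month": "2008-02", "balance": 200},
]

# Valid: sum across the account dimension for one point in time.
jan_total = sum(r["balance"] for r in balances if r["month"] == "2008-01")      # 300

# Invalid: summing one account across months double-counts the same money;
# use an average or a period-end snapshot over the time dimension instead.
a_monthly_avg = sum(r["balance"] for r in balances if r["account"] == "A") / 2  # 100
```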
A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called Factless Fact tables.
Steps in designing Fact Table
•Identify a business process for analysis (like sales).
•Identify measures or facts (sales dollar).
•Identify dimensions for the facts (product dimension, location dimension, time dimension, organization dimension).
•List the columns that describe each dimension (region name, branch name etc.).
•Determine the lowest level of summary in the fact table (sales dollar).
Q)What is fact constellation?
Fact constellation is also known as galaxy schema. It is nothing but a schema which contains multiple fact tables sharing dimension tables. It is a collection of star schemas which share their dimensions, hence the name galaxy schema.
Q)What is cube?
Answer 1
A cube is a multidimensional representation of data. It is used for analysis purposes. A cube gives multiple views of the data.
Answer 2
Cubes are logical representations of multidimensional data. The edges of the cube contain dimension members and the body of the cube contains data values.
Q)What is drill-down and drill-up?
Both drill-down and drill-up are used to explore different levels of dimensionally modeled data. Drill-down allows users to view a lower (i.e. more detailed) level of data and drill-up allows users to view a higher (i.e. more summarized) level of data.
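A minimal sketch of the two operations as regrouping along a year -> month hierarchy (data and names are illustrative):

```python
from collections import defaultdict

sales = [("2008", "Jan", 10), ("2008", "Feb", 15), ("2007", "Dec", 7)]

def drill_up(rows):
    """Drill-up (roll-up): summarize month-level rows to year level."""
    totals = defaultdict(int)
    for year, _month, amount in rows:
        totals[year] += amount
    return dict(totals)

def drill_down(rows, year):
    """Drill-down: expand one year back into its month-level detail."""
    return [(month, amount) for y, month, amount in rows if y == year]

print(drill_up(sales))             # {'2008': 25, '2007': 7}
print(drill_down(sales, "2008"))   # [('Jan', 10), ('Feb', 15)]
```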
Q)What is the need of building a data warehouse?
The need of building a data warehouse is that it acts as a storehouse for a large amount of data. It also provides end users access to a wide variety of data, helps in analyzing data more effectively and in generating reports. It acts as a huge repository for integrated information.
Q)What is Data Modeling? What are the different types of Data Modeling?
Data modeling is a process of creating data models. In other words, it is structuring and organizing data in a uniform manner where constraints are placed within the structure. The data structures formed are maintained in a database management system. The different types of Data Modeling are: 1. Dimensional Modelling 2. E-R Modelling
Q)What are the different types of Data models?
Conceptual Data Model, Logical Data Model and Physical Data Model
Q)What are the different types of OLAP TECHNOLOGY?
Online Analytical Processing is of three types: MOLAP, HOLAP and ROLAP. MOLAP: Multidimensional online analytical processing. It is used for fast retrieval of data and for slicing and dicing operations. It plays a vital role in easing complex calculations. ROLAP: Relational online analytical processing. It has the ability to handle large amounts of data. HOLAP: Hybrid online analytical processing. It is a combination of both ROLAP and MOLAP.
Q)What is the difference between a Database and a Data warehouse ?
A database is a place where data is stored for day-to-day access, i.e. to retrieve and load data, whereas a data warehouse is a place where application data is managed for analysis and reporting purposes.
A database stores data in the form of tables and columns. On the contrary, in a data warehouse, data is subject-oriented and stored in the form of dimensions and facts which are used for analysis purposes.
In short, we must understand that a database is used for running an enterprise, but a data warehouse helps in deciding how to run the enterprise.
Q)What does level of Granularity of a fact table signify?
Answer 1
Granularity
The first step in designing a fact table is to determine the granularity of the fact table. By granularity, we mean the lowest level of information that will be stored in the fact table. This constitutes two steps:
Determine which dimensions will be included.
Determine where along the hierarchy of each dimension the information will be kept.
The determining factors usually go back to the requirements.
Answer 2
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each transaction. Level of granularity then means how much detail you are willing to keep for each transactional fact: product sales recorded for each individual transaction, or aggregated up to the minute.
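A small sketch of the same sales events kept at two different grains (names and data are illustrative):

```python
from collections import defaultdict

# Raw events: one row per individual sale (timestamp, product, amount).
transactions = [
    ("2008-05-13 09:15:01", "P1", 5.00),
    ("2008-05-13 09:15:42", "P1", 3.00),
    ("2008-05-13 09:16:10", "P1", 4.00),
]

# Transaction grain: one fact row per sale - the lowest level of detail.
fact_transaction_grain = transactions

# Minute grain: rows aggregated up to the minute; per-sale detail is lost
# and can never be recovered from this fact table.
per_minute = defaultdict(float)
for ts, product, amount in transactions:
    per_minute[(ts[:16], product)] += amount   # ts[:16] keeps 'YYYY-MM-DD HH:MM'
fact_minute_grain = [(m, p, total) for (m, p), total in per_minute.items()]
```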
Q)Why are OLTP database designs not generally a good idea for a Data Warehouse?
Since OLTP tables are normalized, query response will be slow for the end user, and an OLTP system does not contain years of data, so long-term trends cannot be analysed.
Q)What is BUS Schema?
A BUS Schema is composed of a master suite of conformed dimensions and standardized definitions of facts.
Q)What are the various Reporting tools in the Market?
1. MS-Excel
2. Business Objects (Crystal Reports)
3. Cognos (Impromptu, Power Play)
4. Microstrategy
5. MS reporting services
6. Informatica Power Analyzer
7. Actuate
8. Hyperion (BRIO)
9. Oracle Express OLAP
10. Proclarity
Links to my blogs related to Data Modeling.
DataModeling in Telecom
Steps to convert Logical Data Model to Physical Data Model
Application of Shared Information Data Model to create Common Data Model and Data Service