Friday, 29 January 2010

Very Good Books on Data Modeling,DWH,BI

Data Model Resource Book Part 1
Data Model Resource Book Part 2
Data Modleing Essentials
Datawarehouse life cycle Toolkit

Monday, 25 January 2010

Master Data Management

Master Data Management is very important part of Data Modeling or Data Architecture.

MDM provides the single view about the entities in the Enterprise.

The candidate entities which are the part of MDM are as below:

Customer
Customer Address
Product Type
Product
Supplier/Vendor Information
Product Type
Payment Type
SLA Type
Country
State
City

In MDM we mainatin the data which is not frequently change and provide only the single view of the data which can be used by any business process of the organization.It also removes the data in consistency and provide more data quality to the applications and enterprise itself.

Thursday, 14 January 2010

Blog on Data Modeling in Telecom

Find the URL of my blog

Data Modeling in Telecom Blog

Blog on Erwin Concepts

Find the URL:

Erwin Concepts

My blog on steps required to convert logical data model to physical data model

Hi All,

Please find the URL of my blog which describes the steps to convert the Logical Data Model into Physical Data Model.

http://ldmtopdm.blogspot.com/

Regards,
Sandeep

Tuesday, 12 January 2010

Role and responsibilities of Data Architect/Integration Architect

Profile as Data Architect/Integration Architect

•Understanding the customer’s business requirements, business processes and system functionality to develop the data design and maintain the source of data.
•Responsible for developing and maintaining a formal description of the data and data structures - this can include data definitions, data models [Conceptual, Logical and Physical Data Model, data dictionaries, data flow diagrams, etc. (in short metadata). Data architecture includes topics such as metadata management, business semantics, data modeling and metadata workflow management.
•The architecture covers databases, data integration and the means to get to the data
•Resolve the end to end data issues, design issues and end to end test issues with the help of data dictionaries.
•Provides the means of system integration by understanding the source and target data and build the data mapping.
•High-level data architecture documentation, which illustrates the source and destination
of data to and/or from approved data sources
•Development of a data strategy and design specifications
•Design strategy and design for managing history data
•Selection of capabilities and systems to meet business information needs
•Evaluating new techniques for enhanced functionality

Friday, 8 January 2010

SQL PL SQL Oracle Database Questions Answers

EMP Table :

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN7698 20-FEB-81 1600 300 30
7521 WARD SALESMAN7698 22-FEB-81 1250 500 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7654 MARTIN SALESMAN7698 28-SEP-81 1250 1400 30
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7788 SCOTT ANALYST 7566 19-APR-87 3000 20
7839 KING PRESIDENT 17-NOV-81 5000 10
7844 TURNER SALESMAN7698 08-SEP-81 1500 0 30
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7900 JAMES CLERK 7698 03-DEC-81 950 30
7902 FORD ANALYST 7566 03-DEC-81 3000 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10

Dept :

DEPTNO DNAME LOC
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON

SALGrade:

GRADE LOSAL HISAL
1 700 1200
2 1201 1400
3 1401 2000
4 2001 3000
5 3001 9999

Q)How do I eliminate the duplicate rows ?

delete from table_name where rowid not in (select max(rowid) from table group by duplicate_values_field_name);
select max(rowid) from scott.emp group by job

Looks like 5 rows will remain after the deletions

MAX(ROWID)
AAAMgHAAEAAAAAeAAG
AAAMgHAAEAAAAAeAAM
AAAMgHAAEAAAAAeAAI
AAAMgHAAEAAAAAeAAJ
AAAMgHAAEAAAAAeAAN

Then get rid of the remaining duplicate rows.
Delete from emp where rowed not in(select max(rowid) from scott.emp group by job )

Q)How to identify the duplicate rows ?

select job, count(*) from scott.emp group by job having count(*)>1

JOB COUNT(*)
MANAGER 3
ANALYST 2
SALESMAN4
CLERK 4

Q)How do I display row number with records?

To achive this use rownum pseudocolumn with query, like SQL> SQL> select rownum, ename from emp;
Output:
1 Scott
2 Millor
3 Jiyo
4 Smith

Q)if commission is null then the text “Not Applicable” want to display, instead of blank space. How do I write the query?

select nvl(to_char(comm),'NA') from scott.emp;

NVL(TO_CHAR(COMM),'NA')
NA
300
500
NA
1400
NA
NA
NA
NA
0
NA
NA
NA
NA


Q)Display the records between two range

select rownum, empno, ename from scott.emp where rowid in
(select rowid from scott.emp where rownum <=14
minus
select rowid from scott.emp where rownum<7);

ROWNUM EMPNO ENAME
1 7782 CLARK
2 7788 SCOTT
3 7839 KING
4 7844 TURNER
5 7876 ADAMS
6 7900 JAMES
7 7902 FORD
8 7934 MILLER

Q)Oracle cursor : Implicit & Explicit cursors

Oracle uses work areas called private SQL areas to create SQL statements.
PL/SQL construct to identify each and every work are used, is called as Cursor.
For SQL queries returning a single row, PL/SQL declares all implicit cursors.
For queries that returning more than one row, the cursor needs to be explicitly declared.

Q)Explicit Cursor attributes
There are four cursor attributes used in Oracle
cursor_name%Found, cursor_name%NOTFOUND, cursor_name%ROWCOUNT, cursor_name%ISOPEN

Q)Implicit Cursor attributes
Same as explicit cursor but prefixed by the word SQL
SQL%Found, SQL%NOTFOUND, SQL%ROWCOUNT, SQL%ISOPEN

Q)What is the maximum number of triggers, can apply to a single table?
12 triggers

Q)What are PL/SQL Cursor Exceptions?
Cursor_Already_Open, Invalid_Cursor

Q)Any three PL/SQL Exceptions?
Too_many_rows, No_Data_Found, Value_Error, Zero_Error, Others

Q)Display Odd/ Even number of records

Odd number of records:

select * from emp where (rowid,1) in (select rowid, mod(rownum,2) from emp);
Even number of records:
select * from emp where (rowid,0) in (select rowid, mod(rownum,2) from emp)

Q)Find out nth highest salary from emp table

SELECT DISTINCT (a.sal) FROM EMP A WHERE &N = (SELECT COUNT (DISTINCT (b.sal)) FROM EMP B WHERE a.sal<=b.sal);

For explanation let a table emp with sal column like below
sal
1200
1300
1500
1200
1250
1700
1250
2000

See DISTINCT word is there in the query
So, you have to find the distinct sal first.
sal
1200
1300
1500
1250
1700
2000
Now see the condition a.sal<= b.sal

This condition compares a.sal and b.sal. The COUNT counts
how many times the a.sal is less than or equal to b.sal,
and gives this value as the output of sub query.
(N.B. comparing to same value means EQUAL SO count is 1).
This count is the value of N.

So after comparision the value of N for different salaries
are like
sal N
1200 6
1300 4
1500 3
1250 5
1700 2
2000 1

Now on querying when you give a value of N the
corresponding value of salary is selected.
Like if you are giving N=2 , then 1700 is displayed.

Q)What is correlated sub-query ?

Correlated sub query is a sub query which has reference to the main query

Q)What is a Non- corelated subquery

Non corelated subquery, where the subquery has been executed once for the entire parent statement.

Q)What are the usage of SAVEPOINTS ?

SAVEPOINTS are used to subdivide a transaction into smaller parts. It enables rolling back part of a transaction. Maximum of five save points are allowed.

Q)What is meant by Scrollable cursor

A scrollable cursor, however, can move forward and backward, and can seek any desired record in the cursor. Such operations are common in applications that present results sets in scrolling windows. With a scrollable cursor, application developers do not need to create and manage their own buffer for the records.

Q)What are the more common pseudo-columns

SELECT rownum, rowid FROM sometable
pseudo column is a column that is not present in the database.
it is used for unique identification of rows like
ROWNUM,ROWID,LEVEL,NEXTVAL,CURVAL,NULL etc.

Q)Where the integrity constraints are stored in Data Dictionary ?

The integrity constraints are stored in USER_CONSTRAINTS.

Q)What is ON DELETE CASCADE ?

When ON DELETE CASCADE is specified ORACLE maintains referential integrity by automatically removing dependent foreign key values if a referenced primary or unique key value is removed.

Q)What is a join ? Explain the different types of joins ?

A join is used to combine rows from multiple tables. A join is performed whenever two or more tables is listed in the FROM clause of an SQL statement.

The types of joins that you learn are

EQUIJOINS
NATURAL JOINS
NON-EQUIJOINS
OUTER JOINS
SELF JOINS

Equijoin : Perhaps the most used and important of the joins is the EQUIJOIN, also referred to as an INNER JOIN. The EQUIJOIN joins two tables with a common column in which each is usually the primary key.

SELECT EMPLOYEE_TBL.EMP_ID,
EMPLOYEE_PAY_TBL.DATE_HIRE
FROM EMPLOYEE_TBL,
EMPLOYEE_PAY_TBL
WHERE EMPLOYEE_TBL.EMP_ID = EMPLOYEE_PAY_TBL.EMP_ID;

Natural Join :
A natural join is a join statement that compares the common columns of both tables with each other. One should check whether common columns exist in both tables before doing a natural join.

Natural joins may cause problems if columns are added or renamed. Also, no more than two tables can be joined using this method. So, it is best to avoid natural joins as far as possible.

Examples
This is the same as an equi join on (emp.deptno = dept.deptno):

SELECT dname, ename FROM dept NATURAL JOIN emp
Same query on the HR sample schema:
SELECT department_name, first_name||' '||last_name
FROM departments NATURAL JOIN employees;

Nonequi join:

An nonequi (or theta) join is a join statement that uses an unequal operation (i.e: <>, >, <, !=, BETWEEN, etc.) to match rows from different tables. The converse of an nonequi join is a equi join operation.

Using Oracle join syntax:

SELECT e.ename, e.sal, s.grade
FROM emp e, salgrade s
WHERE e.sal BETWEEN s.losal and s.hisal;
Using ANSI join syntax:
SELECT e.ename, e.sal, s.grade
FROM emp e INNER JOIN salgrade s
ON e.sal BETWEEN s.losal and s.hisal;

Self join:

A self join is a join in which a table is joined with itself. For example, when you require details about an employee and his manager (also an employee).
Oracle join syntax:

SELECT e1.ename||' works for '||e2.ename "Employees and their Managers"
FROM emp e1, emp e2
WHERE e1.mgr = e2.empno;
ANSI join syntax (Oracle 9i and above):
SELECT e1.ename||' works for '||e2.ename "Employees and their Managers"
FROM emp e1
JOIN emp e2
ON (e1.mgr = e2.empno);

Outer Joins :

An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join condition and also returns some or all of those rows from one table for which no rows from the other satisfy the join condition.

To write a query that performs an outer join of tables A and B and returns all rows from A (a left outer join), use the ANSI LEFT [OUTER] JOIN syntax, or apply the outer join operator (+) to all columns of B in the join condition. For all rows in A that have no matching rows in B, Oracle returns null for any select list expressions containing columns of B.

To write a query that performs an outer join of tables A and B and returns all rows from B (a right outer join), use the ANSI RIGHT [OUTER] syntax, or apply the outer join operator (+) to all columns of A in the join condition. For all rows in B that have no matching rows in A, Oracle returns null for any select list expressions containing columns of A.
To write a query that performs an outer join and and returns all rows from A and B, extended with nulls if they do not satisfy the join condition (a full outer join), use the ANSI FULL [OUTER] JOIN syntax.

For example the following query returns all the employees and department names and even those department names where no employee is working.

select e.empno,e.ename,e.sal,e.deptno,d.dname,d.city from emp e, dept d where e.deptno(+)=d.deptno;
That is specify the (+) sign to the column which is lacking values.

Cartesian Products :

If two tables in a join query have no join condition, Oracle returns their Cartesian product. Oracle combines each row of one table with each row of the other. A Cartesian product always generates many rows and is rarely useful. For example, the Cartesian product of two tables, each with 100 rows, has 10,000 rows. Always include a join condition unless you specifically need a Cartesian product

Q)What is the Subquery ?

Sub query is a query whose return values are used in filtering conditions of the main query. Subqueries will be executed once for the entire parent statement.

Q)Can a view be updated/inserted/deleted? If Yes under what conditions ?
A View can be updated/deleted/inserted if it has only one base table if the view is based on columns from one or more tables then insert, update and delete is not possible.

Q)How many LONG columns are allowed in a table ? Is it possible to use LONG columns in WHERE clause or ORDER BY ?

Only one LONG columns is allowed. It is not possible to use LONG column in WHERE or ORDER BY clause.

Q)What is Referential Integrity ?

Maintaining data integrity through a set of rules that restrict the values of one or more columns of the tables based on the values of primary key or unique key of the referenced table.

Q)What is ROWID ?

ROWID is a pseudo column attached to each row of a table. It is 18 character long, blockno, rownumber are the components of ROWID.

Q)Drop the column
There are 2 ways of doing it :
alter table"table name" drop column "column name";

This drops the column immediately. However if there is huge data and you would like to postpone the task of dropping the columns you can make the columns unused and drop the unused columns during the weekend or less peak activity time.

Q)UNION,INTERSECT,MINUS,UNION ALL ?

Then, a few values are inserted:
alter session set nls_date_format='dd.mm.yyyy';

insert into table_1 values ( 3, 'hello' , to_date('28.08.1970'));
insert into table_1 values ( 42, 'galaxy', to_date('01.01.2001'));
insert into table_1 values (100, 'bye' , to_date('09.02.2004'));
insert into table_2 values ( 3, 'bye' , to_date('28.08.1970'));
insert into table_2 values ( 42, 'galaxy', to_date('01.01.2001'));
insert into table_2 values ( 60, 'bye' , to_date('09.02.2004'));
insert into table_2 values ( 3, 'hello' , to_date('05.05.2002'));

union all:

union all selects all rows from all select statements:

select col_1, col_2, col_3 from table_1 union all
select col_1, col_2, col_3 from table_2;

As can be seen, all records of both tables are returned:
COL_1 COL_2 COL_3
---------- ---------- ----------
3 hello 28.08.1970
42 galaxy 01.01.2001
100 bye 09.02.2004
3 bye 28.08.1970
42 galaxy 01.01.2001
60 bye 09.02.2004
3 hello 05.05.2002

union :

union all is very similar to union, however, it dismisses duplicate rows found across different select statements:

select col_1, col_2, col_3 from table_1 union
select col_1, col_2, col_3 from table_2;

The galaxy record is a duplicate. Hence, it is returned only once:
COL_1 COL_2 COL_3
---------- ---------- ----------
3 bye 28.08.1970
3 hello 28.08.1970
3 hello 05.05.2002
42 galaxy 01.01.2001
60 bye 09.02.2004
100 bye 09.02.2004

intersect:

intersect only returns the rows that are found in all select statements:

select col_1, col_2, col_3 from table_1 intersect
select col_1, col_2, col_3 from table_2;

Only the galaxy record is returned. It's the only record that is stored in both tables:
COL_1 COL_2 COL_3
---------- ---------- ----------
42 galaxy 01.01.2001

minus:

minus returns all rows from the first select statements except those who are duplicated in a following select statement:

select col_1, col_2, col_3 from table_1 minus
select col_1, col_2, col_3 from table_2;

As the galaxy record is found in both tables, it is removed from the first table's record set:
COL_1 COL_2 COL_3
---------- ---------- ----------
3 hello 28.08.1970
100 bye 09.02.2004

The following query is valid:
SELECT 3 FROM DUAL
INTERSECT
SELECT 3f FROM DUAL;

This is implicitly converted to the following compound query:
SELECT TO_BINARY_FLOAT(3) FROM DUAL
INTERSECT
SELECT 3f FROM DUAL;

The following query returns an error:
SELECT '3' FROM DUAL
INTERSECT
SELECT 3f FROM DUAL;

Q)What is CYCLE/NO CYCLE in a Sequence ?

CYCLE specifies that the sequence continues to generate values after reaching either maximum or minimum value. After pan ascending sequence reaches its maximum value, it generates its minimum value. After a descending sequence reaches its minimum, it generates its maximum.NO CYCLE specifies that the sequence cannot generate more values after reaching its maximum or minimum value.

Q)How will you a activate/deactivate integrity constraints ?

The integrity constraints can be enabled or disabled by ALTER TABLE ENABLE constraint/DISABLE constraint.

Q)What is difference between CHAR and VARCHAR2 ? What is the maximum SIZE allowed for each type ?

CHAR pads blank spaces to the maximum length. VARCHAR2 does not pad blank spaces. For CHAR it is 255 and 2000 for VARCHAR2.

Q)Database link

Database Link is a named path through which a remote database can be accessed.

Q)What are the data types allowed in a table ?

CHAR,VARCHAR2,NUMBER,DATE,RAW,LONG and LONG RAW

Q)What is the maximum number of triggers, can apply to a single table?

Insert/Update/Delete :- 3
Before/After:- 2
Row Level/Statement Level:-2

Hence 3*2*2

Q)What is an integrity constraint?

Integrity constraint restricts values of a column so that only meaningful values as per the business logic of the table be entered into the column.

Integrity constraints are of the following types:

Not null
Unique
Primary Key
Foreign Key (referential integrity)
Check -- custom constraints e.g. check sal > 0, check joiningdate > date()

constraints; which enforces rules on a table columns which stops to enter unnecessary data

while inserting data or updating the data there are 5 types of constraints;
1) not null;This constraint doesn't allow null values.

2)Unique; The constraint associated with unique column keeps allow uniquely.

3)Primary key(primary+notnull) This constraint doesn't allow repeated values & null values.

4)check;This constraint checks the condition weather satisfied or not.

5)Foreign key;This constraint is used as reference key between the columns of two tables

Q)How do I replace a comma (,) with a blank in a select statement?

select empno||' '||ename from emp;

Display the number value in Words
select sal, (to_char(to_date(sal,'j'), 'jsp'))
from emp;

Q)If a view on a single base table is manipulated will the changes be reflected on the base table
an Insert,Update,delete can be done through views is
1. inserts/updates/deletes done only in one base table at a time
2. primary key columns should be part of view
3. columns which are going to be updated must be part of vew def.
4. it shouldn't voielet the contraints

Q)How to access the current value and next value from a sequence
I would like to give you a small example to use the sequence.currval and sequence.nextval

create sequence seq_name

start with 1

minvalue 1

maxvalue 999

increment by 1

nocycle

insert into table_name (sno,name) values (seqname.nextval,'abc');

select seqname.currval from dual

Q)If unique key constraint on DATE column is created, will it validate the rows that are inserted with SYSDATE

If there is a unique key defined on a column which has date as the data type then one can insert the same date more than once.The date is always stored in the format dd-mon-yyyy hh:mi:ss.The reason why it will accept the same date again is that the time taken for you to enter the date for the first time and entering the date again for the second time there will be a gap of atleast one secon which makes that date value unique. ONE CAN INSERT THE SAME DATE VALUE IN A UNIQUE COLUMN MORE THAN ONCE.

Q)Where the integrity constraints are stored in data dictionary

The integrity constraints are stored in USER_CONSTRAINTS.

Q)Is it possible to use LONG columns in WHERE clause or ORDER BY
No

Q)Difference between SQL and SQL* Plus

SQL*PLUS is a command line tool where as SQL and PL/SQL language interface and reporting tool. Its a command line tool that allows user to type SQL commands to be executed directly against an Oracle database. SQL is a language used to query the relational database(DML,DCL,DDL). SQL*PLUS commands are used to format query result, Set options, Edit SQL commands and PL/SQL.

Q)Difference between procedure and function.

Functions are named PL/SQL blocks that return a value and can be called with arguments procedure a named block that can be called with parameter. A procedure all is a PL/SQL statement by itself, while a Function call is called as part of an expression.

Q)What is an UTL_FILE.What are different procedures and functions associated with it?

UTL_FILE is a package that adds the ability to read and write to operating system files. Procedures associated with it are FCLOSE, FCLOSE_ALL and 5 procedures to
output data to a file PUT, PUT_LINE, NEW_LINE, PUTF, FFLUSH.PUT, FFLUSH.PUT_LINE,FFLUSH.NEW_LINE. Functions associated with it are FOPEN, ISOPEN
Cursor Oracle uses work area to execute SQL statements and store processing information PL/SQL construct called a cursor lets you name a work area and access its stored information A cursor is a mechanism used to fetch more than one row in a Pl/SQl block.
Cursor is a variable.

it is similar to a 2D array .
used for processing multiple rows.
used for storing data temporarily.

Q)Which is more faster - IN or EXISTS?
EXISTS is more faster than IN because EXISTS returns a Boolean value whereas IN returns a value.

Q)What are different Oracle database objects?

TABLES
VIEWS,
INDEXES,
SYNONYMS,
SEQUENCES,
TIGGERS,
PACKAGES,
PROCEDURE,
FUNCTION,

Q)Can a primary key contain more than one columns?

primary key not allow null values,unique key allow only one null value.In table only one primary is allowed,unique key as many as it can allow. In each table each column it can allow only one primary key the same table more than one primary key then it is called
composite primary key

Q)What are different modes of parameters used in functions and procedures?
-IN
-OUT
-INOUT

Q)Difference between NO DATA FOUND and %NOTFOUND

NO DATA FOUND is an exception raised only for the SELECT....INTO statements when the where clause of the query does not match any rows. When the where clause of the explicit cursor does not match any rows the %NOTFOUND attribute is set to TRUE instead.

________________________________________
NO_DATA_FOUND is pre defind exception of oracle.
when we can't get any data
from the table for any query.
In that case we will get error.
%NOT FOUND is one of the attribute
of explicit cursor.
when can't get any data from explicit cursor in that case
%NOT FOUND will returns true
otherwise it returns false.
Outer Join
OUTER join means fetched all the rows that is based on condition,as well as retrieve all rows that is not based on condition.

Q)What is the purpose of a cluster?

Oracle does not allow a user to specifically locate tables, since that is a part of the function of the RDBMS. However, for the purpose of increasing performance, oracle allows a developer to create a CLUSTER. A CLUSTER provides a means for storing data from different tables together for faster retrieval than if the table placement were left to the RDBMS.
it's a temperory and logical memory allocatiopn where we can put several tables.

Q)What is difference between SUBSTR and INSTR?

SUBSTR:-
returns a specified portion of a string
eg:- SUBSTR('GANGAA',5)
->output GANGA

INSTR:-
provides character position
found in a string. eg INSTR('KHAIRATABAD','A',3)
->output 8 (3RD occurrence of 'A')

SUBSTR returns a specified portion of a string eg SUBSTR('BCDEF',4) output BCDE
INSTR provides character position in which a pattern is found in a string. eg INSTR('ABC-DC-F','-',2) output 7 (2nd occurence of '-')

Q)What is difference between UNIQUE and PRIMARY KEY constraints?

A table can have only one PRIMARY KEY whereas there can be any number of UNIQUE keys. The columns that compose PK are automatically define NOT NULL, whereas a column that compose a UNIQUE is not automatically defined to be mandatory must also specify the column is NOT NULL.

Q)There is a '%' sign in one field of a column. What will be the query to find it ?

'\' Should be used before '%'.
You need to use INSTR function in the WHERE clause as follows :

select ename from emp
where instr(ename,'%',1,1) > 1

Q)What is OCI. What are its uses?

Oracle Call Interface is a method of accesing database from a 3GL program. Uses--No precompiler is required,PL/SQL blocks are executed like other DML statements. The OCI library provides
--functions to parse SQL statemets
--bind input variables
--bind output variables
--execute statements
--fetch the results

Q)Can a function take OUT parameters. If not why?

No. A function has to return a value,an OUT parameter cannot return a value.

Q)How you open and close a cursor variable.Why it is required?

OPEN cursor variable FOR SELECT...Statement
CLOSE cursor variable In order to associate a cursor variable with a particular SELECT statement OPEN syntax is used. In order to free the resources used for the query CLOSE statement is used.

Q)What a SELECT FOR UPDATE cursor represent.

SELECT......FROM......FOR......UPDATE[OF column-reference][NOWAIT] The processing done in a fetch loop modifies the rows that have been retrieved by the
cursor. A convenient way of modifying the rows is done by a method with two parts: the FOR UPDATE clause in the cursor declaration, WHERE CURRENT OF CLAUSE
in an UPDATE or declaration statement.

Q)Difference between foreign key and primary key?

1)Primary key is unique key but foriegn key always refers to
primary key.

2)Primary keys enforce entity integrity by uniquely
identifying entity instances. Foreign keys enforce.

3)Primary key is unique
primary key is not NULL and

foreign key is NULL
foreign key reference as Primary key in another table.

4) primary key is used to identify a row and it doesnot allow
null values. It avoids duplication of rows.

whereas foreign key refers to a column or combination of
columns included in the defenition of referential integrity.


referential integrity by completing an association between
two entities.

Q)What should be the return type for a cursor variable.Can we use a scalar data type as return type?

The return type for a cursor must be a record type.It can be declared explicitly as a user-defined or %ROWTYPE can be used. eg TYPE t_studentsref IS REF CURSOR
RETURN students%ROWTYPE

Q)Can you use a commit statement within a database trigger?
no,we can't use commit statement in the trigger,we are using trigger for checking purpose.

Q)What is difference between a formal and an actual parameter?

The variables declared in the procedure and which are passed, as arguments are called actual, the parameters in the procedure declaration. Actual parameters contain the values that are passed to a procedure and receive results. Formal parameters are the placeholders for the values of actual parameters

Q)What is a cursor for loop?

Cursor For Loop is a loop where oracle implicitly declares a loop variable, the loop index that of the same record type as the cursor's record.

________________________________________
When we use for loops,there is no need to declare explicit cursor.
Only case of for loop,cursor is implicitly open and after fetching the data cursor is implicitly closed.
Example given below
SET SERVEROUTPUT ON
BEGIN
FOR emp_record IN (SELECT last_name,department_id FROM employees) LOOP
IF emp_record.department_id=80 THEN
DBMS_OUTPUT.PUT_LINE('employees'||'emp_record.last_name'||'works for sales department');
END IF;
END LOOP;--implicit cursor closed
END;

Q)Difference between an implicit & an explicit cursor.

PL/SQL declares a cursor implicitly for all SQL data manipulation statements, including quries that return only one row. However,queries that return more than one row you must declare an explicit cursor or use a cursor FOR loop. Explicit cursor is a cursor in which the cursor name is explicitly assigned to a SELECT statement via the CURSOR...IS statement. An implicit cursor is used for all SQL
statements Declare, Open, Fetch, Close. An explicit cursors are used to process multirow SELECT statements An implicit cursor is used to process INSERT, UPDATE, DELETE and single row SELECT. .INTO statements.

________________________________________
The implicit cursor is used to process INSERT, UPDATE,
DELETE, and SELECT INTO statements. During the processing of
an implicit cursor,Oracle automatically performs the OPEN,
FETCH, and CLOSE operations.

Where as in explicit cursors,the process of its working is
done in 4 steps namely DECLARE a cursor,OPEN a cursor,
FETCH from cursor and CLOSE a
cursor.

IMPLICT CURSOR:- Automatically porvide by oracle which
perform DML statements. queries returns only one row.

EXPLICT CURSOR:- Defined by user. queries returns more
than rows.

Explicit Cursor:-We are not able to Handle NO_DATA_FOUND
Exception.

Implicit Cursor:-We are able to Handle NO_DATA_FOUND
Exception.

Q)There is a string 120000 12 0 .125 , how you will find the position of the decimal place?
INSTR('120000 12 0 .125',1,'.')

output 13

Q)Which datatype is used for storing graphics and images?
LONG RAW data type is used for storing BLOB's (binary large objects).


Q)When do you use WHERE clause and when do you use HAVING clause?

HAVING clause is used when you want to specify a condition for a group function and it is written after GROUP BY clause. The WHERE clause is used when you want to specify a condition for columns, single row functions except group functions and it is written before GROUP BY clause if it is used.
1.where clause is used to restrict the row ,basically where clause operate on single row function

2.Having clause is used to restrict the groups that means which groups to be displayed.
where is used to restrict the rows
having is used to restrict the group


Q)What is difference between Rename and Alias?

Rename is a permanent name given to a table or column whereas Alias is a temporary name given to a table or column which do not exist once the SQL statement is executed.
Rename;This is a DDL command which is used to the change the name of an object whereas alias is the alternate name given to the table while selecting this alias is used generally to join a same table.

Oracle SQL PL Sql Questions Answers

What are the indexes in Oracle ?

An index is a schema object that contains an entry for each value that appears in the indexed column(s) of the table or cluster and provides direct, fast access to rows. Oracle Database supports several types of index:
Normal indexes. (By default, Oracle Database creates B-tree indexes.)
Bitmap indexes, which store rowids associated with a key value as a bitmap
Partitioned indexes, which consist of partitions containing an entry for each value that appears in the indexed column(s) of the table
Function-based indexes, which are based on expressions. They enable you to construct queries that evaluate the value returned by an expression, which in turn may include built-in or user-defined functions.
Domain indexes, which are instances of an application-specific index of type indextype

Q)What is difference between IN and NOT IN ?

SQL: "IN" Function
________________________________________
The IN function helps reduce the need to use multiple OR conditions.
The syntax for the IN function is:
SELECT columns
FROM tables
WHERE column1 in (value1, value2, .... value_n);
This SQL statement will return the records where column1 is value1, value2..., or value_n. The IN function can be used in any valid SQL statement - select, insert, update, or delete.

Example #1
The following is an SQL statement that uses the IN function:
SELECT *
FROM suppliers
WHERE supplier_name in ( 'IBM', 'Hewlett Packard', 'Microsoft');
This would return all rows where the supplier_name is either IBM, Hewlett Packard, or Microsoft. Because the * is used in the select, all fields from the suppliers table would appear in the result set.
It is equivalent to the following statement:
SELECT *
FROM suppliers
WHERE supplier_name = 'IBM'
OR supplier_name = 'Hewlett Packard'
OR supplier_name = 'Microsoft';
As you can see, using the IN function makes the statement easier to read and more efficient.

Example #2
You can also use the IN function with numeric values.
SELECT *
FROM orders
WHERE order_id in (10000, 10001, 10003, 10005);
This SQL statement would return all orders where the order_id is either 10000, 10001, 10003, or 10005.
It is equivalent to the following statement:
SELECT *
FROM orders
WHERE order_id = 10000
OR order_id = 10001
OR order_id = 10003
OR order_id = 10005;

Example #3 using "NOT IN"
The IN function can also be combined with the NOT operator.
For example,
SELECT *
FROM suppliers
WHERE supplier_name not in ( 'IBM', 'Hewlett Packard', 'Microsoft');
This would return all rows where the supplier_name is neither IBM, Hewlett Packard, or Microsoft. Sometimes, it is more efficient to list the values that you do not want, as opposed to the values that you do want.

Q)Understanding how NULLS affect the IN,NOT IN , EXISTS and NOT EXISTS?

Answer 1:

On the surface, it may appear that the SQL clauses IN and EXISTS are interchangeable. However, they're quite different in how they handle NULL values and may give different results. The problem comes from the fact that, in an Oracle database, a NULL value means unknown, so any comparison or operation against a NULL value is also NULL, and any test that returns NULL is always ignored. For example, neither one of these queries return any rows:

select 'true' from dual where 1 = null; = No Result
select 'true' from dual where 1 != null; = No Result

The value 1 is neither equal nor not equal to NULL. Only IS NULL would return true on a NULL value and return a row.

select 'true' from dual where 1 is null; ; = No Result
select 'true' from dual where null is null; Return ‘true’

When you use IN, you're telling SQL to take a value and compare it against every value or set of values in a list using =. If any NULL values exist, a row will not be returned--even if both values are NULL.

select 'true' from dual where null in (null); = No Result
select 'true' from dual where (null,null) in ((null,null)); = No Result
select 'true' from dual where (1,null) in ((1,null)); = No Result

An IN is functionally equivalent to the = ANY clause:

select 'true' from dual where null = ANY (null); ); = No Result
select 'true' from dual where (null,null) = ANY ((null,null)); ); = No Result
select 'true' from dual where (1,null) = ANY ((1,null)); ); = No Result

When you use an equivalent form of EXISTS, SQL counts rows and ignores the value(s) in the subquery--even if you return NULL.

select 'true' from dual where exists (select null from dual); Return True
select 'true' from dual where exists (select 0 from dual where null is null); Return True

The IN and EXISTS are logically the same. The IN clause compares values returned by the subquery and filters out rows in the outer query; the EXISTS clause compares values and filters out rows inside the subquery. In the case of NULL values, the resulting set of rows is the same.

select ename from emp where empno in (select mgr from emp); Return Manager name
select ename from emp e where exists (select 0 from emp where mgr = e.empno); Return
Manager name

But problems arise when the logic is reversed to use NOT IN and NOT EXISTS, which return different sets of rows (the first query returns 0 rows; the second returns the intended data--they aren't the same query):
select ename from emp where empno not in (select mgr from emp); No rows returned because subquery return NULL

select ename from emp e where not exists (select 0 from emp where mgr =
e.empno);
The NOT IN clause is virtually equivalent to comparing each value with = and failing if any test is FALSE or NULL. For example:

select 'true' from dual where 1 not in (null,2);
select 'true' from dual where 1 != null and 1 != 2;
select 'true' from dual where (1,2) not in ((2,3),(2,null));
select 'true' from dual where (1,null) not in ((1,2),(2,3));

These queries don't return any rows. The second is more obvious, 1 != NULL is NULL, so the whole WHERE condition is false for that row. While these would work:
select 'true' from dual where 1 not in (2,3);
select 'true' from dual where 1 != 2 and 1 != 3;
You can still use the NOT IN query from before, as long as you prevent NULL from being returned in the results (again, these both work, but I'm assuming empno is not null, which is a good assumption in this case):
select ename from emp where empno not in (select mgr from emp where mgr is not
null);
select ename from emp where empno not in (select nvl(mgr,0) from emp);
By understanding the difference between IN, EXISTS, NOT IN, and NOT EXISTS, you can avoid a very common problem when NULLs appear in the data of a subquery.

Answer2 :

Resulting in:
EMP_NBR EMP_NAME MGR_NBR
1 DON 5
2 HARI 5
3 RAMESH 5
4 JOE 5
5 DENNIS NULL
6 NIMISH 5
7 JESSIE 5
8 KEN 5
9 AMBER 5
10 JIM 5
Now, the aim is to find all those employees who are not managers. Let’s see how we can achieve that by using the “NOT IN” vs the “NOT EXISTS” clause.
NOT IN
SQL> select count(*) from emp_master where emp_nbr not in ( select mgr_nbr from emp_master );
COUNT(*)
———-
0
This means that everyone is a manager…hmmm, I wonder whether anything ever gets done in that case
NOT EXISTS
SQL> select count(*) from emp_master T1 where not exists ( select 1 from emp_master T2 where t2.mgr_nbr = t1.emp_nbr );
COUNT(*)
———-
9
Now there are 9 people who are not managers. So, you can clearly see the difference that NULL values make and since NULL != NULL in SQL, the NOT IN clause does not return any records back. (in MS SQL Server, depending upon the ANSI NULLS setting, the behavior can be altered but this post only talks about the behavior that is same in Oracle, DB2 LUW and MS SQL Server).
Performance implications:
When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, query can use an index within the sub-query.
Another Optional Method
Another way of doing this is to use an outer join and check for NULL values in the other table:
SELECT COUNT(*)
FROM EMP_MASTER T1
LEFT OUTER JOIN EMP_MASTER T2
ON T1.EMP_NBR = T2.MGR_NBR
WHERE T2.MGR_NBR IS NULL
/
Of course, there should be other selection criteria as well (possibly a range search criteria, an equality SARG (searchable argument) criteria etc.) to help improve the selectivity besides just the NOT EXISTS clause.

Answer 3:

What's the difference between NOT IN and NOT EXISTS in SQL? Which is better to use in SQL queries for the performance aspects?


> EXPERT RESPONSE

For the first question, let's assume we're talking about the difference in syntax between NOT IN and NOT EXISTS when both have subqueries.

Here are examples of the syntax:

select client_name
from clients
where client_id not in
( select client_id
from clientwarnings )
select client_name
from clients as x
where not exists
( select client_id
from clientwarnings
where client_id = x.client_id )
In the NOT IN example, the subquery identifies a set of client_ids, against which each client_id in the clients table can be compared. In the NOT EXISTS example, the subquery is correlated and can be thought of as doing a lookup for each client_id in the clients table.

Your second question asks which is preferred for performance reasons. My answer is: it's your call. Don't trust your instinct, because it might be wrong.

My guess would be that the NOT IN query has to build a list for the subquery, which is slow, but then it can do in-memory compares, which is fast, while the NOT EXISTS query can use matching indexes, which are faster than building a memory table ... As you can see, I am easily confused.

And what about the LEFT OUTER JOIN testing for IS NULL? A perfectly good alternate solution:

select c.client_name
from clients as c
left outer
join clientwarnings as cw
on c.client_id = cw.client_id
where cw.client_id is null
Answer 4 :
When coding a SQL statement with tables in master-detail relationships, it's common to have to decide whether to write the query using the WHERE EXISTS (. . .) clause or the WHERE value IN (. . .) clause. You may resist using WHERE EXISTS because it has the awkward syntax of returning a value, which you always ignore.
However, there's a difference when using rule-based optimization. You can determine the performance of a rule-based query by understanding which table is the driving table and how many rows each part returns.
When you write a query using the IN clause, you're telling the rule-based optimizer that you want the inner query to drive the outer query (think: IN = inside to outside). For example, to query the 14-row EMP table for the direct reports to the employee KING, you could write the following:
select ename from emp e
where mgr in (select empno from emp where ename = 'KING');
Here's the EXPLAIN PLAN for this query:
OBJECT OPERATION
---------- ----------------------------------------
SELECT STATEMENT()
NESTED LOOPS()
EMP TABLE ACCESS(FULL)
EMP TABLE ACCESS(BY INDEX ROWID)
PK_EMP INDEX(UNIQUE SCAN)
This query is virtually equivalent to this:
select e1.ename from emp e1,(select empno from emp where ename = 'KING') e2
where e1.mgr = e2.empno;
You can write the same query using EXISTS by moving the outer query column to a subquery condition, like this:
select ename from emp e
where exists (select 0 from emp where e.mgr = empno and ename = 'KING');
When you write EXISTS in a where clause, you're telling the optimizer that you want the outer query to be run first, using each value to fetch a value from the inner query (think: EXISTS = outside to inside).
The EXPLAIN PLAN result for the query is:
OBJECT OPERATION
---------- ----------------------------------------
SELECT STATEMENT()
FILTER()
EMP TABLE ACCESS(FULL)
EMP TABLE ACCESS(BY INDEX ROWID)
PK_EMP INDEX(UNIQUE SCAN)
This is virtually similar to the PL/SQL code:
set serveroutput on;
declare
l_count integer;
begin
for e in (select mgr,ename from emp) loop
select count(*) into l_count from emp
where e.mgr = empno and ename = 'KING';
if l_count != 0 then
dbms_output.put_line(e.ename);
end if;
end loop;
end;
To determine which clause offers better performance in rule-based optimization, consider how many rows the inner query will return in comparison to the outer query. In many cases, EXISTS is better because it requires you to specify a join condition, which can invoke an INDEX scan. However, IN is often better if the results of the subquery are very small. You usually want to run the query that returns the smaller set of results first.
Some people avoid the EXISTS clause because of the requirement to return a result from the query--even though the result is never used. Depending on personal style, people often use 'X,' 1, 0, or null. From looking at the EXPLAIN PLAN output, it appears that the optimizer throws out whatever value you enter and uses 0 all the time. Many developers get into the habit of always entering some constant value.

Partitioning in Oracle. What? Why? When? Who? Where? How?

Partitioning enables tables and indexes or index-organized tables to be subdivided into smaller manageable pieces and these each small piece is called a "partition". From an "Application Development" perspective, there is no difference between a partitioned and a non-partitioned table. The application need not be modified to access a partitioned table if that application was initially written on a non partitioned tables.

So now you know partitioning in oracle now the only thing that yo u need to know is little bit of syntax and that’s it, and you are a partitioning guru.

Oracle introduced partitioning with 8.0. With this version only, " Range Partitioning" was supported. I will come to details later about what that means. Then with Oracle 8i " Hash and Composite Partitioning" was also introduced and with 9i " List Partitioning", it was introduced with lots of other features with each upgrade. Each method of partitioning has its own advantages and disadvantages and the decision which one to use will depend on the data and type of application. Also one can MODIFY , RENAME, MOVE, ADD, DROP, TRUNCATE, SPLIT partitions. We will go thru the details now.

Advantages of using Partition’s in Table

1. Smaller and more manageable pieces of data ( Partitions )
2. Reduced recovery time
3. Failure impact is less
4. import / export can be done at the " Partition Level".
5. Faster access of data
6. Partitions work independent of the other partitions.
7. Very easy to use

Types of Partitioning Methods

1. RANGE Partitioning

This type of partitioning creates partitions based on the " Range of Column" values. Each partition is defined by a " Partition Bound" (non inclusive ) that basically limits the scope of partition. Most commonly used values for " Range Partition" is the Date field in a table. Lets say we have a table SAMPLE_ORDERS and it has a field ORDER_DATE. Also, lets say we have 5 years of history in this table. Then, we can create partitions by date for, lets say, every quarter.
So Every Quarter Data becomes a partition in the SAMPLE_ORDER table. The first partition will be the one with the lowest bound and the last one will be the Partition with the highest bound. So if we have a query that want to look at the Data of first quarter of 1999 then instead of going through the complete data it will directly go to the Partition of first quarter 1999.

This is example of the syntax needed for creating a RANGE PARTITION.

CREATE TABLE SAMPLE_ORDERS
(ORDER_NUMBER NUMBER,
ORDER_DATE DATE,
CUST_NUM NUMBER,
TOTAL_PRICE NUMBER,
TOTAL_TAX NUMBER,
TOTAL_SHIPPING NUMBER)
PARTITION BY RANGE(ORDER_DATE)
(
PARTITION SO99Q1 VALUES LESS THAN TO_DATE(‘01-APR-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q2 VALUES LESS THAN TO_DATE(‘01-JUL-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q3 VALUES LESS THAN TO_DATE(‘01-OCT-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q1 VALUES LESS THAN TO_DATE(‘01-APR-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q2 VALUES LESS THAN TO_DATE(‘01-JUL-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q3 VALUES LESS THAN TO_DATE(‘01-OCT-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2001’, ‘DD-MON-YYYY’)
)
;

the above example basically created 8 partitions on the SAMPLE_ORDERS Table all these partitions correspond to one quarter. Partition SO99Q1 will contain the orders for only first quarter of 1999.


2. HASH Partitioning

Under this type of partitioning the records in a table, are partitions based of a Hash value found in the value of the column, that is used for partitioning. " Hash Partitioning" does not have any logical meaning to the partitions as do the range partitioning. Lets take one example.

CREATE TABLE SAMPLE_ORDERS
(ORDER_NUMBER NUMBER,
ORDER_DATE DATE,
CUST_NUM NUMBER,
TOTAL_PRICE NUMBER,
TOTAL_TAX NUMBER,
TOTAL_SHIPPING NUMBER,
ORDER_ZIP_CODE)
PARTITION BY HASH (ORDER_ZIP_CODE)
(PARTITION P1_ZIP TABLESPACE TS01,
PARTITION P2_ZIP TABLESPACE TS02,
PARTITION P3_ZIP TABLESPACE TS03,
PARTITION P4_ZIP TABLESPACE TS04)
ENABLE ROW MOVEMENT;

The above example creates four hash partitions based on the zip codes from where the orders were placed.
3. List Partitioning ( Only with 9i)

Under this type of partitioning the records in a table are partitioned based on the List of values for a table with say communities column as a defining key the partitions can be made based on that say in a table we have communities like ‘Government’ , ‘Asian’ , ‘Employees’ , ‘American’, ‘European’ then a List Partition can be created for individual or a group of communities lets say ‘American-partition’ will have all the records having the community as ‘American’

Lets take one example. In fact, we will modify the same example.

CREATE TABLE SAMPLE_ORDERS
(ORDER_NUMBER NUMBER,
ORDER_DATE DATE,
CUST_NUM NUMBER,
TOTAL_PRICE NUMBER,
TOTAL_TAX NUMBER,
TOTAL_SHIPPING NUMBER,
SHIP_TO_ZIP_CODE,
SHIP_TO_STATE)
PARTITION BY LIST (SHIP_TO_STATE)
(PARTITION SHIP_TO_ARIZONA VALUES (‘AZ’) TABLESPACE TS01,
PARTITION SHIP_TO_CALIFORNIA VALUES (‘CA’) TABLESPACE TS02,
PARTITION SHIP_TO_ILLINOIS VALUES (‘IL’) TABLESPACE TS03,
PARTITION SHIP_TO_MASACHUSETTES VALUES (‘MA’) TABLESPACE TS04,
PARTITION SHIP_TO_MICHIGAN VALUES (‘MI’) TABLESPACE TS05)
ENABLE ROW MOVEMENT;

The above example creates List partition based on the SHIP_TO_STATE each partition allocated to different table spaces.

4. Composite Range-Hash Partitioning

This is basically a combination of range and hash partitions. So basically, the first step is that the data is divided using the range partition and then each range partitioned data is further subdivided into a hash partition using hash key values. All sub partitions, together, represent a logical subset of the data.

Lets modify the above example again:

CREATE TABLE SAMPLE_ORDERS
(ORDER_NUMBER NUMBER,
ORDER_DATE DATE,
CUST_NUM NUMBER,
CUST_NAME VARCAHR2,
TOTAL_PRICE NUMBER,
TOTAL_TAX NUMBER,
TOTAL_SHIPPING NUMBER,
SHIP_TO_ZIP_CODE,
SHIP_TO_STATE)
TABLESPACE USERS
PARTITION BY RANGE (ORDER_DATE)
SUBPARTITION BY HASH(CUST_NAME)
SUBPARTITION TEMPLATE(
(SUBPARTITION SHIP_TO_ARIZONA VALUES (‘AZ’) TABLESPACE TS01,
SUBPARTITION SHIP_TO_CALIFORNIA VALUES (‘CA’) TABLESPACE TS02,
SUBPARTITION SHIP_TO_ILLINOIS VALUES (‘IL’) TABLESPACE TS03,
SUBPARTITION SHIP_TO_NORTHEAST VALUES (‘MA’, ‘NY’, ‘NJ’) TABLESPACE TS04,
SUBPARTITION SHIP_TO_MICHIGAN VALUES (‘MI’) TABLESPACE TS05)
(
PARTITION SO99Q1 VALUES LESS THAN TO_DATE(‘01-APR-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q2 VALUES LESS THAN TO_DATE(‘01-JUL-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q3 VALUES LESS THAN TO_DATE(‘01-OCT-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q1 VALUES LESS THAN TO_DATE(‘01-APR-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q2 VALUES LESS THAN TO_DATE(‘01-JUL-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q3 VALUES LESS THAN TO_DATE(‘01-OCT-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2001’, ‘DD-MON-YYYY’)
)
ENABLE ROW MOVEMENT;

The above example shows that each range partition has been further sub-partitioned into smaller partitions based on the list value specified. SHIP_TO_ARIZONA is a sub-partition by a List value AZ. This partition will be present in the main partitions by range SO99Q1 etc.
5. Composite Range-List Partitioning ( Only with 9i)

This is also a combination of Range and List Partitions, basically first the data is divided using the Range partition and then each Range partitioned data is further subdivided into List partitions using List key values. Each sub partitions individually represents logical subset of the data not like composite Range-Hash Partition.

Index organized tables can be partitioned using Range or Hash Partitions

Lets modify the above partition once more.

CREATE TABLE SAMPLE_ORDERS
(ORDER_NUMBER NUMBER,
ORDER_DATE DATE,
CUST_NUM NUMBER,
CUST_NAME VARCAHR2,
TOTAL_PRICE NUMBER,
TOTAL_TAX NUMBER,
TOTAL_SHIPPING NUMBER,
SHIP_TO_ZIP_CODE,
SHIP_TO_STATE)
TABLESPACE USERS
PARTITION BY RANGE (ORDER_DATE)
SUBPARTITION BY LIST(SHIP_TO_STATE)
SUBPARTITION TEMPLATE(
SUBPARTITION SP1 TABLESPACE TS01,
SUBPARTITION SP2 TABLESPACE TS02,
SUBPARTITION SP3 TABLESPACE TS03,
SUBPARTITION SP4 TABLESPACE TS04,
SUBPARTITION SP5 TABLESPACE TS05)
(
PARTITION SO99Q1 VALUES LESS THAN TO_DATE(‘01-APR-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q2 VALUES LESS THAN TO_DATE(‘01-JUL-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q3 VALUES LESS THAN TO_DATE(‘01-OCT-1999’, ‘DD-MON-YYYY’),
PARTITION SO99Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q1 VALUES LESS THAN TO_DATE(‘01-APR-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q2 VALUES LESS THAN TO_DATE(‘01-JUL-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q3 VALUES LESS THAN TO_DATE(‘01-OCT-2000’, ‘DD-MON-YYYY’),
PARTITION SO00Q4 VALUES LESS THAN TO_DATE(‘01-JAN-2001’, ‘DD-MON-YYYY’)
)
ENABLE ROW MOVEMENT;

Table Partitioning in Oracle

What is Table Partitioning in Oracle ?

Answer 1: Now a days enterprises run databases of hundred of Gigabytes in size. These databases are known as
Very Large Databases (VLDB). From Oracle Ver. 8.0 Oracle has provided the feature of
table partitioning i.e. you can partition a table according to some criteria . For example you
have a SALES table with the following structure

Suppose this table contains millions of records, but all the records belong to four years only i.e. 1991, 1992, 1993
and 1994. And most of the time you are concerned about only one year i.e. you give queries like the following

select sum(amt) from sales where year=1991;
select product,sum(amt) from sales where year=1992
Group by product;
Now whenever you give queries like this Oracle will search the whole table. If you partition this table according to
year, then the performance is improve since oracle will scan only a single partition instead of whole table.

CREATING PARTITION TABLES

To create a partition table give the following statement

create table sales (year number(4),
product varchar2(10),
amt number(10,2))
partition by range (year)
partition p1 values less than (1992) tablespace u1,
partition p2 values less than (1993) tablespace u2,
partition p3 values less than (1994) tablespace u3,
partition p4 values less than (1995) tablespace u4,
partition p5 values less than (MAXVALUE) tablespace u5;
In the above example sales table is created with 5 partitions. Partition p1 will contain rows of year 1991 and
it will be stored in tablespace u1. Partition p2 will contain rows of year 1992 and it will be stored in tablespace u2.
Similarly p3 and p4.

In the above example if you don’t specify the partition p4 with values less than MAVALUE, then you will not be
able to insert any row with year above 1994.

Although not required, you can place partitions in different tablespaces. If you place partitions in different tablespaces
then you can isolate problems due to failures as only a particular partition will not be available and rest of the
partitions will still be available.

The above example the table is partition by range.

In Oracle you can partition a table by

• Range Partitioning
• Hash Partitioning
• List Partitioning
• Composite Partitioning

Range Partitioning

This type of partitioning is useful when dealing with data that has logical ranges into which it can be distributed;
for example, value of year. Performance is best when the data evenly distributes across the range

Hash partitioning

Use hash partitioning if your data does not easily lend itself to range partitioning, but you would like to partition for
performance and manageability reasons. Hash partitioning provides a method of evenly distributing data across a
specified number of partitions. Rows are mapped into partitions based on a hash value of the partitioning key

The following example shows how to create a hash partition table.
The following example creates a hash-partitioned table. The partitioning column is partno, four partitions are created
and assigned system generated names, and they are placed in four named tablespaces (tab1,tab2, ...).

CREATE TABLE products
(partno NUMBER,
description VARCHAR2 (60))
PARTITION BY HASH (partno)
PARTITIONS 4
STORE IN (tab1, tab2, tab3, tab4);


List Partitioning

Use list partitioning when you require explicit control over how rows map to partitions. You can specify a list of discrete
values for the partitioning column in the description for each partition. This is different from range partitioning, where a
range of values is associated with a partition, and from hash partitioning, where the user has no control of the row to
partition mapping.

List partitioning allows unordered and unrelated sets of data to be grouped and organized together very naturally

The following example creates a table with list partitioning

Create table customers (custcode number(5),
Name varchar2(20),
Addr varchar2(10,2),
City varchar2(20),
Bal number(10,2))
Partition by list (city),
Partition north_India values (‘DELHI’,’CHANDIGARH’),
Partition east_India values (‘KOLKOTA’,’PATNA’),
Partition south_India values (‘HYDERABAD’,’BANGALORE’,
’CHENNAI’),
Partition west India values (‘BOMBAY’,’GOA’);

If a row is inserted in the above table then oracle maps the value of city column and whichever partition list matches the
city column the row is stored in that partition.


COMPOSITE PARTITONING
Composite partitioning partitions data using the range method, and within each partition, subpartitions it using
the hash method. Composite partitions are ideal for both historical data and striping, and provide improved
manageability of range partitioning and data placement, as well as the parallelism advantages of hash partitioning.
When creating composite partitions, you specify the following:
• Partitioning method: range
• Partitioning column(s)
• Partition descriptions identifying partition bounds
• Subpartitioning method: hash
• Subpartitioning column(s)
• Number of subpartitions for each partition or descriptions of subpartitions
The following statement creates a composite-partitioned table. In this example, three range partitions are created, each
containing eight subpartitions. Because the subpartitions are not named, system generated names are assigned, but the
STORE IN clause distributes them across the 4 specified tablespaces (tab1, ...,tab4).
CREATE TABLE PRODUCTS (partno NUMBER,
description VARCHAR(32),
costprice NUMBER)
PARTITION BY RANGE (partno)
SUBPARTITION BY HASH(description)
SUBPARTITIONS 8 STORE IN (tab1, tab2, tab3, tab4)
(PARTITION p1 VALUES LESS THAN (100),
PARTITION p2 VALUES LESS THAN (200),
PARTITION p3 VALUES LESS THAN (MAXVALUE));


ALTERING PARTITION TABLES

To add a partition

You can add add a new partition to the "high" end (the point after the last existing partition). To add a partition
at the beginning or in the middle of a table, use the SPLIT PARTITION clause.

For example to add a partition to sales table give the following command.

alter table sales add partition p6 values less than (1996);

To add a partition to a Hash Partition table give the following command.

Alter table products add partition;

Then Oracle adds a new partition whose name is system generated and it is created in the default tablespace.
To add a partition by user define name and in your specified tablespace give the following command.

Alter table products add partition p5 tablespace u5;

To add a partition to a List partition table give the following command.

alter table customers add partition central_India
values (‘BHOPAL’,’NAGPUR’);
Any value in the set of literal values that describe the partition(s) being added must not exist in any of the other partitions
of the table.


Coalescing Partitions
Coalescing partitions is a way of reducing the number of partitions in a hash-partitioned table, or the number of subpartitions in a composite-partitioned table. When a hash partition is coalesced, its contents are redistributed into one or more remaining partitions determined by the hash function. The specific partition that is coalesced is selected by Oracle, and is dropped after its contents have been redistributed.
To coalesce a hash partition give the following statement.
Alter table products coalesce partition;
This reduces by one the number of partitions in the table products.
DROPPING PARTITIONS
To drop a partition from Range Partition table, List Partition or Composite Partition table give the following command.
Alter table sales drop partition p5;
Once you have drop the partition and if you have created a global index on the table. Then you have to rebuild the global index after dropping the partition by giving the following statement.
Alter index sales_ind rebuild;
To avoid rebuilding of indexes after dropping of the partitions you can also first delete all the records and then drop
the partition like this
Delete from sales where year=1994;
Alter table sales drop partition p4;
This method is most appropriate for small tables, or for large tables when the partition being dropped contains a small percentage of the total data in the table.
Another method of dropping partitions is give the following statement.
ALTER TABLE sales DROP PARTITION p5 UPDATE GLOBAL INDEXES;

This causes the global index to be updated at the time the partition is dropped.

Exchanging a Range, Hash, or List Partition
To exchange a partition of a range, hash, or list-partitioned table with a nonpartitioned table, or the reverse, use the ALTER TABLE ... EXCHANGE PARTITION statement. An example of converting a partition into a nonpartitioned table follows. In this example, table stocks can be range, hash, or list partitioned.
ALTER TABLE stocks
EXCHANGE PARTITION p3 WITH stock_table_3;

Merging Partitions
Use the ALTER TABLE ... MERGE PARTITIONS statement to merge the contents of two partitions into one partition. Te two original partitions are dropped, as are any corresponding local indexes.
You cannot use this statement for a hash-partitioned table or for hash subpartitions of a composite-partitioned table.
You can only merged two adjacent partitions, you cannot merge non adjacent partitions.
For example the merge the partition p2 and p3 into one partition p23 give the following statement.
Alter table sales merge partition p2 and p3 into
partition p23;

Modifying Partitions: Adding Values
Use the MODIFY PARTITION ... ADD VALUES clause of the ALTER TABLE statement to extend the value list of an existing partition. Literal values being added must not have been included in any other partition's value list. The partition value list for any corresponding local index partition is correspondingly extended, and any global index, or global or local index partitions, remain usable.
The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing partition list.
ALTER TABLE customers
MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');

Modifying Partitions: Dropping Values
Use the MODIFY PARTITION ... DROP VALUES clause of the ALTER TABLE statement to remove literal values from the value list of an existing partition. The statement is always executed with validation, meaning that it checks to see if any rows exist in the partition that correspond to the set of values being dropped. If any such rows are found then Oracle returns an error message and the operation fails. When necessary, use a DELETE statement to delete corresponding rows from the table before attempting to drop values.
You cannot drop all literal values from the value list describing the partition. You must use the ALTER TABLE ... DROP PARTITION statement instead.
The partition value list for any corresponding local index partition reflects the new value list, and any global index, or global or local index partitions, remain usable.
The statement below drops a set of cities (‘KOCHI' and 'MANGALORE') from an existing partition value list.
ALTER TABLE customers
MODIFY PARTITION south_india
DROP VALUES (‘KOCHI’,’MANGALORE’);


SPLITTING PARTITIONS

You can split a single partition into two partitions. For example to split the partition p5 of sales table into two partitions give the following command.

Alter table sales split partition p5 into
(Partition p6 values less than (1996),
Partition p7 values less then (MAXVALUE));

TRUNCATING PARTITON

Truncating a partition will delete all rows from the partition.

To truncate a partition give the following statement

Alter table sales truncate partition p5;

LISTING INFORMATION ABOUT PARTITION TABLES

To see how many partitioned tables are there in your schema give the following statement

Select * from user_part_tables;

To see on partition level partitioning information

Select * from user_tab_partitions;

SQL PL SQL Oracle Database Data Modeling Questions and Answers

Q)What are Exclusive Arcs ?

Exclusive arcs are used to identify relationships where an entity is related to two
or more other entities, but only one relationship can exist for a specific entity
occurrence. The exclusive arc is represented by a curved line going through two
or more relationship lines. Figure 1.8 shows an example of an exclusive arc. The
relationships are read as "Each INVENTORY ITEM must be either located at one
and only FACILITY or must be located within one and only one CONTAINER,
but not both." This communicates that inventory items are stored at one of two
types of levels: They are either located at facilities such as a warehouse or stored
within containers such as a bin that is located within a facility.

Q) Different sort of Cardinality?

Cardinality defines the numeric relationships between occurrences of the entities on either end of the relationship line. Here are some examples:

LINK This is a 0:0 optional relationship basically stating that a person can occupy one parking space, that I don't need a person to have a space and I don't need a space to have a person. Although the concept is fairly simple, a database can't express it directly. You would need to nominate one entity to become the dominant table and use triggers or programs to limit the number of related records in the other table.

SubType This is a 1:0 relationship; optional only on one side. This would indicate that a person might be a programmer, but a programmer must be a person. It is assumed that the mandatory side of the relationship is the dominant. Again, triggers or programs must be used to control the database.

Physical Segment This is a 1:1 mandatory relationship and demonstrates a segmentation denormalization. A person must have one and only one DNA pattern and that pattern must apply to one and only one person. This is difficult to implement in a database, since declarative referential integrity will get caught in a "Chicken and the Egg" situation. Basically, this is a single entity.

Possession This is a 0:M (zero to many) optional relationship indicating that a person might have no phone, one phone or lots of phones, and that a phone might be un-owned, but can only be owned by a maximum of one person. This is implemented in a database as a nullable foreign key column in the phone table that references the person table.

Child This is a 1:M mandatory relationship, the most common one seen in databases. A person might be a member or might not, but could be found multiple times (if the member entity represents membership in multiple clubs, for instance). A member must be a person, no questions asked. The foreign key in the member table would be mandatory, or not-null.

Characteristic This is a 0:M relationship that is mandatory on the many side. It indicates that a person must have at least one name, but possibly many names, and that a name might be assigned to a person (might not) but at most to one person. In a database you would have the the name table with a nullable foreign key to the person table and triggers or programs to force a person to have at least one name.

Paradox This is a 1:M relationship mandatory on both sides. As with the physical segment situation, the "Chicken and the Egg" is involved since you have to have a person to have citizenship, but citizenship to have a person.

Association This is a M:M (many to many) optional relationship. Conceptually, it means that a person might or might not work for an employer, but could certainly moonlight for multiple companies. An employer might have no employees, but could have any number of them. Again, not hard to visualize, but hard to implement. Most solutions of this situation involve creating a third "Associative Entity" to resolve the M:M into two 0:M relationships. This might be an entity called employee because it does link the person to the employer the person works for.

Q)Unique Versus Non Unique index in Oracle ?

When a row is inserted into a table or when a PK or UK is modified, Oracle of course needs to ensure that either the PK or UK constraint is not violated. If the constraint is policed via a Unique index, as previously discussed, Oracle knows the value must and can only ever be unique and so performs the constraint validation before the Unique index is actually modified. If the PK or UK is violated, the Unique index can not possibly have been changed as all the associated index entries must always be unique and so only the undo (and redo) of the changes associated with the table data blocks are actually generated and need to be subsequently rolled back.
However, if the PK or UK constraint is policed via a Non-Unique index, the mechanism for applying the changes differs somewhat. As the index is Non-Unique, as previously discussed, Oracle is not quite so certain as to the state of play and performs the constraint validation after the associated changes are made to the Non Unique index. If the PK or UK constraint is violated, both undo and redo of the Non-Unique index has been generated and both changes to the table data blocks and the index blocks need to be rolled back.

This means there’s an extra cost associated with violating a constraint if the constraint is policed via a Non-Unique Index vs. a Unique index

Q)Difference between Delete and Truncate?

TRUNCATE is a DDL command and cannot be rolled back. All of the memory space is released back to the server.
DELETE is a DML command and can be rolled back.

Both commands accomplish identical tasks (removing all data from a table), but TRUNCATE is much faster
TRUNCATE commits after deleting entire table i.e., can not be rolled back. Database triggers do not fire on TRUNCATEDELETE allows the filtered deletion. Deleted records can be rolled back or committed.Database triggers fire on DELETE.
TRUNCATE deletes much faster than DELETE

TRUNCATE
It is a DDL statement
It is a one way trip,cannot ROLLBACK
Doesn't have selective features (where clause)
Doesn't fire database triggers

DELETE
It is a DML statement
One can Rollback
Has conditions
Does fire database triggers

Q)Select sysdate from dual where 5=5 ?
Answer : Return Sysdate only

Q)Select sysdate from dual where 5>NULL ?
Answer : NULL

Q)Difference between Union and Union All ?

The difference between UNION ALL and UNION is that, while UNION only selects distinct values, UNION ALL selects all values.

Q)Select 1 from dual union select a from dual ?
Answer : ORA-01790: expression must have same datatype as corresponding expression. So 1 should be captured as Char in query

Q)Select 1 from dual union all select a from union ?
Same result as above.

Thursday, 7 January 2010

Erwin Data Modelling Data Warehousing Business Intelligence Oracle Database Dimensional Modeling Questions and Answers Part 5

Q)ETL concepts?

Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.

The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.

Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.

Glossary of ETL (Reference:www.Oracle.com)

Source System

A database, application, file, or other storage facility from which the data in a data warehouse is derived.

Mapping

The definition of the relationship and data flow between source and target objects.

Metadata

Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.

Staging Area

A place where data is processed before entering the warehouse.

Cleansing

The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.

Transformation

The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.

Transportation

The process of moving copied or transformed data from a source to a data warehouse.

Target System

A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

Q)Metadata tools

Metadata is data about data and Metadata tools are used for gathering, storing, updating, and for retrieving the business and technical metadata of an organization.

Data and Metadata

Data - The actual data that is stored in the database.

Business Metadata - The information of the data related with business and this data is used by functional team (business analysts, smart management experts, business managers etc).

Technical Metadata - The information about the data related with technology and this data is used by the technical team
(Developers, DBAs) etc.

Q)Popular Metadata Tools

Tool Name Company Name
Rochade ASG
Metatrieve Metatrieval
Datamapper Exeros
Metacenter Data Advantage Group
Meta Data
Integration
Framework Info Librarian
SuperGlue Informatica
Metastage Ascential

Q)What is the business Metadata?

MetaData:

Metadata is data about data. Metadata comes into picture when we need to know about how data is stored and where it is stored. Metadata tools are helpful in capturing business metadata and the following section explains business metadata.

Business MetaData:

In IT, Business Metadata is adding additional text or statement around a particular word that adds value to data. Business Metadata is about creating definitions, business rules. For example, when tables and columns are created the following business metadata would be more useful for generating reports to functional and technical team. The advantage is of this business metadata is whether they are technical or non-technical, everybody would understand what is going on within the organization.

Table’s Metadata: While creating a table, metadata for definition of a table, source system name, source entity names, business rules to transform the source table, and the usage of the table in reports should be added in order to make them available for taking metadata reports.

Column’s Metadata: Similarly for columns, source column name (mapping), business rules to transform the source column name, and the usage of the column in reports should be added for taking metadata reports.

Q)What is Technical Metadata?

Technical metadata describes information about technology such as the ownership of the database, physical characteristics of a database (in oracle, table space, extents, segments, blocks, partitions etc), performance tuning (processors, indexing), table name, column name, data type, relationship between the tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is used by the technical team. In Technical metadata, derivation rules are important when formulae or calculations are applied on a column.

Q)Role of Metadata in ETL?

When you deal with a data warehouse, various phases like Business Process Modeling, Data Modeling, ETL, Reporting etc., are inter-related with each other and they do contain their own metadata. For example in ETL, it will be very difficult for one to extract, transform and load source data into a data warehouse, if there is no metadata available for the source like where and how to get the source data.

An organization may be using data modeling tools, such as Erwin, Embarcadero, Oracle designer, Sybase Power Designer etc., for developing data models. Functional and technical team should have spent much time and effort in creating the data model's data structures(tables, columns, data types, procedures, functions, triggers etc.,). By using matadata capturing tools, these data structures can be imported into metadata repository which we call it as metadata.
For example when you work with Informatica's MetaData Exchange, it captures the metadata present in these tools and loads into the repository. There is no need for informatica developer to create these data structures once again since metadata(data definitions) have been already captured and stored. Similarly most of the ETL tools have that capability to capture metadata from RDBMS, files, ERP, Applications etc.
In ETL, Metadata Repository is where all the metadata information about source, target, transformations, mapping, wokflows, sessions etc., are stored. From this repository, metadata can be manipulated, queried and retrieved with the help of wizards provided by metadata capturing tools. During the ETL process, when we are mapping source and target systems, we are actually mapping their metadata.

In any organization, a useful metadata often stored in a repository can be a handy resource to know about the organization's information systems. Assume that each department in an organization may have a different business definitions, data types, attribute names for the same attribute or they may have a single business definition for many attributes. These anomalies can be overcome by properly maintaining metadata for these attributes in the centralized repository.

Thus metadata plays a vital role in explaining about how, why, where data can be found, retrieved, stored and used efficiently in an information management system.

Q)Metadata Reports ?

Metadata stored in a repository can be produced in the form of reports for easy understanding and these reports are very useful in explaining about the various objects or data structures and the relationship between these objects. The following products like Schema Logic Enterprise Suite, Rochade, Metatrieve, Datamapper, Metacenter, Metadata Integration Frame work stores and handles metadata in an efficient and effective manner.

Following are few examples for metadata reports.

A metadata report on business process activities, data flow.

A metadata report on source and target columns can be generated to find out all the source and target field names, data types, definitions etc.

A metadata report on workflow can be generated to find out all the workflows, workflow properties, workflow scheduling, workflow execution details etc.

A metadata report on mapping can be generated to find out all the sources and targets in the mapping, source fields and target fields participated in the mapping etc.

A metadata report on business intelligence can be generated to find out all the score cards and balanced score cards.

Q)What is Business Intelligence ?

Business Intelligence is a technology based on customer and profit oriented models that reduces operating costs and provide increased profitability by improving productivity, sales, service and helps to make decision making capabilities at no time. Business Intelligence Models are based on multi dimensional analysis and key performance indicators (KPI) of an enterprise.

Business Intelligence applications that are based on Business Intelligence Models are created by Business Intelligence software which provides the aggregated details about suppliers, customers, internal activities, business to business transactions to the managers or whoever needs it to take better corporate decisions.

Many business questions or situations need to be analyzed in order to achieve the target of an enterprise with the help of several managers or executives in each cadre. Below are some of the samples of these questions.

Business Intelligence: Finance:

What is the net income, expenses, gross profit, and net profit for this quarter, year?

Business Intelligence: Accounts:

What is the sales amount this month and what is the outstanding pending payment?

Business Intelligence: Purchase:

Who is the vendor to be contacted to purchase products?

Business Intelligence: Production:

How many products are manufactured in each production unit today, weekly, monthly?

Business Intelligence: Sales:
How many products have been sold in each area today, weekly, monthly?

Business Intelligence: Quality:
How many products have been defective today, weekly, monthly, quarterly, yearly?

Business Intelligence: Service:
Are the customers satisfied with the quality?
These business intelligence questions are related with why, what, how, when, and business intelligence reports(olap reports) are useful in providing solutions to the above questions by means of reporting, score cards, balance score cards that are helpful in managerial decisions.

Q)Business Intelligence Tools ?

Business Intelligence Tools help to gather, store, access and analyze corporate data to aid in decision-making. Generally these systems will illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, inventory and distribution analysis.

With Business Intelligence Tools, various data like customer related, product related, sales related, time related, location related, employee related etc. are gathered and analysed based on which important strategies or rules are formed and goals to achieve their target are set. These decisions are very efficient and effective in promoting an Organisation's growth.

Since the collected data can be sliced across almost all the dimensions like time,location, product, promotion etc., valuable statistics like sales profit in one region for the current year can be calculated and compared with the previous year statistics.

Popular Business Intelligence Tools

Tool Name Company Name
Business Objects Business Objects
Cognos Cognos
Hyperion Hyperion
Microstrategy Microstrategy

Microsoft Reporting Services Microsoft
Crystal Business Objects

Q)OLAP Analysis ?

Imagine an organization that manufactures and sells goods in several States of USA which employs hundreds of employees in its manufacturing, sales and marketing division etc. In order to manufacture and sell this product in profitable manner, the executives need to analyse(OLAP analysis) the data on the product and think about various possibilities and causes for a particular event like loss in sales, less productivity or increase in sales over a particular period of the year.

During the OLAP analysis, the top executives may seek answers for the following:

1. Number of products manufactured.
2. Number of products manufactured in a location.
3. Number of products manufactured on time basis within a location.
4. Number of products manufactured in the current year when compared to the previous year.
5. Sales Dollar value for a particular product.
6. Sales Dollar value for a product in a location.
7. Sales Dollar value for a product in a year within a location.
8. Sales Dollar value for a product in a year within a location sold or serviced by an employee.

OLAP tools help executives in finding out the answers, not only to the above mentioned measures, even for the very complex queries by allowing them to slice and dice, drill down from higher level to lower level summarized data, rank, sort, etc.

Q)OLAP Multidimensional database ?

OLAP Database - Multidimensional

This is a type of database that is optimized for data warehouse, data mart and online analytical processing (OLAP) applications. The main advantage of this database is query performance.

Relational databases make it easy to work with individual records, whereas multidimensional databases are designed for analyzing large groups of records. Relational database is typically accessed using a Structured Query Language (SQL) query. A multidimensional database allows a user to ask questions like "How many mortgages have been sold in New Jersey city" and "How many credit cards have been purchased in a particular county?".



Popular Multidimensional Databases

Database Name Company Name
Crystal Holos Business Objects
Hyperion Essbase Hyperion
Oracle Express Oracle Corporation
Oracle OLAP Option Oracle Corporation
Microsoft Analysis Services Microsoft
PowerPlay Enterprise Cognos

Q)What is Data Cleansing?

"Data cleansing ensures that undecipherable data does not enter the data warehouse. Undecipherable data will affect reports generated from the data warehouse via OLAP, Data Mining and KPI's."

A very simple example of where data cleansing would be utilized is how dates are stored in separate applications. Example: 11th March 2007 can be stored as '03/11/07' or '11/03/07' among other formats. A data warehousing project would require the different date formats to be transformed to a uniform standard before being entered in the data warehouse.


Q)Starting a Data Warehousing Project?

Data Warehousing have been voted the most complex and resource intensive software projects by CIOs of Fortune 1000 companies. Enterprise Resource Planning (ERP) implementations top the list.

Data Warehousing comes in all shapes and sizes, which bear a direct relationship to cost and time involved. The approach to starting a Data Warehousing project will vary and the steps listed below are summary of some of the points to consider.

1) Get Professional Advice

Data Warehousing makes a great difference is the lead has been there and done that. It will save a huge bundle to get professional advice upfront. Endless meeting times can be saved and the risk of an abandon data warehousing project can be reduced.

2) Plan the Data

Know what metrics you want to measure in the Data Warehouse and ensure that there is the appropriate data to provide for the analysis. If you wish to obtain periodic Key Performance Index (KPI) data for shipping logistics, make sure that the appropriate data is piped into the data warehouse.

3) Who will use the Data Warehouse

The power Data Warehouse consumers are business and financial managers. Data Warehouses are meant to deliver clear indications on how the business is performing. Plot out the expected users for the Data Warehouse in the enterprise. Gauge that they will have the appropriate reports in a format which is quickly understandable. Ensure that planning exercises are conducted in advance to accumulate scenarios on how the Data Warehouse will be used. Always remember that data has to be presented attractively and in a format business managers will feel comfortable. Text files with lines of numbers will not suffice!

4) Intergration to External Applications

Most Data Warehousing projects sink or swim by their ability to extract data from external applications. Enterprises have a slew of applications either developed inhouse or obtain from a vendor. Conceptually, your Data Warehouse will act as the heart to diverse applications running in the enterprise. All important data will flow in or out of the Data Warehouse.

5) Technology, What Technology?

At the bits and bytes level, a Data Warehouse is a souped up database. It will be built from one of the major Relational Database Management System (DBMS) vendors like Oracle, IBM, Microsoft. Open source databases, like mySQL, can also support Data Warehousing with the right support in place.

Q)Design Methodologies of Kimball and Inmon...

Data warehousing is more an art-form than cookie cutter science. The business variables and technical risks are very unique to each installation. The business users have different goals and expectations. Data warehousing is more often successful than not if there is a reservoir of data warehousing expertise in-house.

This article will focus on the data warehousing design methodologies most commonly proposed. These designs are in an evolving flux as business needs and technical cost change.

Quite often the design chosen will be a combination of the methodologies below and additional requirements - the data warehouse design third way. I am a proponent of the third way data warehousing design. Third way takes into account the business specifics and needs of the installing company and technical resources available. It uses the best design patterns of both methodologies plus additional requirements unique to the business.

The two major design methodologies of data warehousing are from Ralph Kimball and Bill Inmon. The design methodologies developed by Kimball and Inmon have lines drawn in the sand.

Both Kimball and Inmon view data warehousing as separate from OLTP and Legacy applications.
Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtain from the dimension modeling on a local departmental level.

Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.
The data mart is the creation of a data warehouse's subject area.

Q)Modelling for the Data Mining?

Data mining is conducted against data accumulated in OLTP repositories, data warehouses, data marts and archived data. The steps for data mining follows the following pattern:
• data extraction
• data cleansing
• modeling data
• applying data mining algorithm
• pattern discovery
• data visualization

Data extraction and data cleansing can be eased with good data lifecycle management policies. Very often a data warehousing project will ensure that data extraction and meta-data standards are pre-defined in an organization.

Data models for operational and archived data are different from data mining models. Data stored referentially in operational systems are designed for transactional speed.

Q)Evolution of Data in Enterprises

Data becomes active as soon as it is of interest to an organization. Data life cycle begins with a business need for acquiring data. Active data are referenced on a regular basis during day-to-day business operations. Over time, this data loses its importance and is accessed less often, gradually losing its business value, and ending with its archival or disposal.

Q)Active Data

Active data is of business use to an organization. The ease of access for business users to active data is an absolute necessity in order to run an efficient business.
The simple, but critical principle, that all data moves through life-cycle stages is key to improving data management. By understanding how data is used and how long it must be retained, companies can develop a strategy to map usage patterns to the optimal storage media, thereby minimizing the total cost of storing data over its life cycle.
The same principles apply when data is stored in a relational database, although the challenge of managing and storing relational data is compounded by complexities inherent in data relationships. Relational databases are a major consumer of storage and are also among the most difficult to manage because they are accessed on a regular basis. Without the ability to manage relational data effectively, relative to its use and storage requirements, runaway database growth will result in increased operational costs, poor performance, and limited availability for the applications that rely on these databases. The ideal solution is to manage data stored in relational databases as part of an overall enterprise data management solution.

Q)Inactive Data

Data are put out to pasture once they are no longer active. i.e. there are no longer needed for critical business tasks or analysis.

Prior to the mid-nineties, most enterprises achieved data in Microfilms and tape back-ups.

There are now technologies for data archival such as Storage Area Networks (SAN), Network Attached Storage (NAS) and Hierarchical Storage Management. These storage systems can maintain referential integrity and business context.

Q) What is Data Mining.

Data mining is the analysis of large data sets to discover patterns of interests. Data mining has come a long way from the early academic beginnings in the late seventies. Many of the early data mining software packages were based on one algorithm.
Until the mid-nineties data mining required considerable specialized knowledge and was mainly restricted to statisticians. Customer Relationship Management (CRM) software played a great part in popularizing data mining among corporate users. Data mining in CRMs are often hidden from the end users. The algorithms are packaged behind business functionality such as Churn analysis. Churn analysis is the process to predict which customers are the ones most likely to defect to a competitor.

Data mining algorithms are now freely available. Database vendors have started to incorporate data mining modules. Developers can now access data mining via open standards such as OLE-DB for data mining on SQL Server 2000. Data mining functionality can now be added directly to the application source code.

Q)The Future of Data Mining

The complexity of data mining must be hidden from end-users before it will take the true center stage in an organization. Business use cases can be designed, with tight constrains, around data mining algorithms.