ETL Testing

Friday, May 2, 2014

IMPORTANT QUERIES - PART 2

1. Get duplicate rows from the table:

Select empno, count (*) from EMP group by empno having count (*)>1;

2. Remove duplicates in the table:

Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);

3. Below query transpose columns into rows.

Name	No	Add1	Add2
abc	100	Hyd	bang
xyz	200	Mysore	pune

Select name, no, add1 from A

UNION

Select name, no, add2 from A;

4. Below query transpose rows into columns.

select

emp_id,

max(decode(row_id,0,address))as address1,

max(decode(row_id,1,address)) as address2,

max(decode(row_id,2,address)) as address3

from (select emp_id,address,mod(rownum,3) row_id from temp order by emp_id )

group by emp_id

Other query:

select

emp_id,

max(decode(rank_id,1,address)) as add1,

max(decode(rank_id,2,address)) as add2,

max(decode(rank_id,3,address))as add3

from

(select emp_id,address,rank() over (partition by emp_id order by emp_id,address )rank_id from temp )

group by

emp_id

5. Rank query:

Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);

6. Dense rank query:

The DENSE_RANK function works acts like the RANK function except that it assigns consecutive ranks:

Select empno, ename, Sal, from (select empno, ename, sal, dense_rank () over (order by sal desc) r from emp);

7. Top 5 salaries by using rank:

Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from emp) where r<=5;

Select * from (select * from EMP order by sal desc) where rownum<=5;

8. 2 nd highest Sal:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from EMP) where r=2;

9. Top sal:

Select * from EMP where sal= (select max (sal) from EMP);

10. How to display alternative rows in a table?

SQL> select *from emp where (rowid, 0) in (select rowid,mod(rownum,2) from emp);

11. Hierarchical queries

Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but

process the child rows.

SELECT department_id, employee_id, last_name, job_id, salary

FROM employees

WHERE last_name! = ’Higgins’

START WITH manager_id IS NULL

CONNECT BY PRIOR employee_id = menagerie;

What is Hints and Why hints Require?

It is a perfect valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database are changing rapidly, so that the optimizer (or more accuratly, its statistics) are out of date. In this case, a hint could help.

You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can take a wild optimizer and give you optimal performance

Tables analyze and update Analyze Statement

The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

Automatic Optimizer Statistics Collection

By default Oracle 10g automatically gathers optimizer statistics using a scheduled job called GATHER_STATS_JOB. By default this job runs within maintenance windows between 10 P.M. to 6 A.M. week nights and all day on weekends. The job calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure which gathers statistics for tables with either empty or stale statistics, similar to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that the internal job prioritizes the work such that tables most urgently requiring statistics updates are processed first.

Hint categories:

Hints can be categorized as follows:

ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS, based on statistics gathered.

Hints for Join Orders,
Hints for Join Operations,
Hints for Parallel Execution, (/*+ parallel(a,4) */) specify degree either 2 or 4 or 16
Additional Hints
HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.

/*+ use_hash */

Use Hint to force using index

SELECT /*+INDEX (TABLE_NAME INDEX_NAME) */ COL1,COL2 FROM TABLE_NAME

Select ( /*+ hash */ ) empno from

ORDERED-à This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.

PARALLEL (table, instances)àThis specifies the operation is to be done in parallel.

If index is not able to create then will go for /*+ parallel(table, 8)*/-----For select and update example---in where clase like st,not in ,>,< ,<> then we will use.

Explain plan

Explain plan will tell us whether the query properly using indexes or not.whatis the cost of the table whether it is doing full table scan or not, based on these statistics we can tune the query.

The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created using in SQL*Plus as follows:

SQL> CONN sys/password AS SYSDBA

Connected

SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql

SQL> GRANT ALL ON sys.plan_table TO public;

SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;

What is your tuning approach if SQL query taking long time? Or how do u tune SQL query?

If query taking long time then First will run the query in Explain Plan, The explain plan process stores data in the PLAN_TABLE.

it will give us execution plan of the query like whether the query is using the relevant indexes on the joining columns or indexes to support the query are missing.

If joining columns doesn’t have index then it will do the full table scan if it is full table scan the cost will be more then will create the indexes on the joining columns and will run the query it should give better performance and also needs to analyze the tables if analyzation happened long back. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster using

ANALYZE TABLE employees COMPUTE STATISTICS;

If still have performance issue then will use HINTS, hint is nothing but a clue. We can use hints like

ALL_ROWS
One of the hints that 'invokes' the Cost based optimizer
ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

FIRST_ROWS
One of the hints that 'invokes' the Cost based optimizer
FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

CHOOSE
One of the hints that 'invokes' the Cost based optimizer
This hint lets the server choose (between ALL_ROWS and FIRST_ROWS, based on statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table. Then hashes other table and uses hash index to find corresponding records. Therefore not suitable for < or > join conditions.

/*+ use_hash */

Hints are most useful to optimize the query performance.

What is the difference between sub-query & co-related sub query?

A sub query is executed once for the parent statement

whereas the correlated sub query is executed once for each

row of the parent query.

Sub Query:

Example:

Select deptno, ename, sal from emp a where sal in (select sal from Grade where sal_grade=’A’ or sal_grade=’B’)

Co-Related Sun query:

Example:

Find all employees who earn more than the average salary in their department.

SELECT last-named, salary, department_id FROM employees A

WHERE salary > (SELECT AVG (salary)

FROM employees B WHERE B.department_id =A.department_id

Group by B.department_id)

Sub-query	Co-related sub-query
A sub-query is executed once for the parent Query	Where as co-related sub-query is executed once for each row of the parent query.
Example: Select * from emp where deptno in (select deptno from dept);	Example: Select a.* from emp e where sal >= (select avg(sal) from emp a where a.deptno=e.deptno group by a.deptno);

View & Materialized View & Inline View

View:

Why Use Views?

• To restrict data access

• To make complex queries easy

• To provide data independence

A simple view is one that:

– Derives data from only one table

– Contains no functions or groups of data

– Can perform DML operations through the view.

A complex view is one that:

– Derives data from many tables

– Contains functions or groups of data

– Does not always allow DML operations through the view

A view has a logical existence but a materialized view has

a physical existence.Moreover a materialized view can be

Indexed, analysed and so on....that is all the things that

we can do with a table can also be done with a materialized

view.

We can keep aggregated data into materialized view. we can schedule the MV to refresh but table can’t.MV can be created based on multiple tables.

Materialized View:

In DWH materialized views are very essential because in reporting side if we do aggregate calculations as per the business requirement report performance would be de graded. So to improve report performance rather than doing report calculations and joins at reporting side if we put same logic in the MV then we can directly select the data from MV without any joins and aggregations. We can also schedule MV (Materialize View).

Inline view:

If we write a select statement in from clause that is nothing but inline view.

Ex:

Get dept wise max sal along with empname and emp no.

Select a.empname, a.empno, b.sal, b.deptno

From EMP a, (Select max (sal) sal, deptno from EMP group by deptno) b

Where

a.sal=b.sal and

a.deptno=b.deptno

Differences between where clause and having clause

Where clause	Having clause
Both where and having clause can be used to filter the data.
Where as in where clause it is not mandatory.	But having clause we need to use it with the group by.
Where clause applies to the individual rows.	Where as having clause is used to test some condition on the group rather than on individual rows.
Where clause is used to restrict rows.	But having clause is used to restrict groups.
Restrict normal query by where	Restrict group by function by having
In where clause every record is filtered based on where.	In having clause it is with aggregate records (group by functions).

Order of where and having:

SELECT column, group_function

FROM table

[WHERE condition]

[GROUP BY group_by_expression]

[HAVING group_condition]

[ORDER BY column];

The WHERE clause cannot be used to restrict groups. you use the

HAVING clause to restrict groups.