Testing: 2019

Wednesday, 29 May 2019

How to delete duplicate records in a Teradata table?

There are several ways to delete duplicate records. The examples below use a table named product_table.

Table name: product_table

product_id	product_name
2	pqr
8	klm
99	qqqq

The first way uses rowid to remove the duplicate records. This query works only in SQL dialects that expose a rowid pseudo-column, such as Oracle.

Syntax: delete from table_name where rowid not in (select max(rowid) from table_name group by column_name);

Query: delete from product_table where rowid not in (select max(rowid) from product_table group by product_id);
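As a rough illustration of the rowid approach, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is chosen only because it also exposes a rowid; it is not Teradata or Oracle, and the duplicate rows are made up for the demo.

```python
import sqlite3

# In-memory SQLite database, used here only because it exposes a rowid.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_table (product_id INTEGER, product_name TEXT)")

# Rows from the post plus two duplicates (the duplicates are invented for this demo).
conn.executemany(
    "INSERT INTO product_table VALUES (?, ?)",
    [(2, "pqr"), (8, "klm"), (99, "qqqq"), (2, "pqr"), (8, "klm")],
)

# Keep the row with the highest rowid in each product_id group and delete the rest.
conn.execute(
    """
    DELETE FROM product_table
    WHERE rowid NOT IN (
        SELECT MAX(rowid) FROM product_table GROUP BY product_id
    )
    """
)

remaining = conn.execute(
    "SELECT product_id, product_name FROM product_table ORDER BY product_id"
).fetchall()
print(remaining)  # [(2, 'pqr'), (8, 'klm'), (99, 'qqqq')]
```

Note that the query groups by product_id only, so two rows with the same product_id but different names would also be collapsed into one.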

This is another way to delete duplicates. This one works in Teradata.

First create a SET table with the same definition as the original, then insert the records into the SET table. As we already know, a SET table does not allow duplicate records, so the duplicates are dropped during the insert.

create set table product_table_t as (select * from product_table) with no data;

insert into product_table_t select * from product_table;

drop table product_table;

rename table product_table_t as product_table;
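SET tables are specific to Teradata, so they cannot be run as-is elsewhere. The sketch below approximates the same create, insert, drop and rename flow in SQLite (via Python's sqlite3), swapping in a UNIQUE constraint over every column plus INSERT OR IGNORE as a stand-in for the SET table. The sample duplicates are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_table (product_id INTEGER, product_name TEXT)")
conn.executemany(
    "INSERT INTO product_table VALUES (?, ?)",
    [(2, "pqr"), (8, "klm"), (2, "pqr"), (99, "qqqq"), (99, "qqqq")],
)

# Stand-in for a Teradata SET table: a UNIQUE constraint across all columns.
conn.execute(
    """
    CREATE TABLE product_table_t (
        product_id INTEGER,
        product_name TEXT,
        UNIQUE (product_id, product_name)
    )
    """
)

# INSERT OR IGNORE silently skips rows that would violate the constraint,
# which mimics how INSERT ... SELECT into a SET table discards duplicate rows.
conn.execute("INSERT OR IGNORE INTO product_table_t SELECT * FROM product_table")

# Swap the deduplicated copy in under the original name.
conn.execute("DROP TABLE product_table")
conn.execute("ALTER TABLE product_table_t RENAME TO product_table")

deduped = conn.execute(
    "SELECT product_id, product_name FROM product_table ORDER BY product_id"
).fetchall()
print(deduped)  # [(2, 'pqr'), (8, 'klm'), (99, 'qqqq')]
```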

Using DISTINCT to eliminate duplicate records

Create a temporary table with no data.

Query: create table product_t as (select * from product_table) with no data;

Insert the records into product_t using DISTINCT.

Query: insert into product_t select distinct * from product_table;

Drop the original product_table.

Query: drop table product_table;

Rename the table.

Query: rename table product_t as product_table;
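The DISTINCT steps above can be tried end to end with a small sketch in Python's sqlite3. The SQL is adjusted to SQLite's spelling (for example `WHERE 1 = 0` instead of `with no data`, and `ALTER TABLE ... RENAME TO`), and the duplicate rows are assumed for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_table (product_id INTEGER, product_name TEXT)")
conn.executemany(
    "INSERT INTO product_table VALUES (?, ?)",
    [(2, "pqr"), (2, "pqr"), (8, "klm"), (99, "qqqq"), (8, "klm")],
)

# Step 1: empty copy of the structure (SQLite's equivalent of "with no data").
conn.execute("CREATE TABLE product_t AS SELECT * FROM product_table WHERE 1 = 0")

# Step 2: copy only the distinct rows.
conn.execute("INSERT INTO product_t SELECT DISTINCT * FROM product_table")

# Step 3: drop the original table.
conn.execute("DROP TABLE product_table")

# Step 4: rename the copy back to the original name.
conn.execute("ALTER TABLE product_t RENAME TO product_table")

distinct_rows = conn.execute(
    "SELECT product_id, product_name FROM product_table ORDER BY product_id"
).fetchall()
print(distinct_rows)  # [(2, 'pqr'), (8, 'klm'), (99, 'qqqq')]
```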

Please comment here if you know any other method of removing duplicates.

Difference between ROW_NUMBER(), RANK() and DENSE_RANK()

Let us assume I have an emp table with the columns ename and sal. The emp table is shown below.

ENAME	SAL
SMITH	800
ALLEN	1600
WARD	1250
JONES	2975
MARTIN	1250
BLAKE	2850
CLARK	2450
SCOTT	3000
KING	5000
TURNER	1500
ADAMS	1100
JAMES	950
FORD	3000
MILLER	1300

Now let's query the table to get all employee names with their salaries in descending order, numbered by each of the three functions.

The query looks like this:

SELECT ENAME, SAL, ROW_NUMBER() OVER (ORDER BY SAL DESC) ROW_NUMBER, RANK() OVER (ORDER BY SAL DESC) RANK, DENSE_RANK() OVER (ORDER BY SAL DESC) DENSE_RANK FROM EMP;

Output

ENAME	SAL	ROW_NUMBER	RANK	DENSE_RANK
KING	5000	1	1	1
SCOTT	3000	2	2	2
FORD	3000	3	2	2
JONES	2975	4	4	3
BLAKE	2850	5	5	4
CLARK	2450	6	6	5
ALLEN	1600	7	7	6
TURNER	1500	8	8	7
MILLER	1300	9	9	8
WARD	1250	10	10	9
MARTIN	1250	11	10	9
ADAMS	1100	12	12	10
JAMES	950	13	13	11
SMITH	800	14	14	12
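To reproduce this output without a full database server, the following sketch loads the same emp rows into an in-memory SQLite database through Python's sqlite3 (window functions need SQLite 3.25 or newer). The column aliases row_num, rnk and dense_rnk are my own choice, made to avoid aliasing a column with a function name.

```python
import sqlite3

# The emp rows from the table above.
emp = [
    ("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250), ("JONES", 2975),
    ("MARTIN", 1250), ("BLAKE", 2850), ("CLARK", 2450), ("SCOTT", 3000),
    ("KING", 5000), ("TURNER", 1500), ("ADAMS", 1100), ("JAMES", 950),
    ("FORD", 3000), ("MILLER", 1300),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", emp)

ranked = conn.execute(
    """
    SELECT ename, sal,
           ROW_NUMBER() OVER (ORDER BY sal DESC) AS row_num,
           RANK()       OVER (ORDER BY sal DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY sal DESC) AS dense_rnk
    FROM emp
    ORDER BY row_num
    """
).fetchall()

for row in ranked:
    print(row)
```

Rows that tie on sal (SCOTT and FORD, WARD and MARTIN) may swap their ROW_NUMBER values between runs or engines, because the ORDER BY does not say how to break ties; their RANK and DENSE_RANK values stay the same.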

So the question is: which one to use?

It all depends on your requirement and the business rule you are following.
1. Use ROW_NUMBER only when you just want a serial number on the result set. It ignores ties: rows with equal values still get different numbers, unlike RANK and DENSE_RANK.
2. The choice between RANK and DENSE_RANK depends on the business rule you are following. RANK leaves gaps in the numbering when two or more rows share a value. DENSE_RANK does not leave any gaps between ranks.
So when assigning the next rank to a row, RANK considers the total count of rows before that row, while DENSE_RANK just gives the next rank according to the value.
So if you are ranking employees according to their salaries you should use DENSE_RANK, and if you are ranking students according to their marks you should use RANK (though this is not mandatory; it depends on your requirement).
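The difference in how the three numbers advance can be sketched in a few lines of plain Python. This is only an illustration of the counting rule described above, not how any database implements it; the function name rank_salaries is hypothetical.

```python
def rank_salaries(salaries):
    """Return (salary, row_number, rank, dense_rank) tuples, highest salary first."""
    ordered = sorted(salaries, reverse=True)
    result = []
    rank = 0
    dense_rank = 0
    previous = None
    for row_number, sal in enumerate(ordered, start=1):
        if sal != previous:
            # New value: RANK jumps to the row's position (rows before it + 1),
            # while DENSE_RANK simply moves on to the next integer.
            rank = row_number
            dense_rank += 1
            previous = sal
        # On a tie, rank and dense_rank are reused; only row_number advances.
        result.append((sal, row_number, rank, dense_rank))
    return result


top_four = rank_salaries([5000, 3000, 3000, 2975])
print(top_four)
# [(5000, 1, 1, 1), (3000, 2, 2, 2), (3000, 3, 2, 2), (2975, 4, 4, 3)]
```

After the tie at 3000, RANK skips to 4 because three rows come before 2975, while DENSE_RANK continues with 3.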

Sunday, 31 March 2019

Full Load vs. Incremental Load in Data Warehousing

Full Load vs. Incremental Load

Aspect	Full Load	Incremental Load
Rows loaded	Truncates all rows and loads from scratch	Only new and updated records are loaded
Time	Requires more time	Requires less time
Consistency	Can easily be guaranteed	Difficult; ETL must check for new/updated rows
History	Can be lost	Retained
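To make the contrast concrete, here is a minimal sketch of both load styles using Python's sqlite3. The source and target tables, the updated_at column and the watermark value are all assumptions for illustration; a real ETL job would detect changes in whatever way its source system allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER);
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER);
    INSERT INTO source VALUES (1, 'a', 10), (2, 'b', 10);
    """
)


def full_load(conn):
    """Empty the target and reload every source row."""
    conn.execute("DELETE FROM target")  # SQLite has no TRUNCATE statement
    conn.execute("INSERT INTO target SELECT * FROM source")


def incremental_load(conn, last_loaded_at):
    """Load only rows changed after the previous load; return the new watermark."""
    conn.execute(
        "INSERT OR REPLACE INTO target "
        "SELECT * FROM source WHERE updated_at > ?",
        (last_loaded_at,),
    )
    row = conn.execute(
        "SELECT COALESCE(MAX(updated_at), ?) FROM source", (last_loaded_at,)
    ).fetchone()
    return row[0]


# Initial full load; remember the highest updated_at seen so far.
full_load(conn)
watermark = 10

# Later, one source row is updated and one new row arrives.
conn.execute("UPDATE source SET name = 'b2', updated_at = 20 WHERE id = 2")
conn.execute("INSERT INTO source VALUES (3, 'c', 20)")

watermark = incremental_load(conn, watermark)
target_rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(target_rows)  # [(1, 'a'), (2, 'b2'), (3, 'c')]
```

The incremental path touches only two rows here, but it relies on updated_at being maintained correctly, which is the extra checking the comparison refers to.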
