Monday, August 29, 2016

Comparison of Space used between SQL Server and Vertica


Table Size in SQL Server


SELECT
    t.NAME AS TableName,
    s.Name AS SchemaName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
    sys.tables t
INNER JOIN    
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
    sys.schemas s ON t.schema_id = s.schema_id
WHERE
    t.NAME ='RETENTION_METRICS_Vertica'
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255
GROUP BY
    t.Name, s.Name, p.Rows
ORDER BY
    t.Name


Table Size in Vertica


SELECT t.table_name AS table_name,
       SUM(ps.wos_row_count + ps.ros_row_count) AS row_count,
       SUM(ps.wos_used_bytes + ps.ros_used_bytes) AS byte_count,
       SUM(ps.used_bytes) AS sum_used_bytes
FROM tables t
JOIN projections p ON t.table_id = p.anchor_table_id
JOIN projection_storage ps on p.projection_name = ps.projection_name
WHERE t.table_name ='subscription'
GROUP BY t.table_name
ORDER BY byte_count DESC; 
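
To compare with the SQL Server numbers above, it helps to report the Vertica figure in the same units. A sketch that converts used_bytes to KB and MB, using the same joins as the query above:

SELECT t.table_name,
       SUM(ps.used_bytes) / 1024 AS used_kb,
       SUM(ps.used_bytes) / 1024 / 1024 AS used_mb
FROM tables t
JOIN projections p ON t.table_id = p.anchor_table_id
JOIN projection_storage ps ON p.projection_name = ps.projection_name
WHERE t.table_name = 'subscription'
GROUP BY t.table_name;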

Tuesday, July 12, 2016

How to load data from HDFS into Vertica

Through Copy Command

Data file (hdfsdata.txt), delimited by a space:

1 amazon
2 google
3 microsoft

 

drop table if exists test.hdfsdata;

create table  test.hdfsdata(
col1 int, 
col2 varchar(100)
);

hadoop fs -put hdfsdata.txt  /user/tickmoh/test/.


vsql> COPY test.hdfsdata SOURCE Hdfs(url='localhost:50070/webhdfs/v1/user/tickmoh/test/hdfsdata.txt', username='tickmoh')  DELIMITER  AS ' ' NULL AS 'null';
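
To verify the load, a quick row check:

SELECT col1, col2 FROM test.hdfsdata ORDER BY col1;

This should return the three rows from hdfsdata.txt (amazon, google, microsoft).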

 

Through External table

 
drop table if exists test.hdfsdata_ext;
create external table  test.hdfsdata_ext(
col1 int,
col2 varchar(100)
)
AS COPY SOURCE Hdfs(url='localhost:50070/webhdfs/v1/user/tickmoh/test/hdfsdata.txt', username='tickmoh')  DELIMITER  AS ' ' NULL AS 'null';

 

Replace localhost with your Hadoop NameNode host; the WebHDFS port (50070 above) may also differ in your cluster.

Thursday, June 30, 2016

Important queries


vsql -U dbadmin -f test200agg_8_6.sql &

SELECT p.PROJECTION_NAME, p.NODE_id,p.IS_UP_TO_DATE, p.HAS_STATISTICS, p.is_super_projection,
ps.USED_BYTES, ps.wos_row_count , ps.ros_row_count ,
ps.wos_used_bytes , ps.ros_used_bytes, ps.ROS_COUNT
FROM tables t
JOIN projections p ON t.table_id = p.anchor_table_id
JOIN projection_storage ps on p.projection_name = ps.projection_name
WHERE t.table_name ilike 'subscription';

drop table if exists test.tt1;
create external table  test.tt1(
col1 int
)
AS COPY SOURCE Hdfs(url='hdfs://sjvqnhdpl010.mcafee.int:50070/user/smohapa1/testfile.txt', username = 'smohapa1',password='smohapa1');


COPY testTable SOURCE Hdfs(url='http://sjvqnhdpl010.mcafee.int:8888/user/smohapa1/testfile.txt',   username='dbadmin');





DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1(code,amount,product_id,sales_date)
as
(
select code,amount,product_id,sales_date
from sales
where sales_date>now()-30
) UNSEGMENTED ALL NODES ;
select refresh('sales');
 [Error Code: 5600, SQL State: 42601]  [Vertica][VJDBC](5600) ERROR: Invalid predicate in projection-select. Only PK=FK equijoins are allowed

DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1(code,amount,product_id,sales_date)
as
(
select sales_date, count(code) ,sum(amount) 
from sales
group by sales_date
) UNSEGMENTED ALL NODES ;

 ERROR: SEGMENTED BY / UNSEGMENTED is not allowed in aggregate projection. The aggregate projection is automatically segmented on group by columns
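
A version without the segmentation clause should go through (a sketch; per the error above, Vertica segments the aggregate projection on the GROUP BY columns automatically):

DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1 (sales_date, sales_count, total_amount)
as
(
select sales_date, count(code), sum(amount)
from sales
group by sales_date
);
select refresh('sales');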

Exporting the Sales table script
dbadmin=> select export_tables('/home/dbadmin/scr/sql_tables_empty.sql','sales');

Exporting the scripts for all tables (an empty scope exports every object you have access to)
dbadmin=> select export_tables('/home/dbadmin/scr/sql_tables_empty.sql','');



Check the exported table script in the destination file.



SELECT query_profiles.transaction_id, 
          query_profiles.statement_id, 
          query_profiles.query,
          resource_acquisitions.open_file_handle_count,
          resource_acquisitions.thread_count,
          resource_acquisitions.memory_inuse_kb,
          resource_acquisitions.duration_ms
   FROM query_profiles, resource_acquisitions 
   WHERE (
         resource_acquisitions.transaction_id = query_profiles.transaction_id AND
         resource_acquisitions.statement_id = query_profiles.statement_id
         ) 
         AND 
         (
         query_profiles.transaction_id =  49539595901078494 AND
         query_profiles.statement_id = 1
         );
SELECT transaction_id, statement_id, io_type, projection_name FROM projection_usage ORDER BY query_start_timestamp DESC;






Auto commit flag is on or off
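
A sketch of checking and toggling the autocommit flag from vsql (this assumes vsql's AUTOCOMMIT session variable; \set with no arguments lists the current variables):

dbadmin=> \set                 -- lists session variables, including AUTOCOMMIT
dbadmin=> \set AUTOCOMMIT on   -- commit every statement automatically
dbadmin=> \set AUTOCOMMIT off  -- back to explicit COMMIT / ROLLBACK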






// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Counts words in new text files created in the given directory
 * Usage: HdfsWordCount <directory>
 *   <directory> is the directory that Spark Streaming will use to find and read new text files.
 *
 * To run this on your local machine on directory `localdir`, run this example
 *    $ bin/run-example \
 *       org.apache.spark.examples.streaming.HdfsWordCount localdir
 *
 * Then create a text file in `localdir` and the words in the file will get counted.
 */
object HdfsWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: HdfsWordCount <directory>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    // Create the context
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create the FileInputDStream on the directory and use the
    // stream to count words in new files created
    val lines = ssc.textFileStream(args(0))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println


Export table script

To extract a table script in Vertica, use the following:
EXPORT_TABLES ( [ 'destination' ] , [ 'scope' ] )

Examples:

 
dbadmin=> select EXPORT_TABLES ( '' , 'public.sales' );
 
dbadmin=> select EXPORT_TABLES ( '/home/dbadmin/scripts/sales.txt' , 'public.sales' );
 
 
The default destination is '', which means standard output: the script is printed on screen.

Tuesday, June 28, 2016

Create and drop foreign key

Vertica doesn't enforce constraints by default. Here is an example that drops the foreign key and tables, creates the tables with primary keys, and then adds a foreign key relationship to the table.

Drop foreign key:


ALTER TABLE sales  DROP CONSTRAINT IF EXISTS FK_Sales_product;


DROP TABLE IF EXISTS product cascade;
Drop table if exists sales cascade;

create table product(
product_id int NOT NULL ,
description  varchar(100) ,
date_added datetime);

ALTER TABLE product ADD PRIMARY KEY (product_id) ;


Drop table if exists sales;
CREATE TABLE sales (
sales_id integer NOT NULL,
code varchar(20),
quantity smallint,
amount money,
product_id int NOT NULL,
sales_date datetime);

ALTER TABLE sales ADD PRIMARY KEY (sales_id) ;

INSERT INTO sales
select 1,'RC012',1,50,1001,now();


INSERT INTO product
select 1001,'Wagon',now();
 

Vertica 7.1 does not enforce constraints yet. Let us try to violate the primary key constraint, then detect the violation using ANALYZE_CONSTRAINTS.




Insert into sales
select 1,'RC999',1,50,1001,now();

 12:21:52  [INSERT - 1 row(s), 0.026 secs]  Command processed
... 1 statement(s) executed, 1 row(s) affected, exec/fetch time: 0.026/0.000 sec  [1 successful, 0 warnings, 0 errors]


dbadmin=> SELECT ANALYZE_CONSTRAINTS('sales');
 Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-------------+------------+--------------+-----------------+-----------------+---------------
 public      | sales      | sales_id     | C_PRIMARY       | PRIMARY         | ('1')
(1 row)


 

Creating foreign key:



dbadmin=> ALTER TABLE sales    ADD CONSTRAINT fk_sales_product FOREIGN KEY(product_id)     REFERENCES product(product_id);

WARNING 4887:  Table sales has data. Queries using this table may give wrong results if the data does not satisfy this constraint
HINT:  Use analyze_constraints() to check constraint violation on data
ALTER TABLE

 

Now let us try to violate the referential integrity constraint.



Insert into sales
select 2,'RC999',1,50,1002,now();

It doesn't give any error.


dbadmin=>  SELECT ANALYZE_CONSTRAINTS('sales');
 Schema Name | Table Name | Column Names | Constraint Name  | Constraint Type | Column Values
-------------+------------+--------------+------------------+-----------------+---------------
 public      | sales      | sales_id     | C_PRIMARY        | PRIMARY         | ('1')
 public      | sales      | product_id   | fk_sales_product | FOREIGN         | ('1002')


 

A join of the two tables constrained by the PK-FK relationship also doesn't return any error.




dbadmin=>select s.code, p.description from 
sales s join product p on s.product_id=p.product_id
 
 code  | description
-------+-------------
 RC012 | Wagon
 RC999 | Wagon
 RC999 | Wagon
 RC999 | Wagon

 

Monday, June 27, 2016

Live Aggregate Projection

Create live aggregate projections for large tables that you add data to frequently. They are a good fit when:

  • The anchor table is large.
  • The data needs to be aggregated.
  • You do not update, delete, or merge data in the anchor table.

Anchor Projection

An anchor table is a database table that is the source for data in a projection. All projections have anchor tables. The projections are the physical storage of the table data. When you drop an anchor table and specify the CASCADE keyword, HP Vertica drops the anchor projection and the anchor table.
An anchor projection is the physical storage for an anchor table that has a live aggregate projection. You must create an anchor projection before you create a live aggregate projection, including Top-K projections. The anchor projection segmentation must be a subset of the live aggregate projection.
When HP Vertica loads data into an existing anchor table, it appends the data to the anchor projection. Then it aggregates the data and adds it to the live aggregate projection. After you create a live aggregate or Top-K projection, you cannot perform any update, delete, or merge operations on the anchor table or anchor projection.
When you drop the anchor table and specify the CASCADE keyword, HP Vertica drops the anchor table, the anchor projection, and the live aggregate projection.
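
A minimal sketch of the anchor projection plus live aggregate projection pair, using the public.sales table from the other posts here (treat this as a sketch, not a drop-in script):

-- Anchor projection, segmented on the future GROUP BY column
-- (a subset of the live aggregate projection's segmentation, as required above)
CREATE PROJECTION sales_anchor (sales_id, code, quantity, amount, product_id, sales_date)
AS SELECT sales_id, code, quantity, amount, product_id, sales_date
   FROM public.sales
   SEGMENTED BY hash(sales_date) ALL NODES;

-- Live aggregate projection: Vertica keeps the aggregates up to date as data is loaded
CREATE PROJECTION sales_daily_agg (sales_date, sales_count, total_amount)
AS SELECT sales_date, COUNT(code), SUM(amount)
   FROM public.sales
   GROUP BY sales_date;

SELECT refresh('sales');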


Pre-Join Projection

Pre-join projections are query-specific projections used to store the results of a join between a single large (fact/anchor) table in the logical schema and one or more dimension tables. As with other query-specific projections, HP Vertica calculates and stores a new result set each time data is inserted or loaded into these tables. This provides a significant performance advantage over joining the tables at query time.
The result set of a pre-join projection is typically sorted for a specific query, or for commonalities in a class of queries, based on the query predicate. This provides optimal query performance.


Pre-join projections can have only inner joins between tables on their primary and foreign key columns. Outer joins are not allowed in pre-join projections.

Before creating the pre-join projection, we need the following steps:


  1. Create table with primary keys and add foreign key constraints
  2. Create anchor tables
  3. Create pre-join projection
Here is an example:


ALTER TABLE sales  DROP CONSTRAINT IF EXISTS FK_Sales_product;

DROP TABLE IF EXISTS product cascade;
Drop table if exists sales cascade;

create table product(
product_id int NOT NULL ,
description  varchar(100) ,
date_added datetime);

ALTER TABLE product ADD PRIMARY KEY (product_id) ;


Drop table if exists sales;
CREATE TABLE sales (
sales_id integer ,
code varchar(20),
quantity smallint,
amount money,
product_id int NOT NULL,
sales_date datetime);

ALTER TABLE sales ADD PRIMARY KEY (sales_id) ;


ALTER TABLE sales    ADD CONSTRAINT fk_sales_product FOREIGN KEY(product_id)     REFERENCES product(product_id);

INSERT INTO sales
select 1,'RC012',1,50,1001,now();


INSERT INTO product
select 1001,'Wagon',now();


--SELECT ANALYZE_CONSTRAINTS('sales');


CREATE PROJECTION sales_anc (
        sales_id,
        code,
        quantity,
        amount,
        product_id,
        sales_date)
   AS SELECT * FROM public.sales UNSEGMENTED ALL NODES;

 CREATE PROJECTION product_anc (
       product_id,
       description,
       date_added)
   AS SELECT * FROM public.product UNSEGMENTED ALL NODES;
 
   select refresh('sales');
   select refresh('product');
 
   select * from projections where anchor_table_name='sales';
 
 
 
   CREATE PROJECTION Sales_product (
       sales_id,
       code,
       amount,
       description,
       sales_date
       )
   AS SELECT sales_id,code,amount,description,sales_date
   FROM public.sales s join product p ON s.product_id=p.product_id
   ORDER BY s.sales_date;
 
 
   explain SELECT count(*)
   FROM public.sales s join product p ON s.product_id=p.product_id
   ORDER BY s.sales_date;
 
 

Types of Vertica Projections

Types of projections:
  • Super Projection
  • Pre-Join Projection
  • Anchor Projection
  • Live Aggregate Projection
  • Top-K Projection

Super projection


When you create a table, by default Vertica creates a super projection consisting of all columns.

CREATE TABLE public.sales(
sales_id int NOT NULL,
code varchar(20) NOT NULL,
quantity int,
amount numeric(18,4),
product_id int,
sales_date timestamp);

 ALTER TABLE public.sales ADD CONSTRAINT pk_sales PRIMARY KEY (sales_id, code);


SELECT ps.node_name, p.projection_name, ps.row_count, p.is_super_projection
FROM projection_storage ps JOIN projections p ON
p.projection_name = ps.projection_name
WHERE ps.anchor_table_name = 'sales' AND p.projection_basename = 'sales'
ORDER BY p.projection_name, ps.node_name;






As K-safety = 1, Vertica creates 2 super projections (suffixed _b0 and _b1) and distributes them evenly across the available nodes.

Check the projection_columns system table for the columns inside a projection.
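
For example, a sketch of listing the columns of the sales projections (assuming the usual projection_columns columns):

SELECT projection_name, projection_column_name, column_position
FROM projection_columns
WHERE table_name = 'sales'
ORDER BY projection_name, column_position;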



Create Vertica Projection ( Segmented and Unsegmented)


For the example below:
The cluster has 3 nodes.
The sales table has 101 rows.
System K-safety is 1.

Creating a projection (segmented):

DROP PROJECTION IF EXISTS sales_p2;
CREATE PROJECTION sales_p2(code,amount,sales_date)
AS
SELECT code,amount,sales_date
FROM sales
ORDER BY sales_date
SEGMENTED BY hash(code)  
All Nodes offset 0 KSAFE ;

SEGMENTED BY hash(code): choose a segmentation column with a large number of unique values, such as the primary key.
ALL NODES distributes the data evenly across all nodes.
OFFSET 0 means segmentation starts from node 0.
KSAFE is required when the projection is segmented. KSAFE without a k-num defaults to the system K-safety number.


Refresh Projection:

SELECT refresh('sales');


Query to analyze the data distribution:

SELECT ps.node_name, p.projection_name, ps.row_count
FROM projection_storage ps JOIN projections p
ON p.projection_name = ps.projection_name
WHERE ps.anchor_table_name = 'sales' AND p.projection_basename = 'sales_p2'
ORDER BY p.projection_name, ps.node_name;

As K-safety = 1, the system creates 2 buddy projections (suffixed with _b0 and _b1).

Below image demonstrates the concept of buddy projection.

Creating a projection (unsegmented):


DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1 (code, amount, product_id, sales_date)
AS (
SELECT code, amount,product_id,sales_date
FROM sales
ORDER BY sales_date
)
UNSEGMENTED ALL NODES;



Select refresh ('sales');

An unsegmented projection creates a replica of the projection data on each of the specified nodes.
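
To confirm the replication, a sketch that reuses the projection_storage query from above to show per-node row counts:

SELECT ps.node_name, ps.projection_name, ps.row_count
FROM projection_storage ps
WHERE ps.anchor_table_name = 'sales'
  AND ps.projection_name ILIKE 'sales_p1%'
ORDER BY ps.node_name;

With 101 rows in sales, every node should report 101 rows for the unsegmented projection.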


Saturday, June 25, 2016

Create Primary key in vertica table

Creating a table with primary key (System assigns name C_PRIMARY):

DROP TABLE IF EXISTS sales;

CREATE TABLE sales (
sales_id integer primary key ,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime);


INSERT INTO sales
SELECT 1,'RT0132',1,100,10023,now()


Creating a primary on an existing table:

DROP TABLE IF EXISTS product;
create table product(
product_id int ,
description  varchar(100) ,
date_added datetime);

ALTER TABLE product ADD PRIMARY KEY (product_id) ;

Informational Message =>
Code: 2623 SQL State: 23502 --- Column "product_id" definition changed to NOT NULL


Check the constraint information from System table:

Select * from table_constraints where table_name = 'sales';







Dropping Primary Key Constraint:


ALTER TABLE sales DROP CONSTRAINT C_PRIMARY;



Creating a named Primary key:

DROP TABLE IF EXISTS sales ;
CREATE TABLE sales (
sales_id int CONSTRAINT pk_sales PRIMARY KEY ,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime);


Creating a named primary key inline during table creation:

DROP TABLE IF EXISTS sales ;
Create table sales (
sales_id int,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime,
CONSTRAINT pk_sales PRIMARY KEY(sales_id)
);


Creating composite primary key with multiple columns:

DROP TABLE IF EXISTS sales ;

Create table sales (
sales_id int,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime,
CONSTRAINT pk_sales PRIMARY KEY(sales_id,code)
);





Checking column level constraints:

SELECT tc.constraint_name, tc.constraint_type, cc.column_name
FROM table_constraints tc
JOIN constraint_columns cc ON cc.constraint_id = tc.constraint_id
WHERE tc.table_name = 'sales';



Friday, June 24, 2016

KSafety in Vertica



K-safety is the maximum number of nodes the cluster can afford to lose.

So K-safety = 1 means the cluster will continue to work if 1 node fails.
In other words, the cluster will shut down if 2 nodes fail.

To check system k-safety:

  
vsql=> select DESIGNED_FAULT_TOLERANCE, CURRENT_FAULT_TOLERANCE from system;

Where:

 DESIGNED_FAULT_TOLERANCE = The designed or intended K-Safety level.
 CURRENT_FAULT_TOLERANCE = The number of node failures the cluster can tolerate before it shuts down automatically.


To change k-safety value:

vsql=> SELECT MARK_DESIGN_KSAFE ( k );

Where k is:

  • 2 enables high availability if the schema design meets requirements for K-Safety=2
  • 1 enables high availability if the schema design meets requirements for K-Safety=1
  • 0 disables high availability
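
For example, to mark the design K-safe = 1 and then re-check the system view from above:

SELECT MARK_DESIGN_KSAFE(1);

SELECT designed_fault_tolerance, current_fault_tolerance FROM system;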


Thursday, June 23, 2016

Find the default Isolation level in Vertica


By default, Vertica uses the READ COMMITTED isolation level.


=>SHOW transaction_isolation;


=> ALTER DATABASE mydb SET TransactionIsolationLevel = 'SERIALIZABLE';
=> ALTER DATABASE mydb SET TransactionIsolationLevel = 'READ COMMITTED';


Limiting the number of rows in table

How to limit the number of records from a Vertica Query?

Select * from public.subscription limit 100;
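
Note that without an ORDER BY, the 100 rows returned are arbitrary. A sketch of a deterministic top-N query (subscription_date is a hypothetical column):

SELECT *
FROM public.subscription
ORDER BY subscription_date DESC
LIMIT 100;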



List all databases and Show Database

List all databases


dbadmin=> \l
 List of databases
 name | user_name
-------+-----------
 Otter | dbadmin

(1 row)

Other meta-commands (\d? [Pattern]) are as follows (case insensitive):

Command   List All                     Example
\l        Databases
\df       Functions                    \df *date*
\dn       Schemas
\dt       Tables                       \dt *     \dt public.*
\dtv      Tables and views             \dtv Sub*
\dv       Views
\dp       Projections and privileges
\du       Users
\ds       Sequences
\dS       System tables
\z        Table access privileges


List database configuration parameters


  • You must have superuser privileges to view some values. If you are not the superuser and you use the keyword ALL, parameter values are masked with ****. If the parameter is specifically listed, an error message appears.
  • Non-superusers cannot view database or node-level security parameters.


SHOW DATABASE VMart ALL;
         Name            |        setting
-------------------------+------------------------
 AnalyzeRowCountInterval | 3600
 DefaultSessionLocale    | en_US@collation=binary

 JavaBinaryForUDx        | /usr/bin/java