Monday, August 29, 2016
Comparison of Space used between SQL Server and Vertica
Table Size in SQL Server
SELECT
t.NAME AS TableName,
s.Name AS SchemaName,
p.rows AS RowCounts,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME ='RETENTION_METRICS_Vertica'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY
t.Name
Table Size in Vertica
SELECT t.table_name AS table_name,
SUM(ps.wos_row_count + ps.ros_row_count) AS row_count,
SUM(ps.wos_used_bytes + ps.ros_used_bytes) AS byte_count, SUM(ps.used_bytes) AS sum_used_bytes
FROM tables t
JOIN projections p ON t.table_id = p.anchor_table_id
JOIN projection_storage ps on p.projection_name = ps.projection_name
WHERE t.table_name ='subscription'
GROUP BY t.table_name
ORDER BY byte_count DESC;
Tuesday, July 12, 2016
How to load data from HDFS into Vertica
Through Copy Command
Data: hdfsdata.txt (delimited by space)
1 amazon
2 google
3 microsoft
drop table if exists test.hdfsdata;
create table test.hdfsdata(
col1 int,
col2 varchar(100)
);
hadoop fs -put hdfsdata.txt /user/tickmoh/test/.
vsql> COPY test.hdfsdata SOURCE Hdfs(url='localhost:50070/webhdfs/v1/user/tickmoh/test/hdfsdata.txt', username='tickmoh') DELIMITER AS ' ' NULL AS 'null';
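A quick sanity check after the load (assuming no rows were rejected):
vsql> select * from test.hdfsdata order by col1;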
Through External table
drop table if exists test.hdfsdata_ext;
create external table test.hdfsdata_ext(
col1 int,
col2 varchar(100)
)
AS COPY SOURCE Hdfs(url='localhost:50070/webhdfs/v1/user/tickmoh/test/hdfsdata.txt', username='tickmoh') DELIMITER AS ' ' NULL AS 'null';
Note: replace localhost with your Hadoop host.
Thursday, June 30, 2016
Important queries
vsql -U dbadmin -f test200agg_8_6.sql &
SELECT p.PROJECTION_NAME, p.NODE_id,p.IS_UP_TO_DATE, p.HAS_STATISTICS, p.is_super_projection,
ps.USED_BYTES, ps.wos_row_count , ps.ros_row_count ,
ps.wos_used_bytes , ps.ros_used_bytes, ps.ROS_COUNT
FROM tables t
JOIN projections p ON t.table_id = p.anchor_table_id
JOIN projection_storage ps on p.projection_name = ps.projection_name
WHERE t.table_name ilike 'subscription';
drop table if exists test.tt1;
create external table test.tt1(
col1 int
)
AS COPY SOURCE Hdfs(url='hdfs://sjvqnhdpl010.mcafee.int:50070/user/smohapa1/testfile.txt', username = 'smohapa1',password='smohapa1');
COPY testTable SOURCE Hdfs(url='http://sjvqnhdpl010.mcafee.int:8888/user/smohapa1/testfile.txt', username='dbadmin');
DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1(code,amount,product_id,sales_date)
as
(
select code,amount,product_id,sales_date
from sales
where sales_date>now()-30
) UNSEGMENTED ALL NODES ;
select refresh('sales');
[Error Code: 5600, SQL State: 42601] [Vertica][VJDBC](5600) ERROR: Invalid predicate in projection-select. Only PK=FK equijoins are allowed
DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1(code,amount,product_id,sales_date)
as
(
select sales_date, count(code) ,sum(amount)
from sales
group by sales_date
) UNSEGMENTED ALL NODES ;
ERROR: SEGMENTED BY / UNSEGMENTED is not allowed in aggregate projection. The aggregate projection is automatically segmented on group by columns
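As the error suggests, dropping the UNSEGMENTED clause (segmentation is derived automatically from the GROUP BY columns) lets the aggregate projection compile. A minimal sketch, with illustrative column aliases:
DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1 (sales_date, sales_count, total_amount)
AS
SELECT sales_date, count(code), sum(amount)
FROM sales
GROUP BY sales_date;
select refresh('sales');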
Exporting the Sales table script
dbadmin=>
select export_tables('/home/dbadmin/scr/sql_tables_empty.sql','sales');
Exporting the script for all tables
dbadmin=>
select export_tables('/home/dbadmin/scr/sql_tables_empty.sql','');
Check the table script in the generated file:
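One way to inspect the generated script without leaving vsql is the shell escape (a sketch; \! runs a host command):
dbadmin=> \! cat /home/dbadmin/scr/sql_tables_empty.sql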
SELECT query_profiles.transaction_id, query_profiles.statement_id, query_profiles.query,
       resource_acquisitions.open_file_handle_count, resource_acquisitions.thread_count,
       resource_acquisitions.memory_inuse_kb, resource_acquisitions.duration_ms
FROM query_profiles, resource_acquisitions
WHERE (resource_acquisitions.transaction_id = query_profiles.transaction_id
       AND resource_acquisitions.statement_id = query_profiles.statement_id)
  AND (query_profiles.transaction_id = 49539595901078494 AND query_profiles.statement_id = 1);
SELECT transaction_id, statement_id, io_type, projection_name FROM projection_usage ORDER BY query_start_timestamp DESC;
Checking whether the auto commit flag is on or off
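In vsql the flag is a session variable: \set with no arguments lists the variables (including AUTOCOMMIT), and \set AUTOCOMMIT on enables it. A minimal sketch, assuming vsql's psql-style variable handling:
dbadmin=> \set
dbadmin=> \set AUTOCOMMIT on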
// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Counts words in new text files created in the given directory
 * Usage: HdfsWordCount <directory>
 *   <directory> is the directory that Spark Streaming will use to find and read new text files.
 *
 * To run this on your local machine on directory `localdir`, run this example
 *    $ bin/run-example \
 *       org.apache.spark.examples.streaming.HdfsWordCount localdir
 *
 * Then create a text file in `localdir` and the words in the file will get counted.
 */
object HdfsWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: HdfsWordCount <directory>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    // Create the context
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create the FileInputDStream on the directory and use the
    // stream to count words in new files created
    val lines = ssc.textFileStream(args(0))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
Export table script
To extract a table script in Vertica, use the following:
EXPORT_TABLES ( [ 'destination' ] , [ 'scope' ] )
Example:
dbadmin=> select EXPORT_TABLES ( '' , 'public.sales' );
dbadmin=> select EXPORT_TABLES ( '/home/dbadmin/scripts/sales.txt' , 'public.sales' );
The default destination is '' (empty string), which means standard output: the script is printed on the screen.
Tuesday, June 28, 2016
Create and drop foreign key
Vertica doesn't enforce constraints by default. Here is an example that shows dropping foreign keys and tables, creating tables and a primary key, and then adding a foreign key relationship to the table.
Drop foreign key:
ALTER TABLE sales DROP CONSTRAINT IF EXISTS FK_Sales_product;
DROP TABLE IF EXISTS product cascade;
Drop table if exists sales cascade;
create table product(
product_id int NOT NULL ,
description varchar(100) ,
date_added datetime);
ALTER TABLE product ADD PRIMARY KEY (product_id) ;
Drop table if exists sales;
CREATE TABLE sales (
sales_id integer NOT NULL,
code varchar(20),
quantity smallint,
amount money,
product_id int NOT NULL,
sales_date datetime);
ALTER TABLE sales ADD PRIMARY KEY (sales_id) ;
INSERT INTO sales
select 1,'RC012',1,50,1001,now();
INSERT INTO product
select 1001,'Wagon',now();
Vertica 7.1 does not enforce constraints yet. Let us try to violate the primary key constraint, then detect the violation using ANALYZE_CONSTRAINTS.
Insert into sales
select 1,'RC999',1,50,1001,now();
12:21:52 [INSERT - 1 row(s), 0.026 secs] Command processed
... 1 statement(s) executed, 1 row(s) affected, exec/fetch time: 0.026/0.000 sec [1 successful, 0 warnings, 0 errors]
dbadmin=> SELECT ANALYZE_CONSTRAINTS('sales');
Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-------------+------------+--------------+-----------------+-----------------+---------------
public | sales | sales_id | C_PRIMARY | PRIMARY | ('1')
(1 row)
Creating foreign key:
dbadmin=> ALTER TABLE sales ADD CONSTRAINT fk_sales_product FOREIGN KEY(product_id) REFERENCES product(product_id);
WARNING 4887: Table sales has data. Queries using this table may give wrong results if the data does not satisfy this constraint
HINT: Use analyze_constraints() to check constraint violation on data
ALTER TABLE
Now let us try to violate the integrity constraint.
Insert into sales
select 2,'RC999',1,50,1002,now();
It doesn't give any error.
dbadmin=> SELECT ANALYZE_CONSTRAINTS('sales');
Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
-------------+------------+--------------+------------------+-----------------+---------------
public | sales | sales_id | C_PRIMARY | PRIMARY | ('1')
public | sales | product_id | fk_sales_product | FOREIGN | ('1002')
A join of the two tables constrained by the PK-FK relationship also doesn't return any error.
dbadmin=>select s.code, p.description from
sales s join product p on s.product_id=p.product_id
code | description
-------+-------------
RC012 | Wagon
RC999 | Wagon
RC999 | Wagon
RC999 | Wagon
Monday, June 27, 2016
Live Aggregate Projection
Create live aggregate projections for large tables that you add data to frequently. They are a good fit when:
- The anchor table is large.
- The data needs to be aggregated.
- You do not update, delete, or merge data in the anchor table.
A sketch follows.
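The sketch below uses the sales table from the other posts on this blog; projection and column names are illustrative:
CREATE PROJECTION sales_by_product (product_id, sales_count, total_amount)
AS SELECT product_id, count(sales_id), sum(amount)
FROM sales
GROUP BY product_id;
SELECT refresh('sales');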
Anchor Projection
An anchor table is a database table that is the source for data in a projection. All projections have anchor tables. The projections are the physical storage of the table data. When you drop an anchor table and specify the CASCADE keyword, HP Vertica drops the anchor projection and the anchor table.
An anchor projection is the physical storage for an anchor table that has a live aggregate projection. You must create an anchor projection before you create a live aggregate projection, including Top-K projections. The anchor projection segmentation must be a subset of the live aggregate projection.
When HP Vertica loads data into an existing anchor table, it appends the data to the anchor projection. Then it aggregates the data and adds it to the live aggregate projection. After you create a live aggregate or Top-K projection, you cannot perform any update, delete, or merge operations on the anchor table or anchor projection.
When you drop the anchor table and specify the CASCADE keyword, HP Vertica drops the anchor table, the anchor projection, and the live aggregate projection.
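Top-K projections follow the same pattern; here is a minimal sketch against the sales table (the LIMIT ... OVER clause assumes a Vertica version that supports Top-K projections; names are illustrative):
CREATE PROJECTION sales_latest_per_product (product_id, sales_date, amount)
AS SELECT product_id, sales_date, amount
FROM sales
LIMIT 1 OVER (PARTITION BY product_id ORDER BY sales_date DESC);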
Pre-Join Projection
A subset of query-specific projections that are used to store the results of a join between a single large (fact/anchor) table in the logical schema and one or more dimension tables. As with other query-specific projections, HP Vertica calculates and stores a new result set each time data is inserted or loaded into these tables. This provides a significant performance advantage over joining tables when queries are run.
The result set of a pre-join projection is typically sorted for a specific query or commonalities in a class of queries based on the query predicate. This provides optimal query performance.
Pre-join projections can have only inner joins between tables on their primary and foreign key columns. Outer joins are not allowed in pre-join projections.
Before creating the pre-join projection, we need the following steps:
- Create the tables with primary keys and add foreign key constraints
- Create the anchor projections
- Create the pre-join projection
Here is the example.
ALTER TABLE sales DROP CONSTRAINT IF EXISTS FK_Sales_product;
DROP TABLE IF EXISTS product cascade;
Drop table if exists sales cascade;
create table product(
product_id int NOT NULL ,
description varchar(100) ,
date_added datetime);
ALTER TABLE product ADD PRIMARY KEY (product_id) ;
Drop table if exists sales;
CREATE TABLE sales (
sales_id integer ,
code varchar(20),
quantity smallint,
amount money,
product_id int NOT NULL,
sales_date datetime);
ALTER TABLE sales ADD PRIMARY KEY (sales_id) ;
ALTER TABLE sales ADD CONSTRAINT fk_sales_product FOREIGN KEY(product_id) REFERENCES product(product_id);
INSERT INTO sales
select 1,'RC012',1,50,1001,now();
INSERT INTO product
select 1001,'Wagon',now();
--SELECT ANALYZE_CONSTRAINTS('sales');
CREATE PROJECTION sales_anc (
sales_id,
code,
quantity,
amount,
product_id,
sales_date)
AS SELECT * FROM public.sales UNSEGMENTED ALL NODES;
CREATE PROJECTION product_anc (
product_id,
description,
date_added)
AS SELECT * FROM public.product UNSEGMENTED ALL NODES;
select refresh('sales');
select refresh('product');
select * from projections where anchor_table_name='sales';
CREATE PROJECTION Sales_product (
sales_id,
code,
amount,
description,
sales_date
)
AS SELECT sales_id,code,amount,description,sales_date
FROM public.sales s join product p ON s.product_id=p.product_id
ORDER BY s.sales_date;
explain SELECT count(*)
FROM public.sales s join product p ON s.product_id=p.product_id
ORDER BY s.sales_date;
Super projection
When you create a table, by default Vertica creates a super projection consisting of all columns.
CREATE TABLE public.sales(
sales_id int NOT NULL,
code varchar(20) NOT NULL,
quantity int,
amount numeric(18,4),
product_id int,
sales_date timestamp);
ALTER TABLE public.sales ADD CONSTRAINT pk_sales PRIMARY KEY (sales_id, code);
SELECT ps.node_name,p.projection_name,
row_count, p.is_super_projection
FROM projection_storage PS
join projections p ON
p.projection_name=ps.projection_name
WHERE
ps.anchor_table_name='sales' and p.projection_basename='sales'
ORDER BY p.projection_name
,ps.node_name;
As K-safety = 1, Vertica creates 2 super projections (_b0 and _b1) and distributes them evenly across the available nodes.
Check the projection_columns system table for the columns inside a projection.
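A quick sketch of such a check (column names as documented for the projection_columns system table):
SELECT projection_name, table_column_name
FROM projection_columns
WHERE table_name = 'sales'
ORDER BY projection_name;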
Create Vertica Projection (Segmented and Unsegmented)
For the below example:
- The cluster has 3 nodes.
- The sales table has 101 rows.
- System K-safety is 1.
Creating a projection (Segmented):
DROP PROJECTION IF EXISTS sales_p2;
CREATE PROJECTION sales_p2(code,amount,sales_date)
AS
SELECT code,amount,sales_date
FROM sales
ORDER BY sales_date
SEGMENTED BY hash(code)
All Nodes offset 0 KSAFE ;
SEGMENTED BY hash(code) - choose a segmentation column with a large number of unique values, such as the primary key.
ALL NODES distributes the data evenly across all nodes.
OFFSET 0 means segmentation will start from node 0.
KSAFE is required when the projection is segmented. KSAFE without a k-num defaults to the system K-safety value.
Refresh Projection:
SELECT refresh('sales');
Query to analyse the data distribution:
SELECT ps.node_name,p.projection_name,
row_count
FROM projection_storage ps JOIN projections p
ON p.projection_name=ps.projection_name
WHERE ps.anchor_table_name='sales' AND p.projection_basename='sales_p2'
ORDER BY p.projection_name ,ps.node_name;
As K-safety = 1, the system creates 2 projections (suffixed with _b0 and _b1).
The image below demonstrates the concept of buddy projections.
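The buddy pair can also be listed from the system tables; a minimal check against the projections table, assuming the sales_p2 projection created above:
SELECT projection_name, is_up_to_date
FROM projections
WHERE projection_basename = 'sales_p2';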
Creating a projection (Unsegmented):
DROP PROJECTION IF EXISTS sales_p1;
CREATE PROJECTION sales_p1 (code, amount, product_id, sales_date)
AS (
SELECT code, amount, product_id, sales_date
FROM sales
ORDER BY sales_date
)
UNSEGMENTED ALL NODES;
SELECT refresh('sales');
An unsegmented projection creates a full replica of the projection on each of the specified nodes.
Saturday, June 25, 2016
Create Primary key in vertica table
Creating a table with a primary key (the system assigns the name C_PRIMARY):
DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
sales_id integer primary key ,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime);
INSERT INTO sales
SELECT 1,'RT0132',1,100,10023,now();
Creating a primary key on an existing table:
DROP TABLE IF EXISTS product;
create table product(
product_id int ,
description varchar(100) ,
date_added datetime);
ALTER TABLE product ADD PRIMARY KEY (product_id);
Informational Message =>
Code: 2623 SQL State: 23502 --- Column "product_id" definition changed to NOT NULL
Check the constraint information from the system table:
Select * from table_constraints where table_name = 'sales';
Dropping Primary Key Constraint:
ALTER TABLE sales DROP CONSTRAINT C_PRIMARY;
Creating a named Primary key:
DROP TABLE IF EXISTS sales ;
CREATE TABLE sales (
sales_id int CONSTRAINT pk_sales PRIMARY KEY,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime);
Creating a named primary key inline with table creation:
DROP TABLE IF EXISTS sales ;
Create table sales (
sales_id int,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime,
CONSTRAINT pk_sales PRIMARY KEY(sales_id)
);
Creating a composite primary key with multiple columns:
DROP TABLE IF EXISTS sales ;
Create table sales (
sales_id int,
code varchar(20),
quantity smallint,
amount money,
product_id int,
sales_date datetime,
CONSTRAINT pk_sales PRIMARY KEY(sales_id,code)
);
Checking column-level constraints:
select tc.constraint_name, tc.constraint_type, cc.column_name
from table_constraints tc
join constraint_columns cc ON cc.constraint_id = tc.constraint_id
WHERE tc.table_name = 'sales';
Friday, June 24, 2016
KSafety in Vertica
K-safety is the maximum number of nodes the cluster can afford to lose.
So K-safety = 1 => the cluster will continue to work if 1 node fails.
In other words, the cluster will shut down if 2 nodes fail.
To check system K-safety:
vsql=> select DESIGNED_FAULT_TOLERANCE, CURRENT_FAULT_TOLERANCE from system;
Where:
DESIGNED_FAULT_TOLERANCE = The designed or intended K-Safety level.
CURRENT_FAULT_TOLERANCE = The number of node failures the cluster can tolerate before it shuts down automatically.
To change the K-safety value:
vsql=> SELECT MARK_DESIGN_KSAFE ( k );
Where k is one of:
- 2 - enables high availability if the schema design meets requirements for K-Safety=2
- 1 - enables high availability if the schema design meets requirements for K-Safety=1
- 0 - disables high availability
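For example, marking the design 1-safe (a sketch; it succeeds only if the physical schema meets the K-Safety=1 requirements):
vsql=> SELECT MARK_DESIGN_KSAFE(1);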
Thursday, June 23, 2016
Limiting the number of rows in table
How to limit the number of records from a Vertica Query?
Select * from public.subscription limit 100;
List all databases and Show Database
List all databases
dbadmin=> \l
   List of databases
 name  | user_name
-------+-----------
 Otter | dbadmin
(1 row)
Other commands (\d? [Pattern]) are as follows (case insensitive):
 Command | List All                   | Example
---------+----------------------------+------------------------
 \l      | Databases                  |
 \df     | Functions                  | \df *date*
 \dn     | Schemas                    |
 \dt     | Tables                     | \dt *  or  \dt public.*
 \dtv    | Tables and views           | \dtv Sub*
 \dv     | Views                      |
 \dp     | Projections and privileges |
 \du     | Users                      |
 \ds     | Sequences                  |
 \dS     | System tables              |
 \z      | Table access privileges    |
List database configuration parameters
- You must have super user privileges to view some values. If you are not the super user and you use the keyword 'ALL', parameter values are masked with ****. If the parameter is specifically listed, an error message appears.
- Non-superusers cannot view database or node-level security parameters.
SHOW DATABASE VMart ALL;
          name           |        setting
-------------------------+------------------------
 AnalyzeRowCountInterval | 3600
 DefaultSessionLocale    | en_US@collation=binary
 JavaBinaryForUDx        | /usr/bin/java