Thursday, July 31, 2008
Is a Temporary Table Really Necessary? - PART III:
Below is a short script that demonstrates this logic by obtaining all the user table names and executing sp_spaceused on each table.
SET NOCOUNT ON
DECLARE @lngTabCount INTEGER
DECLARE @lngLoopCount INTEGER
DECLARE @strTabName SYSNAME
CREATE TABLE #tTables
(numID INTEGER IDENTITY(1,1),strTableName SYSNAME)
INSERT INTO #tTables (strTableName)
SELECT name
FROM dbo.sysobjects
WHERE xtype = 'u'
SET @lngTabCount = @@ROWCOUNT
SET @lngLoopCount = @lngTabCount
WHILE @lngLoopCount <> 0
BEGIN
SET @strTabName = (SELECT strTableName FROM #tTables WHERE numID = @lngLoopCount)
EXEC sp_spaceused @strTabName
SET @lngLoopCount = @lngLoopCount - 1
END
DROP TABLE #tTables
GO
This gives you cursor-like actions without the cursor's overhead and its performance-related problems.
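The loop above only displays each sp_spaceused result set. If you want to sort or aggregate the sizes at the end, you can capture the procedure's output with INSERT ... EXEC into a second temporary table. This is a sketch, under the assumption that #tSpace's column list lines up with sp_spaceused's single-table output (name, rows, reserved, data, index_size, unused, all returned as character data):

```sql
-- Work table matching sp_spaceused's output for a single table.
CREATE TABLE #tSpace
(strName SYSNAME,
 strRows VARCHAR(20),
 strReserved VARCHAR(20),
 strData VARCHAR(20),
 strIndexSize VARCHAR(20),
 strUnused VARCHAR(20))

-- Inside the WHILE loop, replace the bare EXEC with:
INSERT INTO #tSpace
EXEC sp_spaceused @strTabName

-- After the loop, report the largest tables first:
SELECT * FROM #tSpace ORDER BY CONVERT(INT, strRows) DESC

DROP TABLE #tSpace
```

The same INSERT ... EXEC technique is the basis of the "hold the results of a called stored procedure" use case discussed in Part I.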
Wednesday, July 30, 2008
Is a Temporary Table Really Necessary? - PART II:
CREATE TABLE #tquery2a (multiplecolumns DATATYPES)
CREATE TABLE #tquery2b (multiplecolumns DATATYPES)
INSERT INTO #tquery2a
SELECT columns
FROM dbo.table2
WHERE table2.strPhoneNumber = '5555555555'
INSERT INTO #tquery2b
SELECT columns
FROM dbo.table3
WHERE table3.strPhoneNumberCalled = '1234561234'
SELECT table1.numCustID, #tquery2a.strPhoneNumber, #tquery2b.strPhoneNumberCalled
FROM dbo.table1 table1
INNER JOIN #tquery2a #tquery2a ON table1.numBillID = #tquery2a.numBillID
INNER JOIN #tquery2b #tquery2b ON #tquery2a.numBillDtlID = #tquery2b.numBillDtlID
WHERE table1.numCustID = '5555'
ORDER BY #tquery2b.dtmCalled DESC
Believe it or not, this method works, especially with the ORDER BY statement, and its performance is vastly better than the original query's. Reporting off an OLTP-designed database is not always the easiest thing to do.
The database simply isn't built for the reports that executives want. Using temporary tables to stage the results from numerous SELECT statements and aggregate those results before displaying them is sometimes the only way you can get reports out of an OLTP database.
Working on a call center application, you are usually asked to produce reports that summarize what the call center reps are doing on a time-filtered basis. Working your way through all the tables to gather the data and then summarizing it in multiple ways can often only be accomplished with the use of temporary tables.
Before anyone comes up with this argument: I know I work for a multi-billion-dollar company, but that doesn't mean executives are willing to listen to our arguments that we need a data warehouse, or even a simple reporting database, if it means spending money to get one. They can just as easily piggy-back off the OLTP database and blame me when the queries are too slow and cause a performance headache for the servers. Sorry, that was for the theoretical guys out there who have magically gotten everything they wanted no matter the cost or the size of the companies they worked for.
The last argument for the use of a temporary table is to replace a cursor. I am not fond of cursors and advocate doing anything possible to replace them (though the performance of your solution needs to be tested against the performance of the cursor).
Tuesday, July 29, 2008
Is a Temporary Table Really Necessary? - PART I:
There are several reasons that I use temporary tables in my work: to hold the results of a called stored procedure, to reduce the number of rows for joins, to aggregate data from different sources, or to replace cursors. As your queries become more complex, you will find yourself repeating blocks of code within a query or between different queries.
This reuse of code makes the case for creating a stored procedure with that code and calling the stored procedure instead. This may lead to a large number of stored procedures in your database, but it greatly reduces maintenance when the functionality needs to change: only one procedure has to change, not multiple queries that have to be researched and changed. I use this technique quite often, and it often forces me to use a temporary table to hold the results of those stored procedures, since Transact-SQL does not allow the results of a stored procedure to be used directly as a table.
This is probably the number one reason in my code for the use of temporary tables. I quite often find myself having to join a 10 million plus row table to a 100 million plus row table to a 20 million plus row table and then ordering the results so only the most recent activity displays first.
Even with proper indexes and WHERE clauses to filter and force the use of an index, the performance of the query is unacceptable (since the application I work on is used by call centers, acceptable performance for a query is measured in seconds), and the sorting often produces huge performance losses as well as huge tempdb activity. I have quite often found that using corresponding temporary tables for each of the permanent tables, to hold the data filtered by the WHERE clauses before I join and sort it, increases performance on the query to such a large degree that I can put it into production without worrying about its performance or its impact on the tempdb database.
Below is a very simple query to show how I do this.
Original Query to find details on a particular customer's phone call
SELECT table1.numCustID, table2.strPhoneNumber, table3.strPhoneNumberCalled
FROM dbo.table1 table1
INNER JOIN dbo.table2 table2 ON table1.numBillID = table2.numBillID
INNER JOIN dbo.table3 table3 ON table2.numBillDtlID = table3.numBillDtlID
WHERE table1.numCustID = '5555'
AND table2.strPhoneNumber = '5555555555'
AND table3.strPhoneNumberCalled = '1234561234'
ORDER BY table3.dtmCalled DESC
(This query does not match the schema or an existing query at Verizon. It has been created to show a particular problem with a hypothetical telecommunications database.)
Monday, July 28, 2008
Sybase Interview Question's - SET V:
- Explain Stored Procedures on performance? The query optimizer generates a query plan for a stored procedure based on the parameters passed in the first time it is executed. To make a stored procedure perform well, first check the queries used inside it. Also, because the plan is compiled once and cached, a procedure may become inefficient over time. For example, when it was first compiled the table may have been small, so a table scan was the better choice; once the table grows huge, using an index may be the better option, but the procedure will still do the table scan because it is pre-compiled. As a general rule, if the table size changes by 20%, run update statistics on the table and recompile the stored procedure. For stored procedures in which the parameters have a strong influence over the result set, create them WITH RECOMPILE so the plan is rebuilt every time they are executed.
- What is sp_dboption? It is used to enable or disable database options for your database.
- Explain Denormalization techniques? Denormalization is used for performance: a normalized database is deliberately denormalized for speed. Normalization requires every column to depend on the primary key, but sometimes a derived or redundant column is worth keeping. A redundant column may be added to a table when you always join to a second table just to fetch one column through the foreign key. Data-partitioning techniques, such as vertically or horizontally splitting tables, can also be used to denormalize a database. A vertical split is typically used to minimize the size of the primary table or to move infrequently accessed columns into a separate table. A horizontal split divides the table at the row level into two or more tables. For example, if we split a table into one table holding recent entries and another holding historical data, we can drastically reduce the size of the frequently accessed table.
- Which is better, a stored procedure or a trigger? 1. If you intend to set or modify some column values in the procedure/trigger, a trigger is the right place to do it. A BEFORE trigger can directly modify the row buffer before it is updated/inserted; a procedure would have to issue an UPDATE, which is a performance penalty. 2. The trigger has all the row data automatically available to it. For the procedure to have the same information, you either have to pass all values to it via parameters, or the procedure must re-read the data, again a performance penalty. 3. If your application issues a single update, delete, or insert that modifies multiple rows, a FOR EACH ROW trigger will be fired for each row. It can be difficult to give a stored procedure the information that lets it identify the set of rows that were modified.
- Given two tables A and B, each with columns id1 and age, I was asked to find all the id1 values that are in table A but do not exist in table B: select * from A where not exists (select 1 from B where B.id1 = A.id1)
- How did you know the index was the cause of the performance degradation?
- Compare LEFT and RIGHT outer joins? In a LEFT join, all rows of the left (outer) table are returned, and columns from the inner table that do not meet the join condition are filled with NULL. In a RIGHT join, all rows of the right table are returned, and columns from the other table that do not meet the join condition are filled with NULL.
- What is Join and Its Types?Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS.
- What are the different global variables? @@rowcount returns the number of rows processed by the preceding command. @@error returns the error code for the last or current query. @@transtate returns the current transaction state. @@tranchained returns the current transaction mode. @@servername returns the server name. @@version returns the version of the SQL server and the OS used on the system. @@spid returns the current process id. @@identity returns the last inserted identity value. @@nestlevel returns the nesting level of stored procedures/triggers. @@sqlstatus returns the status of the previous fetch statement in a cursor.
- What is a subquery and its types? A subquery is a SELECT statement nested inside another SELECT, INSERT, UPDATE or DELETE statement, or inside another subquery. A correlated subquery is a subquery that cannot be evaluated independently but depends on the outer query for its results; it is also called a repeating subquery, since it is executed once for each row that might be selected by the outer query.
Wednesday, July 23, 2008
Sybase Interview QuestionS SET-IV:
- What is IN & OUT in a stored procedure? The OUTPUT keyword may be specified in the procedure's creation statement for parameters that can be passed both in and out of the procedure; when passing a value out, you must also specify the OUTPUT keyword in the execution statement.
- Tell me the limitations of a trigger? Any CREATE command, DROP command, ALTER TABLE, GRANT, REVOKE, SELECT INTO, TRUNCATE, or UPDATE STATISTICS statement is not permitted in a trigger.
- Can I update a view? Yes, but only if the view was created using a single table.
- What is the difference between constraints and rules? Rules deal with constants, and one rule can be bound to one column; constraints can cover many columns and can compare a column with a column in another table.
- How will you bind a rule?sp_bindrule
- Explain @@sqlstatus? It returns the status of the previous fetch statement in a cursor.
- What are the steps involved in creating a cursor? Declaring the cursor, declaring variables, opening the cursor, fetching rows, the main loop, closing the cursor, deallocating the cursor.
- What is the Transaction Log? It is a database-level system table called syslogs, which contains a sequential list of all modifications to every object in the database.
- Compare Rollback transaction and Rollback Trigger?
- How will you bcp an identity column? Use the -E flag.
- What is a demand lock and a deadlock? A deadlock occurs when there is a cyclic dependency between two or more transactions for the same set of resources. A demand lock prevents any more shared locks from being set on a data resource (table or data page); any new shared-lock request has to wait for the demand-lock request to finish.
- How do you get last month's last day? select dateadd(dd, -datepart(dd, getdate()), getdate()) -- subtracting the current day-of-month number steps back to the previous month's final day
- What is patindex()? It returns the starting position of the specified pattern in a string, else 0.
- How do I get the duplicate rows from a table? select column1, count(*) from table_name group by column1 having count(*) > 1
- What are command permissions and object permissions? Object permissions are granted/revoked on objects and include SELECT, UPDATE, DELETE and EXECUTE. Command permissions are granted/revoked for CREATE statements (database, table, view, procedure, default).
Monday, July 21, 2008
Replacing Cursors and While Loops
CURSORS and WHILE loops: we can't seem to live with them and we can't seem to live without them, or can we?! I recently had to look at some code here where I work that has been in place for some time, code not written by me. I was testing performance on a new box (8 gig of RAM and 4 PROCs with hyper-threading) and we wanted to compare its performance to one of our current boxes (4 gig of RAM and 4 older PROCs), so after doing the usual - backups, restores, re-indexes, & DBCC - against a 17 gig database, we decided to import an important job and run it. The good news is it ran much faster, as expected: 2 hours 35 minutes down to 1 hour 19 minutes. I was, however, a little distraught to see that the code had inner cursors embedded in an outer cursor, a BIG RED flag to me that this code wasn't performing as well as it could.
Cursor Description
The inner cursors were attempting to pivot data from the many side of a one-to-many table into single column data of an Output table. The approach was to loop through each primary record getting the key value (1.7 million of them) and then loop through the secondary table for each of those primary records while altering and concatenating the retrieved data for each into 3 different variables. This was followed by an update to the Output table based on a match to the primary record ID after which the process continued until all output records were updated. The approach is sound enough but slow - very, very slow. There is a lot of overhead Declaring, Opening, Fetching, Closing and De-Allocating cursors. I knew I could do better. The cursor code is shown below (changed to not expose any specific metadata information of the company where I work). Only one inner cursor is shown.
(Declare & start of outer Cursor code)
--Inner Cursor code…code for just one column pivot
-- (variables are previously declared)
DECLARE temp_cursor CURSOR FOR
SELECT column_data
FROM DB.dbo.many_tbl
WHERE id = @tmp_id -- key data
ORDER BY column_data
OPEN temp_cursor
FETCH NEXT FROM temp_cursor
INTO @tmp_data
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @tmp_values = @tmp_values+convert(varchar(20),@tmp_data)+','
FETCH NEXT FROM temp_cursor
INTO @tmp_data
END
CLOSE temp_cursor
DEALLOCATE temp_cursor
UPDATE DB.dbo.OutPut_tbl
SET column_out = @tmp_values
WHERE id = @tmp_id
(Fetch next in outer Cursor)
(Close & Deallocate outer Cursor)
Replacing the Inner Cursors
Back in 2002 I had submitted the "Easy Table Pivot" code to SQL Server Central and I knew that I could at least get rid of the inner cursor(s). The code below directly replaced the above inner cursor.
(Declare & start of outer Cursor code)
--Direct column pivot into a variable
-- (variables are previously declared)
SELECT @tmp_values = @tmp_values + convert(varchar(20), column_data) + ','
FROM DB.dbo.many_tbl
WHERE id = @tmp_id
ORDER BY column_data
UPDATE DB.dbo.OutPut_tbl
SET column_out = @tmp_values
WHERE id = @tmp_id
(Fetch next in outer Cursor)
(Close & Deallocate outer Cursor)
Replacing the Outer Cursor
This code is simpler and faster than the cursor, and this piece of the total job dropped from 52 minutes to about 11 minutes. So who could ask for anything better! Perhaps in SQL 7.0 you can't, at least I don't think you can (all the above code was already in a stored procedure), but what about SQL 2000? After looking at the above code, the select statement looked like a good candidate for a scalar function. I was sure this part of the job would perform better as a set process in a single update with a function call than as a loop. By the way, I had replaced the cursor with a WHILE loop to eliminate the overhead associated with Opening, Closing and De-Allocating the cursor. Anyway, here is the function I created, basically from the same select statement.
CREATE FUNCTION dbo.ufn_data_pivot (@id int)
RETURNS varchar(8000) -- widened from varchar(20): the return value holds the whole concatenated list
AS
BEGIN
DECLARE @value varchar(8000)
SET @value = ''
SELECT @value = @value + convert(varchar(20), column_data) + ','
FROM DB.dbo.many_tbl
WHERE id = @id
ORDER BY column_data
RETURN @value
END
Conclusion
You can be sure that I will be making every attempt in the future to replace all manual looping code with function calls in SQL 2000 and beyond, especially on large data sets.
Sunday, July 20, 2008
When To Use Cursors
My standard line is that you should try to solve the problem in a set based way first and reserve cursors for these situations:
a) Multiple database maintenance type tasks, where you need to run through many (or at least more than one) databases
b) You just cannot figure out a set based solution. That may sound simple and/or lame, but we get paid to solve problems. If you (and your DBA) can't figure out a set based solution in 30 minutes, use the cursor. That solves the problem and often you gain a deeper understanding of the problem that may lead you back to a set based solution.
c) You want to leverage logic you've already encapsulated in a stored procedure.
I hope you'll agree with the first point, imagine you'll vilify me for the second point, and maybe scratch your head about the third. The goal today is to get you to rethink your views on cursors, and if there's enough interest, I may follow up with a deeper discussion. We'll focus on the third point as a starting place.
First, some background. I believe in using stored procedures for all data access with the exception of things that intrinsically require dynamic SQL like searches. I also believe that putting some business logic in a stored procedure is worth doing if it reduces round trips to the client machine. The limiting factor to stored procedures is that it's hard to pass in the equivalent of an array so that you can use that encapsulated logic for one record (request) or for many.
One example from my background required parsing a string containing order information (for books, shirts, etc) and splitting it into parent/child tables. Orders were received via an HTTP post, each post containing all the information relevant to the order. The post was inserted as a single string into a table so that it completed quickly, then a job ran frequently to process the new records. It was an ugly bit of parsing and I fell back on point #2 above, using a cursor to handle the parsing because there were some special cases I needed to handle. So, in pseudo code, I had something like this:
create proc usp_ParseOrder @WebOrderID
as begin trans
some ugly cursor stuff here
commit trans
Possibly I could have solved it set based, but for the sake of discussion let's say that it performed well enough and had the advantage of being easy to understand if maintenance was needed that no further effort was warranted.
For the purposes of this discussion it's not important how I solved the problem inside the stored procedure, but rather that it was complicated to express.
So the job that processed new records looked something like this (again, pseudo code):
open cursor
for each record in cursor
exec usp_ParseOrder @WebOrderID
next
close cursor
That's the flaw of building procedures designed to handle a single request (think of users adding/editing records, etc). They are fast and clean, but if you want to reuse the logic you either call the procedure repeatedly, refactor the procedure to handle 1 to unlimited records, or you duplicate the logic for batch processing.
The flaws are offset by some nice wins:
a) Duration of each transaction should be short compared to a large batch
b) Growth of the log file has less chance of getting out of hand. Logging 100 small transactions gives the log a chance to roll over, where doing one big transaction may require the log to grow
c) That complex logic is in one place and is pretty easy to work with
d) If something goes wrong you roll back one very small transaction
Those potential wins should also be considerations when you're coding. It's easy to think batch when you're processing 100 records or even 1,000, but what about 10k? 100k? Ever rolled back a million-record transaction?
I think looping to call a stored procedure multiple times to leverage logic is a valid and useful technique. Be practical and pragmatic about its application and you'll do well.
Saturday, July 19, 2008
Sybase interview questions SET III:
- How does Sybase know to what extent a transaction has to be rolled back? Sybase rolls the transaction back to where it began (that is, to the BEGIN TRANSACTION).
- How do you check whether the rollback was correct? By checking the @@trancount value before the rollback statement executes.
- How do you import a table using an INSERT statement? insert tablename (column list) select [column list] from tablename1
- How do you select unique rows using Sybase? using Distinct keyword
- How many databases does Sybase 11 have? List them. Four: master, model, sybsystemprocs, and tempdb.
- What is Roll Forward and Roll Back? Roll forward: committed transactions not yet written to the data area are rolled forward into the data. Roll back: for uncommitted transactions, all data modifications done within the transaction are reversed.
- What are dirty reads, and how do they differ from phantom reads? A dirty read occurs when one transaction reads data that a second transaction has modified but not yet committed; if the second transaction rolls back, the first has read data that never logically existed. A phantom read occurs when one transaction reads a set of rows that satisfy a search condition and a second transaction then modifies the data (through an insert, delete, update, and so on); if the first transaction repeats the read with the same search condition, it obtains a different set of rows.
- How will you restart and exit a WHILE loop? CONTINUE and BREAK.
- How many columns can table have? 250
- Give me the syntax for creating user-defined datatypes? sp_addtype type_name, system_type, {null | not null | identity}
- How shall I simulate isolation level 3 when running at a lower level (0 through 2)? Using HOLDLOCK.
- Can I explicitly insert a value into an identity column? Yes, with set identity_insert table_name on.
- Can I change the data type of a column? Yes, using the modify keyword of alter table.
- What is the isnull function? It substitutes a specified value for a null value in a column in a query or an aggregate function.
Friday, July 18, 2008
Sybase interview questions SET II:
- While bcp'ing in, can defaults, rules, and constraints be applied? Only defaults are applied.
- What is Hotspot?
- When multiple processes attempt to modify the same page in a table, how do I force the lock? Using HOLDLOCK.
- What are isolation levels? Explain them. The categories of locking behavior within transactions defined by ANSI; the levels are 0, 1, 2 and 3.
- How do you get yesterday's date? dateadd(dd, -1, getdate())
- What is sp_depends? It is used to display a list of objects referenced by stored procedures
- What is Normalization? A database is said to be normalized if it complies with a set of formal rules; these rules define how data that represents different entities should be organized in a database.
- How do I delete duplicate rows in a table?
SELECT DISTINCT * INTO #Temporary_table FROM table_name
DELETE table_name
INSERT table_name (column list) SELECT * FROM #Temporary_table
- If I want the creation text of a particular object (such as an SP, trigger, view, rule, or default), what can I use? sp_helptext, defncopy, or the syscomments table.
- Compare Join and Subquery in performance? Generally, join queries consume more memory than subqueries. Subqueries in turn involve intermediate table creation, which affects performance because of the extra I/O needed for processing. If RAM is plentiful, joins can be used instead of subqueries; under memory constraints we can go for subqueries. Sometimes subqueries are flattened into joins to improve performance. Also, making the outer query compare the inner query with an equality condition can materialize the subquery.
- Explain worktables for performance? a) If a column has a unique/primary key, there is no need to specify the DISTINCT clause. b) If you join more than four tables, the server will definitely use a worktable to resolve the query; try to keep the number of joined tables to a minimum, or use a subquery to avoid joining more than four tables. c) If there is an index that keeps a column sorted, you don't need to specify an ORDER BY clause on it. d) If you know the data is unique, there is no need to specify a GROUP BY clause in the query.
- There's a performance issue; what will be your very first step toward the solution? First identify which query is creating the problem, then identify where the problem lies within that query. The query taking the most time is the first to consider for optimization. To check which query takes the most time to execute, place getdate() both before and after the query. Then identify the type of the query: whether it is a data look-up query or a data modification query.
Thursday, July 17, 2008
Sybase interview Questions SET - I:
- What is a stored procedure? What is the maximum number of parameters that can be passed to it? It is a database object that exists independently of any table and contains a set of queries. The maximum number of parameters is 255.
- What is a Trigger? How many triggers can a table have, and what are they? Triggers are special kinds of stored procedures that get executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table. A table can have three triggers: one each for insert, update, and delete.
- What is a magic table, and why is it so called? The tables that exist only within a trigger, named inserted and deleted, are the magic tables.
- What is a view? How many tables can be joined to create a view? It is a logical way of looking at physical data located in tables; up to 16 tables can be joined.
- How will you add a column and drop a column? alter table table_name [add | drop] column_name
- How will you find the version of Sybase? @@version
- Compare the WHERE clause and the HAVING clause? A WHERE clause cannot contain an aggregate function, and it filters rows before the GROUP BY; HAVING filters rows after the GROUP BY.
- What is an IDENTITY column? It is used for sequential, unique numbering of rows being inserted into a table. Only one identity column is allowed per table.
- What is a CURSOR? It is a pointer used for row-by-row operations.
- How many indexes can a table have? 250.
- What are clustered and nonclustered indexes? When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key, so there can be only one clustered index per table. Nonclustered indexes have their own storage separate from the table data. Nonclustered indexes are stored as B-tree structures (as are clustered indexes), with the leaf-level nodes holding the index key and its row locator.
- While creating an index, if I forget to specify the type, what will happen? A nonclustered index is created.
- What is a Transaction? It is a set of operations completed as a single unit of work.
- What is BCP, and what are its types? It is used to import and export values between a table and an OS file, and vice versa. The types are fast bcp and slow bcp.
- What is a format file? It is a file, in a defined format, that describes the layout of the data for bcp.
