Query performance of "=" vs "<" and ">"

Suppose you have a large table (>5 million records) containing 15
columns including a numerical string column called ACTIVITY_DATE which
is indexed (nonunique, nonclustered) and a numerical string column
called TIMESTAMP which is indexed (nonunique,nonclustered).
ACTIVITY_DATE contains a date string ('20081211'); TIMESTAMP also
appends a time ('20081211 19:18:03.34923').

Suppose you want to return all events that took place on December 11th
and suppose you had to choose between the following queries:

  - select col1,col2,col3 from TABLE where ACTIVITY_DATE='20081211'

  - select col1,col2,col3 from TABLE where TIMESTAMP>'20081211' and
TIMESTAMP<'20081212'


Is either of these queries 'better' from a Sybase internals point of
view?  Are these queries absolutely equivalent and the ACTIVITY_DATE
column is superfluous as long as TIMESTAMP is there?  Or is an '='
query special, something to be fought for?


Thanks in advance,
subnet
0
dsubnet0
12/12/2008 12:56:59 PM
sybase.ase.general 8655 articles. 0 followers. Follow

4 Replies
729 Views

Similar Articles

[PageSpeed] 18

It all depends on the number of io's needed to satisfy the query as to which 
one will be faster.  My guess is that the "=" query is bound to be faster 
because:

1.  The key of the index (ACTIVITY_DATE) is 8 bytes in length which is much 
smaller than the TIMESTAMP index key which appears to be 22 bytes.  Almost 3 
times smaller key means that almost 3 times as many rows can fit on an index 
page, and therefore your index will be smaller and fewer index pages would 
be needed to be processed to satisfy the query.

2.  Depending on your statistics maintenance strategy, an equality search 
argument can be costed more accurately than a range predicate.  Therefore 
the estimated number of rows that the optimizer expects to qualify may be 
more accurate with the "=" predicate.

To be more accurate, you might just consider storing these values as their 
native datatypes with either "datetime/smalldatetime/date/time" rather than 
character datatypes.

"dsubnet0" <dsubnet0@gmail.com> wrote in message 
news:b81854dd-692e-43af-b74b-affa44e4a11e@t3g2000yqa.googlegroups.com...
> Suppose you have a large table (>5 million records) containing 15
> columns including a numerical string column called ACTIVITY_DATE which
> is indexed (nonunique, nonclustered) and a numerical string column
> called TIMESTAMP which is indexed (nonunique,nonclustered).
> ACTIVITY_DATE contains a date string ('20081211'); TIMESTAMP also
> appends a time ('20081211 19:18:03.34923').
>
> Suppose you want to return all events that took place on December 11th
> and suppose you had to choose between the following queries:
>
>  - select col1,col2,col3 from TABLE where ACTIVITY_DATE='20081211'
>
>  - select col1,col2,col3 from TABLE where TIMESTAMP>'20081211' and
> TIMESTAMP<'20081212'
>
>
> Is either of these queries 'better' from a Sybase internals point of
> view?  Are these queries absolutely equivalent and the ACTIVITY_DATE
> column is superfluous as long as TIMESTAMP is there?  Or is an '='
> query special, something to be fought for?
>
>
> Thanks in advance,
> subnet 


0
Sherlock
12/12/2008 5:39:03 PM
On 2008-12-13 04:39:03 +1100, "Sherlock, Kevin [TeamSybase]" 
<kevin.sherlock@teamsybase.com> said:

> To be more accurate, you might just consider storing these values as their
> native datatypes with either "datetime/smalldatetime/date/time" rather than
> character datatypes.

Definintely.  Storing datetime as a string is a:
- simply wrong, breaks basic database design rules
- requires conversion/unpacing on every read access, and 
conversion/packing on every write access; a performance hit which is to 
be avoided.
Get used to datepart & date arithmetic functions.

And yes, ACTIVITY_DATE is redundant.
-- 
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright � 2008 Software Gems Pty Ltd
--
With the financial meltdown, consolidating many databases into one 
server and managing mixed load is a demand.  Ask people who have been 
doing it for years.

0
Derek
12/17/2008 11:40:03 PM
On Dec 17, 6:40=A0pm, Derek Asirvadem <derek.asirva...@gmail.com> wrote:
> On 2008-12-13 04:39:03 +1100, "Sherlock, Kevin [TeamSybase]"
> <kevin.sherl...@teamsybase.com> said:
>
> > To be more accurate, you might just consider storing these values as th=
eir
> > native datatypes with either "datetime/smalldatetime/date/time" rather =
than
> > character datatypes.
>
> Definintely. =A0Storing datetime as a string is a:
> - simply wrong, breaks basic database design rules
> - requires conversion/unpacing on every read access, and
> conversion/packing on every write access; a performance hit which is to
> be avoided.
> Get used to datepart & date arithmetic functions.
>
> And yes, ACTIVITY_DATE is redundant.
> --
> Cheers
> Derek
> Senior Sybase DBA / Information Architect
> Copyright =A9 2008 Software Gems Pty Ltd
> --
> With the financial meltdown, consolidating many databases into one
> server and managing mixed load is a demand. =A0Ask people who have been
> doing it for years.

Re: datatype, I very much agree and we are investigating that
possibility, but let's assume for now that implementing that in our
application will fail cost-benefit (as it probably will).

Derek, your final comment contradicts Kevin's comments, as well as
'statistics io' which confirms that using a 'between' query on
TIMESTAMP requires 37,890 logical reads while a point query on
ACTIVITY_DATE required only 11,336.

Are you speaking in terms of finding the minimum number of columns to
represent the data, or do you really mean that there's a way to obtain
the same performance out of TIMESTAMP that you can get out of
ACTIVITY_DATE?  It appears in this case that redundifying some data
could net us a performance boost.
0
dsubnet0
12/18/2008 1:03:20 PM
> On 2008-12-19 00:03:20 +1100, dsubnet0 <dsubnet0@gmail.com> said:
> 
> Derek, your final comment contradicts Kevin's comments, as well as
> 'statistics io' which confirms that using a 'between' query on
> TIMESTAMP requires 37,890 logical reads while a point query on
> ACTIVITY_DATE required only 11,336.
> 
> Are you speaking in terms of finding the minimum number of columns to
> represent the data, or do you really mean that there's a way to obtain
> the same performance out of TIMESTAMP that you can get out of
> ACTIVITY_DATE?  It appears in this case that redundifying some data
> could net us a performance boost.

That's not what I stated/meant.  Any contradiction is in your 
interpretation.  The whole idea of database is multiple uses; different 
forms and types of use; and thus 35 years ago, minds greater than mine 
came up with database design methods, rules, normalisation.  
Contradictions have already been experienced, and resolved, by those 
very rules, by setting them in a hierarchy.  One needs to understand 
the whole ... it starts to break down when one takes a single isolated 
need ( one particular SELECT ) out of the database context, argue its 
merits.  That's why you need a DataModeller (not a Developer) to design 
databases.

1  From a database design perspective, formal Normalisation (which is 
demanded for Relational databases) and sensibility:
a.  you have cast the column datatypes incorrectly
b.  you have a redundant column ACTIVITY_DATE
period.  FIXING those two problems will give you relief in both 
performance and coding problems.  Period

2  As a technical statement, without regard to your db as is, as as it 
should be, I agree with Kevin's statement [1]

3  Since ACTIVITY_DATE is a redundant column, of course 
ACTIVITY_DATE_IDX is also a redundant index.  If you did not have the 
redundant col; the redundant index; you would not have anything to 
compare with.

4  As an isolated technical fact regarding one SELECT statement, of 
course using an index which is one third the size of some other index, 
and thus one third as many I/Os, is more efficient.

5  However there are many other isolated technical facts about all the 
uses and forms of use of the database, and particularly that table.  A 
few that come to mind:
- the redundant col is a headache and not too expensive re space etc, 
but the redundant index is very expensive: every INSERT/DELETE (let's 
say you have an APL table with 4 necessary indices and the redundant 
fifth is ACTIVITY_DATE_IDX) has to insert/delete to 6 instead of 5 
objects; depending on how full the pages are, as many as 13 writes.
-  If you have less indices overall, the number of structures that are 
in memory is less (more will fit into the same memory), that you are 
scanning is less (you can do more work with the same processing power).

6  While all these points are true, if you follow my recommendation re 
hierarchy of rules, and fixed the root problem/cause, all the lower 
order (consequential) problems would disappear.  Re [1.a]:
- cast TIMESTAMP correctly as a datetime, it would use 8 bytes (instead of 22)
- the TIMESTAMP_IDX would be 8 instead of 22
- Kevin's statement [1] re ACTIVITY_DATE_IDX would apply to TIMESTAMP_IDX
- you would get your isolated query serviced in 11,336 instead of 37,890 I/Os.

7  Additionally then:
- all requirements for ACTIVITY_DATE use TIMESTAMP
- remove the duplicate column and index
- your table is smaller by 22 bytes; it has one less index, and it is 
faster overall


> It appears in this case that redundifying some data
> could net us a performance boost.

Removing duplicate data (more precisely, normalising the database, 
which eliminates duplicate data) will ALWAYS give you an overall 
performance boost.  Some people present introducing duplicate data into 
the database as "denormalising for performance" which is a common myth, 
and completely, totally, false.  On inspection, the truth is always 
lack of normalisation; and that always impedes performance overall (if 
at all, it can only be justified in the context of isolated queries).

> Re: datatype, I very much agree and we are investigating that
> possibility, but let's assume for now that implementing that in our
> application will fail cost-benefit (as it probably will).

In the many assignments I have executed re exactly this kind of issue, 
let me assure you that the later you correct such errors, the more 
expensive it is to correct.  It is only by limiting yourself to 
isolated facts, and avoiding other facts, that you can continue with 
the path you have chosen.  That's why it is so important to use all 
database design rules, to design it correctly, from the beginning.
-- 
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright � 2008 Software Gems Pty Ltd
Quality Standards = Zero Maintenance + Zero Surprises
Performance Standards = Predictability + Scaleability

0
Derek
12/19/2008 3:08:48 AM
Reply:

Similar Artilces:

Regular Expression to remove "/", "\", "<", ">" and "="
Can anyone please show me the regular expression to reject a string ("<blue", "right>" etc.) which has the following expression in it: "/", "\", "<", ">" and "="  hi, It may Help u.. it is in Class file u may use this expressin in validation controls also. Regex objReg = new Regex(@"^[^,.?/\~|`;:'<>]*$", RegexOptions.Singleline); Regex objReg = new Regex(@"^[^,][^.][^?][^/][^\][^~|][^`][^;][^:][^'][[^<][^>]$", RegexOptions.IgnoreCase);Thanks &...

Precedence of "where" ("of", "is", "will")?
Nobody on #perl6 today could answer this one. Is: Str | Int where { $_ } the same as: (Str | Int) where { $_ } or: Str | (Int where { $_ }) ? Followup questions, Mr. President: What kind of operators are "where", "of", "is", and "will"? Is there a reason that S03 doesn't list them? What are their precedence(s)? -- Chip Salzenberg - a.k.a. - <chip@pobox.com> Open Source is not an excuse to write fun code then leave the actual work to others. Chip Salzenberg writes: &...

.ALLCOL("%COLUMN%", " ", ", ", ", ")
Do you know anyway for me to exclude a subset of columns returned by this function. We have two columns (rec_user and rec_datetime) which are in all of our tables, but when generating triggers I want automatically generate a script which does not include those two columns but does include all other columns in that table. Bruce I should add that I am using PD 9.0.0.580. Bruce "Bruce Lamb" <lamb.bruce@mayo.edu> wrote in message news:6HgI315nCHA.155@forums.sybase.com... > Do you know anyway for me to exclude a subset of columns returned by this > function. ...

"<->" as "->" with automatic "is rw"
I like that arguments will be readonly by default. But when I look at my current code, I see that I would be typing " is rw" quite a lot, which in my opinion is too long for a thing that occurs very often. Every such situation in my code is a foreach loop. A thing that in Perl 6 will mostly be used with the pointy sub declaration syntax. If I'm not mistaken, <-> is still available. It communicates "bidirectional" and that is more or less the same as read/write access. I'm proposing for zip(@foos, @bars, @xyzzies) <-> $foo, $bar, $xyzz...

quotes, quotes, quotes...
I am getting this error and I know what is causing it, but I have no idea how to fix it, any help would be great. The script steps through the /var/log/messages file on a linux server and puts The entries into a mysql database. However when it gets to the 'hlt' line in the messages file it just barfs. The single quotes are freaking it out. I know about quotes but not how to use in this situation. Thanks, Paul Error: May 27 17:53:00 localhost kernel: Checking 'hlt' instruction... OK. <----- doesn't like this in the messages file DBD::mysql::st exec...

What is this for? "<label style=""display:none;"" for=""" & cboFieldType.ClientID.ToString & """>Type</label>"
I'm digging through the code of DNN and I found many ocurrences of this type of code: lblFieldType.Text = "<label style=""display:none;"" for=""" & cboFieldType.ClientID.ToString & """>Type</label>" What is that code for? The label is never displayed and I could not figure out what DDN uses it for. thank you This is for ADA compliance. The guidelines require that form elements have an associated label. This helps the text readers out with identifying the different parts of the form. BruceDynamic...

what's the different from "<page src="">" and "<page codeBehind="">"
what's the different from "<page src="">" and "<page codeBehind="">" Thank YOU how to make session' life longer? Thank You This very question was asked only 5 days ago. Searching the forums would have yielded the answer you require. However, see this thread. Steven BeyRecursion: see Recursion You can set a new value for the Session Timeout in your web.config file. Or, you can do it in a page through this property: Session.Timeout Note that if you choose the second method, the timeout will only be modified on this page,...

"Using" or "With"
Hi all Please can someone enlighten to me as regards the difference with the "Using" and "With" statement when accessing data - which is better, what are the limitations and/or any pointers. Many thanks. Regards DaveDavid WinchesterPlease mark as answer if this is the solution.  using gives you the ability to use the connection and it closes the connection directlly after you finish using it. and there is no need to try- cach - finaly. there is no limitation on using USING keywordMuhanad YOUNISMCSD.NETMy Blog || My Photos || LinkedIn I have a dataobject the re...

"Me" is better than "You"
Yes I know, strings are frozen. But let me talk about it, I really can't get through the idea of a PC talkin to me. I consider my PC as an extension of myself, not a dumb companion who addresses Me as You. Yes there are times when I get angry with Him while I work and get wrong calculations etc.., but it really is my fault, Me using wrong istructions and eventually wanting to find someone else to blame, but it's Me. And yes, I consider Thunderbird my mail program, reading my mail on my PC as Me. So I personally like to have Me in the header bar as a compact address ...

replace the "." with a ","
Oi.... I need to build a small programm in ASP.NET and chose to use C# for it.Now i got everything working but there's one little problem.the first textbox is a double. I need to make it so that when someone enters a "." then it gets replaced by a ","any ideas?Ghan  string blah = "4.2.2.2";blah = blah.Replace(".", ",");Ryan Ryan OlshanASPInsider | Microsoft MVP, ASP.NEThttp://ryanolshan.comHow to ask a question...

"To" and "From" missing
When I print emails, the words "To" and "From" are blank, even though the "To" name and "From name (addresser, addressee) do show up. This is not a problem for other users on my system. Suggestions In mailbox right click, view. On the message window, right click and choose print options. Make sure print header is checked. -- Barry Merchant NSC Volunteer SysOp *** no email unless requested please!! *** > In mailbox right click, view. On the message window, right click and > choose print options. Make sure prin...

double quote
hello there...  i tried everything of think but not working the way i wanted to be... not sure what i'm missing...i'm generating a <span> in code behind and then using in javascript.... here is what i'm doing code behind: int i=0string _keywordID = "keyword";string _name = row["visit_info_nm"].ToString().Trim(); String _getElementByID = String.Format("<span id='{0}' OnClick = \"document.getElementById('{1}').value='{2}';\">{3}</span><br>", i, _keywordID, _name, _name); here is what it generate : <span id='1' OnClick = \"document.getElementById('keyword')...

"-" not "_"
I wrote a SQL statement in the data tab. I wrote a bunch of alaises as example ' word-type ' but when I hit the layout tab it converts the "-" to "_". So now my field name is ' word_type '. Is there any way to prevent this? CardGunner Don' use a hypen ( - ).  It isn't a valid character for column names.   See http://searchsqlserver.techtarget.com/expert/KnowledgebaseAnswer/0,289625,sid87_gci1188931,00.html   Here's an excerpt about column names: Letters as defined in the Unicode Standard 2.0 Decimal numbers from either B...

Web resources about - Query performance of "=" vs "<" and ">" - sybase.ase.general

Resources last updated: 12/27/2015 11:51:25 PM