Thursday, 26 August 2010

How the Z-index Attribute Works for HTML Elements

There are many ways to classify elements on a Web page. For the purposes of this article and the z-index attribute, we can divide them into two categories: windowed and windowless.

Windowed Elements

* <OBJECT> tag elements
* ActiveX controls
* Plug-ins
* Dynamic HTML (DHTML) Scriptlets
* SELECT elements
* IFRAMEs in Internet Explorer 5.01 and earlier

Windowless Elements

* Windowless ActiveX controls
* IFRAMEs in Internet Explorer 5.5 and later
* Most DHTML elements, such as hyperlinks or tables

All windowed elements paint themselves on top of all windowless elements, despite the wishes of their container. However, windowed elements do follow the z-index attribute with respect to each other, just as windowless elements follow the z-index attribute with respect to each other.

All windowless elements are rendered on the same MSHTML plane, and windowed elements draw on a separate MSHTML plane. You can use z-index to reorder elements within a plane, but not to interleave elements from different planes: the windowed plane always draws on top of the windowless plane.

http://support.microsoft.com/kb/177378

How to Use the Canonical Tag

Google, Yahoo & Microsoft Unite On “Canonical Tag” To Reduce Duplicate Content Clutter

The web is full of duplicate content. Search engines try to index and display the original or “canonical” version. Searchers only want to see one version in results. And site owners worry that if search engines find multiple versions of a page, their link credit will be diluted and they’ll lose ranking.

Today, Google, Yahoo and Microsoft have united, each with its own announcement, to offer a way to reduce duplicate content clutter and make things easier for everyone. Webmasters rejoice! Worried about duplicate content on your site? Want to know what “canonical” means? Read on for more details.

Multiple URLs, one page

Duplicate content comes in different forms, but a major scenario is multiple URLs that point to the same page. This can come up for lots of reasons. An ecommerce site might allow various sort orders for a page (by lowest price, highest rated…), and the marketing department might want tracking codes added to URLs for analytics. You could end up with 100 pages, but 10 URLs for each page. Suddenly search engines have to sort through 1,000 URLs.

This can be a problem for a couple of reasons.

* Less of the site may get crawled. Search engine crawlers use a limited amount of bandwidth on each site (based on numerous factors). If the crawler is only able to crawl 100 pages of your site in a single visit, you want it to be 100 unique pages, not 10 pages 10 times each.

* Each page may not get full link credit. If a page has 10 URLs that point to it, then other sites can link to it 10 different ways. One link to each URL dilutes the value the page could have if all 10 links pointed to a single URL.
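To make the "many URLs, one page" problem concrete, here is a minimal Python sketch that collapses URL variants onto a single form. The parameter names in IGNORED_PARAMS and the example URLs are assumptions made up for illustration; a real site would have to list its own sort and tracking parameters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change the URL but not the page shown.
IGNORED_PARAMS = {"sort", "utm_source", "utm_medium", "utm_campaign"}

def normalize(url: str) -> str:
    """Return the URL with sort/tracking parameters and fragment removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "http://www.example.com/product.php?item=swedish-fish",
    "http://www.example.com/product.php?item=swedish-fish&sort=price",
    "http://www.example.com/product.php?item=swedish-fish&utm_source=newsletter",
]

# Three crawlable URLs turn out to be one page.
unique_pages = {normalize(u) for u in urls}
print(len(urls), "URLs ->", len(unique_pages), "unique page(s)")
```

A crawler sees every entry in urls as a separate address, which is exactly the crawl-budget and link-dilution problem described in the two points above.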

Using the new canonical tag

Specify the canonical version using a tag in the head section of the page as follows:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish"/>

That’s it!

* You can only use the tag on pages within a single site (subdomains and subfolders are fine).
* You can use relative or absolute links, but the search engines recommend absolute links.
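If you want to check what a page actually declares, the tag is easy to read back. The following is a rough sketch using only the Python standard library; the CanonicalFinder class and find_canonical function are hypothetical helper names, and the sample markup is invented. It also shows why absolute links are the safer choice: a relative href only means something once it has been resolved against the URL of the page it was found on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")

def find_canonical(html, page_url):
    """Return the declared canonical URL as an absolute URL, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # A relative href is resolved against the page it appeared on.
    return urljoin(page_url, finder.canonical)

page_url = "http://www.example.com/product.php?item=swedish-fish&sort=price"
html = ('<html><head><link rel="canonical" '
        'href="/product.php?item=swedish-fish" /></head><body></body></html>')
canonical = find_canonical(html, page_url)
print(canonical)
```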

This tag will operate in a similar way to a 301 redirect for all URLs that display the page with this tag.

* Links to all URLs will be consolidated to the one specified as canonical.
* Search engines will consider this URL a “strong hint” as to the one to crawl and index.

Canonical URL best practices

The search engines use this as a hint, not as a directive (Google calls it a “suggestion that we honor strongly”), but are more likely to use it if the URLs follow best practices, such as:

* The content rendered for each URL is identical or very similar
* The canonical URL is the shortest version
* The URL uses easy-to-understand parameter patterns (such as using ? and &)

Can this be abused by spammers? They might try, but Matt Cutts of Google told me that the same safeguards that prevent abuse by other methods (such as redirects) are in place here as well, and that Google reserves the right to take action on sites that are using the tag to manipulate search engines and violate search engine guidelines.

For instance, this tag will only work with very similar or identical content, so you can’t use it to send all of the link value from the less important pages of your site to the more important ones.

If tags conflict (for example, when pages point to each other as canonical, when the URL specified as canonical redirects to a non-canonical version, or when the page specified as canonical doesn’t exist), search engines will sort things out just as they do now, and will determine which URL they think is the best canonical version.
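Rather than leaving the search engines to guess, you can look for these conflicts yourself. The sketch below is a hypothetical site-audit helper, not anything the search engines publish: it assumes you already have a dictionary (here called declared) that maps each crawled URL to the canonical URL it declares, with pages that point to themselves counted as settled.

```python
def audit_canonical(start, declared):
    """Follow declared canonicals from `start`.

    Returns (url, problem) where problem is None, "loop" or "missing".
    """
    seen = [start]
    current = start
    while True:
        if current not in declared:
            # The declared canonical is not a page we know about.
            return current, "missing"
        target = declared[current]
        if target == current:
            # Self-referencing canonical: nothing left to follow.
            return current, None
        if target in seen:
            # Pages point at each other as canonical.
            return target, "loop"
        seen.append(target)
        current = target

# Made-up crawl data for illustration.
declared = {
    "/a": "/b",
    "/b": "/a",      # /a and /b point at each other
    "/c": "/c",
    "/d": "/c",
    "/e": "/gone",   # /gone was never crawled
}

for page in ("/a", "/d", "/e"):
    print(page, audit_canonical(page, declared))
```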

For more info visit
http://searchengineland.com/canonical-tag-16537
http://googlewebmastercentral.blogspot.com/2007/09/google-duplicate-content-caused-by-url.html
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

Regular Expressions

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$.
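As a quick illustration, here is that regex applied to a few made-up file names using Python's re module (the choice of Python is an assumption of this sketch; the pattern itself is the one quoted above).

```python
import re

names = ["notes.txt", "notes.txt.bak", "readme.md", ".txt"]

# .*   any run of characters
# \.   a literal dot (the backslash escapes the metacharacter)
# txt  the literal extension
# $    end of the string
pattern = re.compile(r".*\.txt$")

matches = [name for name in names if pattern.match(name)]
print(matches)  # ['notes.txt', '.txt']
```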

Some Definitions

We are going to be using the terms literal, metacharacter, escape sequence, target string and search expression in this overview. Here is a definition of our terms:

literal A literal is any character we use in a search or matching expression. For example, to find ind in windows, ind is a literal string: each character plays a part in the search, and it is literally the string we want to find.

metacharacter A metacharacter is one or more special characters that have a unique meaning and are NOT used as literals in the search expression, for example, the character ^ (circumflex or caret) is a metacharacter.

escape sequence An escape sequence is a way of indicating that we want to use one of our metacharacters as a literal. In a regular expression an escape sequence involves placing the metacharacter \ (backslash) in front of the metacharacter that we want to use as a literal. For example, if we want to find ^ind in w^indow then we use the search expression \^ind. If we want to find \\file in the string c:\\file then we need the search expression \\\\file, because each literal \ we want to search for must be preceded by an escaping \.

target string This term describes the string that we will be searching, that is, the string in which we want to find our match or search pattern.

search expression This term describes the expression that we will be using to search our target string, that is, the pattern we use to find what we want.
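The escape sequence examples above can be tried out directly. This is a small sketch in Python (an assumption of this example; any regex engine with the same syntax would behave alike). Raw strings (r"...") are used so that Python itself does not interpret the backslashes before the regex engine sees them.

```python
import re

# Escaping the caret turns the metacharacter into a literal.
caret_match = re.search(r"\^ind", "w^indow")
print(caret_match.group())  # ^ind

# A target string containing two literal backslashes: c:\\file
two_backslashes = "\\" * 2
target = "c:" + two_backslashes + "file"

# Each literal backslash needs its own escaping backslash: \\\\file
backslash_match = re.search(r"\\\\file", target)

# re.escape builds the escaped form of a literal string for us.
escaped = re.escape("^ind")
```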

Brackets, Ranges and Negation

Bracket expressions introduce our first metacharacters, in this case the square brackets, which allow us to define a list of things to test for rather than the single characters we have been checking up until now. These lists can be grouped into what are known as Character Classes, typically comprising well-known groups such as all numbers etc.

[ ] Match anything inside the square brackets for one character position once and only once, for example, [12] means match the target to either 1 or 2 while [0123456789] means match to any character in the range 0 to 9.

- The - (dash) inside square brackets is the 'range separator' and allows us to define a range. Our example above of [0123456789] could be rewritten as [0-9].

You can define more than one range inside a list e.g. [0-9A-C] means check for 0 to 9 and A to C (but not a to c).

NOTE: To test for - inside brackets (as a literal) it must come first or last, that is, [-0-9] will test for - and 0 to 9.

^ The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.

NOTE: Spaces, or in this case the lack of them, between ranges are very important. A space inside the brackets is itself a literal, so [0-9 A-C] would also match a space.
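The bracket rules in this section can be checked with a few one-liners. This is a minimal sketch in Python; the sample strings are made up for illustration.

```python
import re

# A list of characters and the equivalent range find the same things.
as_list = re.findall(r"[0123456789]", "a1b22")
as_range = re.findall(r"[0-9]", "a1b22")

# Two ranges in one list: digits and upper case A to C (not a to c).
two_ranges = re.findall(r"[0-9A-C]", "a1B2c3D")

# A dash placed first is a literal dash, not a range separator.
with_dash = re.findall(r"[-0-9]", "555-0100")

# A leading caret negates the list.
not_f = re.findall(r"[^Ff]", "Fluffy")

# A space inside the brackets is one more character to match.
with_space = re.findall(r"[0-9 A-C]", "A 1")
without_space = re.findall(r"[0-9A-C]", "A 1")

print(two_ranges, with_dash, not_f)
```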

Positioning (or Anchors)

^ The ^ (circumflex or caret) outside square brackets means look only at the beginning of the target string. For example, if the target string starts with Mozilla and contains Windows further along, ^Moz will find Mozilla but ^Win will not find Windows.

$ The $ (dollar) means look only at the end of the target string, for example, fox$ will find a match in 'silver fox' since it appears at the end of the string but not in 'the fox jumped over the moon'.

. The . (period) means any single character in this position, for example, ton. will find tons and tonneau but not wanton because it has no following character.
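Here is a short sketch of the three metacharacters above in Python. The user_agent string is an illustrative value chosen for this example; it simply starts with Mozilla and contains Windows later on.

```python
import re

# Illustrative target string (an assumption for this example).
user_agent = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT)"

starts_with_moz = re.search(r"^Moz", user_agent) is not None
starts_with_win = re.search(r"^Win", user_agent) is not None

ends_with_fox = re.search(r"fox$", "silver fox") is not None
fox_in_middle = re.search(r"fox$", "the fox jumped over the moon") is not None

# The dot needs exactly one character to be present after "ton".
tons = re.search(r"ton.", "tons")
tonneau = re.search(r"ton.", "tonneau")
wanton = re.search(r"ton.", "wanton")

print(starts_with_moz, starts_with_win, ends_with_fox, fox_in_middle)
# True False True False
```

Note that in tonneau the match is only tonn: the dot consumes a single character, not the rest of the word.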

Iteration 'metacharacters'

The following is a set of iteration metacharacters (a.k.a. quantifiers) that can control the number of times a character or string is found in our searches.

? The ? (question mark) matches the preceding character 0 or 1 times only, for example, colou?r will find both color and colour.

* The * (asterisk or star) matches the preceding character 0 or more times, for example, tre* will find tree and tread and trough.

+ The + (plus) matches the previous character 1 or more times, for example, tre+ will find tree and tread but not trough.
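The difference between the three quantifiers is easiest to see by looking at what each one actually matches. A minimal sketch in Python, using the same words as the examples above:

```python
import re

# ? : the u is optional
color = re.search(r"colou?r", "color").group()
colour = re.search(r"colou?r", "colour").group()

# * : zero or more e after "tr", so even "trough" matches (on "tr" alone)
star = {w: re.search(r"tre*", w).group() for w in ("tree", "tread", "trough")}

# + : at least one e is required, so "trough" no longer matches
plus_tree = re.search(r"tre+", "tree").group()
plus_tread = re.search(r"tre+", "tread").group()
plus_trough = re.search(r"tre+", "trough")

print(star)  # {'tree': 'tree', 'tread': 'tre', 'trough': 'tr'}
```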

More 'metacharacters'

The following is a set of additional metacharacters that provide added power to our searches:

() The ( (open parenthesis) and ) (close parenthesis) may be used to group (or bind) parts of our search expression together.

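"MSIE.((5\.[5-9])|([6-9]))" matches MSIE 5.5 (or greater) OR MSIE 6+. The outer pair of parentheses confines the | (OR, described next) to the version number. Without it, the expression would match either MSIE 5.5 to 5.9 or any lone digit from 6 to 9 anywhere in the target string.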

| The | (vertical bar or pipe) is called alternation in techspeak and means find either the left-hand OR the right-hand value, for example, gr(a|e)y will find 'gray' or 'grey'.
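Parentheses matter with alternation because | applies to everything on either side of it up to the nearest enclosing group. The sketch below, in Python, compares gr(a|e)y with the ungrouped gra|ey to show the difference; fullmatch is used so that the whole word has to match.

```python
import re

# Grouped: the choice is only between a and e.
gray = re.fullmatch(r"gr(a|e)y", "gray")
grey = re.fullmatch(r"gr(a|e)y", "grey")

# The group also records which alternative was taken.
chosen = (gray.group(1), grey.group(1))

# Ungrouped: the choice is between the whole of "gra" and the whole of "ey".
ungrouped_gray = re.fullmatch(r"gra|ey", "gray")
ungrouped_gra = re.fullmatch(r"gra|ey", "gra")
ungrouped_ey = re.fullmatch(r"gra|ey", "ey")

print(chosen)  # ('a', 'e')
```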

Common Extensions and Abbreviations

Character Class Abbreviations

\d Match any character in the range 0 - 9
\D Match any character NOT in the range 0 - 9
\s Match any whitespace character (space, tab etc.)
\S Match any character that is NOT whitespace (space, tab etc.)
\w Match any character in the range 0 - 9, A - Z and a - z (most regex flavors also include the underscore _)
\W Match any character NOT matched by \w
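The abbreviations can be tried out as follows. This sketch uses Python's re module, which is an assumption of the example, and two details are specific to it: \w also matches the underscore, and for ordinary text strings the classes are Unicode-aware unless the re.ASCII flag is passed.

```python
import re

digits = re.findall(r"\d", "Room 42b")
non_digits = re.findall(r"\D", "4a2")

# \S+ picks out the runs between whitespace (a space and a tab here).
words = re.findall(r"\S+", "one two\tthree")

# \w includes the underscore but not the hyphen.
identifiers = re.findall(r"\w+", "snake_case, kebab-case")
non_word = re.findall(r"\W", "a-b c")

# With re.ASCII, \w is limited to 0-9, A-Z, a-z and the underscore.
unicode_word = re.findall(r"\w", "é")
ascii_word = re.findall(r"\w", "é", re.ASCII)

print(digits, identifiers)
```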

Positional Abbreviations

\b Word boundary. Matches the position at the beginning (\bxx) and/or end (xx\b) of a word rather than a character, thus \bton\b will find ton but not tons, but \bton will find tons.
\B Not word boundary. Matches a position that is NOT at the beginning (\Bxx) and/or end (xx\B) of a word, thus \Bton\B will find wantons but not tons, but ton\B will find both wantons and tons.
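The boundary examples above can be verified with a small helper. This is a sketch in Python; the found function is a hypothetical convenience defined only for this example.

```python
import re

def found(pattern, text):
    """True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

results = {
    (r"\bton\b", "ton"): found(r"\bton\b", "ton"),
    (r"\bton\b", "tons"): found(r"\bton\b", "tons"),
    (r"\bton", "tons"): found(r"\bton", "tons"),
    (r"\Bton\B", "wantons"): found(r"\Bton\B", "wantons"),
    (r"\Bton\B", "tons"): found(r"\Bton\B", "tons"),
    (r"ton\B", "wantons"): found(r"ton\B", "wantons"),
    (r"ton\B", "tons"): found(r"ton\B", "tons"),
}

for (pattern, text), outcome in results.items():
    print(pattern, text, outcome)
```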

See Regular Expressions - User guide for more information.

http://www.regular-expressions.info/