4.6. Doing something for every element with a certain attribute [Dive Into Greasemonkey]

The single most powerful tool in your Greasemonkey arsenal is the document.evaluate method, which finds elements, attributes, and text on a page using a query language called XPath.

Let's say, for example, that you wanted to find all the links on a page. You might consider using document.getElementsByTagName('a'), but then you'll still need to check each element to see if it has an href attribute, because the <a> element can also be used for named anchors.

Instead, use Firefox's built-in XPath support to find all the <a> elements that have an href attribute.
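For comparison, here is roughly what the getElementsByTagName approach looks like. You fetch every <a> element and then throw away the named anchors yourself. This is a sketch: collectLinksByTagName is a hypothetical name, and the optional doc parameter exists only so the function can be exercised outside a browser. In a user script you would call it with no arguments.

```javascript
// Hypothetical helper: find links WITHOUT XPath by filtering every <a> element.
// `doc` defaults to the page's document when running inside a user script.
function collectLinksByTagName(doc) {
    doc = doc || document;
    var anchors = doc.getElementsByTagName('a');
    var links = [];
    for (var i = 0; i < anchors.length; i++) {
        // Named anchors (<a name="...">) have no href attribute, so skip them.
        if (anchors[i].hasAttribute('href')) {
            links.push(anchors[i]);
        }
    }
    return links;
}
```

It works, but you have to write the filtering loop yourself. The XPath query below puts that condition into the query.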

Example: Find all the links on a page

var allLinks, thisLink;
allLinks = document.evaluate(
    '//a[@href]',
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
for (var i = 0; i < allLinks.snapshotLength; i++) {
    thisLink = allLinks.snapshotItem(i);
    // do something with thisLink
}

The document.evaluate method is the key here. It takes an XPath query as a string, followed by several other parameters I'll explain in a minute. This XPath query finds all the <a> elements that have an href attribute and returns them in no guaranteed order. (That is, the first one you get is not guaranteed to be the first link on the page.) The result is a snapshot, not an array: allLinks.snapshotLength tells you how many elements were found, and allLinks.snapshotItem(i) gives you each one.

XPath expressions can do wondrous things. Here's one that finds any element that has a title attribute.

Example: Find all the elements with a `title` attribute

var allElements, thisElement;
allElements = document.evaluate(
    '//*[@title]',
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
for (var i = 0; i < allElements.snapshotLength; i++) {
    thisElement = allElements.snapshotItem(i);
    switch (thisElement.nodeName.toUpperCase()) {
        case 'A':
            // this is a link, do something
            break;
        case 'IMG':
            // this is an image, do something else
            break;
        default:
            // do something with other kinds of HTML elements
    }
}

Once you have a reference to an element (like thisElement), you can use thisElement.nodeName to determine its HTML tag name. If the page is served as text/html, the tag name is always returned as uppercase, regardless of how it was specified in the original page. However, if the page is served as application/xhtml+xml, the tag name is always lowercase. I always use thisElement.nodeName.toUpperCase() and forget about it.
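If you compare tag names in several places, you could wrap that habit in a tiny helper. This is a sketch: hasTagName is a hypothetical name, not part of any API.

```javascript
// Hypothetical helper: compare an element's tag name case-insensitively, so the
// same check works whether nodeName is "IMG" (text/html) or "img"
// (application/xhtml+xml).
function hasTagName(element, tagName) {
    return element.nodeName.toUpperCase() === tagName.toUpperCase();
}
```

Inside the loop above you could then write if (hasTagName(thisElement, 'img')) { ... } instead of a switch.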

Here's an XPath query that returns every <div> with a specific class.

Example: Find all `<div>`s with a `class` of `sponsoredlink`

var allDivs, thisDiv;
allDivs = document.evaluate(
    "//div[@class='sponsoredlink']",
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
for (var i = 0; i < allDivs.snapshotLength; i++) {
    thisDiv = allDivs.snapshotItem(i);
    // do something with thisDiv
}

Note that I used double quotes around the XPath query string, so that I could use single quotes within it.
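One caveat about that query: @class='sponsoredlink' compares the entire attribute value, so it misses <div class="sponsoredlink featured">. XPath 1.0 has no class selector. A common idiom normalizes the class list, pads it with spaces, and searches for the padded class name. The function below is a hypothetical helper that builds such a query string. It assumes the class name contains no single quotes.

```javascript
// Hypothetical helper: build an XPath query that matches `tagName` elements whose
// class attribute contains `className` as a whole, space-separated token.
// Assumes className contains no single-quote characters.
function classQuery(tagName, className) {
    return "//" + tagName +
        "[contains(concat(' ', normalize-space(@class), ' '), ' " +
        className + " ')]";
}
```

You would pass classQuery('div', 'sponsoredlink') as the first argument to document.evaluate in place of "//div[@class='sponsoredlink']". Without the spaces, a div with class="notsponsoredlink" would also match.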

There are lots of variations of the document.evaluate method. The second parameter (document in all of the previous examples) is the context node, and it can be any element. So if you already have a reference to an element (say, from document.getElementById, or an item in the list returned by document.getElementsByTagName), you can restrict the search to that element's descendants. There's a catch: the query must be relative, which means it starts with a dot, as in .//a[@href]. A query that starts with // (like //a[@href]) always searches from the root of the document, no matter what context node you pass.
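Here is a sketch of restricting a search to one part of the page. The function name linksInside is hypothetical. Note the leading dot in the query, which makes it relative to the element passed as the second parameter. The sketch uses the element's ownerDocument, the document the element belongs to, to run the query.

```javascript
// Hypothetical helper: find links that are descendants of `element` only.
// ".//" searches below the context node; "//" would search the whole document.
function linksInside(element) {
    return element.ownerDocument.evaluate(
        './/a[@href]',
        element,    // context node
        null,
        XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
        null);
}
```

For example, if the page had an element with the (assumed) id nav, linksInside(document.getElementById('nav')) would return only the links inside it.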

The third parameter is a reference to a namespace resolver function, which is only useful if you care about writing user scripts that work on pages served with the application/xhtml+xml media type. If you don't know what that means, don't worry about it; there aren't very many pages like that, and you probably won't run into one. The Mozilla XPath documentation explains how to use it, if you really want to know.
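If you do need one, here is a minimal sketch. On a page served as application/xhtml+xml, the elements are in the XHTML namespace. An XPath 1.0 name without a prefix matches only elements in no namespace, so a query like //a[@href] typically finds nothing on such a page. The fix is to put a prefix in the query and supply a resolver that maps the prefix to the XHTML namespace URI. The prefix xhtml and the function name xhtmlResolver are arbitrary choices.

```javascript
// Hypothetical resolver: maps the (arbitrary) prefix "xhtml" to the XHTML
// namespace URI and returns null for any other prefix.
function xhtmlResolver(prefix) {
    return prefix === 'xhtml' ? 'http://www.w3.org/1999/xhtml' : null;
}

// Usage on an application/xhtml+xml page (not run here):
// var allLinks = document.evaluate('//xhtml:a[@href]', document, xhtmlResolver,
//     XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
```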

The fourth parameter is how you want your results returned. All of the previous examples use XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, which returns elements in no guaranteed order. I use this 99% of the time, but if you want to make sure that you get elements back in exactly the order in which they appear in the page, use XPathResult.ORDERED_NODE_SNAPSHOT_TYPE instead. The Mozilla XPath documentation gives examples of some other variations as well.
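One variation worth knowing: when you only want the first match in page order, XPathResult.FIRST_ORDERED_NODE_TYPE returns a result whose singleNodeValue property holds that node, or null if nothing matched. There's no snapshot to loop over. The helper name firstMatch is hypothetical.

```javascript
// Hypothetical helper: return the first node (in document order) that matches
// `query`, or null if nothing matches.
function firstMatch(query) {
    var result = document.evaluate(query, document, null,
        XPathResult.FIRST_ORDERED_NODE_TYPE, null);
    return result.singleNodeValue;
}
```

For example: var firstLink = firstMatch('//a[@href]'); if (firstLink) { ... }.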

The fifth parameter lets you pass in an existing result object from a previous call to document.evaluate. The browser may reuse that object for the new results instead of creating a new one. It does not merge anything: the result you get back holds only the results of the new query. All of the previous examples pass null, which simply gets you a fresh result object. To get the combined results of two XPath queries, use the XPath union operator | inside a single query.
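To combine two queries, join them with XPath's union operator | in one expression. This sketch finds every link and every image with a src attribute in a single call. Because a union is a node set, an element matched by both halves appears only once. ORDERED_NODE_SNAPSHOT_TYPE returns the combined results in page order. The function name linksAndImages is hypothetical.

```javascript
// Hypothetical helper: one query, two kinds of elements.
// "|" returns the union of both node sets, without duplicates.
function linksAndImages() {
    return document.evaluate(
        '//a[@href] | //img[@src]',
        document,
        null,
        XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
        null);
}
```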

Got all that? XPath can be as simple or as complicated as you like. I urge you to read this excellent XPath tutorial to learn more about XPath syntax. As for the other parameters to document.evaluate, I rarely use them except as you've already seen here. In fact, you can define a function to encapsulate them.

Example: The `xpath` function

function xpath(query) {
    // Search the whole document; results come back in no guaranteed order.
    return document.evaluate(query, document, null,
        XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
}

Now you can simply call xpath('//a[@href]') to get all the links on the page, or xpath('//*[@title]') to get all the elements with a title attribute. You'll still need to use the snapshotItem method to access each item in the results, because the result is a snapshot, not a regular JavaScript array.
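If you'd rather work with a regular array, for example to use sort or filter, you can copy the snapshot into one. The helper name snapshotToArray is hypothetical. It works on any snapshot result, including the one returned by the xpath function.

```javascript
// Hypothetical helper: copy the nodes of a snapshot result into a plain array.
function snapshotToArray(snapshot) {
    var items = [];
    for (var i = 0; i < snapshot.snapshotLength; i++) {
        items.push(snapshot.snapshotItem(i));
    }
    return items;
}

// Usage in a user script:
// var links = snapshotToArray(xpath('//a[@href]'));
```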
