The Panda algorithm looks for high-quality content, but what exactly is it looking for, how is it finding what it deems to be high-quality, and, perhaps most pressingly, what in the world can we do to befriend the bear?
In today's Whiteboard Friday, Michael Cottam explains what these things are, and more importantly, what we can do to be sure we get the nod from this particular bear.
Video transcription
Howdy Moz fans, and welcome to another edition of Whiteboard Friday.
I'm Michael Cottam. I'm an independent SEO consultant from Portland, Oregon, and have been a Moz associate for many years.
Today we're going to talk about Panda optimization. We're going to talk about real-world things you can do, no general hand-waving: specific tactics you can use. We'll cover, first of all, what Panda measures; secondly, how Panda might actually go about measuring these factors on your site; and lastly, what you can do to win based on those factors.
What does Panda measure (and what can we do about it)?
To start off, this is the list of the major factors we're going to talk about for Panda: thin or thick content; the issues around duplicate or original content; the top-heavy part of the Panda algorithm; how you come up with fabulous images and how Panda might measure how fabulous they are; and rich interactive experience pieces.
Thin (thick) content
First of all, thin/thick content. A lot of sites got penalized when Panda first came out because their design had broken the content out into lots of pages with just a few sentences on each. Here we're talking about how much text there is per page. How might Panda actually go about measuring this? This is probably the easiest piece of everything on here to measure. Programmatically, it's very simple to strip all the HTML tags out and then just do a word count.
There was a study done, I think last summer, by serpIQ (there'll be a link to that in the notes) that showed that for reasonably competitive terms you needed 1,500 to 2,500 words on a page to rank on page 1. They averaged this over ten or twenty thousand different keyword searches. Strip out the HTML tags and count the words: what do you have left? Analyze your own pages and see if you're up near that 1,500 mark.
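Since the word count is the easy part, here's a minimal Python sketch of the strip-the-tags-and-count approach, using only the standard library's `html.parser`. Excluding `<script>` and `<style>` contents and treating words as whitespace-separated tokens are assumptions; nobody outside Google knows exactly how Panda counts, and the `word_count` and `meets_benchmark` names are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so a plain set lookup is enough.
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_count(html: str) -> int:
    """Strip the tags and count whitespace-separated words."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements doesn't fuse into one word.
    return len(" ".join(parser.chunks).split())


def meets_benchmark(html: str, minimum: int = 1500) -> bool:
    """Compare a page against the ~1,500-word mark from the serpIQ study."""
    return word_count(html) >= minimum
```

Run it over your own pages' HTML to see which ones fall well short of the mark.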
How do we win on that? Well, this is a case where size matters. Aim for at least 1,500 words, and push to 2,000 or 2,500 if you can. Sometimes that may mean going through your site and condensing four or five pages of content into one page. You might think that makes for a giant, long page and a terrible user experience. But you can solve this with tab navigation, so all the content is on the page. When you click a tab, JavaScript changes the CSS styles of the tab panels to show one part and hide the others. Google's going to see everything in all those tabs when they crawl the page, because it's all in the HTML before anyone clicks.
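To make the tab idea concrete, here's a rough Python sketch that renders every tab's content into the page markup and relies on a CSS rule to hide the inactive panels. The `render_tabs` function, the `tab-panel` and `is-active` class names, and the `data-target` attribute are all made up for illustration, and the small JavaScript click handler that would move the `is-active` class between panels is assumed rather than shown. The point is only that every panel's text is in the HTML before anyone clicks.

```python
from html import escape


def render_tabs(panels: dict[str, str], active: str) -> str:
    """Render every tab's content into the HTML; CSS hides the inactive ones.

    A separate click handler (not shown) would move the "is-active" class
    between panels; the text is in the markup either way.
    """
    if active not in panels:
        raise ValueError(f"unknown tab: {active!r}")
    # Only the active panel is displayed; the others stay in the DOM, hidden.
    style = "<style>.tab-panel:not(.is-active){display:none}</style>"
    buttons = []
    sections = []
    for i, (title, body) in enumerate(panels.items()):
        panel_id = f"tab-panel-{i}"
        state = " is-active" if title == active else ""
        buttons.append(
            f'<button type="button" data-target="{panel_id}">{escape(title)}</button>'
        )
        sections.append(
            f'<section id="{panel_id}" class="tab-panel{state}">{escape(body)}</section>'
        )
    return style + "<nav>" + "".join(buttons) + "</nav>" + "".join(sections)
```

Condensing four or five thin pages this way keeps one long, crawlable page while the visitor only sees one section at a time.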
Duplicate/original content
The second thing to talk about is duplicate and original content. There's been a ton written about duplicate content and penalties and how Google checks this, that, and the other.
Lately we've seen a bunch of blog posts from different places talking about press releases and how, well, they're evil. The links don't count. Google doesn't spot them all, though it's much better at it than it used to be. If you take any press release you've done and search Google for its first sentence or so, you'll generally find four or five indexed pages containing it. That's way better than it was three years ago, when you'd find 60 pages indexed with that text and nothing else on them.
The press release piece is probably the easiest for Google to measure for original content. Think about what happens when a press release is republished: you've got the site template from whichever news or industry site runs it (header, footer, maybe a sidebar and some ads), you have the press release as one contiguous chunk, and that's really it. If Google does page chunking to pull out the template, the header, the footer, and so on, and see what the core content of the page is, that's probably the simplest case for them.
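One common way to separate a site's template from its core content (not necessarily what Google does) is to look at which blocks repeat across many pages of the same site. The sketch below assumes each page has already been split into text blocks; the `core_content` name and the 50% threshold are hypothetical choices.

```python
from collections import Counter


def core_content(pages: list[list[str]], threshold: float = 0.5) -> list[list[str]]:
    """Drop blocks that recur on many pages of one site (template); keep the rest.

    `pages` holds each page of the same site as an ordered list of text blocks.
    A block appearing on at least `threshold` of the pages is treated as template.
    """
    # Count each distinct block once per page so repeats within a page don't inflate it.
    doc_freq = Counter(block for page in pages for block in set(page))
    cutoff = threshold * len(pages)
    return [[block for block in page if doc_freq[block] < cutoff] for page in pages]
```

On a news site that republished a press release, the header, footer, and ad blocks recur on most pages and drop out, leaving the release as one contiguous block. That is why press releases are the easy case.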
If you're interweaving bits of text you got from different places with your own text, customer reviews, and other things that aren't the same as on other sites, it's much harder for Google to spot.
What might Google be doing to decide whether a block of text on your page exists on a hundred other sites? There are various techniques, like hashing, or ways of recording a rough thumbprint of the word patterns. That's not the hard part; there's lots of talk about thumbprints and hashing.
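Word shingling is one well-known way to build that kind of thumbprint: hash every run of k consecutive words, then compare two texts by the overlap of their hash sets (Jaccard similarity). This is a textbook technique, not Google's documented method; k = 5 and the 64-bit BLAKE2 hash are arbitrary choices for the sketch.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set[int]:
    """Hash every run of k consecutive words into a compact fingerprint set."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    # Texts shorter than k words become a single shingle.
    grams = [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]
    return {
        int.from_bytes(hashlib.blake2b(g.encode(), digest_size=8).digest(), "big")
        for g in grams
    }


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the shingle sets: 1.0 means identical word patterns."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Because the comparison works on word runs rather than exact bytes, changes in case, punctuation, or markup don't hide a copied block.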
The difficult part is when your page has content from 12 different places. It's not just the manufacturer's product content; you've got your own customer reviews, your own intro sentence at the top, things like that. If you interweave those, it becomes very difficult for Google to chunk the page into meaningful pieces, know where each chunk starts and ends, and then compare those chunks with what they found on all the other sites that happen to be selling the same product with the same product description.
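To see why the chunking step matters, here's a deliberately naive Python sketch: split a page into chunks at blank lines, hash each normalized chunk, and check the hashes against ones already seen on other sites. The splitting rule, the `known_hashes` set, and the function names are assumptions for illustration. A stock description pasted in as its own chunk matches exactly, while the same sentences interwoven with your own lines produce a chunk that matches nothing. A fuzzier comparison, such as shingling, would still score partial overlap, so this illustrates the boundary problem rather than guaranteeing that interwoven text goes unnoticed.

```python
import hashlib
import re


def chunk_hashes(page_text: str) -> list[str]:
    """Split on blank lines, normalize whitespace and case, and hash each chunk."""
    chunks = [c for c in re.split(r"\n\s*\n", page_text) if c.strip()]
    return [
        hashlib.sha1(" ".join(c.lower().split()).encode()).hexdigest()
        for c in chunks
    ]


def duplicated_fraction(page_text: str, known_hashes: set[str]) -> float:
    """Share of this page's chunks whose exact hash was already seen elsewhere."""
    hashes = chunk_hashes(page_text)
    if not hashes:
        return 0.0
    return sum(h in known_hashes for h in hashes) / len(hashes)
```

For example, a page made of your own one-line overview plus the manufacturer's description as a separate chunk scores 0.5, while a page that threads review lines between the manufacturer's sentences scores 0.0 under this exact-match check.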
What do you do to win there? You really want to interweave the original content you've created (your overview, your customers' reviews, your ratings, things like that) with the stock text and photos. Break it up a bit. What you don't want is one giant block of text that is exactly the same as the giant block of text on the other hundred sites selling the same product you're trying to sell.
Top-heaviness
Let's talk about top-heaviness, a pretty important part of the Panda algorithm. Mostly when people talk about the top-heavy algorithm, the example they give is ads above the fold. But if you actually read what Google said when they launched it, the problem they were trying to solve isn't really just ads above the fold. It's about anything that isn't content sitting above the fold, and the structure of your website pushing the content down, so that when users land on your page they can't get anything useful without scrolling. That's what it's really about.
How might Google go about measuring whether your site or page is top-heavy? Look at the tools built into Chrome's developer tools (Firefox's developer tools have similar features): they render the whole page, give you the dimensions of each element, and highlight them on top of the page for you. So it's certainly very easy for Google to render the whole page too.
They're not going to read through the HTML and assume the first X words are above the fold. No sites render that way anymore; there's just too much CSS positioning happening today. So they're going to have to render the page to determine what's above the fold.
So: render and measure the pixels. Then how do you know whether something is ads, template, or content? With a lot of what I'm saying here, we don't know exactly what Google is doing to measure these things, but we can guess and infer from how it behaves (what ranks and what doesn't) and from knowing how parsers and crawlers are written and what's possible.
If I were Google Panda, the simplest way to decide whether something was content would be to see whether it was clickable. It's very easy to tell whether a given element links to anything else. This isn't going to be foolproof, but your menus are going to be clickable, ads are certainly going to be clickable, and navigation buttons are going to be clickable.
There will be some false positives, with things like photo carousels that may be clickable to advance. But in general, for a quick-and-dirty analysis of what's content above the fold, if you wipe out everything that's clickable and everything that's white space, you should be left with various blocks around the screen that are probably content. That's probably what they're doing; I'd pretty much bet on it.
How do you win? First of all, minimize your header. If your header has a lot of white space and stacked elements, it pushes the content further down on every single page of your site. Ask yourself: Does your main menu bar really need that much space above and below it? Does your logo have a lot of white space above it? Are your share buttons placed in a way that pushes everything down? Look for those sorts of things, because a small win there moves a lot of content up above the fold on every page of your site.
Another question might be: so what counts as above the fold? Obviously, we don't know for sure, but we can guess: since the vast majority of people are running browsers at better than 1280 by 1000, that's probably a good benchmark. If you're analyzing your own site, look at it at 1280 by 1000; that's most likely about the dimensions Google is using for above the fold.
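Here's a rough Python sketch of the quick-and-dirty heuristic described above: drop clickable elements and empty spacing boxes, then measure how much of a 1280 by 1000 first screen the remaining boxes cover. It assumes you've already rendered the page (in a headless browser, say) and exported each element's bounding box; the `Box` fields and the 10-pixel grid are hypothetical, and the fold dimensions are the guess from the paragraph above, not a known Google setting.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rendered element's bounding box in CSS pixels (hypothetical export format)."""
    x: int
    y: int
    width: int
    height: int
    clickable: bool  # linked or interactive: menus, ads, navigation buttons
    has_text_or_image: bool = True  # False for pure spacing/white-space wrappers


def above_fold_content_share(
    boxes: list[Box], viewport_w: int = 1280, fold_y: int = 1000
) -> float:
    """Fraction of the first screen covered by non-clickable, non-empty boxes.

    Overlapping boxes are counted once by marking cells on a coarse grid.
    """
    cell = 10  # grid resolution in pixels; finer is more exact but slower
    cols, rows = viewport_w // cell, fold_y // cell
    covered = [[False] * cols for _ in range(rows)]
    for b in boxes:
        if b.clickable or not b.has_text_or_image:
            continue  # the heuristic: wipe out clickables and white space
        c0, c1 = max(b.x // cell, 0), min((b.x + b.width) // cell, cols)
        r0, r1 = max(b.y // cell, 0), min((b.y + b.height) // cell, rows)
        for r in range(r0, r1):
            for c in range(c0, c1):
                covered[r][c] = True
    return sum(map(sum, covered)) / (cols * rows)
```

With a 200-pixel clickable header, a 250-pixel ad strip, and an article starting below them, the share comes out at 0.55; trimming the header to 100 pixels raises it to 0.65, which is the every-page win the header advice is after.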
Image fabulosity
Images are certainly rich content. Everybody loves images rather than text. It makes a much more engaging experience. How is Google going to go and measure how fabulous your images are?
If you've got great, fabulous original images, then that's probably great content to show the user. If you've got the same product photos that the other hundred websites all have, then not so much.
What's Google likely to be doing? First of all, if you've never played with Google reverse image search, give it a shot. It's incredibly powerful. I do a lot of work in the travel industry, and the problem there is that if you're brochuring hotels on your site, your only real source of hotel photos, unless you travel to every destination and shoot them yourself (very expensive, of course), is the hotel's own image library.
You could take those images. Maybe they arrive as 5 MB photos in TIFF format. You can convert them to JPEG. You can shrink them down to maybe 1000 pixels wide from the original 5000. You can do a little sharpening. You might change the contrast. You might even overlay some text and save it with a different file name. Google will still spot those.
If you do a reverse image search on a hotel photo from any site you pick, you'll find hundreds and hundreds of other sites that all have the exact same photo. They're all named differently. They have different dimensions. Some are JPEGs, some are PNGs, and so on.
Google reverse image search is really good. I think it's crazy to assume Panda isn't using it to decide whether you have original images; if they're not doing it now, they'll be doing it next week. Don't think that just because you renamed a file, or cropped or resized it a little, you now have an original image. You do not.
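Google hasn't published how reverse image search matches photos, but a perceptual hash shows why renaming, resizing, re-encoding, or tweaking contrast doesn't make an image original. Below is a sketch of the simple textbook "average hash", used here as a stand-in, not as Google's method. It assumes the image has already been decoded to a grayscale grid whose sides are multiples of 8, and this simple version doesn't handle cropping well.

```python
def average_hash(pixels: list[list[float]], size: int = 8) -> int:
    """64-bit average hash of a grayscale image given as rows of brightness values.

    Assumes the image height and width are both multiples of `size`.
    """
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // size, w // size
    # Downscale by averaging each block, discarding resolution and fine detail.
    small = [
        sum(
            pixels[r][c]
            for r in range(i * bh, (i + 1) * bh)
            for c in range(j * bw, (j + 1) * bw)
        ) / (bh * bw)
        for i in range(size)
        for j in range(size)
    ]
    mean = sum(small) / len(small)
    # One bit per cell: is it brighter than the image's overall average?
    bits = 0
    for value in small:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests the same underlying photo."""
    return bin(a ^ b).count("1")
```

With this hash, a 2x upscale of the same photo or a linear contrast and brightness change produces an identical hash, and the file name never enters the calculation at all.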
Image dimensions are undoubtedly another factor Google's going to be looking at. Nobody really wants to decide to go to overwater bungalows in Bora-Bora by looking at tiny, postage-stamp-size thumbnails. If you've got big, thousand-pixel-wide pictures of these things, that's fabulous content. You've got to expect Panda to like that, because users are going to like that. Size and originality.
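If you want to audit your own images for postage-stamp sizes, reading the dimensions from the file itself is more reliable than trusting `width` attributes in your HTML. This sketch handles PNG only (width and height sit in the IHDR chunk right after the 8-byte signature); JPEG would need more parsing, and the 1000-pixel cutoff simply echoes the "thousand pixel wide" figure above rather than any known Panda threshold.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read width and height from a PNG's IHDR chunk (always the first chunk)."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_big_enough(data: bytes, min_width: int = 1000) -> bool:
    """Flag thumbnails: True only if the PNG is at least min_width pixels wide."""
    width, _ = png_dimensions(data)
    return width >= min_width
```

Only the first 24 bytes of each file are needed, so you can check a whole image directory quickly.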
How do you win? Go big. Be original. Okay, you say, "But how do I be original? I've got hundreds or thousands of products on the site. It all comes from manufacturers. I can't shoot my own photos."
Consider this for your major search targets, like category pages (not necessarily individual product pages, but category pages): make up an image that's a collage of some of those other images. Take those pieces and glue them together in whatever Photoshop-type software you like to make a new image out of the manufacturers' images for the products in that category. That can be the new header image for that page. It helps that category page, which is probably a better search target for you anyway, rank better.
Interactive experience
Certainly, a more engaging page is one where there's a video to play, or a map you can zoom in on, browse around to see where the hotels are, click on, and so on. Undoubtedly, part of what Panda is doing is measuring your site to see how much there is here for the user to play with.
How's Google going to measure that? Well, this is an interesting issue, because if you look at how YouTube videos are embedded, by default it's with an iframe. The same goes for a lot of the mapping tools: by default they're embedded with an iframe.
Why is that bad? Think about how Google has treated iframe content in the past, in terms of links, on-page content, and so on. If you iframe something in, Google has been treating it as belonging to the page it came from (the iframe's source), not the page that embeds it. So the risk is that if you're using iframes to embed maps or videos, Panda may not be able to spot that and realize you've got embedded rich content.
Chances are that with YouTube, Wistia, Vimeo, and a few others like them, Google has done a little bit of work to spot iframed-in videos. But you know what? There's a better solution. With Wistia, you've got the SEO embed type, which creates an embed object rather than an iframe. With YouTube, after you click Share and then Embed, there's a little checkbox that says "use old embed code." So you can do that.
The other thing you can consider, where you don't have a video already and want to add rich content, is to make an introductory video for a category, for your company, or for a product. It can cover the same material you've already written as content for that category or for your About Us page. Just talk to the camera and do a 30-second introductory video for that category or product, or read your review out, basically from a whiteboard behind the camera. Then use the transcript of that video as extra text content on the page.
When it comes to maps, I really prefer the Google Maps API. It's a JavaScript API. You might ask: can Google follow the JavaScript? Well, in the case of maps, it's their own product, and Google is certainly interested in knowing whether a page has a map embedded.
If you screenshot a map and turn it into a JPEG, well, that's nice. It's another big image, and it's probably original now, or at least looks original to Google, but it's not the rich, interactive content that a map is.
My advice is to use the Google Maps API. I think they're on version 3. Once you've seen an example, it's actually a lot easier to use than you might think, and it seems to work very well for producing that other piece of interactive content.
I've talked a lot here. How much does this work? Links are still very important for ranking. Two or three years ago, I would have said links were 80 or 90% of what it took to get something to rank. Panda has changed that in an insane way.
Here's the test example. Go to Google and search for "best time to visit Tahiti." You'll find my little site, Visual Itineraries, up there at number one, ahead of TripAdvisor, Lonely Planet, USA Today, and all these other sites. Those other sites have between 10,000 and 250,000 domains linking to them. My site has under 100, and I rank number one for that.
Now, in case you're thinking it's internal link anchor text or a page title match, here's the other proof. Search Google for "when should I go to French Polynesia." The only word in that query that matches the page title or any anchor text is "to," and that's a stop word, so it's not going to count. I'm still around number three or four on page one, up with all these other sites that have tens or hundreds of thousands of domains linking to them.
Please click through to my site and actually have a look around to see what I've done, because I don't want bounce-rate stuff happening. See the thin header at the top. Have a look at the images; some of them I created by screenshotting Excel charts. I've got embedded video. I've got an embedded Google Map.
There we go. Thanks everybody, and take care.