 Google and phpBB 
Description: Find out how to make Google spider your forum.
Author: TC
Date: Tue Oct 29, 2002 5:16 pm
Type: Tutorial
Keywords: sids, google, inktomi, sessions, spiders, robots, agents
Category: Improvements
"Why doesn't Google spider my forum?"

That question has been asked here many times. Google has no problem with dynamic sites, but it does not like session IDs (cloaking). The solution below ensures that Google does not receive any SIDs when its agents (robots) visit your site.

Code:

##############################################################
## MOD Title: enhance-google-indexing
## MOD Author: Showscout & R. U. Serious
## MOD Description: If the User_agent includes the string 'Googlebot', then no session_ids are appended to links, which will (hopefully) allow google to index more than just your index-site.
## MOD Version: 0.9.1
##
## Installation Level: easy
## Installation Time: 2 Minutes
## Files To Edit: includes/sessions.php
## Included Files: n/a
##############################################################
## Author Notes: There may be issues with register globals on newer
##       PHP version. If you know for sure and also how to fix it post in
##       this thread: http://www.phpbb.com/phpBB/viewtopic.php?t=32328
##
##       Obviously, if someone thinks it's funny to surf around with a
##       user_agent containing Googlebot and at the same time does not
##       allow cookies, he will lose his session/login on every pageview.
##       Should he complain to you, tell him to eat your shorts.
##
##       If you want to add further crawlers, look at the appropriate line and
##       feel free to add a part of the user_agent which should be _unique_
##       to that crawler, so a user is never confused with a bot.
##
##############################################################
## Version History: 0.9.0 initial release, only googlebot
##                         0.9.1 added inktomi (MSN-search/crawler-bot)
##############################################################
## Before Adding This MOD To Your Forum, You Should Back Up All Files Related To This MOD
##############################################################

#-----[ OPEN  ]------------------------------------------
includes/sessions.php

#-----[ FIND ]------------------------------------------
   global $SID;

   if ( !empty($SID) && !preg_match('#sid=#', $url) )

#-----[ REPLACE WITH ]------------------------------------------
   global $SID, $HTTP_SERVER_VARS;

   if ( !empty($SID) && !preg_match('#sid=#', $url) && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') && !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))

#
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------
#
# EoM
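
The author notes suggest adding further crawlers by extending the user-agent check. A minimal sketch of that idea follows, assuming you would rather keep the markers in one list than chain more strstr() calls. The names is_known_crawler() and add_sid_unless_crawler() are hypothetical and not part of phpBB; the second function is only a simplified stand-in for the patched condition, not the real code in includes/sessions.php.

```php
<?php
// Hypothetical helper (not part of phpBB): true when the user agent
// contains any of the crawler markers used by the MOD above.
function is_known_crawler($user_agent)
{
    // Markers taken from the MOD; add further substrings here, each one
    // unique to a crawler so that a real user is never matched.
    $markers = array('Googlebot', 'slurp@inktomi.com;');

    foreach ($markers as $marker)
    {
        if (strpos($user_agent, $marker) !== false)
        {
            return true;
        }
    }
    return false;
}

// Simplified stand-in for the patched condition: append the SID string
// only when one exists, the URL has none yet, and the visitor is not a
// known crawler.
function add_sid_unless_crawler($url, $sid, $user_agent)
{
    if (!empty($sid) && !preg_match('#sid=#', $url) && !is_known_crawler($user_agent))
    {
        $url .= ((strpos($url, '?') !== false) ? '&' : '?') . $sid;
    }
    return $url;
}

// An ordinary browser gets the SID, a crawler gets the clean URL.
echo add_sid_unless_crawler('viewtopic.php?t=1', 'sid=abc123', 'Mozilla/4.0'), "\n";
echo add_sid_unless_crawler('viewtopic.php?t=1', 'sid=abc123', 'Googlebot/2.1'), "\n";
```

With the list in one place, supporting another spider means adding one array entry instead of editing the condition again.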


If you also want to restrict Google to a single session, apply the following MOD as well. It prevents Google from appearing numerous times on the guest/who-is-online list (and in the record).

Code:

#################################################################
## MOD Title: GoogleSingleSession (Add-On to enhance-google-indexing )
## MOD Author: - R. U. Serious
## MOD Description: This MOD will give all 'guests' where the useragent
##          contains 'Googlebot' one session (static session_id)
##          Hence it will only appear as a single guest.
##
## MOD Version: 0.9
##
## Installation Level: (easy)
## Installation Time: 5 Minutes
## Files To Edit: includes/sessions.php 
##############################################################

#-----[ OPEN ]------------------------------------------
#
includes/sessions.php

#
#-----[ FIND ]------------------------------------------
#
$session_id = md5(uniqid($user_ip));

#
#-----[ REPLACE WITH ]------------------------------------------
#
# Note: d8ef2eab is one of the Googlebot crawler IPs
#
//$session_id = md5(uniqid($user_ip));
global $HTTP_SERVER_VARS;
$session_id = ( !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') ) ? md5(uniqid($user_ip)) : md5('d8ef2eab');


#
#-----[ FIND ]------------------------------------------
#
   else
   {
      $sessiondata = '';
      $session_id = ( isset($HTTP_GET_VARS['sid']) ) ? $HTTP_GET_VARS['sid'] : '';
      $sessionmethod = SESSION_METHOD_GET;
   }


#
#-----[ AFTER ADD ]------------------------------------------
#
   global $HTTP_SERVER_VARS;
   if ( empty($session_id)  && strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') )
   {
      $sessiondata = '';
      $session_id = md5('d8ef2eab');
      $sessionmethod = SESSION_METHOD_GET;
   }


#
#-----[ FIND ]------------------------------------------
#

         if ($ip_check_s == $ip_check_u)

#
#-----[ REPLACE WITH ]------------------------------------------
#

   //      if ( $ip_check_s == $ip_check_u )
         if (($ip_check_s == $ip_check_u) || ($session_id == md5('d8ef2eab') && strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot')))

#
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------
#
# EoM
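
The heart of this add-on is that every Googlebot request gets the same session ID, while everyone else still gets a fresh one. The sketch below shows that choice in isolation. Both function names are hypothetical and not part of phpBB. Reading d8ef2eab as four hex-encoded octets of an IPv4 address is an assumption based on the MOD's note that the value is one of the crawler's IPs.

```php
<?php
// Hypothetical helper: decode an 8-digit hex string into a dotted IPv4
// address, assuming each pair of hex digits is one octet.
function decode_hex_ip($hex_ip)
{
    $octets = array();
    foreach (str_split($hex_ip, 2) as $pair)
    {
        $octets[] = hexdec($pair);
    }
    return implode('.', $octets);
}

// Hypothetical helper mirroring the first REPLACE WITH step above:
// Googlebot always receives the same static session ID, every other
// visitor receives a fresh one.
function choose_session_id($user_agent, $user_ip)
{
    if (strpos($user_agent, 'Googlebot') !== false)
    {
        return md5('d8ef2eab');
    }
    return md5(uniqid($user_ip));
}

echo decode_hex_ip('d8ef2eab'), "\n"; // prints 216.239.46.171

// Two Googlebot requests from different addresses share one session ID.
var_dump(choose_session_id('Googlebot/2.1', 'd8ef2eab') === choose_session_id('Googlebot/2.1', 'd8ef2eac'));
```

Because the static ID is the same on every request, all Googlebot visits collapse into one guest session, which is why the IP check in the last step has to be relaxed for that ID.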


Plenty of people currently use this method with great success. Further discussion takes place on the phpBB.com forums (the author notes of the first MOD above link to one such thread). In theory this could be applied to any other search robot or spider, but Google and Inktomi are easily the most popular engines that we are positive do not like SIDs.

Special thanks to R. U. Serious for the coding and Showscout for the idea, and of course all who tested this and participated in the development.

[oct072002 - edited for typo - see below]
[1106014473 - edited for newer version code]

 © Copyright 2002 The phpBB Group.