As websites grow, it becomes useful to add a search engine that lets users search the entire site for particular keywords.
Many free search scripts exist on the web, but few explain the search logic in detail, and some have restrictions that do not fit your production server environment.
This section describes a very simple search function in detail, ready to be reused. The web technologies used are PHP, AJAX and JavaScript.
Logic of the search program in 6 steps
- Create a search form in HTML allowing the user to enter a search string
A simple search form in HTML consists of at least a text field (<input type="text">) and a button (<input type="button">), as shown below. The DIV with the id content receives the search results.
<form name="frmSearch">
<input type="text" name="query" size="20">
<input type="button" name="search" value="Search" onClick="Search()">
</form>
<div id="content"></div>
- The actual search is implemented in PHP
Normally, when a PHP script is called from a form, its output replaces the calling HTML page.
To avoid this behaviour, AJAX can be used. AJAX (Asynchronous JavaScript and XML) allows PHP scripts to be called from JavaScript.
Moreover, the PHP results can be passed back to JavaScript and presented on the same HTML page from which the search was initiated.
The following code shows how AJAX reads the search string, calls the PHP script and finally handles the returned PHP result:
<script>
var http = new XMLHttpRequest();
function Search() {
  // Read the search string from the form
  var query = document.frmSearch.query.value;
  var urlsite = "search.php";
  // Encode the search string so that spaces and special characters survive in the URL
  var params = "query=" + encodeURIComponent(query) + "&search=Go";
  http.open("GET", urlsite + "?" + params, true);
  http.onreadystatechange = useHttpResponse;
  http.send(null);
}
function useHttpResponse() {
  // readyState 4: the response is complete; status 200: the request succeeded
  if (http.readyState == 4 && http.status == 200) {
    var textout = http.responseText;
    document.getElementById("content").innerHTML = textout;
  }
}
</script>
First, an XMLHttpRequest object called http is created; it is the foundation for AJAX. When the user presses the Search button, the JavaScript function Search() is called. This function calls the PHP search script using http.open("GET", url, true), where url is the path to the PHP search script followed by the search parameters after the ? sign (e.g. http://www.alexanderzunic.de/search.php?query=Hello&search=Go). The third argument, true, makes the request asynchronous.
The XMLHttpRequest object has a property called onreadystatechange, which stores the function that processes the response from the server. Each time readyState changes, this function is executed. A readyState of 4 means that the response is complete and the data can be read. The readyState property is checked in function useHttpResponse(). The property responseText contains the output of the PHP search script, which can then be written into any HTML element such as a DIV; the code above writes it into the element with the id content.
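The URL that Search() passes to http.open() can also be assembled in a small helper. The following is a minimal sketch, not part of the original script: the function name buildSearchUrl is hypothetical, and it assumes that the PHP script expects the parameters query and search as in the listing above. It uses encodeURIComponent() so that spaces and characters such as & do not break the query string.

```javascript
// Hypothetical helper (not in the original script): builds the GET URL for the search script.
function buildSearchUrl(scriptUrl, query) {
  // encodeURIComponent escapes spaces, "&", "?" and other reserved characters
  var params = "query=" + encodeURIComponent(query) + "&search=Go";
  return scriptUrl + "?" + params;
}

console.log(buildSearchUrl("search.php", "Hello World"));
// prints: search.php?query=Hello%20World&search=Go
```

On the PHP side, $_GET already contains the decoded values, so no extra decoding step is needed there.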
- Read the search string from the form
define("OUTPUT_LENGTH", 500);
// Define arrays for saving file information
$title = Array();
$link = Array();
$content = Array();
$size = Array();
$fileext = Array();
// Read searchstring from form and start search
// Only search if the form was submitted and a search string is present
if (isset($_GET["search"], $_GET["query"])) {
    $searchstring = $_GET["query"];
    $number_of_files = read_files_to_search();
    search_for_query($searchstring, $number_of_files);
}
- Prepare the searchable text by looping through all files in the defined search folders
function read_files_to_search() {
    global $title;
    global $link;
    global $content;
    global $size;
    global $fileext;
    $i = 0;
    foreach (glob("*.*") as $filename) {
        if (pathinfo($filename, PATHINFO_EXTENSION) == "htm") {
            $text = file_get_contents($filename);
            $title[$i] = get_title($text);
            $link[$i] = get_path($filename);
            $content[$i] = strip_html_tags($text);
            $size[$i] = filesize($filename);
            $fileext[$i] = pathinfo($filename, PATHINFO_EXTENSION);
            if ($title[$i] == "") {
                $title[$i] = $link[$i];
            }
            $i++;
        }
    }
    return $i;
}
PHP function glob() finds all files matching the search pattern (in this case *.*, i.e. all files in the current directory whose name contains a dot). A foreach loop runs through the matches, storing each file name in variable $filename. Next, pathinfo($filename, PATHINFO_EXTENSION) returns the file's extension. Only files with the extension htm are processed further.
PHP function file_get_contents() reads the content of $filename into variable $text. The file's title, link, content, size and extension are stored in arrays. The title is extracted with function get_title():
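The extension filter can be illustrated outside PHP as well. The sketch below is a JavaScript analogue, not part of the search script; the helper names getExtension and filterHtmFiles and the sample file names are assumptions made for illustration. It keeps only names whose extension is exactly htm, mirroring the comparison in read_files_to_search().

```javascript
// Hypothetical helper: returns the part after the last dot, or "" if there is no dot.
function getExtension(filename) {
  var dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1);
}

// Hypothetical helper: keeps only file names with the extension "htm".
function filterHtmFiles(filenames) {
  return filenames.filter(function (name) {
    return getExtension(name) === "htm";
  });
}

var sample = ["index.htm", "search.php", "about.html", "notes.htm"];
console.log(filterHtmFiles(sample).join(", "));
// prints: index.htm, notes.htm
```

Because the comparison is exact, a file such as about.html is skipped. If such pages should be searchable too, the condition would have to accept both extensions.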
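function get_title($text) {
    $title = "";
    // Non-greedy, case-insensitive match that also spans line breaks
    if (preg_match("/<title[^>]*>(.*?)<\/title>/is", $text, $title_matches)) {
        $title = trim($title_matches[1]);
    }
    // An empty string lets the caller fall back to the file's link
    return $title;
}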
PHP function preg_match() searches $text for a regular expression that matches the text between the <title> and </title> tags. The matches are stored in the results array $title_matches: element 0 holds the entire match, and element 1 holds the text between the tags. This element, $title_matches[1], is assigned to variable $title, which is returned to the calling function.
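To show what such a pattern captures, here is a small JavaScript analogue of the title extraction. It is a sketch under assumptions: the function name getTitle is hypothetical, and the pattern is a slightly more tolerant variant (non-greedy, case-insensitive, matching across line breaks) rather than a copy of the PHP expression.

```javascript
// Hypothetical JavaScript analogue of get_title().
function getTitle(html) {
  // [\s\S]*? matches any characters including line breaks, as few as possible
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // match[0] is the whole match, match[1] the first captured group
  return match ? match[1].trim() : "";
}

console.log(getTitle("<html><head><title>My Page</title></head></html>"));
// prints: My Page
console.log(getTitle("<html><body>No title here</body></html>") === "");
// prints: true
```

Returning an empty string when nothing matches fits the fallback in read_files_to_search(), which uses the file's link whenever the title is empty.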
The file's full path name is extracted with function get_path():
function get_path($filename) {
    $dir = dirname(__FILE__)."/".$filename;
    $dir = str_replace("/home/www/htdocs/alexanderzunic.de", "", $dir);
    return $dir;
}
PHP function dirname(__FILE__) returns the full path of the directory containing the script. The file's name is appended to it, and the result is stored in variable $dir. PHP function str_replace() then removes the server's root directory name (in this case /home/www/htdocs/alexanderzunic.de), leaving only the path relative to the website root.
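The path conversion can be sketched in JavaScript as follows. This is an illustration only: the function name toRelativePath and the sample file path are assumptions, and unlike str_replace(), which replaces every occurrence, this sketch removes the root directory only when it appears at the start of the path.

```javascript
// Hypothetical helper: strips the document root from the start of a full path.
function toRelativePath(fullPath, documentRoot) {
  if (fullPath.indexOf(documentRoot) === 0) {
    return fullPath.slice(documentRoot.length);
  }
  // Paths outside the document root are returned unchanged
  return fullPath;
}

var root = "/home/www/htdocs/alexanderzunic.de";
console.log(toRelativePath(root + "/search/index.htm", root));
// prints: /search/index.htm
```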
Next, the file's content (excluding all HTML tags) is extracted using function strip_html_tags():
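function strip_html_tags($txt) {
    $gen_tags = Array("head", "style", "script", "form");
    // Remove all elements whose text should not be searched, including their content.
    // \b prevents "head" from matching "header"; the s modifier lets . match line breaks.
    foreach ($gen_tags as $tag) {
        $txt = preg_replace("/<".$tag."\b[^>]*>.*?<\/".$tag.">/is", "", $txt);
    }
    // Remove the remaining tags but keep their text
    $txt = strip_tags($txt);
    return $txt;
}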
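First, the elements listed in $gen_tags (head, style, script and form) are removed together with their content using PHP function preg_replace(), because their text should not be searched. Then PHP function strip_tags() removes all remaining HTML tags from the text.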
The file's size is determined with PHP function filesize(). If no file title can be found, the title is set to the file's link.
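The two-stage tag removal described above can be tried out with the following JavaScript sketch. It is an analogue for illustration, not the PHP function itself: the name stripHtmlTags and the sample page are assumptions, and the regex-based approach is deliberately naive (for example, it does not handle a > character inside an attribute value).

```javascript
// Hypothetical JavaScript analogue of strip_html_tags().
function stripHtmlTags(html) {
  var blockTags = ["head", "style", "script", "form"];
  var text = html;
  blockTags.forEach(function (tag) {
    // Stage 1: remove the element together with its content
    var pattern = new RegExp("<" + tag + "\\b[^>]*>[\\s\\S]*?</" + tag + ">", "gi");
    text = text.replace(pattern, "");
  });
  // Stage 2: remove any remaining tags but keep their text
  return text.replace(/<[^>]*>/g, "");
}

var page = "<head><title>T</title></head><body><p>Hello <b>World</b></p>" +
           "<script>var x = 1;</script></body>";
console.log(stripHtmlTags(page));
// prints: Hello World
```

The order matters: if all tags were stripped first, the contents of the script and style elements would remain in the searchable text.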
- Search for searchstring by looping through the file content array
The searchable file content is stored in array $content. Function search_for_query() now searches this array for the search string:
function search_for_query($q, $n) {
    global $title;
    global $link;
    global $content;
    global $size;
    global $fileext;
    $output = "";
    $result = 0;
    // Escape regex metacharacters so the search string is matched literally
    $pattern = "/".preg_quote($q, "/")."/";
    if ($n > 0) {
        for ($j = 0; $j < $n; $j++) {
            if (preg_match($pattern, $content[$j])) {
                $result++;
                $output .= "<a href='".$link[$j]."' class='searchtitle'>";
                $output .= $title[$j]."</a><br>";
                $output .= "<span class='searchresult'>";
                $output .= substr($content[$j], 0, OUTPUT_LENGTH)."...</span><br>";
                $output .= "<span class='resultlink'>www.alexanderzunic.de".$link[$j]."</span>";
                $output .= "<span class='searchresult'> - Filesize: ".round((float)$size[$j]/1000,0)." kB</span>";
                $output .= "<br><br>";
            }
        }
    }
    print_results($result, $output);
}
The function first checks whether the number of searchable files is larger than 0. It then loops through all files and determines whether the search string occurs in each one. This is done with PHP function preg_match(), which searches for $pattern (i.e. the search string) in the content array. If the search is successful, the results counter is increased and the file's link, title, content and size are appended in HTML format to an output variable. The output of the file's content is limited to the first 500 characters (as defined in constant OUTPUT_LENGTH).
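The matching loop can be sketched independently of the HTML output. The JavaScript below is an illustrative analogue under assumptions: the names escapeRegExp and searchPages, the page objects and the sample texts are all hypothetical. It escapes regular-expression metacharacters first, so that a search string such as 3.50 or (incl. is matched literally instead of being interpreted as a pattern.

```javascript
// Hypothetical helper: escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical analogue of search_for_query(): returns the matching pages with a snippet.
function searchPages(pages, query, outputLength) {
  var pattern = new RegExp(escapeRegExp(query));
  var hits = [];
  pages.forEach(function (page) {
    if (pattern.test(page.content)) {
      hits.push({
        title: page.title,
        link: page.link,
        // Limit the shown content, like OUTPUT_LENGTH in the PHP script
        snippet: page.content.substring(0, outputLength) + "..."
      });
    }
  });
  return hits;
}

var pages = [
  { title: "Home", link: "/index.htm", content: "Welcome to my website." },
  { title: "Prices", link: "/prices.htm", content: "Each item costs 3.50 EUR (incl. tax)." }
];
var hits = searchPages(pages, "3.50", 20);
console.log(hits.length + " result: " + hits[0].title + " / " + hits[0].snippet);
// prints: 1 result: Prices / Each item costs 3.50...
```

Without the escaping step, the dot in 3.50 would match any character, and an unbalanced parenthesis in the search string would make the pattern invalid.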
- Output search results with echo
Finally, the search results (including the number of hits) are printed:
function print_results($r, $o) {
    $txt = "<html>";
    $txt .= "<p class='resultpages'>".$r." result pages were found </p><br>";
    $txt .= $o;
    $txt .= "</html>";
    echo $txt;
}

