Recently, while developing an extension for a client, we came across a simple requirement: insert a word after a given number of words in a text. The task is fairly easy, but it gets a bit tricky when the text is HTML and you have to preserve the HTML formatting. Ultimately this prompted us to develop our own function to do the trick.
First, let's analyse the problem. PHP has a standard function, str_word_count, which splits a given string into 'words'. It treats runs of alphabetic characters (plus apostrophes and hyphens) as words and lets you specify additional characters that should be considered part of a word. Depending on its format argument, it returns the number of words, an array of the words, or an associative array whose keys are the positions of the words in the original string. The problem is that the function knows nothing about HTML: it counts tag names as words and, worse, counts the words inside the tags (attribute names and values) as well. Is there a workaround? Yes, there is: once you have the result from str_word_count, you can iterate through the results array and eliminate the tags and the words inside them. You can also use preg_match_all to match the tags and then offset the differences from str_word_count. But both approaches add overhead, because the system has to loop over the data one or more extra times to come up with the answer.
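The miscount is easy to reproduce. The short sketch below uses a made-up sample string; the strip_tags call is our own addition for comparison and is not one of the workarounds described above.

```php
<?php
// Sample input chosen for illustration only.
$html = '<b>Hello</b> world';

// str_word_count() sees the tag name "b" twice, so it reports four words.
$raw = str_word_count($html, 1);
print_r($raw);                       // b, Hello, b, world

// Stripping the tags first gives the expected count of two, but any
// positions then refer to the stripped text, not to the original markup.
$plain = strip_tags($html);
echo str_word_count($plain), "\n";   // 2
```

Stripping tags is fine when only the count matters. It does not help when you need to know where a word ends inside the original HTML, which is what the insertion requirement calls for.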
That is why we developed str_word_count_html, a function that is almost identical to str_word_count but is HTML-aware. It takes the text as input, plus a parameter that selects the return value: the number of words, an array of the words, or an associative array in which each key is the position where the word ends within the input. Code follows below:
function str_word_count_html($text, $param = 0) {
    // Count the words in an HTML text, ignoring spaces inside tags
    // Input:  $text  - the HTML text
    // Input:  $param - format of the return value:
    //                  0 - number of words only
    //                  1 - array of words, keyed 1..n in order of appearance
    //                  2 - array of words, keyed by the offset just past the
    //                      last letter of each word inside the trimmed $text
    // Return: integer or array, depending on $param
    $text = trim($text);
    $length = strlen($text);
    $flag = 0;      // 1 while the scan is inside a tag
    $lastpos = 0;   // offset where the current word starts
    $words = array();
    $wordcount = 0;
    // Run one step past the end so that the final word is closed as well
    for ($i = 0; $i <= $length; $i++) {
        $letter = ($i < $length) ? $text[$i] : ' ';
        switch ($letter) {
            case '>':
                $flag = 0;
                break;
            case '<':
                $flag = 1;
                break;
            case ' ':
                if (!$flag) {
                    // Skip the empty 'words' produced by repeated spaces
                    if ($i > $lastpos) {
                        $wordcount++;
                        $k = ($param == 2) ? $i : $wordcount;
                        $words[$k] = substr($text, $lastpos, $i - $lastpos);
                    }
                    $lastpos = $i + 1;
                }
                break;
        }
    }
    if ($param) {
        return $words;
    }
    else {
        return $wordcount;
    }
}
Hope you will find this useful, and if you have any questions, you know where to contact us.
Note:
1. The function works with well-formed HTML only; it will not work properly with tags that are not properly closed.
2. The function relies on the signs < and > not appearing in their literal form in HTML text, but rather as the entities &lt; and &gt;. Unescaped < or > characters in the text may therefore cause problems.
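To tie this back to the original requirement, here is a minimal sketch of inserting a word after a given number of words using the same scanning idea. The helper name insert_after_words_html and the sample markup are our own assumptions for illustration; it is a self-contained variant, not part of the function above, and the same two notes apply to it.

```php
<?php
// Hypothetical helper: insert $insert after the first $n words of $html.
// Spaces between '<' and '>' are treated as part of the tag, not as
// word boundaries.
function insert_after_words_html(string $html, string $insert, int $n): string
{
    $inTag = false;   // true while the scan is inside a tag
    $count = 0;       // words completed so far
    $start = 0;       // offset where the current word began
    $length = strlen($html);

    for ($i = 0; $i < $length; $i++) {
        $char = $html[$i];
        if ($char === '<') {
            $inTag = true;
        } elseif ($char === '>') {
            $inTag = false;
        } elseif ($char === ' ' && !$inTag) {
            if ($i > $start) {          // ignore repeated spaces
                $count++;
                if ($count === $n) {
                    // Splice the new text in right after the n-th word.
                    return substr($html, 0, $i) . ' ' . $insert . substr($html, $i);
                }
            }
            $start = $i + 1;
        }
    }
    // No space follows the n-th word (or there are fewer words): append.
    return $html . ' ' . $insert;
}

$result = insert_after_words_html(
    '<p class="intro lead">One <b>two</b> three</p>',
    '<i>NEW</i>',
    2
);
echo $result, "\n";
// <p class="intro lead">One <b>two</b> <i>NEW</i> three</p>
```

Note that the space inside the class attribute is not counted as a word boundary, and the tags wrapped around "two" stay attached to it. When the requested position falls on the last word, this sketch simply appends to the end of the string, which may place the new text outside the closing tag.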