Tag Archives: php

Script to shorten the long urls present in your status or text

I described in my last two posts http://tinyurl.com/62pf6td and http://tinyurl.com/5tpd9rk how to identify a URL in some text and how to make TinyURL API calls. In this post, I describe a simple script that takes some text which may contain several URLs, identifies the URLs present, and replaces each one with a short URL obtained through tinyurl.com API calls.

First, I created an HTML form to take the input text.

<form method="post" action="status.php">
 <p> Status: <br />
 <textarea rows="4" cols="90" name="status"></textarea>
 </p>
 <p> <input type="submit" value="Shorten urls"/> </p>
</form>

Then, I wrote a function that finds the URLs in the text and replaces each one with the short URL returned by the tinyurl.com API call.

function tweet_status($status){
    // Matches URLs that start with http://, https:// or www.
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if (empty($status)) {
        return $status;
    }
    // Replace every match in a single pass. Calling str_replace() once per
    // match goes wrong when a URL repeats or is a prefix of another URL.
    return preg_replace_callback("/$pattern/", function ($match) {
        return shortURL($match[0]);
    }, $status);
}
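
The shortURL() helper called above comes from the earlier post and is not repeated here. As a rough stand-in, here is a minimal sketch, assuming TinyURL's plain-text api-create.php endpoint. The optional $fetch parameter is a hypothetical addition so that the function can be tried without network access; it is not part of the original helper.

```php
<?php
// Sketch of a shortURL() helper (an assumption, not the original code).
// $fetch is any callable that takes a URL and returns the response body;
// it defaults to file_get_contents(), which needs allow_url_fopen enabled.
function shortURL($url, $fetch = 'file_get_contents') {
    // Assumed endpoint: returns the short URL as plain text
    $api = 'http://tinyurl.com/api-create.php?url=' . urlencode($url);
    $short = @call_user_func($fetch, $api);

    // If the call fails or returns nothing, keep the original URL
    if ($short === false || $short === null || trim($short) === '') {
        return $url;
    }
    return trim($short);
}
```

Falling back to the original URL means a failed API call leaves the status text unchanged rather than dropping the link.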

The code that reads the text from the form's textarea and calls tweet_status() is:

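// Read the submitted text from the form instead of relying on register_globals
$status = isset($_POST['status']) ? trim($_POST['status']) : '';
$short_status = tweet_status($status);
if(!empty($status)){
    $length = strlen($short_status);
    /* This function call may be skipped as url gets hyperlinked
    by itself when copied to twitter status field */
    // Escape the user's text before echoing it back as HTML
    $short_status = hyperlink(htmlspecialchars($short_status));
    echo "<p>Shortened status:<br /> $short_status</p>";
    echo "<p>Characters now: $length</p>";
}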

You may have noticed the call to the function hyperlink(). It wraps each URL in an anchor tag so that the URL becomes a clickable link. The code for it is as follows:

function hyperlink($status){
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if (empty($status)) {
        return $status;
    }
    // Wrap every match in a single pass so a repeated URL is not wrapped twice
    return preg_replace_callback("/$pattern/", function ($match) {
        $url = $match[0];
        // A bare www. address needs a scheme, or the browser treats it as a relative link
        $href = (strpos($url, 'www.') === 0) ? "http://$url" : $url;
        return "<a href=\"$href\">$url</a>";
    }, $status);
}
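
A note on replacing matches one at a time with str_replace(): when the same URL occurs more than once, each call replaces every occurrence, including the copies already sitting inside the anchor tags from the previous round. The sketch below uses a made-up input string to compare that approach with a single pass of preg_replace_callback(); it is only an illustration, not a full link formatter.

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

// Hypothetical input in which the same URL appears twice
$text = "see http://a.com/x and http://a.com/x";

// Approach 1: one str_replace() call per match
$looped = $text;
preg_match_all("/$pattern/", $text, $matches);
foreach ($matches[0] as $url) {
    $looped = str_replace($url, "<a href=\"$url\">$url</a>", $looped);
}

// Approach 2: a single pass, each match is rewritten exactly once
$single = preg_replace_callback("/$pattern/", function ($m) {
    return "<a href=\"{$m[0]}\">{$m[0]}</a>";
}, $text);

echo substr_count($looped, "<a "), "\n"; // 6 opening tags, nested and broken
echo substr_count($single, "<a "), "\n"; // 2 opening tags, one per URL
```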

You can try this script at http://tinyurl.com/4zpvrg7. You may have noticed that all the URLs mentioned in this post are of the form tinyurl.com; they were all obtained by using this script.

This script can be used to prepare a status containing several URLs without having to shorten each long URL individually through a URL shortener web service. I shall add a "Tweet button" to this script soon so that the shortened status can be tweeted directly, eliminating the need to copy the output text and paste it into twitter.com.

Writing a regular expression (regex) for a web url in php

I use the TweetDeck client to access my Twitter account and post status updates. When I paste a web URL into a tweet, it is first shortened using a URL shortener service like bit.ly or tiny.cc, and then the shortened URL becomes hyperlinked.

Naturally, this process can be divided into three parts.

  1. Recognizing whether there is a web URL in the text area and, if so, extracting it.
  2. Calling a URL shortener service API to shorten the extracted URL.
  3. Replacing the web URL text with hyperlinked HTML text for the shortened URL.

In this post, I look at step 1 of this process: how to identify whether there is a web URL in the text. To match a web URL in the text, I use the PHP regex matching function preg_match(), so the question was how to write a regular expression for a web URL. I observed which web URLs the TweetDeck client hyperlinks automatically. I found that it recognizes http://something.com, https://somethingelse.com and www.something.com, but doesn't treat something.com as a web URL.

So, the web URL should start with http://, https:// or www followed by a dot. The corresponding regular expression for this is:

$pattern = "(((https?)\:\/\/)|(www\.))";

Next comes the regex for what follows the scheme. The response to this question asked on stackoverflow.com helps in writing the regex for this part, which is based on the following points:

  • string must start with an ASCII letter or number
  • ASCII letters, numbers, dots and dashes follow (no slashes or colons allowed)
  • optional: a port is allowed (":8080")
  • optional: anything after a slash may follow except space

The regex for this part is concatenated with the first part as described below.

$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";
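
To see how the pieces of the pattern line up with the points above, here is a small sketch that prints the capture groups for one made-up URL, using preg_match() as mentioned earlier. The sample text is hypothetical; the group numbers follow from counting the opening parentheses in the pattern (group 5 is the port, group 6 is the path).

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

// Hypothetical sample text containing a port and a path
$text = "docs at http://example.com:8080/a/b?c=1 for now";

if (preg_match("/$pattern/", $text, $m)) {
    echo "whole:  {$m[0]}\n"; // http://example.com:8080/a/b?c=1
    echo "prefix: {$m[1]}\n"; // http://
    echo "port:   {$m[5]}\n"; // :8080
    echo "path:   {$m[6]}\n"; // /a/b?c=1
}
```

The path group stops at the first space, which is why " for now" is left out of the match.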

Now, you can call preg_match("/$pattern/",$text,$result), where $text contains the original text and $result receives the result. $result[0] contains the matched string if there is a match. However, preg_match() stops at the first match of $pattern in $text. If there may be more than one URL in $text, use preg_match_all() to match all the URLs in $text.
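
The difference between the two functions can be sketched on a text with two URLs. The variable names $count and $matches are just illustrative choices.

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

$text = "tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt";

// preg_match() stops after the first match
preg_match("/$pattern/", $text, $result);
echo $result[0], "\n"; // http://technoflirt.com/tech

// preg_match_all() returns the number of matches and collects
// every full match in $matches[0]
$count = preg_match_all("/$pattern/", $text, $matches);
echo $count, "\n"; // 2
foreach ($matches[0] as $url) {
    echo "URL: $url\n";
}
```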

Here's the PHP script, which takes a block of text in quotes as its command-line argument and prints the URLs matched, if any.

<?php
 $pattern = "(((https?)\:\/\/)|(www\.))";
 $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

 // Text to scan, passed as the first command-line argument
 $text = isset($argv[1]) ? $argv[1] : "";
 echo "$text\n";
 if (preg_match_all("/$pattern/",$text,$matches)){
     echo "Match\n";
     foreach($matches[0] as $url)
         echo "URL: $url\n";
 }
 else
     echo "Not match\n";
?>

Here are a few sample runs of this script.

  1. php regex.php "tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt"
    tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt
    Match
    URL: http://technoflirt.com/tech
    URL: http://technoflirt.com/noflirt
  2. php regex.php "tf technoflirt.com"
    tf technoflirt.com
    Not match
  3. php regex.php "tf www.technoflirt.com"
    tf www.technoflirt.com
    Match
    URL: www.technoflirt.com


Running a php script with compatibility to php4/php5 installed on host machine

I had to write a PHP script to do some processing on XML input. I ran the code using this line:

php <script_name>.php

The code was working fine on Mac OSX. So, I checked in the code and later checked it out on a Linux machine running RHEL 4.6. When I ran the same command on the Linux machine, it showed errors such as unknown function. So, I checked the PHP versions installed on the Mac OSX machine and the RHEL machine using the command below.

php -v

The output on Mac OSX was:

PHP 5.3.2 (cli) (built: Aug  7 2010 00:04:41)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies

The output on RHEL machine was:

PHP 4.3.9 (cgi) (built: Sep 12 2007 11:09:31)
Copyright (c) 1997-2004 The PHP Group
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies

So, the difference in PHP versions was the reason why the code written on Mac OSX, which was actually a PHP5 script, didn't run on the RHEL machine, which had PHP4 installed. I had to modify the original PHP5 script to create a new PHP4 script, which then ran as desired on the RHEL machine.

I wanted to run the PHP script regardless of the PHP version installed on the machine on which I am running it. So, I wrote a shell script which extracts the PHP version installed on the machine and then selects the PHP script compatible with that version. The script I wrote is below.

# Variables to read php version
php4_version=`php -v | grep "PHP 4"`
php5_version=`php -v | grep "PHP 5"`

# Exactly one of the variables should have a length greater than 0
if [ ${#php4_version} -gt 0 ]
then
  file="<php4_compatible_script>.php"
elif [ ${#php5_version} -gt 0 ]
then
  file="<php5_compatible_script>.php"
else
  echo "Error: Unexpected php version"
  # Stop here, otherwise php would be run below with an empty file name
  exit 1
fi
# Executing the php script
php "$file"

The script works by extracting the version from 'php -v' using the 'grep' command. On a PHP4 machine, the php5_version variable is empty, and on a PHP5 machine, the php4_version variable is empty. Based on this, we check the length of each variable to select the PHP script that is compatible with the machine. Then, in the last step, we execute that PHP script.
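
One possible alternative that avoids parsing the 'php -v' output is to let PHP choose the script itself. The sketch below is only an illustration under assumptions: the file names script_php4.php and script_php5.php and the pick_script() helper are hypothetical, and treating every version from 5.0.0 upward as PHP5 is a choice the shell script above does not make. It relies on the built-in PHP_VERSION constant and version_compare(), and on the fact that an included file is parsed only when the include statement actually runs.

```php
<?php
// Map a PHP version string to the script written for it.
// The file names are placeholders, not the real scripts.
function pick_script($version) {
    if (version_compare($version, '5.0.0', '>=')) {
        return 'script_php5.php';
    }
    if (version_compare($version, '4.0.0', '>=')) {
        return 'script_php4.php';
    }
    return null; // older than PHP 4: nothing suitable
}

$file = pick_script(PHP_VERSION);
if ($file === null) {
    fwrite(STDERR, "Error: Unexpected php version\n");
    exit(1);
}
echo "Would select: $file\n";
// require $file; // uncomment once the two real scripts exist
```

Because the dispatcher itself uses only syntax that PHP4 understands, the PHP5-only file is never parsed on a PHP4 machine.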

Do let me know of better ways of achieving this result by leaving a comment.