Category Archives: server scripting

Script to shorten the long urls present in your status or text

In my last two posts, http://tinyurl.com/62pf6td and http://tinyurl.com/5tpd9rk, I described how to identify a url in some text and how to make tinyurl API calls. In this post, I describe a simple script that takes text containing one or more urls as input, identifies the urls present, and replaces each of them with a tinyurl obtained through the tinyurl.com API.

First, I created an HTML form to take the input text.

<form method="post" action="status.php">
 <p> Status: <br />
 <textarea rows="4" cols="90" name="status"></textarea>
 </p>    
 <p> <input type="submit" value="Shorten urls"/> </p>
 </form>

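Then, I wrote a function that finds the urls in the text and replaces each one with the short url returned by the tinyurl.com API call, which is made through the shortURL() function from the earlier post.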

function tweet_status($status){
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if(empty($status)){
        return $status;
    }
    // Replace every url in a single pass, so that a url which is a prefix
    // of another url is not replaced inside the longer one.
    return preg_replace_callback("/$pattern/", function($matches){
        return shortURL($matches[0]);
    }, $status);
}

The code that reads the text from the form's text area and calls the above function is:

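// Read the submitted text from the form (method="post", name="status").
$status = isset($_POST['status']) ? trim($_POST['status']) : '';
$short_status = tweet_status($status);
if(!empty($status)){
    $length = strlen($short_status);
    // Escape the user's text before it is echoed back as HTML.
    $short_status = htmlspecialchars($short_status);
    /* This function call may be skipped as url gets hyperlinked
    by itself when copied to twitter status field */
    $short_status = hyperlink($short_status);
    echo "<p>Shortened status:<br /> $short_status</p>";
    echo "<p>Characters now: $length</p>";
}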

You may have noticed the call to the function hyperlink(). It wraps each url in an anchor tag so that the url becomes a clickable link. The code for it is as follows:

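function hyperlink($status){
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if(empty($status)){
        return $status;
    }
    // Wrap every url in a single pass; replacing inside a loop would wrap
    // a repeated url twice and produce nested anchor tags.
    return preg_replace_callback("/$pattern/", function($matches){
        $url = $matches[0];
        // A bare "www." link needs a scheme, or the browser treats it as relative.
        $href = (strpos($url, "www.") === 0) ? "http://$url" : $url;
        return "<a href=\"$href\">$url</a>";
    }, $status);
}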

You can try out this simple script at http://tinyurl.com/4zpvrg7 . You may have noticed that all the urls mentioned in this post are of the form tinyurl.com. All of them were obtained using this script.

This script can be used to prepare a status update containing a few urls without having to convert each long url individually with a url shortener web service. I shall add a "Tweet button" to this script soon so that the shortened status can be tweeted directly, eliminating the need to copy the output text and paste it into twitter.com.

Using tinyurl.com api calls for shortening long urls

TinyURL is a url shortener service (you may also call it a url crusher service) which takes in long urls and converts them into very short urls. With the increasing popularity of Twitter and its limit of 140 characters per tweet, url shortening services have gained popularity over the past few years. There are now more than 100 url shortening services at a user's disposal.

Almost all such services let developers make API calls that return a short url for a given long url, but they also require an account and an API key. However, tinyurl.com requires neither an account nor an API key.

All one has to do is put the following function into a function.php file, which can then be included in any other PHP file that needs to call the tinyurl API.

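<?php
function shortURL($url) {
    $ch = curl_init();
    $timeout = 5;
    // urlencode() keeps characters such as ? and & in the long url from
    // being read as part of the API request's own query string.
    curl_setopt($ch,CURLOPT_URL,'http://tinyurl.com/api-create.php?url='.urlencode($url));
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
    $short_url = curl_exec($ch);
    curl_close($ch);
    // curl_exec() returns false on failure; fall back to the original url.
    return ($short_url === false) ? $url : $short_url;
}
?>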

To get the short url for a long url, one has to simply call this function as below:

$url = "http://technoflirt.com/tech/2011/01/11/running-php-script-host/";
$short_url = shortURL($url);
echo "Actual url = " . $url . "<br />";
echo "Shortened url = <a href=\"$short_url\">$short_url</a> <br />";

Note how the short url is turned into a hyperlink in the last statement. However, the tinyurl.com domain itself takes up more characters than those of other services such as bit.ly, go.af etc., so those services are now often preferred. The advantage of tinyurl.com is that, unlike bit.ly and others, it doesn't put a rate limit on API calls.
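A note on the request url that shortURL() builds: the long url travels as the value of the url parameter, so any ? or & inside it can be mistaken for part of the request's own query string unless it is percent-encoded. The sketch below only builds the request string and makes no network call; build_request_url() and the example.com address are made up for illustration.

```php
<?php
// build_request_url() is a hypothetical helper, not part of the tinyurl API.
function build_request_url($long_url) {
    // Percent-encode the long url so its own ? and & stay inside the url parameter.
    return 'http://tinyurl.com/api-create.php?url=' . urlencode($long_url);
}

$long_url = "http://example.com/?a=1&b=2";   // made-up url with a query string
$request  = build_request_url($long_url);
echo $request . "\n";
// prints http://tinyurl.com/api-create.php?url=http%3A%2F%2Fexample.com%2F%3Fa%3D1%26b%3D2
```

Without urlencode(), the &b=2 part would most likely be read as a second parameter of the API request rather than as part of the long url.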

Writing regular expression(regex) for a web url in php

I use the TweetDeck client to access my Twitter account and post status updates. When I paste a web url into a tweet, it is first shortened using a url shortener service like bit.ly or tiny.cc, and then the shortened url is hyperlinked.

Naturally, this process can be divided into three parts.

  1. Recognizing there is a web url in the text area and if yes, extracting it.
  2. Calling a url shortener service api to shorten the extracted url.
  3. Replacing the web url text with hyperlinked html text for shortened url.

In this post, I look at step 1 of this process: how to identify whether there is a web url in the text. To match a web url in the text, I use the PHP regex matching function preg_match(). So, the question was how to write a regular expression for a web url. I observed which kinds of web urls the TweetDeck client hyperlinks automatically. I found that it recognizes http://something.com, https://somethingelse.com and www.something.com, but doesn't consider something.com a web url.

So, the web url should start with either http://, https:// or "www." (including the dot). The corresponding regular expression for this is:

$pattern = "(((https?)\:\/\/)|(www\.))";

Next comes the regex for what follows the scheme. The response to this question asked on stackoverflow.com helps in writing the regex for this part, which is based on the following points:

  • string must start with an ASCII letter or number
  • ASCII letters, numbers, dots and dashes follow (no slashes or colons allowed)
  • optional: a port is allowed (":8080")
  • optional: anything after a slash may follow except space

The regex for this part is concatenated with the first part as described below.

$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";
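To check the combined pattern against the points above, here is a small sketch that runs it on a made-up sentence and prints the whole match along with the optional port and path groups. The sample text and the example.com address are assumptions for illustration; the group numbers follow from counting the opening parentheses in the pattern.

```php
<?php
$pattern  = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

// Hypothetical sample text containing a url with a port and a path.
$text = "docs at https://example.com:8080/path?q=1 for now";

if (preg_match("/$pattern/", $text, $m)) {
    $url  = $m[0];  // whole match
    $port = $m[5];  // fifth group: the optional port, including the colon
    $path = $m[6];  // sixth group: the optional path, up to the next space
    echo "URL:  $url\n";
    echo "Port: $port\n";
    echo "Path: $path\n";
}
```

The match stops at the space after q=1, since the path part accepts anything except a space.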

Now, you can call preg_match("/$pattern/",$text,$result), where $text contains the original text and $result receives the result. $result[0] contains the matched string if there is a match. However, preg_match() returns only the first match of $pattern in $text. If there may be more than one url in $text, use preg_match_all() to match all the urls in $text.
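The difference between the two functions can be seen in a minimal sketch like the one below, which runs both on the same made-up text containing two urls. The sample text is an assumption for illustration; the pattern is the one built above.

```php
<?php
$pattern  = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

// Hypothetical sample text with two urls.
$text = "see http://example.com/a and www.example.org";

// preg_match() stops at the first match.
preg_match("/$pattern/", $text, $result);
$first = $result[0];

// preg_match_all() returns the number of matches and
// collects the full matches in $matches[0].
$count = preg_match_all("/$pattern/", $text, $matches);
$all = $matches[0];

echo "First: $first\n";
echo "All ($count): " . implode(", ", $all) . "\n";
```

Here $first holds only http://example.com/a, while $all holds both urls.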

Here's the PHP script, which takes a block of text in quotes as its command-line argument and prints the urls matched, if any.

<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

$text = $argv[1];
echo "$text\n";
if (preg_match_all("/$pattern/",$text,$matches)){
    echo "Match\n";
    foreach($matches[0] as $url)
        echo "URL: $url\n";
}
else
    echo "Not match\n";
?>

Here are a few sample runs of this script.

  1. php regex.php "tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt"
    tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt
    Match
    URL: http://technoflirt.com/tech
    URL: http://technoflirt.com/noflirt
  2. php regex.php "tf technoflirt.com"
    tf technoflirt.com
    Not match
  3. php regex.php "tf www.technoflirt.com"
    tf www.technoflirt.com
    Match
    URL: www.technoflirt.com

Some useful links I encountered while trying to solve this problem: