
Script to shorten the long urls present in your status or text

In my last two posts, http://tinyurl.com/62pf6td and http://tinyurl.com/5tpd9rk, I described how to identify a url in some text and how to make tinyurl API calls. In this post, I describe how I went on to build a simple script that takes some text as input, which may contain several urls, identifies the urls in it, and replaces each one with a tinyurl using the tinyurl.com API.

First, I created an HTML form to take the input text.

<form method="post" action="status.php">
 <p> Status: <br />
 <textarea rows="4" cols="90" name="status"></textarea></p>
 <p> <input type="submit" value="Shorten urls" /> </p>
</form>

Then, I wrote a function that finds the urls in the text, shortens each one with a tinyurl.com API call, and replaces it in the text.

function tweet_status($status){
    $result = $status;
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if (preg_match_all("/$pattern/", $status, $matches)){
        // array_unique(): shorten each distinct url only once
        foreach (array_unique($matches[0]) as $url){
            // shortURL() makes the tinyurl.com API call
            $short_url = shortURL($url);
            $result = str_replace($url, $short_url, $result);
        }
    }
    return $result;
}
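The str_replace() loop in tweet_status() can go wrong when one url is a prefix of another. With http://example.com/a and http://example.com/ab in the same status, replacing the shorter url first also rewrites the beginning of the longer one, which is then never found. A minimal sketch of one way around this, assuming PHP 7.4 or later, replaces every match in a single pass with preg_replace_callback(). The shortener is passed in as a callable so the function can be tried without network access; shorten_urls() and URL_PATTERN are hypothetical names, not part of the original script.

```php
<?php
// Same url pattern as tweet_status(), written as one delimited regex.
const URL_PATTERN = '/(((https?)\:\/\/)|(www\.))[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?/';

// Hypothetical helper: replace every url in $status with $shorten($url).
// Each match is replaced in place, so overlapping urls cannot collide.
function shorten_urls(string $status, callable $shorten): string
{
    $cache = [];  // shorten each distinct url only once
    return preg_replace_callback(
        URL_PATTERN,
        function (array $m) use ($shorten, &$cache): string {
            $url = $m[0];
            if (!isset($cache[$url])) {
                $cache[$url] = $shorten($url);
            }
            return $cache[$url];
        },
        $status
    );
}

// Offline demo with a stub shortener that just reports each url's length.
echo shorten_urls("a http://example.com/a b http://example.com/ab",
                  fn(string $u): string => "[" . strlen($u) . "]"), "\n";
```

In the real script you would pass the existing shortURL() function instead of the stub, e.g. shorten_urls($status, 'shortURL'). The small cache also means a url that appears twice costs only one API call.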

The code in status.php that reads the text from the form's textarea and calls the above function is:

$status = isset($_POST['status']) ? trim($_POST['status']) : '';
$short_status = tweet_status($status);
$length = strlen($short_status);
/* This function call may be skipped, as urls get hyperlinked
   automatically when pasted into the twitter status field */
$short_status = hyperlink($short_status);
echo "<p>Shortened status:<br /> $short_status</p>";
echo "<p>Characters now: $length</p>";
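Two details of this snippet are worth tightening. The status is echoed back without escaping, so any HTML typed into the textarea is rendered as markup, and strlen() counts bytes rather than characters, so a status with accented letters reports a larger count than it visibly has. A minimal sketch, assuming UTF-8 input, escapes the text with htmlspecialchars() before adding links and counts characters with a /u regex. render_status() and link_urls() are hypothetical names; link_urls() is a simplified stand-in for hyperlink().

```php
<?php
// Hypothetical stand-in for hyperlink(): wrap each url in an anchor tag in one pass.
function link_urls(string $html): string
{
    $pattern = '/(((https?)\:\/\/)|(www\.))[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?/';
    return preg_replace($pattern, '<a href="$0">$0</a>', $html);
}

// Escape the user's text first, then add links; report characters, not bytes.
function render_status(string $short_status): string
{
    // Counts UTF-8 characters; mb_strlen($short_status, 'UTF-8') gives the
    // same result when the mbstring extension is installed.
    $length = preg_match_all('/./su', $short_status);
    $safe = htmlspecialchars($short_status, ENT_QUOTES, 'UTF-8');
    return "<p>Shortened status:<br /> " . link_urls($safe) . "</p>\n"
         . "<p>Characters now: $length</p>";
}

echo render_status("café <b> http://example.com/?a=1&b=2"), "\n";
```

Escaping before linking is deliberate: an & inside a url becomes &amp;, which is the correct way to write it inside an href attribute.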

You may have noticed the call to hyperlink(). This function wraps each url in an HTML anchor tag so that it becomes a clickable link. Its code is as follows:

function hyperlink($status){
    $pattern = "(((https?)\:\/\/)|(www\.))";
    $pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

    if (preg_match_all("/$pattern/", $status, $matches)){
        // array_unique() stops a repeated url from being wrapped twice
        foreach (array_unique($matches[0]) as $url){
            $hyperlink_url = "<a href=\"$url\">$url</a>";
            $status = str_replace($url, $hyperlink_url, $status);
        }
    }
    return $status;
}
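One more detail in hyperlink(): a url matched through the www. branch, such as www.something.com, ends up as href="www.something.com", which a browser resolves as a relative link on the current site. A minimal sketch of a single-pass alternative, assuming http:// is an acceptable default scheme, uses preg_replace_callback() and adds http:// to the href (but not to the visible text) when the url has none. Working match by match also avoids wrapping a repeated url twice. hyperlink_single_pass() is a hypothetical name.

```php
<?php
// Hypothetical single-pass variant of hyperlink().
function hyperlink_single_pass(string $status): string
{
    $pattern = '/(((https?)\:\/\/)|(www\.))[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?/';
    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Group 2 holds "http://" or "https://"; it is empty for www. urls,
        // so those get an explicit scheme in the href only.
        $href = ($m[2] === '') ? "http://$url" : $url;
        return "<a href=\"$href\">$url</a>";
    }, $status);
}

echo hyperlink_single_pass("see www.example.com and https://example.org/x"), "\n";
```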

You can try out this simple script at http://tinyurl.com/4zpvrg7. You may have noticed that all the urls in this post are tinyurl.com urls; all of them were generated with this script.

This script can be used to prepare a status containing a few urls without converting each long url individually through a url shortener web service. I shall soon add a "Tweet button" to this script so that the shortened status can be tweeted directly, which will save the user from copying the output text and pasting it into twitter.com.

Writing a regular expression (regex) for a web url in PHP

I use the TweetDeck client to access my Twitter account and post status updates. When I paste a web url into a tweet, TweetDeck first shortens it using a URL shortener service such as bit.ly or tiny.cc, and then the shortened url becomes hyperlinked.

Naturally, this process can be divided into three parts.

  1. Recognizing whether there is a web url in the text area and, if so, extracting it.
  2. Calling a url shortener service API to shorten the extracted url.
  3. Replacing the web url text with hyperlinked HTML for the shortened url.

In this post, I look into step 1 of this process, that is, how to identify whether there is a web url in the text. To detect a web url in the text, I use the PHP regex matching function preg_match(). So the question was how to write a regular expression for a web url. I observed which kinds of web urls the TweetDeck client hyperlinks automatically, and found that it recognizes http://something.com, https://somethingelse.com and www.something.com, but does not treat something.com as a web url.

So a web url should start with http://, https:// or www. The corresponding regular expression is:

$pattern = "(((https?)\:\/\/)|(www\.))";
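As a quick check of this first part on its own, the sketch below anchors the pattern with ^ and tries a few prefixes (the helper starts_like_url() and the sample strings are hypothetical). Only the three accepted prefixes pass; something.com and ftp:// urls do not.

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";

// Hypothetical helper: does $text begin with one of the accepted prefixes?
function starts_like_url(string $pattern, string $text): bool
{
    return preg_match("/^$pattern/", $text) === 1;
}

foreach (["http://something.com", "https://somethingelse.com",
          "www.something.com", "something.com", "ftp://something.com"] as $t) {
    echo $t, " => ", starts_like_url($pattern, $t) ? "yes" : "no", "\n";
}
```

The escaped dot in www\. matters: without the backslash, a string such as wwwxsomething.com would also pass.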

Next comes the regex for what follows the scheme (or the www. prefix). The answer to this question asked on stackoverflow.com helps in writing the regex for this part, which is based on the following points:

  • the string must start with an ASCII letter or number
  • ASCII letters, numbers, dots and dashes follow (no slashes or colons allowed)
  • optional: a port is allowed (":8080")
  • optional: a slash, followed by any characters except a space

The regex for this part is concatenated with the first part as described below.

$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";
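One way to check the combined pattern against the four rules above is to anchor it at both ends with ^ and $, so that only strings that are a url in their entirety count, and run it on a few examples. The sample strings below are illustrative and not from the original post.

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

// Each string is paired with whether the whole string should be a url.
$cases = [
    "http://example.com"         => true,  // letters, digits and dots
    "http://my-site.example.com" => true,  // dashes allowed after the first character
    "http://example.com:8080"    => true,  // optional port
    "http://example.com/a/b?c=d" => true,  // anything after a slash except space
    "http://-example.com"        => false, // must start with a letter or digit
    "http://example.com/a b"     => false, // a space ends the url
];

foreach ($cases as $text => $expected) {
    $whole = preg_match("/^$pattern$/", $text) === 1;
    printf("%-30s %s\n", $text, $whole === $expected ? "ok" : "UNEXPECTED");
}
```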

Now you can call preg_match("/$pattern/", $text, $result), where $text contains the original text; if there is a match, $result[0] contains the matched string. However, preg_match() stops at the first match of $pattern in $text. If $text may contain more than one url, use preg_match_all("/$pattern/", $text, $matches) instead, which collects all the matched urls in $matches[0].
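The difference between the two functions is easiest to see side by side. In this sketch, with a made-up sample text, preg_match() returns 1 and fills $result with the first url only, while preg_match_all() returns the number of matches and collects every url in $matches[0].

```php
<?php
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

$text = "read http://example.com/one and www.example.org today";

// preg_match(): 1 if found, 0 if not; $result[0] is the first match only.
$found = preg_match("/$pattern/", $text, $result);
echo "preg_match: $found match, first url = {$result[0]}\n";

// preg_match_all(): number of matches; $matches[0] lists all of them.
$count = preg_match_all("/$pattern/", $text, $matches);
echo "preg_match_all: $count matches\n";
foreach ($matches[0] as $url) {
    echo "  $url\n";
}
```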

Here's the PHP script, which takes a block of text in quotes as its command-line argument and prints the urls it matches, if any.

<?php
// regex.php: print the urls found in the text passed as the first argument
$pattern = "(((https?)\:\/\/)|(www\.))";
$pattern .= "[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/[^ ]*)?";

$text = $argv[1];
echo "$text\n";
if (preg_match_all("/$pattern/", $text, $matches)){
    foreach ($matches[0] as $url)
        echo "URL: $url\n";
} else {
    echo "Not match\n";
}

Here are a few sample runs of this script.

  1. php regex.php "tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt"
    tf http://technoflirt.com/tech ac http://technoflirt.com/noflirt
    URL: http://technoflirt.com/tech
    URL: http://technoflirt.com/noflirt
  2. php regex.php "tf technoflirt.com"
    tf technoflirt.com
    Not match
  3. php regex.php "tf www.technoflirt.com"
    tf www.technoflirt.com
    URL: www.technoflirt.com
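The sample runs separate urls with spaces, which is the case the pattern was written for. Two cases they don't exercise: [^ ]* excludes only the space character, so in text typed into a textarea a path can run across a newline or tab into the next word, and punctuation right after a url (as in "www.technoflirt.com.") becomes part of the match. A minimal sketch of one possible refinement uses \S in place of [^ ] and then trims trailing .,;:!?) characters. The trimming rule is an assumption and will be wrong for urls that really end in such a character, such as some Wikipedia links ending in ")"; extract_urls() is a hypothetical name.

```php
<?php
// Hypothetical refinement of the post's pattern: \S stops at any whitespace
// (space, tab, newline), not just at the space character.
function extract_urls(string $text): array
{
    $pattern = '/(((https?)\:\/\/)|(www\.))[A-Za-z0-9][A-Za-z0-9.-]+(:\d+)?(\/\S*)?/';
    preg_match_all($pattern, $text, $matches);
    // Assumption: trailing sentence punctuation belongs to the text, not the url.
    return array_map(fn(string $url): string => rtrim($url, '.,;:!?)'), $matches[0]);
}

print_r(extract_urls("tf www.technoflirt.com.\nsee http://technoflirt.com/tech\nbye"));
```

With the original [^ ]* pattern, the same input would yield "www.technoflirt.com." and a second match that runs past the newline into "bye".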

Some useful links I encountered while trying to solve this problem: