How to Automatically Linkify Text with PHP Regular Expressions

Good software lets us take a lot of niceties for granted. Intelligent interfaces handle the simple tasks so that we don’t need to worry about them. For example, when I type “www.desktopped.com” into an email or an instant message, I expect it to be clickable on the other end without my having to add HTML tags by hand. Another example is text from a Twitter feed: in “@desktopped is a blog about #computers”, we expect both @desktopped and #computers to be links.

The ability to “linkify” text is a great tool to have when developing a blog or website. Possible uses include:

  • Making URLs clickable in content, comments, and anywhere else
  • Making valid email addresses clickable
  • Making Twitter text clickable so that @desktopped, #computers, and www.desktopped.com all become links

The ability to search a string for patterns, such as substrings that begin with “http://” or “@”, can be applied in almost endless ways to improve how we process and display data.

How can we do this? In PHP, the best tool for the job is regular expressions, a widely supported pattern-matching syntax, combined with a few built-in functions such as preg_replace.

Regular Expressions Basics

A regular expression is a pattern that describes a set of strings by combining ordinary characters, which match themselves, with special characters that have their own meanings.

The Basic Special Characters

  • | separates alternatives; the pattern matches if either side matches. For example, hi|hello matches the strings “hi” and “hello”.
  • () group parts of a pattern and set the order of operations. For example, br(i|y)an matches both “brian” and “bryan”.
  • [] match a single character from the set inside the brackets. [abc] matches “a”, “b”, or “c”, but not “d”.
  • * matches zero or more of the preceding element. The pattern go*gle matches “ggle”, “gogle”, “google”, “gooogle”, etc.
  • + matches one or more of the preceding element. The pattern go+gle matches “gogle”, “google”, “gooogle”, etc.
  • ? matches zero or one of the preceding element. The pattern desktopp?ed matches both “desktopped” and “desktoped”.
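To try the patterns above, here is a minimal sketch using PHP’s preg_match. The helper name matches_whole is hypothetical. It wraps each pattern in ^(?:...)$ so that only a match on the whole string counts, because preg_match on its own also reports a match found in the middle of a longer string; the (?:...) group keeps | from splitting the two anchors apart.

```php
<?php
// Hypothetical helper: true only when the WHOLE subject matches $pattern.
function matches_whole(string $pattern, string $subject): bool
{
    // preg_match() returns 1 on a match, 0 on no match (false on error).
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

// The patterns from the list above, each paired with a sample subject.
$cases = [
    ['hi|hello', 'hello'],
    ['br(i|y)an', 'bryan'],
    ['[abc]', 'd'],
    ['go*gle', 'ggle'],
    ['go+gle', 'ggle'],
    ['desktopp?ed', 'desktoped'],
];

foreach ($cases as [$pattern, $subject]) {
    $result = matches_whole($pattern, $subject) ? 'match' : 'no match';
    printf("%-12s %-10s %s\n", $pattern, $subject, $result);
}
```

Note the go*gle and go+gle rows: “ggle” matches the first but not the second, since * allows zero o’s and + requires at least one.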

Other Common Special Characters

  • \w matches a “word” character: any letter, digit, or underscore (‘_’).
  • \n, \r, and \t match a newline, carriage return, and tab, respectively.
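A short sketch of \w and the whitespace escapes in action: preg_match_all pulls out runs of word characters, and preg_split breaks a string wherever a tab, newline, or carriage return appears. The sample text is made up for illustration.

```php
<?php
// Sample text containing a tab (\t) and a newline (\n).
$text = "Hello,\tworld_2010!\nBye.";

// \w+ matches runs of letters, digits, and underscores; punctuation ends a run.
preg_match_all('/\w+/', $text, $m);
echo implode(' | ', $m[0]), "\n"; // Hello | world_2010 | Bye

// Split wherever a tab, newline, or carriage return appears.
$parts = preg_split('/[\t\n\r]/', $text);
echo implode(' | ', $parts), "\n"; // Hello, | world_2010! | Bye.
```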

A full reference for special characters can be found here:

http://www.php.net/manual/en/function.preg-replace.php#89364

PHP Function: preg_replace

PHP’s preg_replace function takes a regular expression pattern, a replacement string, and the text to be examined (the subject) as arguments. It searches the subject for matches of the pattern and replaces each match with the replacement string, into which it can insert pieces of the matched text. If nothing matches, the subject is returned unchanged.

The pieces inserted into the replacement string are determined by the parenthesized groups in the pattern. They are referenced in the replacement string as $1, $2, etc., where $n refers to the text matched by the nth parenthesized group; $0 refers to the entire match.
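A small sketch of the difference between $0 and the numbered groups. The phone-number-style pattern and the sample text are illustrative only.

```php
<?php
$text = 'Call 555-1234 today';

// (\d{3}) is group 1 and (\d{4}) is group 2; $0 is the whole match "555-1234".
$result = preg_replace('/(\d{3})-(\d{4})/', '[$0] -> $2-$1', $text);

echo $result, "\n"; // Call [555-1234] -> 1234-555 today
```

Text outside the match (“Call ” and “ today”) passes through untouched; only the matched span is rewritten.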

A Simple Example
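A minimal sketch of such a call, assuming a pattern that accepts only the names Brian and Zach after “My name is”. Both the pattern and the helper name name_to_compliment are hypothetical, chosen to reproduce the behaviour described below.

```php
<?php
// Hypothetical helper: the pattern only accepts "Brian" or "Zach" as the name.
function name_to_compliment(string $text): string
{
    // $1 is replaced by whatever the first parenthesized group captured.
    return preg_replace('/^My name is (Brian|Zach)$/', '$1 is a pretty cool guy.', $text);
}

$text = 'My name is Brian';
echo name_to_compliment($text), "\n";             // Brian is a pretty cool guy.
echo name_to_compliment('My name is Zach'), "\n"; // Zach is a pretty cool guy.
echo name_to_compliment('My name is Nick'), "\n"; // My name is Nick
```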

The code will output “Brian is a pretty cool guy.” If $text were “My name is Zach”, the output would be “Zach is a pretty cool guy.” If $text were “My name is Nick”, there would be no match and the original text would be returned unchanged: “My name is Nick”.

Useful Regex Functions

This function turns all URLs in a body of text into clickable links.
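A minimal sketch of a URL linkifier. The name link_it is borrowed from the comments below; the pattern is our own assumption, not the post’s original code. It uses a single character class with one quantifier, which avoids the nested ([\w\-\.]+)+ construct that a commenter warns can backtrack catastrophically. It also assumes plain-text input: URLs already inside HTML attributes, such as an <img src=...> tag, will be mangled.

```php
<?php
// Sketch: wrap http://, https://, and www. URLs in <a> tags.
// Assumes plain-text input; URLs already inside HTML tags are not skipped.
function link_it(string $text): string
{
    // A scheme or "www." followed by characters that are not whitespace,
    // angle brackets, or quotes. One quantifier and no nesting, so long
    // non-matching input cannot trigger catastrophic backtracking.
    $pattern = '~\b(?:https?://|www\.)[^\s<>"\']+~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Trailing punctuation usually ends the sentence, not the URL.
        $url = rtrim($m[0], '.,;:!?)');
        $trailing = substr($m[0], strlen($url));
        // Bare "www." links need a scheme to work as an href.
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        return '<a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($url) . '</a>' . $trailing;
    }, $text);
}

echo link_it('Visit www.desktopped.com.'), "\n";
// Visit <a href="http://www.desktopped.com">www.desktopped.com</a>.
```

Trimming a closing parenthesis keeps “(see www.example.com)” working, at the cost of breaking URLs that genuinely end in “)”.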

This function turns every word prefixed with a pound sign (#) or an at-sign (@) in a Twitter feed into a hashtag or @reply link.
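A minimal sketch of a twitter_it-style function (the name is taken from the comments below). The link targets, twitter.com/username for mentions and a twitter.com search URL for hashtags, are assumptions about where you want the links to point. The same goes for the 15-character username limit built into the pattern.

```php
<?php
// Sketch: link @mentions and #hashtags in tweet text.
function twitter_it(string $text): string
{
    // @username: 1-15 word characters. (?<!\w) skips the "@" inside email
    // addresses, and (?!\w) rejects names longer than 15 characters.
    $text = preg_replace(
        '/(?<!\w)@(\w{1,15})(?!\w)/',
        '<a href="https://twitter.com/$1">@$1</a>',
        $text
    );

    // #hashtag: (?<![\w&]) skips HTML entities such as &#39;.
    return preg_replace(
        '/(?<![\w&])#(\w+)/',
        '<a href="https://twitter.com/search?q=%23$1">#$1</a>',
        $text
    );
}

echo twitter_it('@desktopped is a blog about #computers'), "\n";
```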

This function finds strings in your post body that you’ve marked with the pattern :tagname: and turns them into tag searches on your blog. For example, “This post is about :PHP:.” will result in “This post is about PHP.”, with “PHP” linked to the PHP tag search.
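A minimal sketch of a tag_it-style function. The tag URL prefix /tag/ (a common WordPress tag archive path) is an assumption, so the hypothetical $tag_base parameter lets you substitute your own.

```php
<?php
// Sketch: turn :tagname: markers into links to a tag archive.
// $tag_base is a hypothetical parameter; adjust it to your blog's tag URLs.
function tag_it(string $text, string $tag_base = '/tag/'): string
{
    // (?<!\w) keeps times like "12:30:45" from being treated as a tag.
    return preg_replace_callback('/(?<!\w):(\w+):/', function (array $m) use ($tag_base): string {
        $tag = $m[1];
        return '<a href="' . $tag_base . rawurlencode(strtolower($tag)) . '/">' . $tag . '</a>';
    }, $text);
}

echo tag_it('This post is about :PHP:.'), "\n";
// This post is about <a href="/tag/php/">PHP</a>.
```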

This function highlights search terms in search result titles on your WordPress blog. Pass it an array of keywords and it will do the rest. (It must be used inside The Loop.)
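A minimal sketch of the keyword highlighter as a plain PHP function. The name highlight_terms is hypothetical. Inside The Loop you might call it as highlight_terms(get_the_title(), $keywords), where get_the_title() is WordPress’s template tag for the current post’s title. preg_quote escapes any regex metacharacters in the keywords, and longer keywords are tried first so that a phrase wins over a word it contains.

```php
<?php
// Sketch: wrap every occurrence of the given keywords in <strong> tags.
function highlight_terms(string $title, array $keywords): string
{
    // Drop empty keywords and escape regex metacharacters (e.g. "C++").
    $quoted = array_map(
        fn ($k) => preg_quote($k, '/'),
        array_filter($keywords, fn ($k) => $k !== '')
    );
    if ($quoted === []) {
        return $title;
    }

    // Longest first, so "regular expressions" beats "regular".
    usort($quoted, fn ($a, $b) => strlen($b) <=> strlen($a));

    // Case-insensitive alternation; $0 is the matched text in its original case.
    return preg_replace('/' . implode('|', $quoted) . '/i', '<strong>$0</strong>', $title);
}

echo highlight_terms('Linkify Text with PHP', ['php', 'text']), "\n";
// Linkify <strong>Text</strong> with <strong>PHP</strong>
```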

This function takes any string (usually a page title) and generates a URL slug.
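A minimal sketch of a create_slug function. The name appears in the comments below; the implementation is our assumption. It keeps only ASCII letters and digits, so accented characters are dropped rather than transliterated. As a commenter notes, WordPress’s sanitize_title_with_dashes handles UTF-8 more carefully.

```php
<?php
// Sketch: turn a title into a lowercase, hyphen-separated URL slug.
function create_slug(string $string): string
{
    $slug = strtolower(trim($string));
    // Collapse every run of characters other than a-z and 0-9 into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    // Remove any hyphens left at the start or end.
    return trim($slug, '-');
}

echo create_slug('How to Automatically Linkify Text with PHP Regular Expressions'), "\n";
// how-to-automatically-linkify-text-with-php-regular-expressions
```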

Further Reading

Regular Expressions Resource:
http://www.regular-expressions.info

PHP preg_replace Manual:
http://php.net/manual/en/function.preg-replace.php

Posted Saturday, May 29th, 2010

31 Comments 8 Mentions

  1. Jaemi

    I cannot believe this -just- came up in my Twitter feed… I spent a couple of hours tonight trying to figure out how to do this, and then when I found the code, I still had to figure out how to implement it correctly. Really good post.

  2. Markus Thömmes

    Hi, found a little error:

    “pound signs (#) and ampersands (@)”

    Isn’t that a number sign and an at-sign? I think the ampersand is something like this: “&”.

    If I’m wrong, just ignore this.

    Just another cool and helpful post on this brilliant blog. I really like it when your name is highlighted in my feed reader ;).

  3. Dalesh Kowlesar

    Markus, I believe that you are right.

  4. Matthew

    I might be mistaken about the intention of some of the regexes, but shouldn’t there be some HTML insertion taking place in the twitter_it, link_it and tag_it functions?
    As far as I can see now, these mainly just recognize the presence of a string matching the pattern and perform next to no modification of the text.

  5. Brian Muse

    @Markus
    Thanks for catching that typo, fixed it.

    @Matthew
    Some of the code was stripped and mis-formatted when copied into this post. I’ve just gone back in and manually fixed all the code. Should be working now. Thanks for catching this!

  6. Rilwis

    Very useful regex patterns.

    In the last example, I think WordPress does better. It checks for UTF-8 strings and then replaces all unnecessary characters. The function is “sanitize_title_with_dashes” in the formatting.php file.

  7. Zack Hovatter

    Very nice! The slug function is great for one of my dynamic content projects. Actually, all of them are! :P

  8. Matthew V

    Typo:

    “\w will match and “word” character”

    maybe

    “\w will match a “word” character”?

  9. Rakesh Solanki

    Before applying these tricks I was still confused by the code, but now they are working. Thanks, Brian.

  10. WebpageLottery

    Accidentally found what I was looking for (the create_slug function). Thank you.

  11. Thomas Craig Consulting

    Great post, very helpful. Can’t wait to implement and expand on some of your examples.

  12. Wyatt

    If you use Ruby, I wrote some Rack middleware that linkifies text: http://rubygems.org/gems/rack-linkify

  13. billythekid

    Great. I’d previously been using eregi_replace() for this task, but with its impending deprecation I knew I’d have to switch it to preg. Glad you posted this! Thanks. ;oD

  14. Binary Spectrum

    Thanks for the useful post.

  15. Z

    Any way to approach this without having it replace image tags? For example, if you have an <img src= tag, it will break it.

  16. Jon

    Thanks for this! I’d been looking for a decent regex for URLs for quite a while but hadn’t quite managed to get my head around it :)

  17. petar

    Thanks a ton, finally a decent and understandable example!

  18. ectopmall

    Back in March, we introduced wholesale nfl jerseysyou to the Nike 78 project.

  19. jean

    Hi, I like this post, but unfortunately it doesn’t work on my blog. I added it to my theme’s functions.php file on my WordPress blog, but it doesn’t work. Could you please tell me where I should place this function to make it work in WordPress? Sorry for my ignorance.

  20. ridgerunner

    Your third URL linkify regex has a really nasty subexpression which goes super-linear into the world of catastrophic backtracking (not a place you want to be). Here is the bad part:

    ([\w\-\.]+)+

    Works fine when there is a match. The problem comes when testing against a string which does NOT match. Watch your server go into an infinite loop testing all the possible combinations for a long subject string before it can claim match failure. (This can take a long, LONG time!)…

    For example: let’s say your string is 10 chars long. With your 2^n super-linear expression, the number of possible combinations is 2^10, or 1024. No problem. But when your subject string gets a bit longer, say 80 chars, the number of combinations to be tested goes up exponentially to 2^80, which is approximately 1.2E24 or 1,200,000,000,000,000,000,000,000 – that’s twelve hundred billion trillion possibilities! (Not in our lifetimes.)

    p.s. For a more robust regex, you may want to take a look at my LinkifyURL project over on Github.

  21. Cedrick

    How would this work if you have something like this?

    EXAMPLE

    BECOMES
    <img source='http://www.stuff.com' alt='http://www.stuff.com/images/image.jpg'>

  22. pkwebmarket

    Thanks a lot for sharing this,
    it’s a really easy and understandable example! :)

  23. preg_replace

    Really helpful, resolved my problem, thx!

  24. Beats Headphones Sale

    Hi. I treasured to drop you a quick notice to impart my thanks. We have been observing your blogging site for a month or so and have picked up a heap of excellent data in addition as appreciated the best way you have got structured your internet site. I’m trying to run my very own site even so I feel its way too normal and that i want to concentrate on more compact subjects.

  25. Joaquim Homrighausen

    Doesn’t seem to allow for commas (,) in URLs.

  26. Karthikeyan

    Great Post.. Really useful to me :)

    Thanks for sharing this

  27. Juan Carlos

    Fantastic post, thank you Brian!

  28. loui vuitton outlet

    Targeted visitors in Interstate 217 has been disrupted following cave-ins and also loui vuitton outlet landslides had been claimed following the quake. Concerning One-hundred-twenty people today along with a range of cars or trucks were being stranded.

  29. Steven Waters

    These regular expressions were extremely helpful for linkifying everything using the new Twitter API. Thank you!

  30. Zach Smith

    great post – subscribing to rss now :)

  31. GaryM

    Oh sure, it works HERE…

    The only thing I guess I’m doing differently is stripping dangerous tags out of the text before letting the linkify function work on it.

 

Build Internet by One Mighty Roar. Since 2008.