
Surrounding All Instances Of @________, #___________, And Http://_________ With Anchor Tags?

Related (but slightly different): Javascript Regex: surround @_____, #_____, and http://______ with anchor tags in one pass? I would like to surround all instances of @_______, #

Solution 1:

I have a similar function in my code; you can take a look and adapt it to your needs:

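// Callback for preg_replace_callback: $matches[1] is the host, $matches[2] is the rest of the URL.
function checkChatUrl($matches)
{
    // Known URLs get a prepared short label.
    if (strpos($matches[0], 'http://www.xxx.pl/?task=forum') !== false) $n = 'forum';
    elseif (strpos($matches[0], 'http://www.xxx.pl') !== false) $n = 'xxx';
    // Unwanted URLs are removed from the text entirely.
    elseif (strpos($matches[0], 'db.php') !== false) return "";
    elseif (strpos($matches[0], '%22') !== false) return "";
    // Everything else is labelled with its first 10 characters, plus '..' if it was cut.
    else $n = substr($matches[1].$matches[2], 0, 10).((strlen($matches[1].$matches[2]) > 10) ? '..' : '');
    return "<a href='http://$matches[1]$matches[2]' target='_blank'>$n</a>";
}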

$text=preg_replace_callback("/\bhttp:\/\/([\w\.]+)([\#\,\/\~\?\&\=\;\-\w+\.\/]+)\b/i",'checkChatUrl',$text);

This was designed for URL links in a chat: it shortens the link text and, for some URLs, uses prepared short labels.

Solution 2:

str.replace(
    /(\s|^)([#@])([\w\d]+)|(http:\/\/\S+)/g,
    '$1<a href="$3$4">$2$3$4</a>'
);
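Note that the replacement string above uses the same href template for all three cases, so @name ends up with href="name". As a rough sketch of one way around that, the same pattern can be used with a callback so each kind of match gets its own link target. The function name addAnchors and the MENTION_BASE and HASHTAG_BASE prefixes are placeholders I made up, not anything from the question:

```javascript
// Same pattern as in the answer above.
const pattern = /(\s|^)([#@])([\w\d]+)|(http:\/\/\S+)/g;

// Hypothetical link targets; substitute whatever your site actually uses.
const MENTION_BASE = '/users/';
const HASHTAG_BASE = '/tags/';

function addAnchors(str) {
    return str.replace(pattern, (match, lead, sigil, name, url) => {
        // The fourth group is only set when the URL alternative matched.
        if (url !== undefined) {
            return '<a href="' + url + '">' + url + '</a>';
        }
        // Otherwise it is an @mention or #hashtag; keep the leading whitespace.
        const base = sigil === '@' ? MENTION_BASE : HASHTAG_BASE;
        return lead + '<a href="' + base + name + '">' + sigil + name + '</a>';
    });
}

console.log(addAnchors('hi @bob #js http://x.com/a'));
// hi <a href="/users/bob">@bob</a> <a href="/tags/js">#js</a> <a href="http://x.com/a">http://x.com/a</a>
```

This is still a single pass over the string, so the anchors inserted for one kind of match are never re-scanned by another pattern.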

Solution 3:

For matching @ and # tags, I'd suggest using the \w metapattern (matches word characters - so it'll match letters, digits, and the underscore, but not whitespace/punctuation). Thus, you'd want something like the following patterns to pull out the matched items:

(@\w+)
(#\w+)

For matching URLs, a simple but naive pattern would be to just match http:// followed by any non-whitespace:

(http://\S+)

However, there are certain characters not valid in URLs that would get captured by this. A more sophisticated pattern that only allows characters which are valid in URLs would be the following:

(http://[a-zA-Z0-9$_.+!*'(),#/-]+)
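To compare the patterns, here is a small sketch in JavaScript. The slashes are escaped because these are regex literals, the g flag is added so match() returns every occurrence, and the sample string is made up for illustration:

```javascript
// The patterns from this answer, written as JavaScript regex literals.
const mentionPattern = /(@\w+)/g;
const hashtagPattern = /(#\w+)/g;
const naiveUrlPattern = /(http:\/\/\S+)/g;
const strictUrlPattern = /(http:\/\/[a-zA-Z0-9$_.+!*'(),#\/-]+)/g;

// Hypothetical sample input, not taken from the question.
const sample = 'Ask @alice about #regex at http://example.com/docs, or see <http://example.com/a>';

const mentions = sample.match(mentionPattern);
const hashtags = sample.match(hashtagPattern);
const naiveUrls = sample.match(naiveUrlPattern);
const strictUrls = sample.match(strictUrlPattern);

console.log(mentions);   // [ '@alice' ]
console.log(hashtags);   // [ '#regex' ]
console.log(naiveUrls);  // [ 'http://example.com/docs,', 'http://example.com/a>' ]
console.log(strictUrls); // [ 'http://example.com/docs,', 'http://example.com/a' ]
```

The stricter pattern stops before the closing >, but both still swallow the trailing comma, because a comma is a legal URL character. Trailing punctuation needs separate handling.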

Solution 4:

Here is a revised answer based on the revised question. You should actually put the revision/comment on the original question.

It uses three patterns for the three actions and chains them. It uses the word-boundary assertions (\b and \B) as appropriate instead of (^|\s). This picks up tags separated by punctuation with no space, e.g. @tweet,#tag

<script type="text/javascript">
function addTags(str) {
    return str.replace(/\B(@)(\w+)/g, '<a href="//twitter.com/$2">$1$2</a>')
              .replace(/\B(#)(\w+)/g, '<a href="web#q=$2">$1$2</a>')
              .replace(/\b(http:\S+[^,.])/g, '<a href="$1">$1</a>')
              ;
}
function testTags() {
    document.getElementById('outstr').innerHTML =
    document.getElementById('outtxt').innerHTML =
        addTags(document.getElementById('instr').value);
}
</script>
<input type=text size=100 id="instr" value="@begin ignore@email.com and then #cow to http://mysite.com and also http://yoursite.com.">
<br>
<p><textarea id="outtxt" cols=90></textarea></p>
<p id=outstr></p>
<p><button onclick="testTags();">TEST</button></p>

I tested it with the above.
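If the difference between \B and (^|\s) is not obvious, this small sketch compares the two on a made-up sample string. The variable names are mine and only there for illustration:

```javascript
// \B before the @ succeeds only when the previous character is not a word character
// (or when the @ is at the very start of the string).
const boundaryPattern = /\B@\w+/g;
// The alternative form requires start of string or whitespace before the @.
const whitespacePattern = /(?:^|\s)@\w+/g;

// Hypothetical sample: a plain tag, an email address, and two tags after punctuation.
const sample = '@begin ignore@email.com (@paren) x,@comma';

const withBoundary = sample.match(boundaryPattern);
// trim() drops the whitespace that the second pattern includes in each match.
const withWhitespace = sample.match(whitespacePattern).map(s => s.trim());

console.log(withBoundary);   // [ '@begin', '@paren', '@comma' ]
console.log(withWhitespace); // [ '@begin' ]
```

Both versions skip the email address, but only the \B version picks up the tags that follow a bracket or a comma.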

Solution 5:

One important thing!

Make sure you are aware of the possible risks in doing naive replacement on links.

Do not allow users to insert arbitrary HTML on your site. The name of the XSS game is sanitizing user input. If you stick to a whitelist based approach -- only allow input that you know to be good, and immediately discard anything else -- then you're usually well on your way to solving any XSS problems you might have.

Naïve replacement counts as allowing users to insert arbitrary HTML on your site.

At the very least, make sure that the resulting <a href=''> does not start with javascript:, as that would leave you open to Cross-Site Scripting (XSS).
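As a minimal sketch of the whitelist idea (not a complete sanitizer), one option is to split the text on http:// URLs, HTML-escape every piece, and wrap only the URL pieces in anchors. The helper names escapeHtml and linkifySafely are hypothetical, and I am assuming only http:// links should become clickable:

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical helper: only text that starts with http:// is ever placed in an href,
// so a javascript: URL can never become a link.
function linkifySafely(text) {
    return text
        // With one capture group, split() returns [text, url, text, url, ..., text].
        .split(/(\bhttp:\/\/[^\s<>"']+)/)
        .map((part, i) => {
            const safe = escapeHtml(part);
            // Odd indices hold the captured URLs; even indices hold ordinary text.
            return i % 2 === 1 ? '<a href="' + safe + '">' + safe + '</a>' : safe;
        })
        .join('');
}

console.log(linkifySafely('<script>alert(1)</script> http://example.com/?a=1&b=2'));
// &lt;script&gt;alert(1)&lt;/script&gt; <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a>
```

Escaping each piece after the split, rather than escaping the whole string first, keeps the URL pattern from having to deal with entities such as &amp;lt; inside a match.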
