Regex help

Ugly Joe

New member
I'm trying to make a regular expression that will prepend a path to all relative links on a webpage. That is:

<a href="blah/blah2/page.htm">page</a>

would be replaced with something like:

<a href="/newdir/deeperdir/blah/blah2/page.htm">page</a>

I can find all the links and add the extra path information. My problem is that I can't discern between relative and absolute links. That is, I don't want:

<a href="http://zophar.net">page</a>

to become:

<a href="/newdir/deeperdir/http://zophar.net">page</a>

I've been trying to add a "not ://" or "not http|ftp|mailto" clause to the regex, but I can't find a way to do it. Can any regex gurus out there lend me a hand?
 
> Can any regex gurus out there lend me hand?

This should work, although it's UGLY...

<pre>(?:<[aA] [hH][rR][eE][fF]=")(?!(?:[hH][tT][tT][pP]|[fF][tT][pP]|[mM][aA][iI][lL][tT][oO]):(?://)?)([^"]*)(?:">)</pre>

Capture group 1 is your URL that you can substitute.

EDIT: the regex should be a single line with no newlines in it; the forum software may wrap it.
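
For anyone who wants to try the idea outside the forum, here is a rough Python sketch of the same negative-lookahead trick. It is not the exact pattern above: `re.IGNORECASE` stands in for the `[hH][tT]...` character classes, the URL is matched with `[^"]*` so it cannot run past the closing quote, and the markup around the URL is captured so it can be put back. The names `PREFIX`, `LINK` and `add_prefix` are made up for this example, and the prefix is the one from the question.

```python
import re

# Hypothetical prefix, taken from the example in the question.
PREFIX = "/newdir/deeperdir/"

# A negative lookahead right after href=" rejects URLs that start with
# http:, ftp: or mailto:. Anything else is treated as relative.
LINK = re.compile(r'(<a href=")(?!(?:http|ftp|mailto):)([^"]*)(">)', re.IGNORECASE)

def add_prefix(html):
    # Groups 1 and 3 are the surrounding markup, group 2 is the URL.
    return LINK.sub(lambda m: m.group(1) + PREFIX + m.group(2) + m.group(3), html)

print(add_prefix('<a href="blah/blah2/page.htm">page</a>'))
print(add_prefix('<a href="http://zophar.net">page</a>'))
```

The first link should come back with the prefix added and the second should come back untouched. Like the pattern above, this only handles the plain `<a href="...">` form with double quotes and no other attributes.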
Edited by Crazy_MYKL on 09/08/06 11:24 AM.
 
> >Can any regex gurus out there lend me hand?

> This should work, although it's UGLY...

Awesome, that showed me what I needed to do.

I'm doing a preg_replace in PHP, searching for:

/((src|href)[ ]*=[ ]*(\"|')((?!([A-Za-z0-9]*:))[^\"']*)(\"|'))/i

and replacing with:

\2=\3/newdir/deeperdir/\4\6

Works like a charm. It rewrites relative href and src values (as long as they are quoted) and leaves alone any URL that starts with a scheme, that is, letters or digits followed by a colon. Thanks!
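
For reference, a Python port of the search and replace above, as a sketch rather than the PHP actually used. The pattern is the same except that the `\"` escapes become plain `"`, and `re.sub` takes the place of `preg_replace`. The names `PATTERN`, `REPLACEMENT` and `rewrite` are invented for the example.

```python
import re

# Same pattern as in the post, minus the PHP string escaping.
# Group 2 = attribute name, 3 = opening quote, 4 = URL, 6 = closing quote.
# Group 5 sits inside the negative lookahead and never captures anything.
PATTERN = re.compile(
    r"""((src|href)[ ]*=[ ]*("|')((?!([A-Za-z0-9]*:))[^"']*)("|'))""",
    re.IGNORECASE,
)

# Same replacement string as in the post.
REPLACEMENT = r"\2=\3/newdir/deeperdir/\4\6"

def rewrite(html):
    return PATTERN.sub(REPLACEMENT, html)

print(rewrite('<a href="blah/blah2/page.htm">page</a>'))
print(rewrite("<img src='pics/logo.gif'>"))
print(rewrite('<a href="http://zophar.net">page</a>'))
```

The first two should gain the prefix and the third should be left as it is. One thing to watch: a URL that already starts with a slash, such as `/images/x.gif`, has no scheme, so it gets the prefix as well.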
 