Jul 15 2010

Regex for URL grabbing from HTML content

Category: Code @ 18:22

Something I had to work on recently (which I won't go into): the project had to grab the href attributes from the A tags in a mass of HTML, with a 100% success rate no matter how poorly structured the HTML was. I think this pretty much covers it, except for one small case. Suppose the href isn't quoted, the A tag has made-up attributes of its own, AND the href contains actual spaces. In that case the end of the URL may get cut off. Otherwise, bullet-proof.

 "href+ ?=+ ?(?:(?:(?:""|')(.+?)(?:""|'))|(.+?)(?: class ?=| onclick ?=| id ?=| accesskey ?=| dir ?=| ltr ?=| lang ?=| style ?=| tabindex ?=| title ?=| onblur ?=| ondblclick ?=| onfocus ?=| onmousedown ?=| onmousemove ?=| onmouseout ?=| onmouseover ?=| onmouseup ?=| onkeydown ?=| onkeypress ?=| onkeyup ?=|>))"
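The pattern above looks like a C#/VB.NET string literal, where the doubled `""` is just an escaped quote. Here's a rough Python port for trying it out. The regex is the same, with `""` turned back into `"`. `HREF_RE`, `STOP_ATTRS` and `extract_hrefs` are made-up names for this sketch. Matching case-insensitively (`re.IGNORECASE`) is an assumption, because the post doesn't say which regex options were used. Group 1 catches a quoted value and group 2 an unquoted one, so the helper returns whichever group matched.

```python
import re

# Attribute names the post's pattern treats as "end of an unquoted href",
# copied from the original regex (including "ltr", which is really a value of dir).
STOP_ATTRS = [
    "class", "onclick", "id", "accesskey", "dir", "ltr", "lang", "style",
    "tabindex", "title", "onblur", "ondblclick", "onfocus", "onmousedown",
    "onmousemove", "onmouseout", "onmouseover", "onmouseup", "onkeydown",
    "onkeypress", "onkeyup",
]

# Rebuilds " class ?=| onclick ?=| ... | onkeyup ?=|>" exactly as in the post.
STOP = "|".join(" " + name + " ?=" for name in STOP_ATTRS) + "|>"

# The post's pattern with the "" escapes turned back into plain quotes.
# Group 1: value inside quotes. Group 2: unquoted value, up to a known attribute or ">".
HREF_RE = re.compile(
    r"""href+ ?=+ ?(?:(?:(?:"|')(.+?)(?:"|'))|(.+?)(?:""" + STOP + r"))",
    re.IGNORECASE,  # assumption: the original options aren't stated
)


def extract_hrefs(html):
    """Return every href value the pattern finds, in document order."""
    urls = []
    for match in HREF_RE.finditer(html):
        quoted, unquoted = match.group(1), match.group(2)
        urls.append(quoted if quoted is not None else unquoted)
    return urls


if __name__ == "__main__":
    sample = (
        '<a href="http://example.com/a">A</a> '
        "<a href='/b' class=\"x\">B</a> "
        '<a href = "/c">C</a> '
        "<a href=/d class=nav>D</a> "
        "<a href=/e>E</a>"
    )
    print(extract_hrefs(sample))
    # ['http://example.com/a', '/b', '/c', '/d', '/e']
```

The pattern isn't tied to A tags. It picks up any `href=` in the text, so values from things like `<link href=...>` come back as well.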


Comments

1.
fatty (United Kingdom) says:

Love you :)

2.
tonyenkiducx (United Kingdom) says:

I know, you're hot for regex.  I'll code you up an algorithm later ;)
