Mind-Manual
Use your Mind Better!
Regular Expressions: How I Hate|Love Thee
August 7, 2008 at 8:33 am | In Tech
I’ve spent a lot of time over the last week wrestling with regular expressions in PHP (functions like preg_match, preg_replace and preg_match_all). Regular expressions are a very powerful and compact way of searching strings for patterns. They are also the most frustrating thing I’ve ever learned in programming. There are a number of tutorials and reference guides online, but while they’re good at explaining the individual parts, they don’t show how the parts fit together. I even looked in some PHP books and was disappointed. I found these two tutorials to be pretty good, but both left out two crucial troubleshooting tips.
So, here are my solutions to two major problems I had while learning this stuff:
1. Your expression matches and returns too much.
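Say your expression is “/<b>(.*)<\/b>/” (meaning, capture everything between the <b> tags) and your data is “<b>hey</b><b>blah</b>”. Simple as can be, except it captures everything up to the very last </b>, so it returns “hey</b><b>blah”. That’s because PHP’s regex engine (PCRE), like most regex engines, is “greedy” by default: a quantifier such as * matches as much as it possibly can, and only gives characters back when the rest of the pattern needs them. The way to fix it is to add a ? right after the quantifier, so the pattern now reads: “/<b>(.*?)<\/b>/”. The ? makes the preceding quantifier “lazy” (not greedy), so it matches as little as possible; it works for *, +, ? and {n,m}. (Note the backslash in <\/b>: since / is the pattern delimiter, any / inside the pattern has to be escaped, or you can use a different delimiter such as #.)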
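Here is a minimal PHP sketch of the difference, using the example data from above (the variable names are just for illustration). It runs the greedy and lazy versions of the pattern through preg_match, then uses preg_match_all to collect every bold section:

```php
<?php
// Example data from the post: two bold sections back to back.
$data = '<b>hey</b><b>blah</b>';

// Greedy: .* runs to the end of the string, then backtracks only as far
// as the LAST </b>. The / in </b> is escaped because / is the delimiter.
preg_match('/<b>(.*)<\/b>/', $data, $greedy);

// Lazy: .*? takes as little as possible, stopping at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $data, $lazy);

// preg_match_all with the lazy pattern collects every bold section;
// $all[1] holds the contents of capture group 1 for each match.
preg_match_all('/<b>(.*?)<\/b>/', $data, $all);

echo $greedy[1], "\n";              // hey</b><b>blah
echo $lazy[1], "\n";                // hey
echo implode(', ', $all[1]), "\n";  // hey, blah
```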
2. Your expression doesn’t match or return anything, even though your expression is ridiculously simple.
I had a lot of trouble with this. You have to remember that the . matches any one character except newline characters. So, if you were using this pattern:
“/<b>(.*?)<\/b>/”
with this data:
“<b>
hey
</b>
<b>
blah
</b>”
and got no results, it’s because the . doesn’t match newlines, so the pattern can’t get from <b> to </b> and captures nothing. Annoying, but there are a few ways to fix it. I chose to add the s modifier to the end of the pattern. It’s often described as treating the whole string as a single line, but what it actually does is let the . match newline characters too. The subject string isn’t changed, so the captured text still contains its newlines. So, the fixed pattern looks like:
“/<b>(.*?)<\/b>/s”
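Here is a small sketch of the newline problem and the fix, again with illustrative variable names. It also shows [\s\S]*? as one commonly used alternative to the s modifier: a character class containing both whitespace and non-whitespace matches any character, newlines included.

```php
<?php
// Same data as above, with newlines between the tags.
$data = "<b>\nhey\n</b>\n<b>\nblah\n</b>";

// Without /s: '.' will not match "\n", so neither <b> block matches.
$without = preg_match_all('/<b>(.*?)<\/b>/', $data, $plain);

// With /s: '.' also matches "\n", so both blocks are captured.
$with = preg_match_all('/<b>(.*?)<\/b>/s', $data, $dotall);

// Alternative without a modifier: [\s\S] matches any character, newline included.
preg_match_all('/<b>([\s\S]*?)<\/b>/', $data, $alt);

echo $without, "\n";                 // 0
echo $with, "\n";                    // 2
// json_encode prints the newlines as \n, showing they are kept in the captures.
echo json_encode($dotall[1]), "\n";  // ["\nhey\n","\nblah\n"]
```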
Hope this helps.
I’ve been sorta MIA for a while. Like I’ve mentioned before, I left my last job and have been working on some projects and picking up some one-off paying gigs. For example, last week I transcribed around 25,000 words in about 9 hours. That’s a lot of words to write. As you may have noticed, I am no longer posting with my usual regularity here, and it will stay that way for the foreseeable future. I’ll probably get back to regular posting at some point, but perhaps at a new site or something. Cheers!