Fedora Users — Re: [OT] searching for a regular expression to match strings

To: "Community assistance, encouragement, and advice for using Fedora." <fedora-list@xxxxxxxxxx>

Subject: Re: [OT] searching for a regular expression to match strings

From: Konstantin Svist <fry.kun@xxxxxxxxx>

Date: Thu, 29 Jan 2009 10:10:05 -0800

In-reply-to: <1233248264.30348.19.camel@xxxxxxxxxxxxxxxxxx>

References: <1233248264.30348.19.camel@xxxxxxxxxxxxxxxxxx>

Reply-to: "Community assistance, encouragement, and advice for using Fedora." <fedora-list@xxxxxxxxxx>

User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20090105 Fedora/2.0.0.19-1.fc8 Lightning/0.9 Thunderbird/2.0.0.19 Mnenhy/0.7.5.0

Christoph Höger wrote:
> Hi,
>
> anyone knows about a (high-performance) regular expression to match
> java-like Strings?
> (e.g. "Hi, World \n this is a \"-quoted string.\n")
>
> I have tested 
>
> ((\\.)|[^"\\])*
>
> which basically does what I want (although capturing too much escape
> sequences). 
>
> The problem is: I've tried jakarta's regexp and java's implementation
> and both run into Stack Overflows for input strings with more than 500
> characters. That is definitely not acceptable as a hard limit for
> tokenizing source code.
>
> Any suggestions?
>   

IIRC from taking a compiler course, tokenizing is usually done with a
lexer generator such as Lex, Flex, JLex, or Ragel (typically paired with
a parser generator such as Yacc or Bison).
They're highly tuned for the task: Ragel, for example, compiles the
patterns into a state machine that scans each character of the source
text only once. That's about as fast as you can expect to get (short of
squeezing out a cycle or two more by hand-coding in assembly). See
http://www.complang.org/ragel/
I'm not sure exactly how those regular expression engines work
internally, but the stack overflows you describe suggest they recurse
deeper as the input grows, so I suspect they're not nearly as robust
for this job.
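
To make the single-pass idea concrete, here is a rough hand-written
sketch in Java of the kind of loop a generated lexer boils down to for
string literals. It is only an illustration, not what Ragel or JLex
actually emit: the class name StringLiteralScanner and the method scan
are made up for this example, and rejecting raw line breaks inside a
literal is my assumption about "java-like" strings. It uses a plain loop
and constant stack, so input length is not a limit.

```java
// Hypothetical helper for illustration; not taken from any lexer generator.
final class StringLiteralScanner {

    /**
     * Scans a Java-like string literal that begins at src.charAt(start).
     * Returns the index just past the closing quote, or -1 if no
     * well-formed literal starts there. Each character is examined once.
     */
    static int scan(CharSequence src, int start) {
        if (start < 0 || start >= src.length() || src.charAt(start) != '"') {
            return -1; // not positioned on an opening quote
        }
        int i = start + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"') {
                return i + 1;   // closing quote: literal ends here
            } else if (c == '\\') {
                i += 2;         // skip the backslash and the escaped character
            } else if (c == '\n' || c == '\r') {
                return -1;      // assumption: raw line breaks end the literal badly
            } else {
                i++;            // ordinary character
            }
        }
        return -1;              // ran off the end: unterminated literal
    }

    public static void main(String[] args) {
        // The example string from the question, embedded in a line of source.
        String src = "x = \"Hi, World \\n this is a \\\"-quoted string.\\n\";";
        int start = src.indexOf('"');
        int end = scan(src, start);
        System.out.println(src.substring(start, end));
    }
}
```

Note that this does not check whether an escape sequence is a valid one;
like the regex in the question, it just skips whatever follows a
backslash. A real tokenizer would validate escapes in a later step or
add more states.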

HTH,
Konstantin
