tags: #pcre #perl #regex

There is a library called PCRE that allows these regexes to be embedded in any C program.

# Cheat Sheet 1

Summarised from: https://perlmaven.com/regex-cheat-sheet

## Character Classes

```plaintext
[bgh.]     One of the characters listed in the character class: b, g, h or . in this case.
[b-h]      The same as [bcdefgh].
[a-z]      Lower case Latin letters.
[bc-]      The characters b, c or - (dash).
[^bx]      Complementary character class. Anything except b or x.
\w         Word characters: [a-zA-Z0-9_].
\d         Digits: [0-9]
\s         [\f\t\n\r ] form-feed, tab, newline, carriage return and SPACE
\W         The complement of \w: [^\w]
\D         [^\d]
\S         [^\s]
[:class:]  POSIX character classes (alpha, alnum...)
\p{...}    Unicode definitions (IsAlpha, IsLower, IsHebrew, ...)
\P{...}    Complementary Unicode character classes.
```

## Quantifiers

```plaintext
Greedy
a?        0-1 'a' characters
a+        1-infinite 'a' characters
a*        0-infinite 'a' characters
a{n,m}    n-m 'a' characters
a{n,}     n-infinite 'a' characters
a{n}      n 'a' characters

Minimal
a+?
a*?
a{n,m}?
a{n,}?
a??
a{n}?
```

## Other

```plaintext
|   Alternation
```

## Grouping and Capturing

```plaintext
(...)                Grouping and capturing
\1, \2, \3, \4 ...   Capture buffers during regex matching
$1, $2, $3, $4 ...   Capture variables after successful matching
(?:...)              Group without capturing (don't set \1 nor $1)
```

## Anchors

```plaintext
^    Beginning of string (or beginning of line if /m enabled)
$    End of string (or end of line if /m enabled)
\A   Beginning of string
\Z   End of string (or before a final new-line)
\z   End of string
\b   Word boundary (start-of-word or end-of-word)
\G   Match only at pos(): at the end-of-match position of prior m//g
```

## Modifiers

```plaintext
/m   Change ^ and $ to match beginning and end of line respectively
/s   Change . to match new-line as well
/i   Case insensitive pattern matching
/x   Extended pattern (disregard white-space, allow comments starting with #)
```

## Extended

```plaintext
(?#text)              Embedded comment
(?adlupimsx-imsx)     One or more embedded pattern-match modifiers, to be turned on or off.
(?:pattern)           Non-capturing group.
(?|pattern)           Branch reset.
(?=pattern)           A zero-width positive look-ahead assertion.
(?!pattern)           A zero-width negative look-ahead assertion.
(?<=pattern)          A zero-width positive look-behind assertion.
(?<!pattern)          A zero-width negative look-behind assertion.
(?<NAME>pattern)      A named capture group.
\k<NAME> \k'NAME'     Named backreference.
(?{ code })           Zero-width assertion with code execution.
(??{ code })          A "postponed" regular subexpression with code execution.
```

# Examples

## ffprobe language list

This generates a list of the languages available in a set of `.mp4` files. The `-P` option tells grep to interpret the pattern as a Perl-compatible regular expression (PCRE).

```bash
ffp *.mp4 | grep -P -o '(?<=\()...(?=\)).*Audio:' | sort | uniq
```

To break the regex down:

```plaintext
(?<=\()    # ensures that opening paren \( precedes what is matched (look-behind assertion)
...        # any three characters
(?=\))     # ensures that closing paren \) follows the three characters (look-ahead assertion)
.*Audio:   # then anything up to and including the literal text "Audio:"
```

The `-o` option prints only the matched text, not the rest of each line containing a match.

## Perl as alternative to sed

Instead of

```bash
cat files | sed 's/pattern/replacement/g'
```

we can do

```bash
cat a.php | perl -pe 's/(e(\w))/$1$2$2$1<$2,$1>/'
```

and this gives us access to the full power of Perl's regular expressions, amongst other things. Also:

```bash
cat a.php | perl -pe 'tr/a-z/A-Z/;s/\W/_x_/;'
cat a.php | perl -e 'while (<>) { tr/a-z/A-Z/;s/\W/_/g;print; }'
```

Note that in `-pe 'expression'`, the `-e` allows code to be sent in via the command line, and `-p` causes that code to be wrapped in

```perl
while (<>) { YOUR_CODE; print; }
```

for an invocation of the form `perl -pe 'YOUR_CODE'`.