SimpleTokeniserInPython2

The aim here is to emulate the way bash splits a line into words, including its handling of single and double quotes. If we have e.g.

the "quick brown" fox' jumps over"'" the" lazy dog.

this should become

["the", "quick brown", 'fox jumps over" the', "lazy", "dog."]
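
As a cross-check on the expected result, Python's standard shlex module splits the same line with shell-like quoting rules. This is only a reference point, not the tokeniser described on this page; shlex also handles backslash escapes, which the state machine here does not attempt. The variable names are illustrative.

```python
import shlex

# The example line from above, written as a Python string literal.
line = 'the "quick brown" fox\' jumps over"\'" the" lazy dog.'

# shlex.split uses POSIX-style rules by default: quotes are stripped
# and do not by themselves end a word.
tokens = shlex.split(line)
print(tokens)
```

Comparing the hand-written tokeniser against shlex.split on a few inputs is a cheap way to spot a wrong transition.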

We use a simple state machine with four states: whitespace, nonwhitespace, singlequote and doublequote. The logic:

From state 'whitespace'
If we are in 'whitespace' and we see a single quote, consume the quote and move to state 'singlequote'.
If we are in 'whitespace' and we see a double quote, consume the quote and move to state 'doublequote'.
If we are in 'whitespace' and we see non-whitespace, start a new token, add the character to it, move to state 'nonwhitespace'.
If we are in 'whitespace' and we see whitespace, consume the whitespace and remain in state 'whitespace'.
...
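
The state machine can be sketched as follows. Only the 'whitespace' transitions are spelled out above; the transitions for 'nonwhitespace', 'singlequote' and 'doublequote' are assumptions chosen so that the example line produces the expected list. The function name tokenise, the state constants, the choice to start a new (possibly empty) token when a quote is seen in 'whitespace', and the ValueError on an unterminated quote are likewise illustrative, not taken from the original.

```python
WHITESPACE, NONWHITESPACE, SINGLEQUOTE, DOUBLEQUOTE = (
    "whitespace", "nonwhitespace", "singlequote", "doublequote")


def tokenise(text):
    """Split text into tokens, honouring single and double quotes."""
    tokens = []
    current = None      # characters of the token in progress, or None
    state = WHITESPACE
    for ch in text:
        if state == WHITESPACE:
            if ch == "'":
                current = []            # assumption: a quote starts a token
                state = SINGLEQUOTE
            elif ch == '"':
                current = []
                state = DOUBLEQUOTE
            elif ch.isspace():
                pass                    # consume, stay in whitespace
            else:
                current = [ch]          # start a new token
                state = NONWHITESPACE
        elif state == NONWHITESPACE:    # assumed transitions from here on
            if ch == "'":
                state = SINGLEQUOTE     # quote continues the same token
            elif ch == '"':
                state = DOUBLEQUOTE
            elif ch.isspace():
                tokens.append("".join(current))
                current = None
                state = WHITESPACE
            else:
                current.append(ch)
        elif state == SINGLEQUOTE:
            if ch == "'":
                state = NONWHITESPACE   # closing quote, token may continue
            else:
                current.append(ch)      # everything else is literal
        else:                           # DOUBLEQUOTE
            if ch == '"':
                state = NONWHITESPACE
            else:
                current.append(ch)
    if state in (SINGLEQUOTE, DOUBLEQUOTE):
        raise ValueError("unterminated quote")
    if current is not None:
        tokens.append("".join(current))  # flush the last token
    return tokens


example = 'the "quick brown" fox\' jumps over"\'" the" lazy dog.'
print(tokenise(example))
```

Returning to 'nonwhitespace' after a closing quote, rather than to 'whitespace', is what lets fox' jumps over"'" the" end up as a single token.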

Source: