Skip to content Skip to sidebar Skip to footer

Regular Expression Matching Liquid Code

I'm building a website using Jekyll I would like to automatically remove liquid code (and only liquid code) from a given HTML file. I'm doing it in Python using regular expressions

Solution 1:

The regular expression and tokenizer in Python-Liquid might be overkill for your use case, but will handle {% comment %} and {% raw %} blocks, and any multi-line tags. Something a simple regular expression alone would not cope with.

The Writing a tokenizer example from Python's re docs uses the same technique.

You could use python-liquid to filter out Liquid tokens like this.

from liquid.lex import get_lexer
from liquid.token import TOKEN_LITERAL

s = """
<div class="post-preview">
    <div class="post-title">
        <div class="post-name">
            <a href="{{ post.url }}">{{ post.title }}</a>
        </div>
        <div class="post-date">
            {% include time.html %}
        </div>
    </div>

    <div class="post-snippet">
        {% if post.content contains '<!--break-->' %}
            {{ post.content | split:'<!--break-->' | first }}
            <div class="post-readmore">
                <a href="{{ post.url }}">read more-></a>
            </div>
        {% endif %}
    </div>
    {% include post-meta.html %}
</div>
"""

tokenize = get_lexer()
tokens = tokenize(s)
only_html = [token.value for token in tokens if token.type == TOKEN_LITERAL]
print("".join(only_html))

Notice the output preserves newlines outside Liquid markup.

<divclass="post-preview"><divclass="post-title"><divclass="post-name"><ahref=""></a></div><divclass="post-date"></div></div><divclass="post-snippet"><divclass="post-readmore"><ahref="">read more-></a></div></div></div>

Solution 2:

I am not saying that you should but you could use a "dirty" solution like:

{%\ (if)[\s\S]*?{%\ end\1\ %}|
{%.+?%}|
{{.+?}}

This matches all of the template code provided, see a demo on regex101.com. Additional tweaks would be to extend the alternation group (if) or use a parser altogether.

Solution 3:

I actually don't know liquid but I assume that a block of code is always contained in either {} or {{}}. I think it is hard to find a regular expression that will not match something like this:

print"This is my string with a number {}, cool!".format(42)

I would suggest that you open your html files with an editor that highlights all matches of your regular expression. Kate or Geany could do that for example. Than you can semiautomatically replace all the occurences with the empty string. I would however strongly not suggest to replace all the occurences in your code without checking them before. It is too likely that you will replace something important.

I would also use this regular expression:

\{[^\}]*\}|\{\{[^\}]*\}\}

Edit (see comments above):

From what I understand I think you want to have the possibility to have two versions of your code with either liquid or javascript. The only elegant solution that comes to my mind is to use git for that problem. You can create two separate branches of your project, one for liquid and one for javascript. This allows you to work on both versions separately and gives you the ability to switch between the two versions.

Edit 2:

I would also not give up on finding a better solution. You could also repost your question and make your problem more clear to the other users. There is likely a more elegant solution.

Post a Comment for "Regular Expression Matching Liquid Code"