HTML Script Css Inliner

This is a simple script I wrote for my music exercise generators here. It takes all referenced <script src=...></script> and <link rel="stylesheet" href="..."/> tags and replaces them with inline <script> and <style> elements. It is not particularly robust, but it served my needs. The results are here: single self-contained HTML files, which are easier to save and use offline. (Saving and using offline is the main purpose of doing this: no need for a _files directory to move around with the .html file.)

Why Hashes

Yes, this is a bodge. If you want to 'write it properly', feel free. This shows the essential bits and pieces necessary. One thing that may seem odd is the use of hashlib: this ensures that when iteratively searching and replacing <script src...> with its inline version, if e.g. a <script src...> string appears in the replacement string, it isn't subsequently replaced. The assumption here is that the hash hex digest string won't show up in any of the files.

To illustrate, suppose we are doing the following replacement:

hello => hello world
world => mr flibble

and we start with the string hello. If we apply these in a top-to-bottom order, we get

hello
hello world
hello mr flibble

but if we apply them in a bottom-to-top order, we get

hello
hello world

In the first case, the world introduced by replacing hello with hello world is itself then replaced by mr flibble. We don't want that. Doing two passes avoids this issue: the first pass replaces each thing to be replaced with its hex digest, and the second pass replaces the hex digests with the final text. This works provided no hex digest appears in any of the replacement texts. This is a fair assumption, and should it go wrong, we can just salt what gets hashed with a random sequence of words or something.
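
The hello/world example above can be reproduced in a few lines. This is only a minimal sketch of the idea, not the script itself: the function names replace_sequential and replace_two_pass are made up for illustration, and the rules are the two toy replacements from the text.

```python
import hashlib

def replace_sequential(text, rules):
    """Apply each (old, new) rule in order; later rules see earlier output."""
    for old, new in rules:
        text = text.replace(old, new)
    return text

def replace_two_pass(text, rules):
    """Replace each target with the SHA-256 hex digest of the target,
    then replace each digest with the final text."""
    placeholders = {}
    for old, new in rules:
        digest = hashlib.sha256(old.encode()).hexdigest()
        placeholders[digest] = new
        text = text.replace(old, digest)
    for digest, new in placeholders.items():
        text = text.replace(digest, new)
    return text

rules = [("hello", "hello world"), ("world", "mr flibble")]

sequential = replace_sequential("hello", rules)
two_pass = replace_two_pass("hello", rules)
reversed_two_pass = replace_two_pass("hello", list(reversed(rules)))

print(sequential)         # hello mr flibble
print(two_pass)           # hello world
print(reversed_two_pass)  # hello world
```

With the two-pass version the result no longer depends on the order of the rules, because a hex digest only contains the characters 0-9 and a-f and so cannot contain either hello or world.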

The Source

#!/usr/bin/env python3
import sys
import re
import hashlib
from icecream import ic; ic.configureOutput(includeContext=True)

def main():
  for x in sys.argv[1:]:
    try:
      procfile(x)
    except Exception as e:
      ic("Exception proc",x,type(e),e)
      continue

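# An empty <script ...></script> pair, i.e. one that loads an external file.
scriptre = re.compile(r"(<script[^>]*>\s*</script>)")
# The quoted value of a src attribute (group 2), single or double quotes.
srcre = re.compile(r"src=(['\"])(.*?)\1")
# Any <link ...> tag.
linkre = re.compile(r"(<link[^>]*>)")
# The quoted value of an href attribute (group 2), single or double quotes.
hrefre = re.compile(r"href=(['\"])(.*?)\1")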
def procfile(fn):
  try:
    with open(fn) as f:
      a = f.read()
  except Exception as e:
    ic(f"Exception reading {fn}",type(e),e)
    return
  orig = a
  m = scriptre.findall(a)
  scriptsrcs = []
  styles = []
  dscr = {}
  dsty = {} 
  for y in m:
    print(y)
    scriptsrcs.append(y)
  m = linkre.findall(a)
  for y in m:
    print(y)
    styles.append(y)
  for s in scriptsrcs:
    m = srcre.search(s)
    if not m:
      print(f"#fail src {s}")
      continue
    src = m.group(2)
    try:
      with open(src) as f:
        a = f.read()
        dscr[s] = "<script>\n"+a+"\n</script>\n"
    except Exception as e:
      ic("Exception read src",src,type(e),e)
      raise
  for s in styles:
    m = hrefre.search(s)
    if not m:
      print(f"#fail href {s}")
      continue
    src = m.group(2)
    try:
      with open(src) as f:
        a = f.read()
        dsty[s] = "<style>\n"+a+"\n</style>\n"
    except Exception as e:
      ic("Exception read src",src,type(e),e)
      raise
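  # Pass 1: swap each matched tag for the SHA-256 hex digest of the tag text,
  # remembering which digest stands for which inline replacement.
  tmp = orig
  hmap = {}
  for k,v in dsty.items():
    h = hashlib.sha256()
    h.update(k.encode())
    h = h.hexdigest()
    hmap[h] = v
    tmp = tmp.replace(k,h)
  for k,v in dscr.items():
    h = hashlib.sha256()
    h.update(k.encode())
    h = h.hexdigest()
    hmap[h] = v
    tmp = tmp.replace(k,h)
  # Pass 2: swap each digest for its inline <style> or <script> element.
  for k,v in hmap.items():
    tmp = tmp.replace(k,v)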

  bn = fn.split("/")[-1]
  ofn = f"../m/{bn}"
  with open(ofn,"wt") as f:
    print(tmp,file=f)
    print("Written",ofn)

if __name__ == "__main__":
  main()
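
To see what the four regular expressions pick up, here is a small check that can be run on its own. The patterns are copied from the source above; the sample HTML and the file names style.css and app.js are invented for illustration.

```python
import re

scriptre = re.compile(r"(<script[^>]*>\s*</script>)")
srcre = re.compile(r"src=(['\"])(.*?)\1")
linkre = re.compile(r"(<link[^>]*>)")
hrefre = re.compile(r"href=(['\"])(.*?)\1")

# Hypothetical page: one stylesheet link, one external script, one inline script.
sample = """<html><head>
<link rel="stylesheet" href="style.css"/>
<script src='app.js'></script>
<script>console.log("inline");</script>
</head></html>"""

script_tags = scriptre.findall(sample)
link_tags = linkre.findall(sample)
# Group 2 of srcre/hrefre is the attribute value without its quotes.
script_paths = [srcre.search(t).group(2) for t in script_tags]
link_paths = [hrefre.search(t).group(2) for t in link_tags]

print(script_tags)   # only the empty <script src=...></script> pair
print(script_paths)  # ['app.js']
print(link_paths)    # ['style.css']
```

The script that already has a body is left alone, since scriptre only allows whitespace between the opening and closing tags. Note that linkre matches every <link> tag, not only stylesheets, so a page with, say, a favicon link would have that file read and wrapped in <style> as well; that is one of the places where the script is not particularly robust.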