This is a simple script I wrote, for my music exercise generators here, which take all referenced <script src=...></script> and <link rel="stylesheet" href="..."/> tags and replace them with inline <script> and <style> elements. Not particularly robust, but served my needs. The results
are here, which are then single self-contained HTML files which are easier
to save and use offline. (And save and use offline is the main purpose of doing this: no need for a _files
directory to move around with the .html file.)
Why Hashes
Yes, this is a bodge. If you want to 'write it properly', feel free. This shows the essential
bits and pieces necessary. One thing that may seem odd is the use of hashlib: this ensures
that when iteratively searching and replacing <script src...> with its inline version,
if e.g. a <script src...> string appears in the replacement string, it isn't subsequently
replaced. The assumption here is that the hash hex digest string won't show up in any of the files.
To illustrate, suppose we are doing the following replacement:
hello => hello world
world => mr flibble
and we start with the string hello. If we apply these in a top-to-bottom order, we get
hello
hello world
hello mr flibble
but if we apply them in a bottom-to-top order, we get
hello
hello world
in this first case the world in the replacement of hello by hello world, is then replaced
by mr flibble. We don't want that. Doing two passes, one replacing what is to be replaced
by its hex digest, and then another pass replacing the hex digests, avoids this issue,
provided the hex digest doesn't appear in one of the replacement texts. This is a fair assumption,
and should it go wrong, we can just sugar what gets hashed with a random sequence of words or something.
The Source
#!/usr/bin/env python3
import sys
import re
import hashlib
from icecream import ic; ic.configureOutput(includeContext=True)
def main():
for x in sys.argv[1:]:
try:
procfile(x)
except Exception as e:
ic("Exception proc",x,type(e),e)
continue
scriptre = re.compile(r"(<script[^>]*>\s*</script>)")
srcre = re.compile(r"src=(['\"])(.*?)\1")
linkre = re.compile(r"(<link[^>]*>)")
hrefre = re.compile(r"href=(['\"])(.*?)\1")
def procfile(fn):
try:
with open(fn) as f:
a = f.read()
except Exception as e:
ic(f"Exception reading {x}",type(e),e)
return
orig = a
m = scriptre.findall(a)
scriptsrcs = []
styles = []
dscr = {}
dsty = {}
for y in m:
print(y)
scriptsrcs.append(y)
m = linkre.findall(a)
for y in m:
print(y)
styles.append(y)
for s in scriptsrcs:
m = srcre.search(s)
if not m:
print(f"#fail src {s}")
continue
src = m.group(2)
try:
with open(src) as f:
a = f.read()
dscr[s] = "<script>\n"+a+"\n</script>\n"
except Exception as e:
ic("Exception read src",src,type(e),e)
raise
for s in styles:
m = hrefre.search(s)
if not m:
print(f"#fail href {s}")
continue
src = m.group(2)
try:
with open(src) as f:
a = f.read()
dsty[s] = "<style>\n"+a+"\n</style>\n"
except Exception as e:
ic("Exception read src",src,type(e),e)
raise
tmp = orig
hmap = {}
for k,v in dsty.items():
h = hashlib.sha256()
h.update(k.encode())
h = h.hexdigest()
hmap[h] = v
tmp = tmp.replace(k,h)
for k,v in dscr.items():
h = hashlib.sha256()
h.update(k.encode())
h = h.hexdigest()
hmap[h] = v
tmp = tmp.replace(k,h)
for k,v in hmap.items():
tmp = tmp.replace(k,v)
bn = fn.split("/")[-1]
ofn = f"../m/{bn}"
with open(ofn,"wt") as f:
print(tmp,file=f)
print("Written",ofn)
if __name__ == "__main__":
main()