For the last few weeks, I have been working on some EPUB files that were generated from InDesign CS4 a while ago, I believe in December 2010. While tweaking the files with Sigil, I noticed that many paragraphs include a class generated by InDesign that is absolutely useless. Here is an excerpt of the HTML file:
Every paragraph contains a span with the class “generated-style”, which was exported from InDesign. When exporting from InDesign CS5 or InDesign CS5.5, this doesn’t happen! If you have a look at the CSS, nothing is defined for this automatically generated class. So you really don’t need it, and it only bloats your file.
GREP to the rescue
So what can you do about this issue? It does not prevent your EPUB from working properly, but you may still want clean code. So I had the idea of running a GREP search to delete every occurrence of this useless class in the HTML files. Regular expressions are really awesome for this kind of complicated search/replace work. Many programmes already support GREP search, InDesign included. But in this case, for tweaking the EPUB files, I’m working with Sigil and BBEdit, which both understand regular expressions (commonly called regex).
Well, I’m not a great regex expert. I only use it for little things, and this challenge was a bit more complicated. I don’t know anyone nearby who is good at regex, so I posted a request for help on Twitter. Twitter is really great for connecting with geeks and experts around the world. Shortly after my request, Ahmad Moqanasa (@AbuGnais) answered with a great search string. He’s a developer and geek from Amman, and he suggested the following:
Wow, this looks complicated, doesn’t it? I can’t explain it to you, but I wanted to share this bit of GREP anyway. So let’s see how it works. I entered the search string in BBEdit’s Find/Replace dialogue (do not forget to check the GREP option) and replaced with the second match: \2.
The expression finds the whole “generated-style” span, including its content. Because the replacement is only the second match, the span tags are deleted while the content stays. The result is this:
OK, now we have clean code and have got rid of this superfluous span class. I find this regex very helpful in this case. I don’t know if you can use this GREP too, but if you can, you know where to find it ;-)
Do you know of other cases where GREP could be helpful for tweaking EPUB files?
As Kai points out in the comments, this GREP is shorter and also does a very decent job. Check it out:
Search: <span class="generated-style">(.+?)</span>
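If you would rather script this than run it in BBEdit, here is a minimal Python sketch of the same idea using Kai’s pattern. Please treat it as an illustration, not as the way anyone in this post actually did it: the function name `strip_generated_spans` and the sample paragraph are made up, and replacing with `\1` is my assumption, based on the pattern having only one capture group.

```python
import re

# Kai's search pattern from the post. The capture group holds the text
# between the opening and closing span tags.
GENERATED_SPAN = re.compile(r'<span class="generated-style">(.+?)</span>')


def strip_generated_spans(html: str) -> str:
    """Remove the generated-style span tags but keep the content inside.

    Assumption: the replacement is the first (and only) capture group.
    """
    return GENERATED_SPAN.sub(r"\1", html)


# Made-up sample paragraph, not taken from the original EPUB.
sample = '<p class="body"><span class="generated-style">Hello EPUB</span></p>'
print(strip_generated_spans(sample))
```

Two things to keep in mind with this pattern: `.` does not match line breaks by default in Python, so a span whose content runs over several lines is left untouched, and a “generated-style” span that contains another span would be cut at the first `</span>`. It is worth checking a copy of your files before replacing everything.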