{"id":1359,"date":"2012-04-11T09:21:05","date_gmt":"2012-04-11T08:21:05","guid":{"rendered":"http:\/\/www.sachaheck.net\/blog\/?p=1359"},"modified":"2012-04-12T15:05:18","modified_gmt":"2012-04-12T14:05:18","slug":"using-grep-to-tweak-epub","status":"publish","type":"post","link":"https:\/\/www.sachaheck.net\/blog\/digital-media\/using-grep-to-tweak-epub","title":{"rendered":"Using GREP to tweak EPUB-files"},"content":{"rendered":"<p>For the last few weeks, I have been working on some EPUB files that were generated from InDesign CS4 some time ago, in December 2010 I believe. While tweaking the files in Sigil, I noticed that many paragraphs include a class generated by InDesign that is absolutely useless. Here is an excerpt of the HTML file:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-1360\" title=\"vorher\" src=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/vorher.jpg\" alt=\"\" width=\"612\" height=\"348\" srcset=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/vorher.jpg 612w, https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/vorher-300x170.jpg 300w\" sizes=\"(max-width: 612px) 100vw, 612px\" \/><\/p>\n<p>Every paragraph contains a span with the class &#8220;generated-style&#8221;, which was exported from InDesign. <strong>This doesn&#8217;t happen when exporting from InDesign CS5 or InDesign CS5.5!<\/strong> If you look at the CSS, nothing is defined for this automatically generated class. So you really don&#8217;t need it, and it only bloats your file.<\/p>\n<h2>GREP to the rescue<\/h2>\n<p>So what can you do about this issue? While it does not prevent your EPUB from working properly, you may still want clean code. So I had the idea to run a search with <a title=\"GREP\" href=\"https:\/\/www.sachaheck.net\/blog\/indesign\/grep1\" target=\"_blank\">GREP<\/a> and delete every occurrence of this useless class in the HTML files. 
Regular expressions are really good at this kind of complicated search\/replace work. Many programs already support GREP search, InDesign among them. For tweaking the EPUB files, though, I&#8217;m working with <a title=\"Sigil\" href=\"http:\/\/code.google.com\/p\/sigil\/\" target=\"_blank\">Sigil<\/a> and <a title=\"Barebones BBEdit\" href=\"http:\/\/www.barebones.com\/products\/bbedit\/\" target=\"_blank\">BBEdit<\/a>, which also understand regular expressions (commonly called regex).<\/p>\n<p>Well, I&#8217;m no regex expert. I only use it for little things, and this challenge was a bit more complicated. I don&#8217;t know anyone nearby who is good at regex, so I asked for help on Twitter. <a title=\"Twitter\" href=\"http:\/\/www.twitter.com\" target=\"_blank\">Twitter<\/a> is really great for connecting with geeks and experts around the world. Shortly after my request, Ahmad Moqanasa (@<a title=\"Ahmad Moqanasa\" href=\"http:\/\/www.twitter.com\/AbuGnais\" target=\"_blank\">AbuGnais<\/a>) answered with a great search string. He&#8217;s a developer and geek from <a title=\"Amman\" href=\"http:\/\/en.wikipedia.org\/wiki\/Amman\" target=\"_blank\">Amman<\/a>, and he suggested the following:<\/p>\n<pre>(&lt;[^&lt;&gt;\/]*?class=\\\"generated-style\"[^&lt;&gt;\/]*?&gt;)([^&lt;&gt;]*?)(&lt;\/[^&lt;&gt;\/]*?&gt;)<\/pre>\n<p>Wow, this looks complicated, doesn&#8217;t it? I can&#8217;t explain every detail of it, but I wanted to share this bit of GREP. Roughly, the three parenthesised groups capture the opening tag that carries the class, the text inside it, and the closing tag. So let&#8217;s see how this works. I entered this search string in BBEdit&#8217;s Find\/Replace dialogue (do not forget to check the GREP option). 
Then replace with the second match:\u00a0\\2.<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-1361\" title=\"GREP_Search\" src=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/GREP_Search.jpg\" alt=\"\" width=\"650\" height=\"285\" srcset=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/GREP_Search.jpg 650w, https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/GREP_Search-300x131.jpg 300w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/p>\n<p>The expression matches the whole span with the class &#8220;generated-style&#8221;, including its content. Replacing it with the second match deletes only the span tags and keeps the content. The result is this:<\/p>\n<p><img loading=\"lazy\" class=\"alignnone size-full wp-image-1362\" title=\"nachher\" src=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/nachher.jpg\" alt=\"\" width=\"615\" height=\"260\" srcset=\"https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/nachher.jpg 615w, https:\/\/www.sachaheck.net\/blog\/wp-content\/uploads\/2012\/04\/nachher-300x126.jpg 300w\" sizes=\"(max-width: 615px) 100vw, 615px\" \/><\/p>\n<p>OK, now we have clean code and got rid of this superfluous span class. I find this regex very helpful in this case. I don&#8217;t know whether you can use this GREP too, but if you can, you know where to find it ;-)<\/p>\n<p>Do you know other cases where GREP could be helpful for tweaking EPUB files?<\/p>\n<h2>EDIT<\/h2>\n<p>As Kai points out in the comments, this GREP is shorter and also does a very decent job. 
Check it out:<\/p>\n<pre>Search: &lt;span class=\"generated-style\"&gt;(.+?)&lt;\/span&gt;<\/pre>\n<pre>Replace: \\1<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>For the last few weeks, I have been working on some EPUB files that were generated from InDesign CS4 some time ago, in December 2010 I believe. While tweaking the files in Sigil, I noticed that many paragraphs include a class generated by InDesign that is absolutely useless. So I had the idea to run a search with GREP and delete every occurrence of this useless class in the HTML files.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0},"categories":[100],"tags":[280,191,102,276,88,279,181,78,424,277,278,192],"_links":{"self":[{"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/posts\/1359"}],"collection":[{"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/comments?post=1359"}],"version-history":[{"count":16,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/posts\/1359\/revisions"}],"predecessor-version":[{"id":1381,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/posts\/1359\/revisions\/1381"}],"wp:attachment":[{"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/media?parent=1359"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/categories?post=1359"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.sachaheck.net\/blog\/wp-json\/wp\/v2\/tags?post=1359"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}