Jonathan Tait wrote on 02/19/18 at 08:03:11:
Everyman already produces electronic versions via Kindle and their own chess-playing app. And the whole book goes to print as a PDF. What we want is to produce ChessBase compatible ebooks as well. Do you actually have any helpful suggestions as to how to do that without copying and pasting text manually? Or are you just being a smartarse?
I'm not trying to be a smart-arse; it probably just seems that way because of, you know, the internet.
As it happens, I just bought Hansen's Closed Sicilian MBM. Game 5 in the PGN file begins:
Quote:
1. e4 c5 2. Nc3 Nc6 ({As a committed Najdorf player, Novikov opted for} 2... d6
{here, and only after} 3. g3 {then} Nc6 4. Bg2 g6 5. d3 Bg7 {etc.}) 3. g3 g6 4.
Bg2 Bg7 5. d3 d6 6. Be3 e6 7. Qd2 {The most consistent continuation;} ({
instead,} 7. f4 Nge7 8. Nf3 O-O 9. O-O {transposes to 6 f4 e6 lines in Chapter
Six.}) 7... Nd4 {This is generally considered premature. Black should normally
Now when I unpack the EPUB file, I see that index_split_008.html contains the following (I replaced the angle brackets with curly braces and bolded the text between the braces):
Quote:
{span class="bold"}1 e4 c5 2 Nc3 Nc6{/span}{/span}
{/div}
{p class="calibre_1"}{span class="calibre1" xml:lang="EN-US"}As a committed Najdorf player, Novikov opted for 2 ... d6 here, and only after 3 g3 then 3 ... Nc6 4 Bg2 g6 5 d3 Bg7 etc.{/span}{/p}
{p class="calibre_1"}{span class="calibre1 bold" xml:lang="EN-US"}3 g3 g6 4 Bg2 Bg7 5 d3 d6 6 Be3 e6 7 Qd2{/span}{/p}
{p class="calibre_1"}{span class="calibre1" xml:lang="EN-US"}The most consistent continuation; instead, 7 f4 Nge7 8 Nf3 0-0 9 0-0 transposes to 6 f4 e6 lines in Chapter Six.{/span}{/p}
{p class="calibre_1"}{span class="calibre1 bold" xml:lang="EN-US"}7 ... Nd4{/span}{/p}
{p class="calibre_1"}{span class="calibre1" xml:lang="EN-US"}This is generally considered premature. Black should normally
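For anyone who wants to reproduce the unpacking step: an EPUB is a ZIP archive, so the Python standard library is enough. The sketch below is only an illustration, not the exact procedure used above. The file name book.epub and the helper names list_html_members and epub_member_text are made up for the example, and it assumes the HTML files are UTF-8 encoded.

```python
import zipfile
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text nodes of an (X)HTML document and drop all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Collapse runs of whitespace and skip text nodes that are empty.
        text = " ".join(data.split())
        if text:
            self.parts.append(text)


def list_html_members(epub_path):
    """Return the names of the HTML files stored inside the EPUB."""
    with zipfile.ZipFile(epub_path) as book:
        return sorted(
            name for name in book.namelist()
            if name.endswith((".html", ".xhtml"))
        )


def epub_member_text(epub_path, member):
    """Return the text of one HTML file in the EPUB, one text node per line.

    Note: this also picks up text inside <title> or <style> if present.
    """
    with zipfile.ZipFile(epub_path) as book:
        html = book.read(member).decode("utf-8")  # assumption: UTF-8 content
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical usage (book.epub is a placeholder name):
#   for name in list_html_members("book.epub"):
#       print(name)
#   print(epub_member_text("book.epub", "index_split_008.html"))
```

The output is plain text with the move/comment distinction reduced to line breaks, which is why the markup classes matter so much for anything automatic.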
So, if Everyman sold me the EPUB but did not include a PGN, I would write a script to extract all the HTML content to a TXT file, then transform that to PGN in a text editor by adding the tags and braces. But I would never advise Everyman to do it that way. Instead, Everyman should define some HTML classes to distinguish pgn-moves from pgn-comments from editorial-comments, then write a custom CSS stylesheet to render those classes as desired. The script that extracts the HTML content from the EPUB can then read the classes to generate a valid PGN file automatically.
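A rough Python sketch of what such a class-aware script could look like follows. Everything specific in it is an assumption made for illustration: the class names pgn-move, pgn-comment and editorial-comment do not exist in the current EPUB, the sample markup is invented (only the moves and the comment wording come from the game quoted above), and it produces movetext only, without the PGN tag section or any handling of variations.

```python
import re
from html.parser import HTMLParser

# Hypothetical class names; the real EPUB only has generic calibre classes.
ROLES = {"pgn-move": "move", "pgn-comment": "comment", "editorial-comment": None}


def normalize_moves(text):
    """Turn book-style movetext ('7 ... Nd4', '0-0') into PGN style."""
    text = " ".join(text.split())
    text = text.replace("0-0-0", "O-O-O").replace("0-0", "O-O")
    text = re.sub(r"\b(\d+) \.\.\.", r"\1...", text)  # '7 ... Nd4' -> '7... Nd4'
    text = re.sub(r"\b(\d+) (?!\.)", r"\1. ", text)   # '1 e4' -> '1. e4'
    return text


class PgnExtractor(HTMLParser):
    """Emit moves as-is and comments in braces, based on the class attribute.

    Assumes well-formed XHTML (every tag closed), as EPUB content should be.
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # role of each open element: 'move', 'comment' or None
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Inherit the parent's role unless this element declares its own.
        role = self.stack[-1] if self.stack else None
        for cls in (dict(attrs).get("class") or "").split():
            if cls in ROLES:
                role = ROLES[cls]
        self.stack.append(role)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        role = self.stack[-1] if self.stack else None
        if not text or role is None:
            return  # editorial or unclassified text stays out of the PGN
        if role == "move":
            self.tokens.append(normalize_moves(text))
        else:
            self.tokens.append("{" + text + "}")


def html_to_movetext(html):
    parser = PgnExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.tokens)


sample = (
    '<p><span class="pgn-move">6 Be3 e6 7 Qd2</span></p>'
    '<p><span class="pgn-comment">The most consistent continuation.</span></p>'
    '<p><span class="editorial-comment">(placeholder, not for the PGN)</span></p>'
    '<p><span class="pgn-move">7 ... Nd4</span></p>'
)
print(html_to_movetext(sample))
```

The point of the design is that the CSS decides how each class looks in the ebook, while the same classes tell the script what is a move and what is a comment, so nobody has to add braces by hand.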
Check this out from Wikipedia.
https://en.wikipedia.org/wiki/EPUB
Quote:
Data interchange
EPUB is a popular way to feed the ebook creation process, because it is an open format and is based on HTML, while Amazon's format is proprietary. EPUB is the "first step" (initial) format of the content in many production processes and supply chains.
BTW, the file MBMCSicilian.pgn is NOT valid PGN; it violates the specification in numerous ways. So even though I do not have to extract from the EPUB, I still have to spend a lot of time in a text editor getting their PGN file to conform to the PGN standard. The PGN I quoted above already contains a mistake according to the standard. See if you can spot it.