Fb2RSS

A Facebook to RSS conversion tool
git clone git://xatko.vsos.ethz.ch/Fb2RSS.git

commit 43670b464fc237b113f32d592af18c218a02bf1f
parent 05d0bcf2a56a9585431226d2b381d69469a25325
Author: Dominik Schmidt <das1993@hotmail.com>
Date:   Wed,  1 Jul 2015 23:56:44 +0200

Fix UTF-8 fuckup in title shortening.

Since the offset may fall in the middle of a multibyte character, which then
gets ripped apart, we need to use a special function (toUTFindex) to calculate
the offset in the character array at which the title_cutoff-th character sits.

This makes the length of the shortened string somewhat imprecise. The length
check (cont.length>title_cutoff) still counts bytes, not characters, so if the
cont string has a lot of multibyte characters, title_cutoff is reached sooner
and titles with fewer characters get shortened.
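
To illustrate the problem described above, here is a minimal sketch of the
difference between slicing at a byte offset and slicing at the offset returned
by toUTFindex. The helper name prefix and the sample string are assumptions
made up for this illustration; they are not part of Fb2RSS.

```d
import std.exception : assertThrown;
import std.utf : toUTFindex, validate, UTFException;

// Hypothetical helper for illustration: the first n code points of s.
string prefix(string s, size_t n)
{
    // toUTFindex converts a code point count into a code unit (byte) index.
    return s[0 .. toUTFindex(s, n)];
}

void main()
{
    // Three 2-byte characters followed by three 1-byte characters.
    string s = "äöüabc";
    assert(s.length == 9); // .length counts UTF-8 code units, not characters

    // Slicing at byte 3 cuts "ö" in half and leaves invalid UTF-8.
    assertThrown!UTFException(validate(s[0 .. 3]));

    // The third code point ends at byte offset 6.
    assert(toUTFindex(s, 3) == 6);
    assert(prefix(s, 3) == "äöü");
}
```

Note that toUTFindex counts code points, so a character built from several
code points (a base letter plus a combining accent, for example) could still
be split by this approach.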

Diffstat:
Fb2RSS.d | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Fb2RSS.d b/Fb2RSS.d
@@ -37,6 +37,7 @@ import std.string;
 import std.datetime;
 import std.range;
 import std.file;
+import std.utf;
 import kxml.xml;
 
 /**
@@ -307,11 +308,15 @@ struct Post{
     ///The count of characters, until the title gets cut off.
     static ushort title_cutoff=80;
 
-    ///@return The title of the posting
+    /**
+     * @return The title of the posting
+     * @bug title_cutoff is reached with fewer characters when there are
+     * a lot of multibyte characters in the string.
+     */
     @property string title(){
         string cont=content.getChildren()[0].getCData();
         if(cont.length>title_cutoff){
-            cont=cont[0..title_cutoff];
+            cont=cont[0..toUTFindex(cont,title_cutoff)];
             cont~="...";
         }
         return cont;
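
The @bug note in the diff comes from the length check still comparing bytes
against title_cutoff. One possible way to address it, sketched here under the
assumption that counting code points with std.utf.count is acceptable, is to
compare the code point count instead. The free function shorten and its cutoff
parameter are hypothetical and are not part of this commit.

```d
import std.utf : count, toUTFindex;

// Hypothetical helper, not part of Fb2RSS: shortens by code points.
string shorten(string cont, size_t cutoff)
{
    // count() walks the string and returns the number of code points,
    // so the check and the cut use the same unit.
    if (count(cont) > cutoff)
    {
        cont = cont[0 .. toUTFindex(cont, cutoff)];
        cont ~= "...";
    }
    return cont;
}

void main()
{
    assert(shorten("äöüabc", 3) == "äöü...");
    assert(shorten("äöü", 3) == "äöü");   // 6 bytes, but only 3 characters
    assert(shorten("abc", 80) == "abc");
}
```

Checking the code point count also avoids asking toUTFindex for a position
beyond the end of a string that is longer than the cutoff in bytes but shorter
in characters.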