Fix UTF-8 fuckup in title shortening. - Fb2RSS - A Facebook to RSS conversion tool

commit 43670b464fc237b113f32d592af18c218a02bf1f
parent 05d0bcf2a56a9585431226d2b381d69469a25325
Author: Dominik Schmidt <das1993@hotmail.com>
Date:   Wed,  1 Jul 2015 23:56:44 +0200

Fix UTF-8 fuckup in title shortening.

Since a byte offset may fall in the middle of a multibyte character, which then gets
ripped apart, we need to use a special function (toUTFindex) to calculate the offset
in the char array at which the first title_cutoff characters end.

This introduces a certain vagueness about how long the shortened string will be,
because the length check (cont.length) still counts bytes rather than characters.
If the cont string has a lot of multibyte characters, title_cutoff will be reached
with fewer characters, so the title will be shorter.
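
The problem described above can be reproduced in isolation. The following is a minimal sketch, not code from Fb2RSS: the sample string and the cutoff of 3 are made up for illustration. It shows that slicing a D string at a raw byte offset can split a multibyte character, while std.utf.toUTFindex yields the byte offset that corresponds to a given number of characters (code points).

```d
import std.exception : assertThrown;
import std.utf : toUTFindex, validate, UTFException;

void main()
{
    // Illustrative input: 6 code points, but 9 UTF-8 code units (bytes),
    // because each of the three umlauts takes 2 bytes.
    string s = "äöüabc";
    assert(s.length == 9);

    // Slicing at byte offset 3 cuts "ö" in half, leaving invalid UTF-8.
    string broken = s[0 .. 3];
    assertThrown!UTFException(validate(broken));

    // toUTFindex converts "3 code points" into the matching byte offset.
    size_t cut = toUTFindex(s, 3);
    assert(cut == 6);
    assert(s[0 .. cut] == "äöü");
}
```

Note that `s.length` counts bytes, which is why a check like `cont.length>title_cutoff` fires earlier for text with many multibyte characters.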

Diffstat:
Fb2RSS.d  | 9 +++++++--

1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/Fb2RSS.d b/Fb2RSS.d
@@ -37,6 +37,7 @@ import std.string;
 import std.datetime;
 import std.range;
 import std.file;
+import std.utf;
 import kxml.xml;
 
 /**
@@ -307,11 +308,15 @@ struct Post{
 	///The count of characters, until the title gets cut off.
 	static ushort title_cutoff=80;
 	
-	///@return The title of the posting 
+	/**
+	 * @return The title of the posting 
+	 * @bug title_cutoff is reached with fewer characters when there are 
+	 * 	a lot of multibyte characters in the string.
+	 */
 	@property string title(){
 		string cont=content.getChildren()[0].getCData();
 		if(cont.length>title_cutoff){
-			cont=cont[0..title_cutoff];
+			cont=cont[0..toUTFindex(cont,title_cutoff)];
 			cont~="...";
 		}
 		return cont;
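
One possible way to remove the vagueness noted in the @bug comment is to compare the number of code points, not bytes, against the cutoff. The sketch below is an assumption about how that could look, not part of this commit: `shorten` is a hypothetical free function standing in for the `title` property. It also covers the case where a string has more bytes than the cutoff but fewer characters, in which toUTFindex would otherwise be asked for a character that does not exist.

```d
import std.range : take, walkLength;
import std.utf : toUTFindex;

/// Hypothetical helper (not in Fb2RSS): keeps at most `cutoff` code points
/// of `s` and appends "..." only if something was actually cut off.
string shorten(string s, size_t cutoff)
{
    // A string with at most `cutoff` bytes has at most `cutoff` code points.
    if (s.length <= cutoff)
        return s;
    // Count code points, but stop after cutoff + 1 to avoid a full walk.
    if (s.take(cutoff + 1).walkLength <= cutoff)
        return s;
    // Here s has more than `cutoff` code points, so the index is in range.
    return s[0 .. toUTFindex(s, cutoff)] ~ "...";
}

void main()
{
    assert(shorten("abc", 80) == "abc");        // short input is untouched
    assert(shorten("äöü", 3) == "äöü");         // 6 bytes, but only 3 characters
    assert(shorten("äöüabc", 3) == "äöü...");   // cut after the third character
}
```

The trade-off is one extra bounded walk over the string, which is negligible for titles of this size.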

	git clone git://xatko.vsos.ethz.ch/Fb2RSS.git