PDFsharp - moved to http://forum.pdfsharp.net/ Please visit the new PDFsharp forum at http://forum.pdfsharp.net/
Mpasc
Joined: 03 Dec 2008 Posts: 6
|
Posted: Mon Dec 08, 2008 9:43 am Post subject: Migradoc: encoding ö, ä, ß, ü, etc from html text |
|
|
Hello all,
I have German text from a textarea containing ö, ä, ß, ü, etc., but these characters appear in the PDF as squares.
I create a Migradoc document and then I render it with PdfDocumentRenderer in the following way:
Code: |
//First I parse the HTML text
htmlText = htmlText.Replace("&Auml ;", "Ä");
htmlText = htmlText.Replace("&Euml ;", "Ë");
htmlText = htmlText.Replace("&Iuml ;", "Ï");
htmlText = htmlText.Replace("&Ouml ;", "Ö");
htmlText = htmlText.Replace("&Uuml ;", "Ü");
htmlText = htmlText.Replace("&auml ;", "ä");
htmlText = htmlText.Replace("&euml ;", "ë");
htmlText = htmlText.Replace("&iuml ;", "ï");
htmlText = htmlParagraphs.Replace("&ouml ;", "o");
htmlText = htmlParagraphs.Replace("&uuml ;", "ü");
htmlText = htmlParagraphs.Replace("&szlig ;", "ß");
Document document = new Document();
//Then I create the sections and paragraphs with the text
[...]
//Finally I create the PdfDocumentRenderer object like this:
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
renderer.Document = document;
renderer.RenderDocument();
//And send it to the browser
Response.Clear();
Response.ClearContent();
Response.ClearHeaders();
Response.Buffer = true;
Response.ContentType = "application/pdf";
Response.AddHeader("content-length", stream.Length.ToString());
Response.BinaryWrite(stream.ToArray());
Response.Flush();
stream.Close();
Response.End();
|
[NOTE: in the original code there is no space between the entity name and the semicolon (e.g. &Auml;); I added the space here to keep the browser from decoding it.]
But as I said before, the characters with diaeresis and other special characters are displayed as empty squares.
Thank you! _________________ MPasc |
|
|
|
Thomas Hoevel
Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany
|
Posted: Mon Dec 08, 2008 1:38 pm Post subject: |
|
|
Hi!
Could this be the error:
Code: | htmlText = htmlText.Replace("&iuml ;", "ï");
htmlText = htmlParagraphs.Replace("&ouml ;", "o");
|
All previous replacements at htmlText are overwritten with the new assignment from htmlParagraphs.
All ANSI characters should work (be sure to activate Unicode if you want to include non-ANSI characters). _________________ Regards
Thomas Hoevel
PDFsharp Team |
|
|
|
Mpasc
Joined: 03 Dec 2008 Posts: 6
|
Posted: Mon Dec 08, 2008 1:44 pm Post subject: |
|
|
Thomas Hoevel wrote: | Hi!
Could this be the error:
Code: | htmlText = htmlText.Replace("&iuml ;", "ï");
htmlText = htmlParagraphs.Replace("&ouml ;", "o");
|
All previous replacements at htmlText are overwritten with the new assignment from htmlParagraphs.
All ANSI characters should work (be sure to activate Unicode if you want to include non-ANSI characters). |
Sorry, htmlParagraphs was the original name of the variable; I changed it here to make my explanation clearer. The original code is:
Code: |
htmlParagraphs = htmlParagraphs.Replace("&Auml;", "Ä");
htmlParagraphs = htmlParagraphs.Replace("&Euml;", "Ë");
htmlParagraphs = htmlParagraphs.Replace("&Iuml;", "Ï");
htmlParagraphs = htmlParagraphs.Replace("&Ouml;", "Ö");
htmlParagraphs = htmlParagraphs.Replace("&Uuml;", "Ü");
htmlParagraphs = htmlParagraphs.Replace("&auml;", "ä");
htmlParagraphs = htmlParagraphs.Replace("&euml;", "ë");
htmlParagraphs = htmlParagraphs.Replace("&iuml;", "ï");
htmlParagraphs = htmlParagraphs.Replace("&ouml;", "o");
htmlParagraphs = htmlParagraphs.Replace("&uuml;", "ü");
htmlParagraphs = htmlParagraphs.Replace("&szlig;", "ß");
|
htmlParagraphs is just a string containing the HTML-encoded text.
However, you mentioned that I should make sure to activate Unicode. When I create the PdfDocumentRenderer, I set it like this:
Code: |
PdfDocumentRenderer renderer = new PdfDocumentRenderer(true, PdfSharp.Pdf.PdfFontEmbedding.Always);
|
Is there anything else I need to do to activate Unicode?
Thank you! _________________ MPasc |
|
|
|
Thomas Hoevel
Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany
|
Posted: Mon Dec 08, 2008 3:24 pm Post subject: Re: Migradoc: encoding ö, ä, ß, ü, etc from html text |
|
|
Mpasc wrote: | But as I said before the diaeresis and other special characters are displayed as empty square . |
I guess I was on the wrong track.
Which font do you use?
The empty square is normally the default character for anything that's not implemented in a font.
The default font for MigraDoc is "Verdana". _________________ Regards
Thomas Hoevel
PDFsharp Team |
|
|
|
Mpasc
Joined: 03 Dec 2008 Posts: 6
|
Posted: Tue Dec 09, 2008 8:58 am Post subject: |
|
|
Hello,
I use Arial.
Following the example HelloMigradoc I define the style in a method like this:
Code: |
public static void DefineStyles(Document document)
{
MigraDoc.DocumentObjectModel.Style style;
// Get the predefined style Normal.
style = document.Styles["Normal"];
// Modify the style
style.Font.Name = "Arial";
style.Font.Size = 10;
style.Font.Bold = false;
style.ParagraphFormat.Alignment = ParagraphAlignment.Justify;
style.ParagraphFormat.SpaceBefore = 12;
style.ParagraphFormat.SpaceAfter = 12;
//Style for Heading1
style = document.Styles["Heading1"];
style.Font.Name = "Arial";
style.Font.Size = 14;
style.Font.Bold = true;
style.Font.Color = Colors.DarkBlue;
style.ParagraphFormat.PageBreakBefore = true;
style.ParagraphFormat.SpaceAfter = 6;
// Create a new style called TextBox based on style Normal
style = document.Styles.AddStyle("TextBox", "Normal");
style.Font.Bold = true;
style.Font.Size = 40;
style.ParagraphFormat.Borders.Width = 2.5;
style.ParagraphFormat.Borders.Distance = 3;
}
|
And then, in another method, I create the paragraphs and set the style:
Code: |
public static Paragraph CreateParagraph(Document document, String text, String style)
{
//the style parameter is Normal or TextBox
Paragraph paragraph = document.LastSection.AddParagraph();
paragraph.Style = style;
paragraph.AddFormattedText(HTMLParser.getUntaggedText(text), style);
return paragraph;
}
|
The results are:
- The TextBox style does not work (all text has Normal style then)
- The vowels with diaeresis are still replaced by blank squares
Any clue?
Thank you very much. _________________ MPasc |
|
|
|
Thomas Hoevel
Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany
|
Posted: Tue Dec 09, 2008 10:00 am Post subject: |
|
|
Mpasc wrote: | And, besides, I still see the blank squares instead of the diaeresis. |
Umlaute do work with PDFsharp.
Are Umlaute handled correctly in your source code? Visual Studio I presume? Did you set file encoding to UTF-8?
Do you see correct strings in the Debugger?
Did you try to save a PDF file on the server? Check the Umlaute there.
Maybe they get lost while transferring the file from the server to the client.
Have you tried using HtmlDecode instead of replacing the characters?
BTW: "Normal" is the default style that is used if the Style of a paragraph is null. _________________ Regards
Thomas Hoevel
PDFsharp Team |
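The HtmlDecode suggestion above can be sketched as follows. This is a minimal, self-contained illustration and not code from the thread: the class name EntityDecodingSketch, the helper DecodeEntities, and the sample string are assumptions made for the example. It uses System.Net.WebUtility.HtmlDecode, which is available from .NET 4.0 onwards; on the .NET 2.0 runtime that ships with Visual Studio 2005, System.Web.HttpUtility.HtmlDecode plays the same role. Whether decoding alone resolves the squares reported here is not established in the thread.

```csharp
using System;
using System.Net;

// Hypothetical helper class, introduced only for this illustration.
public static class EntityDecodingSketch
{
    // Decodes named (&uuml;) and numeric (&#228;) character references in a
    // single pass, so no per-character Replace call is needed.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }

    public static void Main()
    {
        // Sample input (an assumption): German text as it might arrive
        // entity-encoded from an HTML form.
        string input = "Gr&uuml;&szlig;e aus K&ouml;ln, &#228;hnlich";
        Console.WriteLine(DecodeEntities(input));
    }
}
```

Decoding every entity in one call also avoids the risk of a mistyped or missing entry in a hand-written replacement list.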
|
|
|
Mpasc
Joined: 03 Dec 2008 Posts: 6
|
Posted: Wed Dec 10, 2008 2:29 pm Post subject: |
|
|
Hello,
I already read the text from the textarea with HtmlDecode. However, the characters with diaeresis still arrive as entity codes. For example, I get "&iuml ;" (with no space between the characters) for ï, so I tried to replace them as shown.
I have debugged the application and I can see that the string has the blank squares already in the server, so the characters are not being lost during the transfer to the client.
Regarding your other suggestions:
- What does "Umlaute" mean?
- I work with Microsoft Visual Studio 2005.
- How can I encode to UTF-8? Should I set it somehow in the Migradoc document?
Thank you very much for all your help! _________________ MPasc |
|
|
|
Thomas Hoevel
Joined: 16 Oct 2006 Posts: 387 Location: Cologne, Germany
|
Posted: Thu Dec 11, 2008 8:56 am Post subject: |
|
|
Hello!
Mpasc wrote: | I have debugged the application and I can see that the string has the blank squares already in the server |
That leaves me rather clueless.
So I'd say there are three possible explanations:
- The special characters are already lost while replacing
- The special characters do not exist in the fonts on the server
- With remote debugging: maybe characters get lost between server and debugger client
AFAIK a "string" in C# is always Unicode. Special characters are no problem for C#.
There's no diaeresis in German. We have ÄÖÜäöü and call them "Umlaute". The ligature ß is a different story, but PDFsharp handles all these characters correctly (Unicode mode or not).
As long as you see blank squares in the debugger try to cure the problem in the C# code on the server. It can't be a problem of the HTML response settings or the MigraDoc settings. _________________ Regards
Thomas Hoevel
PDFsharp Team |
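Because a C# string is a sequence of UTF-16 code units, one way to follow the advice above is to check what the string really contains rather than trusting how the debugger draws it. The sketch below is an illustration only; the class name CodePointDump and the method Describe are hypothetical and do not come from the thread. It prints each code unit as U+XXXX.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper, introduced only for this illustration.
public static class CodePointDump
{
    // Returns every UTF-16 code unit of the string as U+XXXX, separated by spaces.
    public static string Describe(string text)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The escapes stand for ä, ö, ü and ß, so the sample does not depend
        // on the encoding of the source file.
        Console.WriteLine(Describe("\u00E4\u00F6\u00FC\u00DF"));
        // prints "U+00E4 U+00F6 U+00FC U+00DF"
    }
}
```

If the dump shows U+00E4, U+00F6 and so on, the string is intact and the squares would point to the font or the display. If it shows U+FFFD or U+003F ("?"), the characters were probably lost before the text reached MigraDoc. A run starting with U+0026 ("&") means the entity was never decoded.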