Author Topic: Windows 10 character encoding in Notepad & Wordpad - Need help.  (Read 17681 times)

Online AndersJ (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: se
Windows 10 character encoding in Notepad & Wordpad - Need help.
« on: February 27, 2019, 07:54:13 pm »
I have numerous text files created over the years.
I am in Sweden, so I use a handful of Swedish characters encoded above 0x7F.
This has never been a problem, in various computer generations.

Until now, that is.

A brand new Dell,
with Windows 10/64 cannot properly open my *.txt files in Notepad or Wordpad.
All Swedish characters are shown as bold question marks.

New text is entered properly, and displays correctly.
It is the old characters that are incorrect.

Does anyone know what is going on here?
How can I fix this?

"It should work"
R.N.Naidoo
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1580
  • Country: de
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #1 on: February 27, 2019, 08:15:46 pm »
I'd guess your text files were edited in an ANSI character set with Swedish locale settings. I would think you just need to convert them to UTF-8.
A proper text editor like Notepad++ should be able to help you out.
You open your text files, select the correct ANSI encoding (Encoding->Character Sets) and finally convert the files to UTF-8 (Encoding->Convert to UTF-8).
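
For anyone who would rather script the conversion than click through an editor, here is a minimal Python sketch of the same idea. It is not from Notepad++; the function name and the default source encoding (cp1252) are assumptions for illustration, and you would have to substitute whatever code page your files really use. It writes to a new file so the original stays untouched.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Decode src using an assumed legacy code page and write dst as UTF-8.

    source_encoding is a guess supplied by the caller; the file itself
    does not say which code page it was written in.
    """
    raw = Path(src).read_bytes()            # read bytes so line endings are preserved
    text = raw.decode(source_encoding)      # raises UnicodeDecodeError on bytes the code page does not define
    Path(dst).write_bytes(text.encode("utf-8"))
```

Usage would be something like convert_to_utf8("old.txt", "old_utf8.txt", "cp850"), with hypothetical file names.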
Trying is the first step towards failure - Homer J. Simpson
 

Online AndersJ (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: se
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #2 on: February 27, 2019, 08:58:08 pm »
I have many files of various kinds,
that have worked well in many computers over the years, and still do.

Now I purchase this ONE exception, that cannot display the files correctly.
I am looking for an explanation of why this suddenly happens.
After that I want to fix the problem in the NEW machine.

Please focus on that,
rather than suggesting I convert all files,
possibly creating new problems for each and every person and computer that use them.
"It should work"
R.N.Naidoo
 

Offline grizewald

  • Frequent Contributor
  • **
  • Posts: 612
  • Country: ua
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #3 on: February 27, 2019, 09:04:41 pm »
Install Notepad++.

Not only will it view your text files properly, it will let you change the encoding (which is probably ISO-8859-1) to UTF-8 (which I believe might be the default encoding in Win 10, but don't quote me on that as I run Linux).

Notepad++ is very flexible when it comes to encodings and lets you freely change from one to another and will also show you how a file is currently encoded.

  Lord of Sealand
 

Offline Masa

  • Contributor
  • Posts: 20
  • Country: fi
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #4 on: February 27, 2019, 10:24:26 pm »
You probably have the same problem as this guy:

https://www.tenforums.com/software-apps/110704-unreadable-non-ansi-characters-notepad.html

Is your new Windows 10 in English or Swedish?
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1580
  • Country: de
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #5 on: February 27, 2019, 10:36:27 pm »
This again comes down to converting an ANSI file with some Swedish code page to Unicode (UTF-8). ANSI with code pages is a thing of the 20th century; just convert the file to UTF-8.
Trying is the first step towards failure - Homer J. Simpson
 
The following users thanked this post: alexanderbrevig, tooki, newbrain

Offline soldar

  • Super Contributor
  • ***
  • Posts: 3534
  • Country: es
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #6 on: March 02, 2019, 10:12:22 pm »
Try this: open a command line terminal window and type chcp. That should tell you what default code page the computer is using. If the bad one is different from the good ones then you can use that command to change it. See if that works.

Swedish code page is 20107.  https://en.wikipedia.org/wiki/Windows_code_page
« Last Edit: March 02, 2019, 10:18:45 pm by soldar »
All my posts are made with 100% recycled electrons and bare traces of grey matter.
 

Online AndersJ (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: se
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #7 on: March 03, 2019, 09:33:45 am »
Thanks for an excellent suggestion.
Unfortunately it did not work.
Reboot restored the code page to 65001.
"It should work"
R.N.Naidoo
 

Offline Zero999

  • Super Contributor
  • ***
  • Posts: 19836
  • Country: gb
  • 0999
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #8 on: March 03, 2019, 10:16:14 am »
Do you have any files which you don't mind sharing? If so, then please post them as an attachment and someone might be able to help.

Have you tried MS Word? Or you could try LibreOffice.
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1580
  • Country: de
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #9 on: March 03, 2019, 11:03:36 am »
IMHO, "chcp" just shows (or lets you change) the code page that the (current) console (!) is using. It displays 850 for me, btw, which is/was the Western European code page in the days before UTF-8.
It's actually surprising that yours shows a UTF-8 Unicode code page. My understanding was that the console is supposed to use a byte-based code page for backward compatibility.
Besides, I wonder why the default Western European code page 850 wouldn't work for Sweden. It should contain all the needed diacritics.

Anyway, as German shares most of the Swedish diacritics (apart from "Å" I believe), we're in the same boat. Most of my very old text files use the Western European code page 850, though, which is still displayed correctly in Notepad and Wordpad under Windows 10. E.g. in these files, the German "ä" is encoded as 0x84. I have a few text files which seem to be encoded in ISO 8859 and which are not displayed correctly in Notepad/Wordpad. In these files, the "ä" is encoded as 0xe4.
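
To make the byte values concrete, here is a small Python sketch that lists how the Swedish letters are encoded in code page 850 versus ISO 8859-1. The function name is made up for illustration; the codecs "cp850" and "latin-1" are standard Python codec names.

```python
def swedish_byte_table() -> dict:
    """Map each Swedish letter to its (cp850 byte, ISO 8859-1 byte) pair."""
    table = {}
    for ch in "åäöÅÄÖ":
        oem = ch.encode("cp850")[0]     # DOS/OEM Western European code page
        iso = ch.encode("latin-1")[0]   # ISO 8859-1
        table[ch] = (oem, iso)
    return table


for letter, (oem, iso) in swedish_byte_table().items():
    print(f"{letter}: cp850=0x{oem:02x}  iso-8859-1=0x{iso:02x}")
```

The row for "ä" shows the 0x84 and 0xe4 values mentioned above, which is why a program that assumes the wrong one of the two displays the wrong character.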

So first of all you need to find out which code page was used to create these files. Then you could check the default OEM code page set in the registry (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP). For me, it's set to 850, which explains why I can still open 8-bit ANSI text files which were created for/with code page 850. It's possible to edit this entry, btw, but it should be your last resort as it can cause all kinds of issues.
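
If you don't know which code page a file was written in, a rough heuristic can help narrow it down. The Python sketch below is only an assumption-laden guess, not a reliable detector: it first checks whether the bytes are valid UTF-8, and otherwise picks the candidate code page under which the most Swedish letters appear. The function name and the candidate list are hypothetical, and the result should always be checked by eye.

```python
SWEDISH_LETTERS = set("åäöÅÄÖ")
CANDIDATES = ("cp1252", "cp850")   # assumed candidates; extend as needed


def guess_encoding(raw: bytes) -> str:
    """Heuristically guess the encoding of raw bytes from a Swedish text file."""
    try:
        raw.decode("utf-8")        # strict decode: succeeds only for valid UTF-8 (or pure ASCII)
        return "utf-8"
    except UnicodeDecodeError:
        pass

    def score(encoding: str) -> int:
        # Count how many decoded characters are Swedish letters under this assumption.
        text = raw.decode(encoding, errors="replace")
        return sum(ch in SWEDISH_LETTERS for ch in text)

    return max(CANDIDATES, key=score)   # on a tie, the first candidate wins
```

Short files with only one or two non-ASCII bytes can easily fool this, so treat the answer as a starting point.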

The most future-proof approach would still be to convert your files to UTF-8. You might not be aware of this, but for at least ten years or so, UTF-8 has been the de facto standard for text files. E.g. all source code created with somewhat modern tools like Eclipse is UTF-8. Languages like Java used UTF-8 at least 15 years ago.
Trying is the first step towards failure - Homer J. Simpson
 

Online AndersJ (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: se
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #10 on: March 03, 2019, 11:13:10 am »
"All" my other machines which do not have this problem uses cp 850.
The one with the problem uses cp 65001.

And yes,
perhaps I should convert,
but first I would like to understand what is going on.
I would also like to understand why THIS particular W10 machine behaves like this and others don't.

I have attached a file, called TextFile_Encoding.txt.

Below is a screenshot of the file displayed correctly on "any" other machine.



Below is a screenshot of the file displayed in the new Dell W10/64.


« Last Edit: March 03, 2019, 11:20:34 am by AndersJ »
"It should work"
R.N.Naidoo
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1580
  • Country: de
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #11 on: March 03, 2019, 11:24:56 am »
OK, now we're getting somewhere. So there is a certain chance that the OEM code page is set wrong on this single machine.
Did you try to check the registry entry (with RegEdit)?
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP
The value is 850 for me and I bet it's the same on your "working machines".
If it's not 850 on the problem machine, this is most probably the culprit.
People report that changing this entry (to something invalid or Unicode) might stop Windows from booting though.
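
If you'd rather not poke around in RegEdit, the values can be read (not changed) from a script. This is a sketch under assumptions: it treats OEMCP and ACP as values stored under the CodePage key, uses Python's standard winreg module, and simply returns None on anything that isn't Windows. The function name is made up.

```python
import sys


def read_code_pages():
    """Return {'ACP': ..., 'OEMCP': ...} from the registry on Windows, else None."""
    if sys.platform != "win32":
        return None                      # the registry only exists on Windows
    import winreg                        # imported here so the sketch loads on other systems

    path = r"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        # QueryValueEx returns (value, type); only the value is of interest here.
        return {name: winreg.QueryValueEx(key, name)[0] for name in ("ACP", "OEMCP")}


print(read_code_pages())
```

This only reads the settings, so it can't stop Windows from booting the way a bad edit might.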

[EDIT]
Your text file is displayed correctly on my Win 10 machine btw. So it's obviously encoded in/for code page 850.

[EDIT2]
Before messing around with the registry, you could try to fix this using the system control panel. There is a "Time and Region" setting there (sorry, translated back from my German version) with a "Region" selection inside. Inside there (rightmost tab) there is a setting for "Unicode incompatible programs" which lets you change your location/language setting. Inside this dialog, there is a checkbox "Beta: UTF-8 support" which has to be unchecked. Also, obviously, you should select "Swedish" or "Svensk" or whatever it's called in your version.
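
As a rough illustration of why that checkbox matters: if the system assumes UTF-8, single bytes above 0x7F from an old code page are usually not valid UTF-8, and a decoder has to substitute something for them. The Python sketch below shows that effect using the Unicode replacement character; that Notepad does exactly this internally is an assumption, and the function name is made up.

```python
def show_as_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for anything invalid."""
    return raw.decode("utf-8", errors="replace")


old_file = "smörgås".encode("cp850")    # bytes as an old code-page-850 file would store them
new_file = "smörgås".encode("utf-8")    # bytes as freshly typed text would be saved

print(show_as_utf8(old_file))           # the two Swedish letters come out as replacement characters
print(show_as_utf8(new_file))           # newly entered text survives intact
```

That matches the symptom: old characters turn into question marks while newly typed text looks fine.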
« Last Edit: March 03, 2019, 11:36:18 am by 0xdeadbeef »
Trying is the first step towards failure - Homer J. Simpson
 
The following users thanked this post: AndersJ

Online AndersJ (Topic starter)

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: se
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #12 on: March 03, 2019, 12:42:17 pm »
0xdeadbeef, you nailed it!!!

The registry key you describe is not to be found on any of my machines.

I did however find the "Time and Regions" settings that you suggested.
I unchecked the "Beta: UTF-8 support" and changed from English to Swedish.

Now I get the correct encoding.

Thanks for putting me on the right track.
"It should work"
R.N.Naidoo
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1580
  • Country: de
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #13 on: March 03, 2019, 04:24:20 pm »
Ah, great to hear. Another mystery solved ;)
Trying is the first step towards failure - Homer J. Simpson
 

Offline frozenfrogz

  • Frequent Contributor
  • **
  • Posts: 936
  • Country: de
  • Having fun with Arduino and Raspberry Pi
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #14 on: March 03, 2019, 05:02:59 pm »
Congrats on solving the problem. :)
I would however strongly recommend getting rid of Windows 10 and sticking to Windows 7 or a Linux flavor of your liking if that is an option. Windows 10 is so fcking bad and more problems will just be waiting to kick you in the balls.
He’s like a trained ape. Without the training.
 

Offline 001

  • Super Contributor
  • ***
  • Posts: 1170
  • Country: aq
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #15 on: March 04, 2019, 04:12:25 am »
Congrats on solving the problem. :)
I would however strongly recommend getting rid of Windows 10 and sticking to Windows 7 or a Linux flavor of your liking if that is an option. Windows 10 is so fcking bad and more problems will just be waiting to kick you in the balls.

holywar detected  :-DD
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8640
  • Country: fi
Re: Windows 10 character encoding in Notepad & Wordpad - Need help.
« Reply #16 on: March 04, 2019, 12:28:25 pm »
Now I purchase this ONE exception, that cannot display the files correctly.
I am looking for an explanation of why this suddenly happens.
After that I want to fix the problem in the NEW machine.

The underlying problem is that a .txt format does not convey what the binary values mean - i.e., the .txt format doesn't tell which character set it uses!

So, the only options a program has when opening your .txt file are:
1) Ask the user every time a .txt file is opened, or,
2) Assume, possibly using an operating-system-wide setting

To make everything appear easy, software designers often choose 2), especially for simplistic programs like MS Notepad. However, about every 20 years, the assumptions change. We (Scandinavian countries) had our de facto assumed character encodings from about 1995 to about 2010, which worked fairly well. Now we have been in a fuzzy area again, but it seems this will be less of an issue as time goes by and everybody's just using UTF-8. Maybe simplistic (headerless, metadataless) formats such as .txt will also slowly be phased out.

There is nothing wrong in your computer, nor in your text files. Everything worked out for you by lucky assumption. Now it's just working as it's supposed to: with manual extra work of guiding the editor. You either are lucky, or you need to do this manual step. Use any text editor that allows you to choose the character encoding when opening a file, instead of just assuming it. This editor may not come with your operating system, so you may need to download an external, third-party program. Then, it's completely up to you whether you keep the files as they are, opening them with such an editor that can handle them; or whether you want to convert them to Unicode/UTF-8; or something else. You don't need to convert them.
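
To illustrate the "choose the encoding when opening" route without converting anything, here is a minimal Python sketch. It reads the file's bytes and shows how they would look under several assumed encodings, much like an editor's encoding menu; the function name and the candidate list are hypothetical, and the file itself is never modified.

```python
from pathlib import Path


def preview(path: str, encodings=("utf-8", "cp1252", "cp850")) -> dict:
    """Return the file's text as it would appear under each assumed encoding."""
    raw = Path(path).read_bytes()        # the bytes are the only thing the .txt file really contains
    # errors="replace" marks undecodable bytes with U+FFFD instead of failing.
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}
```

You would then pick whichever rendering reads correctly, for example by printing preview("notes.txt")["cp850"] (the file name is just an example).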

Hope this helps.
 

