Conference FIDONEWS_OLD4, 37224 messages
Message 36358, 146 lines
Written 2016-09-06 13:10:48 by Michiel van der Vlist (2:280/5555)
Subject: A plea for UTF-8
========================
Hello All,

This is a rehatch of a Fidonews article I wrote 5 years ago. On request of Lee.

=============   begin =====


                A PLEA FOR UTF-8 IN FIDONET  Part 1
                By Michiel van der Vlist. 2:280/5555

First there was the spoken word. That was a long time ago, nobody knows exactly
how long, but it must have been on the order of some hundred thousand years
ago. Later, much, much later came the written word. On the order of five
thousand years ago. To get a message from one place to another, a messenger
needed to physically transport an object with the text written on it from A to
B.

Forget about the semaphore and let us jump straight to transporting messages
over electric wire. With that came the need for an encoding scheme. One of the
first encoding schemes was Morse Code. Named after its (co) inventor Samuel
Morse. This was around 1840. Since this was invented in the Western world,
mostly the USA, it is no surprise that Morse code only covers the digits 0-9, a
few special characters, such as the question mark and the period, plus 26
letters of the Latin alphabet. Nowadays Morse Code is used only by a small
group of radio amateurs but for over a century, it was a mainstream coding
method for telecommunication.

Next step was Baudot code. Used in the telex communication system. A five bit
code that covered the 26 letters of the Roman alphabet plus the digits 0-9 and
some punctuation and control signals. Like Morse code, no distinction between
upper and lower case.

In the fifties of the previous century, the first computers entered the scene.
At first these were bulky pieces of machinery filling an entire room. They were
programmed by entering the binary code directly into memory by so called sense
switches. This was cumbersome and error prone. Soon the need developed to have
a way to directly enter the mnemonics used to memorise the instructions into
the computer and let the computer itself do the translation into binary form
instead of the operator manually entering the binary code.

With that came the need for a character encoding scheme for computers. Several
encoding schemes were used in the beginning, but in the end it converged into
an 8 bit code that seemed to fit computers like a glove. Or to be more precise
a seven bit code. Used on 8 bit transport media, but only the lower seven bits
were used for encoding text. The highest bit was used as an error detection
mechanism: the parity bit. This was ASCII, the American Standard Code for
Information Interchange. First published in 1963.

The "A" in "ASCII" stands for "American". So it is no surprise that as far as
the letters go, once again it only covers the 26 letters found in American
English. ASCII is much richer than all of its predecessors, it has many
punctuation and special characters, 32 - now mostly obsolete - control codes
and as a new feature, the distinction between upper and lower case.

That the character set is limited to what is found in American English, was no
great limitation in the beginning of the history of data processing. Computers
because of their bulk and cost were only to be found at government institutes,
large companies and universities. They were used by scientists and engineers.
Those could deal with ASCII only machines.

What nobody could foresee when ASCII was devised, happened some two decades
later. Computers became small enough and cheap enough to allow individuals to
have their own private computer (a PC) all for themselves in their own homes.

With affordable home computers came affordable printers and that was the end
of the classic typewriter. Computer use was no longer limited to research
workers whose employers could afford tons of research equipment, but extended
to people that could afford typewriters. And then when those "new typewriters"
spread around the world came the need for more than just ASCII. While ASCII was
enough for US Americans using typewriters, it was not enough for the rest of
the world. ASCII-only became a stranglehold. Those new computer users wanted to
write in their own language. A language that used characters with accents,
umlauts, slashes and even characters not at all resembling the Roman alphabet.
Cyrillic, or even more complex Asian and Arabic scripts.

Microsoft and IBM were quick to respond. They introduced the concept of code
pages. ASCII is seven bit, but computers store information in lumps of eight
bits called a byte. The most significant bit, originally meant as a parity bit,
but obsoleted by more robust error checking mechanisms, was free to define
another 128 characters. IBM chose to not only include language specific
characters in that set of 128, but to also include some 30+ so called "graphic
characters" for line drawing. That may have been a good idea at the time, but
in retrospect it may have been a waste of valuable coding space.

Anyway, at the end of the DOS era, there were dozens of code pages, covering
the needs for hundreds of languages. One could write in German, Swedish,
Russian and Greek without problems. Well, one could not write in Greek and
Russian in the same article because one could not change code pages in mid
stream. But who wanted that?
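
To make the code page limitation concrete, here is a small Python sketch. It is
only an illustration: the three code pages (cp437, cp866 for Cyrillic, cp737
for Greek) and the helper name decode_with are picked for this example and are
not taken from the article. It shows that the very same bytes turn into
different letters depending on which single code page the reader assumes.

```python
# The same byte values mean a different character in each code page.
# cp437 (original IBM PC), cp866 (Cyrillic) and cp737 (Greek) are used here
# purely as examples of DOS code pages.
raw = bytes([0x80, 0x81, 0x82])


def decode_with(code_page: str, data: bytes) -> str:
    """Interpret the bytes under one single code page (hypothetical helper)."""
    return data.decode(code_page)


for code_page in ("cp437", "cp866", "cp737"):
    # One stream, one code page: nothing in the bytes themselves says
    # whether they were meant as Latin, Cyrillic or Greek letters.
    print(code_page, decode_with(code_page, raw))
```

Since the bytes carry no hint about the code page, a text that mixes Russian
and Greek cannot be represented this way in one stream.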

And then came the Internet. And with the Internet came the World Wide Web. In
the beginning the web just copied the solution to language issues from DOS.
Code pages and more code pages. It did not take much more than a decade to
realise that the eight bit barrier was the second stranglehold. Not being able
to write Russian and Greek in one and the same article was NOT acceptable.
Eight bits for a character set was NOT good enough.

Fortunately the price of memory had also dropped spectacularly. Also the price
of transporting bits had dropped steadily. Memory had become so cheap that it
became affordable to store pictures in digital form. Pictures take orders of
magnitude more storing space than text. So increasing the required storing
space for text by a factor of two by going from a one byte character encoding
scheme to a multi byte encoding scheme, no longer met with economic
restrictions.

Enter Unicode.

Unicode introduces the concept of The Universal Character Set. It is not a
static entity, it is still growing. Presently there is room for over a million
characters, of which well over a hundred thousand are defined. While in the
code page concept, character set and character encoding scheme are one and the
same, in Unicode they are decoupled. There is ONE character set: the Universal
Character Set. There are several encoding schemes that all have their merits.

First there is UTF-7. Designed for stone age transport layers that are 7 bits
only. Next there is UTF-8. This is an 8 bit multibyte encoding that takes one
to four bytes to encode a character (the original design allowed up to six).
Next there is UTF-16. Not suitable for byte orientated transport media that use
NULL as a special character, but it is used internally by Windows from XP and
up. And finally there is UTF-32.
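
As a rough illustration of how these encoding schemes differ, the following
Python sketch encodes a few sample characters and counts the bytes. The sample
characters and the helper names encoded_sizes and contains_nul are assumptions
made for this example only. For these samples UTF-8 needs between one and four
bytes, and even a plain "A" contains a NULL byte once it is encoded as UTF-16
or UTF-32.

```python
# Sample characters: Latin A, e with acute, Cyrillic Zhe, euro sign, an emoji.
samples = ["A", "\u00e9", "\u0416", "\u20ac", "\U0001F600"]

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(ch: str) -> dict:
    """Number of bytes needed for one character in each encoding."""
    return {name: len(ch.encode(name)) for name in ENCODINGS}


def contains_nul(text: str, encoding: str) -> bool:
    """True if the encoded form of the text contains a 0x00 byte."""
    return 0 in text.encode(encoding)


for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Plain ASCII text picks up NULL bytes in UTF-16 and UTF-32, not in UTF-8.
for name in ENCODINGS:
    print(name, "contains NULL:", contains_nul("A", name))
```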

The obvious choice for FidoNet is UTF-8. The transport layer of FidoNet is
fully 8 bit transparent, with the exception of the NULL byte that is used as a
termination character. Since UTF-8 is fully downward compatible with ASCII, the
first 128 characters in the Universal Character Set are the same as the ASCII
set and they are encoded in exactly the same way. So the NULL in UTF-8 is the
same as the NULL in ASCII, so no problem. Also there will be no conflict with
those that have no need for anything other than good old 7 bit ASCII. They can
keep using the software that they have been using all the time and everyone
will see the same text on his/her screen.
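
The two properties claimed above can be checked with a few lines of Python.
This is a minimal sketch, and the function names same_bytes_as_ascii and
is_nul_free as well as the sample strings are made up for the illustration.
It checks that pure ASCII text yields identical bytes in ASCII and in UTF-8,
and that a text mixing Russian and Greek contains no NULL byte when encoded as
UTF-8, so it would not collide with a NULL terminated transport.

```python
def same_bytes_as_ascii(text: str) -> bool:
    """True if the ASCII and UTF-8 encodings of the text are byte identical."""
    return text.encode("ascii") == text.encode("utf-8")


def is_nul_free(text: str) -> bool:
    """True if the UTF-8 encoding of the text contains no 0x00 byte."""
    return 0 not in text.encode("utf-8")


plain = "Hello All,"
mixed = "Привет, Καλημέρα, Michiel"  # Russian and Greek in one message

print(same_bytes_as_ascii(plain))  # ASCII text is unchanged by UTF-8
print(is_nul_free(mixed))          # no NULL byte sneaks into the stream
print(mixed.encode("utf-8").decode("utf-8") == mixed)  # lossless round trip
```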

Next week we will go into some details on how to get UTF-8 encoded FidoNet
messages on your screen.

To be continued....


© Michiel van der Vlist, all rights reserved.
Permission to publish in the FIDONEWS file echo and the FIDONEWS
discussion echo as originating from 2:2/2

======= end  ======


Cheers, Michiel

--- GoldED+/W32-MSVC 1.1.5-b20130111
 * Origin: http://www.vlist.org (2:280/5555)