White space is significant in all written languages.
No it isn't. Most phonetically written languages demand white space to break things up, but non-phonetic written languages, like Chinese, don't. I don't think Hangul (Korean) requires white space, and that is phonetic.
Amazing number of successful languages with no phonetic basis. Hmm.
We can throw art and a bit of psychology in here too. Negative space has contrast and meaning too.
And sometimes you can use it to take the piss
White space is significant in all written languages.
No it isn't. Most phonetically written languages demand white space to break things up, but non-phonetic written languages, like Chinese, don't. I don't think Hangul (Korean) requires white space, and that is phonetic.
Amazing number of successful languages with no phonetic basis. Hmm.
I'm not sure any surviving written languages have no phonetic basis. If you show me a Chinese character I don't know I can take a guess at roughly how to say it, and be in the ballpark about half the time. However, I'll guess a Cantonese pronunciation, which might be quite different from the Mandarin, Korean or multiple Japanese readings of the same character. Of course, every spoken language has a strong phonetic basis. Each one has its own limited set of phonemes it recycles in different patterns to make a large vocabulary with a distinctive audible character for that language. Some languages even have multiple phoneme sets used in different areas - e.g. US English has a different phoneme set from the English spoken everywhere else. That's why most US voice recognition systems generally do quite well over a wide range of US regional accents, but work poorly for English speakers from other countries.
Ironically, Algol68 is, of classic 3rd generation languages, the one closest to being an algebraic formalism; almost everything is an expression that delivers a value, even declarations. Its orthogonality is unchallenged in that generation of languages.
If one has problems searching for something in a program written in a language like algol one needs either a better editor or better discipline. Given that modern IDEs (and even compilers) will even try to correct your spelling of a misnamed variable I think it's a null concern.
e.g.
int main (int argc, char const *argv[])
{
int a_variable;
a_varaible = 1;
return 0;
}
cerebus@shu:~/Desktop$ cc -o Mispelling Mispelling.c
Mispelling.c:5:5: error: use of undeclared identifier 'a_varaible'; did you mean 'a_variable'?
a_varaible = 1;
^~~~~~~~~~
a_variable
Mispelling.c:3:9: note: 'a_variable' declared here
int a_variable;
^
1 error generated.
I think the idea of having significant white space is so utterly incompetently removed-from-reality stupid that people should be forced to read L Ron Hubbard books for advocating it.
White space is significant in all written languages.
Yes, you nit-picker! With the exceptions mentioned. But, "whitespace" encompasses at least three characters, space, tab and newline (if we're limiting ourselves to the 7-bit ASCII subset of UTF-8, which is reasonable to most code). I'm a bit on the fence with the newline (depends on language), but I'm firmly of the opinion that one space or 20, one tab or two, all these must count as "whitespace" syntactically.
If the compiler cares beyond that, it is broken. Or written by an idiot.
Sometimes it can be really annoying if done wrongly.
So enforcing a standard and consistency is more important than whether it exists or not.
As for brogrammers, hmm, IDEs are invariably better at presenting concerns than humans are at noticing them. As someone adept at tearing a new asshole in people's code for a living, the IDE is a winner.
Yes, yes, and yes. I understand all this. But indistinguishable characters carrying meaning, no.
White space is significant in all written languages.
No it isn't. Most phonetically written languages demand white space to break things up, but non-phonetic written languages, like Chinese, don't. I don't think Hangul (Korean) requires white space, and that is phonetic.
And as far as programming languages go Algol68 gives not one jot about white space, as this trivial and rather ridiculous example proves:
spaced.a68:
(
INT a variable name with spaces in it := 1;
INT avariablenamewithoutspacesinit := 2;
avariablenamewithoutspacesinit := a variable name with spaces in it;
avariablenamewithspacesinit := a variable name without spaces in it;
print (("I'm done:", avariablenamewithoutspacesinit))
)
unspaced.a68:
(INTavariablenamewithspacesinit:=1;INTavariablenamewithoutspacesinit:=2;avariablenamewithoutspacesinit:=avariablenamewithspacesinit;avariablenamewithspacesinit:=avariablenamewithoutspacesinit;print(("I'm done:",avariablenamewithoutspacesinit)))
cerebus@shu:~/Desktop$ a68g --clock spaced
I'm done: +1
Genie finished in 0.00 seconds
cerebus@shu:~/Desktop$ a68g --clock unspaced
I'm done: +1
Genie finished in 0.00 seconds
cerebus@shu:~/Desktop$
Those semicolons are just whitespaces with the pen down
Edit: I've actually come to the conclusion that Algol 68 is retarded there.
I mean imagine the shenanigans where "experts exchange" and "expert sex change" evaluate to the same identifier
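To make that collision concrete, here is a minimal Python sketch. It assumes, as the spaced.a68 / unspaced.a68 pair above suggests, that whitespace inside an identifier is simply dropped. The helper name `canonical` is made up for illustration and is not part of any Algol 68 tool.

```python
def canonical(identifier: str) -> str:
    """Drop all whitespace, mimicking a language that ignores spaces in names."""
    return "".join(identifier.split())


first = canonical("experts exchange")
second = canonical("expert sex change")

# Both spellings collapse to the same run of letters.
print(first, second, first == second)
```

Under that assumption the two phrases are one and the same name, which is exactly the shenanigans being described.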
Ironically, Algol68 is, of classic 3rd generation languages, the one closest to being an algebraic formalism; almost everything is an expression that delivers a value, even declarations. Its orthogonality is unchallenged in that generation of languages.
If one has problems searching for something in a program written in a language like algol one needs either a better editor or better discipline. Given that modern IDEs (and even compilers) will even try to correct your spelling of a misnamed variable I think it's a null concern.
e.g.
int main (int argc, char const *argv[])
{
int a_variable;
a_varaible = 1;
return 0;
}
cerebus@shu:~/Desktop$ cc -o Mispelling Mispelling.c
Mispelling.c:5:5: error: use of undeclared identifier 'a_varaible'; did you mean 'a_variable'?
a_varaible = 1;
^~~~~~~~~~
a_variable
Mispelling.c:3:9: note: 'a_variable' declared here
int a_variable;
^
1 error generated.
The compilers are getting pretty helpful. Editors not so much. The compiler is only being helpful because it's looking for anomalies. If it flagged everything that was similar but not identical, you'd get a useless flood of reports. Having had a LOT of trouble comparing Unicode strings that can take numerous forms, I'd like to keep my life simple by making objects have only a single distinct form.
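For anyone who hasn't been bitten by the Unicode problem mentioned here, a small Python sketch of it. It uses the standard library's `unicodedata.normalize`; the helper name `same_text` and the choice of NFC as the "single distinct form" are my assumptions, not something the poster specified.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to one canonical form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # False: different code point sequences
print(same_text(composed, decomposed))  # True: identical once normalised
```

The two strings render identically on screen, yet a plain comparison says they differ. That is the same hazard as identifiers that can be spelled more than one way.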
I think the idea of having significant white space is so utterly incompetently removed-from-reality stupid that people should be forced to read L Ron Hubbard books for advocating it.
White space is significant in all written languages.
Yes, you nit-picker! With the exceptions mentioned. But, "whitespace" encompasses at least three characters, space, tab and newline (if we're limiting ourselves to the 7-bit ASCII subset of UTF-8, which is reasonable to most code). I'm a bit on the fence with the newline (depends on language), but I'm firmly of the opinion that one space or 20, one tab or two, all these must count as "whitespace" syntactically.
If the compiler cares beyond that, it is broken. Or written by an idiot.
That was really my point though. All this is down to semantics and we took those semantics from written languages because there is parity, so whitespace is both important and unimportant depending on the context of the usage even if it has no obvious functional value. In written languages, we use leading spaces to define blocks, indentation to define quotes and subtexts etc so there is contextual functional value.
If you then throw humans at it, you're going to get this happen at least once. It did, with Python and YAML. And it works reasonably well.
The only inexcusable thing that we did as a species was to conflate the issue by introducing the tabulation meta-character which is about as well defined as an amorphous blob of dog poop stuck to your shoe and entirely invisible in situ. The abstract conversions between that and spaces are a war crime and drove much polarising thought around language design.
Now when I write a lexical analyser, which I do occasionally, it will as a rule consume insignificant whitespace as part of the language specification. Scopes will be defined as curly braces too. But that's not because it's semantically better but because it's a hell of a lot easier to write a simple parser stack if you actually hint it in some way.
We don't use whitespace as a rule, not because we can't or shouldn't, but because our historical compilers are stupid and the programmers were lazy. I know I am both
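A rough Python sketch of the kind of lexer described above: whitespace only separates tokens and is otherwise thrown away, so one space or twenty, tabs or newlines, all give the same token stream. The token set here (identifiers, integers and a handful of punctuation characters) is an assumption picked for the example, not anyone's real language spec.

```python
import re

# Hypothetical token set: identifiers, integers, and single-character punctuation.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[{}();=+\-*/]")


def tokenize(source: str) -> list[str]:
    """Return the tokens of `source`, consuming any amount of whitespace between them."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1          # insignificant: skip spaces, tabs and newlines alike
            continue
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


compact = tokenize("if(x){y=1;}")
spread = tokenize("if (x)\n{\n\t\ty   =   1;\n}")
print(compact == spread)  # True
```

With the braces carrying the block structure, the parser that follows never has to know how the source was laid out.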
Came here to say that I find it nearly comical that there are professional programmers, proficient in multiple languages, that feel the use of white space for flow control is a deal breaker.
C'mon you guys. The recent thread about tabs vs spaces highlighted how oddly important this is to some of you, but every editor and IDE that I use makes consistent indenting effortless. Tab to go up one level and shift-tab to go down. Every one of those editors will insert spaces instead of tab characters if that's what you want, but that's nearly immaterial.
Considering that moving between the different languages always requires some degree of shift in thinking, the degree of passion seen about this particular detail is amazing. Programming languages are, by design, just a bunch of rules. Coding is an exercise in getting the job done using the rules at hand. None of this is news, nor is it news that different programmers prefer different sets of rules, or that different problems are better suited to different sets of rules. But the fact that so many programmers cling tightly to the idea that the rules should not EVER dictate how the code is arranged on the computer screen is baffling.
Came here to say that I find it nearly comical that there are professional programmers, proficient in multiple languages, that feel the use of white space for flow control is a deal breaker.
I don't think any of the professionals have said that it's "a deal breaker", just that it's a dumb idea. If you've had to deal with people hunting around the keyboard for the "any" key then you rapidly come to appreciate how important it is to favour the explicit over the implicit and the visible over the invisible. Indenting as flow control is invisibly implicit and hence inherently dangerous.
Came here to say that I find it nearly comical that there are professional programmers, proficient in multiple languages, that feel the use of white space for flow control is a deal breaker.
I don't think any of the professionals have said that it's "a deal breaker", just that it's a dumb idea. If you've had to deal with people hunting around the keyboard for the "any" key then you rapidly come to appreciate how important it is to favour the explicit over the implicit and the visible over the invisible. Indenting as flow control is invisibly implicit and hence inherently dangerous.
Hey I quite like it. My C is indented too...
IDLE by default transforms tabs into 4 spaces. Any editor can be configured to do the same and it is the Python standard.
You can do other indentations, and you'll have to deal with whatever problems that brings you....
https://peps.python.org/pep-0008/ Indentation: Use 4 spaces per indentation level.
Tabs or Spaces? Spaces are the preferred indentation method.
Any work team must define this to start working. In this case the Python team already gives you the preferred solution. Although you can change it if you want...
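As an illustration of "whatever problems that brings you", here is a small sketch of what Python 3 does when one block mixes a tab on one line with spaces on the next. The source strings are invented for the example; `compile` and `TabError` are standard built-ins.

```python
# Two lines that look aligned in an editor with 8-column tabs,
# but one is indented with a tab and the other with eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# The same block written the PEP 8 way, four spaces per level.
spaces_only = "if True:\n    x = 1\n    y = 2\n"


def check(source: str) -> str:
    """Try to compile `source` and report whether the indentation was accepted."""
    try:
        compile(source, "<example>", "exec")
    except TabError as exc:
        return f"rejected: {exc.msg}"
    return "accepted"


print(check(spaces_only))
print(check(mixed))
```

The first string compiles and the second is refused, so the interpreter at least won't silently guess what a tab was meant to be worth.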
Throw some proper language in...
Better throw some real programming language in...
LDR R0, =100000000
EORS R1, R1
EORS R2, R2
SUBS R0, #1
loop
ADDS R1, R0
ADCS R2, #0
SUBS R0, #1
BNE loop
But 6.25 s to get correct result in R2:R1 = 0x0011c37934e58f80 (Cortex-M4@80 MHz).
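A quick Python cross-check of that figure, as a sketch only. The loop above counts R0 down from 99999999 to 1, so the total should be n(n-1)/2 for n = 100000000. The function names are mine, and the 32-bit emulation is run on a much smaller n so it finishes quickly.

```python
MASK32 = 0xFFFFFFFF


def closed_form(n: int) -> int:
    """Sum of 1 .. n-1, which is what the countdown loop accumulates."""
    return n * (n - 1) // 2


def sum_with_carry(n: int) -> int:
    """Emulate the ADDS/ADCS pair with two 32-bit halves."""
    lo = hi = 0
    r0 = n - 1                         # SUBS R0, #1 before the loop
    while r0 != 0:
        total = lo + r0                # ADDS: low word plus counter
        lo = total & MASK32
        hi = (hi + (total >> 32)) & MASK32   # ADCS: fold the carry into the high word
        r0 -= 1                        # SUBS R0, #1
    return (hi << 32) | lo


print(hex(closed_form(100_000_000)))   # 0x11c37934e58f80
print(sum_with_carry(100_000) == closed_form(100_000))  # True
```

So the register pair printed on the Cortex-M4 agrees with the closed form.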
Oh ho ho, here's some fun :-)
adding_arm.s:
.SYNTAX UNIFIED
.GLOBL main,printf
.THUMB_FUNC
main:
LDR R0, =100000000
EORS R1, R1
EORS R2, R2
SUBS R0, #1
loop:
ADDS R2, R0
ADCS R1, #0
SUBS R0, #1
BNE loop
LDR R0, =msg
B printf
msg:
.ASCIZ "%08x%08x\n"
adding_riscv.s:
.GLOBL main,printf
main:
LI a0, 100000000
LI a1, 0
LI a2, 0
ADDI a0, a0, -1
loop:
ADD a2, a2, a0
SLTU a3, a2, a0
ADD a1, a1, a3
ADDI a0, a0, -1
BNEZ a0, loop
LA a0, msg
J printf
msg:
.ASCIZ "%08x%08x\n"
$ arm-linux-gnueabihf-gcc adding_arm.s -o adding_arm -static
$ time qemu-arm ./adding_arm
0011c37934e58f80
real 0m0.304s
user 0m0.304s
sys 0m0.000s
$
$ riscv64-unknown-elf-gcc adding_riscv.s -o adding_riscv -march=rv32ic -mabi=ilp32
$ time qemu-riscv32 adding_riscv
0011c37934e58f80
real 0m0.242s
user 0m0.238s
sys 0m0.004s
Who needs steeeekin' condition codes?
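For anyone wondering how the RISC-V version gets away without a carry flag, here is a Python sketch of the ADD / SLTU / ADD trio from the listing above. The function name is hypothetical; the point is that after a wrapping 32-bit add, the result is smaller than the addend exactly when a carry occurred.

```python
MASK32 = 0xFFFFFFFF


def add64_via_sltu(hi: int, lo: int, addend: int) -> tuple[int, int]:
    """Add a 32-bit addend to a 64-bit value held as (hi, lo) without condition codes."""
    lo = (lo + addend) & MASK32        # ADD  a2, a2, a0  (wraps at 32 bits)
    carry = 1 if lo < addend else 0    # SLTU a3, a2, a0  (unsigned compare gives the carry)
    hi = (hi + carry) & MASK32         # ADD  a1, a1, a3
    return hi, lo


print(add64_via_sltu(0, 0xFFFFFFFF, 1))  # (1, 0): the low word wrapped
print(add64_via_sltu(0, 5, 7))           # (0, 12): no carry
```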
For the curious, the generated code:
000103e8 <main>:
103e8: 4808 ldr r0, [pc, #32] ; (1040c <msg+0xc>)
103ea: 4049 eors r1, r1
103ec: 4052 eors r2, r2
103ee: 3801 subs r0, #1
000103f0 <loop>:
103f0: 1812 adds r2, r2, r0
103f2: f151 0100 adcs.w r1, r1, #0
103f6: 3801 subs r0, #1
103f8: d1fa bne.n 103f0 <loop>
103fa: 4805 ldr r0, [pc, #20] ; (10410 <msg+0x10>)
103fc: f004 b956 b.w 146ac <_IO_printf>
00010400 <msg>:
10400: 78383025 .word 0x78383025
10404: 78383025 .word 0x78383025
10408: 0000000a .word 0x0000000a
1040c: 05f5e100 .word 0x05f5e100
10410: 00010400 .word 0x00010400
00010144 <main>:
10144: 05f5e537 lui a0,0x5f5e
10148: 10050513 addi a0,a0,256
1014c: 4581 li a1,0
1014e: 4601 li a2,0
10150: 157d addi a0,a0,-1
00010152 <loop>:
10152: 962a add a2,a2,a0
10154: 00a636b3 sltu a3,a2,a0
10158: 95b6 add a1,a1,a3
1015a: 157d addi a0,a0,-1
1015c: f97d bnez a0,10152 <loop>
1015e: 00000517 auipc a0,0x0
10162: 00c50513 addi a0,a0,12 # 1016a <msg>
10166: 1f60006f j 1035c <printf>
0001016a <msg>:
1016a: 3025
1016c: 7838
1016e: 3025
10170: 7838
10172: 000a
Thumb2: 44 bytes (as A32 it's 60 bytes)
RISC-V: 48 bytes
Came here to say that I find it nearly comical that there are professional programmers, proficient in multiple languages, that feel the use of white space for flow control is a deal breaker.
Say, you are a professional bus driver.
The bus has a feature: instead of a steering wheel, you have this steering knob which you need to crank up to go left, and down to go right. The knob can also turn freely in every direction, and if you accidentally turn it left, full brakes will be applied, so you better be careful not to do it.
You can totally get used to it if you want, but almost every other bus uses steering wheel.
Is this a deal breaker?
Some design choices just are atrociously stupid. You can accept to work around them, of course, but don't be surprised if some find them deal breakers.
That was really my point though. All this is down to semantics and we took those semantics from written languages because there is parity, so whitespace is both important and unimportant depending on the context of the usage even if it has no obvious functional value. In written languages, we use leading spaces to define blocks, indentation to define quotes and subtexts etc so there is contextual functional value.
And no typographer would make the structure of the text dependent on small amounts of whitespace. It is not good form.
If you then throw humans at it, you're going to get this happen at least once. It did, with Python and YAML. And it works reasonably well.
The only inexcusable thing that we did as a species was to conflate the issue by introducing the tabulation meta-character which is about as well defined as an amorphous blob of dog poop stuck to your shoe and entirely invisible in situ. The abstract conversions between that and spaces are a war crime and drove much polarising thought around language design.
The only sensible solution is to treat it like it looks to the naked eye, as whitespace. Further significance (count, character code etc.) must be disregarded.
Now when I write a lexical analyser, which I do occasionally, it will as a rule consume insignificant whitespace as part of the language specification. Scopes will be defined as curly braces too. But that's not because it's semantically better but because it's a hell of a lot easier to write a simple parser stack if you actually hint it in some way.
We don't use whitespace as a rule, not because we can't or shouldn't, but because our historical compilers are stupid and the programmers were lazy. I know I am both
I am a simple parser; also both dumb and lazy. Therefore I humbly suggest the use of curly braces.
There are many of us who quite like the way Python takes into account the indentation to separate blocks.
I think it's an elegant way to use indentation, which everyone should do in any program, to get rid of annoying curly braces.
Anyway, there is no need to criticize Python to justify not using it. Almost anyone can decide whether to go into Python programming or decide to go into something else.
There are many of us who quite like the way Python takes into account the indentation to separate blocks.
I think it's an elegant way to use indentation, which everyone should do in any program, to get rid of annoying curly braces.
I use indentation to give the source some structure, but just a few spaces per block/step. Too many spaces would make the source harder to manage, at least for me. And I try to avoid tabs. If you write some complex code and use large indentations you would need an extra wide monitor or two to be able to see what you're doing - very unergonomic. However, I like curly braces and I will keep using them. This is the result of my personal experience of learning many (not just a few) programming languages over the years.
There are many of us who quite like the way Python takes into account the indentation to separate blocks.
I think it's an elegant way to use indentation, which everyone should do in any program, to get rid of annoying curly braces.
I use indentation to give the source some structure, but just a few spaces per block/step. Too many spaces would make the source harder to manage, at least for me. And I try to avoid tabs. If you write some complex code and use large indentations you would need an extra wide monitor or two to be able to see what you're doing - very unergonomic. However, I like curly braces and I will keep using them. This is the result of my personal experience of learning many (not just a few) programming languages over the years.
If you're getting too deep with indentation you probably should be creating more functions and calling them instead.
ALL my code is 1-2 levels deep in a function max, regardless of language. As complicated as it gets is a switch wrapped in a loop for a state machine and those are slowly moving into dispatch / function pointer tables.
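A minimal Python sketch of the dispatch-table idea, for comparison with a switch in a loop. The states, events and handler names are all invented for the example; the only point is that the loop body stays one level deep however many states get added.

```python
def on_idle(event: str) -> str:
    """Handler for the 'idle' state: only 'start' moves us on."""
    return "running" if event == "start" else "idle"


def on_running(event: str) -> str:
    """Handler for the 'running' state: only 'stop' takes us back."""
    return "idle" if event == "stop" else "running"


# The table replaces the switch: state name -> function that returns the next state.
HANDLERS = {"idle": on_idle, "running": on_running}


def run(events: list[str], state: str = "idle") -> str:
    for event in events:
        state = HANDLERS[state](event)
    return state


print(run(["start", "stop", "start"]))  # running
```

Adding a state means adding one function and one table entry, with no extra nesting in `run`.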
Let's go on a bit of a tangent, then, related to blocks delimiters. (Python and Makefiles use indentation and not delimiters; and Makefiles' one is even more annoying, because it does distinguish between tab and space as the first character in indentation. I avoid that mess by using four spaces per indentation level in Python, and only a single tab in Makefiles.)
It seems that those who dislike indentation prefer braces AKA curly brackets, { and }, as the block delimiters. These are used in many languages (C, C++, Java, JavaScript, Awk, CSS, TeX/LaTeX/MathJax) which makes them familiar to many. Are there any better suggestions?
Some languages (note I'm using 'languages' as the superset of 'programming languages', 'domain-specific languages', and 'markup languages' here) use different delimiters. The most different being the way XML has each block start paired with a corresponding end block, <tag ... >...</tag>; and later versions of Fortran's do/for/while/etc. paired with an end with an optional block type (i.e. end do, end for, etc.) The XML one has proven problematic for humans (especially since things like <a>...[/url] are typical human "errors"), leading to the abandonment of XHTML in the current HTML version, HTML5.
I've mentioned before in other threads that I myself like a variant of LaTeX, but with escape sequences that ensure the block structure characters do not occur anywhere else in the stream. For example, if { begins a block, | separates sibling blocks, } ends a block, and \ precedes an escape sequence, I'd prefer \/, \(, \), and \! to escape \ { } | in the content, correspondingly. It may not be a good idea for a programming language, because code must be maintainable and therefore easy for humans to understand at a glance; but for markup languages, it beats the shit out of XML and others in parser speed, efficiency, and possibilities (which include parsing the stream backwards), with just a couple of characters worth of buffering either way.
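To show why such a format is cheap to parse, here is a sketch in Python using exactly the characters listed above: { opens a block, | separates siblings, } closes, and \/ \( \) \! stand for \ { } | in content. The tree shape (a block is a list of siblings, each sibling a list of strings and nested blocks) and the function name are my own choices for illustration.

```python
ESCAPES = {"/": "\\", "(": "{", ")": "}", "!": "|"}


def parse(text: str) -> list:
    """Parse `text` into nested lists; the top level is treated as one block."""
    root = [[]]               # a block: list of siblings, each a list of items
    stack = [root]
    buf = []                  # pending plain text for the current sibling

    def flush() -> None:
        if buf:
            stack[-1][-1].append("".join(buf))
            buf.clear()

    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":                        # escape: look at one more character
            i += 1
            if i >= len(text) or text[i] not in ESCAPES:
                raise ValueError("bad escape sequence")
            buf.append(ESCAPES[text[i]])
        elif ch == "{":                       # open a nested block
            flush()
            block = [[]]
            stack[-1][-1].append(block)
            stack.append(block)
        elif ch == "|":                       # start the next sibling
            flush()
            stack[-1].append([])
        elif ch == "}":                       # close the current block
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        else:
            buf.append(ch)
        i += 1
    flush()
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return root


print(parse(r"a{b|c\(d\)}e"))  # [['a', [['b'], ['c{d}']], 'e']]
```

Because the structural characters never appear unescaped in content, the parser needs at most one character of lookahead after a backslash.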
It is also notable that both ASCII and Unicode include dedicated control characters (and dedicated control glyphs in Unicode, in addition to the ASCII ones). Should we at some point switch from plain ASCII source code formats, into structured formats where the editor displays the code in whatever format the programmer wants – be that whitespace-indented or brace delimited or something more funky – and just uses a markup language underneath?
Interesting. I actually wrote a programming language inside Microsoft Word a few years back for document automation using VSTO. That was fully structured without ASCII and used nested document placeholders with expressions and a relatively powerful scheme-like meta-language. This was entirely processed server-side to generate PDFs for printing and mail outs for large financial companies. It allowed "normal users" to write documents programmatically. This was an extremely well received bit of tech that pushed the company into the market leading position overnight.
If you're getting too deep with indentation you probably should be creating more functions and calling them instead.
ALL my code is 1-2 levels deep in a function max, regardless of language. As complicated as it gets is a switch wrapped in a loop for a state machine and those are slowly moving into dispatch / function pointer tables.
That would be just two nested if-else? And yes, I already write a lot of functions. But in some situations an additional function would make things more cumbersome, waste resources, or make an algorithm hard to comprehend.
I don't particularly like or dislike the enforced indentations. I wish they had enforced a single way to do it though. Either exactly four spaces, a good compromise between easy to distinguish level and things falling off the screen to the right. Or a single tab character per indentation level, and let people choose themselves in their editor setting how it is displayed. Or something different, as long as it was absolute, universal and no exceptions. And yes, the common advice is that you shouldn't nest very deep or very long code blocks anyway. Make more and more powerful abstractions instead.
I have no strong preference for Python in general. It gives me the possibility to write efficient high level scripts, with a lot of power from tons of libraries. I understand there are other languages which provide this as well, Perl being the one I've put the most effort towards. Perl has some nifty features to it, but the page long cartoon swear dialog just doesn't appeal to my code comprehension skills.
By the way, I didn't know until recently that you are "allowed" to
for word in ['hello','world']: print(word)
on a single line, as long as it is a single line. That just seems like an unnecessary exception to me.
There are many of us who quite like the way Python takes into account the indentation to separate blocks.
I think it's an elegant way to use indentation, which everyone should do in any program, to get rid of annoying curly braces.
They're only annoying if you have difficulty comprehending the idea of
unit_sequence ::= unit_sequence ";" unit
| unit
unit ::= "{" unit_sequence "}"
| ...
the equivalent to discarding bracketing to indicate compound statements in written English would be to discard full stops and initial capitals on sentences and to then require a specific sequence of spacing to indicate the starts and ends of sentences it is doable but as you can see discarding just that element of punctuation makes written English harder to read I really don't get why some people are so frightened by a balanced brace or seven then again judging from the way some people write on here the idea of paragraphs sentences and restricting a paragraph to exploring a single idea are alien concepts of course the group that has most difficulty with the ideas in English are young children while they learn to marshal their thoughts and get to grips with the rules of writing whether that is an analogy for the people who are bothered by annoying braces or merely a stand alone observation I do not know
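For what it's worth, the grammar quoted above turns into very little code. Below is a hedged recursive-descent sketch in Python. The elided "..." alternative is not specified, so I assume a bare word counts as a unit, and the left recursion in unit_sequence is handled with a loop. All function names are illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split into braces, semicolons and bare words; whitespace is discarded."""
    return re.findall(r"[{};]|[^\s{};]+", text)


def parse_sequence(tokens: list[str], pos: int):
    """unit_sequence ::= unit_sequence ";" unit | unit   (written as a loop)."""
    unit, pos = parse_unit(tokens, pos)
    units = [unit]
    while pos < len(tokens) and tokens[pos] == ";":
        unit, pos = parse_unit(tokens, pos + 1)
        units.append(unit)
    return units, pos


def parse_unit(tokens: list[str], pos: int):
    """unit ::= "{" unit_sequence "}" | word   (the word case is an assumption)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token == "{":
        units, pos = parse_sequence(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != "}":
            raise SyntaxError("missing '}'")
        return units, pos + 1
    if token in ("}", ";"):
        raise SyntaxError(f"unexpected {token!r}")
    return token, pos + 1


def parse(text: str) -> list:
    tokens = tokenize(text)
    tree, pos = parse_sequence(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse("a; { b; c }; d"))  # ['a', ['b', 'c'], 'd']
```

The nesting comes straight out of the braces, and the layout of the input makes no difference to the result.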
I don't particularly like or dislike the enforced indentations. I wish they would have enforced a single way to do it though. Either exactly four spaces, a good compromise between easy to distinguish level and things falling off the screen to the right. Or a single tab character per indentation level, and let people choose themselves in their editor setting how it is displayed. Or something different, as long as it was absolute, universal and no exceptions.
The first thing any team using Python professionally does is (1) pick a rule for indentation that everybody uses (2) enforce it with pylint and check-in/build pipeline rules. The second thing they do is learn to use 'blame' in git to find the buggers who can't stick to the rules and/or find creative ways of circumventing the enforcement mechanisms.