Author Topic: is there someone here with experiences in building compilers ?  (Read 9140 times)

0 Members and 1 Guest are viewing this topic.

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
is there someone here with experiences in building compilers ?
« on: November 21, 2015, 09:39:18 pm »
just to know  :)
 


Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #2 on: November 21, 2015, 10:44:11 pm »
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #3 on: November 21, 2015, 11:24:17 pm »
AST based ?
with the complexity of C language compiler ?

if so, can you give me a few points about fresh start up
(book, slide, whatever is useful)
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #4 on: November 22, 2015, 12:01:56 am »
AST based ?
with the complexity of C language compiler ?

if so, can you give me a few points about fresh start up
(book, slide, whatever is useful)

Hi Legacy, yes, I wrote/contributed to a lot of code generators and optimizers over the years. I think your goal wants to do a Lint/MISRA checker right? If so, you need to first categorize the MISRA rules into two sets: a) ones that a single file processor can handle, b) ones that require knowledge about multiple files or entire program. Then I would start with a known tools like splint or may be even GCC/CLang and look at the deficiencies. The toughest part would be the multiple file requirements.

As for books/slides etc., your needs are fairly specialized, e.g. you don't really need to know ins and outs of parsing etc. per se. You also need to specify whether your tool can handle incorrect input or only input that are accepted by compilers. I would highly recommend the latter.

Hope this helps.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #5 on: November 22, 2015, 12:08:34 am »
I think your goal wants to do a Lint/MISRA checker right?

bingo  :D

but I'd also like to create a compiler for my softcore (fpga), which has uncommon ISA

ones that require knowledge about multiple files or entire program

I have already written a db-oriented file checker, it collects information from every file.c
it's currently used to do
- deadcode check
- inter modules dependencies

I can reuse this project

GCC/CLang and look at the deficiencies

Gcc: too complex
Clang: may be, it looks better, and well documented!
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8240
Re: is there someone here with experiences in building compilers ?
« Reply #6 on: November 22, 2015, 03:53:42 am »
Yes, I have written a few compilers/interpreters.

Take a look at this for some inspiration, a C-like self-compiling interpreter in only 4 functions and under 1kLOC:
https://github.com/rswier/c4/blob/master/c4.c
 

Offline ivaylo

  • Frequent Contributor
  • **
  • Posts: 661
  • Country: us
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #8 on: November 22, 2015, 10:42:28 am »
The Dragon Book is classic - https://en.m.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools

Classic and "none-better", but not very useful for writing a MISRA checker.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #9 on: November 22, 2015, 04:15:30 pm »
I few months ago I experimented an RPN code gen (posted here ) :D
So I have already had a little taste about the background with such a things (and I can reuse my code)


Quote
I've never understood the crazy demand for people collecting books which should teach about how to write a compiler and then they simply invoke lord Bison and mistress Lex during their laboratory activities.

  • Bison: YACC-compatible Parser Generator
  • Lex: Lexical Analyzer Generator
  • Yacc: Yet Another Compiler-Compiler
  • Flex: Fast scanner generator
  • Coco/R Compiler Generator
Sure there's enough software and hardware floating around now that you can mess about and possibly crank out something good with them on linux (which has the above tools), but ... I wonder how many knowledgeable people have really eaten a salad of math.

These yet another compiler - tools are good but … they tend to hide all the magic behind. While the math is rigorous and well defined, in computer science you can have cold drinks of various mixtures of raw or cooked fruits. That's the real fun.

See the following expression

Code: [Select]
# eval 1+(2.2*3.0E9+ ( 0xdeadbeaf logicalAnd 0xffffffff ) )

eval has said …. it's a "good expression" ... ohhh my goodness, certainly a bug in my source code, but what a delicious salad, let me offer a shake of different kinds, floating points with hex, logical with arithmetic and evil code

Code: [Select]
yards analysis: PASSED
stack analysis: PASSED

rpn_v[]={ 0 3 5 4 8 10 9 6 1 }

[1] 1 token_DecValue, type12
[2.2] 1 token_FloatinPointENGValue, type15
[3.0E9] 1 token_FloatinPointSCIValue, type16
[*] -1 token_Asterisk, type55
[0xdeadbeaf] 1 token_HexValue, type13
[0xffffffff] 1 token_HexValue, type13
[logicalAnd] -1 token_LogicalAnd, type41
[+] -1 token_Plus, type53
[+] -1 token_Plus, type53

#########################
#    good expression    #
#########################
(it's a lie, it's not a good expression, it's simply a bug)

Reading books like "dragon book" and "computer science algorithms for dummies" I was tempted to do my best to produce the worst code, so I have been playing with lexers, parsers, and RPN for weeks. The result is a black mass. The lexer is working good, but It was extremely difficult to make it to understand floating point notation, I was tempted to reinvent the wheel, then I grabbed the idea from SCI and ENG notations. It's used in CASIO pocket calculators. It was also more difficult to play with stack to make the engine to understand operators, their precedence and associativity

Code: [Select]
/*
 *  precedence   operators       associativity
 * ---------------------------------------------
 *  1            !               right to left
 *  2            * / %           left to right
 *  3            + -             left to right
 *  4            =               right to left
 */
 

now it seems it's beginning to work

Code: [Select]
# eval 1.1 + 2.2 * 3.0E9

yards analysis: PASSED
stack analysis: PASSED

rpn_v[]={ 0 2 4 3 1 }

[1.1] 1 token_FloatinPointENGValue, type15
[2.2] 1 token_FloatinPointENGValue, type15
[3.0E9] 1 token_FloatinPointSCIValue, type16
[*] -1 token_Asterisk, type55
[+] -1 token_Plus, type53

#########################
#    good expression    #
#########################

Code: [Select]
# eval 1+2+

yards analysis: PASSED
stack analysis: FAILED

#########################
#    bad  expression    #
#########################

Code: [Select]
# eval 1+2+(2+3

Error: parentheses mismatched/case1

yards analysis: FAILED
stack analysis: PASSED

#########################
#    bad  expression    #
#########################

but …. as you can see in the first eval above, the calculator has also learnt how to tell "lies"

this happens because in computer science you have to care about
  • data size (8bit, 16bit, 32bit, ...)
  • data type (integer, floating point, ...)
  • data sign (unsigned, signed, ...)
and certainly other things.



Today I have implemented a few mechanisms to handle functions. Probably I am too strong data type addicted because I used Pascal and ADA for too long.

Code: [Select]
# eval Fa(Fb(1,2,3,4,5),Fc(6,7,8,9),0)

yards analysis: PASSED
stack analysis: PASSED

function_pool has 3 functions
 + function1 [Fa] { typeRet typeRet type12 } 3 args
 + function2 [Fb] { type12 type12 type12 type12 type12 } 5 args
 + function3 [Fc] { type12 type12 type12 type12 } 4 args

rpn_v[]={ 4 6 8 10 12 2 17 19 21 23 15 26 0 }

[1] 1 token_DecValue, type12
[2] 1 token_DecValue, type12
[3] 1 token_DecValue, type12
[4] 1 token_DecValue, type12
[5] 1 token_DecValue, type12
[Fb] -4 token_StrictAlphaNum, type19
[6] 1 token_DecValue, type12
[7] 1 token_DecValue, type12
[8] 1 token_DecValue, type12
[9] 1 token_DecValue, type12
[Fc] -3 token_StrictAlphaNum, type19
[0] 1 token_DecValue, type12
[Fa] -2 token_StrictAlphaNum, type19

#########################
#    good expression    #
#########################

I am considering the RPN method as the hypothetical way to transform a math expression into machine opcodes, crazy idea for AS. I do not like methods like Recursive Descent.




recipes of lexers, parsers, interpreters, compilers
anybody is cooking its own salad of math ?

let me know :D

Quote
I have added a new feature: codegen
It tries to generate opcodes from an RPN expression

here it is a simple example

Code: [Select]
# codegen 1+2+3

yards analysis: PASSED
stack analysis: PASSED

function_pool_show_all ... 0 functions

rpn_v[]={ 0 2 1 4 3 }

[1]  1 K3 cookie     token_DecValue, type12
[2]  1 K3 cookie     token_DecValue, type12
[+] -1 K5 operator   token_Plus, type53
[3]  1 K3 cookie     token_DecValue, type12
[+] -1 K5 operator   token_Plus, type53

#########################
#    good expression    #
#########################

##############[ code gen ]##############
load r1,1
load r2,2
uadd r1,r1,r2
load r2,3
uadd r1,r1,r2


Code: [Select]
# codegen 1+Fa(2)+Fb(3+4)

yards analysis: PASSED
stack analysis: PASSED
function_pool_show_all ... 2 functions
 + function1 [Fa] { type12 } 1 args ticket_rpn=2
 + function2 [Fb] { type12 } 1 args ticket_rpn=7

rpn_v[]={ 0 4 2 1 9 11 10 7 6 }

[1]  1 K3 cookie     token_DecValue, type12
[2]  1 K4 f_cookie   token_DecValue, type12
[Fa]  0 K2 function   token_StrictAlphaNum, type19
[+] -1 K5 operator   token_Plus, type53
[3]  1 K4 f_cookie   token_DecValue, type12
[4]  1 K4 f_cookie   token_DecValue, type12
[+] -1 K6 f_operator token_Plus, type53
[Fb]  0 K2 function   token_StrictAlphaNum, type19
[+] -1 K5 operator   token_Plus, type53

#########################
#    good expression    #
#########################

##############[ code gen ]##############
load r1,1
load_and_push r2,2
uadd SP,SP,2 /* arg_n, return address */
call Fa
load r2,(SP) /* load function result */
uadd r1,r1,r2
load_and_push r2,3
load_and_push r3,4
uadd r2,r2,r3
usub SP,SP,2
push r2
uadd SP,SP,2 /* arg_n, return address */
call Fb
load r2,(SP) /* load function result */
uadd r1,r1,r2
 

Offline zapta

  • Super Contributor
  • ***
  • Posts: 6189
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #10 on: November 22, 2015, 04:41:02 pm »
I have had good experience with javacc (Java Compiler Compiler). You write lexical, grammar and semantic rules in one files and it all type safe. It also has a mode called JJTree where it creates the AST using generic types but I prefer the custom tree with classes that represent the underlying language.

Here is an example of a C parser

https://java.net/downloads/javacc/contrib/grammars/C.jj
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #11 on: November 22, 2015, 09:35:08 pm »
The lexer is the easiest part of a compiler. Parsing expressions is easy. The most impressive I have ever seen is two mutually recursive function with a loop each. With that few lines, all the precedence and associativity issues were taken care of.

The hard part of parsing is the forward declaration issues, and any grammar that are LR-easy but LL-hard - this is why compiler parser generators are used. Otherwise, a top down parser wins everytime for cleanliness and ease of design/implementation. You also have to take care of the mechanics of maintaining name spaces and scopes, but those are trivial.

Then comes the semantic analysis.

Writing a toy compiler is easy. Writing a production compiler is hard.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline paf

  • Regular Contributor
  • *
  • Posts: 91
Re: is there someone here with experiences in building compilers ?
« Reply #12 on: November 23, 2015, 05:57:46 pm »
Software and/or Links for "making" compilers:

Java based:

AntLR
http://www.antlr.org/

JavaCC
https://java.net/projects/javacc

Jflex:
http://jflex.de/

CUP
http://www2.cs.tum.edu/projects/cup/


C based tools:

Ragel (not for writing a full compiler, but may be useful):
http://www.colm.net/open-source/ragel/ 

Portable C compiler
http://pcc.ludd.ltu.se/

LCC compiler:
https://sites.google.com/site/lccretargetablecompiler/

Tiny C compiler:
http://bellard.org/tcc/

SDCC - Small Device C Compiler
http://sdcc.sourceforge.net/

(Old) Books:
James Hendrix's  book "A Small-C Compiler: Language, Usage, Theory, and Design":
http://www.drdobbs.com/developer-network-small-c-compiler-book/184415519
https://en.wikipedia.org/wiki/Small-C

Allen Holub's book: "Compiler Design in C":
http://www.holub.com/software/compiler.design.in.c.html


 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #13 on: November 24, 2015, 12:32:39 am »
Software and/or Links for "making" compilers:

Java based:

AntLR
http://www.antlr.org/

JavaCC
https://java.net/projects/javacc

Jflex:
http://jflex.de/

CUP
http://www2.cs.tum.edu/projects/cup/


C based tools:

Ragel (not for writing a full compiler, but may be useful):
http://www.colm.net/open-source/ragel/ 

Portable C compiler
http://pcc.ludd.ltu.se/

LCC compiler:
https://sites.google.com/site/lccretargetablecompiler/

Tiny C compiler:
http://bellard.org/tcc/

SDCC - Small Device C Compiler
http://sdcc.sourceforge.net/

(Old) Books:
James Hendrix's  book "A Small-C Compiler: Language, Usage, Theory, and Design":
http://www.drdobbs.com/developer-network-small-c-compiler-book/184415519
https://en.wikipedia.org/wiki/Small-C

Allen Holub's book: "Compiler Design in C":
http://www.holub.com/software/compiler.design.in.c.html

thank you for this list  :-+
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8240
Re: is there someone here with experiences in building compilers ?
« Reply #14 on: November 24, 2015, 05:10:25 am »
The lexer is the easiest part of a compiler. Parsing expressions is easy. The most impressive I have ever seen is two mutually recursive function with a loop each. With that few lines, all the precedence and associativity issues were taken care of.
Something like this?

https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#more_climbing
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #15 on: November 24, 2015, 09:26:24 am »
If you want a copy of the Holub book, PM me. I will sell it cheap :-)
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline miguelvp

  • Super Contributor
  • ***
  • Posts: 5550
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #16 on: November 24, 2015, 09:49:26 am »
If you want a copy of the Holub book, PM me. I will sell it cheap :-)

I still have the hard cover one from 1990 that according to the big sticker it cost me $48 back then.

But as the link says, it's freely downloadable.
 

Offline paf

  • Regular Contributor
  • *
  • Posts: 91
Re: is there someone here with experiences in building compilers ?
« Reply #17 on: November 24, 2015, 12:51:27 pm »
Forgot about this one, very easy to read as an intro, with C/C++ and Java versions,  with updates and freely downloadable:

Seth D. Bergmann - Compiler Design Textbooks
http://elvis.rowan.edu/~bergmann/books.html



 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #18 on: November 24, 2015, 01:21:56 pm »
Seth D. Bergmann - Compiler Design Textbooks

Code: [Select]
url="http://elvis.rowan.edu/~bergmann/books/c_cpp/miniC"
list="
bisect.c
cos.c
exp.c
fact.c
gen.c
makefile
mini.c
mini.h
miniC.C
miniC.h
miniC.l
miniC.y
"

for item in $list
    do
        echo -n "$item ... "
        wget $url/$item 1>1 2>2
        echo "done"
    done

the above script gets sources, I just wanted to "check it out", but it seems I have a problem with Yacc

Code: [Select]
make
yacc -v miniC.y
miniC.y:65.42-43: $$ for the midrule at $5 of `ForStmt' has no declared type
miniC.y:66.69-70: $$ for the midrule at $5 of `ForStmt' has no declared type
miniC.y:67.42-43: $$ for the midrule at $8 of `ForStmt' has no declared type
miniC.y:70.42-43: $$ for the midrule at $8 of `ForStmt' has no declared type
miniC.y:73.42-43: $$ for the midrule at $8 of `ForStmt' has no declared type
miniC.y:88.42-43: $$ for the midrule at $2 of `WhileStmt' has no declared type
miniC.y:89.69-70: $$ for the midrule at $2 of `WhileStmt' has no declared type
miniC.y:90.42-43: $$ for the midrule at $6 of `WhileStmt' has no declared type
miniC.y:91.69-70: $$ for the midrule at $6 of `WhileStmt' has no declared type
miniC.y:97.42-43: $$ for the midrule at $5 of `IfStmt' has no declared type
miniC.y:98.72-73: $$ for the midrule at $5 of `IfStmt' has no declared type
miniC.y:99.42-43: $$ for the midrule at $7 of `IfStmt' has no declared type
miniC.y:100.70-71: $$ for the midrule at $7 of `IfStmt' has no declared type
make: *** [y.tab.c] Error 1
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #19 on: December 06, 2015, 04:32:59 pm »
Code: [Select]
private boolean_t is_ok;

/*
 * this is a comment
 */

/*
 * // this kind of comment is not accepted
 */
 

/*
this is ADA: does it look nicer than C ?

procedure Example is
  X: array (Integer range 1..5) of Integer;
begin
  for I in X'Range loop
    X(I) := I;
  end loop;
end Example;
 */

private uint32_t my_var1;
private uint32_t my_var2;

public boolean_t is_ok;

private char_t hAllo;
private char_t msg1[]="hAllo";
private char_t msg2[10];

public uint32_t ciao;

private boolean_t is_ok()
{
}

/*
 * hallo
 */

/*
// hAllo again
 */

private void_t hallo1
(
   p_my_t p_my
)
{

}

private void_t hallo2
(
   p_my_t p_my
)
{
    while (False)
    {
    ;
    }
}

private uint32_t my_var1;

public typedef struct
{
   {
       {
       }
   }
} my_t;


I am working on an interpreter

its grammar looks almost like the C89 grammar, but more simple and with a strict definition of what is public and what it's private
also every "type" must end with "_t", otherwise a syntax error is issued


consider it as an "home work" to learn more about "interpreters and compilers"
currently I have implemented only the parser without a semantical reaction
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #20 on: December 07, 2015, 08:13:01 am »

its grammar looks almost like the C89 grammar, but more simple and with a strict definition of what is public and what it's private
also every "type" must end with "_t", otherwise a syntax error is issued


consider it as an "home work" to learn more about "interpreters and compilers"
currently I have implemented only the parser without a semantical reaction

NEVER do that. Imposing a lexical convention onto language semantic is like dictating that spaces and indentation matter, or that variables must be upper case, or...  ;D

You need to learn about language grammar. For example, the declaration placement of the original C was defined that way is so that a) you only have to look for a declaration in certain places, e.g. right after a {, and b) the first "word" tells you whether it is a declaration or not (e.g. a type name of a typedef name). It's not that big of a deal to look for declaration everywhere but limiting it like the original C is still much better than using a lexical convention.
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: is there someone here with experiences in building compilers ?
« Reply #21 on: December 07, 2015, 04:34:42 pm »
but a constraint that I have in my job with avionic is: every type must be defined with typedef struct/union and MUST ends with "_t"

so you can't pass a parameter without a script type

foo(struct blabla var) is banned
foo (blabla_t var) is allowed

and you have to add "_t" to the end,

typedef struct { … } blabla_t;

and you have to use "p_" with pointer
 

Offline richardman

  • Frequent Contributor
  • **
  • Posts: 427
  • Country: us
Re: is there someone here with experiences in building compilers ?
« Reply #22 on: December 08, 2015, 08:08:09 am »
Yes, but YOUR language does not have to do it at the lexical level. The proper way to do it is to define a grammar where type if accepted is appropriate, then during the SEMANTIC phase, you add brain-fart constraints like the Avionic do-do-heads require.

This will allow your interpreter to be used in proper environment, without stupid constraints.

but a constraint that I have in my job with avionic is: every type must be defined with typedef struct/union and MUST ends with "_t"

so you can't pass a parameter without a script type

foo(struct blabla var) is banned
foo (blabla_t var) is allowed

and you have to add "_t" to the end,

typedef struct { … } blabla_t;

and you have to use "p_" with pointer
// richard http://imagecraft.com/
JumpStart C++ for Cortex (compiler/IDE/debugger): the fastest easiest way to get productive on Cortex-M.
Smart.IO: phone App for embedded systems with no app or wireless coding
 

Offline jsmith45

  • Newbie
  • Posts: 4
Re: is there someone here with experiences in building compilers ?
« Reply #23 on: December 13, 2015, 09:43:00 am »
Yes, but YOUR language does not have to do it at the lexical level. The proper way to do it is to define a grammar where type if accepted is appropriate, then during the SEMANTIC phase, you add brain-fart constraints like the Avionic do-do-heads require.

This will allow your interpreter to be used in proper environment, without stupid constraints.


More practically, implementing these rules as a  constrain that is applied after parsing often makes it easier to provide a more useful error message. (Such as being able to find the typedef for the type and show the correct typedef name they should have used in the error message).
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf