Last Monday I developed a program to compare text files.
Even in C, there are multiple ways to read lines from a stream (a stream in the C sense: a file, pipe, socket, character device, or similar).
Most people learn to do this kind of low-level stuff with fgets(), opendir(), readdir(), and closedir(), but on Linux and other POSIXy systems that's the wrong approach: fgets() imposes a fixed maximum line length, and plain readdir() traversal is prone to issues if the directory tree is modified during the traversal.
One gets much better results with POSIX getline()/getdelim(), nftw(), scandir(), and glob(), and with the fts functions that originated in 4.4BSD but are nowadays available in many standard C libraries, including on Linux. With these, you get dynamic line length support and the most robust filesystem traversal. (With getdelim(&line, &size, '\0', stream) you can read nul-delimited streams, which is especially useful for handling file names and paths provided by a subprocess or utility, in the style of the find ... -print0 and xargs -0 utilities.)
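As a concrete illustration, here is a minimal sketch of the nul-delimited case: it reads file names from standard input (as produced by e.g. find . -print0) using getdelim(), and simply prints each one. Error handling is kept to the bare minimum, and the file name read0.c is just an example.

```c
/* Read NUL-delimited file names from standard input using POSIX getdelim().
   Compile with e.g. gcc -Wall -O2 read0.c -o read0, then run
   find . -print0 | ./read0 */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(void)
{
    char   *name = NULL;   /* getdelim() allocates and grows this buffer */
    size_t  size = 0;
    ssize_t len;

    while ((len = getdelim(&name, &size, '\0', stdin)) != -1) {
        /* len includes the trailing NUL delimiter, if present; drop it. */
        if (len > 0 && name[len - 1] == '\0')
            len--;
        printf("%zd bytes: %s\n", len, name);
    }

    free(name);
    return 0;
}
```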
When random access to each line is required, it is more efficient to read or memory-map the entire file into one contiguous chunk of memory (via <unistd.h> open()/read()/close(), or more portably using <stdio.h> fopen()/fread()/fclose(); I've shown an example here), then separately terminate and trim each line, storing a pointer to each line in a separate array. This keeps the data contiguous in memory, and that data locality can really help cache behaviour when doing e.g. diff-type work. (Personally, I use the actual diff utility to do diffs, though, and read its output via a pipe or a socket.)
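To make the whole-file approach concrete, here is a rough sketch of the portable stdio variant: it slurps the file into one contiguous buffer (growing it in chunks, so it also works for non-seekable streams like pipes), then terminates each line in place and records a pointer to each line in a separate array. Buffer growth and error handling are deliberately simplistic.

```c
/* Read an entire file into one contiguous buffer, then index its lines. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "Usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    FILE *in = fopen(argv[1], "rb");
    if (!in) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    /* Grow the buffer in 64k chunks; works for pipes as well as regular files. */
    char  *data = NULL;
    size_t used = 0, allocated = 0;
    for (;;) {
        if (used + 65536 > allocated) {
            allocated = used + 65536;
            data = realloc(data, allocated);
            if (!data) { perror("realloc"); return EXIT_FAILURE; }
        }
        size_t n = fread(data + used, 1, allocated - used, in);
        if (n == 0)
            break;
        used += n;
    }
    fclose(in);
    data[used] = '\0';   /* make sure the final line is terminated, too */

    /* Build the array of line pointers; the line data itself stays contiguous. */
    char  **line  = NULL;
    size_t  lines = 0;
    for (size_t i = 0; i < used; ) {
        line = realloc(line, (lines + 1) * sizeof *line);  /* crude growth, fine for a sketch */
        if (!line) { perror("realloc"); return EXIT_FAILURE; }
        line[lines++] = data + i;
        while (i < used && data[i] != '\n')
            i++;
        if (i < used)
            data[i++] = '\0';   /* terminate this line in place */
    }

    printf("%zu lines read.\n", lines);
    free(line);
    free(data);
    return EXIT_SUCCESS;
}
```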
Yet, all of that pales in comparison to the effect of properly choosing the algorithm one implements.
My favourite example (one I've described at least three times on these forums already) is the traditional
sort utility implementation in C.
The task is to read a source file, and output the lines in sorted order (ascending or descending, depending on command-line options).
The first question an implementer should ask is whether the utility will be used in background processes or interactively by humans, because the two have different definitions of "efficiency". For processes that run in the background, possibly only when the machine is otherwise idle, you want to minimize the CPU time and memory needed to accomplish the task. For human-facing tools, you want to minimize the wall-clock time used.
In a background-task utility, you'd read or memory-map the file, build an array of pointers to each line, sort that array of pointers using a suitable sorting function, and finally emit the lines in the sorted order.
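Assuming the line[]/lines pair built as in the earlier sketch, the batch version then boils down to a comparison function and a call to qsort(). (compare_lines() and emit_sorted() are just illustrative names.)

```c
/* Sort the array of line pointers and emit the lines in ascending order.
   Assumes char **line and size_t lines were built as in the previous sketch. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int compare_lines(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

static void emit_sorted(char **line, size_t lines)
{
    qsort(line, lines, sizeof line[0], compare_lines);
    for (size_t i = 0; i < lines; i++)
        puts(line[i]);
}
```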
To minimize the wall-clock time used by a sort utility, you trade a bit of CPU time to accomplish the task in a shorter elapsed real-world time. You do this by realizing that I/O, reading the data from storage, is the main cause of latency: real-world delay during which no "work" is done. So, instead of reading the whole file first, you read it line by line, and each line you obtain you stick into a self-sorting data structure. I use a binary heap, because the code is simple, and it yields a very efficient data structure for this even in the worst cases. (For uniform random input, each addition involves approximately e≃2.718 swaps in the tree, in the percolate-up stage after insertion.)
Effectively, by the time the last line has been read, nearly all of the sorting work has already been done, and the lines can be drained from the heap and output immediately. The binary heap (or whatever self-sorting structure you use) takes slightly more CPU time than an offline sort function, so you do end up using more CPU time overall; but that CPU time is spent while the process would otherwise be waiting for data to arrive from storage, so the real-world time taken by the entire process is shorter.
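Here is a rough sketch of that idea, using getline() to read and a simple binary min-heap of strings: each line is percolated up as soon as it arrives, and the heap is drained once input ends. Growth and error handling are simplified, and for descending order you would simply flip the comparisons.

```c
/* Read lines from standard input into a binary min-heap, then drain it
   to produce the lines in ascending order. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

static char  **heap = NULL;
static size_t  heap_used = 0, heap_size = 0;

static void heap_push(char *s)
{
    if (heap_used >= heap_size) {
        heap_size = heap_size ? 2 * heap_size : 1024;
        heap = realloc(heap, heap_size * sizeof heap[0]);
        if (!heap) { perror("realloc"); exit(EXIT_FAILURE); }
    }
    /* Percolate up: for random input this is only a few swaps on average. */
    size_t i = heap_used++;
    while (i > 0 && strcmp(s, heap[(i - 1) / 2]) < 0) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = s;
}

static char *heap_pop(void)
{
    char   *top  = heap[0];
    char   *last = heap[--heap_used];
    size_t  i    = 0;
    /* Percolate down from the root. */
    while (2 * i + 1 < heap_used) {
        size_t child = 2 * i + 1;
        if (child + 1 < heap_used && strcmp(heap[child + 1], heap[child]) < 0)
            child++;
        if (strcmp(last, heap[child]) <= 0)
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}

int main(void)
{
    char   *line = NULL;
    size_t  size = 0;
    ssize_t len;

    /* Insert each line into the heap as soon as it arrives from the (slow) input. */
    while ((len = getline(&line, &size, stdin)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';
        char *copy = strdup(line);
        if (!copy) { perror("strdup"); return EXIT_FAILURE; }
        heap_push(copy);
    }
    free(line);

    /* Drain the heap: lines come out in ascending order. */
    while (heap_used > 0) {
        char *s = heap_pop();
        puts(s);
        free(s);
    }
    free(heap);
    return EXIT_SUCCESS;
}
```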
This kind of complete change in approach is
easier in higher-level scripting languages like Python, Ruby, or Lua, because there is less code, and therefore less "inertia": less perceived "cost" in trying something different, just to see if it works better. Converting code from one language to another usually yields poor results, because every language has its own "native" approach, its own paradigm for how things are done, "how things oughta look like". So don't do that either.
(I've mentioned I write a LOT of C code. Most of that is because I experiment with stuff, just to see if it works better. I don't always do it just for my own interest or for technical interest; I've also looked at e.g. what a linear algebra library API should look like in pure C, to let beginners in C write efficient code. In my observations on unsuspecting victims, er, test users, I found that using a scripting language with lightweight but efficient bindings to the native library yields better results, unless the point is to help them become better programmers, as opposed to just enhancing their workflow with customized tools.)
For the above
sort utility, I haven't checked what a Python version would look like, because as I said, Python I/O is relatively slow, so it isn't the proper language for implementing this. For scripting over text files/streams, I often reach for sed, gawk, or mawk, if there isn't a command-line tool already available to do the job. (mawk tends to be faster than gawk when you do not need the GNU awk extensions.) Some of the utilities, like
diff, implement surprisingly efficient algorithms, and are hard to beat in terms of speed and efficiency; others (like
sort) are more geared towards multifunction use, and can easily be beaten in specific scenarios using specific comparison metrics.