EEVblog Electronics Community Forum
General => General Technical Chat => Topic started by: kody on November 30, 2014, 12:53:53 am
-
Assume a directory tree with each child represent a directory inside the root directory, Write a program to calculate the total disk space for the root directory.
-
recursion: n. See recursion.
-
The trouble with this kind of question is it can't tell the difference between someone who has learned the answer and remembers it, and someone who can intuitively figure it out.
Someone who can intuitively figure it out will make a much better engineer than someone who can't.
-
Read the free space information from the MFT/superblocks?
How about read the disk capacity/usage from WMI?
Way too little information to give an actual answer, but they're looking for your thought process and how you answer it, rather than what your actual answer is.
-
Is it a trick question? It is only asking for "the total disk space for the root directory".
The fact that there are child directories may be a diversion.
-
Is it a trick question? It is only asking for "the total disk space for the root directory".
The fact that there are child directories may be a diversion.
Only if you pretend that its subdirectories aren't in it.
-
it's not clear whether the root of the tree equals to the root directory of the disk - therefore we should assume it's not the disk's root and go for the recursion. but in case of the roots are the same, then the used capacity of the disk is the answer. we should also assume it's occupied disk space because it makes no sense co calculate frees pace for a tree ;)
btw.. i'm interviewing all the tech guys (80% of the current staff went through my interviews) for our team (~ 30 members) and i always ask similar questions (IT industry - not EEs) .... and i'm not interested in the answer at all... i want to "see" how the candidates think, because the way of thinking is much much much more important than some learnt facts ;) and from experience i can tell you that anyone with the "correct" way of thinking has enough knowledge.
-
Only really seems valuable to see if the interviewee understands recursion ... with some bonus points if he understand why you'd sometimes implement it through iteration instead.
In the context of this board I think recursion is the wrong answer :p
-
Only really seems valuable to see if the interviewee understands recursion ... with some bonus points if he understand why you'd sometimes implement it through iteration instead.
Given the size of today's filesystems, I'd always implement it iteratively.
It's good practice to avoid recursion if you can anyway.
Just push everything on a stack and keep going through it until it's empty. Slightly more difficult than recursion, but still pretty easy, uses less resources, and much more reliable.
-
[Just push everything on a stack and keep going through it until it's empty. Slightly more difficult than recursion, but still pretty easy, uses less resources, and much more reliable.
and you will eat up all the memory just before your program will be killed the OOM killer :-DD (no offence, but your solution is literally stupid)
but anyways... could you please explain how your solution uses less resources compared to recursion ? with recursion you just crawling through the tree and adding up the sum of space ocupied by branches.
with your solution you need a shitload of memory - and you DONT KNOW how complex the tree is -you don't even know whether or not you have enough memory for your method !
-
but anyways... could you please explain how your solution uses less resources compared to recursion ? with recursion you just crawling through the tree and adding up the sum of space ocupied by branches.
with your solution you need a shitload of memory - and you DONT KNOW how complex the tree is -you don't even know whether or not you have enough memory for your method !
Yeh, I know.
With recursion you end up with thousands (or more) of copies of your function's state (on the stack, not a stack), and that eats available memory pretty quickly.
With a stack (note a stack, not the stack) you have a list of directories that you haven't processed yet. It doesn't matter how big your filesystem is, that stack isn't all that big.
And in case you're wondering I've tried both, because I'm lazy and recursion is easier.
Also depends on how badly your given language handles things as to how long recursion works for, but recursion is rarely the best solution.
(I'm an SA and among other thing s look after a couple 100TB for a state government department, yeh internet credentials don't mean anything, but anyway).
-
Can't rely on tail recursion optimization working with C(++).
-
but anyways... could you please explain how your solution uses less resources compared to recursion ? with recursion you just crawling through the tree and adding up the sum of space ocupied by branches.
with your solution you need a shitload of memory - and you DONT KNOW how complex the tree is -you don't even know whether or not you have enough memory for your method !
Yeh, I know.
With recursion you end up with thousands (or more) of copies of your function's state (on the stack, not a stack), and that eats available memory pretty quickly.
With a stack (note a stack, not the stack) you have a list of directories that you haven't processed yet. It doesn't matter how big your filesystem is, that stack isn't all that big.
And in case you're wondering I've tried both, because I'm lazy and recursion is easier.
Also depends on how badly your given language handles things as to how long recursion works for, but recursion is rarely the best solution.
(I'm an SA and among other thing s look after a couple 100TB for a state government department, yeh internet credentials don't mean anything, but anyway).
with recursion you end up exactly as many states on the stack as the depth of the tree is - and that's not that much for a filesystem ;) so the recursion is the way of less resources ;)
-
Stack depth might not be that generous for some of the micros used here either.
-
Stack depth might not be that generous for some of the micros used here either.
but you're not using filesystems on those micros either ;) so be realistic - if the system is able to handle a complex structure as the filesystem is , then it has some memory and stack available ;)
-
with recursion you end up exactly as many states on the stack as the depth of the tree is - and that's not that much for a filesystem ;) so the recursion is the way of less resources ;)
With a big filesystem that can be quite deep. Remember that we're not just talking about a list of strings, there's also function state, file handles and whatnot to consider.
With recursion the simplest version is:
start {
open directory handle
read directory entry
if { directory entry is another directory, recurse }
else { process the file }
close directory handle
}
Using a stack:
start {
push root onto stack
do {
open directory handle popped off stack
read directory entry
if { directory entry is another directory, push to stack }
else { process the file. }
close directory handle.
} loop until stack is empty
}
Depending on the filesystem you're reading you may also need some sort of depth/loop detection. You might not be able to tell the difference between a link and directory from your code depending on the mount point and OS's involved. Much easier to clean up after a stack based infinite loop than a recursive version.
-
with recursion you end up exactly as many states on the stack as the depth of the tree is - and that's not that much for a filesystem ;) so the recursion is the way of less resources ;)
With a big filesystem that can be quite deep. Remember that we're not just talking about a list of strings, there's also function state, file handles and whatnot to consider.
Depending on the filesystem you're reading you may also need some sort of depth/loop detection. You might not be able to tell the difference between a link and directory from your code depending on the mount point and OS's involved. Much easier to clean up after a stack based infinite loop than a recursive version.
1. it can't be too deep - there is a practical limit - the length of a absolute path.
2. i know what's put onto the stack during a function call - still the depth is limited by practical limits and for sure there is enough stack depth and space available to handle the depth of the filesystem.
3. why should i not be able to tell the difference between directory and link ? in a real OS and real filesystem (no offense to MS guys) everything is a file and the type of the file determines what is it (directory, "regular file", link, special device file....etc..)
4. what do you mean by cleaning up after recursion ? there is NOTHING to clean up with recursion (except the variable holding the result of course). so how could be a "no clenup" more difficult and resource intensive than cleaning up your shitload of malloc()s to make a stack ?
-
With a big filesystem that can be quite deep. Remember that we're not just talking about a list of strings, there's also function state, file handles and whatnot to consider.
With recursion the simplest version is:
start {
open directory handle
read directory entry
if { directory entry is another directory, recurse }
else { process the file }
close directory handle
}
Using a stack:
start {
push root onto stack
do {
open directory handle popped off stack
read directory entry
if { directory entry is another directory, push to stack }
else { process the file. }
close directory handle.
} loop until stack is empty
}
Depending on the filesystem you're reading you may also need some sort of depth/loop detection. You might not be able to tell the difference between a link and directory from your code depending on the mount point and OS's involved. Much easier to clean up after a stack based infinite loop than a recursive version.
Your two algorithms are exactly equivalent. Except the implementation using function recursion is going to be more reliable: simpler to write, simpler to understand, simpler to test.
As others have said the maximum stack depth will be the maximum depth of the file system, which for practical reasons is going to be in the double digits (not more than about 30 levels). There is no cost or resource worry for using a recursive implementation in this case and no need for special optimizations.
-
1. it can't be too deep - there is a practical limit - the length of a absolute path.
You haven't dealt with normal users . . .
(Sure Windows has a 260 character file path limit, but you can map a drive anywhere in the middle of that and keep going).
2. i know what's put onto the stack during a function call - still the depth is limited by practical limits and for sure there is enough stack depth and space available to handle the depth of the filesystem.
Depends on what state is being saved with each recursive call. Program state, file handles, etc. Some languages are brain-damaged with this (Powershell . . .)
3. why should i not be able to tell the difference between directory and link ? in a real OS and real filesystem (no offense to MS guys) everything is a file and the type of the file determines what is it (directory, "regular file", link, special device file....etc..)
Network mounts. Depends on the OS at each end, as well as how they're talking. I work with Windows, VMS, HPUX, Solaris, and various Linux's. Everyone does everything differently.
4. what do you mean by cleaning up after recursion ? there is NOTHING to clean up with recursion (except the variable holding the result of course). so how could be a "no clenup" more difficult
and resource intensive than cleaning up your shitload of malloc()s to make a stack ?
With recursion, how do you detect infinite loop? You can have a static variable as a depth counter, but that only stops you going deeper, it can''t fix the problem. Or you could use the static variable as a "kill switch", so when it's tripped all the layers bomb. Each layer only has it's own state though, it can't recover at all.
With a stack you have the entire set of directories in the one list. If it gets too deep you can close your single file handle, free the stack and abort. Or, given that you have all the information available, attempt to resolve the loop and continue.
Your two algorithms are exactly equivalent. Except the implementation using function recursion is going to be more reliable: simpler to write, simpler to understand, simpler to test.
My two algorithms give the same result depending entirely on the dataset they're aimed at.
The recursive example is simpler to write and simpler to understand, but has a habit of not working as expected when used against a large and complex dataset.
-
1. it can't be too deep - there is a practical limit - the length of a absolute path.
You haven't dealt with normal users . . .
i'm dealing with enterprise "users" - and that much worse than normal users ;) you won't believe how "smart" are some coders developing "enterprise" applications optimized by various ways ;)
and you can keep arguing for ages... but any other filesystem mounted is easily detectable (every tool is able to remain on a filesystem without jumping to a filesystem mounted in some sub-directory - e.g. GNU find , du , tar ) . if you don't know how to do it , then it doesn't necessarily mean it's not possible. and it definitely doesn't mean there is a problem of detecting infinite loops because of links, and doesn't mean that you have to shove everything onto a stack ;)
btw... how will you stack based solution (which is in fact again a recursion) handle hard-links ? if you don't know hard link is a yet "another" inode representing the same file on the same filesystem - in contrast to a symbolic or soft link which is a pointer to a path. i bet you'll end up in a loop if that hard link is representing a directory "above" - so your method is solving nothing regarding avoiding loops - you have to detect the type of files exactly the same way as with recursion ;)
so your solution got no benefit except the bigger memory consumption (if you call that a benefit) ;)
-
but any other filesystem mounted is easily detectable (every tool is able to remain on a filesystem without jumping to a filesystem mounted in some sub-directory - e.g. GNU find , du , tar )
Yeh, but I'm trying to read those filesystems too, not avoid them.
btw... how will you stack based solution (which is in fact again a recursion) handle hard-links ?
On a local filesystem? Detect and ignore them. On a remote filesystem? Depends really, can't always detect them.
so your solution got no benefit except the bigger memory consumption (if you call that a benefit)
That depends on dataset, the language you're using, and what exactly you're doing.
Recursion is simpler to implement and simpler to understand, but it doesn't always work the way you want it to. And if you're going to do anything properly, recursion is very rarely the right answer (like goto's -- they have their place, but they're dirty :)).
With regards to loops -- with recursion you only have the name of the current path you're looking at, and perhaps a call depth. You have no method of detecting a loop at all, and no method of correcting apart from a depth limit or a total abort.
With a stack, you have the entire path history. You can detect depth, you also can detect if your current path is just being repeated and ignore it. Or you can have a separate stack of previously visited paths and ignore those if they're called again.
The more correct solution depends on the complexity of the dataset, the purpose of the tool, the timeframe of implementation, the expected reuse and longevity, the expected users, and a bunch of other factors. Recursion is a nice quick and dirty option, and if it works, great.
Using a queue is much more flexible and much more resilient, just a bit more work to implement.
Also the resources it uses (heap vs stack) might matter depending on platform, with stack generally being much more constrained.
-
Stack depth might not be that generous for some of the micros used here either.
but you're not using filesystems on those micros either ;)
It's not too relevant now, but 8 bit micros running a MP3 player were not an uncommon use case a decade ago.