Author Topic: Photogrammetry/Visual SLAM  (Read 6344 times)


Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Photogrammetry/Visual SLAM
« on: October 20, 2018, 03:27:16 am »
Is anybody doing it? I want to map out my small space so I can redecorate better, using photogrammetry software like COLMAP or VisualSFM, generating a dense point cloud with an MVS tool, and eventually bringing the dense point cloud into Blender.

Anybody else doing this?
"What the large print giveth, the small print taketh away."
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8107
  • Country: fi
Re: Photogrammetry/Visual SLAM
« Reply #1 on: October 20, 2018, 12:07:46 pm »
Anybody else doing this?

Not photogrammetry, but I'm working on 3D time-of-flight measurement, tightly integrating everything together for a lower-cost solution (although it still cannot compete on price with a $5 camera module, or two $5 cameras making up a stereo camera), while adding a few unique and novel "tricks" to circumvent the biggest accuracy and reliability issues associated with continuous-wave 3D ToF. The intended application is indoor/outdoor SLAM and obstacle avoidance with about 10 meters of range.

You will hear more about this when we have something to actually put into production, but it is looking very good, and I believe it will be a game changer as a kind of "solid state LIDAR", as some people call it. As for progress, we are working on the mass-production calibration setup on the one hand, and on the calibration / data-enhancement algorithms on the other. The great thing here is that, unlike photogrammetry, time-of-flight sensing is not fundamentally tied to perfect, complex algorithms: the basis is a direct distance measurement derived from the speed of light, which only needs refinement and qualification plus some sensor fusion in software, making the algorithms simpler and less critical.
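To make the principle concrete: a continuous-wave sensor recovers distance from the phase shift between the emitted and the received modulated light. A minimal sketch of the math (with an illustrative modulation frequency, not our actual design parameters):

Code:
import math

C = 299_792_458.0  # speed of light, m/s

def cw_tof_distance(phase_rad, f_mod_hz):
    # Round trip: the light travels to the target and back, hence the
    # factor of 2 in d = c * phase / (4 * pi * f_mod)
    return C * phase_rad / (4.0 * math.pi * f_mod_hz)

f_mod = 12e6  # 12 MHz modulation frequency (illustrative only)
print(cw_tof_distance(math.pi / 2, f_mod))  # ~3.12 m for a 90 degree phase shift
print(C / (2 * f_mod))  # unambiguous range: ~12.5 m, in line with ~10 m of usable range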
 
The following users thanked this post: cdev

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3781
  • Country: de
Re: Photogrammetry/Visual SLAM
« Reply #2 on: October 20, 2018, 12:47:57 pm »
Is anybody doing it? I want to map out my small space so I can redecorate better, using photogrammetry software like COLMAP or VisualSFM, generating a dense point cloud with an MVS tool, and eventually bringing the dense point cloud into Blender.

Anybody else doing this?

I have been working on similar things - what do you want to know?

Generally speaking, SLAM is not really what you want for this - I have yet to find a free/low-cost application that does any kind of reasonable job here. Most of them fail badly when they don't see enough texture (i.e. blank walls!) and will get confused by windows.

The tracking also drifts - if you stand in the middle of the room and turn 360°, the generated map will not fuse properly; you will have a large discontinuity/step between the beginning and the end of the scan. The accuracy sucks, too - the best we got from mapping a room this way was about 5-10 cm, and most often it was much worse. That was with both phone/tablet cameras and with dedicated hardware sensors such as the Google Tango or the Structure Sensor.

Also, what these generate is a point cloud, which is not all that useful in Blender: it is literally just a bunch of points, far from dense enough to allow accurate measurements. The reconstructed meshes tend to be universally useless - the applications offering this are typically meant for 3D scanning of small objects and fail horribly when trying to scan a room. The automatically generated meshes are usually too rough/inaccurate and very polygon-heavy, which makes them very difficult to use for anything.

Photogrammetry generally works better, as long as you calibrate your camera and are really careful about how you take the photos/video. However, taking good enough pictures inside a small room can be a challenge unless you have a fisheye lens (both expensive and more difficult to work with during reconstruction), arrange sufficient lighting, etc. And it also fails if the walls don't have enough texture on them, because there aren't enough features for the algorithm to fuse the images.
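The calibration step itself is the easy part - a minimal sketch with OpenCV's Python bindings, assuming a printed 9x6 chessboard photographed from a number of angles (the folder name is made up):

Code:
import glob
import cv2
import numpy as np

# 3D coordinates of the inner chessboard corners, all on the z = 0 plane
pattern = (9, 6)  # inner-corner count of the printed chessboard (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob("calib/*.jpg"):  # made-up folder of chessboard shots
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix, dist the lens distortion coefficients
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (px):", rms)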

If all you want is data for remodeling your home, you would be far better off with a measuring tape or a laser/ultrasonic range finder. Both are far more accurate, and the measuring actually goes faster than messing with any kind of scanner. Most rooms are rectangular, so you only need a few distances to get the size of the space. Scanning even a small room takes 10-15 minutes at least, plus setup and teardown (e.g. covering up windows, placing lights, etc.), and then you need all that post-processing to actually extract the data - if you even manage to do so in the first place and don't have to go back and re-scan.

Of course, if you have access to a proper commercial laser scanner, that would be a different matter. But those are in a totally different league, in both capability and price.
« Last Edit: October 20, 2018, 12:51:36 pm by janoc »
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Photogrammetry/Visual SLAM
« Reply #3 on: October 20, 2018, 01:05:58 pm »
solid state LIDAR

Eh, talking about business: a customer (for whom I am the consultant) is making a lot of money by providing three 3D LIDARs as a sort of high-tech parking sensor for military helicopters.

They are in some ways a rival to SICK, which is a leader in the field of laser scanners, but they have added to the recipe a very intriguing recertification for avionics, in which SICK doesn't seem to have any interest.

Anyway, these three 3D LIDARs have been qualified as ARINC (Aeronautical Radio, Inc.) devices connected to the MTC (mission tactical computer), hence to the navigation system, to which they contribute the ability to accurately and reliably detect and measure objects in good time and in multiple dimensions. By collecting large volumes of data on multiple scan layers and from different angles, they can detect and respond to objects under the helicopter - such as high-voltage wires suspended on poles, extremely dangerous if the helicopter accidentally touches them - as well as objects around it that obstruct its path.

There are four lasers with different wavelengths in the infrared range, plus photodetector arrays, and an intriguing use of RANSAC-style algorithms (this is the part I am involved in) for using the LIDARs to detect objects that are not visible due to fog or smoke.

I am not an expert on LIDARs, but as far as I understand, apart from the algorithms they haven't really invented anything: they have taken technology similar to what SICK uses and requalified it to be avionics compliant.

So it will be intriguing to see where, by whom, and for what "solid state LIDAR" will be used. I know, I am too involved with weird customers - and for sure micro-robotics will be interested  :D
« Last Edit: October 20, 2018, 01:08:38 pm by legacy »
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #4 on: October 20, 2018, 01:35:45 pm »
All of you, thank you very much for your info!

This is an interesting area.

I've been plodding through the literature on SLAM, 3D reconstruction, 'visual odometry', etc.

Since we're on the subject: I've noticed that Kinect cameras are fairly cheap, and there is a software library for them that runs on Linux, freenect.
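From the docs, grabbing a depth frame with the libfreenect Python wrapper looks like it should be roughly this (a sketch - I haven't tried it yet, and the exact module layout may differ):

Code:
import freenect

# sync_get_depth() blocks until one 640x480 depth frame arrives
depth, timestamp = freenect.sync_get_depth()
print(depth.shape, depth.dtype)  # (480, 640), raw 11-bit values in a uint16 array
print("raw depth at center pixel:", depth[240, 320])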

Alternatively, it would be possible to make a rig that holds two decent but cheap USB webcams a known distance apart, in a rigid manner, kind of like a pair of eyes. It should be possible to calibrate that.

Some really cheap (under $5) USB webcams are surprisingly decent.
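Once such a rig was calibrated and the images rectified, getting a disparity/depth map should be standard - a minimal sketch with OpenCV's semi-global matcher (the focal length and baseline are placeholders for whatever calibration would actually give):

Code:
import cv2
import numpy as np

# Rectified left/right frames from the (hypothetical) two-webcam rig
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be a multiple of 16
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed point -> pixels

# depth = focal_length * baseline / disparity; f_px and baseline_m are
# placeholders for the values stereo calibration would provide
f_px, baseline_m = 700.0, 0.12
valid = disparity > 0  # failed matches come back with disparity <= 0
depth_m = np.where(valid, f_px * baseline_m / np.where(valid, disparity, 1.0), 0.0)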

I have a longstanding interest in archaeology, so all of this is something I had followed a bit in the past; I then stopped following it for several years, and in that time its use and the applications built on it have really exploded.

I've had some success, and as my current requirements are quite modest compared to others', I am confident that I should be able to get this worked out.

These programs need to do a lot of calculation, and they need lots of memory and lots of disk space to do their jobs, especially when you have a bunch of images (even though I am taking the images at the smallest resolution my current camera offers - it's not a smartphone camera, so they are slightly bigger than that).

As part of this project I've downloaded at least 15 or 20 SLAM/photogrammetry-related software projects and tried to find a combination that I could successfully compile, that was stable, and that would let me go from a series of photos to a dense point cloud suitable for cleanup and import into 3D programs via Meshlab (which is available as a deb and is stable).

It would be much easier if I were using CUDA rather than AMD/OpenCL, as at least for now that seems to be what most of the software uses. That said, I would hate to spend so much more for the same compute power, and the open-software ideal is where I would like to go. So I bought an AMD card.

I can get a sparse point cloud from COLMAP. And VisualSFM works for me - although some of the programs it uses are not compiling for me right now; I have binaries which I can sort of hack to make them work, but it's ugly (I would much rather compile current versions that run normally). Still, I have several programs on my list of tools to try which I have not dug into in any depth yet, because I have not been able to successfully compile their dependencies. And some I have been able to compile but have issues with, probably because of some obvious thing I am not understanding - i.e. user error.

Right now I am trying MVS; I've also been able to successfully compile MVE and MicMac, when I leave out the Qt user interface.

And SiftGPU, which is working well in OpenGL mode.

You know, maybe I should just bite the bullet and get a second, small, cheap Nvidia card - or maybe a better used card on eBay; in the past I have seen them go for really cheap, say $50 - just for this stuff.

Maybe one with passive cooling that doesn't use a lot of power, so I don't have to worry about additional current draw and heat. So, not too powerful.

That would let me use a lot more software.

I am taking the pictures with a good camera and a prime lens that has been well characterized, under bright, even lighting, trying to work my way around so that each picture shares a lot of common points with the previous one.

The OS, too, is perhaps problematic. If I were on Windows, it would be a lot easier to find a quick working solution, especially if I were willing to pay for it, but I really prefer Linux so much more - to be frank, the issues I have with Windows are a real joy-killer for me.

All that said, it seems that I am very close to having a working setup.

Janoc, here is what I think I am going to do for flat, featureless walls: temporarily put up some targets printed on paper, which I have made to add some detail - basically a 10 cm square with a cross in it. Masking tape doesn't stain the walls. I'll only need to do it once, and I can also use tape measurements to get the sizes exactly right. I think that by doing this I should be able to get a better 3D model than I could build myself with the just-beginner Blender modeling skills that I have. (It's only quite recently that I have been able to get accelerated 3D Blender to run on my machine, which I realize makes a huge difference in the learning curve, so I am really just a beginner.)

I have to say, so much of the cool software in this area seems to come from France.

My goal is to create a 3D model of my entire interior. This will be especially useful in the work areas, in getting things right in a fairly small space.

Lidar is really amazing. I live fairly close to an area where a large hurricane (Hurricane Sandy) did a lot of damage several years ago, and there was a project in some places there to see if lidar could be used to very rapidly assess the damage after this massive storm came in and knocked down or seriously damaged a great many houses along the shore.

They were able to rapidly scan houses by driving through these areas and produce a very high quality damage assessment based on that 3D data.
« Last Edit: October 20, 2018, 02:27:01 pm by cdev »
"What the large print giveth, the small print taketh away."
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #5 on: October 20, 2018, 02:29:11 pm »
If you want to scan rooms and get a dense reconstruction, you need at least semi-dense, if not fully dense, SLAM. Because of the sheer amount of data, a beefy GPU is pretty much a must, especially for "real-time" operation.

I wouldn't recommend passive mono/stereo, like DTAM:
https://www.robots.ox.ac.uk/~vgg/rg/papers/newcombe_davison__2011__dtam.pdf

Way too much compute, and often poor results in practice: lighting changes, textureless areas, etc.

Depth cameras help a lot here (small scale, indoors). One approach is the classic KinectFusion, but because of the way the map is represented, you might hit space-size/resolution limits pretty quickly:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ismar2011.pdf

ElasticFusion is a bit better, as it uses a cloud of surfels to represent the map; it also has loop closures:
http://www.roboticsproceedings.org/rss11/p01.pdf
code here: https://github.com/mp3guy/ElasticFusion/ (needs CUDA and OpenGL)

Meshlab can load these surfel maps, and one can use Poisson surface reconstruction there to remesh the cloud into a fully dense, albeit sometimes too smooth, model.
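The same Poisson step can also be scripted instead of clicked through in Meshlab - for example with Open3D (a minimal sketch, assuming the surfel cloud was exported to a PLY file; the filenames are made up):

Code:
import open3d as o3d

pcd = o3d.io.read_point_cloud("room_surfels.ply")  # made-up filename
pcd.estimate_normals()  # Poisson needs oriented normals

# Higher depth = finer but heavier mesh; 9 is a common starting point
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("room_mesh.ply", mesh)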

On the downside, the tracking is not that great and, as janoc said, it will drift, as most SLAM/VO systems do. Loop closure may correct some of the drift, but not always.

The perfect solution is to have an external system for tracking and use ElasticFusion just for mapping. I did that back in the day to obtain full 3D room models for ground truth, as the lab was equipped with a VICON tracker (sub-mm accuracy); see figure 18:
https://www.imperial.ac.uk/media/imperial-college/research-centres-and-groups/dyson-robotics-lab/lukierski_etal_icra2017.pdf

(It is a can of worms of its own: calibration from VICON to Kinect, synchronising the sensors, etc.)

Other models of real rooms were taken with just ElasticFusion and its built-in ICP + Lucas-Kanade tracking.

LIDARs like the Velodyne are in a completely different league, but also orders of magnitude higher in price (than a PC with an Nvidia GPU + Kinect).

I cannot stress enough the importance of geometric camera calibration for any kind of SLAM. For dense systems even more so, since they rely on pixel data, so photometric calibration should be done as well. Here's an example of a SLAM system that does that:
https://jakobengel.github.io/pdf/DSO.pdf
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Photogrammetry/Visual SLAM
« Reply #6 on: October 20, 2018, 02:30:38 pm »
CUDA

have you ever tried or considered the TK1 kit?  :D
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #7 on: October 20, 2018, 02:33:32 pm »
have you ever tried or considered the TK1 kit?  :D

The TK1 is long in the tooth and was obsoleted by Nvidia, AFAIR since CUDA 7.0 (they went 64-bit only). The TX1/TX2 are much better, but even more expensive. For the price of either, one can buy quite a beefy second-hand GPU.
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3781
  • Country: de
Re: Photogrammetry/Visual SLAM
« Reply #8 on: October 20, 2018, 03:11:28 pm »
Also, the OS too is perhaps problematic. If I was on Windows, it would be a lot easier to find a working quick solution, especially if I was willing to buy it, but I really prefer Linux so much more, well, to be frank, the issues i have with Windows are a real joy killer for me.

Actually, nope. Yes, you could get pre-compiled code, but that's all. Building anything computer-vision-related on Windows is a horrendously uphill battle, because most computer vision people use Linux for development due to its better development tools and libraries. Just try to compile things like OpenCV or PCL on Windows and you will see what it takes ...

Janoc, here is what I think I am going to do for flat, featureless walls: temporarily put up some targets printed on paper, which I have made to add some detail - basically a 10 cm square with a cross in it. Masking tape doesn't stain the walls. I'll only need to do it once, and I can also use tape measurements to get the sizes exactly right.

Sorry, but that's nowhere near enough unless you plaster the entire wall with these markers. Worse, they need to be distinct from each other, because otherwise the SLAM algorithm gets confused. E.g. if all the corners of your markers look the same because they come from black squares on a white background (or your crosses) and there are no other features around, it is completely useless - algorithms using features such as SIFT or ORB won't be able to distinguish between them and will get lost. Basically, what works best is something that is almost random, with no apparent structure or repetition in it.
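If you do go the marker route despite this, at least use fiducials that are designed to be mutually distinct - ArUco markers, for example. Generating a batch takes a few lines with OpenCV's aruco module (a sketch; these function names follow the older contrib-module API and have been renamed in recent OpenCV versions):

Code:
import cv2

# 50 mutually distinct 4x4 markers; each id decodes uniquely, unlike identical crosses
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)

for marker_id in range(10):
    img = cv2.aruco.drawMarker(aruco_dict, marker_id, 400)  # 400x400 px, print and tape up
    cv2.imwrite("marker_%02d.png" % marker_id, img)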

I think that by doing this I should be able to get a better 3D model than I could build myself with the just-beginner Blender modeling skills that I have. (It's only quite recently that I have been able to get accelerated 3D Blender to run on my machine, which I realize makes a huge difference in the learning curve, so I am really just a beginner.)

Seriously, this is a completely ridiculous way of trying to build a model of a room. It's a bit like needing to travel to Tokyo and starting by chopping wood to build a plane, while trying to learn aeronautical engineering from a textbook, because you don't want to learn how to buy a plane ticket.

Even if you somehow manage to obtain a scan, the 3D scans are not directly usable, so you will need to learn Blender (or whatever other tool) anyway to actually rebuild the 3D models from them. There is no magic solution where you just take a bunch of pictures and a ready-to-go 3D model comes out - not even with the expensive commercial tools (regardless of what their marketing brochures show).

Also, if you don't have a lot of RAM and a powerful GPU, don't even bother - all of these algorithms need both in order to get the processing done in a reasonable time. Large, dense scans can take many hours to process even on a powerful machine.

My goal is to create a 3D model of my entire interior. This will be especially useful in the work areas, in getting things right in a fairly small space.

Why? What is that model going to be useful for (apart from having something to look at) that you couldn't achieve by taking a bunch of measurements and making a plan?

They were able to rapidly scan houses by driving through these areas and produce a very high quality damage assessment based on that 3D data.

Yes, and those devices cost hundreds of thousands of USD ... Also, a typical LIDAR like the Velodyne mentioned before doesn't really produce a useful scan - it is meant for navigation/collision avoidance, not for dense mapping of the environment. There is also a huge difference in resolution/accuracy: for a navigation system, 10 cm or even a meter of error is no big deal (and considered "dense" already), but it is an enormous deal if you are actually trying to take accurate measurements for construction purposes.

What those guys were likely using were laser scanners, such as the machines from FARO. Those can scan huge areas and go down to millimeters of accuracy. But if you need to ask about the price, you can't afford it.

When it comes to SLAM, you need to understand that while SLAM stands for "simultaneous localization and mapping", the emphasis is on localization. The map is often there only so that the machine can "find itself" in the space, by comparing what it "sees" right now with what it has seen in the past - not to produce a usable 3D model. That's the case for most published SLAM algorithms, such as the mentioned DTAM, MonoSLAM, PTAM, etc. Also keep in mind that these are research systems: often unable to properly do loop closure (recognize that they have returned to a point visited before and fuse the data), not managing memory in a meaningful way so they are limited to short runs, etc.

Someone also mentioned visual odometry - that is essentially SLAM without building any map at all. The advantage is that it is much faster and needs much less memory; the disadvantage is that once it "gets lost", it has no way to recover. This is, e.g., what Apple has in their ARKit.

Systems that focus on actually producing models work differently, because the emphasis is not real-time performance but resolution and accuracy. The working principle is roughly the same, but the actual math and the way the processing is done are often very different.

« Last Edit: October 20, 2018, 03:13:10 pm by janoc »
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Photogrammetry/Visual SLAM
« Reply #9 on: October 20, 2018, 03:19:46 pm »
The TK1 is long in the tooth and was obsoleted by Nvidia, AFAIR since CUDA 7.0 (they went 64-bit only). The TX1/TX2 are much better, but even more expensive. For the price of either, one can buy quite a beefy second-hand GPU.

Umm, not sure about this. I have seen on eBay USA and eBay Germany a couple of second-hand TK1s for ~70 euro each; the original price was 250.  :-//

A TX2 is ~500-600 euro.
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #10 on: October 20, 2018, 03:58:17 pm »
Umm, not sure about this. I have seen on eBay USA and eBay Germany a couple of second-hand TK1s for ~70 euro each; the original price was 250.  :-//

That is precisely why it is cheap second hand: it is not supported anymore and not very fast (compared to the TX1/TX2, not to mention the Xavier).
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #11 on: October 20, 2018, 04:06:59 pm »
Sorry, but that's nowhere near enough unless you plaster the entire wall with these markers. Worse, they need to be distinct from each other, because otherwise the SLAM algorithm gets confused. E.g. if all the corners of your markers look the same because they come from black squares on a white background (or your crosses) and there are no other features around, it is completely useless - algorithms using features such as SIFT or ORB won't be able to distinguish between them and will get lost. Basically, what works best is something that is almost random, with no apparent structure or repetition in it.

That is not entirely true. Any (sparse, in this context) SLAM system worth its salt does feature matching not only by descriptor distance (which could be fooled by similar corners), but also by pixel distance from the landmark reprojection, then various triangulation tests, and then RANSAC-based filtering.
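In OpenCV terms that pipeline looks roughly like this (a toy two-view sketch with placeholder filenames, not a full SLAM front end):

Code:
import cv2
import numpy as np

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Stage 1: descriptor-distance matching (Hamming, since ORB descriptors are binary)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Stage 2: geometric verification - RANSAC on the fundamental matrix throws out
# matches that are descriptor-similar but spatially inconsistent
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
print("%d of %d matches survive RANSAC" % (int(inlier_mask.sum()), len(matches)))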

Most simple camera calibration tools use a chessboard pattern, and most of the time there is no problem assigning the point correspondences, because there is prior knowledge (of the pattern, that is).

When it comes to SLAM, you need to understand that while SLAM stands for "simultaneous localization and mapping", the emphasis is on localization. The map is often there only so that the machine can "find itself" in the space, by comparing what it "sees" right now with what it has seen in the past - not to produce a usable 3D model. That's the case for most published SLAM algorithms, such as the mentioned DTAM, MonoSLAM, PTAM, etc. Also keep in mind that these are research systems: often unable to properly do loop closure (recognize that they have returned to a point visited before and fuse the data), not managing memory in a meaningful way so they are limited to short runs, etc.

It depends - it is important to understand the differences between systems. MonoSLAM and PTAM are sparse systems: the map is quite useless, but the tracking is better. Dense systems like DTAM (or ElasticFusion) have rather poor tracking, almost as an afterthought, and focus more on mapping. That's why coupling ElasticFusion with an external VICON tracker can get sub-cm model accuracy (and if I had calibrated the Kinect depth - flat-field correction and offset/scale - it could perhaps have been even better).

For a more thorough explanation and analysis of sparse vs dense and direct vs indirect, I recommend Jakob Engel's PhD thesis:
https://jakobengel.github.io/pdf/JakobEngelPhD.pdf
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3781
  • Country: de
Re: Photogrammetry/Visual SLAM
« Reply #12 on: October 20, 2018, 06:47:18 pm »
That is not entirely true. Any (sparse, in this context) SLAM system worth its salt does feature matching not only by descriptor distance (which could be fooled by similar corners), but also by pixel distance from the landmark reprojection, then various triangulation tests, and then RANSAC-based filtering.

I didn't want to go into such detail. However, you can do as much reprojection and RANSAC as you want - if all (or a majority) of the features seen in the current frame were incorrectly identified, it won't help. Reprojection and RANSAC will only eliminate the incorrectly matched features so that the algorithm doesn't try to estimate the pose from bad data (which would make it diverge); they will not "invent" new correct features where there aren't any.

At that point the algorithm is basically doing dead reckoning and only accumulating uncertainty. After a while of this, the tracking will be lost, because even when enough features are visible it won't be able to match them properly anymore.

Some systems are capable of relocalization (to "find themselves" once they get "lost") from past data in such a situation, but that assumes both that the map didn't get corrupted by the bad data (you only know the data were bad once the algorithm diverges - too late to reject them from updating the map) and that the pose of the camera is not too different from anything seen before. I have yet to see a SLAM system that is 100% reliable at this. Even something like the HoloLens can get pretty badly confused in certain situations.

Most simple camera calibration tools use a chessboard pattern, and most of the time there is no problem assigning the point correspondences, because there is prior knowledge (of the pattern, that is).

That's not quite comparable, because the chessboard pattern constrains your search a lot. You don't care about identifying the individual features (corners, in this case) from their descriptors; you only need to find them, and then you use the known geometry and planarity of the pattern to label them.

That's why coupling ElasticFusion with an external VICON tracker can get sub-cm model accuracy (and if I had calibrated the Kinect depth - flat-field correction and offset/scale - it could perhaps have been even better).

Yes, of course. But that's hardly SLAM anymore, and the setup/calibration is quite non-trivial. Also, an external tracker like that is not exactly a cheap option: Vicon starts at around $20k for the basic set, and even a cheap OptiTrack starts at about $10k or so.

It could probably be done using Valve's Lighthouse system (which the HTC Vive HMD uses); it is relatively cheap (~$300 for 2 base stations + one tracking "puck"). On the other hand, I am not sure how something like the Kinect would handle the intense IR flashes and lasers of that tracking system.
« Last Edit: October 20, 2018, 07:05:46 pm by janoc »
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #13 on: October 20, 2018, 09:45:01 pm »
Using RTKLIB in PPP mode, relative GPS accuracy outdoors typically becomes sub-decimeter even with an imperfect sky view. (PPP is a standalone, single-receiver method that requires no base station; it uses single- or multi-frequency GPS, usually plus SBAS. However, this assumes a very long warm-up period - you can't just turn it on from a cold start and expect that.) When given short-baseline correction data (RTK), the accuracy becomes fairly absolute rather than relative, and even indoors my experience is that the accuracy is surprisingly good (wood-frame building). This wouldn't hold true for any kind of large concrete building, though. Where I live, GPS in RTK mode is perfectly capable of localizing me inside the house to an accuracy of a meter or less, usually better - but only after many hours of continuous operation to warm up.

Actually, that warm-up is not necessary if you use AGPS to fetch up-to-date ionospheric data - that is the missing part. The reason it takes so long is the extra-precise ionospheric correction data, which can also be fetched via the network (phone-collected GPS data has access to post-processing using that data). Also, cell-network sites now collect DGPS data, and the baselines involved there can be very short. So as more and more cell handsets support raw measurements, positioning will be increasingly accurate from the very beginning - especially for whoever has real-time access to that cell-site data stream.

Anyway, the point I am trying to make is that when you combine GPS data with these other methods, the chances of being able to accurately localize something (to just a few centimeters or even less) using a combination of methods that includes SLAM - especially if you combine other methods to eliminate ambiguous situations - seem pretty good, unless it is deep inside a substantial building. (How about a building whose walls, ceilings and floors are either totally flat and featureless, or mirrors? That would be interesting!)

And then you could use dead reckoning or something like that too - physical odometry, plus one of those electronic gyroscope devices that can keep track of rotation...

I bet navigation inside buildings will be a solved problem within a few years, especially if visual cues of some kind are placed in rooms or on the floor for the robot to pick up on (a cheap and simple solution for things like mail-delivery and cleaning robots).
What this means is that autonomous delivery will become a reality, even in difficult situations, inside urban areas and even apartment buildings.

Sometimes low-tech solutions work as well as or better than high-tech ones. Plus they create jobs, which are going to become scarcer and scarcer due to automation, with terrifying consequences for society.

We shouldn't automate everything just for the sake of doing so, I guess is what I am trying to say.

 "Back in the day" - to my recollection this must have been in the late 90s, there was a short lived company "webvan" that tried to do the food delivery thing and at least in my area the word was that it worked well.

Two guys who lived in my building - computer people - used webvan. This meant they had a small lockable refrigerator installed in the back of our building on the ground floor. The webvan delivery people had a key (the same key the garbage men, etc. had to get into the common areas of the building) and thats all they needed to get to this refrigerator. So they didn't need to be home to accept deliveries and not have them get snatched by some hungry person.
« Last Edit: October 20, 2018, 10:36:06 pm by cdev »
"What the large print giveth, the small print taketh away."
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #14 on: October 20, 2018, 10:44:53 pm »
Way too much $$$ for me, unfortunately. This is why I dislike Nvidia: they are much more expensive "than I feel they should be".

"Harrumph"

Sucks that they don't see how this is a big barrier to most people. Or maybe they do.

CUDA

have you ever tried or considered the TK1 kit?  :D
« Last Edit: October 20, 2018, 11:26:31 pm by cdev »
"What the large print giveth, the small print taketh away."
 

Offline Dubbie

  • Supporter
  • ****
  • Posts: 1114
  • Country: nz
Re: Photogrammetry/Visual SLAM
« Reply #15 on: October 21, 2018, 12:06:34 am »
I've got direct experience with what you're trying to do and, like janoc above, can tell you it will never work the way you are planning.

At work we use expensive ($100k+) lidar units from FARO and Leica. While they give really nice, accurate, dense point clouds, there is no real way to translate those into a usable mesh. Instead we treat them as a set of measurements and model everything from scratch. For certain detailed objects such as rocks and tree trunks we can use photogrammetry successfully; however, this generates large models that usually need to be cleaned up and processed by a skilled 3D artist in a sculpting application afterwards.

There isn't really a shortcut for this kind of stuff; if there were, we would be using it and saving thousands.

Learn how to use SketchUp, buy a $5 tape measure and you will get accurate and useful results in a day of work.
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #16 on: October 21, 2018, 10:21:27 am »
I didn't want to go into such detail. However, you can do as much reprojection and RANSAC as you want - if all (or a majority) of the features seen in the current frame were incorrectly identified, it won't help. Reprojection and RANSAC will only eliminate the incorrectly matched features so that the algorithm doesn't try to estimate the pose from bad data (which would make it diverge); they will not "invent" new correct features where there aren't any.

But we are not talking about completely blank walls here. If you have blank walls without any texture, visual SLAM will fail; if they also have no structure, then even Kinect-based SLAM will fail, and IMU-assisted SLAM will drift massively. What I wanted to say is that some features are better than no features, because that's what cdev wanted to add:
Janoc, here is what I think I am going to do for flat, featureless walls: temporarily put up some targets printed on paper, which I have made to add some detail - basically a 10 cm square with a cross in it.

And even if the features are "similar", they are in different spatial (and therefore reprojection) locations, so some of them won't get mismatched, and the ones that do should be filtered out anyway. Therefore adding random markers shouldn't hurt and could well help; you need just a handful to estimate the pose (the more the better, but still). That's contrary to what you said:
Sorry, but that's nowhere near enough unless you plaster the entire wall with these markers. Worse, they need to be distinct from each other, because otherwise the SLAM algorithm gets confused.

One should give SLAM as much help as is possible and practical. Anyway, the proof of the pudding is in the eating, so I made a simple example: ORB-SLAM2 in RGB-D mode with a Kinect, looking at a chessboard pattern displayed on a monitor. The pattern has a lot of pretty much identical corners, so you would say they are not distinct and the system should get confused. In reality you can see the reconstructed landmarks in the 3D view resembling the pattern structure well, triangulated correctly, and the tracking works without trouble.

Of course, if the drift is big enough (after a 360° scan) then the reprojection-error matching criterion will fail as well, so loop closure will be difficult. Using wide-angle (fisheye) optics might help a bit when doing a full 360° scan, as there are more points to "anchor" to, and therefore less drift and fewer chances of mismatches. Anyway, the more features the better, compared to blank flat white walls.

Some systems are capable of relocalization (to "find themselves" once they get "lost") from past data in such a situation, but that assumes both that the map didn't get corrupted by the bad data (you only know the data were bad once the algorithm diverges - too late to reject them from updating the map) and that the pose of the camera is not too different from anything seen before. I have yet to see a SLAM system that is 100% reliable at this. Even something like the HoloLens can get pretty badly confused in certain situations.

Nothing is 100% reliable, especially SLAM. Every now and then somebody comes along and proclaims "SLAM is a solved problem", and it is somehow still not the case.

Yes, of course. But that's hardly SLAM anymore, and the setup/calibration is quite non-trivial. Also, an external tracker like that is not exactly a cheap option: Vicon starts at around $20k for the basic set, and even a cheap OptiTrack starts at about $10k or so.

Sure, that's not SLAM, but it works OK for the purpose of this thread - room scanning. And it is still cheaper than a Velodyne (not to mention a FARO).

If going lower cost, I would personally run ORB-SLAM2 in RGB-D mode for sparse tracking and something like ElasticFusion for dense mapping - combining the best of both worlds.

It could probably be done using Valve's Lighthouse system (which the HTC Vive HMD uses); it is relatively cheap (~$300 for 2 base stations + one tracking "puck"). On the other hand, I am not sure how something like the Kinect would handle the intense IR flashes and lasers of that tracking system.

The Kinect has quite good pass-band filters and a defined pattern structure, so I had no interference problems with stuff like VICON, for example. I think the Vive was OK as well (I didn't play with it much back in the day). The problem is the other way around: VICON sometimes treated the Kinect IR projector as a marker, and I've heard that the Vive doesn't have pass-band filters and gets disturbed by the Kinect.  :(

I've got direct experience with what you're trying to do and, like janoc above, can tell you it will never work the way you are planning.

At work we use expensive ($100k+) lidar units from FARO and Leica. While they give really nice, accurate, dense point clouds, there is no real way to translate those into a usable mesh. Instead we treat them as a set of measurements and model everything from scratch. For certain detailed objects such as rocks and tree trunks we can use photogrammetry successfully; however, this generates large models that usually need to be cleaned up and processed by a skilled 3D artist in a sculpting application afterwards.

There isn't really a shortcut for this kind of stuff; if there were, we would be using it and saving thousands.

Learn how to use SketchUp, buy a $5 tape measure and you will get accurate and useful results in a day of work.

From cdev's posts I gather that he wants to play with 3D reconstruction more for fun than for business, given the reluctance to buy even a consumer-grade Nvidia GPU. In that sense, the various open source 3D reconstruction/SLAM libraries are a fun way to learn and play.

If this were business and real accuracy were needed - cm or better, right angles actually square - then sure, get a FARO or, cheaper, a Matterport (which looks like 3 Kinects glued together with some software). Dubbie is right: you just get point clouds, denser or sparser, but point clouds. It's a long way from there to a practical, blueprint-like model. There are some meshing algorithms, there are various plane-fitting tricks (like Manhattan DTAM does), and don't even get me started on deep-learning-based methods - ultimately it is still a lot of hand crafting.
« Last Edit: October 21, 2018, 10:47:09 am by lukier »
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #17 on: October 21, 2018, 09:27:25 pm »
What is the cheapest respectably functional CUDA card under $200? Is it possible to get one that isn't crippled? (I hear Nvidia cripples the CUDA performance of some of their cards.)

My card cost less than that, and it is still the equivalent of a much more expensive Nvidia card in terms of horsepower (using OpenCL), even though its technology is a few years old.
"What the large print giveth, the small print taketh away."
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #18 on: October 21, 2018, 09:45:44 pm »
What is the cheapest respectably functional CUDA card under $200? Is it possible to get one that isn't crippled? (I hear Nvidia cripples the CUDA performance of some of their cards.)

They do, in terms of double-precision performance or, nowadays, the number of Tensor Cores, but that doesn't matter much except for scientific simulations or 16-bit floating-point deep learning.

Under $200 you should easily be able to get something like a GTX 1060, which has 3.5 TFLOPS of fp32 performance:
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series

Maybe even a GTX 1070 if you find a good deal.

GTX 900-series or older cards might be cheaper, but they are sometimes more power hungry and don't support new CUDA features.
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #19 on: October 22, 2018, 03:08:17 am »
Actually, that is exactly why I would be getting one - that, and photogrammetry.

I'd really like to spend less, though. Maybe a GTX 1050 with passive cooling. Planned obsolescence is a crucial issue: vendors like to arbitrarily cut off support for older cards - figure you only get a few years out of each one in a specialty use like machine learning - and they get away with it.

They probably don't even support applications like that on non-pro cards.

Also, have any of you run two video cards from different vendors at the same time?

What is the cheapest respectably functional CUDA card under $200? Is it possible to get one that isn't crippled? (I hear Nvidia cripples the CUDA performance of some of their cards.)

They do, in terms of double-precision performance or, nowadays, the number of Tensor Cores, but that doesn't matter much except for scientific simulations or 16-bit floating-point deep learning.

Under $200 you should easily be able to get something like a GTX 1060, which has 3.5 TFLOPS of fp32 performance:
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_10_series

Maybe even a GTX 1070 if you find a good deal.

GTX 900-series or older cards might be cheaper, but they are sometimes more power hungry and don't support new CUDA features.
"What the large print giveth, the small print taketh away."
 

Offline janoc

  • Super Contributor
  • ***
  • Posts: 3781
  • Country: de
Re: Photogrammetry/Visual SLAM
« Reply #20 on: October 22, 2018, 09:38:11 am »
I'd really like to spend less, though. Maybe a GTX 1050 with passive cooling. Planned obsolescence is a crucial issue: vendors like to arbitrarily cut off support for older cards - figure you only get a few years out of each one in a specialty use like machine learning - and they get away with it.

If you want the card to reliably die on you, do insist on passive cooling.

Modern GPUs are not designed for passive cooling, and you won't find a reasonable (i.e. not heavily crippled) one that runs with a passive cooler only. There is simply no way to dissipate the amount of heat these things generate using only a passive heatsink. Even with fans, graphics cards are notorious for dying because of problems linked to thermal cycling and heat.

Also, what you likely don't realize is that stuff like CUDA loads the card to the limit, probably even more than playing the latest games. Games tend to be limited by the monitor framerate, because it makes little sense to render much faster than the monitor can display. With CUDA-style calculations there is no such limit, so the card is going full blast all the time.

Also, you want a card that is not crippled - and then you want to buy a 1050, which is the same chip as a 1060 but with half of the CUDA cores disabled and a lower clock speed.  :palm: Even the 1060 is a low-end card. Again, this is penny pinching in the wrong place.

I am not quite sure what you mean by planned obsolescence - if you are buying into this stuff, you must expect to replace your hardware every two years or so. A lot of the software simply doesn't run on older cards. These are not applications designed for compatibility: a lot of this code is research code, and the researchers don't care whether it runs for everyone. They only need it to run for them, so that they can crunch their data and publish papers.

They probably don't even support applications like that on non-pro cards.

No, CUDA and similar stuff runs just fine on consumer-grade GeForce cards. You are only not allowed to use them professionally in datacenters (a recent license restriction on CUDA), but the hardware runs it just fine. You don't need to buy into the Quadro (optimized for CAD and visualization) or Titan (specifically CUDA and machine-learning oriented) cards.

Also, have any of you run two video cards, from different vendors at the same time?

That works fine unless you want to run SLI; that is not recommended/supported if the cards are different.

GTX 900-series or older cards might be cheaper, but they are sometimes more power hungry and don't support new CUDA features.

I wouldn't buy a GTX 9xx - they are notorious for flakiness (black screens, freezes, crashes, etc.) and are much more power hungry (and run hotter) than the newer 10xx series. The price and performance difference between a 970/970Ti and a 1060/1070 is negligible, and you get a newer-generation card that runs cooler and will be supported for longer.
« Last Edit: October 22, 2018, 09:45:15 am by janoc »
 
The following users thanked this post: cdev

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #21 on: October 22, 2018, 10:00:15 am »
Actually, that is exactly why I would be getting one - that, and photogrammetry.

What 'that'? You mean scientific simulations? Then you'll need Tesla cards for double-precision performance; the GTX series has some double-precision support, but it is very slow.

For machine learning, fp32 performance is king; not many people use fp16 (which the Tensor Cores accelerate) yet, although Nvidia promotes it heavily. If you need Tensor Cores, you need at least a 20-series card or a TITAN V (or the newest Teslas).

I'd really like to spend less, though. Maybe a GTX 1050 with passive cooling. Planned obsolescence is a crucial issue: vendors like to arbitrarily cut off support for older cards - figure you only get a few years out of each one in a specialty use like machine learning - and they get away with it.

Pretty much what janoc said: find the best bang per buck, and no passive cooling. I suggest you research the GPUs before buying, to familiarize yourself.

I don't get the planned-obsolescence argument either: Nvidia only recently dropped support for the 500 (Fermi) series, which was released in 2011 - that's 7 years.

They probably don't even support applications like that on non-pro cards.

Like what? Like ElasticFusion? That should run on a 1060 no problem, and if you don't like CUDA you can always rewrite the tracking (ICP and Lucas-Kanade) in OpenCL or OpenGL compute shaders.
 

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #22 on: October 22, 2018, 06:47:55 pm »
My current ('Ellesmere', i.e. Polaris-family) card does FP16 (half precision), BTW.
Performance is good.

If you know of any portable, cross-platform benchmark I can run, I'd be happy to try it if I can figure it out.

There is no way I can justify buying another card until I am further along, although I would like to.

I have to wait until I have tested the waters a bit more and had some success with OpenCL-based hardware. Trying to work around the lack of plug-and-play is a good way to learn more.

I am very conservative when it comes to spending these days.

Back to photogrammetry/SLAM/3D reconstruction.

If any of you can help me by suggesting OpenCL- or OpenGL-shader-based workflows, I would be very grateful.

With my AMD card, I am hoping that the code will also be open - no proprietary vendor lock-in. That's very important now, when we consider that this technology is poised to take so many jobs once it matures a bit. There needs to be accountability on at least some level.

Look at how evil it is for YouTube videos to be arbitrarily demonetized with no information as to why. That's what we're going to be dealing with on a massive scale as people are replaced by AI and its binary blobs.

AI making arbitrary decisions is like carte blanche to screw anybody and everybody - as Dave says, "without even a by-your-leave".

« Last Edit: October 22, 2018, 07:04:42 pm by cdev »
"What the large print giveth, the small print taketh away."
 

Offline lukier

  • Supporter
  • ****
  • Posts: 634
  • Country: pl
    • Homepage
Re: Photogrammetry/Visual SLAM
« Reply #23 on: October 22, 2018, 07:22:37 pm »
Back to photogrammetry/SLAM/3D reconstruction.

If any of you can help me by suggesting OpenCL- or OpenGL-shader-based workflows, I would be very grateful.

With my AMD card, I am hoping that the code will also be open - no proprietary vendor lock-in. That's very important now, when we consider that this technology is poised to take so many jobs once it matures a bit. There needs to be accountability on at least some level.

Sorry, I played a bit with OpenCL back in 2012, and as much as I hate Nvidia's vendor lock-in, I hate OpenCL even more.

The reason most research - and pretty much all of deep learning - uses CUDA is that OpenCL sucks, big time.

OpenCL is still not at the level CUDA was at 10 years ago (with CUDA 2.0, let's say). OpenCL 2.1, which supports more or less normal C++, is still only partially rolled out, and the same goes for SYCL (which allows more seamless mixing of host and device code). CUDA was there ages ago!

For example, with CUDA back in 2013 I was able to run fixed-size dense matrix algebra using Eigen inside the kernels, with vectorization, then some Lie-group algebra based on Eigen, and a bit later even use automatic differentiation to compute partial Jacobians on the GPU and get the results back to the CPU to do Levenberg-Marquardt steps.

At that time OpenCL was only C99 with pretty much no functions: every matrix multiplication had to be written by hand and for fixed types (no templates). Don't even get me started on those moronic disjoint address spaces that actually changed the type system; even if you wanted to reuse some code (with the non-portable AMD C++ extension, kind of defeating the point of OpenCL), it broke everything. I ran away from OpenCL as quickly as I could.

Also, I've heard that performance-wise it is not that great: the idea was to have the same code run across CPUs, GPUs and other compute platforms (DSPs? FPGAs?), but in practice (not surprisingly) one has to write and tune kernels for the type of compute device (and its operating principles) that will execute them.
 
The following users thanked this post: cdev

Offline cdev (Topic starter)

  • Super Contributor
  • ***
  • !
  • Posts: 7350
  • Country: 00
Re: Photogrammetry/Visual SLAM
« Reply #24 on: October 22, 2018, 09:38:12 pm »
Thank you for this much more specific information about which kinds of math are more difficult in OpenCL. My current setup supports version 2.1 according to the ViennaCL benchmarks, but some programs only recognize it as 1.2, because I have cobbled together a sort of Frankenstein system - I don't want to have to switch to Ubuntu just to run OpenCL apps. But I have it working, so I'm happy. The usual acceleration when an app supports OpenCL is huge - 10 times is fairly typical - and I'm sure that is the case for CUDA too. Note that my current card cost me less than $200.

Do you know of any cross-platform, GPU-computing-aware benchmarks that can be used to compare the two platforms in a hardware-agnostic way?
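In the absence of a ready-made one, I suppose a crude comparison could be scripted, since Nvidia ships an OpenCL driver too - e.g. a SAXPY bandwidth test with pyopencl (a sketch, assuming pyopencl and numpy are installed):

Code:
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()  # picks whichever OpenCL driver is installed (AMD or Nvidia)
queue = cl.CommandQueue(ctx)

n = 32 * 1024 * 1024
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

mf = cl.mem_flags
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
y_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=y)

prg = cl.Program(ctx, """
__kernel void saxpy(__global const float *x, __global float *y, const float a) {
    int i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}
""").build()

prg.saxpy(queue, (n,), None, x_buf, y_buf, np.float32(2.0))  # warm-up run
queue.finish()
t0 = time.perf_counter()
for _ in range(100):
    prg.saxpy(queue, (n,), None, x_buf, y_buf, np.float32(2.0))
queue.finish()
# saxpy moves 3 floats per element (read x, read y, write y)
print("effective GB/s:", 100 * 3 * x.nbytes / (time.perf_counter() - t0) / 1e9)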



"What the large print giveth, the small print taketh away."
 

