How can AI know what part of an image is important if it does not know what the entire image is?
What if the part that AI does not recognize is the important part?
That is "the system as a whole" example I noted in my last comment, and (IMO) also applies to the Apollo example.
A key factor in the Apollo flights was fuel consumption. The system has to know, or be told, which parts need to be in use right now and turn off the ones that don't. If not, it will a) do things it shouldn't and/or b) use precious fuel doing things it shouldn't.
FSD AI has to understand the whole of the situation while driving in order to know which subsystems can be ignored or turned off (e.g. don't bother running the windshield wipers if it is not raining; don't worry about ice when the ambient temperature is 25°C).
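To make that concrete, here is a toy sketch of the kind of situational gating I mean (the names and thresholds are entirely made up for illustration; this is not how any real FSD stack is written):

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    """The system's picture of the overall situation -- the hard part is filling this in correctly."""
    raining: bool
    ambient_temp_c: float
    speed_kph: float

def active_subsystems(c: Conditions) -> set[str]:
    """Decide which subsystems deserve attention right now; everything else stays off."""
    on = {"collision_avoidance", "lane_keeping"}      # always running
    if c.raining:
        on.add("wipers")
    if c.ambient_temp_c <= 3.0:                       # near or below freezing
        on.add("ice_traction_model")
    if c.speed_kph < 1.0:                             # effectively stopped
        on.discard("lane_keeping")
    return on

# At 25 °C and dry, neither the wipers nor the ice logic should get any attention:
print(active_subsystems(Conditions(raining=False, ambient_temp_c=25.0, speed_kph=100.0)))
```

The gating itself is trivial; the catch is that you can only fill in Conditions correctly if the system already understands the whole situation, which is the original problem.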
Exchanging information between cars will require (what we call in the systems industry) "handshaking." My car will have to establish communication with your car, confirm that we are able to exchange data, and then actually exchange it to (for example) avoid or attenuate the damage from a head-on collision. This would need to happen in milliseconds. No trivial task.
HOW to avoid that head-on collision would still require that each car have at least some autonomous systems and awareness.
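For a sense of what even the simplest version of that handshake looks like, here's a toy sketch (the message names, fields, and the 5 ms budget are all invented for illustration; no real V2V standard works this way):

```python
import time

HANDSHAKE_BUDGET_S = 0.005  # assume the whole exchange must finish in ~5 ms (illustrative)

def try_handshake(send, recv) -> bool:
    """HELLO -> version check -> capability exchange, or give up when the budget runs out."""
    deadline = time.monotonic() + HANDSHAKE_BUDGET_S
    send({"type": "HELLO", "proto_version": 1})
    while time.monotonic() < deadline:
        msg = recv(timeout=deadline - time.monotonic())   # recv returns None on timeout
        if msg is None:
            break
        if msg.get("type") == "HELLO_ACK" and msg.get("proto_version") == 1:
            send({"type": "CAPS", "can_share": ["position", "velocity", "intent"]})
            return True
    return False

def avoid_head_on(send, recv, autonomous_maneuver):
    """Coordinate with the other car if we can; either way, fall back on our own judgment."""
    if try_handshake(send, recv):
        send({"type": "INTENT", "action": "brake_and_steer_right"})
    # The other car may never answer, may speak a different protocol version,
    # or may simply be wrong -- so each car still needs its own autonomous plan.
    autonomous_maneuver()
```

Even in this stripped-down form you can see why the handshake alone is a standards problem (message formats, versions, timing budgets), and why it never removes the need for each car's own awareness.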
Getting to an industry standard for such a handshaking protocol alone will take years or decades. If you want an example of how long that sort of thing takes, look at how long it has taken to establish a single phone charger standard - oh wait, we still don't have one. Nor do we have a single standard for EV plugs.
35+ years in systems design has taught me all too well that the devil really is in the details. We've come a really long way with autonomous driver's aids. Moving to FSD and/or car-to-car interaction is a huge task, orders of magnitude more difficult than the (again, very impressive) gains we've made so far.