
Vision-Language-Action Model
The VLA model combines vision, language, and action, enabling the smart driving system not only to recognize and describe the road, environment, traffic signs, and road participants, but also to understand complex scenarios such as negotiations between road users and implicit semantic information, supported by advanced logical reasoning.
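To make the vision-language-action idea concrete, here is a minimal, purely illustrative sketch: all class and field names are hypothetical, and keyword rules stand in for what would really be a fused vision-language transformer producing driving actions.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image_tokens: list   # placeholder for vision-encoder output
    scene_text: str      # language description of the scene

@dataclass
class Action:
    steering: float      # radians, positive = left
    throttle: float      # 0..1

class VLAModel:
    """Toy vision-language-action mapping. A real VLA model would fuse
    image tokens and text in one network; here keyword rules on the
    scene text stand in for that reasoning."""

    def act(self, obs: Observation) -> Action:
        text = obs.scene_text.lower()
        if "stop sign" in text or "red light" in text:
            return Action(steering=0.0, throttle=0.0)  # yield / stop
        if "merge left" in text:
            return Action(steering=0.1, throttle=0.3)  # gentle left merge
        return Action(steering=0.0, throttle=0.5)      # cruise

model = VLAModel()
print(model.act(Observation([], "red light ahead")))
```

The point of the interface is that language is a first-class input: the same observation with different scene semantics yields a different action.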





End-to-end Model
In the end-to-end model, perception, prediction, planning, and other modules are combined into a single neural network. Trained on numerous video clips, the smart driving system learns, thinks, and analyzes on its own to handle complex driving tasks.
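The contrast with a modular stack can be sketched in a few lines. Everything below is a toy with made-up numbers: the modular pipeline chains hand-written perception, prediction, and planning stages, while the end-to-end function maps frames straight to a steering value, with the stage boundaries absorbed into learned weights.

```python
# Modular pipeline: three hand-engineered stages with explicit interfaces.
def detect(frames):            # perception: extract obstacle positions
    return [f["obstacle_x"] for f in frames]

def predict(obstacle_xs):      # prediction: extrapolate the last motion step
    return obstacle_xs[-1] + (obstacle_xs[-1] - obstacle_xs[-2])

def plan(predicted_x):         # planning: steer away from the predicted position
    return -0.1 * predicted_x

def modular_pipeline(frames):
    return plan(predict(detect(frames)))

# End-to-end: one function from frames to steering. The weights w would be
# trained jointly on driving clips, with no hand-written stage boundaries.
def end_to_end(frames, w):
    xs = [f["obstacle_x"] for f in frames]
    return w[0] * xs[-1] + w[1] * (xs[-1] - xs[-2])

frames = [{"obstacle_x": 1.0}, {"obstacle_x": 2.0}]
print(modular_pipeline(frames))          # hand-engineered behavior
print(end_to_end(frames, [-0.1, -0.1]))  # same behavior, but as learned weights
```

With the right weights the single function reproduces the modular stack's output, which is the intuition behind replacing separately engineered modules with one jointly trained network.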










[Diagram: evolution from a rule-based to a learning-based architecture]

Rule-based (more engineering, adequate data): detection, object tracking, late fusion, prediction, decision, planning, control, mapping, localization.

Learning-based (less engineering, more data): prediction, mapping, localization, decision, planning, multi-sensor fusion, and control, consolidated into a General Perception Net, a Prediction Planning Net, and control.

Milestones: initial road test of the end-to-end model; deployment of the VLA model on consumer cars.
Data loop
With the help of map providers, we run a complete data pipeline covering collection, labeling, cleansing, tagging, quality assurance, model training, test validation, and more. The data loop learns continuously, enabling the smart driving system to iterate and improve autonomously.

[Diagram: data loop — numerous data, data mining, model training]
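The data loop above can be sketched as a chain of stages. The stage names come from the text; every implementation below is a placeholder (string checks stand in for real labeling, cleansing, and QA), and the final "model" is just a record of how many samples survived the loop.

```python
def collect(source):   # collection: drop empty clips
    return [c for c in source if c]

def label(clips):      # labeling: attach a toy binary label
    return [(c, "car" in c) for c in clips]

def cleanse(samples):  # cleansing: discard corrupt clips
    return [s for s in samples if s[0] != "corrupt"]

def tag(samples):      # tagging: add a scene tag for later mining
    return [{"clip": c, "car": y, "tag": "night" if "night" in c else "day"}
            for c, y in samples]

def qa(samples):       # quality assurance: final sanity check
    return [s for s in samples if s["clip"]]

def train(samples):    # stand-in for model training
    return {"n_train": len(samples)}

def validate(trained): # stand-in for test validation
    return trained["n_train"] > 0

stages = [collect, label, cleanse, tag, qa]
data = ["car_day", "", "night_car", "corrupt"]
for stage in stages:   # run the loop's stages in order
    data = stage(data)
model = train(data)
print(validate(model))
```

Each pass through the loop mines new data, retrains, and revalidates, which is what lets the system iterate without manual re-engineering of each stage.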