Huang Renxun's digital avatar debuts at GTC as NVIDIA flexes its "metaverse" muscle

With Huang Renxun's digital avatar answering questions fluently, NVIDIA kicked off a strong push into the metaverse and unveiled tools to help companies develop large AI models.

On November 9th, the NVIDIA GTC conference, a top technology event for artificial intelligence and high-performance computing, opened as scheduled.

The AI heavyweight, with a market value of $770 billion, has just launched NVIDIA Jetson AGX Orin, billed as the world's smallest, most powerful, and most energy-efficient next-generation AI supercomputer. It delivers 200 TOPS of computing power, comparable to a server with built-in GPUs.

In his virtual keynote that afternoon, NVIDIA founder and CEO Huang Renxun, wearing his familiar leather jacket, announced a series of new AI technologies and products and launched a new virtual avatar platform that embodies his "metaverse" vision.

The "mini toy version of Huang Renxun" Toy-Me generated by this platform can naturally ask questions and communicate with people.

When it comes to lowering the barrier to AI development and deployment, NVIDIA goes all out, offering a framework that helps companies build large AI models as well as tools for customizing virtual assistants with exclusive voices.

Huang Renxun said that NVIDIA now has nearly 3 million developers. CUDA has been downloaded 30 million times over the past 15 years, including 7 million downloads in the last year alone.

In addition, NVIDIA continues to show its enthusiasm for healthcare, launching Clara Holoscan, an AI computing platform equipped with the new-generation Orin chip that seamlessly connects medical devices and edge servers.

Huang Renxun also announced that NVIDIA will build a digital twin to simulate and predict climate change. The new supercomputer, called E-2 (Earth-Two), will be a digital twin of the Earth running in the virtual-world simulation engine Omniverse at Million-X, or million-times, acceleration.

Jetson AGX Orin: Palm-sized, computing power comparable to servers

Since the launch of Jetson TK1 in 2014, the NVIDIA Jetson series has accumulated 850,000 developers.

Today, NVIDIA launched Jetson AGX Orin, billed as the world's smallest, most powerful, and most energy-efficient new-generation AI supercomputer, aimed at robotics, autonomous machines, medical devices, and other forms of embedded edge computing.

Jetson AGX Orin keeps the same form factor and pin compatibility as its predecessor, Jetson AGX Xavier, while delivering 6 times the processing power. Its computing power reaches 200 TOPS, comparable to a server with built-in GPUs, yet the module is only the size of a palm.

It combines an NVIDIA Ampere architecture GPU, Arm Cortex-A78AE CPUs, and new-generation deep learning and vision accelerators. High-speed interfaces, faster memory bandwidth, and multimodal sensor support let it deliver data to multiple parallel AI application pipelines.

Like previous Jetson computers, customers using Jetson AGX Orin can use the NVIDIA CUDA-X accelerated computing stack, NVIDIA JetPack SDK, and the latest NVIDIA tools for application development and optimization, including cloud-native development workflows.

The pre-trained models from the NVIDIA NGC catalog have been optimized and can be fine-tuned using the NVIDIA TAO toolkit and customer data sets. This reduces the deployment time and cost of production-level AI, while cloud-native technology enables seamless updates throughout the product life cycle.

Like Jetson AGX Orin, DRIVE AGX Orin is also based on the NVIDIA Ampere architecture. It is the advanced processor behind the newly released NVIDIA DRIVE Concierge and DRIVE Chauffeur, two AI platforms that power safe autonomous driving.

Software frameworks for specific use cases include NVIDIA Isaac Sim for robotics, NVIDIA DRIVE for autonomous driving, and NVIDIA Metropolis for smart cities. The latest Isaac version includes important support for the Robot Operating System (ROS) developer community.

NVIDIA also released the new NVIDIA Omniverse Replicator for Isaac Sim, which generates synthetic training data for robots. These hardware-accelerated software packages make it easier for ROS developers to build high-performance AI robots on the Jetson platform.

The NVIDIA Jetson AGX Orin module and developer kit will be available in the first quarter of 2022.

Huang Renxun also said in his speech: "By 2024, most new electric vehicles will have strong autonomous driving capabilities."

He showed a new autonomous driving platform, DRIVE Hyperion 8 GA, the architecture for 2024 vehicle models. Its sensor suite contains 12 cameras, 9 millimeter-wave radars, 12 ultrasonic radars, and 1 forward lidar, all processed by 2 NVIDIA DRIVE Orin chips.

According to him, NVIDIA has collected petabytes of road data from around the world and employs about 3,000 well-trained labelers to create training data. Even so, synthetic data remains the cornerstone of NVIDIA's data strategy.

NeMo Megatron: Let companies develop their own large models

In order to facilitate the development and deployment of large-scale language models for enterprises, NVIDIA has introduced NeMo Megatron, an acceleration framework optimized for training language models with trillions of parameters.

NVIDIA NeMo Megatron builds on Megatron, an open-source project led by NVIDIA researchers studying the efficient training of large Transformer language models. Megatron 530B is the world's largest customizable language model.

Using advanced data, tensor, and pipeline parallelism, it can efficiently distribute the training of large language models across thousands of GPUs.
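The tensor-parallel part of this idea can be pictured with a NumPy toy sketch (a conceptual illustration only, not NeMo Megatron's actual implementation): a layer's weight matrix is split column-wise across "devices", each device computes its own shard, and the partial outputs are concatenated.

```python
import numpy as np

def tensor_parallel_matmul(x, w, num_devices):
    """Split the weight matrix column-wise across simulated devices,
    compute each shard independently, then concatenate the results.
    In real training each shard lives on a different GPU and the
    shard matmuls run in parallel."""
    shards = np.array_split(w, num_devices, axis=1)    # one shard per "device"
    partial_outputs = [x @ shard for shard in shards]  # parallel in practice
    return np.concatenate(partial_outputs, axis=1)

# A toy activation batch and weight matrix.
x = np.random.default_rng(0).standard_normal((4, 8))
w = np.random.default_rng(1).standard_normal((8, 16))

# The sharded result matches the single-device result exactly.
assert np.allclose(tensor_parallel_matmul(x, w, num_devices=4), x @ w)
```

The same splitting trick applied to layers (rather than within a layer) gives pipeline parallelism, and splitting the batch gives data parallelism.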

Companies can use the NeMo Megatron framework to further train the model for new domains and languages. The optimized framework scales across NVIDIA DGX SuperPOD's large-scale accelerated computing infrastructure.

In addition to NeMo Megatron, NVIDIA also introduced a framework for developing Physics-ML models, NVIDIA Modulus.

Modulus uses physical principles, together with data derived from first-principles physics and from observations, to train Physics-ML models. It supports multi-GPU, multi-node training, and the resulting models run physical simulations 1,000 to 100,000 times faster than numerical simulation.

Scientists can use Modulus to create digital twin models to solve important scientific problems such as predicting climate change.

For example, researchers trained a Physics-ML model on ERA5 atmospheric data from the European Centre for Medium-Range Weather Forecasts (ECMWF). Training took 4 hours on 128 A100 GPUs, and the trained model can predict the severity and path of a hurricane at a spatial resolution of 30 kilometers.

The prediction, which originally took 7 days to complete, now only takes 0.25 seconds on a GPU, which is 100,000 times faster than simulation.

To help companies accelerate their AI journey, NVIDIA announced the expansion of its LaunchPad program worldwide, which allows users to instantly access NVIDIA AI software running on accelerated infrastructure. Enterprises can use NVIDIA LaunchPad to experience the development and deployment of large-scale language models for free.

The LaunchPad program is supported by Equinix services, including data center, connectivity, and bare-metal products. After trying LaunchPad, companies can run their NVIDIA-accelerated AI workloads at Equinix locations around the world.

Riva Custom Voice: quickly create a customized brand voice

NVIDIA's Riva speech AI software also has new developments: Huang Renxun announced a new feature of the software, Riva Custom Voice.

Riva can recognize speech in 7 languages, including English, Spanish, German, French, Japanese, Mandarin, and Russian, and can generate closed captions, translations, and summaries, answer questions, and understand intent.

With only 30 minutes of audio data for training, companies can build their own brand-ambassador voice with human-like expressiveness.

In other words, users can customize virtual assistants with unique voices tailored to specific domains or industry terminology.

In less than three years, NVIDIA's conversational AI software has been downloaded more than 250,000 times and has been widely adopted across industries.

For small-scale R&D, the NVIDIA NGC container registry provides NVIDIA Riva for free, and developers can join the Riva open beta program to try out the software.

For customers who have large-scale deployments and seek NVIDIA expert technical support, NVIDIA announced the NVIDIA Riva Enterprise plan, which is expected to be launched early next year.

Omniverse Avatar: Build a vivid and intelligent virtual avatar

The next step of the virtual assistant is to have common sense, reasoning ability, and vivid visual image.

At the GTC conference, Huang Renxun announced the launch of an all-round virtual avatar platform: Omniverse Avatar.

It brings together a series of NVIDIA's advanced AI technologies: the perception capabilities of Metropolis, the speech recognition capabilities of Riva, the recommendation capabilities of Merlin, and the animation and rendering capabilities of Omniverse.

This allows developers to build a fully interactive virtual avatar that is vivid enough to respond to voice and facial prompts, understand multiple languages, and give intelligent suggestions.

Huang Renxun showed some examples.

For example, his toy replica can converse fluently with people.

Metropolis engineers used Maxine to create the Tokkio smart kiosk application, making the kiosk highly interactive and quick to respond in conversation.

In the restaurant, when two customers order food, a customer service avatar can talk to them and understand their needs.

These demonstrations are supported by NVIDIA AI software and Megatron 530B, which is currently the world's largest customizable language model.

In the demonstration of the DRIVE Concierge AI platform, the digital assistant on the central dashboard screen helps the driver choose the best driving mode to reach the destination on time and then, at his request, sets a reminder for when the car's range drops below 100 miles.

The Maxine project emphasizes real-time translation and transcription in multiple languages.

With Maxine, a speaker's words are not only transcribed but can also be translated into German, French, and other languages in real time, in the same voice and intonation.

Maxine uses computer vision to track people's faces and recognize their expressions, and 3D animation can drive lifelike virtual avatars for them.

As you can imagine, for enterprises and developers, every industry needs some form of avatar.

Using the Omniverse Avatar platform, you can build customized AI assistants for video conferencing and collaboration platforms, customer support, content creation, digital twins, robotics applications, and more.

NVIDIA's virtual-world simulation engine Omniverse is the key platform for creating virtual worlds. Robots, self-driving fleets, warehouses, industrial plants, and even entire cities can be created, trained, and operated in Omniverse digital twins.

Huang Renxun said that Omniverse is designed for data center scale and is expected to reach global data scale one day.

Ericsson is building a city-wide digital twin environment to help determine how to place and configure each site for optimal coverage and network performance, enabling realistic remote simulation of its entire 5G network.

AI inference: Triton inference server enables real-time large-model inference

At present, more than 25,000 customers such as Microsoft, Samsung, and Snap are using NVIDIA's AI inference platform.

Today, NVIDIA launched an updated NVIDIA Triton inference server with multi-node distributed inference, along with the NVIDIA A2 Tensor Core GPU accelerator.

The NVIDIA A2 GPU is an entry-level, low-power, compact accelerator for inference and edge AI in edge servers, with inference performance up to 20 times higher than CPUs.

This update of the NVIDIA AI inference platform includes new features of the open-source NVIDIA Triton inference server software and an update to NVIDIA TensorRT.

The multi-GPU, multi-node features in the latest NVIDIA Triton inference server let large language model inference workloads scale across multiple GPUs and nodes in real time.

With the Triton inference server, Megatron 530B can run on two NVIDIA DGX systems, cutting processing time from more than 1 minute on a CPU server to 0.5 seconds and making real-time deployment of large language models possible.

On the software-optimization side, Triton's new Model Analyzer tool automatically selects the best configuration for an AI model from hundreds of combinations, achieving optimal performance while meeting the application's quality-of-service requirements.
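The kind of search Model Analyzer performs can be pictured as a sweep over candidate configurations under a latency budget. The sketch below is purely illustrative (made-up measurements and a hypothetical `best_config` helper, not Triton's actual tooling, which benchmarks each configuration on the real GPU):

```python
# Toy configuration search: pick the (batch size, instance count) pair with
# the highest throughput whose latency still meets the service-level budget.
measurements = {
    # (batch_size, instances): (throughput_inferences_per_s, p99_latency_ms)
    (1, 1): (500, 4),
    (8, 1): (2200, 12),
    (8, 2): (3900, 18),
    (32, 2): (5100, 45),
    (64, 4): (6800, 120),
}

def best_config(measurements, latency_budget_ms):
    # Keep only configurations that satisfy the latency budget...
    viable = {cfg: (tput, lat) for cfg, (tput, lat) in measurements.items()
              if lat <= latency_budget_ms}
    if not viable:
        return None
    # ...and among those, maximize throughput.
    return max(viable, key=lambda cfg: viable[cfg][0])

print(best_config(measurements, latency_budget_ms=50))  # (32, 2)
print(best_config(measurements, latency_budget_ms=10))  # (1, 1)
```

A tighter latency budget forces smaller batches and fewer instances, trading throughput for responsiveness.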

RAPIDS FIL is a new backend for GPU or CPU inference of random forest and gradient boosting decision tree models. It provides a unified deployment engine for developers to use Triton for deep learning and traditional machine learning.

Triton integrates with AWS, Alibaba Cloud and other platforms, and supports optimization of AI inference workloads on various generations of GPUs, x86 CPUs, and Arm CPUs. NVIDIA AI Enterprise also integrates Triton.

NVIDIA AI Enterprise is an end-to-end software suite optimized, certified and supported by NVIDIA for the development and deployment of AI. Customers can use it to run AI workloads on mainstream servers in local data centers and private clouds.

NVIDIA's flagship TensorRT inference engine has also been updated and is now natively integrated into TensorFlow and PyTorch. With just one line of code, it delivers 3 times faster performance than in-framework inference.

NVIDIA TensorRT 8.2 is the latest version of the SDK, which can run language models with billions of parameters in real-time.

NVIDIA also announced the use of NVIDIA AI and Azure Cognitive Services in Microsoft’s meeting software Teams.

Microsoft Azure Cognitive Services provides cloud-based APIs for high-quality AI models to create smart applications. They are using Triton to run a speech-to-text model to provide Microsoft Teams users with accurate real-time subtitles and transcriptions.

Microsoft Teams has nearly 250 million monthly active users. NVIDIA GPUs and the Triton inference server on Microsoft Azure Cognitive Services support 28 languages and dialects, working with AI models to improve the cost-effectiveness of real-time captions and transcription.

Mavenir announced the MAVedge-AI intelligent video analysis powered by the NVIDIA Metropolis AI-on-5G platform to accelerate enterprise artificial intelligence. The solution is expected to be available to customers in early 2022.

Datacenter: new network security features

For data centers, Huang Renxun announced the launch of BlueField DOCA 1.2 with new network security features, aiming to make BlueField the industry's platform of choice for building zero-trust security.

There are currently 1,400 developers building on BlueField, and cybersecurity companies that adopt it can now offer zero-trust security as a service.

NVIDIA also introduced Morpheus, a deep learning cybersecurity platform that monitors and analyzes network behavior.

Built on NVIDIA RAPIDS and NVIDIA AI, its workflow creates AI models and digital fingerprints for each combination of application and user, learns their daily patterns, and spots abnormal operations. These anomalies trigger a security alert that prompts an analyst to respond.
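The digital-fingerprint idea can be sketched with a toy anomaly detector (illustrative only, not Morpheus's actual models): learn a user's typical daily activity level from history, then flag days that deviate sharply from that norm.

```python
import statistics

def build_fingerprint(daily_counts):
    # Learn a per-user "fingerprint" from a history of daily event counts.
    return {"mean": statistics.mean(daily_counts),
            "stdev": statistics.stdev(daily_counts)}

def is_anomalous(fingerprint, todays_count, threshold=3.0):
    # Flag activity more than `threshold` standard deviations from the norm.
    z = abs(todays_count - fingerprint["mean"]) / fingerprint["stdev"]
    return z > threshold

history = [102, 98, 110, 95, 105, 99, 101]  # typical daily logins for one user
fp = build_fingerprint(history)

assert not is_anomalous(fp, 104)  # an ordinary day: no alert
assert is_anomalous(fp, 500)      # sudden spike: raise a security alert
```

A production system would fingerprint many signals per user-application pair and retrain continuously, but the trigger logic follows the same pattern: deviation from a learned baseline raises an alert.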

BlueField, DOCA, and Morpheus are all part of a full-stack accelerated AI solution for data centers. NVIDIA will provide its cybersecurity partners with a zero-trust security platform to improve both security and application performance.

BlueField sits on the network and feeds the Morpheus AI platform all the activity that occurs in the data center, where Morpheus monitors and analyzes the information from every user, machine, and service.

NVIDIA also announced the Morpheus Early Access 2 version today.

Morpheus builds pre-trained fingerprint models of user activity. When a fingerprint changes, it recognizes in real time that an abnormal transaction is taking place, raises a security alert flagging the suspicious behavior, isolates the activity, and notifies analysts.

Medical and health: partnering with cancer centers, launching a new robotics platform

In healthcare, NVIDIA announced partnerships with a number of leading cancer centers to bring the power of AI to cancer treatment. These cancer centers will use NVIDIA DGX systems to accelerate the development of AI models.

Many medical device companies are integrating AI and robotics into their products, using NVIDIA accelerated computing platforms for robotic surgery, mobile CT scanning, and bronchoscopy.

In order to accelerate the application of AI medical equipment, NVIDIA launched a new computing platform NVIDIA Clara Holoscan for the medical and health industry.

Holoscan is NVIDIA's third robotics platform, after Isaac and DRIVE. It provides the computing infrastructure needed for scalable, software-defined, end-to-end streaming-data processing in medical devices.

The platform integrates NVIDIA AGX Orin and ConnectX-7, delivering 5.2 TFLOPS of FP32 compute and 250 TOPS of AI compute, with 740 Gbps of high-speed I/O for connecting sensors.

Adding an RTX A6000 Ampere GPU provides another 39 TFLOPS (FP32) and more than 600 TOPS of AI inference performance.

Clara Holoscan is an end-to-end platform that seamlessly connects medical devices and edge servers. It helps developers create AI microservices that run low-latency streaming applications on the devices while offloading more complex tasks to data center resources.

With Clara Holoscan, developers can customize applications and flexibly scale the computing and input/output capabilities of their medical devices as needed, balancing the requirements for latency, cost, space, performance, and bandwidth.

Clara Holoscan SDK supports this work through acceleration libraries, AI models and reference applications such as ultrasound, digital pathology, and endoscopy to help developers take advantage of embedded and scalable hybrid cloud computing.

In drug discovery, Canadian AI pharmaceutical startup Entos developed OrbNet, a deep learning architecture that uses physics-informed machine learning to train graph neural networks, replacing expensive interatomic force calculations in molecular simulations and increasing simulation speed 1,000-fold.

Quantum-2: The most advanced end-to-end network platform in history

During GTC, NVIDIA also announced the next-generation NVIDIA Quantum-2 platform, which enables cloud-native supercomputing.

The network platform consists of NVIDIA Quantum-2 switch, ConnectX-7 network adapter, BlueField-3 data processing unit (DPU), and all software supporting the new architecture. ConnectX-7 will come out in January next year.

Among them, the Quantum-2 InfiniBand switch is based on the new Quantum-2 ASIC, built on TSMC's 7N node and containing 57 billion transistors, more than the A100's 54 billion.

Quantum-2 InfiniBand runs at 400 Gbps, doubling network speed; switch throughput is doubled, cluster scalability is increased 6.5 times, and data center power consumption is reduced.

Its multi-tenant performance isolation uses an advanced telemetry-based congestion control system to ensure reliable throughput regardless of surges in users or workload demand, preventing one tenant's activity from interfering with that of others.

Compared with the previous generation, third-generation SHARPv3 in-network computing technology offers 32 times the switch compute capacity, used to accelerate AI training.

New acceleration library: optimize route planning and accelerate quantum simulation

Finally, let's take a look at 3 new acceleration libraries launched by NVIDIA.

The first is NVIDIA ReOpt, which is an accelerated solver for operations research optimization problems that can realize real-time route planning and optimization.

Take Domino's Pizza, which is working with NVIDIA, as an example: a single run delivering 14 pizzas has 87 billion possible routes, which shows why delivering within 30 minutes is no easy feat for Domino's.
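The "87 billion routes" figure is simply 14 factorial, the number of orders in which 14 stops can be visited. The sketch below checks that count and shows a simple nearest-neighbor heuristic of the kind route planners use as a baseline (illustrative only; the stop coordinates are made up and this is unrelated to ReOpt's actual algorithms):

```python
import math

# With 14 deliveries on one run, the number of possible visit orders is 14!.
routes = math.factorial(14)
print(f"{routes:,}")  # 87,178,291,200 -- the "87 billion routes" figure

# Brute force is hopeless, but a greedy nearest-neighbor heuristic gives a
# usable (though not optimal) route in O(n^2) time.
def nearest_neighbor_route(points):
    remaining = list(range(1, len(points)))
    route = [0]  # start at the depot (index 0)
    while remaining:
        last = points[route[-1]]
        # Always hop to the closest unvisited stop (squared distance suffices).
        nxt = min(remaining,
                  key=lambda i: (points[i][0] - last[0]) ** 2
                              + (points[i][1] - last[1]) ** 2)
        route.append(nxt)
        remaining.remove(nxt)
    return route

stops = [(0, 0), (1, 5), (2, 1), (5, 5), (1, 1)]
print(nearest_neighbor_route(stops))  # [0, 4, 2, 1, 3]
```

Solvers like the one described below improve on such greedy baselines with far more sophisticated search, but the combinatorial explosion they fight is exactly this factorial growth.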

Operational optimization is essential for "last mile" delivery. Route planning is an extremely difficult logistics problem; applied across industry, even small-scale route optimization can save billions of dollars.

Huang Renxun demonstrated an NVIDIA Omniverse virtual warehouse to show the impact of optimized routes in automatic order picking scenarios. The optimized planning can save half the time and distance for order picking.

Today's route-optimization solvers take hours to re-run and respond after receiving a new order, whereas ReOpt runs continuously and re-optimizes dynamically in real time, responding within seconds and scaling to thousands of locations.

The second is the cuQuantum DGX appliance, a system loaded with acceleration libraries for quantum computing workflows that speed up quantum circuit simulation using both state-vector and tensor-network methods.

Google Cirq will be the first quantum simulator to be accelerated.

With the aid of this equipment, simulations that used to take several months can now be completed in a few days.
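State-vector simulation, one of the two methods mentioned above, tracks all 2^n amplitudes of an n-qubit state and applies each gate as a matrix product, which is why memory and time double with every added qubit. A minimal one-qubit NumPy sketch (conceptual only; cuQuantum does this at scale on GPUs):

```python
import numpy as np

# The Hadamard gate, which puts |0> into an equal superposition.
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)

state = np.array([1.0, 0.0])        # state vector for |0>
state = H @ state                   # apply the gate as a matrix product

probabilities = np.abs(state) ** 2  # Born rule: measurement probabilities
print(probabilities)                # [0.5 0.5] -- an equal superposition
```

For n qubits the state vector has 2^n entries and gates become 2^n x 2^n (sparse) operators, so a 1,688-qubit exact simulation is only possible for specially structured circuits.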

NVIDIA's research team has reached an important milestone in quantum algorithm simulation, simulating 1,688 qubits to solve the MaxCut problem on a graph with 3,375 vertices.

This is the largest exact quantum circuit simulation ever run, with 8 times more qubits than previous simulations.
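MaxCut itself is easy to state: partition a graph's vertices into two sets so that as many edges as possible cross the cut. A brute-force classical sketch (feasible only for tiny graphs, which is exactly why quantum approaches and large-scale simulations of them are interesting):

```python
from itertools import product

def max_cut(edges, n):
    """Brute-force MaxCut: try every 2-coloring of the n vertices and count
    the edges whose endpoints get different colors. Cost is O(2^n), the same
    exponential blow-up that makes simulating n qubits require 2^n amplitudes."""
    best = 0
    for assignment in product([0, 1], repeat=n):
        cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
        best = max(best, cut)
    return best

# A 5-cycle: the maximum cut severs 4 of its 5 edges (an odd cycle can
# never have all edges crossing).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(max_cut(edges, 5))  # 4
```

Quantum algorithms such as QAOA attack this same problem; simulating them classically for a 3,375-vertex instance is what the milestone above refers to.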

The cuQuantum DGX device will be launched in the first quarter.

The third acceleration library is cuNumeric, which brings large-scale accelerated computing to the PyData and NumPy ecosystem, allowing users to transparently accelerate and scale NumPy workflows to supercomputers without changing their Python code.

It belongs to NVIDIA RAPIDS, the open-source Python data science suite. RAPIDS has been downloaded more than 500,000 times this year, over 4 times more than last year. NumPy itself has been downloaded 122 million times in the past 5 years and is used by nearly 800,000 projects on GitHub.

On the well-known CFD Python teaching code, cuNumeric scales to 1,000 GPUs with only a 20% loss relative to linear scaling efficiency.
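For a flavor of that workload, here is a plain NumPy 1-D diffusion step in the style of the CFD Python lessons. Per cuNumeric's drop-in design, the intended usage is to swap the import for cuNumeric's module to scale the same code out; the exact module name (`import cunumeric as np`) is stated here as an assumption rather than verified.

```python
import numpy as np  # with cuNumeric, reportedly: import cunumeric as np

nx, nt = 41, 20          # grid points, time steps
dx = 2.0 / (nx - 1)      # grid spacing
nu, dt = 0.3, 0.001      # viscosity, time step

u = np.ones(nx)
u[int(0.5 / dx):int(1.0 / dx) + 1] = 2.0  # initial "hat" profile

for _ in range(nt):
    un = u.copy()
    # Central-difference diffusion update on the interior points;
    # the boundaries stay fixed at 1.0.
    u[1:-1] = un[1:-1] + nu * dt / dx**2 * (un[2:] - 2 * un[1:-1] + un[:-2])

print(round(float(u.max()), 4))  # the hat has been smoothed slightly below 2.0
```

Because every operation is an array expression, the same script needs no structural changes to run distributed; that transparency is the point of the 1,000-GPU scaling result above.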

Concluding remarks

Riding the tailwinds of AI, high-performance computing, and the metaverse, NVIDIA has been on a roll this year, its market value skyrocketing past $770 billion. Analysts regard its Omniverse platform as an important part of NVIDIA's platform expansion strategy.

Behind this outward success, NVIDIA's vision and foresight should not be underestimated. Whether in AI, now in full swing, or in the nascent virtual world, NVIDIA has become a direct beneficiary of the technology boom, thanks to years of polishing its software and hardware products.

During the NVIDIA GTC conference, we will also see more recent developments in deep learning, data science, high-performance computing, robotics, and other fields. Accelerated computing, which began with NVIDIA CUDA, is catalyzing efficiency gains across these fields and driving modern technology to evolve rapidly toward the future.
