
Huawei claims an AI training approach superior to DeepSeek's, thanks to its Ascend processors.

Huawei's advancements in Artificial Intelligence (AI) model architecture could be substantial, as the company aims to lessen its dependence on American technologies.


Update on Huawei's Advanced AI Breakthrough

A game-changing paper from Huawei's Pangu team was unveiled recently. Written by a core team of 22 contributors and 56 associated researchers, it introduces the concept of Mixture of Grouped Experts (MoGE), an upgrade on the Mixture of Experts (MoE) method that has been pivotal in DeepSeek's affordable AI models.

Although MoE keeps execution costs low for models with massive parameter counts and offers heightened learning capacity, it can run into inefficiencies, according to the paper. The problem stems from the unbalanced activation of the so-called experts, which hurts performance when the model runs on multiple devices simultaneously.

In contrast, the revamped MoGE method groups the experts during selection and distributes the workload more evenly among them, avoiding the inefficiencies often seen in MoE.

In the realm of AI, "experts" refer to specialized sub-models or components within a broader model, each responsible for managing specific tasks or types of data. By leveraging this diverse expertise, the system can achieve enhanced performance overall.
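
To make the idea concrete, here is a minimal, hypothetical sketch of a standard MoE layer in Python/NumPy: a small gating network scores every expert for an input token, only the top-scoring experts are actually run, and their outputs are mixed by the gate weights. The dimensions, weights, and two-expert routing below are illustrative assumptions, not details from Huawei's or DeepSeek's models.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 16, 4, 2   # toy sizes, chosen only for illustration

# Each "expert" is a specialised sub-network; here it is just one small weight matrix.
expert_weights = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(num_experts)]
router_weights = rng.normal(size=(d_model, num_experts)) * 0.1   # the gating network

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token):
    """Score all experts for this token, run only the top_k of them, mix their outputs."""
    gate = softmax(token @ router_weights)          # one score per expert
    chosen = np.argsort(-gate)[:top_k]              # indices of the best-scoring experts
    weights = gate[chosen] / gate[chosen].sum()     # renormalise over the chosen experts
    return sum(w * (token @ expert_weights[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)   # (16,): same shape as the input, but only 2 of 4 experts ran
```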

[Embedded video: Nvidia's CEO Huang highlights China as a crucial market during his Beijing visit, remarks on the US AI chip ban]


The Edge of MoGE: Balancing Experts' Workloads for AI Excellence

  • Balanced Expert Workloads: MoGE groups the experts during selection, ensuring a fair distribution of workload across devices during parallel operation. This leads to superior efficiency over MoE, where certain experts are activated far more often than others, creating bottlenecks (see the sketch after this list).
  • Boosted Throughput: By redistributing the computational load more evenly, MoGE can substantially enhance the performance of AI models, particularly in the crucial inference phase, which is vital for real-time applications.
  • Improved Scalability: MoGE proves more suitable for distributed computing environments, as it ensures each device processes a fair share of workload, amplifying overall system effectiveness when multiple devices are employed.
  • Customized for Specific Hardware: MoGE can be customized for specific hardware configurations, like Huawei's Ascend NPUs, enabling more efficient training and inference processes tailored to the capabilities of the underlying hardware.
  • Scalable Large Language Models (LLMs): MoGE is particularly advantageous for implementing complex tasks in LLMs by utilizing a diverse set of specialized sub-models or "experts," grouped for superior performance.
  • Cost-Effective AI Training: By enhancing the efficiency of AI model training, MoGE can diminish the expenses associated with large-scale AI model development and deployment, making it an essential technique for companies aiming to optimize their AI infrastructure.
  • Hybrid Approaches: MoGE supports hybrid approaches in AI, enabling the combination of multiple techniques for better results than a single approach like MoE alone.
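
The balancing claim is easiest to see in a small simulation. The following sketch is an illustration under simplified assumptions (random router scores with a popularity bias, one expert chosen per group), not Huawei's actual Pangu implementation: it compares classic global top-k routing with grouped top-k routing and counts how many token assignments land on each group, which stands in for the per-device workload.

```python
import numpy as np

rng = np.random.default_rng(0)

num_experts = 8        # total experts in the layer
num_groups = 4         # hypothetical: one group of experts per device
k_total = 4            # experts activated per token
tokens = 1000

# Router scores with a per-expert bias, so some experts are systematically "popular".
expert_bias = rng.normal(size=num_experts) * 2.0
scores = rng.normal(size=(tokens, num_experts)) + expert_bias

def moe_topk(scores, k):
    """Classic MoE routing: global top-k per token; usage can concentrate on a few experts."""
    return np.argsort(-scores, axis=1)[:, :k]

def moge_grouped_topk(scores, num_groups, k):
    """Grouped routing: pick k // num_groups experts inside every group,
    so every group (device) serves the same number of experts per token."""
    experts_per_group = scores.shape[1] // num_groups
    k_per_group = k // num_groups
    picks = []
    for g in range(num_groups):
        block = scores[:, g * experts_per_group:(g + 1) * experts_per_group]
        top = np.argsort(-block, axis=1)[:, :k_per_group] + g * experts_per_group
        picks.append(top)
    return np.concatenate(picks, axis=1)

def per_group_load(selection, num_groups, experts_per_group):
    """Count token assignments per group -- a stand-in for per-device workload."""
    counts = np.bincount(selection.ravel(), minlength=num_groups * experts_per_group)
    return counts.reshape(num_groups, experts_per_group).sum(axis=1)

epg = num_experts // num_groups
print("MoE  load per group:", per_group_load(moe_topk(scores, k_total), num_groups, epg))
print("MoGE load per group:", per_group_load(moge_grouped_topk(scores, num_groups, k_total), num_groups, epg))
```

Because the grouped variant picks a fixed number of experts inside every group, each group receives exactly the same number of token assignments, whereas the global top-k counts skew towards the more popular experts.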


MoE vs. MoGE: Comparing the Two Techniques

| Feature | Mixture of Experts (MoE) | Mixture of Grouped Experts (MoGE) |
|-----------------------|-----------------------------------------------------------|-------------------------------------------------------------------|
| Expert Activation | Activated based on inputs, resulting in uneven usage. | Grouped and activated for a balanced workload. |
| Efficiency | Can be inefficient due to uneven expert usage. | More efficient due to better load balancing across devices. |
| Scalability | Less scalable due to uneven load distribution. | Highly scalable for parallel processing environments. |
| Hardware Optimization | Not customized for specific hardware configurations. | Optimizable for specific hardware like Ascend NPUs. |

In summary, MoGE provides significant enhancements over MoE, offering superior efficiency, scalability, and hardware optimization for AI models – paving the way for future advancements in the field.

  • The groundbreaking innovation, MoGE (Mixture of Grouped Experts), addresses the inefficiencies of the Mixture of Experts (MoE) method by distributing the workload evenly among the experts that handle specific tasks or types of data, leading to a more efficient and effective AI system.
  • The MoGE method offers cost-effective AI training, supports hybrid approaches, and is particularly useful for implementing complex tasks in large language models (LLMs), making it an essential technique for companies aiming to optimize their AI infrastructure.
