
Conversation

@SangChengC
Contributor

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @SangChengC, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends LightLLM's capabilities by integrating the Neo Chat multimodal model. The changes encompass modifications to the KV cache management, the introduction of a specialized inference state, and the implementation of custom transformer layers and weight loading logic tailored for Neo Chat. A key feature is the advanced image processing pipeline, which includes 2D Rotary Position Embeddings and optimized Triton kernels, allowing the model to efficiently process and understand visual information alongside text.

Highlights

  • Neo Chat Model Integration: Introduced full support for the 'Neo Chat' multimodal model, including its unique KV cache handling, inference state, transformer layers, and weight loading mechanisms.
  • Multimodal Position Embeddings: Implemented a 2D Rotary Position Embedding (RoPE) mechanism for images, allowing separate positional encoding for height and width dimensions within the attention mechanism (an illustrative sketch follows this list).
  • Custom Triton Kernels: Developed specialized Triton kernels for Neo Chat's context attention and position ID generation, optimizing performance for multimodal inputs.
  • Image Preprocessing Pipeline: Added a robust image preprocessing pipeline that includes smart resizing, dynamic resolution handling, and conversion to patch embeddings with 2D RoPE application.
  • Tokenizer and Visual Model Registration: Integrated the Neo Chat tokenizer and visual model components into the LightLLM server, enabling proper instantiation and usage of the new model.
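
To make the 2D RoPE highlight concrete, here is a minimal illustrative sketch (not the PR's implementation; all function and variable names are hypothetical): each image patch gets a (row, column) index, half of the rotary channel pairs are rotated by the row position and the other half by the column position.

```python
import torch


def rope_2d_cos_sin(grid_h: int, grid_w: int, head_dim: int, base: float = 10000.0):
    """Build cos/sin tables where half the rotary pairs encode the patch row and
    half encode the patch column. Assumes head_dim is divisible by 4."""
    half = head_dim // 2
    inv_freq = 1.0 / (base ** (torch.arange(0, half, 2).float() / half))  # [half // 2]
    rows = torch.arange(grid_h).float()
    cols = torch.arange(grid_w).float()
    ang_h = torch.einsum("h,f->hf", rows, inv_freq)  # [grid_h, half // 2]
    ang_w = torch.einsum("w,f->wf", cols, inv_freq)  # [grid_w, half // 2]
    ang_h = ang_h[:, None, :].expand(grid_h, grid_w, -1)
    ang_w = ang_w[None, :, :].expand(grid_h, grid_w, -1)
    ang = torch.cat([ang_h, ang_w], dim=-1).reshape(grid_h * grid_w, half)
    return ang.cos(), ang.sin()  # each [num_patches, head_dim // 2]


def apply_rope_2d(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of x ([num_patches, num_heads, head_dim]) by the 2D angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = cos[:, None, :], sin[:, None, :]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

The PR's Triton kernels presumably fuse this kind of rotation into attention for performance; the sketch only pins down the height/width split described above.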


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the neo_chat multimodal model, adding new model definitions, inference logic, custom Triton kernels, and vision processing components. The changes are extensive and well-structured for integrating the new model.

My review highlights several areas for improvement to enhance code quality and robustness. I've pointed out a number of debug print statements that should be removed from the production code. There is a bare except block that could hide potential errors and should be made more specific. Additionally, a local import within a function should be moved to the top level of the file.

More critically, I've identified a comment in the tokenizer logic that indicates uncertainty about a calculation, which could point to a potential bug. I've also found that the visual model's data type is hardcoded, ignoring the configured data_type, which could lead to issues. Addressing these points will improve the correctness and maintainability of the new model's implementation.

Comment on lines +236 to +244
try:
    ntk_alpha = float(os.environ.get("LIGHTLLM_NTK_ALPHA", 1))
    assert ntk_alpha >= 1
    if ntk_alpha > 1:
        logger.info(f"Note: NTK enabled, alpha set to {ntk_alpha}")
    max_seq_len *= ntk_alpha
    base = base * (ntk_alpha ** (partial_head_dim / (partial_head_dim - 2)))  # Base change formula
except:
    pass

Severity: high

Using a bare except: is generally discouraged as it can catch and silence a wide range of unexpected errors, making debugging difficult. It's better to catch specific exceptions that you expect might occur, such as ValueError or AssertionError, and log them for better diagnostics.

Suggested change:

try:
    ntk_alpha = float(os.environ.get("LIGHTLLM_NTK_ALPHA", 1))
    assert ntk_alpha >= 1
    if ntk_alpha > 1:
        logger.info(f"Note: NTK enabled, alpha set to {ntk_alpha}")
    max_seq_len *= ntk_alpha
    base = base * (ntk_alpha ** (partial_head_dim / (partial_head_dim - 2)))  # Base change formula
except (ValueError, AssertionError) as e:
    logger.warning(f"Could not apply NTK scaling: {e}")

)
grid_h, grid_w = resized_height // self.patch_size, resized_width // self.patch_size
token_num = int((grid_h * grid_w) * (self.downsample_ratio ** 2))
# Do grid_h and grid_w here need to be multiplied by self.downsample_ratio? Check the code again carefully.

Severity: high

This comment indicates uncertainty about the correctness of the calculation for grid_h and grid_w. This should be investigated and clarified to prevent potential bugs. If the logic is correct, the comment should be removed. If it's incorrect, it needs to be fixed.
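
One way to see why the question matters: scaling each grid dimension by the downsample ratio and then multiplying is arithmetically equivalent to scaling the patch count by the squared ratio, as long as grid_h and grid_w are each divisible by 1/downsample_ratio; the two readings only diverge through integer truncation. A tiny illustrative check, with made-up values:

```python
grid_h, grid_w, downsample_ratio = 32, 32, 0.5

tokens_from_product = int((grid_h * grid_w) * (downsample_ratio ** 2))        # as written in the PR
tokens_from_scaled_dims = int(grid_h * downsample_ratio) * int(grid_w * downsample_ratio)

print(tokens_from_product, tokens_from_scaled_dims)  # 256 256
```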

elif self.model_type == "gemma3":
    self.model = Gemma3VisionModel()
elif self.model_type == "neo_chat":
    self.model = NeoVisionTransformerPretrainedModel(kvargs, **model_cfg["vision_config"]).eval().bfloat16()

Severity: high

The model's data type is hardcoded to bfloat16, which ignores the data_type provided in kvargs. This can lead to unexpected behavior or errors if a different data type like float16 is intended. The model should be cast to the data type from the configuration to ensure correctness.

Suggested change:

self.model = NeoVisionTransformerPretrainedModel(kvargs, **model_cfg["vision_config"]).eval()
if self.data_type in ["bf16", "bfloat16"]:
    self.model.bfloat16()
elif self.data_type in ["fp16", "float16"]:
    self.model.half()


base = self.config.get("rope_theta", float(default_base))

print(f"base is {base}")

Severity: medium

This print statement appears to be for debugging purposes and should be removed before merging.

rope_scaling_factor = self.config.get("rope_scaling", {}).get("factor", 1.0)

base = self.config.get("rope_theta_hw", float(default_base))
print(f"hw_base is {base}")

Severity: medium

This print statement appears to be for debugging purposes and should be removed before merging.

Comment on lines +64 to +65
a = img["start_idx"]
print(f"img start_idx: {a}")

Severity: medium

These lines, including a print statement, appear to be for debugging and should be removed.

Comment on lines 157 to 159
from lightllm.models.llama.triton_kernel.token_attention_softmax_and_reducev import (
    token_softmax_reducev_fwd,
)

Severity: medium

Imports should be at the top of the file for better readability, to follow standard Python conventions, and to avoid the overhead of re-importing on every function call. Please move this import to the top of the file.
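
Concretely, the fix is to hoist the import to module scope with the file's other imports (same statement, just moved):

```python
# At the top of the module, alongside its existing imports, so it is resolved once
# at import time rather than on every function call.
from lightllm.models.llama.triton_kernel.token_attention_softmax_and_reducev import (
    token_softmax_reducev_fwd,
)
```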


# must divide merge_length
cur_num = int(img_tensors[-1].shape[0] * (self.downsample_ratio ** 2))
print(f"cur_num is {cur_num}")

Severity: medium

This print statement appears to be for debugging and should be removed.

)
pixel_values, grid_hw = preprocess_pixel_values(transform(new_image).to(torch.float32), patch_size=patch_size)

print(f"Transfer image_size from ({image.height, image.width}) to ({new_image.height, new_image.width})")

Severity: medium

This print statement seems to be for debugging purposes. It should be removed or replaced with a proper logger call if this information is valuable for diagnostics.
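
If the resize information is worth keeping, a debug-level log line is the usual replacement. A sketch, assuming the module builds its logger the way other lightllm modules do (the init_logger helper below is that assumption; adjust to whatever logger the module already has):

```python
from lightllm.utils.log_utils import init_logger

logger = init_logger(__name__)

# Replaces the print: record the resize at debug level so it can be silenced in production.
logger.debug(
    "Resized image from (%d, %d) to (%d, %d)",
    image.height, image.width, new_image.height, new_image.width,
)
```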
