Research partnerships
We work with teams that need a measurement-grade view of how models behave on real configurable hardware—not only on GPUs or approximate analytical models. We are especially interested in hardware, compiler, and software–hardware research where the stack can be co-tuned end to end.
Partnership shapes (not limited to)
The following are examples—we are open to other structures that fit your institution or program.
- Joint proposals for public or private funding — co-authored programs where LeoGreenAI contributes hardware, compiler, and measurement expertise.
- Shared evaluation on FPGA (or agreed platforms) with defined metrics and CSM capture.
- Technology access, benchmarking, or reference designs under NDA.
- Open or shared datasets and reproducibility artifacts tied to hardware-accurate runs.
- Workshops, tutorials, or curriculum materials built on the configurable stack.
Research themes & fronts
We are interested in a wide range of hardware–software research—not limited to the categories below. They are examples of what partners could explore with us today.
If your team builds agents, foundation models, or specialized networks, LeoGreenAI is a strong fit when you need faster inference, lower energy, or tighter power–accuracy tradeoffs grounded in real silicon. We offer a measurement-grade path: configurable LEO hardware plus compiler and CSM visibility so you can compare architectures and mappings against actual execution, not only GPU baselines or abstract cost models.
If you care about hardware effects not yet reflected in CSM, our architectural design team can work with you to add the access and visibility you need on short turnaround—focused spin-outs (extra counters, signals, or light RTL hooks matched to your question) are often feasible in days, not quarters. That responsiveness is part of the value of partnering directly with the team that owns the core stack.
Model design, agents, NAS & efficiency search
A major partnership focus: companies and labs that develop models and agents and want to raise efficiency, cut latency, or optimize under power budgets with hardware-in-the-loop evidence.
- Neural architecture search (NAS) — search loops closed with hardware-accurate latency and throughput from CSM, not surrogate objectives alone (a minimal sketch follows this list).
- Hardware-aware NAS — joint search over architectures and compiler mappings using graph metadata plus CSM.
- Energy-aware NAS — Pareto fronts for accuracy vs measured energy per inference or per operation on target configurations.
- Quantization & mixed precision — QAT and operator-level precision choices validated on real arrays and memory paths.
- Sparsity, pruning & compression — what actually accelerates on silicon vs what looks good in simulation.
- Efficient attention & long context — sliding-window, low-rank, and hybrid layouts, and how they stress memory and instruction issue.
- Edge vs datacenter regimes — different bottlenecks; same toolchain, different measurement stories.
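To make the hardware-in-the-loop framing concrete, here is a minimal sketch of a search loop that ranks candidates on measured latency and energy rather than surrogate objectives alone. The `Candidate` fields and the literal numbers are placeholders; in practice each candidate would be compiled, run on the agreed FPGA configuration, and scored from CSM captures.

```python
# Minimal sketch of a hardware-in-the-loop NAS selection step.
# The fields and numbers are placeholders standing in for measured values.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    accuracy: float      # validation accuracy from the partner's training loop
    latency_ms: float    # measured end-to-end latency on the target configuration
    energy_mj: float     # measured energy per inference from CSM counters

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on every axis and strictly better on one."""
    at_least = (a.accuracy >= b.accuracy and
                a.latency_ms <= b.latency_ms and
                a.energy_mj <= b.energy_mj)
    better = (a.accuracy > b.accuracy or
              a.latency_ms < b.latency_ms or
              a.energy_mj < b.energy_mj)
    return at_least and better

def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates not dominated on accuracy vs latency vs energy."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# In a real loop, each candidate would be compiled, run on the FPGA
# configuration under test, and scored from CSM captures, not from literals.
measured = [
    Candidate("baseline",   accuracy=0.812, latency_ms=4.1, energy_mj=9.5),
    Candidate("pruned-50",  accuracy=0.804, latency_ms=2.6, energy_mj=5.8),
    Candidate("int8-quant", accuracy=0.799, latency_ms=1.9, energy_mj=4.2),
]
for c in pareto_front(measured):
    print(f"{c.name}: acc={c.accuracy:.3f}, {c.latency_ms} ms, {c.energy_mj} mJ")
```

The same filter applies whether the axes come from an edge or a datacenter configuration; only the measurement plan changes.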
Scientific & numerical computing
Beyond mainstream CNNs and transformers, many computational science pipelines reduce to large-scale linear algebra—dense or sparse matrix–matrix multiply, matrix–vector products, tensor contractions, or convolution-like stencils on structured grids. Where those kernels dominate runtime, the same compiler-to-silicon observability applies (a small illustration follows the list below).
- Genomics & sequence analysis — alignment, indexing, and scoring workflows (e.g. seed-and-extend, pairwise alignment, k-mer statistics) that spend their time in batched linear algebra and irregular memory access amenable to structured mapping studies.
- Sparse & structured linear systems — Krylov solvers, preconditioning, and graph-structured operators arising from discretized PDEs or scientific simulations; candidates when expressed in forms the toolchain can target.
- Spectral & transform methods — FFT-class and related transforms that stress memory bandwidth and regular datapaths.
- Other domain codes — physics, chemistry, or engineering kernels expressible as heavy GEMM, tensor ops, or stencil iteration; partners use LEO as an acceleration and measurement substrate comparable in spirit to HPC GPU evaluation, with LeoGreenAI’s ISA and counters in the loop.
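As a small illustration of the reduction described above, the sketch below expresses a one-dimensional three-point stencil as a sparse matrix–vector product using NumPy and SciPy. It shows only the mathematical rewrite; how a given kernel is actually mapped and measured is worked out per engagement.

```python
# A 1D three-point stencil (a discrete Laplacian) written two ways:
# as a direct sweep and as a sparse matrix-vector product. Nothing here
# is specific to the LEO stack; it only illustrates the linear-algebra form.
import numpy as np
from scipy.sparse import diags

n = 8
u = np.arange(n, dtype=float) ** 2

# Direct stencil sweep: u_new[i] = u[i-1] - 2*u[i] + u[i+1] (zero boundaries).
u_stencil = np.zeros(n)
u_stencil[1:-1] = u[:-2] - 2.0 * u[1:-1] + u[2:]

# The same operator as a tridiagonal sparse matrix applied to u.
A = diags([1.0, -2.0, 1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
u_matvec = A @ u
u_matvec[0] = u_matvec[-1] = 0.0   # zero the boundaries to match the sweep

assert np.allclose(u_stencil, u_matvec)
```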
System integration & scale-out
- 3D integration & vertical stacks — 3D-stacked silicon, cache-on-die and DRAM-on-die (or closely stacked memory), interposers, and hybrid bonding; how proximity changes bandwidth, latency, thermals, and power delivery for sustained compute.
- Photonic interfaces — optical links at core-to-core, core-to-memory, and cluster distances; electrical–optical boundaries and what the compiler and CSM can observe.
- Pods and networks of pods — multi-accelerator partitions, placement, and collectives; measurement narratives that span a pod and beyond.
- Heterogeneous racks — mixing link technologies and offload assumptions; end-to-end latency under realistic schedulers and traffic.
Memory, data movement & near-data compute
- AI-enabled DRAM and HBM — smarter memory controllers, on-die or near-die logic, and how the host and accelerator see bandwidth, latency, and reliability.
- Near-memory and in-memory compute — staging and light compute next to DRAM or HBM; traffic classes for weights, activations, and partial results.
- Tiered memory — HBM, DDR, and expansion-class attach; capacity–bandwidth tradeoffs for large models and batches.
- Memory-wall studies — burst policies, interface width, and scheduling that hide or expose DRAM/HBM latency behind compute (see the sketch below).
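A first-order way to frame such memory-wall questions is a roofline-style check of arithmetic intensity against a ridge point. The sketch below does this for a tiled GEMM; the peak compute and bandwidth figures are illustrative placeholders, not figures for any LEO configuration.

```python
# Back-of-envelope roofline check for a tiled GEMM: is DRAM/HBM traffic
# hidden behind compute or not? The peak figures are placeholders only.

def gemm_arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for one M x K by K x N tile, counting A, B, and C once."""
    flops = 2.0 * m * n * k
    traffic = bytes_per_elem * (m * k + k * n + m * n)
    return flops / traffic

PEAK_FLOPS = 10e12            # placeholder: 10 TFLOP/s sustained compute
PEAK_BW = 400e9               # placeholder: 400 GB/s DRAM/HBM bandwidth
ridge = PEAK_FLOPS / PEAK_BW  # FLOP/byte needed to stay compute-bound

for tile in [(64, 64, 64), (256, 256, 256), (1024, 1024, 1024)]:
    ai = gemm_arithmetic_intensity(*tile)
    bound = "compute-bound" if ai >= ridge else "bandwidth-bound"
    print(f"tile {tile}: {ai:.1f} FLOP/byte vs ridge {ridge:.1f} -> {bound}")
```

In a partnership, the same question is asked against measured counters rather than peak numbers: CSM captures show where the interface and scheduler actually sit relative to that ridge.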
Sparse arrays, analog & hybrid compute
- Networks of sparse arrays — structured and unstructured sparsity mapped across many units; hardware-visible skip and utilization.
- Photonic or analog co-acceleration — fast analog or photonic front ends for bulk linear phases, paired with the digital LEO core where programmability matters (e.g. attention-heavy regions).
- Digital–analog partitioning — precision, noise, and calibration limits; where configurable digital work must take over.
Compiler, mapping & co-design
- Fusion, tiling & schedule exploration — maximizing realized parallelism under structural hazards (see the sketch after this list).
- Cross-stack optimization — co-tuning graph shape, precision, and hardware configuration with compiler feedback.
- ISA evolution — new macros, fusion rules, or counter extensions grounded in partner workloads and silicon experiments.
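As a sketch of what schedule exploration looks like in code, the loop below sweeps tile sizes for a fixed GEMM under a toy cost model. The array dimension, buffer budget, and cost formula are placeholders; in an engagement each point would be compiled and scored from CSM captures instead of a formula.

```python
# Minimal tile-size sweep for a fixed GEMM under a toy cost model.
# All constants below are illustrative placeholders, not LEO parameters.
from itertools import product
from math import ceil

M, N, K = 1024, 1024, 1024   # problem size (placeholder)
ARRAY_DIM = 128              # placeholder array dimension
BUFFER_BYTES = 512 * 1024    # placeholder on-chip buffer budget
BYTES = 2                    # placeholder element size

def fits(tm: int, tn: int, tk: int) -> bool:
    """Does one tile of A, B, and C fit in the on-chip buffer budget?"""
    return BYTES * (tm * tk + tk * tn + tm * tn) <= BUFFER_BYTES

def toy_cycles(tm: int, tn: int, tk: int, fill_drain: int = 64) -> float:
    """Toy total-cycle estimate: per-tile compute plus a fixed fill/drain overhead."""
    tiles = ceil(M / tm) * ceil(N / tn) * ceil(K / tk)
    per_tile = (tm * tn * tk) / ARRAY_DIM + fill_drain
    return tiles * per_tile

candidates = [t for t in product([32, 64, 128, 256], repeat=3) if fits(*t)]
best = min(candidates, key=lambda t: toy_cycles(*t))
print("best tile under the toy model:", best, "cycles:", int(toy_cycles(*best)))
```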
Trust, reliability & deployment-oriented research
- Robustness under corners — voltage, temperature, and process stress; inference behavior and error profiles on configurable hardware.
- Security-sensitive evaluation — repeatable captures, bitstream provenance, and controlled lab workflows.
Again, these themes are not exhaustive—we welcome directions that are not listed. For compile-time knobs across core, memory, and toolchain, see Configurability.
Hardware-accurate feedback
Ownership of compiler lowering, ISA generation, and the CSM path means partners can line up intended tilings and streams with observed issue, stalls, memory effects, and end-to-end time. That discipline supports fair comparison of models or compiler variants on a given silicon configuration.
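A minimal sketch of that discipline: compare what the compiler scheduled with what the counters reported for the same layer. The field names and numbers below are hypothetical placeholders; actual capture formats are agreed per engagement.

```python
# Sketch of lining up intended schedule numbers against observed counters.
# The fields and values are hypothetical placeholders for a per-layer capture.
intended = {
    "layer": "attn_qk",
    "cycles": 118_000,        # compiler's scheduled cycle count for the layer
    "dram_bytes": 5_200_000,  # traffic the chosen mapping expects to generate
}
observed = {
    "layer": "attn_qk",
    "cycles": 131_400,        # measured end-to-end cycles for the layer
    "stall_cycles": 12_900,   # cycles lost to structural or memory stalls
    "dram_bytes": 5_650_000,  # traffic reported by the memory counters
}

def report(intended: dict, observed: dict) -> None:
    """Flag where execution diverged from the schedule the compiler intended."""
    cyc_gap = observed["cycles"] / intended["cycles"] - 1.0
    dram_gap = observed["dram_bytes"] / intended["dram_bytes"] - 1.0
    stall_share = observed["stall_cycles"] / observed["cycles"]
    print(f'{observed["layer"]}: cycles +{cyc_gap:.1%} vs plan, '
          f'DRAM traffic +{dram_gap:.1%}, stalls {stall_share:.1%} of runtime')

report(intended, observed)
```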
Configurable stand-in for other targets
The LEO execution core is compile-time configurable across core dimensions, data-path widths, buffers, and memory interfaces. Partners use that freedom to explore families of hardware behaviors—within LeoGreenAI’s architecture—while keeping feedback grounded in FPGA or future silicon rather than in simulation alone.
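As a sketch of what exploring such a family can look like, the snippet below enumerates a small sweep of build-time configurations. The field names are illustrative stand-ins, not LeoGreenAI's actual build parameters; the real knobs are described under Configurability.

```python
# Sketch of sweeping a family of build-time configurations. Field names are
# illustrative placeholders for compile-time knobs, not actual build options.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class BuildConfig:
    array_dim: int        # core/array dimension
    datapath_bits: int    # data-path width
    buffer_kib: int       # on-chip buffer size
    mem_if: str           # memory interface class

sweep = [BuildConfig(a, w, b, m)
         for a, w, b, m in product([64, 128], [8, 16], [256, 512], ["ddr", "hbm"])]

for cfg in sweep:
    # Each point would be built, loaded onto the FPGA, and measured with the
    # same workload and CSM capture plan so results stay comparable.
    print(cfg)
```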
How engagements usually start
- NDA and technical deep-dive on your model classes and measurement goals.
- Evaluation plan on FPGA (or agreed platform) with defined CSM capture formats.
- Optional roadmap toward custom counter extensions or compiler hooks for your workflow.