David Demitri Africa

Research

Below is all currently published research work I'm an author on. Each entry includes a link to the paper, a short summary, and some personal thoughts.

  1. Lag and Duration of Leader–Follower Relationships in Mixed Traffic Using Causal Inference

    Summary: This paper implements a causal inference approach to analyze leader-follower dynamics in an arterial road in Chennai, India. We quantify the temporal lag and duration of interactions using transfer entropy metrics.

    My thoughts: This was my first paper. I learned a lot about how to write research in general from here, and in the future I think I would like to do more papers in this style—analyzing some weird real world phenomenon with an interesting method. Published in Chaos.

  2. Batayan: A Filipino NLP benchmark for evaluating Large Language Models

    Summary: This paper introduces Batayan, a benchmark to evaluate LLMs on NLP tasks in Filipino.

    My thoughts: Most of the work here was in actually writing/re-translating the entries. Would be nice to do some in-depth error analysis ala Parser Showdown at the Wall Street Corral. Published in ACL 2025 Main Conference.

  3. Identifying a Circuit for Verb Conjugation in GPT-2

    Summary: Looking for a circuit in GPT-2 that does subject-verb agreement. We find one, but it gets progressively larger as the SVA task gets more complicated.

    My thoughts: Final project for L193: Explainable Artificial Intelligence. Thinking of a place to submit this.

  4. Learning Modular Exponentiation with Transformers

    Summary: We teach a small 4-layer transformer modular exponentiation. PCA on embeddings doesn't show any clear structure, but we do find a cool example of grokking by multiples of moduli. Also, we find a small circuit that performs regular/normal exponentiation.

    My thoughts: Final project for R252: Theory of Deep Learning. Thinking of a place to submit this.

  5. Learning Dynamics of Meta-Learning in Small Model Pretraining

    Summary: If you replace half of the steps in language model pretraining with a meta-task, what does the model learn? Model achieves better loss, improves the vanilla model's F1 on NER, and has this really interesting phase transition.

    My thoughts: This is one half of my MPhil thesis. Really proud of Figure 6 here.

  6. Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages

    Summary: At what point in pretraining does meta-pretraining start to improve zero-shot cross-lingual named entity recognition (NER) in Filipino and Tagalog? If you fine-tune every checkpoint from pretraining step 0 to 6000, you find some actual reuse of knowledge from the model's backbone.

    My thoughts: This is the other half of my MPhil thesis. Have submitted this to a workshop somewhere. I think Figures 4 to 7 look nice.

  7. No Answer Needed: Predicting LLM Answer Accuracy from Question-Only Linear Probes

    Summary: Can we predict the accuracy of LLM answers using model internals, even before the answer is generated? We find that a simple linear probe on activations can achieve surprisingly good performance.

    My thoughts: Worked on this with MARS 2.0 people. Nice graphs.

  8. Investigating ReLoRA: Effects on the Learning Dynamics of Small Language Models

    Summary: We study the effects of ReLoRA on the learning dynamics of small language models. Our experiments show that ReLoRA isn't that helpful.

    My thoughts: Yuval's thesis. I like the conclusions.

  9. Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time

    Summary: We investigate the phenomenon of inoculation, where appending even a short system prompt in fine-tuning suppresses this behaviour in general deployment.

    My thoughts: Daniel Tan is very agentic.