On Child Learning And Machine Learning | 2 months

By Gabriel, 30 Apr 2024, updated 05 May 2024

This blog post is a collection of observations I have made on the intellectual development of children and the insights they can give us for artificial intelligence research. My third child was born at the beginning of 2024, and I'm fascinated by the glimpse of emerging intelligent behavior one can see when observing a baby slowly growing into a kid over several years. This is the first part, at 2 months old.


Introduction

Using child learning as an analogy for machine learning, or more generally artificial intelligence research, is a common practice. We do not have many other forms of intelligence so close to us to study. Adult behavior is probably the result of years of learning, teaching and experiments (years of life, really!), and it can be difficult to reverse-engineer what logical chain of thought led from a stimulus to a reaction. A kid, on the other hand, can be seen more like a blank canvas, and that same logical chain of thought is easier to understand. So it can provide good insights for artificial intelligence research. At least it may be one way to reach human-level intelligence with a machine.

I have 3 children: one 5 years old, one 3 years old and one 2 months old. As a father, I'm fascinated by the glimpse of emerging intelligent behavior one can see when observing a baby slowly growing into a kid over several years.

Recently I read an insightful hypothesis about child learning from Yann LeCun, co-recipient, with Yoshua Bengio and Geoffrey Hinton, of the Turing Award for his work on deep learning. He discusses the orders of magnitude in the quantity of information handled by today's biggest popular LLMs on one hand and by a child on the other.

There is a lot of hype at the moment about the possible applications of Large Language Models, or LLMs (with ChatGPT still being the most popular of them all), so this comparison is a good illustration today.

Large Language Models are traditionally trained on text data and, when used, take text data as input and produce text data as output. Today a product like GPT-4 from OpenAI blurs the line between text-only models and models using other media (video, image, audio): that product can take images as input or produce them as output. But let's put that aside for now and assume that the language or text part is still the main part of an LLM.

In this hypothesis, LeCun highlights the fact that text data is small, quantity-wise, compared to all the data available to a child through vision. It is worth mentioning that LeCun's breakthrough work in deep learning was in the field of computer vision, so he is probably sensitive to the importance of visual data in learning.

He has repeated this idea more or less in several media, but here is one occurrence, from a LinkedIn post around February 2024:

I’ve made that point before:

In 4 years, a child has seen 50 times more data than the biggest LLMs.

1E13 tokens is pretty much all the quality text publicly available on the Internet. It would take 170k years for a human to read (8 h/day, 250 word/minute).

Text is simply too low bandwidth and too scarce a modality to learn how the world works.

Video is more redundant, but redundancy is precisely what you need for Self-Supervised Learning to work well.

Even if we (including LeCun) know that a child also processes data from the other four senses, it is a great idea to suggest some metrics here.
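Out of curiosity, here is a quick sanity check of the 170k-years figure as a small Python sketch; the words-per-token ratio (~0.75) is my own assumption, not something stated in the post:

```python
# Sanity check of "170k years to read 1E13 tokens" at 8 h/day, 250 words/minute.
tokens = 1e13
words_per_token = 0.75                # rough rule of thumb for English text (my assumption)
words = tokens * words_per_token

words_per_year = 250 * 60 * 8 * 365   # 250 words/min, 8 h/day, 365 days/year
years_to_read = words / words_per_year
print(f"{years_to_read:,.0f} years")  # ~171,000 years, i.e. on the order of 170k
```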

Observations

So I decided to compare those numbers to… my 2-month-old daughter, someone with whom I naturally spend a lot of time now! This is not a thorough scientific review, but there are interesting facts that all parents will probably recognize.

The first month of her life seems a write-off when considering sight-powered learning: she slept most of the time and woke up to feed every 2-3 hours, for 30-45 minutes, going back to sleep a few minutes after the feed. Day and night. Very little time to process vision data through her sight. Hence focusing around the 2-month mark, when she starts having more significant awake time, brings us more interesting observations. It is really fun to see her gaze starting to focus on subjects and follow them.

During feeding time she would have her eyes open, but it looks like she's just staring into the void. It seems that there is no “vision recognition” in place at that time. Maybe the eyes are just getting used to being open and receiving light, or maybe it is something else, I don't know. And it appears that to find the nipple and attach she doesn't use vision at all, rather smell and touch driven by newborn reflexes… and Mum's help!

At night she would have 1 or 2 feeds, but those go quickly and the goal is to put her back to sleep just after, so I will consider that no awake time is available to process vision data at night. During the day, between 8am and 8pm, she'll have 5 cycles of feed, awake and sleep. She'll be awake for 1 to 1.5 hours each time. Taking the higher estimate: that is 1.5 h x 5 cycles x 30 days = 225 hours of cumulative awake time for a 2-month-old baby.
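As a small Python sketch of that little calculation (the cycle counts and durations are just my rough model of her day):

```python
# Cumulative awake time over her second month, under my rough daytime model.
awake_hours_per_cycle = 1.5   # upper estimate of awake time per cycle
cycles_per_day = 5            # feed / awake / sleep cycles between 8am and 8pm
days = 30
total_awake_hours = awake_hours_per_cycle * cycles_per_day * days
print(total_awake_hours)      # 225.0 hours
```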

Her favorite subject seems to be people's faces, which she would observe for several minutes at a time. There is a lot of redundancy in her observations, in how she diligently observes Mum's (and Dad's) face multiple times per day. And she does look very focused when doing that: eyes wide open, and her point of gaze seems to fluctuate between the eyes, the nose and the mouth of the subject.

In a normal awake state, with static objects in front of her, or a known face with a light smile or neutral expression, she would casually observe with her “normal” focus, often gesticulating with arms and legs. If the face starts to display a new or more active expression, or maybe make a new noise, I can see her limb gesticulation slow down a lot or stop, and she starts to focus more intensely, with eyes more open.

Once or twice, when we visited a health professional wearing a mask, we could see a slightly different expression on her face when she was looking at them: confusion.

I don't know how she gets the smile, but at this age I also found that she would respond to a smiling face by smiling too. Unexplainable to me (she doesn't have a mirror), but quite magical as a parent!

Besides the learning from sight that we have discussed so far, I suspect that she simultaneously has to deal with a lot of learning from the touch sense, both external and internal (not sure if “internal touch” makes sense, but I mean what one feels inside one's body). That learning is probably felt in a more dramatic way for her, as it is directly connected to comfort/discomfort and pain: hunger, too hot, too cold, a wet nappy. Here again there is a lot of repetition, or redundancy as one would call it when applied to machine learning. And the fact that it is so directly linked to pleasure and pain signals may reinforce the learning somehow.

Findings

Vision data input certainly increases with age, but it does start small. Applying LeCun's back-of-the-envelope calculation, that is 225 hours x 3600 s/hour x 1E6 optic nerve fibers x 2 eyes x 10 bytes/s per fiber = 1.62E13 bytes. And as he suggests, I feel that there is a lot of redundancy in the data ingested.
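For readers who want to reproduce it, here is the same back-of-the-envelope calculation as a small Python sketch; the 2 bytes/token figure used to turn the LLM training set into bytes is my own rough assumption, not something from LeCun's post:

```python
# Visual data ingested in ~225 cumulative awake hours, using the figures above:
# ~1E6 optic nerve fibers per eye, 2 eyes, ~10 bytes/s per fiber.
awake_seconds = 225 * 3600
fibers = 1e6 * 2
bytes_per_fiber_per_s = 10
baby_visual_bytes = awake_seconds * fibers * bytes_per_fiber_per_s
print(f"baby visual data: {baby_visual_bytes:.2e} bytes")  # ~1.62e13 bytes

# Rough byte count of ~1E13 tokens of LLM training data,
# assuming ~2 bytes per token (my assumption).
llm_bytes = 1e13 * 2
print(f"LLM training data: {llm_bytes:.2e} bytes")         # ~2e13 bytes
```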

Interestingly, that is on par with his estimate of the quantity of data ingested by today's LLMs: not much less, not much more. A 2-month-old baby is not a final product :-) and much more data will be processed before the child reaches 4 years old and is in a position to demonstrate clearly human-intelligent behaviors. I intend to write up further observations as she grows.