A photosensitive patch of cells could be wired directly to motor cells or muscles on the opposite side, which would allow the organism to swim toward the light (useful for feeding or migrating, for example).
The "wiring to muscles" is derived from the ability of adjacent cells to communicate by chemical signals.
This ability to communicate evolved before multicellular animals did, in colonies of the unicellular ancestors of animals (which resembled today's choanoflagellates).
Intercellular communication is a prerequisite for the development of multicellularity, much as a common language is a prerequisite for a group of humans to work as a team.
In a unicellular organism, one part of the cell senses light and another part, such as a flagellum or contractile filaments, reacts by moving the cell. In a multicellular organism, a division of labor appears. The cells on the dorsal side of the animal are the first to sense light and other environmental stimuli, so some of them specialize as sensory cells. The cells on the ventral side were originally more effective at locomotion, using either cilia or propulsive contraction waves, so some of them specialized as motor cells: either muscles or ciliary bands (which in many simple animals are more important than muscles).
With this division of labor, the older methods of intercellular communication were refined, resulting in synapses between sensory cells and motor cells. A synapse ensures that a chemical message reaches only its intended recipient instead of being broadcast into the neighborhood.
For better reactions to external stimuli, the behavior of the sensory cells had to be coordinated. For example, even when light is sensed at only one end of the animal, a command must reach all the motor cells, not just some of them, for the entire animal to move. This led to synapses between the sensory cells themselves, not only between sensory cells and motor cells.
Eventually there was a further division of labor: some of the sensory cells specialized as middlemen, relaying sensory information between the cells that actually received it and the motor cells. This third kind of cell became the neuron. Initially the neurons were in the skin, together with the sensory cells from which they derived, but later they migrated inside the body. There they eventually formed ganglia instead of a diffuse net, because shorter connections between neurons minimize reaction times, and this led to a centralized nervous system.
They didn't need to come about at the same time. Photosensitive proteins (opsins) and cellular motility both predate multicellular life entirely. Even single-celled euglena detect light and swim toward it with no nervous system at all.
In early multicellular animals, cells were already chemically signaling their neighbors. A photosensitive cell releasing a signaling molecule near a contractile cell isn't a coordinated miracle. It is just two pre-existing cell types sitting next to each other in tissue, which is what bodies are. Natural selection then refines that crude coupling because even a tiny, noisy light response is better than none.
Each piece (light-sensitive proteins, cell-to-cell signaling, contractile cells) evolved independently and for other reasons long before being co-opted into anything resembling vision. The question "how could A and B arise simultaneously?" dissolves once neither A nor B was new.
Stated Clearly (0) recently started a fantastic series about evolution that sets out to explain bacterial flagella. It starts from basic principles and aims to answer questions like yours in evolutionary biology.
A fairly simple chemical reaction could cause an organism to turn or move toward or away from light in the ocean, with various imaginable benefits.
And note that box jellyfish have 24 eyes, some of them highly complex, but no brain. You can look into their behavior to find out what they do with the information.