Most MMORPG/RPG cities could be described as extended villages. There is little complexity, few NPCs and minimal infrastructure.
However, a more realistic approach would require huge cities: towns with thousands of NPCs, interactive buildings (which could be instanced), and comparatively more complex infrastructure.
The drawback is that this strains hardware and lowers performance, and it demands a much higher investment in level design.
Instead, the approach proposed here extends the scale itself.
Instead of small buildings, use multi-screen buildings that span hundreds of character lengths and character heights. Put much larger distances between content points: instead of all content being crammed into a "town square/center", the city itself is a huge urban area (a significant part of the map) with thousands of buildings and objects spread throughout. Characters should not feel like giants in a dwarf village; they should feel small.
The most common objection to tall buildings is that they take up too much screen space and cannot be seen in their entirety. This is solved by allowing far zoom levels, comparable to a bird's-eye view of the city.
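As a rough illustration of the zoom idea, here is a minimal sketch of how a client might pick a zoom factor that fits a whole building (or the whole city) on screen. The function name zoom_to_fit, the margin parameter and all the sizes used are assumptions made for this example, not part of any particular engine.

```python
def zoom_to_fit(object_w, object_h, viewport_w, viewport_h, margin=1.0):
    """Return the zoom factor at which an object fits inside the viewport.

    Sizes are in pixels at zoom 1.0. `margin` is the fraction of the
    viewport the object may occupy (1.0 = edge to edge). All names and
    defaults here are hypothetical.
    """
    if min(object_w, object_h, viewport_w, viewport_h) <= 0:
        raise ValueError("all sizes must be positive")
    if not 0 < margin <= 1:
        raise ValueError("margin must be in (0, 1]")
    # The tighter of the two axes decides how far we must zoom out.
    return margin * min(viewport_w / object_w, viewport_h / object_h)


# A tower 400 px wide and 1200 px tall on a 1920x1080 screen:
tower_zoom = zoom_to_fit(400, 1200, 1920, 1080)

# A whole city of 20000x15000 px needs a much farther zoom:
city_zoom = zoom_to_fit(20000, 15000, 1920, 1080)
```

In this sketch the tower is limited by its height and the city by its vertical extent as well, so the city's zoom factor ends up far below the tower's. A real client would probably also clamp the result between a minimum and maximum zoom.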
Larger buildings could house instances that realistically match their size, instead of serving as portals to underground dungeons. A realistic building such as a cathedral should take up as much space, relative to the character's body, as a real cathedral does relative to a person. Realistic proportions create a better sense of immersion; scaling things down to present a uniform "interface" is wrong. In general, any object that would be larger in real life should be larger in the game. Small objects, like rings on the ground, could use a pop-up description for pick-up while keeping their almost-invisible size.
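To make the proportion rule concrete, here is a small sketch that derives every on-screen size from one shared scale based on the character, and flags objects too small to see so they get a pop-up label instead of being enlarged. The constants (a 1.8 m character drawn 48 px tall, a 4 px visibility threshold) and the function names are assumptions chosen only for illustration.

```python
CHARACTER_HEIGHT_M = 1.8    # assumed real-world height of a character
CHARACTER_HEIGHT_PX = 48.0  # assumed sprite height at zoom 1.0

# One scale for everything: pixels per real-world meter.
PX_PER_M = CHARACTER_HEIGHT_PX / CHARACTER_HEIGHT_M


def size_px(real_size_m, zoom=1.0):
    """On-screen size of an object, using the same scale as the character."""
    return real_size_m * PX_PER_M * zoom


def size_in_character_heights(real_size_m):
    """How many character heights tall (or wide) an object is."""
    return real_size_m / CHARACTER_HEIGHT_M


def needs_popup_label(real_size_m, zoom=1.0, min_visible_px=4.0):
    """True if the object is drawn too small to notice and should get a
    pop-up description instead of being scaled up."""
    return size_px(real_size_m, zoom) < min_visible_px


cathedral_height_m = 100.0  # hypothetical cathedral
ring_diameter_m = 0.02      # hypothetical ring lying on the ground

cathedral_in_chars = size_in_character_heights(cathedral_height_m)
ring_needs_label = needs_popup_label(ring_diameter_m)
```

Under these assumed numbers the cathedral comes out at roughly 56 character heights, and the ring is drawn at well under one pixel, so it keeps its true size and relies on the pop-up for pick-up.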